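InputStream converting punctuation to Unicode

My app connects to the internet and refreshes a page, downloading its HTML to extract content such as images and text. However, I've noticed that some punctuation marks end up as Unicode decimal codes. Is there any way to prevent this?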
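import android.os.AsyncTask;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class DownloadPage extends AsyncTask<String, Void, String> {

    // Callback used to hand the downloaded HTML back to the caller.
    public interface PageResponse {
        void processFinish(String output);
    }

    private PageResponse delegate = null;

    public DownloadPage(PageResponse delegate) {
        this.delegate = delegate;
    }

    @Override
    protected String doInBackground(String... urls) {
        try {
            URL url = new URL(urls[0]);
            URLConnection connection = url.openConnection();
            StringBuilder str = new StringBuilder();
            // No charset is passed here, so the platform default is used (UTF-8 on Android).
            // try-with-resources closes the reader and the underlying stream, even on error.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(connection.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // readLine() strips the line terminator, so add it back.
                    str.append(line).append('\n');
                }
            }
            return str.toString();
        } catch (MalformedURLException e) {
            e.printStackTrace();
            return "Failed";
        } catch (IOException e) {
            e.printStackTrace();
            return "Failed";
        }
    }

    @Override
    protected void onPostExecute(String s) {
        super.onPostExecute(s);
        // Guard against a null callback before delivering the result on the UI thread.
        if (delegate != null) {
            delegate.processFinish(s);
        }
    }
}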
This is the page I'm getting the information from: https://www.looemusic.co.uk/news/
This is what comes up with this code.
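If the "Unicode decimal codes" are HTML numeric character references such as `&#8217;` (an assumption, since the actual output is not reproduced here), then they are probably already present in the page's HTML source and the `InputStream` is not converting anything. In that case the text would need to be decoded after download. A minimal sketch in plain Java follows; `NumericEntityDecoder` and `decode` are hypothetical names, and it handles only decimal references, not named ones like `&amp;` or hexadecimal ones like `&#x2019;`:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces decimal numeric character references
// (e.g. "&#8217;") with the characters they stand for.
class NumericEntityDecoder {

    // "&#" followed by 1 to 7 decimal digits and a closing ";".
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    static String decode(String html) {
        Matcher m = DECIMAL_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // Leave the reference untouched if it is not a valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

It could be called on the downloaded string, for example `NumericEntityDecoder.decode(s)` inside `processFinish`. On Android, a full HTML parser or `Html.fromHtml` may be a better fit, since those also handle named entities.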
It doesn't accept Charset.UTF-8. That isn't an option.
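For what it's worth, `Charset.UTF-8` is not valid Java, because a hyphen cannot appear in an identifier. The constant is `StandardCharsets.UTF_8` (Java 7+, Android API 19+), and `Charset.forName("UTF-8")` works on older versions. A small sketch of passing it to `InputStreamReader`, where `Utf8StreamReader` and `readAll` are hypothetical names and the page is assumed to be UTF-8 encoded:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads an entire stream as UTF-8 text.
class Utf8StreamReader {

    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // Name the charset explicitly instead of relying on the platform default.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() drops the line terminator, so add one back.
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }
}
```

This only controls how bytes are turned into characters. If the page source itself contains references such as `&#8217;`, they are ordinary ASCII text in the HTML and will come through unchanged whichever charset is chosen.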