2017-02-22

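InputStream converts punctuation to Unicode

My app connects to the internet and downloads a page's HTML to extract content such as images and text. However, I've noticed that some punctuation marks are being converted to Unicode decimal codes. Is there any way to prevent this?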

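import android.os.AsyncTask;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class DownloadPage extends AsyncTask<String, Void, String> {

    // Callback used to hand the downloaded HTML back to the caller.
    public interface PageResponse {
        void processFinish(String output);
    }

    private PageResponse delegate = null;

    public DownloadPage(PageResponse delegate) {
        this.delegate = delegate;
    }

    @Override
    protected String doInBackground(String... urls) {
        URLConnection connection;
        try {
            URL url = new URL(urls[0]);

            connection = url.openConnection();

            String html;
            InputStream inputStream = connection.getInputStream();
            // No charset is passed here, so the platform default charset is used.
            BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
            StringBuilder str = new StringBuilder();
            String line;
            // readLine() strips line terminators, so the page is joined into one line.
            while ((line = reader.readLine()) != null) {
                str.append(line);
            }
            inputStream.close();
            html = str.toString();

            return html;

        } catch (MalformedURLException e) {
            e.printStackTrace();
            return "Failed";
        } catch (IOException e) {
            e.printStackTrace();
            return "Failed";
        }
    }

    @Override
    protected void onPostExecute(String s) {
        super.onPostExecute(s);
        // Runs on the UI thread; passes the HTML (or "Failed") to the caller.
        delegate.processFinish(s);
    }
}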

This is the page I'm getting the information from: https://www.looemusic.co.uk/news/

This is what comes up with this code.

Answer

If you're sure the problem lies with your InputStream rather than with the HTML rendering itself, you can set the charset on your InputStreamReader:

new InputStreamReader(inputStream, Charset.UTF-8); 

This Charset class is in java.nio.charset.
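Note that `Charset.UTF-8` does not compile as written, because `UTF-8` is not a valid Java identifier. A minimal sketch of what the line above is probably aiming for, assuming the `StandardCharsets.UTF_8` constant (Java 7+, Android API 19+) is available; on older API levels `Charset.forName("UTF-8")` should work the same way. The class and method names here (`Utf8Reader`, `readAll`) are made up for illustration, and `UncheckedIOException` assumes Java 8 or Android API 24+.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original question's code.
final class Utf8Reader {

    // Reads the whole stream as UTF-8 text, keeping a '\n' after each line.
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream).
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            // Rethrown unchecked so callers can decide how to report failures.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

In the question's `doInBackground`, this would replace the manual read loop with something like `return Utf8Reader.readAll(connection.getInputStream());`.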

If that fails, check whether the problem is a mismatch with the encoding on the client side. Put this tag in your HTML file:

For HTML 5:

<meta charset="UTF-8"> 

For HTML 4:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> 

If you want to use a charset other than UTF-8, just change the name in the code.
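As a rough illustration of why the charset name matters, here is a small sketch that decodes the same bytes under a charset chosen by name. `CharsetDemo` and `decode` are hypothetical names used only for this example; `Charset.forName` is the standard lookup and throws an unchecked exception if the name is illegal or unsupported.

```java
import java.nio.charset.Charset;

// Hypothetical helper for illustration only.
final class CharsetDemo {

    // Decodes raw bytes using the charset with the given name, e.g. "UTF-8".
    static String decode(byte[] bytes, String charsetName) {
        return new String(bytes, Charset.forName(charsetName));
    }
}
```

For example, the two bytes `0xC3 0xA9` decode to "é" as UTF-8 but to "Ã©" as ISO-8859-1, so the name should match whatever encoding the server actually sends.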

It doesn't like Charset.UTF-8. That isn't an available option.