InputStream converting punctuation to Unicode

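My app connects to the internet and retrieves a page to get its HTML, which it uses for images, text, and other content. However, I've noticed that some punctuation is coming through as Unicode decimal codes. Is there any way to stop this?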

import android.os.AsyncTask;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;

public class DownloadPage extends AsyncTask<String, Void, String> {

    public interface PageResponse {
        void processFinish(String output);
    }

    private PageResponse delegate = null;

    public DownloadPage(PageResponse delegate) {
        this.delegate = delegate;
    }

    @Override
    protected String doInBackground(String... urls) {
        URLConnection connection;
        try {
            URL url = new URL(urls[0]);

            connection = url.openConnection();

            String html;
            InputStream inputStream = connection.getInputStream();
            BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
            StringBuilder str = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                str.append(line);
            }
            inputStream.close();
            html = str.toString();

            return html;

        } catch (MalformedURLException e) {
            e.printStackTrace();
            return "Failed";
        } catch (IOException e) {
            e.printStackTrace();
            return "Failed";
        }
    }

    @Override
    protected void onPostExecute(String s) {
        super.onPostExecute(s);
        delegate.processFinish(s);
    }
}

This is the page I'm getting the information from: https://www.looemusic.co.uk/news/

This is what comes up with this code.


2017-02-22 James Green

If you're sure the problem is with your InputStream rather than with the HTML rendering itself, you can set the charset on your InputStreamReader:

new InputStreamReader(inputStream, Charset.UTF-8);

This Charset is in java.nio.charset.
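For reference, the UTF-8 constant that actually exists in java.nio.charset is StandardCharsets.UTF_8 (Java 7+, Android API 19+); Charset.forName("UTF-8") is the equivalent lookup on older versions. Below is a minimal sketch of reading a stream with an explicit charset. The class name CharsetReadSketch and the helper readAll are made up for illustration, and a ByteArrayInputStream stands in for connection.getInputStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the question's code.
final class CharsetReadSketch {

    // Reads the whole stream as text, decoding its bytes with the given charset
    // instead of the platform default.
    static String readAll(InputStream in, Charset charset) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for connection.getInputStream(): UTF-8 bytes with a curly apostrophe.
        byte[] bytes = "It\u2019s here".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8));
    }
}
```

In the question's doInBackground, the same idea would amount to passing a charset as the second argument of the InputStreamReader constructor.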

If that fails, you can check whether the problem is a mismatch with the client's encoding. Put this tag in your HTML file:

For HTML 5:

<meta charset="UTF-8">

For HTML 4:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

If you want to use a charset other than UTF-8, just change the name in the code!
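When the page's encoding is not known in advance, one option is to take the charset name from the Content-Type response header instead of hard-coding it. The following is only a sketch: the class ContentTypeCharsetSketch and the method charsetOf are hypothetical names, and falling back to UTF-8 is an assumption, not something the server guarantees.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class, not part of the question's code.
final class ContentTypeCharsetSketch {

    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=ISO-8859-1", or UTF-8 (assumed default) when
    // no usable charset parameter is present.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Unknown or malformed charset name: fall through to the default.
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```

With a URLConnection, the header value is available from connection.getContentType(), so the reader could be built as new InputStreamReader(inputStream, ContentTypeCharsetSketch.charsetOf(connection.getContentType())).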


2017-02-22 11:55:26

It doesn't like Charset.UTF-8. That isn't an option.
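The question describes punctuation showing up as decimal codes. If those are HTML numeric character references such as &#8217; already present in the page source (an assumption, since the raw output isn't shown here), the reader's charset would not affect them, because they are plain ASCII text in the markup. A minimal sketch of decoding them after download follows; the class NumericEntitySketch and the method decodeDecimalRefs are hypothetical names, and only decimal references are handled, not named or hexadecimal ones.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the question's code.
final class NumericEntitySketch {

    // Matches decimal numeric character references like "&#8217;".
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    // Replaces each decimal reference with the character it names;
    // references outside the Unicode range are left unchanged.
    static String decodeDecimalRefs(String html) {
        Matcher m = DECIMAL_REF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

In the question's code this could be applied to the string passed to processFinish, for example delegate.processFinish(NumericEntitySketch.decodeDecimalRefs(s)).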
