How to convert a Net::HTTP response to a certain encoding in Ruby 1.9.1?

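I have a Sinatra application (http://analyzethis.espace-technologies.com) that does the following:

1. Retrieves an HTML page (via net/http).
2. Creates a Nokogiri document from response.body.
3. Extracts some information and sends it back in the response. The response should be UTF-8 encoded.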

I ran into this problem while trying to read sites that use the windows-1256 encoding, such as www.filfan.com or www.masrawy.com.

The problem is that the result of the encoding conversion is incorrect, even though no error is raised.

The net/http response.body.encoding returns ASCII-8BIT, which cannot be converted to UTF-8 directly.
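A minimal illustration of why the ASCII-8BIT body cannot be transcoded as-is, assuming a Ruby (1.9 or later) built with the standard single-byte transcoders. The single byte 0xC7 stands in for a Windows-1256 page body; in that code page it is the Arabic letter ALEF (U+0627). The variable names are illustrative only.

```ruby
# Stand-in for a Windows-1256 page as Net::HTTP returns it: raw bytes tagged ASCII-8BIT.
raw = [0xC7].pack("C*")            # 0xC7 = ARABIC LETTER ALEF in Windows-1256
raw.encoding == Encoding::ASCII_8BIT # => true

# Transcoding straight from ASCII-8BIT fails: Ruby has no idea what byte 0xC7 means.
direct_ok = begin
  raw.encode("UTF-8")
  true
rescue Encoding::UndefinedConversionError
  false
end
direct_ok # => false

# Relabel the bytes with their real charset first, then transcode to UTF-8.
# dup keeps `raw` untouched, because force_encoding mutates its receiver.
utf8 = raw.dup.force_encoding("Windows-1256").encode("UTF-8")
utf8.encoding == Encoding::UTF_8 # => true
utf8.codepoints                  # => [1575] (U+0627)
```

The key point is that `force_encoding` only changes the label on the bytes, while `encode` actually converts them; the label has to be correct before the conversion can be.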

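If I call Nokogiri::HTML(response.body) and use CSS selectors to get some content from the page (the contents of the title tag, for example), I get a string whose string.encoding returns WINDOWS-1256. I call string.encode("utf-8") and send the response using the result, but the response is still incorrect.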

Any suggestions or ideas about what is wrong with my approach?


2009-07-30 humanzz

I found that the following code works for me for now:

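def document
  # Parse the page only once and cache the Nokogiri document
  if @document.nil? && response
    @document = if document_encoding
      # Tag the raw ASCII-8BIT bytes with the page's real charset, transcode them
      # to UTF-8, and tell Nokogiri that its input is UTF-8
      Nokogiri::HTML(response.body.dup.force_encoding(document_encoding).encode('utf-8'), nil, 'utf-8')
    else
      Nokogiri::HTML(response.body)
    end
  end
  @document
end

def document_encoding
  return @document_encoding if @document_encoding
  # 1. charset parameter of the Content-Type header, e.g. "text/html; charset=windows-1256"
  response.type_params.each_pair do |k, v|
    @document_encoding = v.upcase if k =~ /charset/i
  end
  unless @document_encoding
    # Disabled alternative: calling document here would recurse back into document_encoding
    #document.css("meta[http-equiv=Content-Type]").each do |n|
    #  attr = n.get_attribute("content")
    #  @document_encoding = attr.slice(/charset=[a-z0-9\-_]+/i).split("=")[1].upcase if attr
    #end
    # 2. Fallback: charset from <meta http-equiv="Content-Type" content="...; charset=..."> in the raw body
    @document_encoding = response.body =~ /<meta[^>]*http-equiv=["']Content-Type["'][^>]*content=["']([^"']*)["']/i &&
      $1 =~ /charset=([^\s;"']+)/i && $1.upcase
  end
  @document_encoding
end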
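The header-then-`<meta>` lookup in `document_encoding` can be tried without a live request. The sketch below is a hypothetical, string-only `detect_charset` helper (not part of the answer) that takes the Content-Type header value and the raw body instead of a Net::HTTP response object.

```ruby
# Hypothetical helper: same lookup order as document_encoding, but on plain strings.
META_CONTENT_TYPE = /<meta[^>]*http-equiv=["']?Content-Type["']?[^>]*content=["']([^"']*)["']/i

def detect_charset(content_type_header, body)
  # 1. Prefer the charset parameter of the HTTP Content-Type header
  charset = content_type_header.to_s[/charset=["']?([^\s;"']+)/i, 1]
  # 2. Otherwise look for <meta http-equiv="Content-Type" content="...; charset=...">
  if charset.nil? && (content = body[META_CONTENT_TYPE, 1])
    charset = content[/charset=([^\s;"']+)/i, 1]
  end
  # Upcased like the original code; nil when no charset was found anywhere
  charset && charset.upcase
end

html = '<html><head><meta http-equiv="Content-Type" content="text/html; charset=windows-1256"></head></html>'
detect_charset("text/html", html)                # => "WINDOWS-1256"
detect_charset("text/html; charset=utf-8", html) # => "UTF-8"
```

Note that this sketch, like the original regex, does not recognize the HTML5 form `<meta charset="...">`.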


2009-08-02 00:43:15 humanzz

Works great! – 2016-10-28 13:32:02

This happens because Net::HTTP does not handle encodings correctly; see http://bugs.ruby-lang.org/issues/2567

Instead of parsing the whole response.body, you can parse the charset out of response['content-type'].

Then use force_encoding() to set the correct encoding.

For example, use response.body.force_encoding("UTF-8") if the site is served as UTF-8.
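A minimal sketch of that approach, assuming you only trust the Content-Type header. `body_to_utf8` and its `fallback` parameter are hypothetical names; defaulting to UTF-8 when the header names no charset, and replacing unconvertible characters instead of raising, are design assumptions rather than anything Net::HTTP does for you.

```ruby
# Hypothetical helper: decode a raw Net::HTTP body using the charset from the
# Content-Type header and return a UTF-8 string.
def body_to_utf8(body, content_type, fallback = "UTF-8")
  # e.g. "text/html; charset=windows-1256" -> "windows-1256"
  charset = content_type.to_s[/charset=["']?([^\s;"']+)/i, 1] || fallback
  begin
    # force_encoding only relabels the bytes; dup leaves the original body alone
    text = body.dup.force_encoding(charset)
  rescue ArgumentError
    # Unknown charset name in the header: fall back instead of crashing
    text = body.dup.force_encoding(fallback)
  end
  # Transcode, substituting characters that are invalid or have no UTF-8 mapping
  text.encode("UTF-8", invalid: :replace, undef: :replace)
end

# Typical use (not executed here; needs network access):
#   res = Net::HTTP.get_response(URI("http://www.example.com/"))
#   html = body_to_utf8(res.body, res['content-type'])
```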


2012-12-08 17:03:12

Although this solution does work, the problem only occurs on some sites. Maybe the response is UTF-8 whenever the Content-Type contains 'application/json'...? According to http://stackoverflow.com/questions/9254891/what-does-content-type-application-json-charset-utf-8-really-mean, application/json implies UTF-8. – 2014-05-28 14:40:20
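If you want to act on that observation, one option is to pick a default charset per media type when the header carries no charset parameter. The sketch below is a hypothetical `charset_for` helper; treating `application/json` as UTF-8 follows RFC 8259 (which requires UTF-8 for JSON exchanged between systems), and returning `nil` for everything else (meaning "unknown, inspect the body") is an assumption of this sketch.

```ruby
# Hypothetical helper: charset to decode with, given only the Content-Type value.
def charset_for(content_type, default = nil)
  # An explicit charset parameter always wins
  explicit = content_type.to_s[/charset=["']?([^\s;"']+)/i, 1]
  return explicit if explicit
  # Media type without parameters, e.g. "application/json"
  media_type = content_type.to_s.split(";").first.to_s.strip.downcase
  # Assumption: JSON without a charset parameter is UTF-8; otherwise use the caller's default
  media_type == "application/json" ? "UTF-8" : default
end

charset_for("application/json")                 # => "UTF-8"
charset_for("text/html")                        # => nil
charset_for("text/html; charset=windows-1256")  # => "windows-1256"
```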
