2010-09-12 81 views
3

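Grabbing favicons with node.js

I have a small program that needs to grab favicons from sites using node.js. This works in most cases, but with apple.com I get errors that I can't understand or fix: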

    var sys = require('sys');
    var fs= require('fs'); 
    var http = require('http'); 

    var rurl = http.createClient(80, 'apple.com'); 
    var requestUrl = 'http://www.apple.com/favicon.ico'; 
    var request = rurl.request('GET', requestUrl, {"Content-Type": "image/x-icon", "host" : "apple.com" }); 
    request.end(); 

    request.addListener('response', function (response) 
    { 
      response.setEncoding('binary'); 
      var body = ''; 
      response.addListener('data', function (chunk) { 
        body += chunk; 
      }); 
      response.addListener("end", function() { 
      }); 
    }); 

When I make this request, the response is:

    <head><body> This object may be found <a HREF="http://www.apple.com/favicon.ico">here</a> </body>

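So I have modified the code above in various ways, trying variations of the hostname in the client-creation step and requesting the URL with 'www.apple.com', but usually I just get an error from node like this: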

    node.js:63
    throw e; 
    ^
Error: Parse Error 
    at Client.ondata (http:901:22) 
    at IOWatcher.callback (net:494:29) 
    at node.js:769:9 

Also, I don't want to use a Google service to grab the favicon.

Answers

1

This code seems to work for me:

    var sys = require("sys")
      , fs = require("fs")
      , http = require("http");

    var client = http.createClient(80, "www.apple.com") // Change of hostname here and below
      , req = client.request("GET"
           , "http://www.apple.com/favicon.ico"
           , {"Host": "www.apple.com"}); // Host must match the server being contacted

    req.end();

    req.addListener("response", function (res) {
      var body = "";
      res.setEncoding('binary'); // each chunk arrives as a 'binary' (latin1) string
      res.addListener("data", function (c) {
        body += c;
      });
      res.addListener("end", function() {
        // Do stuff with body (the whole icon as a 'binary' string)
      });
    });
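For anyone reading this on a current Node: `http.createClient` and the `sys` module date from early releases and have since been deprecated, and gluing `'binary'` strings together is easy to get wrong. A minimal sketch of the same fetch with `http.get`, collecting raw `Buffer` chunks instead; `fetchBuffer` is a hypothetical helper name, only plain `http:` URLs are handled, and error handling is kept to a minimum. `http.get` derives the `Host` header from the URL, so it always matches the host actually contacted.

```javascript
const http = require("http");

// Fetch a URL over plain HTTP and resolve with { statusCode, headers, body },
// where body is a Buffer assembled from the raw chunks (no string encoding).
function fetchBuffer(url) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, (res) => {
      const chunks = [];
      res.on("data", (chunk) => chunks.push(chunk));
      res.on("end", () =>
        resolve({
          statusCode: res.statusCode,
          headers: res.headers,
          body: Buffer.concat(chunks),
        })
      );
      res.on("error", reject);
    });
    req.on("error", reject);
  });
}
```

Called as `fetchBuffer("http://www.apple.com/favicon.ico")`, it resolves with the status code, the headers and a `Buffer` that can be handed straight to `fs.writeFile`.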
4

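The Host setting in your request should be www.apple.com (with the www). Also, why are you including a Content-Type header in the request? That makes no sense. You should use Accept: image/x-icon instead.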
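The header advice can be written down as plain `http.request` options: connect to the same host you name in `Host`, and say what you want back with `Accept` rather than `Content-Type` (which describes a request body, and a GET has none). A minimal sketch; `faviconRequestOptions` is a hypothetical helper, it assumes plain `http://` site URLs, and it assumes the icon lives at `/favicon.ico` (some sites only declare it through a `<link rel="icon">` tag).

```javascript
// Build options for http.request() that fetch /favicon.ico from a site.
// The Host header is derived from the same hostname (and port) we connect to,
// so a request sent to www.apple.com never carries "Host: apple.com".
function faviconRequestOptions(siteUrl) {
  const { hostname, port } = new URL(siteUrl); // port is "" when it is the default
  return {
    hostname,
    port: port ? Number(port) : 80,
    path: "/favicon.ico",
    method: "GET",
    headers: {
      Host: port ? `${hostname}:${port}` : hostname,
      // Accept states what we want back; no Content-Type, since there is no body.
      Accept: "image/x-icon",
    },
  };
}
```

Node's `http.request` already fills in `Host` from `hostname` and `port` when the header is omitted; it is spelled out here only to mirror the point above.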

This is the response I get from that URL:

    $ curl -I http://www.apple.com/favicon.ico
    HTTP/1.1 200 OK
    Last-Modified: Thu, 12 Mar 2009 17:09:30 GMT
    ETag: "1036-464ef0c1c8680"
    Server: Apache/2.2.11 (Unix)
    X-Cache-TTL: 568
    X-Cached-Time: Thu, 21 Jan 2010 14:55:37 GMT
    Accept-Ranges: bytes
    Content-Length: 4150
    Content-Type: image/x-icon
    Cache-Control: max-age=463
    Expires: Sun, 12 Sep 2010 14:22:09 GMT
    Date: Sun, 12 Sep 2010 14:14:26 GMT
    Connection: keep-alive

There shouldn't be any problem parsing that... I get the icon data as well.
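The `curl -I` headers above give two cheap sanity checks for a downloaded body: `Content-Type: image/x-icon` and `Content-Length: 4150`. A minimal sketch that compares a received `Buffer` against those headers and also checks the ICO file signature (a 2-byte reserved field equal to 0, then a 2-byte image type equal to 1, both little-endian, i.e. the bytes `00 00 01 00`). `checkIcon` is a hypothetical name; servers may legitimately serve favicons as PNG or omit `Content-Length`, so a failed check is a hint rather than proof.

```javascript
// Sanity-check a downloaded favicon against its response headers.
// headers: lower-cased header object as produced by Node's http module.
// body: Buffer holding the complete response body.
function checkIcon(headers, body) {
  const problems = [];
  const declared = headers["content-length"];
  if (declared !== undefined && Number(declared) !== body.length) {
    problems.push(`length ${body.length} != Content-Length ${declared}`);
  }
  const type = headers["content-type"] || "";
  if (!type.startsWith("image/")) {
    problems.push(`unexpected Content-Type "${type}"`);
  }
  // ICO header: reserved word 0, then type 1 (= icon), both little-endian.
  const isIco =
    body.length >= 4 && body.readUInt16LE(0) === 0 && body.readUInt16LE(2) === 1;
  return { isIco, problems };
}
```

An HTML stub such as the "This object may be found here" page fails both the type check and the signature check, which makes it easy to spot before writing a bogus `.ico` to disk.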

This is the response I get with a Host header without the www:

    $ curl -I http://www.apple.com/favicon.ico -H Host: apple.com
    HTTP/1.0 400 Bad Request
    Server: AkamaiGHost
    Mime-Version: 1.0
    Content-Type: text/html
    Content-Length: 208
    Expires: Sun, 12 Sep 2010 14:25:03 GMT
    Date: Sun, 12 Sep 2010 14:25:03 GMT
    Connection: close

    HTTP/1.1 302 Object Moved
    Location: http://www.apple.com/
    Content-Type: text/html
    Cache-Control: private
    Connection: close

By the way, you get both responses, which means their server isn't working correctly, but that's another discussion.
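Whatever the cause of the 400 and 302 replies, a favicon grabber that visits many sites will hit redirects like the `302 Object Moved` above, and the question's "This object may be found here" body looks like the stub page such redirects carry. A minimal sketch that follows `Location` headers on 3xx responses up to a fixed number of hops, resolving relative locations against the current URL; `getFollowingRedirects` and the default limit of five hops are assumptions, and only plain `http:` URLs are handled (a redirect to `https:` makes the promise reject).

```javascript
const http = require("http");

// GET a URL, following 3xx redirects (via the Location header) up to maxHops.
// Resolves with { url, statusCode, headers, body } for the final response.
function getFollowingRedirects(url, maxHops = 5) {
  return new Promise((resolve, reject) => {
    http
      .get(url, (res) => {
        const { statusCode, headers } = res;
        if (statusCode >= 300 && statusCode < 400 && headers.location) {
          res.resume(); // discard the redirect's stub body
          if (maxHops === 0) {
            reject(new Error(`too many redirects at ${url}`));
            return;
          }
          // Location may be relative, so resolve it against the current URL.
          const next = new URL(headers.location, url).href;
          resolve(getFollowingRedirects(next, maxHops - 1));
          return;
        }
        const chunks = [];
        res.on("data", (c) => chunks.push(c));
        res.on("end", () =>
          resolve({ url, statusCode, headers, body: Buffer.concat(chunks) })
        );
        res.on("error", reject);
      })
      .on("error", reject);
  });
}
```

Capping the hop count keeps a misconfigured site (one that redirects to itself) from looping forever.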

+0

Changed the host and used Accept: image/x-icon instead; the result is again: This object may be found here – jimt 2010-09-12 16:47:52

+1

@jimt Well, the important bit is the Host header, not the Accept thing... – 2010-09-12 18:13:16