Fetching a page using LuaSocket and a proxy

So far, I have the following:

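local socket = require "socket.http"
-- The generic form returns 1, the status code, the headers table and the status line.
client,r,c,h = socket.request{url = "http://example.com/", proxy="<my proxy and port here>"}
-- c holds the response headers, so this loop prints them:
for i,v in pairs(c) do
    print(i, v)
end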

This gives me output similar to the following:

connection close 
content-type text/html; charset=UTF-8 
location http://www.iana.org/domains/example/ 
vary Accept-Encoding 
date Tue, 24 Apr 2012 21:43:19 GMT 
last-modified Wed, 09 Feb 2011 17:13:15 GMT 
transfer-encoding chunked 
server Apache/2.2.3 (CentOS)

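This means the connection is established just fine. Now I want to use socket.http to get the titles of my URLs. I have searched previous SO questions and LuaSocket's http documentation, but I still don't know how to get or store the whole page, or part of it, in a variable so that I can process it.

Please help.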

Source

2012-04-24 hjpotter92

You are using the 'generic' form of http.request(), which requires the body to be stored through an LTN12 sink. It is not as complicated as it sounds; try this code:

local socket = require "socket.http" 
local ltn12 = require "ltn12"; -- LTN12 lib provided by LuaSocket 

-- This table will store the body (possibly in multiple chunks): 
local result_table = {}; 
client,r,c,h = socket.request{ 
    url = "http://example.com/", 
    sink = ltn12.sink.table(result_table), 
    proxy="<my proxy and port here>" 
} 
-- Join the chunks together into a string: 
local result = table.concat(result_table); 
-- Hacky solution to extract the title: 
local title = result:match("<[Tt][Ii][Tt][Ll][Ee]>([^<]*)<"); 
print(title);
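
In the generic form, http.request returns 1, the status code, the headers and the status line on success, or nil followed by an error message on failure, so it can help to check those values before parsing the body. Below is a minimal sketch of such a check. `interpret_response` is a hypothetical helper name, not part of LuaSocket, and treating only status 200 as success is an assumption made for illustration.

```lua
-- Hypothetical helper (not part of LuaSocket): decide whether the values
-- returned by the generic http.request{...} form describe a usable page.
local function interpret_response(ok, code, chunks)
    if not ok then
        -- On failure the second return value is an error message.
        return nil, "request failed: " .. tostring(code)
    end
    if code ~= 200 then
        -- Assumption: anything other than 200 is treated as an error here.
        return nil, "unexpected HTTP status: " .. tostring(code)
    end
    -- Join the chunks collected by ltn12.sink.table into one string.
    return table.concat(chunks)
end

-- Usage with the request above (needs LuaSocket and network access):
-- local chunks = {}
-- local ok, code = socket.request{
--     url = "http://example.com/",
--     sink = ltn12.sink.table(chunks),
--     proxy = "<my proxy and port here>"
-- }
-- local body, err = interpret_response(ok, code, chunks)
-- print(body or err)
```

With a check like this, a 404 page is reported as an error instead of being searched for a title.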

If your proxy is constant for the whole application, a simpler solution is to use the simple form of http.request() and specify the proxy via http.PROXY:

local http = require "socket.http" 
http.PROXY="<my proxy and port here>" 

local result = http.request("http://www.youtube.com/watch?v=_eT40eV7OiI") 
local title = result:match("<[Tt][Ii][Tt][Ll][Ee]>([^<]*)<"); 
print(title);

Output:

Flanders and Swann - A song of the weather 
    - YouTube
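
The title printed above contains a line break and padding, and titles can also contain HTML entities. Below is a minimal sketch of one way to normalize such a string. `normalize_title` and the `entities` table are hypothetical names introduced for illustration; only a handful of named entities and decimal entities in the ASCII range are handled.

```lua
-- Hypothetical lookup table: only a few common named entities are covered.
local entities = { amp = "&", lt = "<", gt = ">", quot = '"', apos = "'" }

-- Hypothetical helper: collapse whitespace and decode simple HTML entities.
local function normalize_title(title)
    -- Collapse runs of whitespace (including newlines) into single spaces.
    local s = title:gsub("%s+", " ")
    -- Trim leading and trailing whitespace.
    s = s:match("^%s*(.-)%s*$")
    -- Decode entities in a single pass so nothing is decoded twice.
    s = s:gsub("&(#?)(%w+);", function(hash, name)
        if hash == "#" then
            -- Decimal entities such as &#39; (ASCII range only in this sketch).
            local n = name:match("^%d+$") and tonumber(name)
            if n and n < 128 then return string.char(n) end
            return nil -- leave anything else unchanged
        end
        -- Named entities; unknown names return nil and are left unchanged.
        return entities[name]
    end)
    return s
end

print(normalize_title("Flanders and Swann - A song of the weather \n    - YouTube"))
-- prints: Flanders and Swann - A song of the weather - YouTube
```

A full HTML parser would be more robust than pattern matching, but for pulling out a single title this kind of cleanup is often enough.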

Source

2012-04-25 02:43:58 MattJ

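Thanks! This works for all kinds of pages. :) However, when I try to fetch the title of a YouTube link, the 'result' variable only contains a 404 error page (http://www.hastebin.com/gikavorone.xml). I tried both methods. The second one fetches the page faster. :) – hjpotter92 2012-04-25 03:33:03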

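I have just updated the example with a YouTube link and the output I get. It all works fine for me. The title is padded with whitespace and may sometimes contain HTML entities, so you may want to normalize it by stripping the whitespace and converting the entities. – MattJ 2012-04-25 04:16:20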

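No, it still doesn't work. I am running the file (named '02.lua') in SciTE. Here is a screenshot of the output and the code (I used 4 different web pages, 2 of them on my own web server). Check: http://i.stack.imgur.com/XkQQj.jpg – hjpotter92 2012-04-25 04:36:46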
