Searching a website

import urllib
import re

# Python 2 script: search exploit-db.com and print the matching exploit URLs.
search = raw_input('[!]Search: ')
# quote_plus escapes spaces and special characters in the search term
site = "http://www.exploit-db.com/list.php?description=" + urllib.quote_plus(search) + "&author=&platform=&type=&port=&osvdb=&cve="
print site
source = urllib.urlopen(site).read()
founds = re.findall(r"href='/exploits/\d+", source)
print "\n[+]Search", len(founds), "Results\n"
if len(founds) >= 1:
    for found in founds:
        found = found.replace("href='", "")
        print "http://www.exploit-db.com" + found
else:
    print "\nCouldn't find anything with your search\n"

When I search the exploit-db.com site, I only get 25 results. How can I make the script go on to the other pages, or get past the first 25 results?

2010-02-19 sourD

Using regular expressions to parse HTML is a mistake. See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags and the many other threads that discuss this topic. – 2010-02-19 19:35:03
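As an illustration of the comment above, here is a minimal sketch that collects the exploit links with the standard-library HTML parser instead of running a regex over the raw page. It is written for Python 3 (the question's script is Python 2), and the names `ExploitLinkParser` and `extract_exploit_links` are made up for this example. It assumes the links of interest are `<a>` tags whose `href` starts with `/exploits/` followed by digits, as in the question's regex.

```python
import re
from html.parser import HTMLParser


class ExploitLinkParser(HTMLParser):
    """Collects href values of <a> tags that look like /exploits/<number>."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # The regex is applied to one attribute value, not to the HTML.
            if name == "href" and value and re.match(r"/exploits/\d+", value):
                self.links.append(value)


def extract_exploit_links(html):
    """Return the matching hrefs in document order (hypothetical helper)."""
    parser = ExploitLinkParser()
    parser.feed(html)
    return parser.links


sample = (
    "<a href='/exploits/123'>one</a> "
    "<a href='/about'>about</a> "
    '<a href="/exploits/456">two</a>'
)
print(extract_exploit_links(sample))
```

Unlike the original pattern, this does not depend on whether the page uses single or double quotes around the attribute.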

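This is easy to check by visiting the site and paging through the results manually while watching the URL: put page=1& right after the ? in the URL to see the second page of results, page=2& to see the third page, and so on.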
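A small sketch of how such a URL could be built, assuming Python 3 and the parameter names from the question's URL. The helper name `search_url` is hypothetical; `urlencode` also escapes the search term, which plain string concatenation does not.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.exploit-db.com/list.php"


def search_url(description, page=None):
    """Build the search URL; page=None reproduces the question's URL."""
    params = []
    if page is not None:
        # The page parameter goes first, right after the "?".
        params.append(("page", page))
    params.append(("description", description))
    # The remaining filters are left empty, as in the question.
    for name in ("author", "platform", "type", "port", "osvdb", "cve"):
        params.append((name, ""))
    return BASE_URL + "?" + urlencode(params)


print(search_url("apache"))
print(search_url("apache", page=1))
```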

How is this a Python question? It's a (very basic) "screen scraping" question.

2010-02-19 16:27:09

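Alex, I meant that when I search it only gets the results from page 1; it doesn't move on to the second page or go past 25 results.. I don't know what's going on – sourD 2010-02-19 16:31:51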

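I guess I should have **bolded** the "page=1& to see the **second** page of results" part of my answer, since you accepted a later answer (no doubt the two answers "crossed on the net", as they were posted so close to each other) that gives just this information (but adds the word "Note" ;-). – 2010-02-19 16:54:27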

Thanks Alex ;) – sourD 2010-02-19 17:08:25

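Apparently the exploit-db.com site does not allow the page size to be increased. You therefore need to page through the result list "manually", by repeating urllib.urlopen() to fetch the subsequent pages. The URL is the same as the one used initially, plus a &page=n parameter. Note that this n value appears to be 0-based (i.e. &page=1 gives the second page).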
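The paging described above might look roughly like this. It is a sketch for Python 3, where `urllib.request.urlopen` replaces `urllib.urlopen`. The names `fetch` and `collect_results` are hypothetical, and the stopping rule (stop at the first page that yields no new links, with `max_pages` as a safety cap) is an assumption about how the site behaves past the last page.

```python
import re
from urllib.request import urlopen


def fetch(url):
    """Download a page and return its text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_results(first_page_url, fetch_page=fetch, max_pages=50):
    """Follow &page=n until a page contributes no new exploit links."""
    found = []
    for n in range(max_pages):
        # Page 0 is the original URL; &page=1 is the second page, etc.
        url = first_page_url if n == 0 else "%s&page=%d" % (first_page_url, n)
        links = re.findall(r"href='(/exploits/\d+)", fetch_page(url))
        new_links = [link for link in links if link not in found]
        if not new_links:
            break  # assumed end of results: empty or repeated page
        found.extend(new_links)
    return found


# Usage (performs real requests, so it is left commented out):
# for path in collect_results("http://www.exploit-db.com/list.php?description=apache"):
#     print("http://www.exploit-db.com" + path)
```

Passing the download function in as `fetch_page` makes it possible to try the loop against canned pages before pointing it at the real site.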

2010-02-19 16:27:19 mjv

oooh ok, thanks man – sourD 2010-02-19 16:33:26
