2016-05-12
import requests

focus_Search = raw_input("Focus Search ")
url = "https://www.google.com/search?q="
res = requests.get(url + focus_Search)
print("You Just Searched")
res_String = res.text
# Now I must get ALL the sections of code that start with "<a href" and end with "/a>"

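I'm trying to scrape all the links from a Google search results page. I could extract each link one by one, but I believe there is a better way.

Python link scraper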

Use an HTML parser; there are also countless examples of this on SO.
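
As a rough illustration of the parser approach this comment suggests, here is a minimal sketch that uses only the standard library's `html.parser`. The `LinkCollector` class, the `extract_links` helper, and the sample HTML are hypothetical names and data introduced for this example; they are not from the original thread.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all href values found in <a> tags, in document order."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


if __name__ == "__main__":
    # Made-up sample markup, used only to exercise the helper
    sample = '<p><a href="/one">One</a> and <a href="http://example.com/two">Two</a></p>'
    print(extract_links(sample))
```

In the question's code, `res_String` could be passed to `extract_links` in place of the sample string, which avoids searching the text by hand for `<a href` and `/a>`.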

Answer

This code builds a list of all the links on the search page, without getting into BeautifulSoup.

import requests
import lxml.html

focus_Search = input("Focus Search ")
url = "https://www.google.com/search"

# Pass the query via params so requests URL-encodes spaces and special characters
res = requests.get(url, params={"q": focus_Search}).content

# Parse the HTML and collect the href attribute of every <a> element
dom = lxml.html.fromstring(res)
links = dom.xpath('//a/@href')  # Borrows from cheekybastard in link below
# http://stackoverflow.com/questions/1080411/retrieve-links-from-web-page-using-python-and-beautifulsoup
print(links)
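
The XPath query returns the href strings exactly as they appear in the page, so some of them may be relative paths or use non-web schemes. One possible follow-up step, sketched here with the standard library only, resolves each href against the page URL and keeps only unique http(s) links. The `absolute_links` helper and the sample values are hypothetical and not part of the original answer.

```python
from urllib.parse import urljoin, urlparse


def absolute_links(base_url, hrefs):
    """Resolve hrefs against base_url; keep unique http(s) URLs in order."""
    seen = set()
    result = []
    for href in hrefs:
        full = urljoin(base_url, href)
        if urlparse(full).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result


if __name__ == "__main__":
    # Made-up hrefs standing in for the `links` list from the answer
    hrefs = ["/search?q=python", "https://example.com/", "mailto:a@example.com"]
    print(absolute_links("https://www.google.com/search?q=x", hrefs))
```

With the answer's variables, this would be called as `absolute_links(url, links)`.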