Python - How do I find a link on a web page when it has no class?

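I am a beginner Python programmer and I am trying to build a web crawler as practice. I am currently facing a problem that I cannot find a suitable solution for. I am trying to get a link address from a page where the link has no class, so I do not know how to filter for that specific link. It is probably easier to show you.
The page I am trying to get the link from.
As you can see, I am trying to get the contents of the href attribute of the "Historical prices" link. Here is my Python code: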

import requests 
from bs4 import BeautifulSoup 

def find_historicalprices_link(url): 
    source = requests.get(url) 
    text = source.text 
    soup = BeautifulSoup(text, 'html.parser') 
    link = soup.find_all('li', 'fjfe-nav-sub') 
    href = str(link.get('href')) 
    find_spreadsheet(href) 

def find_spreadsheet(url): 
    source = requests.get(url) 
    text = source.text 
    soup = BeautifulSoup(text, 'html.parser') 
    link = soup.find('a', {'class' : 'nowrap'}) 
    href = str(link.get('href')) 
    download_spreadsheet(href) 

def download_spreadsheet(url): 
    response = requests.get(url) 
    text = response.text 
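    lines = text.split("\n")  # split on newline characters, not on a literal backslash followed by n
    filename = r'google.csv'
    # the with-block closes the file even if a write fails
    with open(filename, 'w') as file:
        for line in lines:
            file.write(line + "\n")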

find_historicalprices_link('https://www.google.com/finance?q=NASDAQ%3AGOOGL&ei=3lowWYGRJNSvsgGPgaywDw')

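In the function find_spreadsheet(url), I can easily filter for the link by looking for the class called 'nowrap'. Unfortunately, the "Historical prices" link has no such class, and right now my script just gives me the following error: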

AttributeError: ResultSet object has no attribute 'get'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

How can I make sure my crawler only takes the href from "Historical prices"?
Thanks in advance.

Update:
I found a way. By looking only for links that have a specific text attached, I can find the href I need.
Solution:
soup.find('a', string='Historical prices')
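For anyone who wants to try the string= lookup in isolation, here is a minimal sketch. The HTML snippet, the href paths, and the helper name find_link_by_text are made up for illustration (the real page markup is not shown in the question); only the 'fjfe-nav-sub' class name and the link text come from the post. It uses 'html.parser' so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page
html = """
<ul>
  <li class="fjfe-nav-sub"><a href="/finance/news?q=GOOGL">News</a></li>
  <li class="fjfe-nav-sub"><a href="/finance/historical?q=GOOGL">Historical prices</a></li>
</ul>
"""

def find_link_by_text(markup, link_text):
    """Return the href of the first <a> whose text is exactly link_text, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    link = soup.find("a", string=link_text)
    # find() returns None when nothing matches, so guard before calling .get()
    return link.get("href") if link is not None else None

historical_href = find_link_by_text(html, "Historical prices")
missing_href = find_link_by_text(html, "Does not exist")
print(historical_href)
print(missing_href)
```

Note that string= compares against the whole text of the tag, so extra whitespace or different capitalisation in the real page would make the lookup return None.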

Source

2017-06-01 Jonathan van de Groep

Did you read your error? This line is what is causing your problem: link = soup.find_all('li', 'fjfe-nav-sub') href = str(link.get('href')). link is a list, not a single element – jarcobi889
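To illustrate the point of this comment, a small sketch of the difference between find_all() and find(). The HTML and the href values are invented for the example; only the 'fjfe-nav-sub' class name is taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two list items with the same class
html = (
    '<li class="fjfe-nav-sub"><a href="/a">A</a></li>'
    '<li class="fjfe-nav-sub"><a href="/b">B</a></li>'
)
soup = BeautifulSoup(html, "html.parser")

items = soup.find_all("li", "fjfe-nav-sub")  # ResultSet: behaves like a list of Tags
first = soup.find("li", "fjfe-nav-sub")      # a single Tag (or None if nothing matches)

# Calling .get() on the ResultSet reproduces the error from the question
try:
    items.get("href")
    raised = False
except AttributeError:
    raised = True

# Working with the list means handling each element individually
hrefs = [li.a["href"] for li in items]
print(raised, hrefs)
```

So either loop over the result of find_all(), or switch to find() when a single element is expected.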

@jarcobi889 OK, so what do I need to do to fix this? I have changed find_all() to find(), and now it just returns 'None' –
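One possible explanation for the 'None' result, offered as a guess since the real page markup is not shown here: if the href sits on an <a> nested inside the <li>, then calling .get('href') on the <li> itself returns None, and str(None) prints as 'None'. A minimal sketch under that assumption, with made-up HTML:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the href belongs to the <a>, not to the <li>
html = (
    '<ul><li class="fjfe-nav-sub">'
    '<a href="/finance/historical?q=GOOGL">Historical prices</a>'
    '</li></ul>'
)
soup = BeautifulSoup(html, "html.parser")

item = soup.find("li", "fjfe-nav-sub")

li_href = item.get("href")            # the <li> has no href attribute
a_href = item.find("a").get("href")   # step down to the nested <a> first

print(li_href)
print(a_href)
```

If the real page has several such list items, this still only reaches the first one, which is why filtering on the link text ends up being the more direct route.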

Would running the following code snippet help you? I think it should let you solve your problem:

from bs4 import BeautifulSoup 

html = """<a href='http://www.google.com'>Something else</a> 
      <a href='http://www.yahoo.com'>Historical prices</a>""" 

soup = BeautifulSoup(html, "html5lib") 

urls = soup.find_all("a") 

print(urls) 

print([a["href"] for a in urls if a.text == "Historical prices"])

Source

2017-06-01 19:43:33 M14

No, unfortunately not. But I found a way. What I did was look only for links with a specific text: soup.find('a', string='Historical prices'). I just found out how to use this –

Thx, this helped me too. I did not know about this possibility! – M14
