Scraping tables with Beautiful Soup and Python 3.x

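So I'm very new to Python and still trying to get my head around how it all works. Right now I'm using Beautiful Soup to scrape data from a table. I can use Beautiful Soup to navigate to the specific table I want, but pulling the actual data out has stumped me, and everything I've tried has failed.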

This is my current code:

import requests
from bs4 import BeautifulSoup

sauce = requests.get('https://www.investsmart.com.au/managed-funds/fund/cromwell-phoenix-opportunities-fund/40665')
soup = BeautifulSoup(sauce.text, 'html.parser') 
tables = soup.findChildren('table') 
my_table = tables[1] 
rows = my_table.findChildren(['tr']) 

for tds in rows[1]: 
    print(tds)

This gives me the output:

<td class="text-left">Total return</td> 


<td>-2.79</td> 


<td>-2.61</td> 


<td>11.22</td> 


<td>24.6</td> 


<td>19.18</td> 


<td>18.65</td> 


<td>21.44</td> 


<td>-</td>

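What I want is the actual numbers inside the td tags. Eventually I want to sort them into their respective months and output them to an Excel file.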

But I really don't know how to grab just the returns without the tags. When I try:

for tds in rows[1]: 
    print(tds.text)

I get this error: AttributeError: 'NavigableString' object has no attribute 'text'

So how do I go about grabbing this data so that I can sort it and output it to Excel? I don't know what to do next.
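Iterating over the row Tag itself yields every child node, not just the <td> elements: the whitespace between the tags comes through as NavigableString objects, which is where the blank lines in the output come from and, in the bs4 version used in the question, why .text failed. A minimal sketch, using a small hypothetical HTML snippet in place of the real page, shows two ways around it and how .string behaves:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical snippet standing in for one row of the real table;
# the whitespace between the <td> tags is deliberate.
html = """
<table><tr>
  <td class="text-left">Total return</td>
  <td>-2.79</td>
  <td>-2.61</td>
</tr></table>
"""

row = BeautifulSoup(html, "html.parser").find("tr")

# Iterating over the Tag yields every child, including whitespace strings.
child_types = [type(child).__name__ for child in row]

# Option 1: ask only for the <td> elements.
cells = [td.get_text(strip=True) for td in row.find_all("td")]

# Option 2: keep direct iteration but skip anything that is not a Tag.
cells_filtered = [child.get_text(strip=True) for child in row if isinstance(child, Tag)]

# .string works on a tag with exactly one text child...
first_cell = row.find("td").string
# ...but returns None for a tag with several children, like this <tr>.
row_string = row.string

print(child_types)
print(cells)
```

find_all("td") is usually the simpler choice, since it also skips any nested markup that is not a cell.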


2017-08-05 Jisket

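import requests
import bs4

sauce = requests.get('https://www.investsmart.com.au/managed-funds/fund/cromwell-phoenix-opportunities-fund/40665')
soup = bs4.BeautifulSoup(sauce.text, 'html.parser')
# this gets all the tables on the page; we need the second one
table = soup.find_all('table')[1]
# gets all the rows in that table
rows = table.find_all('tr')
# the first row contains all the column titles; [1:] skips the first heading cell
column_heads = [i.text for i in rows[0].find_all('th')[1:]]
# r will hold all the remaining rows, each as a list of cell texts
r = []
for i in rows[1:]:
    r.append([k.text for k in i.find_all('td')])

All you need to do is use your browser's view-source tool. It will give you an idea of the page's structure, and from that you can target the tags you need.

For your reference, here is the output, checked carefully against the HTML page:

column_heads = ['1 Month %','3 Month %','6 Month %','1 Year % p.a.','2 Year % p.a.','3 Year % p.a.','5 Year % p.a.','10 Year % p.a.']

In Python 3, .text already returns a str, so no encoding step is needed. Calling .encode('utf-8') on it would produce bytes such as b'Total return'; that conversion was only useful in Python 2, where it turned Unicode text like u'Hello' into a byte string.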

Printing the first list in r:

r[0] = ['Total return','-2.79','-2.61','11.22','24.6','19.18','18.65','21.44','-']

I hope this is what you're looking for.
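To sort the values into their respective periods, as the question asks, one option is to zip each data row with column_heads. A minimal sketch, assuming a hypothetical cut-down copy of the table (a header row whose first <th> is empty, then one row per metric) and a hypothetical helper to_number that treats '-' as a missing value:

```python
from bs4 import BeautifulSoup

# Hypothetical cut-down table mimicking the layout described in the answer.
html = """
<table>
  <tr><th></th><th>1 Month %</th><th>3 Month %</th><th>6 Month %</th><th>10 Year % p.a.</th></tr>
  <tr><td>Total return</td><td>-2.79</td><td>-2.61</td><td>11.22</td><td>-</td></tr>
</table>
"""

rows = BeautifulSoup(html, "html.parser").find_all("tr")
# Skip the empty corner heading, as in the answer's code.
column_heads = [th.get_text(strip=True) for th in rows[0].find_all("th")[1:]]


def to_number(text):
    """Convert a cell to float; '-' is treated as a missing value."""
    return None if text == "-" else float(text)


results = {}
for tr in rows[1:]:
    label, *values = [td.get_text(strip=True) for td in tr.find_all("td")]
    # Pair each value with its period heading, keyed by the row label.
    results[label] = dict(zip(column_heads, map(to_number, values)))

print(results)
```

Converting to float at this point means the values can be sorted or compared numerically before they are written out.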


2017-08-05 13:02:31

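Thanks, much appreciated. I discovered the .string attribute and got it working that way, but this seems like a better approach, so I'll play around with it and see what I can do. As a programming newbie (I only started last week), this is very helpful. – Jisket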

Here's a trick that skips BeautifulSoup: install pandas, then:

import pandas as pd 
tables = pd.read_html("http:...")

tables is now a list of the tables on the page, one DataFrame per table.
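A self-contained sketch of the pandas route, parsing a hypothetical HTML string instead of the live URL. read_html needs a parser backend (lxml, or BeautifulSoup4 plus html5lib) installed, and na_values=["-"] is an assumption based on the '-' placeholder seen in the question's output:

```python
from io import StringIO

import pandas as pd

# Hypothetical table with the same shape as the one on the fund page.
html = """<table>
  <tr><th></th><th>1 Month %</th><th>3 Month %</th><th>10 Year % p.a.</th></tr>
  <tr><td>Total return</td><td>-2.79</td><td>-2.61</td><td>-</td></tr>
</table>"""

# read_html returns a list of DataFrames, one per <table> found.
# The all-<th> first row becomes the header; '-' is read as NaN.
tables = pd.read_html(StringIO(html), na_values=["-"])
df = tables[0]
print(df)
```

From there, df.to_excel("returns.xlsx") writes the table to an Excel file, provided an Excel writer such as openpyxl is installed.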


2017-08-05 11:49:31 Gijs

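Thanks for the suggestion. I actually do want to get into pandas, but I've decided to learn bs4 first to try to understand more about Python, since I'm new. Once I'm comfortable with bs4, I'll take that approach. – Jisket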

If you want to export to Excel, I think a CSV will work:

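import csv
import requests
from bs4 import BeautifulSoup

sauce = requests.get('https://www.investsmart.com.au/managed-funds/fund/cromwell-phoenix-opportunities-fund/40665')
soup = BeautifulSoup(sauce.text, 'html.parser')
tables = soup.find_all('table')
# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('csvfile.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    for row in tables[1].find_all('tr'):
        # one CSV row per table row, covering header (th) and data (td) cells;
        # csv.writer handles quoting, so no trailing comma or stray quotes
        writer.writerow([cell.get_text(strip=True) for cell in row.find_all(['td', 'th'])])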
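If a real .xlsx file is preferred over CSV, a minimal sketch with openpyxl (a swapped-in library, not part of the answer above) could look like the following. The column_heads and rows values are hypothetical stand-ins taken from part of the thread's output, and to_cell is a hypothetical helper that stores numeric strings as numbers:

```python
from openpyxl import Workbook

# Hypothetical stand-ins for the scraped headings and rows.
column_heads = ["1 Month %", "3 Month %", "6 Month %", "10 Year % p.a."]
rows = [["Total return", "-2.79", "-2.61", "11.22", "-"]]


def to_cell(text):
    """Store numbers as numbers so Excel can sort and chart them."""
    try:
        return float(text)
    except ValueError:
        return text  # labels and '-' placeholders stay as text


wb = Workbook()
ws = wb.active
ws.title = "Returns"
ws.append([None] + column_heads)  # empty corner cell above the row labels
for row in rows:
    ws.append([row[0]] + [to_cell(v) for v in row[1:]])
wb.save("returns.xlsx")
```

Storing the returns as floats rather than text is what makes Excel's sort and filter work on them directly.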


2017-08-05 19:36:39
