Extracting page intro information with Beautiful Soup

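I'm new to Beautiful Soup, and I'm trying to extract the intro information that appears on a page. This information is contained in div class="_50f3" and, depending on the user, can contain several pieces of information (studies, studied, works, worked, lives, etc.). So far I've managed to parse the div class with the code below, but I don't know how to extract the information I want from it.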

table = soup.findAll('div', {'class': '_50f3'}) 

[<div class="_50f3">Lives in <a class="profileLink" data-hovercard="/ajax/hovercard/page.php?id=114148045261892" href="/Fort-Worth-Texas/114148045261892?ref=br_rs">Fort Worth, Texas</a></div>, 
<div class="_50f3">From <a class="profileLink" data-hovercard="/ajax/hovercard/page.php?id=111762725508574" href="/Dallas-Texas/111762725508574?ref=br_rs">Dallas, Texas</a></div>]

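For example, from the output above I'd like to store "Lives in": "Fort Worth, Texas" and "From": "Dallas, Texas". In the most general case, though, I'd like to store whatever information is there.

Any help is greatly appreciated!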


2016-07-15 morfara

In the general case, all you need is get_text(), which builds a single text string for an element by recursively walking its child nodes:

table = soup.find_all('div', {'class': '_50f3'}) 
print([item.get_text(strip=True) for item in table])
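
One thing to watch with strip=True: each text fragment is stripped before the fragments are joined, so the space between the label and the link text disappears. Here is a minimal sketch, assuming markup like that in the question (the html string below is a trimmed copy used for illustration, not the real page). Passing a separator to get_text() puts the space back:

```python
from bs4 import BeautifulSoup

# Assumption: sample markup copied from the question, with the
# data-hovercard attributes dropped for brevity.
html = """
<div class="_50f3">Lives in <a class="profileLink" href="/Fort-Worth-Texas/114148045261892?ref=br_rs">Fort Worth, Texas</a></div>
<div class="_50f3">From <a class="profileLink" href="/Dallas-Texas/111762725508574?ref=br_rs">Dallas, Texas</a></div>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find_all("div", {"class": "_50f3"})

# strip=True alone: fragments are stripped, then joined with no separator
joined = [item.get_text(strip=True) for item in table]

# A " " separator keeps the label and the value apart
spaced = [item.get_text(" ", strip=True) for item in table]

print(joined)
print(spaced)
```

If the text is only going to be displayed, the second form is probably the one you want.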

However, you can also extract the labels and values separately:

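d = {}
for item in table:
    # First text node inside the div, e.g. "Lives in "
    label = item.find(string=True)
    # The node right after that text, here the <a> link
    value = label.next_sibling

    d[label.strip()] = value.get_text()

print(d)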

Prints:

{'From': 'Dallas, Texas', 'Lives in': 'Fort Worth, Texas'}
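
The loop above relies on every div having a text node followed by a link. For the "most general case" the question mentions, a slightly more defensive variant might look like the sketch below. The parse_intro helper and the link-less sample div are hypothetical, made up for illustration. It assumes the label is the text sitting directly inside the div and the value is the first link, if there is one:

```python
from bs4 import BeautifulSoup, NavigableString


def parse_intro(html, css_class="_50f3"):
    """Hypothetical helper: map each div's label text to its link text."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    for item in soup.find_all("div", class_=css_class):
        # Label: text nodes that are direct children of the div
        label = "".join(
            child for child in item.children
            if isinstance(child, NavigableString)
        ).strip()
        # Value: text of the first link, or None when the div has no link
        link = item.find("a")
        result[label] = link.get_text(strip=True) if link is not None else None
    return result


# Assumption: markup from the question plus one invented link-less entry
html = """
<div class="_50f3">Lives in <a class="profileLink" href="/Fort-Worth-Texas/114148045261892?ref=br_rs">Fort Worth, Texas</a></div>
<div class="_50f3">From <a class="profileLink" href="/Dallas-Texas/111762725508574?ref=br_rs">Dallas, Texas</a></div>
<div class="_50f3">Studied physics</div>
"""

info = parse_intro(html)
print(info)
```

Entries without a link end up with a value of None instead of raising an AttributeError, so you can decide afterwards how to treat them.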


2016-07-15 14:38:37 alecxe

for item in table:
    print(item.text)

This should work.


2016-07-15 14:39:29
