Get meta tag content property with BeautifulSoup and Python

I want to use Python and Beautiful Soup to extract the content attribute of the meta tags below:

<meta property="og:title" content="Super Fun Event 1" /> 
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />

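I'm loading the page with BeautifulSoup fine and finding other things (the code also grabs the article ID from the id attribute hidden in the source), but I don't know the correct way to search the HTML and find these bits. I've tried variations of find and findAll to no avail. The code iterates over a list of URLs at present...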

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

#importing the libraries 
from urllib import urlopen 
from bs4 import BeautifulSoup 

def get_data(page_no): 
    webpage = urlopen('http://superfunevents.com/?p=' + str(i)).read() 
    soup = BeautifulSoup(webpage, "lxml") 
    for tag in soup.find_all("article") : 
     id = tag.get('id') 
     print id 
# the hard part that doesn't work - I know this example is well off the mark!   
    title = soup.find("og:title", "content") 
    print (title.get_text()) 
    url = soup.find("og:url", "content") 
    print (url.get_text()) 
# end of problem 

for i in range (1,100): 
    get_data(i)

If anyone can help me sort out the part of the code that finds the og:title and og:url content, that would be fantastic!

Source

2016-04-21 the_t_test_1

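Provide the meta tag name as the first argument to find(). Then use keyword arguments to check the specific attributes:

title = soup.find("meta", property="og:title")
url = soup.find("meta", property="og:url")

print(title["content"] if title else "No meta title given")
print(url["content"] if url else "No meta url given")

The if/else checks here would be optional if you know that the title and URL meta properties will always be present.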
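
For anyone who wants to try this without fetching a page, here is a minimal self-contained sketch. It assumes the two meta tags from the question are embedded in a small HTML string, and it uses the built-in "html.parser" instead of "lxml" so that nothing beyond bs4 is required. The surrounding markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup: the two meta tags come from the question,
# the surrounding <html>/<head> wrapper is an assumption.
html = """
<html><head>
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
</head><body></body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# First argument is the tag name; the keyword argument filters on an attribute.
title = soup.find("meta", property="og:title")
url = soup.find("meta", property="og:url")

# find() returns None when nothing matches, hence the guards.
title_text = title["content"] if title else "No meta title given"
url_text = url["content"] if url else "No meta url given"

print(title_text)
print(url_text)
```

The value lives in the content attribute, so it is read with title["content"]. get_text() returns an empty string here because a meta tag has no text inside it.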

Source

2016-04-21 11:42:10 alecxe

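Is there a built-in way to get the content, or otherwise fall back to a default? –

@ChristopheRoussy Yes, that is exactly what is shown in the answer. You can also enforce the presence of the 'content' attribute by using 'soup.find("meta", property="og:title", content=True)'. Thanks. – alecxe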
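
A small sketch of what the two comments above describe. The helper name meta_content and the sample markup (including the deliberately incomplete first tag) are hypothetical, made up for illustration. Passing content=True tells find() to match only tags that actually carry a content attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first og:title tag has no content attribute.
html = """
<head>
<meta property="og:title" />
<meta property="og:title" content="Super Fun Event 1" />
</head>
"""

soup = BeautifulSoup(html, "html.parser")


def meta_content(soup, prop, default=None):
    """Return the content of <meta property=prop>, or default if absent.

    Hypothetical helper, not part of BeautifulSoup.
    """
    # content=True skips tags that lack a content attribute entirely.
    tag = soup.find("meta", property=prop, content=True)
    return tag["content"] if tag else default


# Without content=True the incomplete tag is returned first.
first = soup.find("meta", property="og:title")
first_has_content = first.has_attr("content")

title = meta_content(soup, "og:title", "No meta title given")
image = meta_content(soup, "og:image", "No meta image given")

print(first_has_content)
print(title)
print(image)
```

With the filter in place, tag["content"] cannot raise a KeyError, so the only remaining case to handle is the tag being missing altogether.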

Try this:

soup = BeautifulSoup(webpage, "lxml")
for tag in soup.find_all("meta"):
    # compare each meta tag's property attribute against the wanted names
    if tag.get("property", None) == "og:title":
        print(tag.get("content", None))
    elif tag.get("property", None) == "og:url":
        print(tag.get("content", None))
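
The same loop can be generalised to collect every Open Graph property in one pass. This is only a sketch: the sample markup and the variable name og are assumptions, and the built-in "html.parser" is used so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# Sample markup; the keywords tag is there to show that non-og tags are skipped.
html = """
<head>
<meta name="keywords" content="events,fun" />
<meta property="og:title" content="Super Fun Event 1" />
<meta property="og:url" content="http://superfunevents.com/events/super-fun-event-1/" />
</head>
"""

soup = BeautifulSoup(html, "html.parser")

og = {}
for tag in soup.find_all("meta"):
    prop = tag.get("property")  # None for meta tags without a property attribute
    if prop and prop.startswith("og:"):
        og[prop] = tag.get("content")

print(og.get("og:title", "No meta title given"))
print(og.get("og:url", "No meta url given"))
```

A dictionary keeps the lookup code short when more og: properties are needed later, at the cost of keeping only the last value if a property appears more than once.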

Source

2016-04-21 11:37:18 Hackaholic

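May I ask a follow-up question?

I want to use bs4 to get <meta name='keywords' content=''></>, but instead of getting a one-line result I get the whole meta block. Do you happen to know why?

Site parsed: https://www.bilibili.com/video/av6862467/#page=4

Target block: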

<meta name="keywords" content="【SNH48】20161028 原創公演 TeamX《夢想的旗幟》首演 全場 CUT,娛樂,明星,SNH48-TeamX應援會,,嗶哩嗶哩,Bilibili,B站,彈幕" />

Code:

metatags = soup.find_all('meta',attrs={'name':'keywords'})                
for tag in metatags: 
    print(tag)
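
The follow-up has no reply in this thread, so the following is a suggestion rather than a confirmed answer. print(tag) prints the whole element because find_all returns Tag objects; the keywords string itself is in the content attribute. A minimal sketch, assuming a shortened placeholder keyword list instead of the real page:

```python
from bs4 import BeautifulSoup

# Placeholder markup standing in for the real page's keywords tag.
html = '<head><meta name="keywords" content="entertainment,stars,,Bilibili" /></head>'
soup = BeautifulSoup(html, "html.parser")

# name is passed through attrs because find_all's own first
# parameter is already called name (the tag name).
metatags = soup.find_all("meta", attrs={"name": "keywords"})

for tag in metatags:
    keywords = tag.get("content", "")  # attribute value only, not the whole tag
    # Split on commas and drop empty entries such as the one left by ",,".
    keyword_list = [k.strip() for k in keywords.split(",") if k.strip()]
    print(keywords)
    print(keyword_list)
```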

Source

2017-12-20 05:19:30 CrazyFrog
