2011-05-22 72 views
1

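I want to create a function that extracts the meta keywords from a given URL and returns them. However, no matter what URL I pass to it, it always fails: BeautifulSoup cannot extract the metadata.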

def GetKeywords(url):
    soup = BeautifulSoup(url)
    keywords = soup.findAll('meta', attrs={'name':re.compile("^keywords$", re.I)}) #Find all meta keywords on that page
    if len(keywords) == 0: #Check to see if that page has any meta keywords to begin with
        print "No meta keywords for: " + str(url)
        return -1
    else: #If so then return them
        return keywords

Answers

3

Where does BeautifulSoup state that it will accept a URL and fetch it for you?

soup = BeautifulSoup(url) 

Sorry, but please read the BeautifulSoup documentation yourself first instead of trying things and guessing at the API.

http://www.crummy.com/software/BeautifulSoup/documentation.html#Parsing a Document

What you most likely want is to fetch the data yourself with Python's urllib2 module before feeding it into BeautifulSoup, or to look at a module like scrapy.
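A minimal sketch of that "fetch first, then parse" approach, under a few assumptions that are not in the original post: it targets Python 3, where `urllib.request` takes the place of `urllib2`, and it uses the `bs4` package (the successor to the BeautifulSoup 3 module discussed here). The function names `extract_keywords` and `get_keywords` and the sample markup are made up for illustration.

```python
import re
from urllib.request import urlopen  # Python 3 counterpart of urllib2.urlopen

from bs4 import BeautifulSoup  # assumption: bs4 is installed


def extract_keywords(html):
    """Return the content of every <meta name="keywords"> tag in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # Match the name attribute case-insensitively, as in the question.
    tags = soup.find_all("meta", attrs={"name": re.compile(r"^keywords$", re.I)})
    return [tag.get("content", "") for tag in tags]


def get_keywords(url):
    """Download the page, then pass the markup (not the URL) to the parser."""
    with urlopen(url) as response:
        html = response.read()
    return extract_keywords(html)


# Parsing works on markup, so it can be checked without any network access.
sample = '<html><head><meta name="Keywords" content="python, scraping"></head></html>'
print(extract_keywords(sample))
```

Keeping the download and the parsing in separate functions makes the original mistake easy to see: the parser only ever receives HTML text. Returning an empty list when nothing matches, rather than -1, is a design choice of this sketch so that callers always get the same type back.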

+0

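Thanks for the reply. I only started learning Python today; I tried reading the BS documentation but didn't fully understand it. Thanks again, much appreciated. – Mhoad 2011-05-22 11:37:47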