Getting a 'global name is not defined' error using scrapy in Python


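I have been learning scrapy from Ryan Mitchell's book "Web Scraping with Python". The book contains code that collects external links from a website. Although I am using the same code as the book (the only change I made was replacing 'urllib.request' with 'urllib2'), I keep getting the same error. The Python version is 2.7.12. Here is the error: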

File "test.py", line 28, in <module> 
getAllExternalLinks("http://www.oreilly.com") 
File "test.py", line 16, in getAllExternalLinks 
internalLinks = getInternalLinks(bsObj, splitAddress(siteUrl)[0]) 
NameError: global name 'getInternalLinks' is not defined

Here is the code I am using.

from urllib2 import urlopen
from urlparse import urlparse
from bs4 import BeautifulSoup
import re

allExtLinks = set()
allIntLinks = set()

def getAllExternalLinks(siteUrl):
    html = urlopen(siteUrl)
    bsObj = BeautifulSoup(html)
    internalLinks = getInternalLinks(bsObj, splitAddress(siteUrl)[0])
    externalLinks = getExternalLinks(bsObj, splitAddress(siteUrl)[0])
    for link in externalLinks:
        if link not in allExtLinks:
            allExtLinks.add(link)
            print(link)
    for link in internalLinks:
        if link not in allIntLinks:
            print("About to get link: " + link)
            allIntLinks.add(link)
            getAllExternalLinks(link)

getAllExternalLinks("http://www.oreilly.com")

Source

2016-11-13 Aniya

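I don't know which edition of the book you are using, or where you managed to copy this code from, but half of the code is missing. The complete example is [available on GitHub](https://github.com/REMitchell/python-scraping/blob/master/chapter3/5-getAllExternalLinks.py).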
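For a rough idea of what the missing half has to provide: the posted code calls three helpers it never defines, getInternalLinks, getExternalLinks and splitAddress. The sketch below is a stand-in, not the book's code. It assumes Python 3 and uses only the standard library, so the two link functions take raw HTML text where the posted code passes bsObj, and HrefCollector and collectHrefs are names made up for this illustration. On Python 2.7 the parser import would be `from HTMLParser import HTMLParser`.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Hypothetical helper: gathers the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collectHrefs(html):
    # Assumption: html is already-decoded text, not a BeautifulSoup object
    parser = HrefCollector()
    parser.feed(html)
    return parser.hrefs


def splitAddress(address):
    # "http://www.oreilly.com/a/b" becomes ["www.oreilly.com", "a", "b"]
    return address.replace("http://", "").replace("https://", "").split("/")


def getInternalLinks(html, includeUrl):
    # Treat site-relative links ("/about") and links containing the host as internal
    internal = []
    for href in collectHrefs(html):
        if href.startswith("/") or includeUrl in href:
            if href not in internal:
                internal.append(href)
    return internal


def getExternalLinks(html, excludeUrl):
    # Treat absolute links that do not mention the host as external
    external = []
    for href in collectHrefs(html):
        if href.startswith(("http", "www")) and excludeUrl not in href:
            if href not in external:
                external.append(href)
    return external
```

This is deliberately simple: protocol-relative links such as "//cdn.example.com/x" are counted as internal, and relative links are returned as-is, so they would need to be joined with the site URL before being fetched.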

Read the example code carefully before running it. Look, there is no getInternalLinks() function anywhere in your code.
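To see why a missing function only shows up as a NameError once the script runs: Python looks up a function name at the moment the call executes, not when the calling function is defined. A minimal sketch, where crawl and find_links are hypothetical names used only for illustration:

```python
def crawl(url):
    # find_links is looked up only when this line actually runs
    return find_links(url)


try:
    crawl("http://www.oreilly.com")
    error_message = None
except NameError as err:
    # Nothing named find_links exists yet, so the call fails here
    error_message = str(err)


def find_links(url):
    # Hypothetical placeholder for a real link extractor
    return [url + "/about"]


# Now that find_links is defined, the very same crawl() works
links = crawl("http://www.oreilly.com")
```

The same thing happens in the question: defining getAllExternalLinks succeeds, and the error appears only at the call on the last line, because getInternalLinks was never defined anywhere in the file.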

Source

2016-11-13 04:40:17
