With a for statement I get the expected result. Why can't I get the expected result with a while statement?

I want to check in a web browser how the code from 'Web Scraping with Python' behaves. With a for statement I get the expected result, but with a while statement I do not. Why?

Scraping by following Wikipedia URLs

Environment

·Python 3.6.0

·Bottle 0.13-dev

·mod_wsgi-4.5.15

Apache error log

No output

ERR_EMPTY_RESPONSE.

The scraping never finishes processing

index.py

from urllib.request import urlopen
from bs4 import BeautifulSoup
from bottle import route, view
import datetime
import random
import re

@route('/')
@view("index_template")
def index():
    random.seed(datetime.datetime.now())
    html = urlopen("https://en.wikipedia.org/wiki/Kevin_Bacon")
    internalLinks=[]
    links = getLinks("/wiki/Kevin_Bacon")
    while len(links) > 0:
        newArticle = links[random.randint(0, len(links)-1)].attrs["href"]
        internalLinks.append(newArticle)
        links = getLinks(newArticle)
    return dict(internalLinks=internalLinks)

def getLinks(articleUrl):
    html = urlopen("http://en.wikipedia.org"+articleUrl)
    bsObj = BeautifulSoup(html, "html.parser")
    return bsObj.find("div", {"id":"bodyContent"}).findAll("a", href=re.compile("^(/wiki/)((?!:).)*$"))

With a for statement, I get the expected result.

Resulting web browser output

['/wiki/Michael_C._Hall', '/wiki/Elizabeth_Perkins', 
'/wiki/Paul_Erd%C5%91s', '/wiki/Geoffrey_Rush', 
'/wiki/Virtual_International_Authority_File']

index.py

from urllib.request import urlopen
from bs4 import BeautifulSoup
from bottle import route, view
import datetime
import random
import re

@route('/')
@view("index_template")
def index():
    random.seed(datetime.datetime.now())
    html = urlopen("https://en.wikipedia.org/wiki/Kevin_Bacon")
    internalLinks=[]
    links = getLinks("/wiki/Kevin_Bacon")
    for i in range(5):
        newArticle = links[random.randint(0, len(links)-1)].attrs["href"]
        internalLinks.append(newArticle)
    return dict(internalLinks=internalLinks)

def getLinks(articleUrl):
    html = urlopen("http://en.wikipedia.org"+articleUrl)
    bsObj = BeautifulSoup(html, "html.parser")
    return bsObj.find("div", {"id":"bodyContent"}).findAll("a", href=re.compile("^(/wiki/)((?!:).)*$"))


2017-10-16 re1

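Have you tried adding a breakpoint and stepping through your code to see how far it gets? Or at least adding some 'print' statements to see what it is extracting? – Soviut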

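Also, please remove all code that is unrelated to your problem: the wsgi code, the views, and so on. It makes it hard to figure out what to focus on. – Soviut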

I removed the wsgi code. – re1

The length of your links list never reaches 0, so the while loop keeps running until the connection times out.

Your for loop works because it iterates over a range, so it exits once it reaches the end of that range.
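To see this without touching the network, here is a minimal sketch. The function `fake_get_links` is a hypothetical stand-in for `getLinks` and rests on the assumption that every article links to at least one other article, which is true of practically every Wikipedia page. The `max_steps` cap is not part of the question's code; it is added only so the demonstration can end.

```python
import random

def fake_get_links(article_url):
    # Hypothetical stand-in for getLinks(): every page is assumed to
    # link to at least one other page, so the result is never empty.
    return ["/wiki/A", "/wiki/B", "/wiki/C"]

def walk(max_steps):
    """Follow random links and report (steps taken, links on last page)."""
    links = fake_get_links("/wiki/Kevin_Bacon")
    steps = 0
    # The cap on steps is added only so that this demonstration terminates.
    while len(links) > 0 and steps < max_steps:
        links = fake_get_links(random.choice(links))
        steps += 1
    return steps, len(links)

steps, remaining = walk(1000)
print(steps, remaining)  # 1000 3
```

The loop stops only because of the cap: after 1000 steps the list still holds 3 links, so `len(links) > 0` on its own would never become false.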

You never explained why you want to use a while loop, but if you want it to exit after a certain number of iterations, you need to use a counter.

counter = 0

# this loop exits after 5 iterations
while counter < 5:
    print(counter)  # do something

    counter += 1  # increment the counter after each iteration

The above will print

0
1
2
3
4
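Applied to the link-following loop from the question, the counter idea might look like the sketch below. The names `random_walk`, `max_hops` and `stub_get_links` are illustrative and not from the original code, and the link fetcher is assumed to return plain href strings rather than BeautifulSoup tags. Passing the fetcher in as an argument keeps the example runnable without network access; with the real `getLinks` you would read `.attrs["href"]` from each tag first.

```python
import random

def random_walk(get_links, start, max_hops=5):
    """Follow at most max_hops random links, starting from `start`."""
    visited = []
    links = get_links(start)
    hops = 0
    # Stop when a page has no links OR when the hop limit is reached.
    while links and hops < max_hops:
        article = random.choice(links)
        visited.append(article)
        links = get_links(article)
        hops += 1
    return visited

def stub_get_links(article_url):
    # Hypothetical stand-in for getLinks(); returns href strings directly.
    return ["/wiki/A", "/wiki/B"]

print(len(random_walk(stub_get_links, "/wiki/Kevin_Bacon")))  # 5
```

Keeping the original `links` check alongside the counter means the loop still ends early if a page ever has no matching links.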


2017-10-17 04:21:03 Soviut

I had mistakenly assumed that following the links would eventually bring the length of the linked list down to 0. – re1

Just to be clear, you don't have a linked list, you have a list of links ;) – Soviut
