Beautiful Soup works sometimes

Here is a very simple scraper that fetches the name of an item on LLBean:

import urllib2 
from bs4 import BeautifulSoup 

def mainTest(): 
    url = "http://www.llbean.com/llb/shop/43281?feat=506697-GN2&page=women-s-l-l-bean-boots-10-shearling-lined&attrValue_0=Brown/Brown&productId=732934" 
    page=urllib2.urlopen(url) 
    soup = BeautifulSoup(page.read(), "html5lib") 
    name = soup.find('h1', attrs={'itemprop':'name'}).text 
    print name 
    print str(soup)[:100] 

mainTest()

This scraper usually works. It usually prints what I want:

       Women's Bean Boots® by L.L.Bean, 10" Shearling-Lined 

<!DOCTYPE html> 
<html class="no-js" lang="en"><head> 
     <meta charset="utf-8"/> 
     <meta c

But sometimes it prints:

None 
<html><head></head><body></body></html>

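This is a fairly hard bug to reproduce, and I apologize for that. I believe the problem may be that sometimes the LLBean page loads faster than my scraper, and sometimes my scraper scrapes first.

Does anyone know a way to slow down my scraper?

It could also be a completely different problem.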
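
One possible way to approach the "slow it down" idea is to retry when the response looks empty, rather than sleeping before parsing, since `urlopen(...).read()` already blocks until the body has arrived. The sketch below is written for Python 3 (the thread itself uses Python 2) and is only an illustration: `fetch_with_retry`, `has_product_name`, and the substring check are hypothetical names and assumptions, not part of the original scraper.

```python
import time


def has_product_name(html):
    # Hypothetical completeness check: the page should contain the
    # itemprop="name" attribute that the scraper looks for.
    return 'itemprop="name"' in html


def fetch_with_retry(fetch, is_complete, attempts=3, delay=1.0, sleep=time.sleep):
    """Call fetch() until is_complete(result) is true or attempts run out.

    fetch is any zero-argument callable returning the page as text.
    Returns the first complete page, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        html = fetch()
        if is_complete(html):
            return html
        if attempt < attempts:
            sleep(delay)  # wait before asking the server again
    return None
```

In the scraper, `fetch` could be a small function that wraps the `urlopen` call and returns the decoded page text; passing it in as a callable keeps the retry logic independent of how the page is downloaded.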

Source

2015-11-02 Rorschach

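Did you check the HTTP status code? Are you sure the site isn't occasionally returning an error? – FatalError

Have you considered using Scrapy? If you are going to scrape llbean, you could use a crawler –

Ran it in a loop a few hundred times - always got "Women's Bean Boots...". – alecxe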
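
The status-code check suggested in the comments could look roughly like the following. This is a minimal Python 3 sketch (the thread's code is Python 2, where `urllib2` plays the same role); `fetch_checked` and its `opener` parameter are hypothetical names introduced here so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def fetch_checked(url, opener=urllib.request.urlopen):
    """Return (status, body_text) for url.

    On an HTTP error response (4xx/5xx) urlopen raises HTTPError;
    in that case the error code is returned with an empty body.
    """
    try:
        response = opener(url)
    except urllib.error.HTTPError as err:
        return err.code, ""
    # Decode defensively so one bad byte does not abort the scrape.
    body = response.read().decode("utf-8", errors="replace")
    return response.status, body
```

Logging the returned status next to each parsed result would show whether the empty `<html><head></head><body></body></html>` output coincides with a non-200 response or with a 200 response that has an empty body.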

The following code works for me; take care of the encoding. I ran it successfully more than 10 times.

import urllib2 
from bs4 import BeautifulSoup 

def mainTest(): 
    url = "http://www.llbean.com/llb/shop/43281?feat=506697-GN2&page=women-s-l-l-bean-boots-10-shearling-lined&attrValue_0=Brown/Brown&productId=732934" 
    page=urllib2.urlopen(url) 
    page=page.read() 
    soup = BeautifulSoup(page, "html5lib") 
    name = soup.find('h1', attrs={'itemprop':'name'}).text.encode('utf-8') 
    print name 
    print unicode(soup)[:100] 

mainTest()

It prints:

       Women's Bean Boots┬« by L.L.Bean, 10" Shearling-Lined 

<!DOCTYPE html> 
<html class="no-js" lang="en"> 
<head> 
<meta charset="utf-8"/> 
<meta content="IE=edge
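
The `┬«` in the output above is what the UTF-8 bytes of `®` look like when a console interprets them in a legacy code page. The Python 3 sketch below reproduces that effect; that the console used code page 437 is an assumption made for illustration, and `as_seen_on_console` is a hypothetical helper.

```python
def as_seen_on_console(text, console_encoding="cp437"):
    """Show how UTF-8 encoded text appears if the console decodes the
    bytes with a different encoding (cp437 is assumed here)."""
    return text.encode("utf-8").decode(console_encoding)


# The registered sign is two bytes in UTF-8: 0xC2 0xAE.
print("®".encode("utf-8"))
print(as_seen_on_console("Bean Boots®"))
```

So the name was scraped correctly; only its display depends on the console's encoding.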

Source

2015-11-02 05:42:22 SIslam
