2013-05-08 61 views

Searching for a tag by its content with BeautifulSoup

Given the following HTML:

<div class="rating-list"> 
<ul class="recommend"> 
<li> 
<span class="recommend-titleInline">Stayed April 2013, traveled as a couple</span> 
<ul class="recommend-column first"> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Value</li> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Location</li> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Sleep Quality</li> 
</ul> 
<ul class="recommend-column"> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Rooms</li> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Cleanliness</li> 
<li class="recommend-answer"> 
<span class="rate rate_ss ss50"> 
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/> 
</span> 
Service</li> 
</ul> 
</li> 
</ul> 
</div> 

I have already obtained this whole block with BeautifulSoup, and now I want to get the individual "li" tags like this:

valueRatingTag = subRatingListTags[i].find(name='li', attrs={'class': 'recommend-answer'}, text='Value')
locationRatingTag = subRatingListTags[i].find(name='li', attrs={'class': 'recommend-answer'}, text='Location')
sleepRatingTag = subRatingListTags[i].find(name='li', attrs={'class': 'recommend-answer'}, text='Sleep Quality')
roomRatingTag = subRatingListTags[i].find(name='li', attrs={'class': 'recommend-answer'}, text='Rooms')
cleanRatingTag = subRatingListTags[i].find(name='li', attrs={'class': 'recommend-answer'}, text='Cleanliness')
serviceRatingTag = subRatingListTags[i].find(name='li', attrs={'class': 'recommend-answer'}, text='Service')

But this seems to fail: all six variables are None, which is not what I expected. What should I do?

Answers

0

Would passing a regular expression as the text argument help?

subRatingListTags[i].find(text=re.compile("Location")) 

The newlines are probably what makes the exact text match fail here.
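As a rough sketch of how the regular-expression match could be used to reach the enclosing tag: the search returns the matching string node, and its `.parent` is the `li` that contains it. The sample HTML below is trimmed from the question, the variable names are made up for illustration, and `string=` is used as the newer name for the `text=` argument (Beautiful Soup 4.4 and later), so treat this as one possible approach rather than the definitive one.

```python
import re
from bs4 import BeautifulSoup

# Sample markup trimmed from the question (two of the six ratings).
html = """
<ul class="recommend-column first">
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Value</li>
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Location</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching by string alone returns the matching text node, not a tag.
# The regex tolerates the newline that precedes the label.
label = soup.find(string=re.compile("Location"))

# The text node's parent is the enclosing <li class="recommend-answer">.
location_tag = label.parent

# The rating lives on the <img> nested inside that <li>.
rating = location_tag.img["alt"]
```

Here `location_tag` is the whole `li` element, so both the label and the star rating can be read from it.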

That way I only get the string 'Location', not the tag that contains "5 of 5 stars" and "Location". – haipeng31 2013-05-08 06:53:19

0

It is not clear what you want. Anyway:

>>> lis = [t for t in soup.find_all('li', 'recommend-answer')] 
>>> lis[0].text 
'\n\n\n\nValue' 
>>> lis[1].text 
'\n\n\n\nLocation' 
>>> lis[0].img['alt'] 
'5 of 5 stars' 

You will need to preprocess the HTML to remove all the newlines before you start parsing it.
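Building on the interactive session above, here is a hedged sketch that collects every label and its rating into a dictionary. Note that it swaps in a different technique from the preprocessing suggested above: instead of removing newlines from the whole document, it strips whitespace per tag with `get_text(strip=True)`. The sample HTML is trimmed from the question and the name `ratings` is purely illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup trimmed from the question (two of the six ratings).
html = """
<ul class="recommend-column first">
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Value</li>
<li class="recommend-answer">
<span class="rate rate_ss ss50">
<img class="sprite-ratings" src="http://c1.tacdn.com/img2/x.gif" alt="5 of 5 stars" content="5.0"/>
</span>
Sleep Quality</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

ratings = {}
for li in soup.find_all("li", class_="recommend-answer"):
    # strip=True drops the whitespace-only strings around the <span>
    # and trims the newline in front of the label.
    label = li.get_text(strip=True)
    # The first <img> inside the <li> carries the rating text.
    ratings[label] = li.img["alt"]
```

Looking up `ratings["Value"]` then gives the star text for that category, and the `li` tags themselves remain available inside the loop if the whole element is needed.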