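Python - Extracting data between specific comment nodes with BeautifulSoup 4

I'm looking to pick out specific data from a website, such as prices, company info, etc. Luckily, the website designer has put in lots of comment markers, such as: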

<!-- Begin Services Table --> 
' desired data 
<!-- End Services Table -->

What code do I need to make BS4 return the strings between the given markers?

import requests 
from bs4 import BeautifulSoup 

url = "http://www.100ll.com/searchresults.php?clear_previous=true&searchfor="+'KPLN'+"&submit.x=0&submit.y=0"

response = requests.get(url) 
soup = BeautifulSoup(response.content, "lxml") 

text_list = soup.find(id="framediv").find_all(text=True) 
start_index = text_list.index(' Begin Fuel Information Table ') + 1 
end_index = text_list.index(' End Fuel Information Table ') 
for item in text_list[start_index:end_index]: 
    print(item)

Here is the website in question:

http://www.100ll.com/showfbo.php?HashID=cf5f18404c062da6fa11e3af41358873

Source

2017-02-18 Seth Killian

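SO is not a free coding service. You have to try to solve the problem yourself. If you can't get it working, post what you tried and we'll help you fix it. – Barmar

Sorry @Barmar, I forgot to post my original code! –

If you want to select the table element that follows those specific comment(s), you can select all the comment nodes, filter them by the desired text, and then select the next sibling table element: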

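import requests
from bs4 import BeautifulSoup, Comment

# Assumed to be the page linked in the question.
url = "http://www.100ll.com/showfbo.php?HashID=cf5f18404c062da6fa11e3af41358873"

response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")

# Collect every comment node in the document.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

for comment in comments:
    if comment.strip() == 'Begin Services Table':
        # The table is the next <table> sibling of the opening comment.
        table = comment.find_next_sibling('table')
        print(table)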
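
To try the same idea without hitting the network, here is a minimal sketch that runs against an inline HTML string. The markup in `SAMPLE_HTML` and the helper name `table_after_comment` are made up for illustration (they are not taken from the real page), and it uses the built-in `html.parser` so `lxml` is not required.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<div id="framediv">
  <!-- Begin Services Table -->
  <table><tr><td>Fuel</td><td>100LL</td></tr></table>
  <!-- End Services Table -->
</div>
"""


def table_after_comment(html, marker):
    """Return the first <table> sibling after the comment equal to marker, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if comment.strip() == marker:
            return comment.find_next_sibling("table")
    return None


table = table_after_comment(SAMPLE_HTML, "Begin Services Table")
cells = [td.get_text() for td in table.find_all("td")]
print(cells)
```

Note that `find_next_sibling` only looks at nodes sharing the comment's parent, so this works only when the table sits at the same nesting level as the marker comment.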

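Alternatively, if you want to get all the data between those two comments, you can find the first comment and then iterate over all of its next siblings until you reach the closing comment: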

import requests
from bs4 import BeautifulSoup, Comment

# Assumed to be the page linked in the question.
url = "http://www.100ll.com/showfbo.php?HashID=cf5f18404c062da6fa11e3af41358873"

response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")

data = []

for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if comment.strip() == 'Begin Services Table':
        # Walk the following siblings until the closing comment is reached.
        for node in comment.next_siblings:
            if isinstance(node, Comment) and node.strip() == 'End Services Table':
                break
            data.append(node)

print(data)
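
The collected list mixes tags with whitespace-only text nodes, so you will usually want to filter it. Below is a rough sketch of that on an inline HTML string. The sample markup and the helper name `nodes_between_comments` are hypothetical, and `html.parser` is used so the snippet runs without `lxml`.

```python
from bs4 import BeautifulSoup, Comment, Tag

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<div>
  <p>before</p>
  <!-- Begin Services Table -->
  <table><tr><td>Hangar</td></tr></table>
  <p>Open 24 hours</p>
  <!-- End Services Table -->
  <p>after</p>
</div>
"""


def nodes_between_comments(html, start, end):
    """Return the sibling nodes located between the start and end comments."""
    soup = BeautifulSoup(html, "html.parser")
    begin = soup.find(string=lambda s: isinstance(s, Comment) and s.strip() == start)
    if begin is None:
        return []
    collected = []
    for node in begin.next_siblings:
        # Stop as soon as the closing marker comment is reached.
        if isinstance(node, Comment) and node.strip() == end:
            break
        collected.append(node)
    return collected


nodes = nodes_between_comments(SAMPLE_HTML, "Begin Services Table", "End Services Table")
# Keep only real tags and drop the whitespace text nodes between them.
texts = [node.get_text(strip=True) for node in nodes if isinstance(node, Tag)]
print(texts)
```

If the closing comment is missing, this version simply collects every remaining sibling, which may or may not be what you want.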

Source

2017-02-18 02:18:49
