2017-02-18 63 views

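Python: extracting data between specific comment nodes with BeautifulSoup 4

I'm looking to pick out specific data from a website, such as prices and company information. Luckily, the site designer has left plenty of markers in the page, such as: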

<!-- Begin Services Table --> 
desired data
<!-- End Services Table --> 

What code do I need to make BS4 return the strings between the given markers?

import requests 
from bs4 import BeautifulSoup 

url = "http://www.100ll.com/searchresults.php?clear_previous=true&searchfor="+'KPLN'+"&submit.x=0&submit.y=0"

response = requests.get(url) 
soup = BeautifulSoup(response.content, "lxml") 

text_list = soup.find(id="framediv").find_all(text=True) 
start_index = text_list.index(' Begin Fuel Information Table ') + 1 
end_index = text_list.index(' End Fuel Information Table ') 
for item in text_list[start_index:end_index]: 
    print(item) 

Here is the page in question:

http://www.100ll.com/showfbo.php?HashID=cf5f18404c062da6fa11e3af41358873

+4

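SO is not a free coding service. You have to try to solve the problem yourself. If you can't get it working, post what you tried and we'll help you fix it. – Barmar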

+1

Sorry @Barmar, I forgot to post my original code! –

Answers

1

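If you want to select the table element that follows a specific comment, you can select all the comment nodes, filter them by the desired text, and then select the next sibling table element: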

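import requests
from bs4 import BeautifulSoup, Comment

# The page linked in the question.
url = "http://www.100ll.com/showfbo.php?HashID=cf5f18404c062da6fa11e3af41358873"

response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")

# Comment nodes are strings, so filter the strings by type.
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

for comment in comments:
    if comment.strip() == 'Begin Services Table':
        # The first <table> that is a later sibling of the comment.
        table = comment.find_next_sibling('table')
        print(table)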
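To try the same lookup without hitting the network, here is a minimal sketch that runs against an inline snippet. The markup in `SAMPLE_HTML`, the helper name `table_after_comment`, and the use of the built-in `html.parser` are all assumptions made for illustration; the real page may be structured differently.

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<div id="framediv">
<!-- Begin Services Table -->
<table id="services"><tr><td>Hangar</td></tr></table>
<!-- End Services Table -->
<!-- Begin Fuel Information Table -->
<table id="fuel"><tr><td>100LL</td></tr></table>
<!-- End Fuel Information Table -->
</div>
"""


def table_after_comment(soup, marker):
    """Return the first <table> sibling after the comment whose text is `marker`."""
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        # Comment text keeps its surrounding spaces, so strip before comparing.
        if comment.strip() == marker:
            return comment.find_next_sibling("table")
    return None  # no comment with that text


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
services = table_after_comment(soup, "Begin Services Table")
fuel = table_after_comment(soup, "Begin Fuel Information Table")
print(services["id"], fuel["id"])
```

Note that `find_next_sibling` only looks at nodes sharing the comment's parent, so this works only when the table sits at the same nesting level as the marker comment.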

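Alternatively, if you want to get all the data between the two comments, you can find the first comment and then iterate over its following siblings until you reach the closing comment: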

import requests
from bs4 import BeautifulSoup, Comment

# The page linked in the question.
url = "http://www.100ll.com/showfbo.php?HashID=cf5f18404c062da6fa11e3af41358873"

response = requests.get(url)
soup = BeautifulSoup(response.content, "lxml")

data = []

for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if comment.strip() == 'Begin Services Table':
        # Walk the following siblings and stop at the closing comment.
        for node in comment.next_siblings:
            if isinstance(node, Comment) and node.strip() == 'End Services Table':
                break
            data.append(node)

print(data)
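The collected list mixes tags with whitespace-only strings, so a further step is usually needed to get readable text out of it. The sketch below shows one way to do that on an inline snippet; `SAMPLE_HTML`, `nodes_between`, and `texts` are hypothetical names introduced here, and `html.parser` is used only to keep the example dependency-free.

```python
from bs4 import BeautifulSoup, Comment, Tag

# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<div>
<!-- Begin Services Table -->
<p>Services</p>
<table><tr><td>Hangar</td><td>Tie-down</td></tr></table>
<!-- End Services Table -->
<p>Not part of the services block</p>
</div>
"""


def nodes_between(soup, start_marker, end_marker):
    """Collect the sibling nodes between two marker comments."""
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        if comment.strip() != start_marker:
            continue
        collected = []
        for node in comment.next_siblings:
            if isinstance(node, Comment) and node.strip() == end_marker:
                break
            collected.append(node)
        # If the end marker is missing, this is everything after the start.
        return collected
    return []  # start marker not found


def texts(nodes):
    """Reduce the collected nodes to their non-empty text, one entry per node."""
    result = []
    for node in nodes:
        if isinstance(node, Comment):
            continue  # skip any other comments inside the range
        if isinstance(node, Tag):
            text = node.get_text(" ", strip=True)
        else:
            text = node.strip()
        if text:
            result.append(text)
    return result


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
between = nodes_between(soup, "Begin Services Table", "End Services Table")
print(texts(between))  # ['Services', 'Hangar Tie-down']
```

As with the loop above, this only sees siblings of the opening comment, so it assumes both markers share the same parent element.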