2017-09-05 76 views
0

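Python does not find a search term in a string parsed by BeautifulSoup

In Python 3, when I want to return only the strings that contain a term I am interested in, I can do this: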

phrases = ["1. The cat was sleeping",
           "2. The dog jumped over the cat",
           "3. The cat was startled"]

for phrase in phrases:
    if "dog" in phrase:
        print(phrase)

This of course prints "2. The dog jumped over the cat".

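What I want to do now is make the same idea work with strings parsed by BeautifulSoup. For example, a Craigslist search page contains a large number of A tags, but only the A tags that have "hdrlnk" in them are useful to me. So I wrote: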

import requests 
from bs4 import BeautifulSoup 

url = "https://chicago.craigslist.org/search/apa" 
r = requests.get(url) 

soup = BeautifulSoup(r.content, "html.parser") 
links = soup.find_all("a") 

for link in links:
    if "hdrlnk" in link:
        print(link)

The problem is that instead of printing all the A tags with "hdrlnk" inside them, Python prints nothing. I am not sure what is going wrong.

+0

I visited the link but could not find any links containing the text "hdrlink".

Answers

4

"hdrlnk" is a class attribute on the links. Since you say you are only interested in those links, just find the links by class, like this:

import requests 
from bs4 import BeautifulSoup 

url = "https://chicago.craigslist.org/search/apa" 
r = requests.get(url) 

soup = BeautifulSoup(r.content, "html.parser") 
links = soup.find_all("a", {"class": "hdrlnk"}) 

for link in links: 
    print(link) 

Output:

<a class="result-title hdrlnk" data-id="6293679332" href="/chc/apa/d/high-rise-2-bedroom-heated/6293679332.html">High-Rise 2 Bedroom Heated Pool Indoor Parking Fire Pit Pet Friendly!</a> 
<a class="result-title hdrlnk" data-id="6285069993" href="/chc/apa/d/new-beautiful-studio-in/6285069993.html">NEW-Beautiful Studio in Uptown/free heat</a> 
<a class="result-title hdrlnk" data-id="6293694090" href="/chc/apa/d/albany-park-2-bed-1-bath/6293694090.html">Albany Park 2 Bed 1 Bath Dishwasher W/D &amp; Heat + Parking Incl Pets ok</a> 
<a class="result-title hdrlnk" data-id="6282289498" href="/chc/apa/d/north-center-2-bed-1-bath/6282289498.html">NORTH CENTER: 2 BED 1 BATH HDWD AC UNITS PROVIDE W/D ON SITE PRK INCLU</a> 
<a class="result-title hdrlnk" data-id="6266583119" href="/chc/apa/d/beautiful-2bed-1bath-in-the/6266583119.html">Beautiful 2bed/1bath in the heart of Wrigleyville</a> 
<a class="result-title hdrlnk" data-id="6286352598" href="/chc/apa/d/newly-rehabbed-2-bedroom-unit/6286352598.html">Newly Rehabbed 2 Bedroom Unit! Section 8 OK! Pets OK! (NHQ)</a> 
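
To see why the original loop printed nothing, here is a minimal offline sketch. The HTML snippet is made up for illustration (modeled on the output above), so it runs without hitting Craigslist. As far as I know, `in` on a bs4 Tag tests its direct children rather than the tag's markup, which is why the class name is never matched that way.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modeled on the output above; not real listings.
html = """
<a class="result-title hdrlnk" href="/chc/apa/d/example-one/1.html">Example one</a>
<a class="other" href="/about">About</a>
"""

soup = BeautifulSoup(html, "html.parser")
links = soup.find_all("a")

# `in` on a Tag looks through the tag's direct children,
# so the class name "hdrlnk" is never seen by this test.
in_tag = ["hdrlnk" in link for link in links]

# The class attribute is parsed into a list of class names.
classes = [link["class"] for link in links]

# Filtering by class matches any <a> whose class list includes "hdrlnk".
matches = soup.find_all("a", class_="hdrlnk")

print(in_tag)
print(classes)
print([m["href"] for m in matches])
```

`class_="hdrlnk"` is the keyword form of the `{"class": "hdrlnk"}` filter used in the answer; both should select the same tags.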

To get the link's href or its text, use:

print(link["href"]) 
print(link.text) 
0

Try:

for link in links:
    if "hdrlnk" in link["href"]:
        print(link)
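
One caveat with indexing attributes directly: `link["href"]` raises KeyError for an anchor that has no href, and in the output shown earlier "hdrlnk" sits in the class attribute rather than in the URL. A hedged variant of the same loop, run on made-up HTML, could use `.get()` with a default and check both places:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = """
<a name="top">No href here</a>
<a class="result-title hdrlnk" href="/chc/apa/d/example-two/2.html">Example two</a>
<a href="/search/apa?hdrlnk=1">Mentions hdrlnk only in its URL</a>
"""

soup = BeautifulSoup(html, "html.parser")

by_class = []
by_href = []
for link in soup.find_all("a"):
    # .get() returns the default instead of raising KeyError
    # when the attribute is missing.
    if "hdrlnk" in link.get("class", []):
        by_class.append(link.get_text())
    if "hdrlnk" in link.get("href", ""):
        by_href.append(link.get_text())

print(by_class)
print(by_href)
```

The class check is a membership test against a list of class names, while the href check is a substring test against a string, so the two can give different results for the same link.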
0

Just search for the term in the link's contents; otherwise your code seems fine:

import requests 
from bs4 import BeautifulSoup 

url = "https://chicago.craigslist.org/search/apa" 
r = requests.get(url) 

soup = BeautifulSoup(r.content, "html.parser") 
links = soup.find_all("a") 

for link in links:
    if "hdrlnk" in link.contents[0]:
        print(link)

Or, if you want to search inside the href or the title, use link['href'] or link['title'].
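
A small sketch of searching the visible link text, assuming made-up HTML. Note that `link.contents[0]` raises IndexError for an empty anchor and may be a nested tag rather than a string; `get_text()` is one way around both. The helper name `links_mentioning` is hypothetical.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = """
<a href="/empty"></a>
<a href="/nested"><span>New:</span> cat for sale</a>
<a href="/plain">dog for sale</a>
"""

soup = BeautifulSoup(html, "html.parser")


def links_mentioning(soup, term):
    """Return the hrefs of <a> tags whose visible text contains term."""
    found = []
    for link in soup.find_all("a"):
        # get_text() joins the text of all descendants and
        # returns "" for an empty tag.
        if term in link.get_text():
            found.append(link.get("href"))
    return found


cat_links = links_mentioning(soup, "cat")
dog_links = links_mentioning(soup, "dog")
print(cat_links)
print(dog_links)
```

This only matches text a reader would see in the link. It would not find "hdrlnk" on the Craigslist page, where, per the output shown earlier, the term is a class name.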

0

To get the desired links, you can use CSS selectors in your script, which keeps the scraper robust and concise.

import requests 
from bs4 import BeautifulSoup 

base_link = "https://chicago.craigslist.org" 
res = requests.get("https://chicago.craigslist.org/search/apa").text 
soup = BeautifulSoup(res, "lxml") 
for link in soup.select(".hdrlnk"): 
    print(base_link + link.get("href"))
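
A possible refinement, sketched on made-up HTML so it runs offline: `urllib.parse.urljoin` from the standard library handles both relative and already-absolute hrefs, which plain string concatenation does not. The `a.hdrlnk[href]` selector is my own choice here; it skips anchors without an href.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_link = "https://chicago.craigslist.org"

# Hypothetical markup for illustration only.
html = """
<a class="result-title hdrlnk" href="/chc/apa/d/example-three/3.html">Example three</a>
<a class="result-title hdrlnk" href="https://example.org/already-absolute">Example four</a>
<a class="hdrlnk">No href</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Select <a> tags that have the class "hdrlnk" and an href attribute,
# then resolve each href against the base URL.
urls = [urljoin(base_link, link["href"]) for link in soup.select("a.hdrlnk[href]")]

for url in urls:
    print(url)
```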