Removing extra characters from a string with Python string functions

This is the web page markup (HTML) from which I want to extract the location information.

<div class="location"> 
    <div class="listing-location">Location</div> 
    <div class="location-areas"> 
    <span class="location">Al Bayan</span> 
    ‪,‪ 
    <span class="location">Nepal</span> 
    </div> 
    <div class="area-description"> 3.3 km from Mall of the Emirates </div> 
    </div>

The Python BeautifulSoup4 code I am using is:

try:
    title = soup.find('span', {'id': 'listing-title-wrap'})
    title_result = str(title.get_text().strip())
    print "Title: ", title_result
except StandardError as e:
    title_result = "Error was {0}".format(e)
    print title_result

Output:

"Al Bayanأ¢â‚¬آھ,أ¢â‚¬آھ 

          Nepal"

How can I convert it to the following format?

['Al Bayan', 'Nepal']

What should the second line of the code be to get this output?

2016-06-01 Panetta

What is the HTML that produces this output? – 2016-06-01 07:01:47

Are they all in that format? Some gibberish, then two newlines, then the real text? – Keatinge

Try this solution: http://stackoverflow.com/a/2743163/524743 – Samuel

You are reading the wrong element. Just read the spans with class location:

from bs4 import BeautifulSoup

# html holds the page markup shown in the question
soup = BeautifulSoup(html, "html.parser")
locList = [loc.text for loc in soup.find_all("span", {"class": "location"})]
print(locList)

This prints what you want:

['Al Bayan', 'Nepal']
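
For readers without bs4 installed, the same idea (collect only the text inside span elements whose class is location) can be sketched with the standard library's html.parser. This is a swapped-in technique, not the code from the answer. The HTML constant and the LocationSpans class are illustrative names, and the U+202A characters around the comma are an assumption based on the error message quoted elsewhere in this thread.

```python
from html.parser import HTMLParser

# Markup from the question; the \u202a characters around the comma are assumed.
HTML = """
<div class="location">
    <div class="listing-location">Location</div>
    <div class="location-areas">
    <span class="location">Al Bayan</span>
    \u202a,\u202a
    <span class="location">Nepal</span>
    </div>
    <div class="area-description"> 3.3 km from Mall of the Emirates </div>
</div>
"""


class LocationSpans(HTMLParser):
    """Collect the text of every <span> whose class list contains 'location'."""

    def __init__(self):
        super().__init__()
        self.in_span = False
        self.locations = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and "location" in classes:
            self.in_span = True
            self.locations.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False

    def handle_data(self, data):
        # Text between the spans (the comma and control characters) is ignored.
        if self.in_span:
            self.locations[-1] += data


parser = LocationSpans()
parser.feed(HTML)
parser.close()
locations = [loc.strip() for loc in parser.locations]
print(locations)
```

Because the text between the two spans is never collected, nothing has to be cleaned up afterwards.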

2016-06-01 07:15:41 Keatinge

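[u'Al Bayan', u'Nepal'] is the output. – Panetta

Map it with str. That will give you the expected result: 'map(str, output_list)' –

@Panetta I changed it slightly, run it now. There is no reason to use map when there is already a list comprehension – Keatinge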

There is a one-line solution. Let a be your string.

In [38]: [i.replace(" ","") for i in filter(None,(a.decode('unicode_escape').encode('ascii','ignore')).split('\n'))] 
Out[38]: ['Al Bayan,', 'Nepal']

2016-06-01 07:15:09

'ascii' codec can't encode character u'\u202a'. I tried it, and this is the error – Panetta

@Panetta What is your exact error, and what did you give as input? This works for me. –
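
A likely reason for the error: in Python 2, calling .decode() on a string that is already unicode first encodes it implicitly with the ascii codec, and that fails on U+202A. A minimal Python 3 sketch that avoids the encode/decode round trip is shown below. It assumes the scraped text looks like the output in the question, with U+202A (LEFT-TO-RIGHT EMBEDDING) around the comma. The names raw and split_locations are illustrative.

```python
import unicodedata

# Assumed sample input, modeled on the output shown in the question.
raw = "Al Bayan\u202a,\u202a\n\n          Nepal"


def split_locations(text):
    """Drop invisible format characters, then split on commas and trim."""
    # Unicode category "Cf" covers format characters such as U+202A.
    visible = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return [part.strip() for part in visible.split(",") if part.strip()]


locations = split_locations(raw)
print(locations)
```

Filtering by Unicode category rather than forcing ASCII keeps any legitimate non-ASCII letters in place names.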

You can use a regular expression to keep only letters and spaces:

>>> import re 
>>> re.findall('[A-Za-z ]+', area_result) 
['Al Bayan', ' Nepal']

Hope it helps.
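
The result above still has a leading space in ' Nepal'. Two possible ways to get rid of it are sketched here, assuming area_result holds text like the output in the question (the sample value is an assumption). Note that a character class limited to A-Z and a-z would also drop accented or non-Latin letters.

```python
import re

# Assumed sample input, modeled on the output shown in the question.
area_result = "Al Bayan\u202a,\u202a\n\n          Nepal"

# Same character class as above, then trim each match and drop blank ones.
stripped = [m.strip() for m in re.findall(r"[A-Za-z ]+", area_result) if m.strip()]

# Alternative pattern: words joined by single spaces, so a match
# can never start or end with a space.
tight = re.findall(r"[A-Za-z]+(?: [A-Za-z]+)*", area_result)

print(stripped)
print(tight)
```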

2016-06-01 07:19:24 3kt
