Scraping a table

2017-04-11

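I want to extract a property's attributes and their corresponding values. I'm interested in getting a structure like {key: {'Property type': 'Commercial property', 'Purchase price': 'CHF 475,000', ...}}.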

I can extract the values one at a time, but I can't get it working as a loop that updates my dictionary.

<dl class="row xsmall-up-2 medium-up-3 large-up-4 attributes-grid"> 
    <div class="column"> 
     <dt class="label-text"> 
      Property type 
     </dt> 
     <dd> 
Commercial property   </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Purchase price 
     </dt> 
     <dd> 
CHF 475,000   </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Floor space 
     </dt> 
     <dd> 
114 m&sup2;   </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Floor 
     </dt> 
     <dd> 
1. floor    </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Year of construction 
     </dt> 
     <dd> 
1989   </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Balcony/ies 
     </dt> 
     <dd> 
       <i class="fa fa-check text-green" aria-hidden="true"></i> 
     </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Indoor parking 
     </dt> 
     <dd> 
       <i class="fa fa-check text-green" aria-hidden="true"></i> 
     </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Outdoor parking 
     </dt> 
     <dd> 
       <i class="fa fa-check text-green" aria-hidden="true"></i> 
     </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Lift 
     </dt> 
     <dd> 
       <i class="fa fa-check text-green" aria-hidden="true"></i> 
     </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Cable TV 
     </dt> 
     <dd> 
       <i class="fa fa-check text-green" aria-hidden="true"></i> 
     </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Public transport stop 
     </dt> 
     <dd> 
150 m   </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Motorway 
     </dt> 
     <dd> 
500 m   </dd> 
    </div> 
    <div class="column"> 
     <dt class="label-text"> 
      Shops 
     </dt> 
     <dd> 
300 m   </dd> 
    </div> 
</dl> 

What have you tried before asking your question? Show us your code!

Answer


Assuming the HTML you provided is stored as a string in table_text:

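from bs4 import BeautifulSoup

# table_text holds the HTML shown in the question
soup = BeautifulSoup(table_text, "lxml")
temp_dict = {}
# each <div class="column"> wraps one <dt> label and its <dd> value;
# rows that only contain a check-mark icon end up with an empty string
for d in soup.find_all("div", {"class": "column"}):
    temp_dict[d.find("dt").text.strip()] = d.find("dd").text.strip()
print(temp_dict)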
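A self-contained variant of the loop above, for trying it without the original page. It is a sketch that assumes the same `<div class="column">` / `<dt>` / `<dd>` layout shown in the question (trimmed to three rows here). It swaps in Python's built-in `"html.parser"` so lxml is not required. It also assumes, since the question does not say, that a `<dd>` containing only an `<i class="fa fa-check">` icon means "yes", and records it as `True` instead of an empty string. The helper name `parse_attributes` is hypothetical.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's HTML (three of its rows).
TABLE_TEXT = """
<dl class="row xsmall-up-2 medium-up-3 large-up-4 attributes-grid">
    <div class="column">
     <dt class="label-text"> Property type </dt>
     <dd>
Commercial property   </dd>
    </div>
    <div class="column">
     <dt class="label-text"> Purchase price </dt>
     <dd>
CHF 475,000   </dd>
    </div>
    <div class="column">
     <dt class="label-text"> Balcony/ies </dt>
     <dd>
       <i class="fa fa-check text-green" aria-hidden="true"></i>
     </dd>
    </div>
</dl>
"""


def parse_attributes(html):
    """Hypothetical helper: map each <dt> label to its <dd> value."""
    soup = BeautifulSoup(html, "html.parser")
    attributes = {}
    for column in soup.find_all("div", class_="column"):
        label = column.find("dt").get_text(strip=True)
        dd = column.find("dd")
        value = dd.get_text(strip=True)
        # Assumption: an otherwise empty <dd> holding a check-mark icon means "yes".
        if not value and dd.find("i", class_="fa-check") is not None:
            value = True
        attributes[label] = value
    return attributes


print(parse_attributes(TABLE_TEXT))
```

This prints a flat dictionary for one listing: `{'Property type': 'Commercial property', 'Purchase price': 'CHF 475,000', 'Balcony/ies': True}`. Matching on `class_="fa-check"` works because BeautifulSoup treats `class` as a multi-valued attribute and matches any one of its classes.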

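I'm guessing the HTML you provided covers only one row of the table. If you want this for all rows, loop over them and keep a parent dictionary. On each iteration, update it with a key for the row and temp_dict as the value. That gives you the structure you want.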
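The parent-dictionary approach from the last paragraph can be sketched as follows. It assumes each listing's HTML is already available as a string and that an identifier for each listing is known. The `listings` dict, its IDs, and the second listing's sample data are made up for illustration. Each inner dictionary is built the same way as `temp_dict` above and then stored under its listing's key.

```python
from bs4 import BeautifulSoup


def row_to_dict(html):
    """Build one temp_dict: <dt> label -> stripped <dd> text."""
    soup = BeautifulSoup(html, "html.parser")
    temp_dict = {}
    for d in soup.find_all("div", {"class": "column"}):
        temp_dict[d.find("dt").text.strip()] = d.find("dd").text.strip()
    return temp_dict


# Hypothetical input: listing ID -> that listing's attribute HTML.
listings = {
    "listing-1": """<dl><div class="column"><dt> Property type </dt>
                    <dd> Commercial property </dd></div>
                    <div class="column"><dt> Purchase price </dt>
                    <dd> CHF 475,000 </dd></div></dl>""",
    "listing-2": """<dl><div class="column"><dt> Property type </dt>
                    <dd> Apartment </dd></div></dl>""",
}

parent = {}
for key, html in listings.items():
    # One iteration per row: the row's key maps to that row's temp_dict.
    parent[key] = row_to_dict(html)

print(parent)
```

The result has the nested shape asked for in the question, `{key: {'Property type': ..., 'Purchase price': ...}}`, with one inner dictionary per listing.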