2017-07-27 113 views
2

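Python: get "id" or "data-value" from HTML?

I'm trying to pull the 'id' or 'data-value' out of this block of HTML and assign the values to a list. It looks like I'm not targeting the right elements. Where am I going wrong? Ultimately I want to target the individual product IDs in the is_in_stock section.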

My code:

import requests 
from bs4 import BeautifulSoup as bs 

response = session.get(product_url) 
soup = bs(response.text,'lxml') 
div = soup.find("div",{"class":"item"}) 
all_sizes = div.find_all("data") 

The HTML:

             <div class="product-options" id="product-options-wrapper"> 
<script type="text/javascript"> 
        try { 
         var changeConfigurableStatus = true; 
         var stStatus = new StockStatus({"242":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92964"},"246":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92965"},"363":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92966"},"248":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92967"},"243":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92968"},"368":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92969"},"244":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92970"},"247":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92971"},"79":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92972"},"249":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92973"}}); 
        } 
         catch(ex){} 
       </script> 
     <div class="configurable-product-option no-display"> 
     <div class="configurable-product-option-wrapper"> 
      <h2>Please select your size</h2> 
      <div class="drop-select"> 
       <label for="attribute139"></label> 
       <select name="super_attribute[139]" 
         id="attribute139" 
         class="required-entry super-attribute-select"> 
        <option>Choose an Option...</option> 
       </select> 
      </div> 
     </div> 
    </div> 
    <script type="text/javascript"> 
    var spConfig = new Product.Config({"attributes":{"139":{"id":"139","code":"eu_size","label":"EU ","options":[{"id":"242","label":"EU 40 2\/3 \/ US 7.5","price":"0","oldPrice":"0","products":["92964"]},{"id":"246","label":"EU 41 1\/3 \/ US 8","price":"0","oldPrice":"0","products":["92965"]},{"id":"363","label":"EU 42 \/ US 8.5","price":"0","oldPrice":"0","products":["92966"]},{"id":"248","label":"EU 42 2\/3 \/ US 9","price":"0","oldPrice":"0","products":["92967"]},{"id":"243","label":"EU 43 1\/3 \/ US 9.5","price":"0","oldPrice":"0","products":["92968"]},{"id":"368","label":"EU 44 \/ US 10","price":"0","oldPrice":"0","products":["92969"]},{"id":"244","label":"EU 44 2\/3 US 10.5","price":"0","oldPrice":"0","products":["92970"]},{"id":"247","label":"EU 45 1\/3 \/ US 11","price":"0","oldPrice":"0","products":["92971"]},{"id":"79","label":"EU 46 \/ US 11.5","price":"0","oldPrice":"0","products":["92972"]},{"id":"249","label":"EU 46 2\/3 \/ US 12","price":"0","oldPrice":"0","products":["92973"]}]}},"template":"\u20ac#{price}","basePrice":"89","oldPrice":"89","productId":"90522","chooseText":"Choose an Option...","taxConfig":{"includeTax":true,"showIncludeTax":true,"showBothPrices":false,"defaultTax":19,"currentTax":19,"inclTaxTitle":"Incl. Tax"}}); 
</script> 

<h3>Choose size</h3> 
<div class="clearfix " data-attribute="attribute139" > 
       <div class="attribute-item " 
     data-value="242"> 
     EU 40 2/3/US 7.5  </div> 
       <div class="attribute-item " 
     data-value="246"> 
     EU 41 1/3/US 8  </div> 
       <div class="attribute-item " 
     data-value="363"> 
     EU 42/US 8.5  </div> 
       <div class="attribute-item " 
     data-value="248"> 
     EU 42 2/3/US 9  </div> 
       <div class="attribute-item " 
     data-value="243"> 
     EU 43 1/3/US 9.5  </div> 
       <div class="attribute-item " 
     data-value="368"> 
     EU 44/US 10  </div> 
       <div class="attribute-item " 
     data-value="244"> 
     EU 44 2/3 US 10.5  </div> 
       <div class="attribute-item " 
     data-value="247"> 
     EU 45 1/3/US 11  </div> 
       <div class="attribute-item " 
     data-value="79"> 
     EU 46/US 11.5  </div> 
       <div class="attribute-item " 
     data-value="249"> 
     EU 46 2/3/US 12  </div> 
    </div> 

Answers

1

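You're on the right track, but you need tag.find_all rather than find. Note that find_all("data") looks for <data> tags, which this page does not contain; data-value is an attribute on the div elements with class attribute-item, so select those divs and read the attribute from each one.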

ids = []
for div in soup.find_all("div", {"class": "attribute-item"}):
    ids.append(div['data-value'])  # read the attribute from the loop variable
+0

Thanks for the quick reply. How would I get the "products" IDs in the var spConfig = new Product.Config({"attributes":{"139":{"id":"139","code":"eu_size","label":"EU ","options":[{"id":"242","label":"EU 40 2\/3 \/ US 7.5","price":"0","oldPrice":"0","products":["92964"]},}}) string?

+0

@duchathaway Try 'soup.find('script').text'.

+0

@duchathaway If this answer helped, you can _accept_ it. Click the gray check mark next to the helpful answer and it will turn green. It helps everyone :)
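
One way to get the "products" IDs asked about in the comments is to locate the spConfig call in the page source and decode its argument as JSON. The following is only a sketch under a few assumptions: the argument of new Product.Config(...) is strict JSON (it appears to be in the HTML above), the html string is a shortened stand-in for response.text, and extract_js_object is a hypothetical helper name, not part of requests or BeautifulSoup.

```python
import json
import re

# Shortened stand-in for response.text (assumption: the real page has ten options).
html = r'''
<script type="text/javascript">
var spConfig = new Product.Config({"attributes":{"139":{"id":"139","code":"eu_size","label":"EU ","options":[{"id":"242","label":"EU 40 2\/3 \/ US 7.5","price":"0","oldPrice":"0","products":["92964"]},{"id":"246","label":"EU 41 1\/3 \/ US 8","price":"0","oldPrice":"0","products":["92965"]}]}},"productId":"90522"});
</script>
'''


def extract_js_object(text, marker):
    """Decode the JSON object that directly follows `marker` in `text`.

    Hypothetical helper for illustration; it only works when the
    JavaScript object literal is also valid JSON.
    """
    match = re.search(re.escape(marker) + r"\s*(?=\{)", text)
    if match is None:
        raise ValueError(f"{marker!r} not found")
    # raw_decode parses a single JSON value starting at the given index
    # and ignores whatever follows it (here the closing ");").
    obj, _ = json.JSONDecoder().raw_decode(text, match.end())
    return obj


config = extract_js_object(html, "new Product.Config(")

# Map each option id (the data-value shown on the page) to its product ids.
products_by_option = {
    opt["id"]: opt["products"]
    for attr in config["attributes"].values()
    for opt in attr["options"]
}
print(products_by_option)
```

Searching the raw page text avoids depending on which script tag soup.find('script') happens to return first; on this page the first script holds the StockStatus call, not spConfig.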

1

This should work for you.

import requests
from bs4 import BeautifulSoup as bs

session = requests.Session()  # one session reused for every request
response = session.get(product_url)  # product_url is the URL of the product page
soup = bs(response.text, 'lxml')

divs = soup.find_all("div", {"class": "attribute-item"})  # Select the divs with the .attribute-item class
all_sizes = [x['data-value'] for x in divs]  # Extract the 'data-value' attribute from each of those divs
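
Since the question ultimately wants the product IDs from the is_in_stock section, the data-value list can be used as keys into the StockStatus object from the first script tag. This is a rough sketch, assuming the StockStatus argument is strict JSON as it appears in the question; the html string is a shortened stand-in for response.text, and one entry has been changed to "is_in_stock":false purely to show the filter (the real page lists every size as in stock).

```python
import json
import re

# Shortened stand-in for response.text. NOTE: "is_in_stock":false on the
# second entry is invented for illustration; the original page has true.
html = '''
<script type="text/javascript">
try {
    var stStatus = new StockStatus({"242":{"is_in_stock":true,"custom_status_icon":"","custom_status":"","product_id":"92964"},"246":{"is_in_stock":false,"custom_status_icon":"","custom_status":"","product_id":"92965"}});
} catch(ex){}
</script>
'''

# The data-value list, as produced by the find_all code above.
all_sizes = ["242", "246"]

# Find where the object literal starts, then decode exactly one JSON value.
match = re.search(r"new StockStatus\(\s*(?=\{)", html)
if match is None:
    raise ValueError("StockStatus call not found")
stock, _ = json.JSONDecoder().raw_decode(html, match.end())

# Keep the product_id of every size that is marked as in stock.
in_stock_ids = [
    stock[size]["product_id"]
    for size in all_sizes
    if size in stock and stock[size]["is_in_stock"]
]
print(in_stock_ids)
```

If the site ever emits an object literal that is not valid JSON (single quotes, unquoted keys), json will raise an error and a JavaScript-aware parser would be needed instead.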
+0

Thank you, magoon. How would I target the "products" IDs in the var spConfig = new Product.Config({"attributes":{"139":{"id":"139","code":"eu_size","label":"EU ","options":[{"id":"242","label":"EU 40 2\/3 \/ US 7.5","price":"0","oldPrice":"0","products":["92964"]},}}) string?

+1

@duchathaway You need to get it from that line of JavaScript, or you can get it from its '