How do I get an attribute value using BeautifulSoup and Python?

I can't get an attribute value with BeautifulSoup and Python. Here is how the XML is structured:

... 
</total> 
<tag> 
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat> 
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat> 
    ... 
    <stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat> 
</tag> 
<suite> 
...

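What I'm trying to get is the pass value, but for the life of me I can't figure out how. I checked the BeautifulSoup documentation and it seems I should use something like stat['pass'], but that doesn't seem to work.

Here is my code: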

with open('../results/output.xml') as raw_resuls:
    results = soup(raw_resuls, 'lxml')
    for stat in results.find_all('tag'):
        print stat['pass']

If I do results.stat['pass'], it returns the value from a different tag, much further up in the XML blob.

If I print the stat variable, I get the following:

<stat fail="0" pass="1">TR=787878 Sandbox=3000614</stat> 
... 
<stat fail="0" pass="1">TR=888888 Sandbox=3000610</stat>

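That looks fine.

I'm pretty sure I'm missing something or doing something wrong. Where should I look? Am I taking the wrong approach?

Any advice or guidance would be greatly appreciated! Thanks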

Source

2017-04-03 Xour

Consider this approach:

from bs4 import BeautifulSoup 

with open('test.xml') as raw_resuls: 
    results = BeautifulSoup(raw_resuls, 'lxml') 

for element in results.find_all("tag"): 
    for stat in element.find_all("stat"): 
        print(stat['pass'])

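The problem with your solution is that pass is an attribute of stat, not of the tag element you searched for.

This solution finds all tag elements, then searches each of them for stat elements, and reads pass from those results.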
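To make the difference concrete, here is a minimal sketch. It inlines part of the XML sample from the question as a string, and it uses Python's built-in 'html.parser' instead of 'lxml' so that it runs without extra parser packages. Both choices are assumptions for illustration, not part of the original code.

```python
from bs4 import BeautifulSoup

# Sample data copied from the question, inlined for illustration.
XML = """
<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
</tag>
"""

# 'html.parser' is an assumption made to avoid the lxml dependency.
soup = BeautifulSoup(XML, "html.parser")
tag = soup.find("tag")

# <tag> itself carries no attributes, so subscripting it raises KeyError.
try:
    tag["pass"]
    tag_has_pass = True
except KeyError:
    tag_has_pass = False

# The attribute lives on the nested <stat> elements.
pass_values = [stat["pass"] for stat in tag.find_all("stat")]

print(tag_has_pass)
print(pass_values)
```

The KeyError on the outer element is the failure the original loop ran into; descending one level to stat avoids it.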

XML file

<tag> 
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat> 
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat> 
    <stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat> 
</tag>

Output of the script above

1 
1 
1

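Addition

Since some details still seem unclear (see the comments), consider this complete workaround, which uses BeautifulSoup to get everything you need. If you run into performance issues, a solution that uses dictionaries as list elements may not be ideal. However, since you seem to be having some trouble with Python and Soup, I tried to keep this example as simple as possible by making all relevant information accessible by name rather than by index.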

from bs4 import BeautifulSoup

# Parses a string of the form 'TR=abc123 Sandbox=abc123' and returns it as a
# dictionary with the structure {'TR': 'abc123', 'Sandbox': 'abc123'}.
def parseTestID(testid):
    parts = testid.split(" ")
    return {'TR': parts[0].split("=")[1], 'Sandbox': parts[1].split("=")[1]}

# Parses the XML content of 'rawdata' and stores the pass value, TR-ID and
# Sandbox-ID of each stat entry in a dictionary of the form
# {'Pass': pass value, 'TR': TR-ID, 'Sandbox': Sandbox-ID}.
# Returns a list of these dictionaries.
def getTestState(rawdata):
    # initialize parser
    soup = BeautifulSoup(rawdata, 'lxml')
    parsedData = []

    # search for tag elements
    for tag in soup.find_all("tag"):
        # search each tag element for stat elements
        for stat in tag.find_all("stat"):
            # parse the text content once and store everything in a dictionary
            ids = parseTestID(stat.string)
            parsedData.append({'Pass': stat['pass'], 'TR': ids['TR'], 'Sandbox': ids['Sandbox']})

    # return list
    return parsedData

You can use the script above as follows to do whatever you want with the data (for example, just print it):

# open file 
with open('test.xml') as raw_resuls: 
    # get list of parsed data 
    data = getTestState(raw_resuls) 

# print parsed data 
for element in data: 
    print("TR = {0}\tSandbox = {1}\tPass = {2}".format(element['TR'],element['Sandbox'],element['Pass']))

The output looks like this

TR = 111111 Sandbox = 3000613 Pass = 1 
TR = 121212 Sandbox = 3000618 Pass = 1 
TR = 222222 Sandbox = 3000612 Pass = 1 
TR = 232323 Sandbox = 3000618 Pass = 1 
TR = 333333 Sandbox = 3000605 Pass = 1 
TR = 343434 Sandbox = ZZZZZZ Pass = 1 
TR = 444444 Sandbox = 3000604 Pass = 1 
TR = 454545 Sandbox = 3000608 Pass = 1 
TR = 545454 Sandbox = XXXXXX Pass = 1 
TR = 555555 Sandbox = 3000617 Pass = 1 
TR = 565656 Sandbox = 3000615 Pass = 1 
TR = 626262 Sandbox = 3000602 Pass = 1 
TR = 666666 Sandbox = 3000616 Pass = 1 
TR = 676767 Sandbox = 3000599 Pass = 1 
TR = 737373 Sandbox = 3000603 Pass = 1 
TR = 777777 Sandbox = 3000611 Pass = 1 
TR = 787878 Sandbox = 3000614 Pass = 1 
TR = 828282 Sandbox = 3000600 Pass = 1 
TR = 888888 Sandbox = 3000610 Pass = 1 
TR = 999999 Sandbox = 3000617 Pass = 1

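Let's summarize the core elements that were used:

Finding XML tags: To find XML tags, use soup.find("tag"), which returns the first matching tag, or soup.find_all("tag"), which finds all matching tags and stores them in a list. Individual tags can then be accessed by iterating over the list.

Finding nested tags: To find nested tags, apply find() or find_all() again to the result of the first find_all().

Accessing tag content: To access the content of a tag, apply string to a single tag. For example, if tag = <tag>I love Soup!</tag>, then tag.string = "I love Soup!".

Finding attribute values: To get the value of an attribute, use subscript notation. For example, if tag = <tag color=red>I love Soup!</tag>, then tag['color'] = "red".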
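The four operations above can be tried together in a short sketch. The one-element document below is made up for illustration, and 'html.parser' is used only so the snippet runs without lxml installed. The get() call is an extra not mentioned above: it returns None for a missing attribute instead of raising KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical one-element document, used only for illustration.
soup = BeautifulSoup('<tag color="red">I love Soup!</tag>', "html.parser")

first = soup.find("tag")          # first matching tag, or None if absent
all_tags = soup.find_all("tag")   # list-like collection of every match

content = first.string            # text content of the tag
color = first["color"]            # subscript access; KeyError if missing
size = first.get("size")          # get() returns None if missing

print(len(all_tags), content, color, size)
```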

To parse strings of the form "TR=abc123 Sandbox=abc123" I used ordinary Python string splitting. You can read more about it here: How can I split and parse a string in Python?
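As a rough sketch of that splitting step, the helper below turns every key=value pair into a dictionary entry. The name parse_pairs is hypothetical, and this is a slightly more general variant than the index-based splitting used in the answer, not the same code.

```python
def parse_pairs(text):
    """Turn 'TR=abc123 Sandbox=abc123' into {'TR': 'abc123', 'Sandbox': 'abc123'}."""
    # split() with no argument splits on any run of whitespace;
    # split("=", 1) splits each pair at the first '=' only.
    return dict(pair.split("=", 1) for pair in text.split())

ids = parse_pairs("TR=111111 Sandbox=3000613")
print(ids["TR"], ids["Sandbox"])
```

A token without an '=' makes dict() raise ValueError, so malformed input fails loudly rather than being skipped.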

Source

2017-04-03 21:14:44 datell

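I see, I understand now, that makes total sense! It works great now, thanks for that! If I may ask one more question: since I only have one 'tag' attribute, is the for loop necessary? If not, how do I go straight to that 'tag' attribute? Thanks! – Xour

Glad I could help! You can show that this answer met your needs by upvoting it and accepting it as the correct answer: http://stackoverflow.com/help/someone-answers – datell

Can't upvote, not enough rep :( – Xour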

Your "tag" can have multiple "stat" entries. Do you have only one "tag" entry?

If so, first find the "tag", then loop over the "stat" entries it contains. Something like:

for stat in soup.find("tag").find_all("stat"): 
    print(stat["pass"])
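One thing to watch with this chained form: find() returns None when nothing matches, and calling find_all() on None raises AttributeError. A minimal guarded sketch follows; the function name passes_in_tag is hypothetical, and 'html.parser' is assumed only so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

def passes_in_tag(markup):
    """Return the pass values of all <stat> entries inside the first <tag>."""
    soup = BeautifulSoup(markup, "html.parser")
    tag = soup.find("tag")
    # find() returns None when there is no <tag>, so guard before chaining.
    if tag is None:
        return []
    return [stat["pass"] for stat in tag.find_all("stat")]

print(passes_in_tag('<tag><stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat></tag>'))
print(passes_in_tag('<suite></suite>'))
```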

Source

2017-04-03 21:13:50 RobertB

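Hi. There is only one "tag" entry. However, for some reason, when I run your code it doesn't return anything. If I remove the '.find_all("stat")' part (just for debugging), it returns the first _stat_ tag. Thanks for your reply! – Xour

Based on @Aaron3468's post and mine, you should be able to work it out. Doing a "find" on "tag" should return the entire contents of "tag", including all of its "stat" entries. Not sure how to explain what you are seeing. – RobertB

I'm not sure whether it's something in my XML file or what, but I tried this approach (the same as datell suggested above) and it returns nothing. If I do: 'with open('../results/output.xml') as raw_resuls: results = soup(raw_resuls, 'lxml') for stat in results.find("tag").find_all("stat"): print 'test' print(stat["pass"])' nothing is printed, not even the _test_ string, and I don't know why. PS: sorry, I couldn't format the code properly! – Xour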

The problem is that find_all('tag') returns the entire block named tag:

>>> results.find_all('tag')                  
[<tag>                      
<stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>         
<stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>         
<stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat>         
</tag>]

Your intent is to collect each stat block, so you should use results.find_all('stat'):

>>> stat_blocks = results.find_all('stat')
>>> stat_blocks
[<stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>, <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>, <stat fail="0" pass="1">TR=999999 Sandbox=3000617</stat>]

From there, fixing the code to gather the "pass" values into a list is trivial:

>>> passes = [s['pass'] if s is not None else None for s in stat_blocks]     
>>> passes                     
['1', '1', '1']

Or to print them:

>>> for s in stat_blocks:                 
...  print(s['pass'])                 
...                       
1                       
1                       
1

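In Python it is very important to test your results, because the typing is too dynamic to trust your memory. I often include a static test function in classes and modules to make sure the return types and values are what I expect.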
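A minimal sketch of such a test function, under stated assumptions: the names get_passes and test and the SAMPLE string are hypothetical, and 'html.parser' is used so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input, shaped like one entry from the question's XML.
SAMPLE = '<tag><stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat></tag>'

def get_passes(markup):
    """Collect the pass attribute of every <stat> element in the markup."""
    soup = BeautifulSoup(markup, "html.parser")
    return [s["pass"] for s in soup.find_all("stat")]

def test():
    """Check that get_passes returns a list of strings with the expected values."""
    result = get_passes(SAMPLE)
    assert isinstance(result, list)
    assert all(isinstance(value, str) for value in result)
    assert result == ["1"]
    return True

test()
```

Note that the attribute values come back as strings, so a comparison such as result == [1] would fail; this is the kind of surprise a test like this catches early.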

Source

2017-04-03 21:14:01 Aaron3468

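Thanks, that makes sense. I should have mentioned that there are more 'stats' attributes in the XML file, but I only care about the ones inside the 'tag' node. Thanks for your reply, much appreciated! – Xour

@Xour Ah, fair enough, then you just use 'results.find('tag').find_all('stat')'. Upvote any answers you found helpful and informative, and double check that you have selected a best answer. Cheers! – Aaron3468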
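To illustrate the scoping point from these comments, here is a small sketch. The stat entry inside total is invented for the example (the question only shows a closing total tag before tag), and 'html.parser' is assumed so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup

# The <total> section and its contents are hypothetical; only <tag> follows the question.
DOC = """
<total>
    <stat fail="0" pass="20">All Tests</stat>
</total>
<tag>
    <stat fail="0" pass="1">TR=111111 Sandbox=3000613</stat>
    <stat fail="0" pass="1">TR=121212 Sandbox=3000618</stat>
</tag>
"""

soup = BeautifulSoup(DOC, "html.parser")

# soup.stat is shorthand for soup.find("stat"): the first <stat> in the whole document.
first_in_document = soup.stat["pass"]

# Searching the whole document also picks up the <stat> under <total>.
everywhere = [s["pass"] for s in soup.find_all("stat")]

# Searching from the <tag> element only sees its own descendants.
inside_tag = [s["pass"] for s in soup.find("tag").find_all("stat")]

print(first_in_document, everywhere, inside_tag)
```

This would also be consistent with the behaviour described in the question, where results.stat['pass'] returned a value from a tag further up in the file.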
