Python and Pandas: XML -> DataFrame

I'm currently working on a database and want to go from XML to a Pandas DataFrame, but I've been stuck on this for a long time and don't know how to solve it.

j=0
for rows in root.findall('row'):
    i=0
    for cells in root.findall('cell') in rows:
        if i==0:
            #Name of the country is on the 0-th tag "cell" of each "row"
            country[j]=cells.text
        elif i==17:
            #Number of students is on the 17-th tag "cell" of each "row"
            numberStudent[j]=cells.text
        i=i+1
    j=j+1
Data=pd.DataFrame({'country': [country], 'number of student': [numberStudent]})

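When I try to read the data, I only get a DataFrame with a country value of 0 and a numberStudent value of 0. I don't understand what's wrong. I've already searched this forum for an answer, but I'm still stuck.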

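Also, I'm not sure I'm going about this correctly. I want to find the 0th and 17th "cell" tags inside each parent "row" tag. Is it correct to use "in" twice in the second for statement?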

Thanks for your help,

Cyril


2016-11-08 Cyril Schmitt

To find all the cells inside a row, you should call findall on the row in the inner loop, not on root.
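A small experiment makes the difference concrete. The document below is hypothetical (the real XML was not posted); it shows that `root.findall('cell')` only searches root's direct children, and that the question's header `for cells in root.findall('cell') in rows:` is parsed as `for cells in (root.findall('cell') in rows):`, so the loop tries to iterate over a bool.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-row document; the real XML from the question was not posted.
root = ET.fromstring(
    "<table>"
    "<row><cell>France</cell><cell>120</cell></row>"
    "<row><cell>Japan</cell><cell>95</cell></row>"
    "</table>"
)

# findall('cell') on root only checks root's direct children, which are <row>s.
from_root = root.findall("cell")  # []

# `for cells in root.findall('cell') in rows:` parses as
# `for cells in (root.findall('cell') in rows):`, so the loop iterates over a bool.
row = root.find("row")
membership = root.findall("cell") in row  # [] in row -> False
try:
    for cells in root.findall("cell") in row:
        pass
    error_message = None
except TypeError as exc:
    error_message = str(exc)  # "'bool' object is not iterable"

# Calling findall on each row (as suggested above) returns that row's own cells.
per_row = [[cell.text for cell in r.findall("cell")] for r in root.findall("row")]

print(from_root)
print(membership)
print(error_message)
print(per_row)
```

So using "in" twice is not a nested lookup; it is a membership test whose result (`False`) then gets iterated.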

import pandas as pd

country = []
numberStudent = []
for row in root.findall('row'):
    # enumerate gives each cell's 0-based position within this row
    for i, cell in enumerate(row.findall('cell')):
        if i == 0:
            country.append(cell.text)
        elif i == 17:
            numberStudent.append(cell.text)
data = pd.DataFrame({'country': country, 'number of student': numberStudent})
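For a self-contained version to experiment with, here is a minimal sketch under a few assumptions: the helper names `make_sample_xml` and `rows_to_dataframe` and the generated sample data are hypothetical, rows are searched with `.//row` so they are found at any depth, and a row that is too short gets `None` instead of being skipped, so both columns always have the same length.

```python
import xml.etree.ElementTree as ET

import pandas as pd

NAME_INDEX = 0       # 0-based position of the country cell (from the question)
STUDENTS_INDEX = 17  # 0-based position of the student-count cell (from the question)


def make_sample_xml(rows):
    """Hypothetical helper: build a <table> whose rows have 18 cells each."""
    parts = ["<table>"]
    for country, students in rows:
        cells = [country] + ["x"] * 16 + [students]  # indices 0..17
        parts.append("<row>" + "".join(f"<cell>{c}</cell>" for c in cells) + "</row>")
    parts.append("</table>")
    return "".join(parts)


def rows_to_dataframe(root):
    """Hypothetical helper: collect cells 0 and 17 of every <row> into a DataFrame."""
    country, number_student = [], []
    # './/row' matches <row> elements at any depth, not only direct children
    for row in root.findall(".//row"):
        cells = row.findall("cell")  # search inside this row, not from root
        # Use None when a row is too short, so both columns stay aligned
        country.append(cells[NAME_INDEX].text if len(cells) > NAME_INDEX else None)
        number_student.append(
            cells[STUDENTS_INDEX].text if len(cells) > STUDENTS_INDEX else None
        )
    # Pass the lists directly (not [country]) so each XML row becomes a DataFrame row
    return pd.DataFrame({"country": country, "number of student": number_student})


xml_text = make_sample_xml([("France", "120"), ("Japan", "95")])
df = rows_to_dataframe(ET.fromstring(xml_text))
print(df)
```

The counts come back as strings; `pd.to_numeric(df["number of student"])` converts them if numbers are needed. Note also that the question's `{'country': [country]}` wraps the list in another list, which yields a single-row DataFrame whose cell holds the whole list.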

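However, your code as written should raise an error, so I suspect you aren't finding any row nodes at all. If your row nodes are not direct children of root, you need to call root.findall('.//row'), although without seeing your XML it's impossible to know whether that is your problem.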
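To illustrate the suspicion, here is a sketch with a hypothetical layout in which the `<row>` elements are grandchildren of the root. `findall('row')` then returns an empty list, the outer loop body never runs, and the broken inner line is never reached, which is consistent with getting no error.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout in which <row> elements are grandchildren of the root.
xml_text = """
<workbook>
  <table>
    <row><cell>France</cell></row>
    <row><cell>Japan</cell></row>
  </table>
</workbook>
""".strip()
root = ET.fromstring(xml_text)

direct = root.findall("row")       # direct children of <workbook> only: []
anywhere = root.findall(".//row")  # descendants at any depth: both rows

# With no direct <row> children, a loop like the question's never executes its
# body, so the broken inner `for` line is never reached and nothing is raised.
iterations = 0
for row in root.findall("row"):
    iterations += 1

print(len(direct), len(anywhere), iterations)
print([row.find("cell").text for row in anywhere])
```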

Alternatively, ElementTree supports selecting elements by position, so you could also do:

country = [cell.text for cell in root.findall('.//row/cell[1]')] 
numberStudent = [cell.text for cell in root.findall('.//row/cell[18]')] 
data=pd.DataFrame({'country': country, 'number of student': numberStudent})

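root.findall('.//row/cell[n]') finds every cell element that is the n-th cell child of a row element. Note that ElementTree positions are 1-based rather than Python's usual 0-based indexing, so the question's cell 17 is cell[18].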
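A quick check of the indexing, using a hypothetical document. In CPython's ElementPath implementation, `cell[n]` counts only the `cell` siblings, so the `<note>` element placed first in the first row does not shift the position; position 0 is rejected with a `SyntaxError`.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the first row has a non-cell child before its cells.
xml_text = (
    "<table>"
    "<row><note>skip</note><cell>France</cell><cell>120</cell></row>"
    "<row><cell>Japan</cell><cell>95</cell></row>"
    "</table>"
)
root = ET.fromstring(xml_text)

# Positions are 1-based: cell[1] is the first <cell> child of each <row>.
first_cells = [c.text for c in root.findall(".//row/cell[1]")]
second_cells = [c.text for c in root.findall(".//row/cell[2]")]

# Position 0 is not valid in ElementTree's XPath subset.
try:
    root.findall(".//row/cell[0]")
    zero_rejected = False
except SyntaxError:
    zero_rejected = True

print(first_cells, second_cells, zero_rejected)
```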


2016-11-08 23:36:27 Jesse
