2017-04-26 151 views
0

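Checking whether an ElementTree node is empty fails

I keep getting the error AttributeError: 'NodeList' object has no attribute 'data', but all I want is to check whether the node is empty and, if it is, pass -1 instead of the value. My understanding was that temp_pub.getElementsByTagName("pages").data should return None. How do I fix this?

(PS: I have tried both != None and is None.)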

xmldoc = minidom.parse('pubsClean.xml') 

#loop through <pub> tags to find number of pubs to grab 
root = xmldoc.getElementsByTagName("root")[0] 
pubs = [a.firstChild.data for a in root.getElementsByTagName("pub")] 
num_pubs = len(pubs) 
count = 0 

while(count < num_pubs): 

    temp_pages = 0 
    #get data from each <pub> tag 
    temp_pub = root.getElementsByTagName("pub")[count] 
    temp_ID = temp_pub.getElementsByTagName("ID")[0].firstChild.data 
    temp_title = temp_pub.getElementsByTagName("title")[0].firstChild.data 
    temp_year = temp_pub.getElementsByTagName("year")[0].firstChild.data 
    temp_booktitle = temp_pub.getElementsByTagName("booktitle")[0].firstChild.data 
    #handling no value 
    if temp_pub.getElementsByTagName("pages").data != None: 
     temp_pages = temp_pub.getElementsByTagName("pages")[0].firstChild.data 
    else: 
     temp_pages = -1 

    temp_authors = temp_pub.getElementsByTagName("authors")[0] 
    temp_author_array = [a.firstChild.data for a in temp_authors.getElementsByTagName("author")] 
    num_authors = len(temp_author_array) 
    count = count + 1 

XML being processed

<pub> 
    <ID>5010</ID> 
    <title>Model-Checking for L<sub>2</sub></title> 
    <year>1997</year> 
    <booktitle>Universit&auml;t Trier, Mathematik/Informatik, Forschungsbericht</booktitle> 
    <pages></pages> 
    <authors> 
     <author>Helmut Seidl</author> 
    </authors> 
</pub> 
<pub> 
    <ID>5011</ID> 
    <title>Locating Matches of Tree Patterns in Forest</title> 
    <year>1998</year> 
    <booktitle>Universit&auml;t Trier, Mathematik/Informatik, Forschungsbericht</booktitle> 
    <pages></pages> 
    <authors> 
     <author>Andreas Neumann</author> 
     <author>Helmut Seidl</author> 
    </authors> 
</pub> 

Full code after the edit (now with ElementTree)

#for execute command to work 
import sqlite3 
import xml.etree.ElementTree as ET 
con = sqlite3.connect("publications.db") 
cur = con.cursor() 

from xml.dom import minidom 
#use this to clean the foreign characters 
import re 

def anglicise(matchobj): 
    if matchobj.group(0) == '&amp;': 
     return matchobj.group(0) 
    else: 
     return matchobj.group(0)[1] 

outputFilename = 'pubsClean.xml' 

with open('test.xml') as inXML, open(outputFilename, 'w') as outXML: 
    outXML.write('<root>\n') 
    for line in inXML: 
     # str.replace() is a no-op when the tag is absent, so no find() check is needed 
     newLine = line.replace("<sub>", "").replace("</sub>", "") 
     outXML.write(re.sub('&[a-zA-Z]+;',anglicise,newLine)) 
    outXML.write('\n</root>') 


tree = ET.parse('pubsClean.xml') 
root = tree.getroot() 

xmldoc = minidom.parse('pubsClean.xml') 
#loop through <pub> tags to find number of pubs to grab 
root2 = xmldoc.getElementsByTagName("root")[0] 
pubs = [a.firstChild.data for a in root2.getElementsByTagName("pub")] 
num_pubs = len(pubs) 
count = 0 

while(count < num_pubs): 

    temp_pages = 0 
    #get data from each <pub> tag 

    temp_ID = root.find(".//ID").text 
    temp_title = root.find(".//title").text 
    temp_year = root.find(".//year").text 
    temp_booktitle = root.find(".//booktitle").text 
    #handling no value 
    if root.find(".//pages").text: 
     temp_pages = root.find(".//pages").text 
    else: 
     temp_pages = -1 

    temp_authors = root.find(".//authors") 
    temp_author_array = [a.text for a in temp_authors.findall(".//author")] 
    num_authors = len(temp_author_array) 
    count = count + 1 

    #process results into sqlite 
    pub_params = (temp_ID, temp_title) 
    cur.execute("INSERT OR IGNORE INTO publication (id, ptitle) VALUES (?, ?)", pub_params) 
    cur.execute("INSERT OR IGNORE INTO journal (jtitle, pages, year, pub_id, pub_title) VALUES (?, ?, ?, ?, ?)", (temp_booktitle, temp_pages, temp_year, temp_ID, temp_title)) 
    x = 0 
    while(x < num_authors): 
     cur.execute("INSERT OR IGNORE INTO authors (name, pub_id, pub_title) VALUES (?, ?, ?)", (temp_author_array[x],temp_ID, temp_title)) 
     cur.execute("INSERT OR IGNORE INTO wrote (name, jtitle) VALUES (?, ?)", (temp_author_array[x], temp_booktitle)) 
     x = x + 1 


con.commit() 
con.close()  

print("\nNumber of entries processed: ", count)  

It would be easier to help if you gave us complete, copy-pastable code plus the XML being processed. In short, provide a [mcve]. – mzjn


@mzjn OK, sorry, I'm new to the site and didn't want the post to be too long. I'll add it now. – douglasrcjames


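Sorry to nag, but you should put more effort into making the example minimal and complete. For instance, the sqlite code seems irrelevant. Please also decide whether you are asking about minidom or ElementTree. The "XML being processed" is not real XML because it has no root element. I see that your code adds a root element, but that seems unrelated to the error you are getting. – mzjn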

Answers

0

You can use attributes to get a dictionary-like object (see the docs) and then query that dictionary:

if temp_pub.getElementsByTagName("pages").attributes.get('data'): 

No change from the original error :/ – douglasrcjames

0

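As the error message suggests, getElementsByTagName() returns neither a single node nor None, but a `NodeList`. So you should check its length to see whether the returned list contains any items: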

if len(temp_pub.getElementsByTagName("pages")) > 0: 
    temp_pages = temp_pub.getElementsByTagName("pages")[0].firstChild.data 

Or you can pass the list directly to `if`, because an empty list is falsy:

if temp_pub.getElementsByTagName("pages"): 
    temp_pages = temp_pub.getElementsByTagName("pages")[0].firstChild.data 
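
One more case is worth checking. In the XML from the question the `<pages></pages>` element is present but empty, so the `NodeList` is not empty and `firstChild` is `None`; reading `.data` from it would then raise an AttributeError on `NoneType`. A minimal sketch that guards against both cases follows. The helper name `pages_or_default`, the trimmed sample document, and the page range "12-20" are assumptions made for illustration, not part of the original code or data.

```python
from xml.dom import minidom

# Hypothetical sample, trimmed from the XML in the question.
# The value "12-20" and the third <pub> are made up for illustration.
SAMPLE = """<root>
<pub><ID>5010</ID><pages></pages></pub>
<pub><ID>5011</ID><pages>12-20</pages></pub>
<pub><ID>5012</ID></pub>
</root>"""


def pages_or_default(pub, default=-1):
    """Return the text of the first <pages> element, or `default`
    when the element is missing or has no text child."""
    nodes = pub.getElementsByTagName("pages")
    if not nodes:                 # no <pages> element at all
        return default
    first_child = nodes[0].firstChild
    if first_child is None:       # <pages></pages> has no child nodes
        return default
    return first_child.data


doc = minidom.parseString(SAMPLE)
results = [pages_or_default(pub) for pub in doc.getElementsByTagName("pub")]
print(results)
```

Here the first and third entries fall back to -1 and only the second yields the page text.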

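Note that, despite the title and tags of this question, your code suggests you are using minidom rather than ElementTree. Your code could be simpler with ElementTree, for example: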

# minidom 
temp_ID = temp_pub.getElementsByTagName("ID")[0].firstChild.data 
# ElementTree: find() returns the first matching element 
temp_ID = temp_pub.find(".//ID").text 
.... 
# minidom 
temp_author_array = [a.firstChild.data for a in temp_authors.getElementsByTagName("author")] 
# ElementTree: findall() returns all matching elements 
temp_author_array = [a.text for a in temp_authors.findall(".//author")] 
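
To make the ElementTree route concrete, here is a small self-contained sketch that loops over every `<pub>` element and reads its fields relative to that element, so each iteration sees its own record. In ElementTree the method for multiple matches is spelled `findall()`, and `findtext()` returns an empty string for an element with no text. The inline sample, the `records` list, and the page range "1-10" are assumptions for illustration only; the `<sub>` markup is omitted as in the cleaned file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample based on the XML in the question.
# The page range "1-10" is made up so both branches are exercised.
SAMPLE = """<root>
<pub>
    <ID>5010</ID>
    <title>Model-Checking for L2</title>
    <pages></pages>
    <authors><author>Helmut Seidl</author></authors>
</pub>
<pub>
    <ID>5011</ID>
    <title>Locating Matches of Tree Patterns in Forest</title>
    <pages>1-10</pages>
    <authors><author>Andreas Neumann</author><author>Helmut Seidl</author></authors>
</pub>
</root>"""

root = ET.fromstring(SAMPLE)
records = []
for pub in root.findall("pub"):
    # findtext() gives None for a missing element and "" for an empty one;
    # both are falsy, so one truthiness test covers both cases.
    pages = pub.findtext("pages")
    records.append({
        "id": pub.findtext("ID"),
        "title": pub.findtext("title"),
        "pages": pages if pages else -1,
        "authors": [a.text for a in pub.findall("authors/author")],
    })

for record in records:
    print(record)
```

Searching from `pub` rather than from `root` matters: a call such as `root.find(".//ID")` always returns the first `<ID>` in the whole document, whichever iteration it runs in.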

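The first one gives TypeError: '>' not supported between instances of 'NodeList' and 'int'; the second gives the same error as the original. When I tried printing the node I got a value like <0x0008HXX>. Is that the start of the list? – douglasrcjames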


Sorry, copy-paste error: it should compare the `len()` of the `NodeList` instead... – har07


Even with the len() edit – douglasrcjames