Getting past NoneType in BeautifulSoup

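I'm using BeautifulStoneSoup and Python to parse an XML feed from Google, and it works great. I'm also creating a CSV and uploading it to Google Docs, which works fine too. The problem is that when I hit an empty text attribute in the XML, the parser stops. Right now that isn't an issue, because every attribute has data, but the first time one doesn't, the script will break.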

Code:

import atom 
import gdata.auth 
import gdata.contacts 
import gdata.contacts.client 
import gdata.docs.service 
import gdata.docs.data 
from BeautifulSoup import BeautifulStoneSoup as Soup 
import csv 

email = '[email protected]' 
password = 'password' 
domain = 'domain.com' 

ms_client = gdata.docs.service.DocsService() 
gd_client = gdata.contacts.client.ContactsClient(domain=domain) 
gd_client.ClientLogin(email, password, 'profileFeedAPI') 
ms_client.ClientLogin(email, password, 'peopleCSVupload') 

profiles_feed = gd_client.GetProfilesFeed('https://www.google.com/m8/feeds/profiles/domain/domain.com/full?max-results=300') 

soup = Soup(str(profiles_feed), selfClosingTags=['ns0:category','ns3:status', 'ns0:link','ns1:email']) 

a = soup.findAll('ns0:entry') 
f = open('C:\\people.csv', 'wb') 

writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar =' ') 

for entry in a: 
    writer.writerow([entry.find('ns1:familyname').text + ',' + entry.find('ns1:givenname').text + ',' + entry.find('ns1:fullname').text + ',' + entry.find('ns1:orgtitle').text + ',' + entry.find('ns1:orgdepartment').text + ',' + entry.find('ns1:orgname').text + ',' + entry.find('ns1:email',primary=True)['address']]) 

f.close() 

ms = gdata.data.MediaSource(file_path="C:\\people.csv", content_type=gdata.docs.service.SUPPORTED_FILETYPES['CSV']) 
csv_entry = ms_client.Upload(ms, "People File")
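The break comes from the chained `.text` calls in the `writerow` line: when an entry has no such element at all (presumably what the feed does for an empty profile field), `find()` returns `None`, and `None.text` raises `AttributeError`. A minimal reproduction, assuming the standard library's `xml.etree.ElementTree` as a stand-in for BeautifulStoneSoup so that it runs without third-party packages (its `find()` likewise returns `None` for a missing child); the entry, the un-namespaced tag names and the `describe` helper are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical entry: it has a <familyName> but no <orgTitle> element.
entry = ET.fromstring("<entry><familyName>Smith</familyName></entry>")

def describe(tag):
    """Return the element's text, or the error raised by chaining .text onto find()."""
    try:
        return entry.find(tag).text
    except AttributeError as exc:
        # find() returned None, and None has no .text attribute.
        return "AttributeError: " + str(exc)

print(describe("familyName"))  # Smith
print(describe("orgTitle"))    # AttributeError: 'NoneType' object has no attribute 'text'
```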

I know I can do this:

for entry in a: 
    if entry.find('ns1:orgtitle') != None: 
        print entry.find('ns1:orgtitle').text 
    elif entry.find('ns1:orgtitle') == None: 
        print('') 
    if entry.find('ns1:familyname') != None: 
        print entry.find('ns1:familyname').text 
    elif entry.find('ns1:familyname') == None: 
        print('') 
    # etc...

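But that is very long-winded, and I don't know how to get the data to come out together in a single row. Any help is greatly appreciated.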


2012-02-08 Kevin

You can wrap find like this:

def findnonempty(entry, arg): 
    # Return the tag's text, or "" when the tag is missing
    result = entry.find(arg) 
    if result is not None: 
        return result.text 
    else: 
        return ""

Then you can either make the seven calls one after another, or use map(), like this:

tags = ['ns1:familyname', 'ns1:givenname', ... ] # your tags 
s = map(lambda tag: findnonempty(entry, tag), tags) 
"".join(s)
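Putting the wrapper and the tag list together with `csv.writer`, which inserts the commas itself, gives one row per entry. This is a minimal sketch that again assumes the standard library's `xml.etree.ElementTree` in place of BeautifulStoneSoup, with hypothetical un-namespaced tag names, a hypothetical `find_text` helper and made-up sample data; with BeautifulStoneSoup you would keep the `ns1:` tags from the question:

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample feed: the second entry has no <orgTitle> element at all.
SAMPLE = """<feed>
  <entry><familyName>Smith</familyName><givenName>Ann</givenName><orgTitle>Engineer</orgTitle></entry>
  <entry><familyName>Jones</familyName><givenName>Bob</givenName></entry>
</feed>"""

TAGS = ["familyName", "givenName", "orgTitle"]

def find_text(entry, tag):
    """Return the element's text, or "" if the element is missing or empty."""
    element = entry.find(tag)
    # Compare against None explicitly: an ElementTree element with no children is falsy.
    if element is None or element.text is None:
        return ""
    return element.text

def feed_to_csv(xml_text):
    root = ET.fromstring(xml_text)
    out = io.StringIO()
    writer = csv.writer(out)
    for entry in root.findall("entry"):
        # One list per entry becomes one CSV row; csv.writer adds the commas.
        writer.writerow([find_text(entry, tag) for tag in TAGS])
    return out.getvalue()

print(feed_to_csv(SAMPLE))
```

Passing the list straight to `writerow` also makes the `QUOTE_NONE`/`escapechar` workaround in the question unnecessary, since the default writer quotes any field that itself contains a comma.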


2012-02-08 20:58:16

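Bayer, sorry for looking at this so late; I was away for the weekend. This is a great solution, but I have one problem: the last item, the email address, is in an attribute rather than a text field. How can I extract that data? – Kevin 2012-02-13 16:19:50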

Never mind, I just assigned entry.find('ns1:email', primary=True)['address'] to a variable t and did s.append(t) inside the for entry block. Thanks again for this. – Kevin 2012-02-13 16:39:54

Kevin, you're welcome. – 2012-02-13 19:47:03

It's easy to wrap fetching and printing the value in functions:

def find(entry, spec, default=None): 
    value = entry.find(spec) 
    return default if value is None else value.text 

def findandprint(entry, spec, default=None, newline=True): 
    value = find(entry, spec, default) 
    if value is not None:  # if we still don't have a value even after 
        print value,       # considering default, don't print anything 
        if newline: 
            print

Then you can do:

for entry in a: 
    findandprint(entry, 'ns1:orgtitle', default="") 
    findandprint(entry, 'ns1:familyname', default="")

If you have many attributes and want to treat them all the same way, loop over them instead:

for entry in a: 
    for attribute in ('ns1:orgtitle', 'ns1:familyname', ...): 
        findandprint(entry, attribute, default="")
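Since `findandprint` ends each value with a newline by default, this loop prints one value per line; passing `newline=False` keeps one entry's values on a single line. A minimal Python 3 port of the two helpers, assuming `print(..., end=...)` as the equivalent of the Python 2 trailing-comma `print`, and again using `xml.etree.ElementTree` with a hypothetical entry as stand-ins:

```python
import xml.etree.ElementTree as ET

def find(entry, spec, default=None):
    value = entry.find(spec)
    return default if value is None else value.text

def findandprint(entry, spec, default=None, newline=True):
    value = find(entry, spec, default)
    if value is not None:  # nothing to print even after applying the default
        # Python 3 stand-in for "print value,": end with a space instead of a newline
        print(value, end="\n" if newline else " ")

# Hypothetical entry with no <familyName> element.
entry = ET.fromstring("<entry><orgTitle>Engineer</orgTitle></entry>")

for attribute in ("orgTitle", "familyName"):
    findandprint(entry, attribute, default="", newline=False)
print()  # finish the line for this entry
```

Each value is followed by a space, so an empty default shows up only as an extra space; for CSV output, collecting the values into a list and passing it to `csv.writer(...).writerow` keeps the column positions explicit.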


2012-02-08 21:39:52 kindall

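At first I didn't understand why you thought it would break... you don't have any "offending" piece of data. BeautifulSoup happily returns an empty string for an element with no text.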

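Then, in the line of your code that you have to scroll sideways to see, it finally became clear that you are (as you said in your introduction) looking up an attribute: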

entry.find('ns1:email',primary=True)['address']

A missing attribute does not fail quietly the way an empty text node does (for example entry.find('ns1:familyname').text).

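Don't worry: just replace the ['address'] notation with .get('address', ''), which returns an empty string instead of raising a KeyError exception when the attribute is missing.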
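The same defaulting idea covers the email address: `.get(key, default)` returns the default for a missing attribute, and an explicit `None` check covers a missing element. A minimal sketch, assuming `xml.etree.ElementTree` as a stand-in (its `Element.get` takes the same `(key, default)` arguments, but an element cannot be subscripted by attribute name, so `.attrib["address"]` plays the role of `['address']`); the `email[@primary]` path mirrors the question's `primary=True` filter, and the `find_attr` helper and sample entries are hypothetical:

```python
import xml.etree.ElementTree as ET

def find_attr(entry, path, attr, default=""):
    """Return attribute `attr` of the first element matching `path`, or `default`."""
    element = entry.find(path)
    if element is None:  # no matching element at all
        return default
    return element.get(attr, default)  # missing attribute -> default, no KeyError

# Hypothetical entries: a primary email with an address, one without, and none at all.
full = ET.fromstring('<entry><email address="a@example.com" primary="true"/>'
                     '<email address="b@example.com"/></entry>')
no_address = ET.fromstring('<entry><email primary="true"/></entry>')
no_email = ET.fromstring('<entry/>')

for entry in (full, no_address, no_email):
    print(repr(find_attr(entry, "email[@primary]", "address")))

# Subscript-style access raises instead of falling back to a default:
try:
    no_address.find("email[@primary]").attrib["address"]
except KeyError as exc:
    print("KeyError:", exc)
```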


2012-02-09 00:02:44

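The XML feed never produces an email entry without an address attribute. The problem is that the fields I'm parsing come back as NoneType when a text field such as orgName is empty. – Kevin 2012-02-13 14:49:07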
