Scrapy - How to prevent output rows with blank elements?

With a very basic Scrapy script, I want to make sure that none of my output rows contain blank items.

That is, say I have the standard

items = []
for list in lists:
    item = TypeItem()
    item['thing1'] = list.select('h1/text()').extract()
    item['thing2'] = list.select('h2/text()').extract()
    item['thing3'] = list.select('h3/text()').extract()
    items.append(item)
return items

I want to prevent any CSV rows that read "thing1,thing3" or just "thing2", etc.

(I am new to StackOverflow, so I don't know whether this is the appropriate place to ask multiple questions, but since they are related, if I may:

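Q2: If I put an "if item not in items" check before items.append(item), will it stop any duplicate full rows, or only duplicate individual items? If the latter, how do I prevent duplicate rows?)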

Source

2013-10-20 Xodarap777

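For your Q2, I don't think it will stop duplicates, because the items are objects (instances of a class) and each one is distinct. You would need to subclass the item and implement __eq__().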
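As a rough sketch of the __eq__() idea, assuming a plain Python class stands in for the Scrapy item and that field values are hashable strings (the `Row` class and `dedupe` helper are hypothetical names, not part of Scrapy; list values such as those returned by extract() would need converting to tuples first). Defining `__eq__()` together with `__hash__()` over all field values lets a set remember which full rows have already been seen:

```python
class Row:
    """Hypothetical stand-in for a scraped item with three fields."""

    def __init__(self, thing1, thing2, thing3):
        self.thing1 = thing1
        self.thing2 = thing2
        self.thing3 = thing3

    def _key(self):
        # Tuple of all field values; two rows are "the same" when these match.
        return (self.thing1, self.thing2, self.thing3)

    def __eq__(self, other):
        if not isinstance(other, Row):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Must agree with __eq__ so rows can be stored in a set.
        return hash(self._key())


def dedupe(rows):
    """Return rows in original order with duplicate full rows removed."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique
```

Comparing on the whole tuple means only rows that match in every field are treated as duplicates; rows that share a single value are kept.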

You could achieve that after retrieving all the elements, using a csv parser, couldn't you?
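One way to read that suggestion is to post-process the exported file. A minimal sketch using Python's standard `csv` module, assuming the export has a header row and that a row counts as incomplete when any cell is empty or whitespace only (the function name `drop_incomplete_rows` is made up for illustration):

```python
import csv
import io


def drop_incomplete_rows(src, dst):
    """Copy CSV rows from src to dst, skipping rows with any blank cell.

    Returns the number of rows written, including the header.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    written = 0
    for row in reader:
        # Keep a row only when it is non-empty and every cell has real text.
        if row and all(cell.strip() for cell in row):
            writer.writerow(row)
            written += 1
    return written


# Usage with in-memory text; real files opened with newline="" work the same way.
raw = "thing1,thing2,thing3\na,b,c\nd,,f\n , ,\ng,h,i\n"
out = io.StringIO()
count = drop_incomplete_rows(io.StringIO(raw), out)
```

This only catches cells that are present but blank; a row with fewer columns than the header would need an extra length check.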

Also, you can save the xpath result in a variable and check whether it is blank, like:

thing1 = list.select('h1/text()').extract()
# extract() returns a list; guard against an empty result before indexing
if thing1 and thing1[0].strip():
    ...
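A slightly fuller sketch of that check, assuming each field holds the list of strings that extract() returns and that a row should be kept only when every field has some non-whitespace text (the helper names `first_text` and `is_complete` are hypothetical, and plain dicts stand in for the items):

```python
def first_text(values):
    """Return the first non-blank string in values, stripped, or None."""
    for value in values:
        text = value.strip()
        if text:
            return text
    return None


def is_complete(fields):
    """True when every field (a list of extracted strings) has real text."""
    return all(first_text(values) is not None for values in fields.values())


# Plain dicts stand in for scraped items here.
rows = [
    {"thing1": ["A"], "thing2": ["B"], "thing3": ["C"]},
    {"thing1": ["A"], "thing2": [], "thing3": ["C"]},      # h2 missing
    {"thing1": ["  "], "thing2": ["B"], "thing3": ["C"]},  # h1 only whitespace
]
kept = [row for row in rows if is_complete(row)]
```

In the loop from the question, the same test could be applied just before items.append(item), so incomplete rows never reach the CSV export.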

You can also use an extra xpath expression to check up front that each element has text, like:

items = []
for list in lists:
    # self::*[...] is used because XPath 1.0 does not allow a predicate
    # directly on the abbreviated step "." (".[...]" is a syntax error)
    if list.select('self::*[h1[text()] and h2[text()] and h3[text()]]'):
        item = TypeItem()
        item['thing1'] = list.select('h1/text()').extract()
        item['thing2'] = list.select('h2/text()').extract()
        item['thing3'] = list.select('h3/text()').extract()
        items.append(item)
return items

Source

2013-10-20 13:32:50 Birei
