Selecting and modifying XPath nodes

I use this code to get all the names:

def parse_authors(self, root):
    author_nodes = root.xpath('//a[@class="booklink"][contains(@href,"/author/")]/text()')
    if author_nodes:
        return [unicode(author) for author in author_nodes]

But if there are any translators, I would like to add "(translation)" next to their names:

Example: translator1(translation)


2016-12-28 wrangly

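You can use the "translation:" text node to tell authors from translators: the authors are the preceding siblings of the "translation:" text node, and the translators are its following siblings.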

Authors:

//text()[contains(., 'translation:')]/preceding-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()

Translators:

//text()[contains(., 'translation:')]/following-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()

Working example code:

from lxml.html import fromstring 

data = """ 
<td> 
    <a class="booklink" href="/author/43710/Author 1">Author 1</a> 
    , 
    <a class="booklink" href="/author/46907/Author 2">Author 2</a> 
    <br> 
    translation: 
    <a class="booklink" href="/author/47669/translator 1">Translator 1</a> 
    , 
    <a class="booklink" href="/author/9382/translator 2">Translator 2</a> 
</td>""" 

root = fromstring(data) 

authors = root.xpath("//text()[contains(., 'translation:')]/preceding-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()") 
translators = root.xpath("//text()[contains(., 'translation:')]/following-sibling::a[@class='booklink' and contains(@href, '/author/')]/text()") 

print(authors) 
print(translators)

Prints:

['Author 1', 'Author 2']
['Translator 1', 'Translator 2']
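
To get back to the original goal of tagging translators, the two queries can be folded into a single function. The following is only a sketch under a few assumptions: it is written for Python 3 (so `str` replaces the `unicode` call from the question), it is a plain function rather than a method, and the constant names `LINK` and `MARKER` and the `suffix` parameter are made up for illustration. It also falls back to the question's original query when the page has no "translation:" text node, since in that case the preceding-sibling query would match nothing.

```python
from lxml.html import fromstring

# Hypothetical helper constants, not part of the original answer.
LINK = "a[@class='booklink' and contains(@href, '/author/')]"
MARKER = "//text()[contains(., 'translation:')]"


def parse_authors(root, suffix="(translation)"):
    """Return author names, with each translator's name followed by `suffix`."""
    if root.xpath(MARKER):
        # Links before the marker are authors, links after it are translators.
        authors = root.xpath(MARKER + "/preceding-sibling::" + LINK + "/text()")
        translators = root.xpath(MARKER + "/following-sibling::" + LINK + "/text()")
    else:
        # No "translation:" marker found: treat every matching link as an author.
        authors = root.xpath("//" + LINK + "/text()")
        translators = []
    return [str(name) for name in authors] + [str(name) + suffix for name in translators]


data = """
<td>
    <a class="booklink" href="/author/43710/Author 1">Author 1</a>
    ,
    <a class="booklink" href="/author/46907/Author 2">Author 2</a>
    <br>
    translation:
    <a class="booklink" href="/author/47669/translator 1">Translator 1</a>
    ,
    <a class="booklink" href="/author/9382/translator 2">Translator 2</a>
</td>"""

root = fromstring(data)
print(parse_authors(root))
# ['Author 1', 'Author 2', 'Translator 1(translation)', 'Translator 2(translation)']
```

Note that the marker test matches any text node containing "translation:", so a page that uses that string elsewhere would need a narrower query.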


2016-12-28 23:52:04 alecxe
