Web scraping with xpathSApply (R) - only text without a class

I am trying to extract text from HTML structured like this:

<p class="id1"> Title or something </p>  
<p> Text text text </p> 
<p> More text </p> 
<p class="id2"> Something else </p>

When I use:

text_info <- xpathSApply(PARSED, "//p", xmlValue)

the result is:

[1] 'Title or something' 
[2] 'Text text text' 
[3] 'More text' 
[4] 'Something else'

I only want the text inside the <p> elements that have no class:

[1] 'Text text text' 
[2] 'More text'

I used the code below, but it takes a long time because I have a lot of text:

text_info <- setdiff(xpathSApply(PARSED, "//p", xmlValue), xpathSApply(PARSED, "//p[@class]", xmlValue))

Is there a way to extract only the <p> elements without a class using a single xpathSApply call?

Source

2016-11-17 Aleharu

You can use not() in the XPath expression.

xpathSApply(doc, "//p[not(@class)]", xmlValue, trim = TRUE) 
# [1] "Text text text" "More text"

This selects the <p> elements that have no class attribute.

Data:

library(XML) 
doc <- htmlParse('<p class="id1"> Title or something </p>  
<p> Text text text </p> 
<p> More text </p> 
<p class="id2"> Something else </p>')
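
As a rough illustration of why the single XPath query can be preferable to the setdiff approach from the question beyond speed, here is a minimal sketch. The sample HTML and the names html2, doc2, by_xpath, and by_setdiff are made up for this example and are not from the original post. Because setdiff() compares text values rather than nodes, it drops duplicates and also discards any class-less paragraph whose text happens to match a paragraph that has a class.

```r
library(XML)

# Hypothetical sample (not the asker's data): "Text" appears twice without a
# class, and "Title" appears both with and without a class.
html2 <- '<p class="id1">Title</p>
<p>Text</p>
<p>Text</p>
<p>Title</p>
<p class="id2">Other</p>'
doc2 <- htmlParse(html2, asText = TRUE)

# One pass: every <p> node that has no class attribute.
by_xpath <- xpathSApply(doc2, "//p[not(@class)]", xmlValue, trim = TRUE)

# Two passes plus setdiff(): works on the extracted strings, not on the nodes.
by_setdiff <- setdiff(
  xpathSApply(doc2, "//p", xmlValue, trim = TRUE),
  xpathSApply(doc2, "//p[@class]", xmlValue, trim = TRUE)
)

print(by_xpath)
print(by_setdiff)
```

On this sample, by_xpath should contain all three class-less paragraphs ("Text", "Text", "Title"), while by_setdiff keeps only "Text", since the duplicate is collapsed and "Title" is removed along with its classed counterpart.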

Source

2016-11-17 20:29:06

Very _class_y answer, Rich ;-) I'm sure it will become a _class_ic – hrbrmstr
