Trying to create a string of values using the | (OR) operator

I am trying to scrape a website for links. So far I have downloaded the text and set it up as a data frame. I have the following:

keywords <- c(credit | model) 

text_df <- as.data.frame.table(text_df) 
text_df %>% 
    filter(str_detect(text, keywords))

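Here credit and model are the two values I want to search the website for, i.e. I want to return the rows that contain the word credit or the word model.

I get the following error:

Error in filter_impl(.data, dots) : object 'credit' not found

The code only returns results containing the word "model" and ignores the word "credit".

How can I return all rows that contain either the word "credit" or the word "model"?

My plan is to end up with keywords <- c(credit | model | more_key_words | something_else | many values)

Thanks in advance.

EDIT: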

text_df: 
    Var 1 text 
    1  Here is some credit information 
    2  Some text which does not expalin any keywords but messy <li> text9182edj </i> 
    3  This line may contain the keyword model 
    4  another line which contains nothing of use

So I want to extract only rows 1 and 3.

Source

2017-10-05 user113156

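Can't check right now, but `filter_()` should work. – MikolajM

When asking for help, you should provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output. In general, you search specific columns of a data.frame for values rather than entire rows, so it is best to be more specific here. – MrFlick

I have created a simplified example, in case it helps. – user113156

OK, I checked it and I don't think it will work the way you wrote it. You have to use the OR operator `|` inside filter(), not inside str_detect().

So it works like this: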

keywords <- c("virg", "tos") 

library(dplyr) 
library(stringr) 

iris %>% 
     filter(str_detect(Species, keywords[1]) | str_detect(Species, keywords[2]))

You have to specify each element of this variable individually, as keywords[1] and so on.

Source
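If the keyword vector grows, writing out keywords[1], keywords[2] and so on by hand gets tedious. A minimal sketch of one way to OR together any number of keywords follows. Note that it swaps in base R (`lapply`, `grepl`, `Reduce`) for `str_detect` and `filter`, which is not what the code above uses, and the names `hits_per_keyword`, `keep` and `result` are illustrative only.

```r
# Keywords to look for; extend this vector freely.
keywords <- c("virg", "tos")

# One logical vector per keyword: TRUE where Species contains that keyword.
# fixed = TRUE treats each keyword as a literal substring, not a regex.
hits_per_keyword <- lapply(
  keywords,
  function(k) grepl(k, as.character(iris$Species), fixed = TRUE)
)

# Combine the vectors element-wise with OR, so a row is kept
# if at least one keyword matches.
keep <- Reduce(`|`, hits_per_keyword)

result <- iris[keep, ]
table(droplevels(result$Species))
```

The same `keep` vector could also be passed to `filter()` if you prefer to stay in a dplyr pipeline.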

2017-10-05 19:59:31 MikolajM

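I think `iris %>% filter(str_detect(Species, paste(keywords, collapse = "|")))` would achieve the same thing. – markdly

Thanks for your reply. I ran this version, replacing the names to match those in my data set, and it gives quite good results. It takes more work to specify keywords[3], keywords[4], keywords[x] and so on, but it works. Thanks again! – user113156

I think the issue here is that you need to pass a single string to str_detect as the pattern argument. To check for "credit" or "model", you can paste them into one string separated by |.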

library(tidyverse) 
library(stringr) 
text_df <- read_table("Var 1 text 
1  Here is some credit information 
2  Some text which does not expalin any keywords but messy <li> text9182edj </i> 
3  This line may contain the keyword model 
4  another line which contains nothing of use") 


keywords <- c("credit", "model") 
any_word <- paste(keywords, collapse = "|") 
text_df %>% filter(str_detect(text, any_word)) 
#> # A tibble: 2 x 3 
#>  Var `1`         text 
#> <int> <chr>         <chr> 
#> 1  1    Here is some credit information 
#> 2  3  This line may contain the keyword model
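One caveat with pasting keywords into a pattern: the result is a regular expression, so a keyword containing characters such as `.` or `+` will not be matched literally. The sketch below shows one possible way to escape such characters first. `escape_regex` is a hypothetical helper written for this illustration, the keywords and sample lines are made up, and base R `grepl` stands in for `str_detect`.

```r
# Hypothetical helper: put a backslash in front of regex metacharacters
# so that each keyword is matched literally.
escape_regex <- function(x) {
  gsub("([.|()^{}+$*?\\[\\]\\\\])", "\\\\\\1", x, perl = TRUE)
}

# Illustrative keywords, not taken from the question.
keywords <- c("c++", "3.5", "model")
any_word <- paste(escape_regex(keywords), collapse = "|")

# Made-up sample lines.
texts <- c("We use c++ here", "Version 315 shipped", "A model", "nothing of use")

# Without escaping, "3.5" would also match "315", because "." matches any character.
unescaped_hit <- grepl("3.5", texts[2])

matched <- grepl(any_word, texts, perl = TRUE)
texts[matched]
```

If all keywords are plain words, as in the question, this step is unnecessary and the simple `paste(keywords, collapse = "|")` is enough.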

Source

2017-10-05 20:24:32 markdly

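Thanks for your reply! I ran your code on my text file and it worked, but the text file I actually have is much messier than the one I posted here, so I get the correct results plus some extra noise in the output. (Sorry, that's my fault!) But it still works. – user113156

@user113156, I'm not entirely sure what you mean by extra noise in the output. You could make the search stricter. For example, `any_word <- paste0("\\b(?:", paste(keywords, collapse = "|"), ")\\b")` should, I think, match only when a keyword appears as a standalone word. – markdly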
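To make the difference between the loose and the word-bounded pattern concrete, here is a small sketch. The sample lines are made up for illustration, and base R `grepl(..., perl = TRUE)` is used in place of `str_detect` so that it runs without extra packages.

```r
keywords <- c("credit", "model")

# Loose pattern: matches the keywords anywhere, including inside longer words.
loose <- paste(keywords, collapse = "|")

# Strict pattern: \b word boundaries around a non-capturing group.
strict <- paste0("\\b(?:", loose, ")\\b")

# Made-up lines for illustration only.
texts <- c(
  "Here is some credit information",
  "An accredited remodelling firm",
  "This line may contain the keyword model"
)

loose_hits  <- grepl(loose,  texts, perl = TRUE)  # TRUE TRUE TRUE
strict_hits <- grepl(strict, texts, perl = TRUE)  # TRUE FALSE TRUE
```

The second line is picked up by the loose pattern because "accredited" contains "credit", but it is rejected once the boundaries are added.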

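It gives me the correct rows that contain the keywords. What I mean by extra noise is that it also gives me extra HTML output (on different rows) that does not contain the desired keywords, and I don't understand why... – user113156

I would recommend staying away from regular expressions when you are dealing with words. There are packages tailored to this specific task that you can use instead. For example, try the following: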

library(corpus) 
text <- readLines("http://norvig.com/big.txt") # sherlock holmes 
terms <- c("watson", "sherlock holmes", "elementary") 
text_locate(text, terms) 
## text   before    instance    after    
## 1 1 …Book of The Adventures of Sherlock Holmes        
## 2 27  Title: The Adventures of Sherlock Holmes        
## 3 40 … EBOOK, THE ADVENTURES OF SHERLOCK HOLMES ***       
## 4 50        SHERLOCK HOLMES        
## 5 77       To Sherlock Holmes she is always the woman. I… 
## 6 85 …," he remarked. "I think,  Watson  , that you have put on seve… 
## 7 89 …t a trifle more, I fancy,  Watson  . And in practice again, I … 
## 8 145 …ere's money in this case,  Watson  , if there is nothing else.… 
## 9 163 …friend and colleague, Dr.  Watson  , who is occasionally good … 
## 10 315 … for you. And good-night,  Watson  ," he added, as the wheels … 
## 11 352 …s quite too good to lose,  Watson  . I was just balancing whet… 
## 12 422 …as I had pictured it from Sherlock Holmes ' succinct description, but… 
## 13 504   "Good-night, Mister Sherlock Holmes ."       
## 14 515 …t it!" he cried, grasping Sherlock Holmes by either shoulder and loo… 
## 15 553      "Mr. Sherlock Holmes , I believe?" said she.  
## 16 559      "What!" Sherlock Holmes staggered back, white with… 
## 17 565 …tter was superscribed to " Sherlock Holmes , Esq. To be left till call… 
## 18 567    "MY DEAR MR. SHERLOCK HOLMES ,--You really did it very w… 
## 19 569 …est to the celebrated Mr. Sherlock Holmes . Then I, rather imprudentl… 
## 20 571 …s; and I remain, dear Mr. Sherlock Holmes ,       
## ⋮ (189 rows total)

Note that this matches the terms regardless of case.

For your specific use case, do

ix <- text_detect(text, terms)

or

matches <- text_subset(text, terms)
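Applied to the sample data from the question, this might look like the sketch below. It assumes the corpus package is installed, rebuilds the question's four lines as a plain data frame, and uses `id` as an assumed column name since the original header is unclear.

```r
library(corpus)

# The question's sample lines, rebuilt as a plain data frame.
# The column name `id` is an assumption made for this sketch.
text_df <- data.frame(
  id = 1:4,
  text = c(
    "Here is some credit information",
    "Some text which does not expalin any keywords but messy <li> text9182edj </i>",
    "This line may contain the keyword model",
    "another line which contains nothing of use"
  ),
  stringsAsFactors = FALSE
)

terms <- c("credit", "model")

# text_detect() gives one logical value per text; use it to index the rows.
ix <- text_detect(text_df$text, terms)
text_df[ix, ]
```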

Source

2017-10-06 18:18:07
