Find word combinations using R

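I edit some text and am wondering whether I can search programmatically for particular combinations of words.

These words: almost, nearly, fairly, close to and very do not go with these words: certain, complete, dead, entire, essential, extinct.

Say I have this character vector: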

text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")

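Can I get R to return a numeric vector giving the line numbers (or sentence numbers) where these words are placed next to each other?

Note that I have used "certainly", so ideally I need R to search for words that contain "certain" (or any of the other words), not just the whole word "certain".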


2013-04-29 luciano

Nitpick Monday: remove the last comma. :-) – 2013-04-29 13:14:54

Use grep for this, after using strsplit to split your text at sentence boundaries:

stext <- strsplit(text, split="\\.")[[1]]
grep("certain", stext)
## [1] 3
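Because grep matches substrings, the pattern "certain" also finds "certainly". The same approach can be stretched to the combinations the question asks about. The following is only a sketch and not part of the original answer: it assumes the two word lists from the question, and the names `qualifiers`, `absolutes`, `pattern` and `hits` are made up for illustration.

```r
text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")

# Word lists taken from the question; the variable names are illustrative.
qualifiers <- c("almost", "nearly", "fairly", "close to", "very")
absolutes  <- c("certain", "complete", "dead", "entire", "essential", "extinct")

# A qualifier starting at a word boundary, then whitespace, then a word that
# begins with one of the absolute terms (so "certainly" counts as "certain").
pattern <- paste0("\\b(", paste(qualifiers, collapse = "|"), ")\\s+(",
                  paste(absolutes, collapse = "|"), ")")

# Split at full stops, as in the answer above.
stext <- strsplit(text, split = "\\.")[[1]]

# ignore.case = TRUE lets "Almost" at the start of a sentence match "almost".
hits <- grep(pattern, stext, ignore.case = TRUE, perl = TRUE)
hits
```

For the example text every sentence contains one of the combinations, so `hits` lists all three sentence numbers. Adding or removing words only requires editing the two vectors.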


2013-04-29 11:40:45 Andrie

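Isn't the question rather how to find sentences that contain a certain combination of words? I think that makes it a bit trickier. Unfortunately, I don't have a solution. – 2013-04-29 13:24:26

@DanielFischer, in that case it does need some grep-fu, or alternatively the equivalent of `grep "certain" | grep 'words' | grep 'combined'` – 2013-04-29 15:35:58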

This code did it: `combo <- list("very essential", "very complete", "Almost certain"); stext <- strsplit(text, split = "\\.")[[1]]; library(plyr); laply(combo, function(x) grep(x, stext))` – luciano 2013-04-29 21:16:13
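The per-phrase lookup in the comment above can also be written in base R, without plyr. This is a sketch under assumptions rather than the commenter's own code: the phrase "nearly dead" is a hypothetical extra entry, added only to show what happens when a phrase matches nothing, and the name `matches` is illustrative.

```r
text <- c("R is a very essential tool for data analysis. While it is regarded as domain specific, it is a very complete programming language. Almost certainly, many people who would benefit from using R, do not use it")
stext <- strsplit(text, split = "\\.")[[1]]

# "nearly dead" is a hypothetical phrase that does not occur in the text.
combo <- c("very essential", "very complete", "almost certain", "nearly dead")

# lapply returns a list, so phrases with zero, one or several matching
# sentences can sit side by side; setNames labels each element by its phrase.
matches <- lapply(setNames(combo, combo),
                  function(x) grep(x, stext, ignore.case = TRUE))
matches
```

Each list element holds the sentence numbers for one phrase, and a phrase that matches nothing yields `integer(0)`. Keeping the result as a list avoids forcing hits of different lengths into a single array.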

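Andrie's solution is well suited to your needs. However, I am adding a second solution for future searchers who are looking for a better way to parse transcripts.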

library(qdap) 
stext <- c("R is a very essential tool for data analysis. While it is regarded 
    as domain specific, it is a very complete programming language. Almost 
    certainly, many people who would benefit from using R, do not use it.") 

dat <- sentSplit(data.frame(dialogue=stext), "dialogue") 
with(dat, termco(dialogue, tot, "certain")) 

##   tot word.count  certain
## 1 1.1          9        0
## 2 2.2         14        0
## 3 3.3         14 1(7.14%)

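Note that punctuation matters: I had to add the missing period at the end of the last sentence.

To get a vector of the sentences that contain "certain":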

which(with(dat, termco(dialogue, tot, "certain"))$raw$certain > 0) 
## [1] 3


2013-04-29 13:22:46
