2015-02-23 68 views

I have a list of character vectors. For every element in the list, I want to find exact matches using grep in R.

mylist <- list(c("apple", "banana", "cat", "dog", "elephant", "fish"), 
       c("apple", "banana", "camel", "doll", "egg"), 
       c("apple", "bag", "cat", "donkey", "elephant", "frog", "gun"), 
       c("apple", "ball", "cage", "dolphin", "doggy", "fishy"), 
       c("apple", "baggy", "catty", "doggy", "eggie", "gun_powder")) 

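I want to match every element of my list against the other elements using R's grep function, but what I get is a partial match for each element rather than an exact match.

This is what I wrote: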

matched <- vector("list", length(mylist))
for (i in 1:length(mylist)) {
  index <- NULL
  indexx <- vector("list", length(mylist[[i]]))
  for (j in 1:length(mylist[[i]])) {
    dummy <- NULL
    for (k in 1:length(mylist)) {
      # fixed = TRUE matches the pattern as a literal substring, hence the partial matches
      c <- grep(mylist[[i]][j], mylist[[k]], value = TRUE, fixed = TRUE)
      ind <- c(dummy, c)
      dummy <- ind
    }
    indexx[[j]] <- ind
  }
  matched[[i]] <- indexx
}

Please help me with my code.

What is the desired output? – A5C1D2H2I1M1N2O1R2T1 2015-02-23 07:11:21

Answers

2

Unlist your list:

ulist = unlist(mylist) 

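For each element of ulist, find the exact matches across all of ulist. Do this with == rather than grep(), so that each element is compared against the whole vector for equality.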

matches0 = lapply(ulist, function(elt) ulist[ulist == elt]) 

Finally, relist the matches into the original shape:

relist(matches0, mylist) 
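To make the shape of the result concrete, here is a minimal sketch on a smaller list. The shortened `mylist` and the name `res` are assumptions for illustration only; the three steps are the ones described above.

```r
# Shortened example data (an assumption, not the original list)
mylist <- list(c("apple", "dog", "doggy"),
               c("apple", "doggy"))

ulist <- unlist(mylist)

# == compares whole strings, so "dog" does not pick up "doggy"
matches0 <- lapply(ulist, function(elt) ulist[ulist == elt])

# Put the matches back into the shape of mylist
res <- relist(matches0, mylist)

print(res[[1]][[2]])  # matches for "dog"
print(res[[2]][[2]])  # matches for "doggy"
```

Each entry of `res` holds one vector of matches per original word, so `res[[1]][[2]]` belongs to the second word of the first vector.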

I find it odd to summarize the results this way; perhaps instead count the number of times each word occurs

tbl = table(ulist) 

and use these counts as the entries

relist(tbl[ulist], mylist) 

A bit of tidying removes the name that table() attaches to the dimnames:

names(dimnames(tbl)) = NULL 
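As a hedged illustration of the counting variant, the sketch below runs it on a shortened list. The small `mylist`, the `as.vector()` call, and the names `counts` and `per_vector` are assumptions added to keep the result a plain integer vector.

```r
# Shortened example data (an assumption, not the original list)
mylist <- list(c("apple", "dog", "doggy"),
               c("apple", "doggy"))

ulist <- unlist(mylist)
tbl <- table(ulist)
names(dimnames(tbl)) <- NULL

# Look up the count of every word by name, dropping the table attributes
counts <- as.vector(tbl[ulist])

# Restore the shape of mylist: one count per original word
per_vector <- relist(counts, mylist)

print(per_vector)
```

Because table() counts whole strings, "dog" is counted once here even though "doggy" contains it.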

Thank you very much – Rajan 2015-02-23 09:41:44

I have a large list like this: "Gautam Gambhir_India_KKR", "Robin Uthappa_India_KKR", "Manish Pandey_India_KKR", "John_Hastings_Australia _CSK", "Mahendra Singh Dhoni_India_CSK". I want to replace all the country names and team names (KKR/CSK) with "". Can you help me? – Rajan 2015-02-23 13:46:04

0

If I understand correctly what you want to achieve:

mylist <- list(c("apple", "banana", "cat", "dog", "elephant", "fish"),
               c("apple", "banana", "camel", "doll", "egg"),
               c("apple", "bag", "cat", "donkey", "elephant", "frog", "gun"),
               c("apple", "ball", "cage", "dolphin", "doggy", "fishy"),
               c("apple", "baggy", "catty", "doggy", "eggie", "gun_powder"))

ulist <- unique(unlist(mylist))
matched <- vector("list", length(ulist))
names(matched) <- ulist

### Count how often each word occurs
countList = function(ls, container) {
  sapply(ls, function(elem) {
    # An entry that is still NULL has not been seen yet
    isEmpty = is.null(container[[elem]])
    # <<- updates `container` in the enclosing countList() call
    container[[elem]] <<- ifelse(isEmpty, 1, container[[elem]] + 1)
  })
  container
}
counted = countList(unlist(mylist), matched)
# Repeat each word as many times as it was counted
lapply(names(counted), function(lab) rep(lab, counted[[lab]]))

The output looks like this:

[[1]] 
[1] "apple" "apple" "apple" "apple" "apple" 

[[2]] 
[1] "banana" "banana" 

[[3]] 
[1] "cat" "cat" 

[[4]] 
[1] "dog" 

[[5]] 
[1] "elephant" "elephant" 

[[6]] 
[1] "fish" 

[[7]] 
[1] "camel" 

[[8]] 
[1] "doll" 

[[9]] 
[1] "egg" 

[[10]] 
[1] "bag" 

[[11]] 
[1] "donkey" 

[[12]] 
[1] "frog" 

[[13]] 
[1] "gun" 

[[14]] 
[1] "ball" 

[[15]] 
[1] "cage" 

[[16]] 
[1] "dolphin" 

[[17]] 
[1] "doggy" "doggy" 

[[18]] 
[1] "fishy" 

[[19]] 
[1] "baggy" 

[[20]] 
[1] "catty" 

[[21]] 
[1] "eggie" 

[[22]] 
[1] "gun_powder" 
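The same grouping can probably be reached without the manual counter. The sketch below is an alternative, not the answerer's method: it uses base R's split() on a shortened list, and the names `x` and `groups` are assumptions for illustration.

```r
# Shortened example data (an assumption, not the original list)
mylist <- list(c("apple", "cat"),
               c("apple", "dog", "cat"))

x <- unlist(mylist)

# Group identical words; levels = unique(x) keeps first-appearance order
groups <- split(x, factor(x, levels = unique(x)))

print(groups)
```

Calling unname(groups) drops the word labels if an unnamed list like the output above is preferred.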

Thank you very much – Rajan 2015-02-23 10:22:14

0

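You should read up on regular expressions, for example in a tutorial. They are not easy, but they are very useful if you work with strings. Here is a version using a regexp: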

matched <- vector("list", length(mylist))
for (i in 1:length(mylist)) {
  index <- NULL
  indexx <- vector("list", length(mylist[[i]]))
  for (j in 1:length(mylist[[i]])) {
    dummy <- NULL
    for (k in 1:length(mylist)) {
      # Anchor the word with ^ and $ so only whole-string matches are returned
      c <- grep(paste("^", mylist[[i]][j], "$", sep = ""), mylist[[k]], perl = TRUE, value = TRUE)
      ind <- c(dummy, c)
      dummy <- ind
    }
    indexx[[j]] <- ind
  }
  matched[[i]] <- indexx
}

In this version of your code, the ^ symbol denotes the start of the string and $ denotes the end, so grep returns only exact matches.
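A minimal sketch of the difference, using a made-up three-word vector (`words`, `partial` and `exact` are names chosen for illustration):

```r
words <- c("dog", "doggy", "doll")

# fixed = TRUE treats the pattern as a literal substring, so "doggy" matches too
partial <- grep("dog", words, value = TRUE, fixed = TRUE)

# Anchoring with ^ and $ requires the whole string to equal the pattern
exact <- grep("^dog$", words, value = TRUE)

print(partial)  # "dog" "doggy"
print(exact)    # "dog"
```

Note that anchoring assumes the words contain no regex metacharacters such as `.` or `(`; for plain equality, `words[words == "dog"]` avoids that concern.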

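Thank you very much for your reply; I see where I went wrong – Rajan 2015-02-23 09:42:32

You're welcome – dax90 2015-02-23 09:57:33

Hi, I want to remove some text that appears frequently in the list. For example, this is part of my list: "Suresh_Raina_India_CSK", "Mithun Manhas_India_CSK", "Faf du Plessis_South_Africa_CSK". I want to remove the country names and the team name (CSK) and keep only the player names. – Rajan 2015-02-23 10:26:26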
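One possible approach to this follow-up, sketched under the assumption that the full set of country and team names is known in advance. The vectors `countries` and `teams` below are illustrative guesses based on the strings quoted in the comments, not a complete list.

```r
players <- c("Suresh_Raina_India_CSK",
             "Mithun Manhas_India_CSK",
             "Faf du Plessis_South_Africa_CSK")

# Assumed lookup values; extend these to cover the real data
countries <- c("India", "Australia", "South_Africa")
teams <- c("KKR", "CSK")

# Match "_<country>_<team>" at the end of the string (optional stray space)
pattern <- paste0("_(", paste(countries, collapse = "|"), ") ?_(",
                  paste(teams, collapse = "|"), ")$")

names_only <- sub(pattern, "", players)

# Optionally turn remaining underscores inside names into spaces
names_clean <- gsub("_", " ", names_only, fixed = TRUE)

print(names_clean)
```

Listing the countries explicitly matters because both player names and country names can contain underscores, so splitting on "_" alone would be ambiguous.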
