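I have a long list of strings that share substrings. The list comes from event stream data, so there are thousands of rows, but I will simplify the example to pets: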
+--------------------------------+
| Pets |
+--------------------------------+
| "one calico cat that's smart" |
| "German Shepard dog" |
| "A Chameleon that is a Lizard" |
| "a cute tabby cat" |
| "the fish guppy" |
| "Lizard Gecko" |
| "German Shepard dog" |
| "Budgie Bird" |
| "Canary Bird in a coal mine" |
| "a chihuahua dog" |
+--------------------------------+
dput output: structure(list(Pets = structure(c(8L, 6L, 1L, 3L, 9L, 7L, 6L, 4L, 5L, 2L),.Label = c("A Chameleon that is a Lizard", "a chihuahua dog", "a cute tabby cat", "Budgie Bird", "Canary Bird in a coal mine", "German Shepard dog", "Lizard Gecko", "one calico cat that's smart", "the fish guppy"), class = "factor")), .Names = "Pets", row.names = c(NA, -10L), class = "data.frame")
I want to add information based on the general type of pet (dog, cat, etc.), and I have a key table that holds this information:
+----------+----------------+
| key | classification |
+----------+----------------+
| "dog" | "canine" |
| "cat" | "feline" |
| "lizard" | "reptile" |
| "bird" | "avian" |
| "fish" | "fish" |
+----------+----------------+
dput output: structure(list(key = structure(c(3L, 2L, 5L, 1L, 4L), .Label = c("bird", "cat", "dog", "fish", "lizard"), class = "factor"), classification = structure(c(2L, 3L, 5L, 1L, 4L), .Label = c("avian", "canine", "feline", "fish", "reptile"), class = "factor")), .Names = c("key", "classification"), row.names = c(NA, -5L), class = "data.frame")
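How do I use the "long strings" in the Pets table to look up the matching classification in the key table? The problem is that my lookup strings contain the substrings found in the key table, rather than the other way around.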
I started with grepl like this:
key[grepl(pets[1,1], key[ , 2]), ]
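However, this does not work, because "calico cat" is not in the key list, although "cat" is. The result I am looking for would be "feline". (Note: I can't simply switch things around, because in my own code this sits inside an apply function and loops over every row in the data. So instead of pets[1,1] it is pets[n,1]. Eventually I plan to cbind the results to the event stream data for further analysis.)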
I am having trouble wrapping my head around how to do this. Any suggestions?
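
For illustration, here is a minimal sketch of one way the lookup could be approached. It is not a confirmed answer from this thread. It assumes the two tables are named pets and key as in the attempt above, that matching should ignore case ("Lizard" vs. "lizard"), and that the first matching key wins when several match. The helper name classify_pets is hypothetical. The idea is to use each key as the grepl pattern and the long strings as the text to search, which handles every row at once, so no per-row loop is needed.

```r
# Rebuild the example tables from the question as character columns.
pets <- data.frame(
  Pets = c("one calico cat that's smart", "German Shepard dog",
           "A Chameleon that is a Lizard", "a cute tabby cat",
           "the fish guppy", "Lizard Gecko", "German Shepard dog",
           "Budgie Bird", "Canary Bird in a coal mine", "a chihuahua dog"),
  stringsAsFactors = FALSE
)
key <- data.frame(
  key = c("dog", "cat", "lizard", "bird", "fish"),
  classification = c("canine", "feline", "reptile", "avian", "fish"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each long string, return the classification of the
# first key it contains, or NA when no key is found.
classify_pets <- function(strings, key) {
  strings <- as.character(strings)  # also accepts factor columns
  keys <- as.character(key$key)
  # One logical column per key: TRUE where that key occurs in the string.
  hits <- vapply(keys,
                 function(k) grepl(k, strings, ignore.case = TRUE),
                 logical(length(strings)))
  # Force a strings-by-keys matrix, even for a single input string.
  hits <- matrix(hits, nrow = length(strings))
  # Column index of the first matching key per row (NA if none matched).
  first_hit <- apply(hits, 1, function(row) which(row)[1])
  as.character(key$classification)[first_hit]
}

pets$classification <- classify_pets(pets$Pets, key)
print(pets)
```

Because the keys are used as plain substring patterns, a key such as "cat" would also match inside a longer word like "category". If that matters for the real data, wrapping each key in word boundaries (for example paste0("\\b", k, "\\b")) is one option to consider.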
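It looks like the key is always the second word of each "long string". Is that a reasonable assumption? – useR
Unfortunately, no. The strings contain anywhere from a few to many different words. All I know is that the "key" word is in there somewhere. – JoeM05
Then you should provide a long string that does not fit that assumption. Also, please share your dataset by copying and pasting the output of 'dput(my_data)' into your question, rather than formatting it the way you currently have it. – useR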