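Merge/collapse identical consecutive elements in a vector

2017-09-26

I want to merge identical consecutive observations into collapsed strings. A simple example is shown below: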

a <- c("H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T")
# [1] "H" "H" "H" "N" "T" "N" "T" "H" "N" "T" "T"

b <- c("HHH", "N", "T", "N", "T", "H", "N", "TT")
# [1] "HHH" "N" "T" "N" "T" "H" "N" "TT"

c <- c("HHH", "HHH", "N", "T", "N", "T", "H", "N", "TT", "TT")
# [1] "HHH" "HHH" "N" "T" "N" "T" "H" "N" "TT" "TT"

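Here I want to create functions that take vector a and turn it into the two vectors b and c. For example, since the first three observations are all H, together they become HHH. Likewise, the two Ts become TT. Note that I want to preserve the overall order, and the number of times a given element can appear consecutively is not limited to three. So, for example, there could be 10 consecutive As, which should be converted into a single AAAAAAAAAA.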

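I tried building this up step by step, starting from a for loop, but I couldn't get any further because the number of repetitions within a consecutive run is unbounded. I also tried the base R function rle, but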

rle(a) 

gives something like

Run Length Encoding 
    lengths: int [1:8] 3 1 1 1 1 1 1 2 
    values : chr [1:8] "H" "N" "T" "N" "T" "H" "N" "T" 

where the 11 elements are reduced to 8 entries, rather than giving me the collapsed consecutive occurrences.
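The rle output already carries everything needed: run i collapses to values[i] repeated lengths[i] times, which is what the first answer builds with rep and paste. As a hedged, language-neutral illustration, here is a Python sketch (Python being the other language used in this thread) that computes rle-style values and lengths by hand and then rebuilds the runs. The helper names run_length_encode and collapse_runs are hypothetical and not part of any library.

```python
def run_length_encode(v):
    """Return (values, lengths) like R's rle(): one entry per run of equal elements."""
    values, lengths = [], []
    for x in v:
        if values and values[-1] == x:
            # Same as the previous element: the current run gets longer.
            lengths[-1] += 1
        else:
            # A different element (or the very first one) starts a new run.
            values.append(x)
            lengths.append(1)
    return values, lengths


def collapse_runs(values, lengths):
    """Rebuild each run as one string: the value repeated `length` times."""
    return [x * n for x, n in zip(values, lengths)]


a = ["H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T"]
values, lengths = run_length_encode(a)
print(lengths)                         # [3, 1, 1, 1, 1, 1, 1, 2]
print(collapse_runs(values, lengths))  # ['HHH', 'N', 'T', 'N', 'T', 'H', 'N', 'TT']
```

Because a run is only closed when a different element arrives, there is no upper limit on run length: ten consecutive "A"s simply end up as values ["A"], lengths [10].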

Answers

with(rle(a), sapply(1:length(values), function(i) 
    paste(rep(values[i], lengths[i]), collapse = ""))) 
#[1] "HHH" "N" "T" "N" "T" "H" "N" "TT" 

OR

sapply(split(a, cumsum(c(TRUE, a[-1] != head(a, -1)))), paste, collapse = "") 
# 1  2  3  4  5  6  7  8 
#"HHH" "N" "T" "N" "T" "H" "N" "TT" 

Wow, that was fast! Thanks a lot!
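The cumsum-based one-liner works by flagging every position whose element differs from its predecessor (the first position is always flagged), taking a cumulative sum of those flags to get a run id for each element, and then pasting together the elements that share an id. A sketch of the same idea in Python, with itertools.accumulate standing in for cumsum; the function name collapse_by_group_id is hypothetical.

```python
from itertools import accumulate


def collapse_by_group_id(v):
    """Collapse consecutive equal elements using cumulative-sum run ids."""
    if not v:
        return []
    # Like c(TRUE, a[-1] != head(a, -1)): True wherever a new run starts.
    starts = [True] + [v[i] != v[i - 1] for i in range(1, len(v))]
    # Like cumsum(): run ids 1, 1, 1, 2, 3, ... (each True counts as 1).
    ids = list(accumulate(int(s) for s in starts))
    # Like split() + paste(collapse = ""): join the elements sharing an id.
    out = [""] * ids[-1]
    for run_id, x in zip(ids, v):
        out[run_id - 1] += x
    return out


a = ["H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T"]
print(collapse_by_group_id(a))  # ['HHH', 'N', 'T', 'N', 'T', 'H', 'N', 'TT']
```

Since the ids only ever increase, grouping by them keeps the original order, which is also why the R version's split result comes back in order.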


We can use rleid from data.table:

library(data.table) 
unname(tapply(a, rleid(a), FUN = paste, collapse="")) 
#[1] "HHH" "N" "T" "N" "T" "H" "N" "TT" 

Or with base R rle and tapply:

with(rle(a), unname(tapply(a, rep(seq_along(values), lengths), FUN = paste, collapse=""))) 
#[1] "HHH" "N" "T" "N" "T" "H" "N" "TT" 
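Both variants rely on the same intermediate: a run id for every element, obtained either directly from rleid(a) or by expanding the rle lengths with rep(seq_along(values), lengths). tapply then groups by that id and pastes each group. A Python sketch of those two steps, using the lengths that rle(a) printed in the question; run_ids_from_lengths and paste_by_id are hypothetical names.

```python
def run_ids_from_lengths(lengths):
    """Like rep(seq_along(values), lengths): repeat run id k, lengths[k-1] times."""
    return [k for k, n in enumerate(lengths, start=1) for _ in range(n)]


def paste_by_id(v, ids):
    """Like tapply(v, ids, paste, collapse = ""): join elements with the same id."""
    groups = {}
    for run_id, x in zip(ids, v):
        groups[run_id] = groups.get(run_id, "") + x
    # Run ids increase along the vector, so sorted ids give the original order.
    return [groups[k] for k in sorted(groups)]


a = ["H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T"]
ids = run_ids_from_lengths([3, 1, 1, 1, 1, 1, 1, 2])
print(ids)                  # [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 8]
print(paste_by_id(a, ids))  # ['HHH', 'N', 'T', 'N', 'T', 'H', 'N', 'TT']
```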

Another base R option is to use regex lookarounds:

strsplit(paste(a, collapse=""), "(?<=(.))(?!\\1)", perl = TRUE)[[1]] 
#[1] "HHH" "N" "T" "N" "T" "H" "N" "TT" 
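The pattern (?<=(.))(?!\\1) matches the empty position right after a character whenever the next character is different (or the string ends), so strsplit cuts the pasted string exactly at run boundaries. Python's re module also accepts a fixed-width lookbehind containing a capture group, but re.split would return the captured characters alongside the pieces, so this hedged sketch collects the boundary positions with re.finditer and slices the string instead. Like the R version, it assumes every element of a is a single character; split_at_run_boundaries is a hypothetical name.

```python
import re


def split_at_run_boundaries(s):
    """Split s wherever a character differs from the one before it."""
    # Zero-width match right after a char X when the next char is not X (or s ends).
    boundary = re.compile(r"(?<=(.))(?!\1)")
    cuts = [0] + [m.start() for m in boundary.finditer(s)]
    # Each pair of consecutive cut positions delimits one run.
    return [s[i:j] for i, j in zip(cuts, cuts[1:])]


a = ["H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T"]
print(split_at_run_boundaries("".join(a)))
# ['HHH', 'N', 'T', 'N', 'T', 'H', 'N', 'TT']
```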

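Besides the solutions already given, which split between repeated characters and paste the pieces together, I was interested in a general algorithm that doesn't depend on any particular language feature.

You said you tried, but I don't see the unbounded number of repetitions as a real problem. What I wrote basically iterates over the original array and builds a clone of it. If a value in the original array is the same as the previous one, then instead of adding it to the new array as a new item, it is concatenated onto the last value of the "clone" array.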

Algorithm:

Create an empty array w
Iterate over each index i of the original vector v
    If this is the first entry
        w[1] = v[1]
    Else
        If v[i] is the same as v[i-1]
            Concatenate v[i] onto the last entry of w
        Else
            Append v[i] to the end of w

In Python:

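def collapseVector(v):
    w = []
    for i in range(len(v)):
        if i == 0:
            # The first element always starts a new run.
            w.append(v[i])
        elif v[i] == v[i - 1]:
            # Same as the previous element: extend the last collapsed string.
            w[-1] += v[i]
        else:
            # A different element starts a new run.
            w.append(v[i])
    return w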

You can use gregexpr and regmatches:

a <- c("H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T") 

# collapse string 
b <- paste(a, collapse = "") 

# extract instances of repeated characters 
regmatches(b, gregexpr("(.)\\1*", b))[[1]] 
# [1] "HHH" "N" "T" "N" "T" "H" "N" "TT" 

A stringi equivalent could be:

library(stringi) 
stri_extract_all_regex(b, "(.)\\1*")[[1]] 
# [1] "HHH" "N" "T" "N" "T" "H" "N" "TT" 

And the ore package, for good measure:

library(ore) 
matches(ore.search("(.)\\1*", b, all = TRUE)) 
#[1] "HHH" "N" "T" "N" "T" "H" "N" "TT"
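All three variants use the pattern (.)\\1*, which captures one character and then greedily consumes every immediately following copy of it, so each match is exactly one complete run and the matches come back in order. The same pattern works with Python's re.finditer, taking the whole match with group(0); re.findall would instead return only the captured single character. As with the R versions, this sketch assumes single-character elements, and extract_runs is a hypothetical name.

```python
import re


def extract_runs(s):
    """Return every maximal run of one repeated character, in order."""
    # (.) captures a character; \1* consumes all directly following copies of it.
    return [m.group(0) for m in re.finditer(r"(.)\1*", s)]


a = ["H", "H", "H", "N", "T", "N", "T", "H", "N", "T", "T"]
print(extract_runs("".join(a)))  # ['HHH', 'N', 'T', 'N', 'T', 'H', 'N', 'TT']
```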