Splitting a long string into smaller strings

I have a data frame that includes a column of numbers like this:

360010001001002 
360010001001004 
360010001001005 
360010001001006

I want to break them into chunks of 2 digits, 3 digits, 5 digits, 1 digit, and 4 digits:

36 001 00010 0 1002 
36 001 00010 0 1004 
36 001 00010 0 1005 
36 001 00010 0 1006

This seems like it should be simple, but I've been reading the strsplit documentation and I can't work out how to do this by length.

2013-05-07 Amanda

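Is your primary intent a) **converting the vector of substring lengths to pairs of indices**, or b) **splitting into df columns, and doing it efficiently**: splitting the chunks into new separate d.f. columns (-> ddply(transform, ...)), or just doing some string manipulation on the same column (e.g. inserting '-')? (-> ldply) – smci 2014-03-09 15:47:35

My problem was solved long ago, but since you ask... yes: I wanted the chunks as separate columns. They are an ID number. I'd have to go back and look closely, but the chunks are meaningful: '36' is the state, '001' the county, '00010' the census block or something. – Amanda 2014-03-13 12:40:43

Right, but my question a) was whether it mattered to specify an arbitrary vector of widths = c(2,3,5,1,4) rather than plain old index pairs: (1,2), (3,5), (6,10), (11,11), (12,15). Several answerers got hung up on whether that cumulative-index arithmetic was a key part of your question. It turns out it wasn't. You could restate the question for clarity. – smci 2014-03-13 12:48:01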

Assuming data like this:

x <- c("360010001001002", "360010001001004", "360010001001005", "360010001001006")

Try this:

read.fwf(textConnection(x), widths = c(2, 3, 5, 1, 4))

If x is numeric, replace x with as.character(x) in this statement.
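One caveat worth noting: by default `read.fwf` guesses column types, so a chunk such as `001` is read as an integer and loses its leading zeros. The following is a minimal sketch, assuming you want the chunks kept as text; it passes `colClasses = "character"` through to `read.table`. The variable names `con` and `chunks` are illustrative only.

```r
# Same data as above, stored as character strings.
x <- c("360010001001002", "360010001001004",
       "360010001001005", "360010001001006")

# colClasses is forwarded to read.table(); "character" keeps leading zeros.
con <- textConnection(x)
chunks <- read.fwf(con, widths = c(2, 3, 5, 1, 4),
                   colClasses = "character")
close(con)

chunks$V2  # the 3-digit chunk, kept as text rather than converted to a number
```

The columns get the default names V1 to V5; rename them if the chunks have a known meaning.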

2013-05-08 01:05:27

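+1 - very neat! I'll remember this one. – Arun 2013-05-08 07:26:48

I wound up doing this: 'foo$county_id <- as.vector(gsub(foo$fullfipsid, pattern = "..(...)*", replacement = "\\1"))' for each chunk. It worked. But I'm accepting this answer b/c it is elegant and also works. (I tested it.) – Amanda 2013-05-09 19:23:01

You can use substring (assuming the length of the strings/numbers is fixed):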

xx <- c(360010001001002, 360010001001004, 360010001001005, 360010001001006) 
out <- do.call(rbind, lapply(xx, function(x) as.numeric(substring(x, 
        c(1,3,6,11,12), c(2,5,10,11,15))))) 
out <- as.data.frame(out)

2013-05-07 22:14:53 Arun

'ddply(mutate, ...)' seems more elegant than 'do.call(rbind, ...)'? See the answer below. – smci 2014-03-09 15:40:07

'…' and 'cumsum()' for accumulating the indices – smci 2014-03-09 15:49:27

A functional version:

split.fixed.len <- function(x, lengths) { 
    cum.len <- c(0, cumsum(lengths)) 
    start <- head(cum.len, -1) + 1 
    stop <- tail(cum.len, -1) 
    mapply(substring, list(x), start, stop) 
}  

a <- c(360010001001002, 
     360010001001004, 
     360010001001005, 
     360010001001006) 

split.fixed.len(a, c(2, 3, 5, 1, 4)) 
#  [,1] [,2] [,3] [,4] [,5] 
# [1,] "36" "001" "00010" "0" "1002" 
# [2,] "36" "001" "00010" "0" "1004" 
# [3,] "36" "001" "00010" "0" "1005" 
# [4,] "36" "001" "00010" "0" "1006"
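Since the chunks are meant to end up as separate columns, the character matrix can be turned into a data frame. This is only a sketch: the helper is repeated so the block runs on its own, the input is given as strings, and the column names are hypothetical placeholders rather than anything confirmed in the thread.

```r
# Helper from the answer above, repeated so the sketch is self-contained.
split.fixed.len <- function(x, lengths) {
    cum.len <- c(0, cumsum(lengths))
    start <- head(cum.len, -1) + 1
    stop <- tail(cum.len, -1)
    mapply(substring, list(x), start, stop)
}

ids <- c("360010001001002", "360010001001004")

# One row per ID, one column per chunk, all kept as character.
parts <- as.data.frame(split.fixed.len(ids, c(2, 3, 5, 1, 4)),
                       stringsAsFactors = FALSE)

# Hypothetical column names; adjust to what the chunks really mean.
names(parts) <- c("state", "county", "block", "chunk4", "chunk5")
parts
```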

2013-05-07 22:32:54 flodel

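+1 - nicely used here (as usual)! :) – Arun 2013-05-08 07:27:05

(Wow, compared to Python this task is really clunky and painful. Anyhoo...)

PS I now see that your main intent was to convert a vector of substring lengths into pairs of indices. You could use cumsum(), then sort the indices all together: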

ll <- c(2,3,5,1,4) 
sort(c(1, cumsum(ll), (cumsum(ll)+1)[1:(length(ll)-1)])) 
# now extract these as pairs.
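One possible way to do the "extract these as pairs" step is sketched below, assuming base R only. The names `idx` and `pairs` are illustrative and not part of the original answer.

```r
ll <- c(2, 3, 5, 1, 4)
idx <- sort(c(1, cumsum(ll), (cumsum(ll) + 1)[1:(length(ll) - 1)]))

# Fill row by row so that each row of the matrix is one (start, end) pair.
pairs <- matrix(idx, ncol = 2, byrow = TRUE)

# substring() is vectorised over its first/last arguments,
# so one call returns all five chunks of a single string.
substring("360010001001002", pairs[, 1], pairs[, 2])
```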

But that is fairly painful. flodel's answer handles this better.

As for the actual task of splitting into d.f. columns, and doing so efficiently:

stringr::str_sub() with plyr::ddply()/ldply()

require(plyr) 
require(stringr) 

df <- data.frame(value=c(360010001001002,360010001001004,360010001001005,360010001001006)) 
df$valc = as.character(df$value) 

df <- ddply(df, .(value), mutate, chk1=str_sub(valc,1,2), chk3=str_sub(valc,3,5), chk6=str_sub(valc,6,10), chk11=str_sub(valc,11,11), chk14=str_sub(valc,12,15)) 

#    value   valc chk1 chk3 chk6 chk11 chk14 
# 1 360010001001002 360010001001002 36 001 00010  0 1002 
# 2 360010001001004 360010001001004 36 001 00010  0 1004 
# 3 360010001001005 360010001001005 36 001 00010  0 1005 
# 4 360010001001006 360010001001006 36 001 00010  0 1006

2014-03-09 15:18:50 smci

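You can do this elegantly with this function from the stringi package:

library(stringi)

# cumulative end positions of the chunks: 2, 5, 10, 11, 15
splitpoints <- cumsum(c(2, 3, 5, 1, 4))
# each chunk starts one past the previous end; the first starts at 1
stri_sub("360010001001002", c(1, splitpoints[-length(splitpoints)] + 1), splitpoints)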

2014-03-13 11:43:53 bartektartanus
