2017-02-15 79 views
-1

R: flatten a list column

I have a column in a data frame that I split into four separate columns using colsplit.

df <- transform(df, concatenation = colsplit(concatenation, pattern="->-", 
names = c('att1', 'att2','att3', 'att4'))) 

OR

df$concatenation <- colsplit(df$concatenation, pattern="->-", 
names = c('att1', 'att2','att3', 'att4')) 

concatenation 
a->-a->-b->-c 
b->-a->-b->-d 
3->-a->-x->-c 
2->-a->-y->-8 

Now I have the following columns: concatenation.att1, concatenation.att2, and so on.

concatenation.att1 concatenation.att2 concatenation.att3 concatenation.att4 
a     a     b     c 
b     a     b     d 
3     a     x     c 
2     a     y     8 

When I try to export this data frame to CSV, I get the following error:

Error in ncol(xj) : object 'xj' not found 

OR

Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) : 
    missing value where TRUE/FALSE needed 

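From my research I have inferred that this is caused by the nested column, but I cannot find a simple way to flatten the data frame (as below) for export to CSV.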

att1 att2 att3 att4 
a a b c 
b a b d 
3 a x c 
2 a y 8 

At the moment I reassign the data to the top level and delete the nested column, but I am sure there is a better way to do this.

df$att1 <- df$concatenation$att1 
df$att2 <- df$concatenation$att2 
df$att3 <- df$concatenation$att3 
df$att4 <- df$concatenation$att4 

df$concatenation <- NULL 
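
The manual reassignment above can be generalized so that it does not depend on the number of sub-columns. The following is a minimal sketch, not a definitive fix: it builds a small stand-in data frame by hand (the two rows and the names `nested` and `flat` are illustrative assumptions) and binds the nested data frame back onto the remaining columns with `cbind()`.

```r
# Stand-in for the data frame produced by colsplit(): the 'concatenation'
# column is itself a data frame (illustrative data, two sub-columns only).
df <- data.frame(Value = c("20,249", "8,097"), stringsAsFactors = FALSE)
df$concatenation <- data.frame(att1 = c("AFG", "ALB"),
                               att2 = c("Afghanistan", "Albania"),
                               stringsAsFactors = FALSE)

# Pull the nested data frame out, then bind it next to the other columns.
nested <- df$concatenation
flat <- cbind(df[setdiff(names(df), "concatenation")], nested)

# 'flat' now has only ordinary columns and can be written out.
out_file <- tempfile(fileext = ".csv")
write.csv(flat, out_file, row.names = FALSE)
```

Because `cbind()` receives the nested data frame as an unnamed argument, the sub-columns keep their short names (att1, att2) rather than gaining a concatenation. prefix.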

Here is a reproducible example:

library(reshape2)  # provides colsplit()

# Read in the table
df <- read.table(textConnection(
    "concatenation  Value 
    AFG->-Afghanistan->-1950->-True 20,249 
    AFG->-Afghanistan->-1951->-True 21,352 
    AFG->-Afghanistan->-1952->-True 22,532 
    AFG->-Afghanistan->-1953->-True 23,557 
    AFG->-Afghanistan->-1954->-True 24,555 
    ALB->-Albania->-1950->-True 8,097 
    ALB->-Albania->-1951->-True 8,986"), header=TRUE) 

# Split the concatenation variable
df <- transform(df, concatenation = colsplit(concatenation, pattern="->-", 
              names = c('att1', 'att2','att3', 'att4'))) 
# Write to CSV
write.csv(df, "myfile.csv") 
+1

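*I have a column in a data frame that I split ... using colsplit* ... it would be good to see the column values. *I cannot find a simple way to flatten the data frame* ... it would be good to see the desired output. – Parfait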

+0

I have added the expected output as a table. Hopefully that makes it clearer. – sdhaus

Answers

1

It looks like tidyr::separate will do this.

nm <- c('att1', 'att2','att3', 'att4') 
df2 <- tidyr::separate(df, concatenation, nm, sep = "->-") 

sapply(df2, typeof) 
#  att1  att2  att3  att4  Value 
# "character" "character" "character" "character" "integer" 
write.csv(df2) 
# "","att1","att2","att3","att4","Value" 
# "1","AFG","Afghanistan","1950","True","20,249" 
# "2","AFG","Afghanistan","1951","True","21,352" 
# "3","AFG","Afghanistan","1952","True","22,532" 
# "4","AFG","Afghanistan","1953","True","23,557" 
# "5","AFG","Afghanistan","1954","True","24,555" 
# "6","ALB","Albania","1950","True","8,097" 
# "7","ALB","Albania","1951","True","8,986" 

And in base R, strsplit() will work.

df3 <- do.call(rbind.data.frame, strsplit(as.character(df$concatenation), "->-")) 
cbind(setNames(df3, nm), df["Value"]) 
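
A variant of the base R approach, offered as a sketch rather than as the answer's own method: binding the split pieces into a character matrix first and converting that to a data frame keeps the columns as character and avoids the awkward automatic column names. The sample rows and the names `parts`, `mat`, and `flat` are illustrative assumptions, and the sketch assumes every row splits into exactly four pieces.

```r
# Small stand-in for the question's data (two rows only).
df <- data.frame(
  concatenation = c("AFG->-Afghanistan->-1950->-True",
                    "ALB->-Albania->-1951->-True"),
  Value = c("20,249", "8,986"),
  stringsAsFactors = FALSE
)
nm <- c("att1", "att2", "att3", "att4")

# Split each string on the literal separator "->-".
parts <- strsplit(as.character(df$concatenation), "->-", fixed = TRUE)

# Stack the pieces into a character matrix, one row per input string.
mat <- do.call(rbind, parts)

# Convert to a data frame of character columns and name them.
df3 <- as.data.frame(mat, stringsAsFactors = FALSE)
names(df3) <- nm

# Attach the untouched Value column.
flat <- cbind(df3, df["Value"])
```

If some rows could contain fewer or more separators, `rbind()` would recycle or misalign values, so checking `lengths(parts)` before binding is a sensible precaution.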
+0

謝謝,這就是我一直在尋找的。 – sdhaus

1

Why do you need transform here? Try this:

df$concatenation <- colsplit(df$concatenation, "->-", 
        names = c("att1", "att2","att3", "att4")) 
+0

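Right, I just ran it without transform and it does produce the same result. However, writing to CSV still results in an error: Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) : missing value where TRUE/FALSE needed – sdhaus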
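
The result is the same because assigning the output of colsplit() to a single column, with or without transform, stores a whole data frame inside that column. A minimal sketch of how one might confirm this and flatten it in one step follows; the stand-in data and the names `is_nested` and `flat` are illustrative assumptions, not part of the original thread.

```r
# Stand-in for the result of df$concatenation <- colsplit(...):
# one ordinary column plus one column that is itself a data frame.
df <- data.frame(Value = c("20,249", "8,097"), stringsAsFactors = FALSE)
df$concatenation <- data.frame(att1 = c("AFG", "ALB"),
                               att2 = c("Afghanistan", "Albania"),
                               stringsAsFactors = FALSE)

# Which columns are nested data frames?
is_nested <- sapply(df, is.data.frame)

# Rebuild the data frame from its columns; data.frame() expands a nested
# data frame into ordinary columns, prefixed with the parent column name.
flat <- do.call(data.frame, df)

names(flat)
```

This gives the concatenation.att1, concatenation.att2 style of names shown in the question, as real top-level columns that write.csv can handle.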