2017-05-25 68 views
0

Split a column into columns and rows in R

My data looks like this:

df <- data.frame(user_id=c('13','15'), 
       answer_id = c('{"row[0][0]":"A","row[0][1]":"B","row[0][2]":"C","row[0][3]":"D","row[1][0]":"A1","row[1][1]":"B1","row[1][2]":"C1","row[1][3]":"D1"}', '{"row[0][0]":"W","row[0][1]":"X","row[0][2]":"Y","row[0][3]":"Z","row[1][0]":"W1","row[1][1]":"X1","row[1][2]":"Y1","row[1][3]":"Z1"}')) 

Desired output:

user_id  answer_id1  answer_id2 answer_id3  answer_id4 
13     A    B    C   D 
13     A1    B1   C1   D1 
15     W    X    Y   Z 
15     W1    X1   Y1   Z1 

I am new to R and hope to get a solution soon, as I always…

+3

Whoever gave you this data is some kind of enemy of yours. Never trust him. – G5W

+0

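Your data looks like JSON (I assume this is just an example of what the real data looks like). If so, take a look at the jsonlite package: it can convert this kind of data into a list, which is then easy to work with as a data frame in R. – RobertMc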
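A rough sketch of the jsonlite route suggested in this comment. It assumes the jsonlite package is installed and that every user's keys arrive in row-major order with four values per row; the helper name `parse_answers` and its `n_col` argument are made up for illustration.

```r
library(jsonlite)

# Example input taken from the question.
df <- data.frame(
  user_id = c("13", "15"),
  answer_id = c(
    '{"row[0][0]":"A","row[0][1]":"B","row[0][2]":"C","row[0][3]":"D","row[1][0]":"A1","row[1][1]":"B1","row[1][2]":"C1","row[1][3]":"D1"}',
    '{"row[0][0]":"W","row[0][1]":"X","row[0][2]":"Y","row[0][3]":"Z","row[1][0]":"W1","row[1][1]":"X1","row[1][2]":"Y1","row[1][3]":"Z1"}'
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parse one JSON string and lay its values out as a
# data frame with n_col answer columns, filled row by row.
# Assumption: the keys are already in row-major order.
parse_answers <- function(user_id, json, n_col = 4) {
  values <- unlist(jsonlite::fromJSON(json))
  wide <- as.data.frame(matrix(values, ncol = n_col, byrow = TRUE),
                        stringsAsFactors = FALSE)
  names(wide) <- paste0("answer_id", seq_len(n_col))
  cbind(user_id = user_id, wide, stringsAsFactors = FALSE)
}

# Apply the helper to every user and stack the pieces.
result <- do.call(rbind, unname(Map(parse_answers, df$user_id, df$answer_id)))
rownames(result) <- NULL
result
```

If the keys can arrive in a different order, the indices would have to be parsed out of the key names instead of relying on position.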

+0

Yep, I have the JSON code, but I'm dealing with some Excel files and a busy workload. Thanks, I got the code. – Janjua

Answers

2

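This may not be the best solution, but it gets you from your example input to the desired output using stringr, purrr, and tidyr. For an explanation of the regular expression used in the stringr::str_match_all() call, see regex101.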

df <- data.frame(user_id=c('13','15'), 
       answer_id = c('{"row[0][0]":"A","row[0][1]":"B","row[0][2]":"C","row[0][3]":"D","row[1][0]":"A1","row[1][1]":"B1","row[1][2]":"C1","row[1][3]":"D1"}', '{"row[0][0]":"W","row[0][1]":"X","row[0][2]":"Y","row[0][3]":"Z","row[1][0]":"W1","row[1][1]":"X1","row[1][2]":"Y1","row[1][3]":"Z1"}'), 
       stringsAsFactors=FALSE) 

#use regex to extract row ids and answers 
regex_matches  <- stringr::str_match_all(df$answer_id, '\\"row\\[(\\d+)\\]\\[(\\d+)\\]\\":\\"([^\\"]*)\\"') 
#add user id to each result 
answers_by_user <- purrr::map2(df$user_id, regex_matches, ~cbind(.x, .y[,-1])) 
#combine list of matrices and convert to df 
answers_df  <- data.frame(do.call(rbind, answers_by_user), stringsAsFactors=FALSE) 
#add meaningful names 
names(answers_df) <- c("user_id", "row_1", "row_2", "value") 
#convert to wide: one column per second index (row_2) 
final_df   <- tidyr::spread(answers_df, row_2, value) 
#remove row column 
final_df$row_1 <- NULL 
#clean up names 
names(final_df) <- c("user_id", "answer_id1", "answer_id2", "answer_id3", "answer_id4") 
final_df 

#output 
    user_id answer_id1 answer_id2 answer_id3 answer_id4 
1  13   A   B   C   D 
2  13   A1   B1   C1   D1 
3  15   W   X   Y   Z 
4  15   W1   X1   Y1   Z1 
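
To see what the regular expression in this answer captures, here is a small sketch that runs the same pattern on a shortened input string. It assumes stringr is installed; the variable names `x`, `pattern`, and `m` are only for illustration.

```r
library(stringr)

# A shortened input string in the same shape as the question's data.
x <- '{"row[0][0]":"A","row[1][3]":"D1"}'

# Same pattern as in the answer: the literal text "row", two bracketed
# integers, then the quoted value. There are three capture groups.
pattern <- '\\"row\\[(\\d+)\\]\\[(\\d+)\\]\\":\\"([^\\"]*)\\"'

# str_match_all() returns a list with one character matrix per input string.
# Column 1 holds the full match; columns 2 to 4 hold the capture groups.
m <- stringr::str_match_all(x, pattern)[[1]]

# Dropping column 1 leaves first index, second index, and value,
# which is what the answer binds to each user id.
m[, -1]
```

Each matched key/value pair becomes one row of the matrix, which is why the answer can stack the per-user matrices with rbind before reshaping.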
+0

Great work! Is it possible to separate the duplicate rows? – Janjua

+0

I'm not sure what you mean. It may be worth posting another question that shows the desired output, so you can fully describe what you want. –

+0

I appreciate your help and your time. Closing this. – Janjua

1

Column 2 looks like JSON, so you could do something like this to get it into a form you can work with...

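library(rjson) 
# parse each JSON string into a named vector: one long data frame per user 
df2 <- lapply(seq_len(nrow(df)),function(i) 
      data.frame(user=df[i,1], 
      answer=unlist(fromJSON(as.character(df[i,2]))),stringsAsFactors = FALSE)) 
df2 <- do.call(rbind,df2) 
# the row names keep the JSON keys (e.g. row[0][1]); extract both indices 
df2[,"r1"] <- gsub(".+\\[(\\d+)\\]\\[(\\d+)\\].*","\\1",rownames(df2)) 
df2[,"r2"] <- gsub(".+\\[(\\d+)\\]\\[(\\d+)\\].*","\\2",rownames(df2)) 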

df2 
      user answer r1 r2 
row[0][0] 13  A 0 0 
row[0][1] 13  B 0 1 
row[0][2] 13  C 0 2 
row[0][3] 13  D 0 3 
row[1][0] 13  A1 1 0 
row[1][1] 13  B1 1 1 
row[1][2] 13  C1 1 2 
row[1][3] 13  D1 1 3 
row[0][0]1 15  W 0 0 
row[0][1]1 15  X 0 1 
row[0][2]1 15  Y 0 2 
row[0][3]1 15  Z 0 3 
row[1][0]1 15  W1 1 0 
row[1][1]1 15  X1 1 1 
row[1][2]1 15  Y1 1 2 
row[1][3]1 15  Z1 1 3 
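
This answer stops at the long format. One possible way to get from there to the layout asked for in the question is base R's reshape(); this step is not part of the original answer, and the data frame is rebuilt by hand here only so that the sketch runs on its own.

```r
# Long-format data in the same shape as df2 above.
df2 <- data.frame(
  user   = rep(c("13", "15"), each = 8),
  answer = c("A", "B", "C", "D", "A1", "B1", "C1", "D1",
             "W", "X", "Y", "Z", "W1", "X1", "Y1", "Z1"),
  r1     = rep(rep(c("0", "1"), each = 4), times = 2),
  r2     = rep(c("0", "1", "2", "3"), times = 4),
  stringsAsFactors = FALSE
)

# One output row per (user, r1) pair; one answer column per value of r2.
wide <- reshape(df2, idvar = c("user", "r1"), timevar = "r2",
                direction = "wide")

# Drop the helper index column and use the column names from the question.
wide$r1 <- NULL
names(wide) <- c("user_id", paste0("answer_id", 1:4))
rownames(wide) <- NULL
wide
```

Using both user and r1 as the id keeps the two rows of each user apart, so each user ends up with one output row per first index.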
+0

Yep, it was exported from JSON, I guess; I don't know much about it. Thanks a lot :D – Janjua