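Here is one data-frame-based solution I can think of that handles the unwrapping and row-by-row processing you need.
Assume df looks like this after being read with read.csv and stringsAsFactors = FALSE: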
df
  Married                  Transportation Color
1     YES {"Company":"GTS","Type":"Limo"} White
2                       {"Driver":"John"} Green
3      NO  {"Type":"Van","Driver":"John"}
You can do this:
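library(jsonlite)
# Parse each JSON string into a named list
l <- lapply(df$Transportation, fromJSON)
# Collect every key that appears in any row
n <- unique(unlist(sapply(l, names)))
# Add one column per key; rows lacking that key get NULL
df[, n] <- lapply(n, function(x) sapply(l, function(y) y[[x]]))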
To get this:
df
  Married                  Transportation Color Company Type Driver
1     YES {"Company":"GTS","Type":"Limo"} White     GTS Limo   NULL
2                       {"Driver":"John"} Green    NULL NULL   John
3      NO  {"Type":"Van","Driver":"John"}          NULL  Van   John
I don't know whether there is a more efficient way.
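Because missing keys come back as NULL, the new columns above end up as list columns. If plain character columns with NA are preferable, a minimal sketch could look like the following. It assumes every JSON value is a flat scalar; `pick` is a hypothetical helper name, and the toy df only mirrors the example above.

```r
library(jsonlite)

# Toy data mirroring the example above (an assumption, not the real data)
df <- data.frame(
  Married = c("YES", "", "NO"),
  Transportation = c('{"Company":"GTS","Type":"Limo"}',
                     '{"Driver":"John"}',
                     '{"Type":"Van","Driver":"John"}'),
  Color = c("White", "Green", ""),
  stringsAsFactors = FALSE
)

# Parse each JSON string and collect every key seen in any row
l <- lapply(df$Transportation, fromJSON)
n <- unique(unlist(lapply(l, names)))

# pick() is a hypothetical helper: value for key k, or NA when the key is absent
pick <- function(y, k) {
  if (is.null(y[[k]])) NA_character_ else as.character(y[[k]])
}

# One character column per key; vapply enforces exactly one string per row
df[n] <- lapply(n, function(k) vapply(l, function(y) pick(y, k), character(1)))
```

The trade-off is that vapply stops with an error if a value is not a single scalar, so nested objects or arrays would need separate handling.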
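EDIT (based on the added information about malformed JSON in the real data)
In case the Transportation column of the original data contains malformed JSON, here is one way to handle it.
The original data frame is as follows: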
df <- read.table(text = 'Married,Transportation,Color
YES,"{""Company"":""GTS"",""Type"":""Limo""}",White
,"{""Driver"":""John""}",Green
NO,"{""Type"":""Van"",""Driver"":""John""}",',
header = TRUE, sep = ',', stringsAsFactors = FALSE)
Row-bind an extra row whose JSON is malformed by one extra " character:
df <- rbind(df, data.frame(Married = 'NO',
Transportation = '{"Company": ""GTLS"}',
Color = 'Red'))
The new df looks like this (note the malformed JSON in row 4):
  Married                  Transportation Color
1     YES {"Company":"GTS","Type":"Limo"} White
2                       {"Driver":"John"} Green
3      NO  {"Type":"Van","Driver":"John"}
4      NO            {"Company": ""GTLS"}   Red
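Now use this to pull every nested JSON key out into its own column:
# Parse each JSON string; malformed JSON yields NA instead of stopping with an error
l <- lapply(df$Transportation, function(x) tryCatch({fromJSON(x)}, error = function(e) NA))
# Collect every key that appears in any successfully parsed row
n <- unique(unlist(sapply(l, names)))
# Add one column per key; rows whose parse failed have no names and get NULL
df[, n] <- lapply(n, function(x)
  sapply(l, function(y)
    if (!is.null(names(y))) y[[x]]))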
The output is as follows:
  Married                  Transportation Color Company Type Driver
1     YES {"Company":"GTS","Type":"Limo"} White     GTS Limo   NULL
2                       {"Driver":"John"} Green    NULL NULL   John
3      NO  {"Type":"Van","Driver":"John"}          NULL  Van   John
4      NO            {"Company": ""GTLS"}   Red    NULL NULL   NULL
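With this approach a malformed row is silently filled with NULL, so it can help to record which rows failed to parse. A rough sketch, reusing the same tryCatch pattern; the `json` vector and the `parse_ok` name are illustrative assumptions, not part of the original data:

```r
library(jsonlite)

# Illustrative input: the last string has the stray extra quote
json <- c('{"Company":"GTS","Type":"Limo"}',
          '{"Driver":"John"}',
          '{"Company": ""GTLS"}')

# Same pattern as above: a parse failure becomes NA instead of an error
l <- lapply(json, function(x) tryCatch(fromJSON(x), error = function(e) NA))

# A parsed JSON object with keys is a named list; the NA fallback has no names
parse_ok <- vapply(l, function(y) !is.null(names(y)), logical(1))

# Indices of the rows whose JSON could not be parsed
bad_rows <- which(!parse_ok)
```

Such a flag could be stored as an extra column (for example `df$parse_ok`) so the bad rows can be inspected or repaired separately.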
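Why are you so opposed to parsing? – hrbrmstr
@hrbrmstr I just don't think parsing is an efficient approach. I have about 30 different JSON objects, and their keys/values come in different orders, etc. – user8010356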