2017-08-27

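How to parse a string containing parsable fields into columns that can be added to a data frame

I have a data frame. In each row of the data frame, the last column is a string (named data_listing). The data_listing string is itself a series of key:value pairs separated by commas. Here is an example of two of the strings: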

> data_listing[1:2] 
[1] "id:4006422,memberId:2932850,price:999,make:Chevrolet,model:Cobalt,makeYear:2009,trim:LT,mileage:142000,sellerType:For Sale By Owner,dealerOptions:null,index:2"                                                                    
[2] "id:3987513,memberId:67473,price:26799,make:Audi,model:S5,makeYear:2013,trim:Prestige,mileage:44673,sellerType:Dealership,dealerOptions:{options:{VDPcarousel:true,allowUsed:true,calculator:true,carFaxIntegration:true,featuredCarousel:true,feed:true,homepageSpotlight:0,inlineSpotlight:11,limit:-1,map:true,monsterAds:true,pop:2,priceReduced:true,refresh:7,wrap:true,chat:false,inventoryComparison:true,standardFeatured:3}},index:3" 

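I want to create one column in the data frame for each value in the data_listing string. Each column should use the key as its name.

If I run strsplit(data_listing, ","), I get a list with one element per string. Each list element is a character vector of "key:value" pairs.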
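For illustration, here is a minimal sketch of what that split returns. The input `x` is a shortened, made-up stand-in for data_listing, not the real data:

```r
# Shortened, made-up input in the same "key:value,key:value" shape
x <- c("id:1,make:Audi,model:S5", "id:2,make:Chevrolet,model:Cobalt")

# fixed = TRUE treats "," as a literal string rather than a regular expression
pairs <- strsplit(x, ",", fixed = TRUE)

length(pairs)   # one list element per input string
pairs[[1]]      # "id:1" "make:Audi" "model:S5"
```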

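I am reluctant to write a for loop that greps each sublist element and adds the value to the matching column in the original data frame, but that is the only way I have been able to figure out.

I have looked at transform and tidyr::separate(), but those seem suited to pulling a single item out of a string, not 28 values.

How would you solve this?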

Answer

I would do something like this:

data_listing <- c("id:4006422,memberId:2932850,price:999,make:Chevrolet,model:Cobalt,makeYear:2009,trim:LT,mileage:142000,sellerType:For Sale By Owner,dealerOptions:null,index:2", 
        "id:3987513,memberId:67473,price:26799,make:Audi,model:S5,makeYear:2013,trim:Prestige,mileage:44673,sellerType:Dealership,dealerOptions:{options:{VDPcarousel:true,allowUsed:true,calculator:true,carFaxIntegration:true,featuredCarousel:true,feed:true,homepageSpotlight:0,inlineSpotlight:11,limit:-1,map:true,monsterAds:true,pop:2,priceReduced:true,refresh:7,wrap:true,chat:false,inventoryComparison:true,standardFeatured:3}},index:3") 

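library(tidyverse) 

# custom function: parse a single element of data_listing into a one-row tibble 
parser <- function(x) { 
    # one "key:value" string per row, in a column named `value` 
    # (tibble() replaces the deprecated as.tibble()) 
    tibble(value = unlist(strsplit(x, ","))) %>% 
     # by default separate() splits on any run of non-alphanumeric characters, so it 
     # warns and keeps only the first two pieces: "For Sale By Owner" becomes "For", 
     # "-1" becomes "1", and the nested dealerOptions value becomes "options" 
     separate(value, c("colnames", "values")) %>% 
     spread(colnames, values) 
} 

map_dfr(data_listing, parser) # apply to each element, then row-bind the results 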

# console ... 
# A tibble: 2 x 28 
dealerOptions  id index  make makeYear memberId mileage model price 
<chr> <chr> <chr>  <chr> <chr> <chr> <chr> <chr> <chr> 
1   null 4006422  2 Chevrolet  2009 2932850 142000 Cobalt 999 
2  options 3987513  3  Audi  2013 67473 44673  S5 26799 
# ... with 19 more variables: sellerType <chr>, trim <chr>, allowUsed <chr>, 
# calculator <chr>, carFaxIntegration <chr>, chat <chr>, featuredCarousel <chr>, 
# feed <chr>, homepageSpotlight <chr>, inlineSpotlight <chr>, 
# inventoryComparison <chr>, limit <chr>, map <chr>, monsterAds <chr>, pop <chr>, 
# priceReduced <chr>, refresh <chr>, standardFeatured <chr>, wrap <chr> 
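
If the full values matter (for example "For Sale By Owner" or a negative limit), one possible variant is to split each pair at its first colon only. The following is a base R sketch, not the method used above. The names `listings`, `parse_listing`, and `parsed` are hypothetical, the input is shortened and made up, and it assumes the only nesting is the `dealerOptions:{options:{...}}` wrapper and that no value contains a comma:

```r
# Shortened, made-up input in the same shape as data_listing
listings <- c(
  "id:1,make:Chevrolet,sellerType:For Sale By Owner,dealerOptions:null,index:2",
  "id:2,make:Audi,sellerType:Dealership,dealerOptions:{options:{map:true,limit:-1}},index:3"
)

# Parse one string into a named character vector (names are the keys)
parse_listing <- function(x) {
  # Assumption: drop the nesting wrapper and braces so the nested options
  # become ordinary top-level "key:value" pairs
  flat <- gsub("dealerOptions:{options:{", "", x, fixed = TRUE)
  flat <- gsub("}", "", flat, fixed = TRUE)
  pairs <- strsplit(flat, ",", fixed = TRUE)[[1]]
  # Split at the first colon only, so values keep their spaces and signs
  keys <- sub(":.*$", "", pairs)
  vals <- sub("^[^:]*:", "", pairs)
  setNames(vals, keys)
}

rows <- lapply(listings, parse_listing)
all_keys <- unique(unlist(lapply(rows, names)))

# Index each row by key name; keys missing from a row come back as NA
mat <- do.call(rbind, lapply(rows, function(r) unname(r[all_keys])))
colnames(mat) <- all_keys
parsed <- as.data.frame(mat, stringsAsFactors = FALSE)
```

Every column in `parsed` is character, and a row without dealer options simply has NA in those columns. The result has one row per input string, so it could be attached to the original data frame with cbind().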

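Perfect. The second-to-last element of the string (dealerOptions) is a complex element that contains sub-elements. I will have to follow your logic and repeat what you did to spread the dealer options into new data columns whenever dealer options are present.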