2014-11-04 90 views
4

Creating a new data frame in R

customer_key item_key units 
2669699   16865 1.00 
2669699   16866 1.00 
2669699   46963 2.00 
2685256   55271 1.00 
2685256   43458 1.00 
2685256   54977 1.00 
2685256    2533 1.00 
2685256   55011 1.00 
2685256   44785 2.00 

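I have data in this format, but I want one row per unique customer_key, with the unique values of item_key as the other column names and units as the values, like this: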

customer_key  '16865' '16866' '46963' '55271' '43458' '54977' '2533' 
    2669699   1.00  1.00  1.00  0.00  0.00  0.00  0.00 
    2685256   0.00  0.00  0.00  1.00  1.00  1.00  2.00 

Please help me reshape my data for cluster analysis.

Answers

3

Here is one approach.

library(tidyr) 

spread(mydf,item_key, units, fill = 0) 

# customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271 
#1  2669699 0  1  1  0  0  2  0  0  0 
#2  2685256 1  0  0  1  2  0  1  1  1 
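In more recent versions of tidyr, spread() is marked as superseded and pivot_wider() is the suggested replacement. A minimal sketch of the same reshape, assuming tidyr 1.1.0 or later (where values_fill accepts a single value) and the sample data from the question rebuilt here as df:

```r
library(tidyr)

# Sample data from the question, rebuilt so the example is self-contained
df <- data.frame(
  customer_key = c(2669699L, 2669699L, 2669699L, 2685256L, 2685256L,
                   2685256L, 2685256L, 2685256L, 2685256L),
  item_key = c(16865L, 16866L, 46963L, 55271L, 43458L, 54977L,
               2533L, 55011L, 44785L),
  units = c(1, 1, 2, 1, 1, 1, 1, 1, 2)
)

# One row per customer_key, one column per item_key, missing pairs filled with 0
wide <- pivot_wider(df,
                    names_from = item_key,
                    values_from = units,
                    values_fill = 0)
wide
```

Unlike spread(), pivot_wider() keeps the new columns in order of first appearance rather than sorting them, and it returns a tibble.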
+0

This is a nice solution, but I think the pipe is unnecessary here; just 'spread(df, item_key, units, fill = 0)' is enough. It is also interesting that you don't need to specify 'customer_key'. – 2014-11-04 08:12:33

+0

@DavidArenburg Thank you. You are right. I revised my answer. I was still thinking in the 'dcast' way and got a bit confused by the syntax. :) – jazzurro 2014-11-04 08:19:08

2
library(dplyr); library(tidyr) 
df2 <- df %>% arrange(item_key) %>% spread(item_key, units, fill=0) 
df2 
# customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271 
# 1  2669699 0  1  1  0  0  2  0  0  0 
# 2  2685256 1  0  0  1  2  0  1  1  1 

Data

df <- structure(list(customer_key = c(2669699L, 2669699L, 2669699L, 
2685256L, 2685256L, 2685256L, 2685256L, 2685256L, 2685256L), 
    item_key = c(16865L, 16866L, 46963L, 55271L, 43458L, 54977L, 
    2533L, 55011L, 44785L), units = c(1, 1, 2, 1, 1, 1, 1, 1, 
    2)), .Names = c("customer_key", "item_key", "units"), class = "data.frame", row.names = c(NA, 
-9L)) 
3

This is just a simple dcast task. Assuming df is your data set:

library(reshape2) 
dcast(df, customer_key ~ item_key , value.var = "units", fill = 0) 
# customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271 
# 1  2669699 0  1  1  0  0  2  0  0  0 
# 2  2685256 1  0  0  1  2  0  1  1  1 
3

Since the packages have already been covered (+1 to all of you), here are some base R solutions to join the party:

xtabs

xtabs(units ~ customer_key + item_key, df) 
#    item_key 
# customer_key 2533 16865 16866 43458 44785 46963 54977 55011 55271 
#  2669699 0  1  1  0  0  2  0  0  0 
#  2685256 1  0  0  1  2  0  1  1  1 
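Note that xtabs returns a contingency table rather than a data frame. If a plain data frame is needed for the clustering step, one possible follow-up is as.data.frame.matrix(), which keeps the wide layout. A small sketch, with the question's data rebuilt as df:

```r
# Sample data from the question, rebuilt so the example is self-contained
df <- data.frame(
  customer_key = c(2669699L, 2669699L, 2669699L, 2685256L, 2685256L,
                   2685256L, 2685256L, 2685256L, 2685256L),
  item_key = c(16865L, 16866L, 46963L, 55271L, 43458L, 54977L,
               2533L, 55011L, 44785L),
  units = c(1, 1, 2, 1, 1, 1, 1, 1, 2)
)

# Cross-tabulate: sums units for every customer_key / item_key pair
tab <- xtabs(units ~ customer_key + item_key, data = df)

# Convert the 2-d table to a data frame while keeping the wide layout;
# customer_key ends up in the row names, item_key values in the column names
wide <- as.data.frame.matrix(tab)
wide
```

Calling as.data.frame(tab) instead would give the long format back (one row per pair), which is why the .matrix method is used here.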

reshape

reshape(df, direction = "wide", idvar = "customer_key", timevar = "item_key") 
# customer_key units.16865 units.16866 units.46963 units.55271 
# 1  2669699   1   1   2   NA 
# 4  2685256   NA   NA   NA   1 
# units.43458 units.54977 units.2533 units.55011 units.44785 
# 1   NA   NA   NA   NA   NA 
# 4   1   1   1   1   2 
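The reshape result still has NA where the other answers have 0, and the column names carry a units. prefix. A rough base R clean-up, assuming the zeros and the bare item_key column names are what you want (as in the question's desired output):

```r
# Sample data from the question, rebuilt so the example is self-contained
df <- data.frame(
  customer_key = c(2669699L, 2669699L, 2669699L, 2685256L, 2685256L,
                   2685256L, 2685256L, 2685256L, 2685256L),
  item_key = c(16865L, 16866L, 46963L, 55271L, 43458L, 54977L,
               2533L, 55011L, 44785L),
  units = c(1, 1, 2, 1, 1, 1, 1, 1, 2)
)

wide <- reshape(df, direction = "wide",
                idvar = "customer_key", timevar = "item_key")

# Replace the NAs for item/customer pairs that never occurred with 0
wide[is.na(wide)] <- 0

# Drop the "units." prefix so columns are named after item_key only
names(wide) <- sub("^units\\.", "", names(wide))

# Reset the row names (reshape keeps 1 and 4 from the original rows)
rownames(wide) <- NULL
wide
```

The columns stay in order of first appearance in df, so sort them afterwards if the order matters.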
+0

(+1) Good old 'xtabs'. I always forget about it. 'reshape' has always been an awkward function. – 2014-11-04 08:14:40

+0

'xtabs' was not on my mind. Good reminder. Thanks. :) +1 – jazzurro 2014-11-04 08:28:38