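Creating a summary data frame based on counts

I am trying to use a data frame to create a second data frame of summary counts. My original data is structured as follows: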

mydata <- read.table(header=TRUE, text=" 
item type store1 store2 store3 store4 store5 
chair timber 0 1 4 0 6 
chair metal 0 1 4 1 9 
chair upholstered 3 0 0 1 1 
table indoor 1 8 0 1 0 
table outdoor 1 12 2 1 0 
bed single 0 0 2 1 0 
bed double 0 1 1 1 0 
bed queen 1 0 0 1 3 
bed king 5 0 1 3 0")

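I would like my summary data frame to count, for each store, how many types of each furniture item are present, and to give me a total of the types in stock at each store (presence/absence only, not the number of items). It should look like this: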

summary <- read.table(header=TRUE, text=" 
store chair_types table_types bed_types total_types 
store1 1 2 2 5 
store2 2 2 1 5 
store3 2 1 3 6 
store4 2 2 4 8 
store5 3 0 1 4")

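This is easy in Excel, but I am trying to bite the bullet and learn to do it properly. Apologies if this is a duplicate, but I could not find a similar example. Thanks in advance.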

Source

2016-09-28 setbackademic

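We can do this with dplyr/tidyr. After grouping by 'item', loop over the 'store' columns (summarise_each) to get the number of non-zero elements in each 'store' column (sum(. != 0)), convert to 'long' format (gather), paste the substring '_types' onto 'item', spread the 'long' format back to 'wide', and create a 'total' column using rowSums.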

library(dplyr) 
library(tidyr) 
mydata %>% 
    group_by(item) %>% 
    summarise_each(funs(sum(.!=0)), store1:store5) %>% 
    gather(store, val, store1:store5) %>% 
    mutate(item = paste0(item, "_types")) %>% 
    spread(item, val) %>% 
    mutate(total = rowSums(.[-1])) 
# store bed_types chair_types table_types total 
# <chr>  <int>  <int>  <int> <dbl> 
#1 store1   2   1   2  5 
#2 store2   1   2   2  5 
#3 store3   3   2   1  6 
#4 store4   4   2   2  8 
#5 store5   1   3   0  4
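
For readers on a newer tidyverse, the same steps can be sketched with across(), pivot_longer() and pivot_wider(). This is only a rough equivalent, assuming dplyr >= 1.0 and tidyr >= 1.0 are installed; it is not part of the original answer, and the name `result` is just an illustrative choice.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
mydata <- read.table(header = TRUE, text = "
item type store1 store2 store3 store4 store5
chair timber 0 1 4 0 6
chair metal 0 1 4 1 9
chair upholstered 3 0 0 1 1
table indoor 1 8 0 1 0
table outdoor 1 12 2 1 0
bed single 0 0 2 1 0
bed double 0 1 1 1 0
bed queen 1 0 0 1 3
bed king 5 0 1 3 0")

result <- mydata %>%
  group_by(item) %>%
  # count the non-zero entries in every store column, per item
  summarise(across(store1:store5, ~ sum(.x != 0))) %>%
  # long format: one row per item/store pair
  pivot_longer(store1:store5, names_to = "store", values_to = "val") %>%
  mutate(item = paste0(item, "_types")) %>%
  # wide format: one row per store, one column per item
  pivot_wider(names_from = item, values_from = val) %>%
  # total over all columns except the first ('store')
  mutate(total = rowSums(.[-1]))

result
```

The result should have one row per store with the columns bed_types, chair_types, table_types and total.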

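This can also be done by first converting to 'long' format (gather), grouping by 'item' and 'store', getting the number of non-zero elements in each group (summarise), then grouping by 'store' to create the 'Total' column by summing 'val', and finally spread.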

mydata %>% 
    gather(store, val, store1:store5) %>% 
    group_by(item, store) %>% 
    summarise(val = sum(val!=0)) %>% 
    group_by(store) %>% 
    mutate(Total = sum(val)) %>% 
    spread(item, val)

We can also use rowsum and addmargins from base R.

addmargins(t(rowsum(+(mydata[-(1:2)]!=0), mydata[,1])), 2) 
#  bed chair table Sum 
#store1 2  1  2 5 
#store2 1  2  2 5 
#store3 3  2  1 6 
#store4 4  2  2 8 
#store5 1  3  0 4
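
The one-liner above is fairly dense, so here is a sketch of the same computation split into named steps. The intermediate names (nonzero, by_item, by_store, result) are only illustrative and do not appear in the original answer.

```r
# Same data as in the question
mydata <- read.table(header = TRUE, text = "
item type store1 store2 store3 store4 store5
chair timber 0 1 4 0 6
chair metal 0 1 4 1 9
chair upholstered 3 0 0 1 1
table indoor 1 8 0 1 0
table outdoor 1 12 2 1 0
bed single 0 0 2 1 0
bed double 0 1 1 1 0
bed queen 1 0 0 1 3
bed king 5 0 1 3 0")

# Drop the 'item' and 'type' columns and mark presence:
# the comparison gives a logical matrix, unary + turns it into 0/1
nonzero <- +(mydata[-(1:2)] != 0)

# Add up the 0/1 rows within each item (rows: bed, chair, table)
by_item <- rowsum(nonzero, mydata[, 1])

# Transpose so that stores are rows and items are columns
by_store <- t(by_item)

# Append a 'Sum' column holding the total across items for each store
result <- addmargins(by_store, 2)
result
```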

Source

2016-09-28 05:00:01 akrun

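I wrote the following instead of your summarise_each code: 'group_by(mydata, item) %>% summarize_if(is.numeric, sum(. != 0))'. I thought this would work, but I got the following message: 'Error in UseMethod("as.fun_list"): no applicable method for 'as.fun_list' applied to an object of class "c('integer', 'numeric')"'. Any ideas? – jazzurro

This is perfect, akrun. Thank you very much for your help. – setbackademic

@jazzurro I found that this works: 'mydata %>% mutate_each(funs(. != 0), store1:store5) %>% group_by(item) %>% summarise_if(is.logical, sum)' – akrun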
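
A possible reading of the error in the comment above: summarise_if expects a function (or a formula), but 'sum(. != 0)' is most likely evaluated straight away and handed over as a plain number, which is why as.fun_list complains about an integer. A minimal sketch of one way around this, assuming dplyr >= 0.8 where formula-style lambdas are accepted by the scoped verbs (the name `counts` is only illustrative):

```r
library(dplyr)

# Same data as in the question
mydata <- read.table(header = TRUE, text = "
item type store1 store2 store3 store4 store5
chair timber 0 1 4 0 6
chair metal 0 1 4 1 9
chair upholstered 3 0 0 1 1
table indoor 1 8 0 1 0
table outdoor 1 12 2 1 0
bed single 0 0 2 1 0
bed double 0 1 1 1 0
bed queen 1 0 0 1 3
bed king 5 0 1 3 0")

# The leading ~ turns the expression into a lambda, so it is applied to
# each numeric column within each item instead of being evaluated up front
counts <- mydata %>%
  group_by(item) %>%
  summarise_if(is.numeric, ~ sum(. != 0))

counts
```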

The core of what you want can be done easily in base R with the aggregate function from the stats package.

> aggregated <- aggregate(mydata[grep("store",names(mydata))], 
          by = mydata["item"], 
          FUN = function(x) sum(x != 0)) 
> aggregated 
    item store1 store2 store3 store4 store5 
1 bed  2  1  3  4  1 
2 chair  1  2  2  2  3 
3 table  2  2  1  2  0

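The first argument, mydata[grep("store",names(mydata))], selects the "store" columns from your data frame. The second argument, by = mydata["item"], indicates that you want to use "item" to identify the groups in your data frame. Finally, FUN = function(x) sum(x != 0) indicates that you want to count the number of non-zero elements for each item in each store column.

This may be enough, but if you want it formatted more like what you have above, you can do: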
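
To see what FUN does for a single group, here is a tiny sketch using the store1 values of the four bed rows from the question. The vector name `bed_store1` is just for illustration.

```r
# store1 stock for the bed rows: single, double, queen, king
bed_store1 <- c(0, 0, 1, 5)

# x != 0 gives FALSE FALSE TRUE TRUE; sum() counts the TRUEs
n_present <- sum(bed_store1 != 0)
n_present
```

This gives 2, the value in the bed/store1 cell of the aggregated table.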

> summary <- as.data.frame(t(aggregated[-1])) 
> names(summary) <- aggregated[[1]] 
> summary[["total"]] <- rowSums(summary) 
> summary 
     bed chair table total 
store1 2  1  2  5 
store2 1  2  2  5 
store3 3  2  1  6 
store4 4  2  2  8 
store5 1  3  0  4

Source

2016-09-28 05:37:11 Barker
