2015-02-23

Creating rows of data conditionally

Sample dataset:

Price=c(6651, 7255, 25465, 35645, 2556, 3665) 
NumberPurchased=c(25, 30, 156, 250, 12, 16) 
Type=c("A", "A", "C", "C", "B", "B") 
Source=c("GSC", "MYL", "TTC", "ZAF", "CAN", "HLT") 
df1 <- data.frame(Price, NumberPurchased, Type, Source) 

I would like to create a new data frame with two additional variables (ID and PurchaseDate), and with more rows of data depending on the variable Type.

The rules I want to apply: if Type = A, PurchaseDate is "2013", "2014". If Type = B, PurchaseDate is "2013". If Type = C, PurchaseDate is "2013", "2014", "2015".

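If Type is A, divide Price and NumberPurchased by 2 and create 2 rows with the different PurchaseDate values given above. If Type is B, keep the row as it is. If Type is C, divide Price and NumberPurchased by 3 and create 3 rows with the different PurchaseDate values given above.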

So I would like a new dataset like this:

Price=c(3325.5, 3325.5, 3627.5, 3627.5, 8488.3, 8488.3, 8488.3, 11881.6, 11881.6, 11881.6, 2556, 3665) 
NumberPurchased=c(12.5, 12.5, 15, 15, 52, 52, 52, 83.3, 83.3, 83.3, 12, 16) 
Type=c("A", "A", "A", "A", "C", "C", "C", "C", "C", "C","B", "B") 
Source=c("GSC", "GSC", "MYL", "MYL", "TTC","TTC", "TTC", "ZAF", "ZAF","ZAF", "CAN", "HLT") 
PurchaseDate=c("2013", "2014", "2013", "2014", "2013", "2014", "2015", "2013", "2014", "2015", "2013", "2013") 
ID=c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6) 
df2 <- data.frame(Price, NumberPurchased, Type, Source, PurchaseDate, ID) 

Any insights?
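The rules above boil down to a per-Type lookup of how many rows (purchase years) each record expands into, with Price and NumberPurchased split evenly across those rows. A minimal base-R sketch, assuming a hypothetical helper expand_by_type(), a named lookup vector for the counts, years that always start at 2013, and that every original row gets its own ID:

```r
# Sample data from the question
df1 <- data.frame(
  Price = c(6651, 7255, 25465, 35645, 2556, 3665),
  NumberPurchased = c(25, 30, 156, 250, 12, 16),
  Type = c("A", "A", "C", "C", "B", "B"),
  Source = c("GSC", "MYL", "TTC", "ZAF", "CAN", "HLT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expand each row into n_years[Type] rows, split Price
# and NumberPurchased evenly, and number the years from start_year.
# Assumes every Type in df appears in n_years (otherwise k would be NA).
expand_by_type <- function(df, n_years, start_year = 2013) {
  k <- unname(n_years[df$Type])            # repeat count per original row
  idx <- rep(seq_len(nrow(df)), times = k) # original row index of each copy
  out <- df[idx, ]
  out$Price <- out$Price / k[idx]
  out$NumberPurchased <- out$NumberPurchased / k[idx]
  # sequence(k) gives 1..k[1], 1..k[2], ... so years restart for each record
  out$PurchaseDate <- as.character(start_year - 1L + sequence(k))
  out$ID <- idx                            # ID = position of the original row
  rownames(out) <- NULL
  out
}

# Rules from the question: A -> 2 years, B -> 1 year, C -> 3 years
df2 <- expand_by_type(df1, c(A = 2L, B = 1L, C = 3L))
df2
```

Keeping the rule in a lookup vector means a different rule only requires a different vector, not new code.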

Answer


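Here is one possible approach. First, we create an index from Type (the number of rows each record should expand into), then we replicate the rows accordingly, and finally we use the data.table package to compute the new variables.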

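library(data.table) 
# Number of rows per Type: B -> 1, A -> 2, C -> 3
setDT(df1)[, indx := as.numeric(factor(Type, levels = c("B", "A", "C")))] 
# setDT(df1)[, indx := ifelse(Type == "C", 3, 2)] # Alternative index per your comment 
# Replicate each row indx times (.N is the number of rows of df1 here)
df2 <- df1[rep(seq_len(.N), indx)] 

# Within each Source/Type group, .N equals indx and .GRP is the group number
df2[, `:=`(
      Price = Price/.N, 
      PurchaseDate = 2013:(2013 + (.N - 1)), 
      NumberPurchased = NumberPurchased/.N, 
      ID = .GRP 
      ), 
      by = .(Source, Type)][] 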

#   Price NumberPurchased Type Source indx PurchaseDate ID 
# 1: 3325.500  12.50000 A GSC 2   2013 1 
# 2: 3325.500  12.50000 A GSC 2   2014 1 
# 3: 3627.500  15.00000 A MYL 2   2013 2 
# 4: 3627.500  15.00000 A MYL 2   2014 2 
# 5: 8488.333  52.00000 C TTC 3   2013 3 
# 6: 8488.333  52.00000 C TTC 3   2014 3 
# 7: 8488.333  52.00000 C TTC 3   2015 3 
# 8: 11881.667  83.33333 C ZAF 3   2013 4 
# 9: 11881.667  83.33333 C ZAF 3   2014 4 
# 10: 11881.667  83.33333 C ZAF 3   2015 4 
# 11: 2556.000  12.00000 B CAN 1   2013 5 
# 12: 3665.000  16.00000 B HLT 1   2013 6 
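The grouped := step relies on two data.table special symbols: within by = .(Source, Type), .N is the number of rows in the current group (after replication, equal to indx) and .GRP is the group's running number. As a sketch for readers without data.table, the same step can be imitated in base R with ave() for per-group values and match() against the unique keys for the group number. This assumes .GRP numbers groups in order of first appearance (which matches the output above) and that paste(Source, Type) uniquely identifies each original row.

```r
# Sample data from the question
df1 <- data.frame(
  Price = c(6651, 7255, 25465, 35645, 2556, 3665),
  NumberPurchased = c(25, 30, 156, 250, 12, 16),
  Type = c("A", "A", "C", "C", "B", "B"),
  Source = c("GSC", "MYL", "TTC", "ZAF", "CAN", "HLT"),
  stringsAsFactors = FALSE
)

# Same index and row replication as in the answer: B -> 1, A -> 2, C -> 3
indx <- as.numeric(factor(df1$Type, levels = c("B", "A", "C")))
df2 <- df1[rep(seq_len(nrow(df1)), indx), ]

key  <- paste(df2$Source, df2$Type)   # plays the role of by = .(Source, Type)
ones <- rep(1, nrow(df2))
n    <- ave(ones, key, FUN = length)  # like .N: size of each row's group

df2$Price           <- df2$Price / n
df2$NumberPurchased <- df2$NumberPurchased / n
df2$PurchaseDate    <- 2012 + ave(ones, key, FUN = seq_along)  # 2013, 2014, ...
df2$ID              <- match(key, unique(key))                 # like .GRP
rownames(df2) <- NULL
df2
```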

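This solution works, but a follow-up question: would this work if, for Type = A, we divide by 2, for Type = B, we also divide by 2, and for Type = C, we divide by 3? – Tan 2015-02-23 10:31:46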


Yes, just change the indx line to setDT(df1)[, indx := ifelse(Type == "C", 3, 2)] and run the rest of the code as is. I have edited the answer so you can choose whichever index you want. – 2015-02-23 10:34:13
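Under the follow-up rule, B records are also split in two, so every A and B row becomes two rows (dated 2013 and 2014) and only C rows become three. A quick arithmetic check of the alternative index on the question's Type and Price columns:

```r
Type  <- c("A", "A", "C", "C", "B", "B")
Price <- c(6651, 7255, 25465, 35645, 2556, 3665)

indx <- ifelse(Type == "C", 3, 2)  # follow-up rule: A and B -> 2, C -> 3
sum(indx)                          # 14 rows after expansion (12 under the original rule)
Price / indx                       # Price carried by each expanded row
```

The two B records now carry 1278 and 1832.5 per year instead of keeping their full 2556 and 3665.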