R - Number of new customers per year and percentage of customers who also purchased the previous year

I have a very large dataset of customers and the dates (years) on which they made purchases. I would like R to give me:

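the number of new customers per year, and
the percentage of each year's customers who also purchased in the previous year (N-1).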

My data looks like this:

customer_id  year  
12001   2007 
12001   2008 
12001   2009 
12002   2006 
12002   2007 
12003   2005 
...    ...

Each customer made various purchases over a period of time.

The output I would like looks like this:

# Table1 
year no. of new customers 
2005   34 
2006   25 
2007   17 
...   ...

Table 1 reports the number of unique new customers per year; and:

# Table2 
year % of customers that also purchased at (year-1) 
2005  25% 
2006  17% 
...  ...

Table 2 means "of all customers recorded in 2005, 25% were also recorded in 2004; of all customers recorded in 2006, 17% were also recorded in 2005; and so on."

I know the first part is partially answered, but that answer is not for R, and I could not find anything similar elsewhere.

Source

2014-10-12 Billaus

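You need to provide both a dataset and the desired output that corresponds to that dataset. As it stands, you seem to have provided a dataset that is too small and some made-up desired output that is unrelated to it. This usually makes users skip your question and move on, even though it could in theory be answered easily. – 2014-10-12 11:15:39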

Generate some sample data

set.seed(31)
nSamples <- 5000
df <- data.frame(id = sample(12001:12100, nSamples, replace = TRUE),
                 year = sample(2001:2014, nSamples, replace = TRUE))

You can use table() to count how many purchases each customer made in each year

t_purchasePerYear<-table(df$year,df$id)
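As a small illustration of what this table holds, here is a sketch on a hypothetical six-row data frame (the names `toy`, `t_toy` and `active` are illustrative and not part of the answer). Rows are years, columns are customer ids, and each cell counts that customer's purchases in that year.

```r
# Hypothetical toy data: customer 1 buys twice in 2001 and once in 2002,
# customer 2 buys in 2002 and 2003, customer 3 buys only in 2003.
toy <- data.frame(id   = c(1, 1, 1, 2, 2, 3),
                  year = c(2001, 2001, 2002, 2002, 2003, 2003))

# Rows are years, columns are customer ids, cells are purchase counts
t_toy <- table(toy$year, toy$id)
t_toy

# A customer was active in a year when the count is positive
active <- t_toy > 0

# Number of distinct customers per year
rowSums(active)
```

Because the cells are counts rather than 0/1 flags, the later steps compare them with `> 0` before summing.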

Then you can get the year-over-year change in the number of customers

nCustPerYear <- apply(t_purchasePerYear,1,function(x){sum(x>0)}) 
nCustPerYear 
nYear = length(nCustPerYear) 
nNewCustPerYear <- nCustPerYear[2:nYear] - nCustPerYear[1:(nYear-1)] 
nNewCustPerYear
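Note that the difference computed above is the net change in the customer count, so it can be negative and does not separate first-time customers from returning ones. If the goal is the number of first-time customers per year, one possible approach is to take each customer's earliest purchase year and tabulate it. This is a minimal sketch assuming the same simulated `df`; `firstYear` and `nFirstTime` are illustrative names.

```r
# Same simulated data as in the answer above
set.seed(31)
nSamples <- 5000
df <- data.frame(id   = sample(12001:12100, nSamples, replace = TRUE),
                 year = sample(2001:2014, nSamples, replace = TRUE))

# Earliest purchase year of each customer
firstYear <- tapply(df$year, df$id, min)

# Number of customers whose first purchase falls in each year;
# the factor levels keep years without new customers as zero counts
nFirstTime <- table(factor(firstYear, levels = sort(unique(df$year))))
nFirstTime
```

The counts sum to the number of distinct customers, which is a quick way to check the result.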

Make a second table that flags the customers who purchased both this year and the previous year

t_didBuyThisYearAndLast <- t_purchasePerYear[2:nYear,]>0 & t_purchasePerYear[1:(nYear-1),]>0

Now get the number of customers who purchased both this year and last

nBuyThisYearAndLast <- apply(t_didBuyThisYearAndLast,1,function(x){sum(x)}) 
nBuyThisYearAndLast

Divide by the number of customers in each year to get the percentage

pcntBuyThisYearAndLast <- nBuyThisYearAndLast/nCustPerYear[2:nYear] *100 
pcntBuyThisYearAndLast
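The steps above can be collected into one function and checked on data small enough to verify by hand. This is only a sketch: `pctBoughtPreviousYear` and `toy` are hypothetical names, and it assumes every calendar year in the range has at least one purchase, so that adjacent table rows really are consecutive years.

```r
# Hypothetical helper wrapping the steps above; expects columns `id` and `year`
pctBoughtPreviousYear <- function(d) {
  # TRUE where a customer bought at least once in that year
  tab   <- unclass(table(d$year, d$id)) > 0
  nYear <- nrow(tab)
  thisYear <- tab[2:nYear, , drop = FALSE]
  lastYear <- tab[1:(nYear - 1), , drop = FALSE]
  # Share of this year's customers who were also active the year before
  100 * rowSums(thisYear & lastYear) / rowSums(thisYear)
}

# Customer 1 buys in 2001 and 2002, customer 2 in 2002 and 2003,
# customer 3 only in 2003
toy <- data.frame(id   = c(1, 1, 2, 2, 3),
                  year = c(2001, 2002, 2002, 2003, 2003))
pctBoughtPreviousYear(toy)
```

In the toy data each of 2002 and 2003 has two customers, one of whom also bought the year before, so both years come out at 50.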

Source

2014-10-12 11:01:46 user3969377

Unless I am misunderstanding something, the following might help:

tab = table(DF) 
tab 
#            year
# customer_id 2005 2006 2007 2008 2009 2010
#       12001    0    0    1    1    1    0
#       12002    0    1    1    0    0    0
#       12003    1    0    0    0    0    0
#       12004    1    0    1    0    0    0
#       12006    0    0    0    1    0    0
#       12007    0    0    0    1    1    0
#       12008    0    0    0    0    0    1

#new customers per year 
as.data.frame(table(factor(colnames(tab)[max.col(tab, "first")], colnames(tab)))) 
#   Var1 Freq
# 1 2005    2
# 2 2006    1
# 3 2007    1
# 4 2008    2
# 5 2009    0
# 6 2010    1

#percentage of each year's customers who also purchased the previous year
as.data.frame(as.table((colSums((tab[, -1] == tab[, -ncol(tab)]) * (tab[, -1] == 1))/colSums(tab[, -1])) * 100)) 
#   Var1      Freq
# 1 2006   0.00000
# 2 2007  33.33333
# 3 2008  33.33333
# 4 2009 100.00000
# 5 2010   0.00000
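One caveat on the `max.col` step above: it returns the column holding the largest count in each row, so it finds the first purchase year only when every cell of the table is 0 or 1. If a customer can appear more than once in the same year, a possible workaround is to drop duplicate customer-year rows before tabulating. This is a sketch on hypothetical data; `DF2` and `tab2` are illustrative names.

```r
# Hypothetical data: customer 12001 buys once in 2007 and twice in 2008
DF2 <- data.frame(customer_id = c(12001L, 12001L, 12001L, 12002L),
                  year        = c(2007L, 2008L, 2008L, 2008L))

# Dropping duplicate customer-year rows keeps every cell at 0 or 1
tab2 <- table(unique(DF2))
tab2

# First year with a purchase for each customer, tabulated per year
newCust <- as.data.frame(
  table(factor(colnames(tab2)[max.col(tab2, "first")], colnames(tab2)))
)
newCust
```

Without `unique()`, the row for customer 12001 would hold counts 1 and 2, and `max.col` would pick 2008 instead of 2007.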

The data frame "DF" used above is:

DF = structure(list(customer_id = c(12001L, 12001L, 12001L, 12002L, 
12002L, 12003L, 12004L, 12004L, 12006L, 12007L, 12007L, 12008L 
), year = c(2007L, 2008L, 2009L, 2006L, 2007L, 2005L, 2005L, 
2007L, 2008L, 2008L, 2009L, 2010L)), .Names = c("customer_id", 
"year"), class = "data.frame", row.names = c(NA, -12L))

Source

2014-10-12 11:17:55
