2015-10-16

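How can I calculate the customer acquisition rate by finding the overlap with previous years?

I have a data set named CustOrder about customer purchases from 2008-2013 with the following information (this is only part of the data):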

CustID OrderYear Amount 
101102 2008  22429.00 
101102 2009  11045.00 
101435 2010  10740.77 
101435 2011  73669.50 
107236 2012  162123.50 
101416 2010  8102.00 
101416 2011  360.00 
101416 2012  36576.00 
101416 2013  1960.00 
101467 2012  997.00 
101604 2010  2971.53 
101664 2009  91.94 
101664 2011  130.93 
......... 

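Some customers keep purchasing every year (e.g. 101416), while others purchase only in certain years (e.g. 101664). I want to work out the customer acquisition rate, i.e. the number of new customers gained in each year and their share of the running customer total (for customers who did not purchase continuously, only the first purchase counts). For example,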

Year Customer TotalCustomerNumber NewCustomerRate 
2008 5   5      0% 
2009 3   8      37% 
2010 4   12     33% 
2011 2   14     14% 
2012 3   17     17% 
2013 2   19     10% 
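The rates in the example table appear to be each year's new customers divided by the running customer total, truncated to a whole percent, with the first year reported as 0%. A minimal sketch that checks this reading (the vector names are illustrative and not from the question):

```r
# New customers per year, copied from the example table.
years <- 2008:2013
new_customers <- c(5, 3, 4, 2, 3, 2)

# Running total of distinct customers seen so far.
total_customers <- cumsum(new_customers)

# Share of the running total that is new in each year, in percent.
rate <- new_customers / total_customers * 100
rate[1] <- 0  # the table reports 0% for the first year

example <- data.frame(Year = years,
                      Customer = new_customers,
                      TotalCustomerNumber = total_customers,
                      NewCustomerRate = paste0(floor(rate), "%"))
print(example)
```

Under this assumption the sketch reproduces the TotalCustomerNumber and NewCustomerRate columns shown in the table.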

Does anyone have any ideas or hints on how to do this?

I appreciate any help!

+0

Subset by year, use unique on the customer ID, then count the occurrences of the unique IDs. – user1945827
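One possible reading of this suggestion, sketched on the sample rows from the question (the data frame is rebuilt here by hand, and `active_per_year` is an illustrative name). Note that this counts the distinct customers who ordered in each year, not the new ones; customers seen in earlier years would still have to be excluded to get acquisitions.

```r
# Sample rows from the question (Amount omitted, as it is not needed here).
CustOrder <- data.frame(
  CustID = c(101102, 101102, 101435, 101435, 107236, 101416, 101416,
             101416, 101416, 101467, 101604, 101664, 101664),
  OrderYear = c(2008, 2009, 2010, 2011, 2012, 2010, 2011,
                2012, 2013, 2012, 2010, 2009, 2011)
)

# Split the customer IDs by year, then count the unique IDs in each subset.
active_per_year <- sapply(split(CustOrder$CustID, CustOrder$OrderYear),
                          function(ids) length(unique(ids)))
print(active_per_year)
```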

+0

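I believe you would create a loop that goes through each year and calculates the new customer acquisitions for that year. Using the current year inside the loop, you can subset the data down to only the IDs from before the current year, with something like `df[df$Year` ... – giraffehere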

Answers

1

It took me a while to work out a solution, but this approach should work. See the comments in the code for details:

# Setting a seed for reproducibility. 
set.seed(10) 

# Setting what years we want allowed. 
validYears <- 2008:2015 

# Generating a "fake" dataset for testing purposes. 
custDF <- data.frame(CustID = abs(as.integer(rnorm(250, 50, 50))), OrderYear = 0, Amount = abs(rnorm(250, 100, 1000))) 
custDF$OrderYear <- sapply(custDF$OrderYear, function(x) x <- sample(validYears, 1)) # Adding random years for each purchase. 

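# Initializing a new data frame to store the output values. 
# stringsAsFactors = FALSE keeps NewCustomerRate a character column; in R versions before 4.0 
# it would otherwise become a factor and the paste0() assignment in the loop would produce NA. 
newDF <- data.frame(Year = validYears, NewCustomers = 0, RunningNewCustomerTotal = 0, NewCustomerRate = "", stringsAsFactors = FALSE) 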
custTotal <- 0 # Initializing a variable to be used in the loop. 
firstIt <- 1 # Denotes the first iteration. 

for (year in validYears) { # For each year in the data set (the range was chosen arbitrarily above, before generating the data) 

    # Getting the unique IDs of the current year and the unique IDs of all past years. 
    currentIDs <- unique(custDF[custDF$OrderYear == year, "CustID"]) 
    pastIDs <- unique(custDF[custDF$OrderYear < year, "CustID"]) 

    if (firstIt == 1) { pastIDs <- c(-1) } # Setting a condition for the first iteration. 

    newIDs <- currentIDs[!(currentIDs %in% pastIDs)] # Getting all IDs that have not been previously used. 
    numNewIDs <- length(newIDs) # Getting the number of new IDs. 
    custTotal <- custTotal + numNewIDs # Getting the running total. 

    # Adding the new data into the data frame. 
    newDF[newDF$Year == year, "NewCustomers"] <- numNewIDs 
    newDF[newDF$Year == year, "RunningNewCustomerTotal"] <- custTotal 

    # Getting the rate. 
    if (firstIt == 1) { 
        NewCustRate <- 0 
        firstIt <- 2 
    } else { 
        NewCustRate <- (1 - (newDF[newDF$Year == (year - 1), "RunningNewCustomerTotal"] / custTotal)) * 100 
    } 

    # Inputting the new data. Format and round are just getting the decimals down. 
    newDF[newDF$Year == year, "NewCustomerRate"] <- paste0(format(round(NewCustRate, 2)), "%") 

} 

With the output:

> newDF 
    Year NewCustomers RunningNewCustomerTotal NewCustomerRate 
1 2008   32      32    0% 
2 2009   22      54    41% 
3 2010   19      73    26% 
4 2011   14      87    16% 
5 2012   7      94   7.4% 
6 2013   3      97   3.1% 
7 2014   9      106   8.5% 
8 2015   5      111   4.5% 

Hope this helps!
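For comparison, the same quantities can be computed without an explicit loop by first finding the year of each customer's first purchase. This is a sketch of an alternative, not the approach used in the answer above; it rebuilds the sample rows from the question by hand, and the names `first_year` and `result` are illustrative.

```r
# Sample rows from the question.
CustOrder <- data.frame(
  CustID = c(101102, 101102, 101435, 101435, 107236, 101416, 101416,
             101416, 101416, 101467, 101604, 101664, 101664),
  OrderYear = c(2008, 2009, 2010, 2011, 2012, 2010, 2011,
                2012, 2013, 2012, 2010, 2009, 2011),
  Amount = c(22429.00, 11045.00, 10740.77, 73669.50, 162123.50, 8102.00,
             360.00, 36576.00, 1960.00, 997.00, 2971.53, 91.94, 130.93)
)

# Year of each customer's first purchase (later purchases are ignored).
first_year <- as.vector(tapply(CustOrder$OrderYear, CustOrder$CustID, min))

# Count first purchases per year; the factor levels keep years with no new customers.
years <- min(CustOrder$OrderYear):max(CustOrder$OrderYear)
new_customers <- as.vector(table(factor(first_year, levels = years)))

# Running total and the share of it that is new in each year.
running_total <- cumsum(new_customers)
rate <- new_customers / running_total * 100
rate[1] <- 0  # same convention as above: the first year is reported as 0%

result <- data.frame(Year = years,
                     NewCustomers = new_customers,
                     RunningNewCustomerTotal = running_total,
                     NewCustomerRate = paste0(round(rate, 2), "%"))
print(result)
```

Because the earliest year in the data always contains at least one first purchase, the running total is never zero, so the division is safe.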

+0

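Thanks, giraffehere! It helped me a lot! I changed just one small thing: `newDF <- data.frame(Year = validYears, NewCustomers = 0, RunningNewCustomerTotal = 0, NewCustomerRate = 0)`. Otherwise I got the error message 'invalid factor level, NA generated'. –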

+0

No worries. I had initialized that column as character class because I knew it would have to be concatenated with "%" later, but whatever works. – giraffehere
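A third option, sketched here as an assumption rather than something either commenter proposed, is to keep the rate numeric while computing and add the "%" only when displaying it. The values below are made up for illustration.

```r
# Hypothetical numeric rates, kept as numbers so they can still be sorted or averaged.
rates <- c(0, 37.5, 33.333)

# Format for display only: one decimal place followed by a literal percent sign.
labels <- sprintf("%.1f%%", rates)
print(labels)
```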
