I spent some time working out a solution, and this approach should work. See the comments in the code for details:
# Setting a seed for reproducibility.
set.seed(10)
# Setting what years we want allowed.
validYears <- 2008:2015
# Generating a "fake" dataset for testing purposes.
custDF <- data.frame(CustID = abs(as.integer(rnorm(250, 50, 50))), OrderYear = 0, Amount = abs(rnorm(250, 100, 1000)))
custDF$OrderYear <- sapply(custDF$OrderYear, function(x) x <- sample(validYears, 1)) # Adding random years for each purchase.
# Initializing a new data frame to store the output values.
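# stringsAsFactors = FALSE keeps NewCustomerRate a character column (before R 4.0 it would default to a factor).
newDF <- data.frame(Year = validYears, NewCustomers = 0, RunningNewCustomerTotal = 0, NewCustomerRate = "", stringsAsFactors = FALSE)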
custTotal <- 0 # Initializing a variable to be used in the loop.
firstIt <- 1 # Denotes the first iteration.
for (year in validYears) { # For each year in the data set (which I arbitrarily defined before making the dataset)
  # Getting the unique IDs of the current year and the unique IDs of all past years.
  currentIDs <- unique(custDF[custDF$OrderYear == year, "CustID"])
  pastIDs <- unique(custDF[custDF$OrderYear < year, "CustID"])
  if (firstIt == 1) { pastIDs <- c(-1) } # Setting a condition for the first iteration.
  newIDs <- currentIDs[!(currentIDs %in% pastIDs)] # Getting all IDs that have not been previously used.
  numNewIDs <- length(newIDs) # Getting the number of new IDs.
  custTotal <- custTotal + numNewIDs # Getting the running total.
  # Adding the new data into the data frame.
  newDF[newDF$Year == year, "NewCustomers"] <- numNewIDs
  newDF[newDF$Year == year, "RunningNewCustomerTotal"] <- custTotal
  # Getting the rate: the share of the running total that was added this year.
  if (firstIt == 1) {
    NewCustRate <- 0
    firstIt <- 2
  } else {
    NewCustRate <- (1 - (newDF[newDF$Year == (year - 1), "RunningNewCustomerTotal"] / custTotal)) * 100
  }
  # Inputting the new data. Format and round are just getting the decimals down.
  newDF[newDF$Year == year, "NewCustomerRate"] <- paste0(format(round(NewCustRate, 2)), "%")
}
With the output:
> newDF
  Year NewCustomers RunningNewCustomerTotal NewCustomerRate
1 2008           32                      32              0%
2 2009           22                      54             41%
3 2010           19                      73             26%
4 2011           14                      87             16%
5 2012            7                      94            7.4%
6 2013            3                      97            3.1%
7 2014            9                     106            8.5%
8 2015            5                     111            4.5%
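If you would rather avoid the loop, here is a rough vectorized sketch of the same idea. It is not the code that produced the output above: it swaps the loop for tapply() plus table(), and the small `orders` data frame and the name `years` are made up purely for illustration. It also assumes the rate can be written as new customers divided by the running total, which is algebraically the same as (1 - previous total / current total) used in the loop.

```r
# Hypothetical toy data, not the random data set generated above.
orders <- data.frame(
  CustID    = c(1, 2, 2, 3, 1, 4, 3, 5),
  OrderYear = c(2008, 2008, 2009, 2009, 2010, 2010, 2010, 2010)
)
years <- 2008:2010

# First year in which each customer placed an order.
firstYear <- tapply(orders$OrderYear, orders$CustID, min)

# Count customers by their first year; factor() keeps years with zero new customers.
newCustomers <- as.vector(table(factor(firstYear, levels = years)))
runningTotal <- cumsum(newCustomers)

# Same rate as the loop, with the first year fixed at 0 to match its convention.
rate <- c(0, 100 * newCustomers[-1] / runningTotal[-1])

result <- data.frame(Year = years,
                     NewCustomers = newCustomers,
                     RunningNewCustomerTotal = runningTotal,
                     NewCustomerRate = round(rate, 2))
print(result)
```

On the toy data, customers 1 and 2 are new in 2008, customer 3 in 2009, and customers 4 and 5 in 2010, so the running totals are 2, 3 and 5. The sketch assumes at least one customer exists by the second year; otherwise the division would give NaN.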
Hope this helps!
Subset by year; use unique on the customer ID, then count the occurrences of the unique IDs. – user1945827
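I believe you would create a loop that iterates over each year and computes the new customer acquisitions for that year. Using the current year inside the loop, you can subset the data to only the IDs from years before the current one, 'df[df$Year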
giraffehere