Relational databases in R

I am analyzing data from an e-commerce website, and everything is stored in a relational format.

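I want to calculate the probability that a product is bought by a user (the number of times the product was ordered divided by the number of orders placed by that user).

So the final result would look like this: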

User | Product | Probability
1 | 2323 | 0.32

userid <-c(1,1,1,1,2,2,2,2) 
product<-c(876,324,122,65,44,324,54,23) 
probability <- c(0.32,0.10,0.25,0.5,0.7,0.8,0.45,0.05) 
exampleresult <- data.frame(userid,product,probability)

Sample data:

orderid <- c(100,111,122,134,144,152,164,177,188,199,200,251,222) 
userid <- c(1,1,1,2,2,2,2,3,3,4,5,5,6) 
orders<-data.frame(orderid,userid) 

productid <- c(66,55,44,54,32,23,65,122,656,324,876,342) 
productname<-c('soda','corn','apple','milk','juice','water','potato','banana','orange','fish','meat','salami') 
products<-data.frame(productid,productname) 

orderid <- c(100,100,100,100,100,111,111,111,122,134,134,134,134,144,144,144,144,144,144,152,164,177,188,188,188,188,199,200,251,222) 
productid <- c(55,54,324,23,324,54,876,324,122,65,65,44,324,54,23,44,324,23,66,876,65,55,32,122,66,66,44,54,66,65) 
ordpro<- data.frame(orderid,productid)

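Every time a user buys something, an order is created containing all the products he or she bought. A user can have multiple orders, and each order can have multiple products.

This is what I am currently trying, without success. Given the number of users, it also takes a long time to run.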

x <- numeric(length(unique(orders$userid))) 
y <- list() 
for (i in 1:numeric(length(unique(orders$userid)))) { 
    y[[i]] <- table(ordpro[ordpro$orderid %in% orders[orders$userid == "orderid"], "productid"])/length(orders,[orders$userid == i,"orderid"]) 
    x[i] <- length(y[[i]]) 
} 
mydata <- data.frame(x,y)

Source

2017-06-19 italo

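Have you considered joining the tables, or is that not feasible in your situation? – RobertMc

I have not. It would be a huge data frame, but it is possible. – italo

@italo have a look at dplyr, especially the new version, which lets you work with a database without loading it into memory in R. https://github.com/tidyverse/dplyr – RobertMc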

Check the merge function to join your data frames:

http://www.statmethods.net/management/merging.html

How to join (merge) data frames (inner, outer, left, right)?

Avoid using loops.
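A minimal base R sketch of the merge-based, loop-free approach, using the sample data from the question. It assumes that a product counts at most once per order, so the ratio stays between 0 and 1; that reading of "number of times the product was ordered" is an assumption, as are the names `order_lines`, `n_with`, `n_orders` and `result`.

```r
# Sample data from the question
orders <- data.frame(
  orderid = c(100,111,122,134,144,152,164,177,188,199,200,251,222),
  userid  = c(1,1,1,2,2,2,2,3,3,4,5,5,6)
)
ordpro <- data.frame(
  orderid   = c(100,100,100,100,100,111,111,111,122,134,134,134,134,144,144,
                144,144,144,144,152,164,177,188,188,188,188,199,200,251,222),
  productid = c(55,54,324,23,324,54,876,324,122,65,65,44,324,54,23,
                44,324,23,66,876,65,55,32,122,66,66,44,54,66,65)
)

# Attach the user to every order line (inner join on orderid)
order_lines <- merge(ordpro, orders, by = "orderid")

# Assumption: a product counts once per order, so drop repeated lines
order_lines <- unique(order_lines[, c("userid", "orderid", "productid")])

# Number of orders containing each product, per user
n_with <- aggregate(orderid ~ userid + productid, data = order_lines, FUN = length)
names(n_with)[3] <- "orders_with_product"

# Total number of orders per user
n_orders <- aggregate(orderid ~ userid, data = orders, FUN = length)
names(n_orders)[2] <- "user_orders"

# Combine both counts and compute the share of the user's orders
result <- merge(n_with, n_orders, by = "userid")
result$probability <- result$orders_with_product / result$user_orders
result <- result[order(result$userid, result$productid), ]
```

Everything here is vectorised, so no explicit loop over users is needed. If repeated lines within one order should count separately, the `unique()` step can be dropped, but the ratio may then exceed 1.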

Source

2017-06-19 20:11:31 napsta32

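I did the following: it computes the total number of orders per product and the total per user within each product, and then divides the user count by the product total. Is this what you are looking for? It is probably not the most efficient way, and I think I should use 'summarise' there, but maybe it helps you find your own answer.

library(dplyr)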

orders_user <- merge(ordpro, orders, by = "orderid") 

bought_user <- merge(orders_user, products, by = "productid") 

df <- bought_user %>% group_by(productid) %>% mutate(tot_prod = n()) %>% 
    group_by(userid, productid) %>% mutate(tot_user = n()) %>% 
    mutate(prop = tot_user/tot_prod) 

df <- unique(df %>% select(userid, productid, productname, 
           tot_prod, tot_user, prop) %>% 
           arrange(userid, productid)) 

head(df, 10) 

   userid productid productname tot_prod tot_user      prop
    <dbl>     <dbl>      <fctr>    <int>    <int>     <dbl>
1       1        23       water        3        1 0.3333333
2       1        54        milk        4        2 0.5000000
3       1        55        corn        2        1 0.5000000
4       1       122      banana        2        1 0.5000000
5       1       324        fish        5        3 0.6000000
6       1       876        meat        2        1 0.5000000
7       2        23       water        3        2 0.6666667
8       2        44       apple        3        2 0.6666667
9       2        54        milk        4        1 0.2500000
10      2        65      potato        4        3 0.7500000
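The same calculation can be sketched with summarise instead of mutate followed by unique, as the answer suggests might be better. This is only a sketch: it assumes dplyr 1.0.0 or later (for the `.groups` argument of `summarise()`), and the name `product_totals` is introduced here for illustration.

```r
library(dplyr)

# Sample data from the question
orders <- data.frame(
  orderid = c(100,111,122,134,144,152,164,177,188,199,200,251,222),
  userid  = c(1,1,1,2,2,2,2,3,3,4,5,5,6)
)
products <- data.frame(
  productid   = c(66,55,44,54,32,23,65,122,656,324,876,342),
  productname = c('soda','corn','apple','milk','juice','water','potato',
                  'banana','orange','fish','meat','salami')
)
ordpro <- data.frame(
  orderid   = c(100,100,100,100,100,111,111,111,122,134,134,134,134,144,144,
                144,144,144,144,152,164,177,188,188,188,188,199,200,251,222),
  productid = c(55,54,324,23,324,54,876,324,122,65,65,44,324,54,23,
                44,324,23,66,876,65,55,32,122,66,66,44,54,66,65)
)

# Order lines per product across all users
product_totals <- ordpro %>% count(productid, name = "tot_prod")

df <- ordpro %>%
  inner_join(orders, by = "orderid") %>%          # attach the user to each line
  group_by(userid, productid) %>%
  summarise(tot_user = n(), .groups = "drop") %>% # one row per user and product
  inner_join(product_totals, by = "productid") %>%
  inner_join(products, by = "productid") %>%
  mutate(prop = tot_user / tot_prod) %>%
  arrange(userid, productid)

head(df, 10)
```

Because summarise already collapses each group to a single row, the unique call is no longer needed. Note that prop here is the user's share of all purchases of a product, not the share of the user's own orders that contain it.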

Source

2017-06-19 20:17:02 csmontt
