Transition counts for user product purchases in online retail

I am trying to build a Markov chain user transition matrix from scratch, but I am stuck on assigning dictionary values. Sample code is below.

## user purchase sequence separated by '|' at different time intervals
## let's say in the first purchase the user bought products 3 4 12 23 45 41 25, then 4 5 12 17 19 25 46 3, and so on
user_purchase = '3 4 12 23 45 41 25|4 5 12 17 19 25 46 3|39 12 3 23 50 24 35 13|42 34 17 19 46'
## I need to find the transition counts from the first purchase to the second, and so on
## e.g. 3-1 is 0, 3-2 is 0, 3-3 is 1 (3 is in the second purchase), 3-4 is 1
## hence output should be {..., 2:[(0,0),(0,0),.....], 3:[(0,1),(0,1),(1,1),(1,1), ...], 4:[...]}, a dictionary of lists of tuples

### let's say the user can buy any of 50 products, numbered 1 to 50
prod = range(1,51) 

### initializing a dictionary of list with tuples 
t = (0,0) 
list1= [] 
for _ in range(len(prod)): 
    list1.append(t) 
user_tran = {} 
for p in prod: 
    user_tran[p]= list1 


# def trans_matrix(prod_seq): 
basket_seq = user_purchase.split('|') 
iteration = len(basket_seq) 
for j in range(iteration-1): 
    trans_from = basket_seq[j] 
    trans_to = basket_seq[j+1] 
    tfrom = map(int,trans_from.split(' ')) 
    print tfrom 
    tto = map(int,trans_to.split(' ')) 
    for item in tfrom: 
### problem: in each iteration the value for every key changes from [(0,0),(0,0),....] to item_list
     item_list = user_tran[item] ### the problem seems to be here
     for i in range(len(prod)): 
      if i+1 in tto: 
       temp = item_list[i] 
       x = list(temp) 
       x[0] = x[0] +1 
       x[1] = x[1] +1 
       item_list[i] = tuple(x) 
      else: 
       temp = item_list[i] 
       x = list(temp) 
       x[0] = x[0] 
       x[1] = x[1] + 1 
       item_list[i] = tuple(x) 
     user_tran[item] = item_list ### only the list for this key should be updated, but all keys end up with the same value

user_tran[3][1:5]

Out[38]: [(0, 23), (15, 23), (7, 23), (7, 23)]

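Desired output for transitions from product 3 across the different purchases: there are 0 transitions to products 1 and 2 in the 3 purchase sequences, and product 3 appears in the first three purchase sequences. But there are two transitions from 3 to 3:

[(0,3), (0,3), (2,3), ... up to product 50]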

2017-08-06 suvir gupta

Can you explain what your list of tuples means?

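What I want to do: say a store has 5 products, and a user buys 1, 3, 4 on the first visit and 3, 2, 5 on the second visit, so the move from the first visit to the second is described by a probability matrix. For product 1, for example, the possible transitions are 1-1, 1-2, 1-3, 1-4, 1-5 and the actual transitions are 1-3, 1-2, 1-5. So the output dictionary of lists of tuples should look like {1: [(0,1), (1,1), (1,1), (0,1), (1,1)], 2: [...]}. Here the first tuple in the list, (0,1), describes the 1-1 transition: the 0 means 1-1 did not occur, and the 1 counts the one move from the 1st visit to the 2nd.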
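The five-product example in this comment can be reproduced with a few lines of plain Python. This is only a sketch of the tuple convention described above; the function name `single_step_counts` and its parameters are made up for illustration and do not come from the question.

```python
def single_step_counts(first_visit, second_visit, n_products):
    """Map each product bought on the first visit to a list of
    (transition happened, transitions possible) tuples, one per product."""
    bought_next = set(second_visit)
    # One row describes which products 1..n_products appear in the second visit.
    row = [(int(q in bought_next), 1) for q in range(1, n_products + 1)]
    # Give every key its own copy of the row so later updates stay independent.
    return {p: list(row) for p in first_visit}


result = single_step_counts([1, 3, 4], [3, 2, 5], 5)
print(result[1])  # [(0, 1), (1, 1), (1, 1), (0, 1), (1, 1)]
```

For a single step every product bought on the first visit gets the same row, because the tuples only depend on what is in the second visit.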

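The problem is that whenever I try to update the list stored under one specific key, I cannot assign to the dictionary selectively by key. All keys get updated with the same value. I don't know whether I assigned the list values to the 'user_tran' dictionary incorrectly.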
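The symptom described here matches what Python does when every key is bound to one and the same list object, which is what `user_tran[p] = list1` in the question's initialization loop does. A minimal sketch of the effect and of one way around it follows; the names `shared`, `aliased` and `independent` are illustrative only.

```python
n_products = 5

# Aliased: every key refers to the SAME list object.
shared = [(0, 0)] * n_products
aliased = {p: shared for p in range(1, n_products + 1)}
aliased[1][0] = (1, 1)
print(aliased[2][0])  # (1, 1): key 2 changed although only key 1 was written

# Independent: the comprehension builds a fresh list for each key.
independent = {p: [(0, 0)] * n_products for p in range(1, n_products + 1)}
independent[1][0] = (1, 1)
print(independent[2][0])  # (0, 0): key 2 is untouched
```

Repeating the tuple with `*` is safe here because tuples are immutable; only the outer list must be distinct for each key.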

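I did not find the cause, but I tried implementing it with numpy arrays, without any tuples or dictionaries.

My output differs from your expected output, but I did exactly what you were aiming for with the dictionary. It is just a translation of the dictionary-of-lists version into a numpy array version. It might help you.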

import numpy as np

user_purchase = '3 4 12 23 45 41 25|4 5 12 17 19 25 46 3|39 12 3 23 50 24 35 13|42 34 17 19 46'
prod = range(0, 50)
# user_tran[a, b] holds (times a was followed by b, times a had a following purchase)
user_tran = np.zeros((50, 50, 2))
basket_seq = user_purchase.split('|')
iteration = len(basket_seq)
for j in range(iteration - 1):
    trans_from = basket_seq[j]
    trans_to = basket_seq[j + 1]
    # shift product ids 1..50 to array indices 0..49
    tfrom = [int(x) - 1 for x in trans_from.split(' ')]
    tto = [int(x) - 1 for x in trans_to.split(' ')]
    for item in tfrom:
        item_list = user_tran[item, :, :]  # a view, so writes land in user_tran
        for i in range(len(prod)):
            if i in tto:  # tto is already 0-based
                item_list[i, :] += 1  # transition happened: bump both counters
            else:
                item_list[i, 1] += 1  # only the "possible transitions" counter
print(user_tran[2, 1:5, :])

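user_tran has the shape N x M x 2, where N is the number of keys in the dictionary version, M is the number of products in your store, and the 2 replaces the 2-value tuple. For example, to get the 3rd key of your dictionary and items 1 through 4 of its list, you have to write

user_tran[2, 1:5, :] #instead of user_tran[3][1:5]

because array indexing starts at 0 rather than 1. You will get a 4×2 matrix, where 4 is the number of elements taken from your list and 2 is the two values of your tuple.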
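The inner loop over all products can also be replaced by array indexing. The following is a sketch rather than part of the original answer: it assumes no product is repeated within one basket, and the names `counts`, `baskets` and `n_products` are chosen here for illustration.

```python
import numpy as np

user_purchase = '3 4 12 23 45 41 25|4 5 12 17 19 25 46 3|39 12 3 23 50 24 35 13|42 34 17 19 46'
n_products = 50

# Parse each basket into an array of 0-based product indices.
baskets = [np.array([int(x) for x in b.split()]) - 1
           for b in user_purchase.split('|')]

# counts[a, b, 0]: times product a was followed by product b
# counts[a, b, 1]: times product a was bought with a later basket following
counts = np.zeros((n_products, n_products, 2), dtype=int)
for src, dst in zip(baskets, baskets[1:]):
    counts[src, :, 1] += 1                      # one opportunity per (a, any b)
    counts[src[:, None], dst[None, :], 0] += 1  # observed (a, b) pairs

print(counts[2, 1:5, :])  # rows for product 3 -> products 2, 3, 4, 5
```

For the sample string, `counts[2, 2]` comes out as `[2, 3]`, which agrees with the two 3-to-3 transitions the question asks for.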

2017-08-07 08:00:50

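Thank you, Batyr! But I am trying to use regular data structures so that the computation can run with Spark on a distributed platform, because the real dataset has about 50,000 products and more than 1 million users. I am not sure how to use np arrays inside a closure function. As for item_list = user_tran[item], I think the problem is caused by a reference being passed here rather than a value.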
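With around 50,000 products a dense list of 50,000 tuples per product is large, so one option is to store only the pairs that actually occur. The sketch below uses `collections.Counter` from the standard library and no shared state; the function name `sparse_transition_counts` is hypothetical, and it assumes each basket is treated as a set (repeated products within one basket count once). How such a function would be wired into Spark is not shown here.

```python
from collections import Counter


def sparse_transition_counts(purchase_string):
    """Return (hits, totals) for one user's '|'-separated purchase string.

    hits[(a, b)] is how often product a was followed by product b;
    totals[a] is how often product a had a following basket at all.
    The tuple from the question for the pair (a, b) is (hits[(a, b)], totals[a]).
    """
    baskets = [set(map(int, b.split())) for b in purchase_string.split('|')]
    hits = Counter()
    totals = Counter()
    for src, dst in zip(baskets, baskets[1:]):
        for a in src:
            totals[a] += 1
            for b in dst:
                hits[(a, b)] += 1
    return hits, totals


user_purchase = '3 4 12 23 45 41 25|4 5 12 17 19 25 46 3|39 12 3 23 50 24 35 13|42 34 17 19 46'
hits, totals = sparse_transition_counts(user_purchase)
print(hits[(3, 3)], totals[3])  # 2 3
```

Looking up a pair that never occurred, such as `hits[(3, 1)]`, returns 0 without storing anything, so the memory used grows with the number of observed pairs and not with the square of the catalogue size.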
