Pandas: accessing the next row (like LEAD in Oracle)

I have a log of user interactions with a website:

id user_id action_type comment timestamp

Not all action types are equal. Some are more important, others less so:

PURCHASE (important, primary) 
VISIT_PAGE (less important, secondary)

I want to transform my table into the following:

id user_id action_type comment timestamp next_id goal_id

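Where:

next_id is the id of the next action taken by the same user, i.e. the one with the closest timestamp in the future

goal_id is the id of the next primary action by the same user, i.e. the primary action with the closest timestamp in the future

For example, if a user has the following history: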

/ -> /toys -> /toys/lego -> /toys/lego/ABC001 -> PURCHASE

Then I would have the following table:

id  user_id  action_type  comment            timestamp  next_id  goal_id
1   1        VISIT_PAGE   /                  123456789  2        5
2   1        VISIT_PAGE   /toys              123457789  3        5
3   1        VISIT_PAGE   /toys/lego         123458889  4        5
4   1        VISIT_PAGE   /toys/lego/ABC001  123459889  5        5
5   1        PURCHASE                        123460889  NULL     5

Can this be done with pandas? It is very similar to the LEAD function in Oracle.

Source

2017-02-27 Denis Kulagin

'pandas.Series.shift' –
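The shift approach hinted at in the comment above could look roughly like this for next_id. This is only a sketch: the DataFrame name `log` and the empty comment on the PURCHASE row are assumptions made for illustration, and the data simply mirrors the sample table from the question.

```python
import pandas as pd

# Sample log mirroring the table in the question (empty comment for PURCHASE is assumed).
log = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "user_id": [1, 1, 1, 1, 1],
    "action_type": ["VISIT_PAGE"] * 4 + ["PURCHASE"],
    "comment": ["/", "/toys", "/toys/lego", "/toys/lego/ABC001", ""],
    "timestamp": [123456789, 123457789, 123458889, 123459889, 123460889],
})

# Sort so that "next row" means "next action in time" for each user.
log = log.sort_values(["user_id", "timestamp"])

# shift(-1) inside each user's group pulls the following row's id up by one row;
# the last action of each user has no successor and gets a missing value.
log["next_id"] = log.groupby("user_id")["id"].shift(-1).astype("Int64")
```

Grouping by user_id before shifting keeps one user's last action from pointing at another user's first action, which a plain `Series.shift(-1)` on the whole column would do.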

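Assuming you have some list of primary action types, here is a quick-and-dirty way to do it. Note that I am not using any pandas magic, but hopefully this gives you some ideas: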

import pandas as pd

primaries = {"two"}  # set of primary action types

# an example dataframe (the primary action has an empty comment)
df = pd.DataFrame([[1, 1, 1, 1, 1],
                   ["one"] * 4 + ["two"] * 1,
                   ["/", "toys", "toys/lego", "toys/lego/ABC001", ""],
                   [1001, 1002, 1003, 1004, 1005]]
                  ).T
df.columns = ["user_id", "action_type", "comment", "timestamp"]

# reindexing to make it look like your sample
df.index = range(1, len(df) + 1)
df.head()

nt = []  # next_ids
gl = []  # goal_ids

# assumes the rows belong to a single user and are already sorted by timestamp
for idx, row in df.iterrows():
    if row["action_type"] not in primaries:
        nt.append(idx + 1)
    else:
        nt.append(None)
        # every row since the previous primary action (this one included) gets this goal
        gl.extend([idx] * (len(nt) - len(gl)))

# rows after the last primary action have no goal
gl.extend([None] * (len(nt) - len(gl)))

new_df = df.assign(next_id=nt, goal_id=gl)

Source

2017-02-27 21:46:09 putonspectacles

This might work if the rows are sorted by timestamp, thanks! Still looking for a *Pandagick!* way, though) –
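One possible pandas-only way to get goal_id, offered as a sketch rather than a definitive answer: keep the id only on primary rows, then back-fill it within each user. The names `log` and `PRIMARY`, and the second user with no purchase, are assumptions added for illustration.

```python
import pandas as pd

PRIMARY = {"PURCHASE"}  # assumed set of primary action types

# Sample log: user 1 follows the history from the question, user 2 never purchases.
log = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "user_id": [1, 1, 1, 1, 1, 2],
    "action_type": ["VISIT_PAGE"] * 4 + ["PURCHASE", "VISIT_PAGE"],
    "timestamp": [123456789, 123457789, 123458889,
                  123459889, 123460889, 123461889],
})

# Sorting explicitly removes the "only works if sorted" caveat.
log = log.sort_values(["user_id", "timestamp"])

# Keep the id on primary rows only; every other row becomes missing.
primary_id = log["id"].where(log["action_type"].isin(PRIMARY))

# Back-fill within each user so each row sees the nearest primary id at or after it.
log["goal_id"] = primary_id.groupby(log["user_id"]).bfill().astype("Int64")
```

With this approach a primary row gets its own id as goal_id, which matches row 5 of the sample table, and users without any later primary action end up with a missing goal_id.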
