2017-03-09

Add a column that compares each cell with the cells above it in an adjacent column

I am new to Python.

My DataFrame has two columns and multiple rows:

Customer_Acquired_Date|Customer_mobile_number 
1/20/2017|100000001 
2/2/2017|100000002 
2/12/2017|100000001 
2/23/2017|100000004 
3/1/2017|100000005 
3/7/2017|100000004 

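I want to add a column named "RepeatOrNew". Each cell in this new column should look for the customer's mobile number in the cells above it in the adjacent column. If the number already appears there, the cell should say "Repeat"; if not, it should say "New".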

Expected output:

Customer_Acquired_Date|Customer_mobile_number|RepeatOrNew 
1/20/2017|100000001|New 
2/2/2017|100000002|New 
2/12/2017|100000001|Repeat 
2/23/2017|100000004|New 
3/1/2017|100000005|New 
3/7/2017|100000004|Repeat 

I have no idea where to start. Please help.

Thanks, Ninad.

Answer

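You can combine grouping, the cumcount method of the GroupBy object, and numpy's where function to get the desired output. The following should make a decent starting point: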

import numpy as np
import pandas as pd
from io import StringIO

# Sample data from the question, pipe-delimited
data_stream = StringIO("""Customer_Acquired_Date|Customer_mobile_number
1/20/2017|100000001
2/2/2017|100000002
2/12/2017|100000001
2/23/2017|100000004
3/1/2017|100000005
3/7/2017|100000004""")

customers = pd.read_csv(data_stream, sep="|", header=0)

# Number each row within its mobile-number group: 0 for the first occurrence, then 1, 2, ...
counter = customers.groupby('Customer_mobile_number').cumcount()
# The first occurrence is 'New'; every later occurrence is 'Repeat'
customers['RepeatOrNew'] = np.where(counter == 0, 'New', 'Repeat')

Or as a one-liner:

customers['RepeatOrNew'] = customers.groupby('Customer_mobile_number').cumcount().apply(lambda x: 'New' if x == 0 else 'Repeat') 

This should produce:

   Customer_Acquired_Date  Customer_mobile_number  RepeatOrNew
0               1/20/2017               100000001          New
1                2/2/2017               100000002          New
2               2/12/2017               100000001       Repeat
3               2/23/2017               100000004          New
4                3/1/2017               100000005          New
5                3/7/2017               100000004       Repeat
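
To see why this works, it may help to look at the intermediate values: cumcount numbers the rows within each group starting from 0, so a 0 marks the first time a mobile number appears. The sketch below rebuilds the sample data inline (an assumption made only to keep the snippet self-contained) and cross-checks the labels against Series.duplicated, a different pandas method that flags every occurrence after the first. Treat it as an alternative to compare against, not as part of the approach above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question, built inline so the snippet runs on its own.
customers = pd.DataFrame({
    "Customer_Acquired_Date": ["1/20/2017", "2/2/2017", "2/12/2017",
                               "2/23/2017", "3/1/2017", "3/7/2017"],
    "Customer_mobile_number": [100000001, 100000002, 100000001,
                               100000004, 100000005, 100000004],
})

# cumcount numbers each row within its group: 0 for the first sighting, 1 for the second, ...
counter = customers.groupby("Customer_mobile_number").cumcount()
print(counter.tolist())  # [0, 0, 1, 0, 0, 1]

# Alternative: duplicated() is True for every occurrence after the first.
is_repeat = customers["Customer_mobile_number"].duplicated()
customers["RepeatOrNew"] = np.where(is_repeat, "Repeat", "New")
print(customers["RepeatOrNew"].tolist())
```

Both variants depend on row order, so they assume the rows are already sorted by acquisition date, as they are in the sample data.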

I hope this proves useful.