Python pandas: merge two or more rows of text into one row

I have a DataFrame with text data like the one below:

name | address     | number 
1 Bob bob      No.56 
2   @gmail.com   
3 Carly [email protected]   No.90 
4 Gorge [email protected]  
5   .com     
6          No.100

I would like to turn it into a frame like this:

name | address    | number 
1 Bob [email protected]   No.56 
2 Carly [email protected]   No.90     
3 Gorge [email protected]   No.100

I am using pandas to read the file, but I don't know how to use merge or concat for this.

Source

2017-02-15 TTaa

Assuming the name column consists of unique values:

print(df)

    name   address number 
0 Bob    bob No.56 
1 NaN  @gmail.com  NaN 
2 Carly [email protected] No.90 
3 Gorge  [email protected]  NaN 
4 NaN    .com  NaN 
5 NaN    NaN No.100 

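# Forward-fill name so continuation rows inherit the name above them
df['name'] = df['name'].ffill()
# Replace NaN with '' so that sum() concatenates the text pieces per name
print(df.fillna('').groupby(['name'], as_index=False).sum())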

    name   address number 
0 Bob [email protected] No.56 
1 Carly [email protected] No.90 
2 Gorge [email protected] No.100
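For anyone who wants to run this end to end, here is a minimal self-contained sketch. The example.com addresses are placeholders I made up, since the addresses in the question are obfuscated. It also concatenates with ''.join rather than sum(), which is my own substitution and not the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical data shaped like the question's frame (addresses are placeholders)
df = pd.DataFrame({
    "name": ["Bob", np.nan, "Carly", "Gorge", np.nan, np.nan],
    "address": ["bob", "@example.com", "carly@example.com",
                "gorge@example", ".com", np.nan],
    "number": ["No.56", np.nan, "No.90", np.nan, np.nan, "No.100"],
})

# Continuation rows inherit the name of the row above them
df["name"] = df["name"].ffill()

# Turn NaN into '' and glue the pieces of each column together per name
merged = (
    df.fillna("")
      .groupby("name")[["address", "number"]]
      .agg(lambda parts: "".join(parts))
      .reset_index()
)
print(merged)
```

Joining explicitly keeps the intent obvious and does not depend on how sum() treats text columns.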

You may need things like ffill(), bfill(), [::-1], .groupby('name').apply(lambda x: ' '.join(x['address'])), strip(), lstrip(), rstrip() and replace() to extend the code above to more complex data.
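As a rough illustration of combining a couple of those helpers, the sketch below strips stray whitespace from each fragment before joining them per name. The data and the example.com addresses are hypothetical, and it assumes name has already been filled in.

```python
import pandas as pd

# Hypothetical fragments with stray whitespace around them
df = pd.DataFrame({
    "name": ["Bob", "Bob", "Carly"],
    "address": [" bob ", "@example.com ", "carly@example.com"],
})

# Strip each fragment, then join the fragments belonging to the same name
joined = (
    df.groupby("name")["address"]
      .apply(lambda parts: "".join(p.strip() for p in parts))
)
print(joined)
```

The result is a Series indexed by name; call .reset_index() if you want a DataFrame back.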

Source

2017-02-15 04:19:18 su79eu7k

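If you want to collapse the rows of a DataFrame like this (where any column may contain NaN entries), there may be no direct pandas method for it.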

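You need some code that assigns values in the name column, so that pandas can tell that the separate rows bob and @gmail.com belong to the same user, Bob.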

You can use the fillna or ffill method to fill in each empty entry in the name column; see "pandas dataframe missing data".

df['name'] = df['name'].ffill()

# gives 
    name address number 
0 Bob bob No.56 
1 Bob @gmail.com 
2 Carly [email protected] No.90 
3 Gorge [email protected] 
4 Gorge .com  
5 Gorge  No.100

Then you can use groupby with sum as the aggregation function.

df.groupby(['name']).sum().reset_index() 

# gives 
    name address number 
0 Bob [email protected] No.56 
1 Carly [email protected] No.90 
2 Gorge [email protected] No.100
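One caveat: grouping by name merges every row that shares a name, so two different people both called Bob would be collapsed into one row. A possible workaround, sketched here under that assumption, is to group on a running record number instead of the name itself. The record_id helper and the sample data are hypothetical, not something from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two separate records that happen to share a name
df = pd.DataFrame({
    "name": ["Bob", np.nan, "Bob", np.nan],
    "address": ["bob", "@example.com", "bob2", "@example.org"],
})

# A new record starts wherever name is present; cumsum numbers the records
record_id = df["name"].notna().cumsum().rename("record_id")

# Join the text pieces of each column within each record
merged = (
    df.fillna("")
      .groupby(record_id)
      .agg(lambda parts: "".join(parts))
      .reset_index(drop=True)
)
print(merged)
```

This keeps the two Bobs apart because the grouping key is the record's position, not its name.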

You may find it useful to convert between NaN and blanks; see "Replacing blank values (white space) with NaN in pandas" and pandas.DataFrame.fillna.
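For example, if the file was read in such a way that the empty name cells are blank strings rather than NaN, ffill() will not touch them. A minimal sketch, assuming blanks are empty or whitespace-only strings, is to turn them into NaN first:

```python
import numpy as np
import pandas as pd

# Hypothetical name column where the gaps are blank strings, not NaN
names = pd.Series(["Bob", "", "Carly", "Gorge", "  ", ""])

# Empty or whitespace-only cells become NaN, then ffill can fill them
names = names.replace(r"^\s*$", np.nan, regex=True).ffill()
print(names.tolist())
```

Going the other way, fillna('') turns the remaining NaN entries back into blanks before concatenating.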

Source

2017-02-15 04:02:16
