Python Pandas DataFrame: creating new columns from the values in a column

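I have searched several books and websites and can't find anything that quite matches what I'm trying to do. I want to create an itemized list from a DataFrame and reshape the data like this: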

   A  B               A  B   C   D
0  1  aa           0  1  aa
1  2  bb           1  2  bb
2  3  bb           2  3  bb  aa
3  3  aa   --\     3  4  aa  bb  dd
4  4  aa   --/     4  5  cc
5  4  bb
6  4  dd
7  5  cc

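I have experimented with groupby, stack, unstack and so on, but nothing I have tried has produced the desired result. If it isn't obvious, I am very new to Python. A solution would be great, but an understanding of the process I need to follow would be perfect.

Thanks in advance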

Source

2015-02-05 Velcro

With pandas you can query for all rows that match a condition, for example where A = 4.
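As a minimal sketch of that kind of query, assuming the question's sample data is held in a DataFrame named df (the variable names here are illustrative, not from the thread), a boolean mask picks out the rows where A equals 4:

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    'A': [1, 2, 3, 3, 4, 4, 4, 5],
    'B': ['aa', 'bb', 'bb', 'aa', 'aa', 'bb', 'dd', 'cc'],
})

# Boolean mask: True for every row whose A value equals 4.
rows_with_4 = df[df['A'] == 4]

# The matching B values as a plain Python list.
b_values = rows_with_4['B'].tolist()
print(b_values)
```

The same comparison works when A holds strings, since == compares element by element whatever the column type.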

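A crude but workable approach is to loop over the distinct values of A, collect all the matching rows for each value into a list, and convert that list into a new DataFrame.

A rough sketch to illustrate the idea: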

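import pandas as pd

values = sorted(df['A'].unique())   # distinct values of A; gaps do not matter
l = [0] * len(values)               # placeholder list, one slot per distinct value
for i, value in enumerate(values):
    # B entries of every row whose A equals this value
    l[i] = df.loc[df['A'] == value, 'B'].tolist()

df = pd.DataFrame(l)                # shorter rows are padded with missing values
# or something of the sort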

I hope this helps.

Update, from the comments:

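import pandas

animal_list = []

for animal in ['cat', 'dog']:  # ... plus any other values that occur in column A
    newdf = df[df['A'] == animal]   # rows whose A equals this value

    body = [animal]                 # start the row with the group label
    for item in newdf['B']:
        body.append(item)           # then add each matching B value

    animal_list.append(body)

df = pandas.DataFrame(animal_list)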

Source

2015-02-05 15:33:32 user2589273

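Thanks user2589273... I'm afraid I wasn't specific enough in my example. The actual data in both columns consists of strings, and when I try these it complains about trying to multiply a string. To help me understand, what is the first line doing? – Velcro 2015-02-05 20:45:19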

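Try df = df.convert_objects(convert_numeric=True) to convert the strings in the DataFrame to floats. Or, more specifically, df['A'] = df['A'].convert_objects(convert_numeric=True). My first line creates a placeholder list of zeros, because I didn't know whether your values are contiguous or have gaps... – user2589273 2015-02-05 23:57:57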

I also realize my use of max may be incorrect - editing the answer now – user2589273 2015-02-05 23:58:24
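A note on the conversion suggested in the comments: convert_objects is no longer available in current pandas releases, and pd.to_numeric is the usual replacement. The sketch below assumes a hypothetical frame whose A column holds numbers stored as strings. It is not the asker's real data, which the thread never shows.

```python
import pandas as pd

# Hypothetical frame: column A holds numbers stored as strings.
df = pd.DataFrame({'A': ['1', '2', '3'], 'B': ['aa', 'bb', 'cc']})

# errors='coerce' turns values that cannot be parsed into NaN instead of raising.
df['A'] = pd.to_numeric(df['A'], errors='coerce')

print(df.dtypes)
```

If the A values are genuinely non-numeric labels, no conversion is needed; grouping and equality comparisons work on strings directly.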

A quick and dirty approach that also works with strings. Customize the column naming as needed.

import pandas as pd

data = {'A': [1, 2, 3, 3, 4, 4, 4, 5],
        'B': ['aa', 'bb', 'bb', 'aa', 'aa', 'bb', 'dd', 'cc']}
df = pd.DataFrame(data)

# size of the largest group; this helps with creating lists of the same size
maxlen = int(df['A'].value_counts().max())

newdata = {}
for n, gdf in df.groupby('A'):
    # B values of this group, padded with empty strings up to maxlen
    newdata[n] = list(gdf['B']) + [''] * (maxlen - len(gdf))

# recreate DF with col 'A' as index; experiment with other orientations
newdf = pd.DataFrame.from_dict(newdata, orient='index')

# customize this section
newdf.columns = list('BCD')
newdf['A'] = newdf.index
newdf.index = range(len(newdf))
newdf = newdf.reindex(columns=list('ABCD'))  # to set the desired order

print(newdf)

The result is:

 
   A   B   C   D
0  1  aa
1  2  bb
2  3  bb  aa
3  4  aa  bb  dd
4  5  cc
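The same reshaping can also be sketched without an explicit loop, as an alternative to the answer above. This version numbers the rows inside each A group with cumcount and then moves that position into the columns with unstack. The names position and wide are illustrative, and the B, C, D labels assume, as the answer does, that no group has more than three rows.

```python
import pandas as pd

# Sample data taken from the question.
df = pd.DataFrame({
    'A': [1, 2, 3, 3, 4, 4, 4, 5],
    'B': ['aa', 'bb', 'bb', 'aa', 'aa', 'bb', 'dd', 'cc'],
})

# Number each row within its A group: 0 for the first B value, 1 for the next, ...
position = df.groupby('A').cumcount()

# Index by (A, position), then move the position level into the columns.
# Missing slots are filled with empty strings.
wide = df.set_index(['A', position])['B'].unstack(fill_value='')

# Rename the positional columns and turn A back into an ordinary column.
wide.columns = list('BCD')  # assumes at most three rows per group
wide = wide.reset_index()

print(wide)
```

For data with larger groups, the column labels would need to be generated from the number of columns in wide rather than hard-coded.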

Source

2015-02-06 18:29:53 clocker
