Pandas Series containing arrays

I have a pandas DataFrame column that looks a bit like this:

Out[67]: 
0  ["cheese", "milk... 
1  ["yogurt", "cheese... 
2  ["cheese", "cream"... 
3  ["milk", "cheese"...

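Ultimately I want this as one flat list, but while trying to flatten it I noticed that pandas treats ["cheese", "milk", "cream"] as a str rather than a list.

How would I go about flattening this so that I end up with: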

["cheese", "milk", "yogurt", "cheese", "cheese"...]
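A quick way to confirm that the cells hold strings rather than lists is to inspect the Python type of one element. The sketch below is only illustrative: the Series `s` is a stand-in for the real column, and it assumes each cell holds the text of a list.

```python
import pandas as pd

# Stand-in for the real column: each cell is the *text* of a list, not a list.
s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

first = s.iloc[0]
print(type(first).__name__)  # str

# Iterating over a str yields single characters, so a plain nested
# comprehension breaks the column into letters and punctuation.
chars = [item for cell in s for item in cell]
print(chars[:3])  # ['[', "'", 'c']
```

If `type(first)` were `list`, the same comprehension would already give the flat list of words.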

[Edit] So the answer given below appears to be:

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

s = s.str.strip("[]")
df = s.str.split(',', expand=True)
df = df.applymap(lambda x: x.replace("'", '').strip())
l = df.values.flatten()
print(l.tolist())

This works: question answered, answer accepted. But it strikes me as a rather inelegant solution.

Source

2016-03-01 toast

Possible duplicate of [Python pandas flatten a dataframe to a list](http://stackoverflow.com/questions/25440008/python-pandas-flatten-a-dataframe-to-a-list) – soon

No, it is not a duplicate, because the column's type is string, not list. – jezrael

You can use NumPy's ndarray.flatten and then flatten the nested lists:

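import pandas as pd

# sample frame whose cells are real Python lists
df = pd.DataFrame({'a': [['cheese', 'milk'], ['yogurt', 'cheese'], ['cheese', 'cream']]})

print(df)
                  a
0    [cheese, milk]
1  [yogurt, cheese]
2   [cheese, cream]

print(df.values)
[[['cheese', 'milk']]
 [['yogurt', 'cheese']]
 [['cheese', 'cream']]]

l = df.a.values.flatten()
print(l)
[['cheese', 'milk'] ['yogurt', 'cheese'] ['cheese', 'cream']]

# flatten the nested lists with a comprehension
print([item for sublist in l for item in sublist])
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']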
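When the cells really are Python lists, as in this answer, two shorter routes may be worth considering. The sketch below assumes pandas 0.25 or later for `Series.explode`; `itertools.chain.from_iterable` comes from the standard library. The DataFrame is rebuilt here only so the snippet runs on its own.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({"a": [["cheese", "milk"], ["yogurt", "cheese"], ["cheese", "cream"]]})

# Series.explode puts every list element on its own row (the index repeats).
flat_explode = df["a"].explode().tolist()

# chain.from_iterable concatenates the sub-lists without building a NumPy array.
flat_chain = list(chain.from_iterable(df["a"]))

print(flat_explode)  # ['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
```

Both variants only help once the cells are lists; on a column of strings, `chain.from_iterable` yields single characters.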

Edit:

You can try:

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# remove the surrounding [ and ]
s = s.str.strip('[]')
print(s)
0      'cheese', 'milk'
1    'yogurt', 'cheese'
2     'cheese', 'cream'
dtype: object

df = s.str.split(',', expand=True)
# remove the quotes and strip surrounding whitespace
# (pandas 2.1+ deprecates applymap in favor of DataFrame.map)
df = df.applymap(lambda x: x.replace("'", '').strip())
print(df)
        0       1
0  cheese    milk
1  yogurt  cheese
2  cheese   cream

l = df.values.flatten()
print(l.tolist())
['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
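The strip, split, and replace steps above could also be collapsed into a single regular-expression pass. This is an alternative sketch, not the answer's method: it assumes every item is wrapped in single quotes and contains no escaped quotes, and it needs pandas 0.25 or later for `explode`.

```python
import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# Capture whatever sits between each pair of single quotes; str.findall
# returns one list of captured items per row.
items = s.str.findall(r"'([^']*)'")

# explode() gives each list element its own row; tolist() then flattens.
flat = items.explode().tolist()
print(flat)  # ['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
```

Because each row is parsed on its own, rows with different numbers of items need no padding columns.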

Source

2016-03-01 11:59:58 jezrael

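I think there is a typo in 'df.values.a.flatten()'; it should be 'df.a.values.flatten()' – shanmuga

Yes, you are right. I corrected it. Thank you. – jezrael

This just prints each individual letter for me: 's = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])' 'l = s.values.flatten()' 'print([item for sublist in l for item in sublist])' – toast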

To convert the column values from str to list you can use df.columnName.tolist(), and to flatten you can use df.columnName.values.flatten().

Source

2016-03-01 11:59:49

You can convert the Series into a DataFrame and then call stack:

s.apply(pd.Series).stack().tolist()

Source

2016-03-01 12:27:28 Colin

This returns a list of strings such as "['cheese', 'milk']": 's = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])' 's.apply(pd.Series).stack().tolist()' – toast

From the original description, I assumed the 'Series' held lists of strings: 's2 = pd.Series([['cheese', 'milk'], ['yogurt', 'cheese'], ['cheese', 'cream']])', in which case 's2.apply(pd.Series).stack().tolist()' should work. If each element of the 'Series' is instead a string representing a list of strings, you can add an eval: 's.apply(lambda x: pd.Series(eval(x))).stack().tolist()' – Colin
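The `eval` idea in the comment above can be sketched with `ast.literal_eval` instead, a standard-library function that accepts only Python literals and so does not run arbitrary code found in a cell. This is a swapped-in alternative rather than what the comment proposes, and it assumes every cell is a valid Python list literal.

```python
import ast
from itertools import chain

import pandas as pd

s = pd.Series(["['cheese', 'milk']", "['yogurt', 'cheese']", "['cheese', 'cream']"])

# Parse each cell's text into a real list; a malformed cell raises
# ValueError or SyntaxError instead of being silently skipped.
parsed = s.apply(ast.literal_eval)

# Concatenate the parsed lists into one flat list.
flat = list(chain.from_iterable(parsed))
print(flat)  # ['cheese', 'milk', 'yogurt', 'cheese', 'cheese', 'cream']
```

Once `parsed` holds real lists, the stack-based one-liner from this answer applies to it as well.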
