How do I create a list of specific values with pandas?

I have a very large .csv file that looks like this:

column1,id,column3,column4,words,column6 
string,309483,0,0,hi#1,string string .... 
string,234234,0.344,0,hello#1,string string .... 
... 
string,89789,0,.56799,world#1,string string .... 
string,212934,0.8967,0,wolf#1 web#1 mouse#3,string string ....

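I would like to extract all the words from the rows where column3 holds a float greater than 0 and put them into a list. For the example above, the output would be: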

[hello#1, wolf#1, web#1, mouse#3]

Any ideas on how to solve this task with pandas? Thanks in advance.

Source

2015-03-25 newWithPython

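Do you want to append only the values of the fourth column? – 2015-03-25 04:13:01

Thanks for the feedback, @AvinashRaj. Yes, all the values of the fourth column – newWithPython 2015-03-25 04:13:33

Thanks for your comment, @cphlewis – newWithPython 2015-03-25 04:58:26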

' '.join(df[df.column3 > 0].words).split(' ')

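Result from the test data:

['hello#1', 'wolf#1', 'web#1', 'mouse#3']

The pandas expression in the middle selects the right rows; join concatenates the words column values of those rows into one string, and split breaks that string back up into individual words.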
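As a minimal sketch of the steps described above, the snippet below rebuilds the question's sample rows in memory and applies the same filter, join, and split. The in-memory buffer and the shortened column6 text are assumptions made so the example runs without the real file.

```python
import io
import pandas as pd

# Sample rows copied from the question; the elided column6 text is
# shortened to a placeholder because its real contents are not shown.
csv_text = """column1,id,column3,column4,words,column6
string,309483,0,0,hi#1,string string
string,234234,0.344,0,hello#1,string string
string,89789,0,.56799,world#1,string string
string,212934,0.8967,0,wolf#1 web#1 mouse#3,string string
"""

df = pd.read_csv(io.StringIO(csv_text))

# Boolean mask: keep only the rows whose column3 is greater than 0
selected = df[df.column3 > 0]

# Join every selected 'words' cell with a space, then split back into words
words = ' '.join(selected.words).split(' ')
print(words)  # ['hello#1', 'wolf#1', 'web#1', 'mouse#3']
```

For the real data, pd.read_csv would presumably be given the path of the .csv file instead of the in-memory buffer.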

Source

2015-03-25 04:53:37 cphlewis

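Correction:

You can do this with iterrows, but it is not as concise as the solution above:

import itertools

# iterrows() yields (index, row) pairs; keep the split words of rows where column3 > 0
word_lists = [row['words'].split(' ') for _, row in dataframe.iterrows() if row['column3'] > 0]
# Flatten the list of lists into a single list of words
your_list = list(itertools.chain.from_iterable(word_lists))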

Source

2015-03-25 04:22:30 kennes

@cphlewis I see what you mean. Thanks for the correction. – kennes 2015-03-25 11:58:48

If you want a list of all the unique words:

df[df.column3 > 0].words.unique()

You can cast this to a list by doing

list(df[df.column3 > 0].words.unique())

or use the NumPy array's own method, which will be faster than the above:

df[df.column3 > 0].words.unique().tolist()
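One point worth illustrating: unique() de-duplicates whole cell values, so a cell that holds several space-separated words stays as one string. The sketch below uses hypothetical data (not from the question) in which hello#1 repeats, and shows one possible way to get unique individual words by splitting first; dict.fromkeys is a choice made here to preserve first-seen order, not something the answer prescribes.

```python
import pandas as pd

# Hypothetical data: 'hello#1' appears in several of the selected rows
df = pd.DataFrame({
    "column3": [0.0, 0.344, 0.8967, 0.5],
    "words": ["hi#1", "hello#1", "wolf#1 hello#1", "hello#1"],
})

selected = df[df.column3 > 0]

# unique() works on whole cell values, so the multi-word cell stays intact
unique_cells = selected.words.unique().tolist()

# Split first, then de-duplicate individual words, keeping first-seen order
all_words = ' '.join(selected.words).split(' ')
unique_words = list(dict.fromkeys(all_words))

print(unique_cells)  # ['hello#1', 'wolf#1 hello#1']
print(unique_words)  # ['hello#1', 'wolf#1']
```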

Source

2015-03-25 19:53:22 EdChum

Thank you very much for the tip. Did you mean 'unique()'? – newWithPython 2015-03-25 19:58:03

Sorry, that was a typo; I'll fix it now – EdChum 2015-03-25 20:21:20
