2017-04-11 89 views

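Removing a different number of trailing NaNs from each column of a CSV

The CSV files I get from users always have extra blanks at the end of each column. A CSV looks like this: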

847,73.3,809,74.9,655,80.6,694,45.5,647,47.8 
848,24.3,810,23.1,656,18.2,695,48.6,648,47.3 
566,26.1,541,7.8,438,19.1,463,45.5,433,18.2 
567,0.5,542,0.1,439,0.2,464,53.1,434,0.2 
426,0.0,407,0.0,330,0.0,348,98.6,326,0.0 
... 
339,37.9,324,74.9,,,349,1.4,, 
340,62.0,325,25.1,,,,,, 
341,0.1,326,0.0,,,,,, 

Reading it with pandas

pd.read_csv(ref_file) 

gives this (last four columns shown):

0      694.0  45.5      647.0  47.8 
1      695.0  48.6      648.0  47.3 
2      696.0   5.6      649.0   4.8 
3      697.0   0.3      650.0   0.2 
4      698.0   0.0      432.0  81.6 
5      463.0  45.5      433.0  18.2 
6      464.0  53.1      434.0   0.2 
7      465.0   1.4      324.0  81.6 
8      466.0   0.0      325.0  18.4 
9      348.0  98.6      326.0   0.0 
10      349.0   1.4      NaN   NaN 
11      NaN   NaN      NaN   NaN 
12      NaN   NaN      NaN   NaN 

The blanks become NaN after loading. I tried

df.last_valid_index() 

but it only seems to check the first column. Every column ends with a different number of NaNs. How can I remove the NaNs in this case?
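Called on a whole DataFrame, `last_valid_index()` returns a single label, so it cannot describe each column separately. A minimal sketch of asking each column for its own last valid label follows. The small CSV is a made-up stand-in for the file in the question, and `header=None` is an assumption that the file has no header row.

```python
import io
import pandas as pd

# Hypothetical miniature of the CSV in the question (assumed: no header row).
csv_text = """847,73.3,655,80.6
848,24.3,656,18.2
339,37.9,,
340,62.0,,
"""
df = pd.read_csv(io.StringIO(csv_text), header=None)

# Apply last_valid_index() column by column instead of to the whole frame.
last_valid = df.apply(lambda col: col.last_valid_index())

# Label-based slicing with .loc is inclusive, so this keeps column 2
# up to and including its own last non-NaN row.
col2 = df[2].loc[: last_valid[2]]
print(last_valid)
print(col2)
```

Slicing this way removes only trailing NaNs and would leave any NaN in the middle of a column in place.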

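Edit: I tried .dropna(). It does not work, because it cuts every column down to the rows that have no NaN anywhere, so the column with the most NaNs decides the length. I want to trim the NaNs from each column individually, so the columns should end up with different numbers of rows.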
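The difference described in the edit can be sketched as follows. The data is again a made-up miniature of the CSV, `header=None` is an assumption, and the dict of Series is just one possible container for columns of unequal length, not something the question prescribes.

```python
import io
import pandas as pd

# Hypothetical miniature of the CSV in the question (assumed: no header row).
csv_text = """847,73.3,655,80.6
848,24.3,656,18.2
339,37.9,,
340,62.0,,
"""
df = pd.read_csv(io.StringIO(csv_text), header=None)

# DataFrame.dropna() drops every row that contains at least one NaN,
# so all columns are cut to the length of the shortest one.
rows_kept = len(df.dropna())

# Dropping NaN one column at a time keeps each column's own length.
trimmed = {name: df[name].dropna() for name in df.columns}
lengths = {name: len(col) for name, col in trimmed.items()}
print(rows_kept)
print(lengths)
```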

Have you tried [`df.dropna()`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html)? – Craig

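@Craig Thanks for the suggestion. I just tried it, but it removes every row that contains a NaN. Now every column is cut to 9 rows, which is not what I want. I want to remove the NaNs within each column. – Jan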

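pandas doesn't work that way. All columns in a DataFrame have the same length, and any missing values are represented by NaN as placeholders. pandas operations handle missing values correctly. What problem are you trying to solve by removing the NaNs? – Craig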
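To illustrate the point about NaN placeholders, here is a small sketch with made-up data showing that common aggregations skip NaN by default, so the padding often does not need to be removed before computing per-column statistics.

```python
import numpy as np
import pandas as pd

# Made-up frame: column "b" is padded with NaN to match the length of "a".
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, np.nan, np.nan]})

counts = df.count()  # number of non-NaN values in each column
means = df.mean()    # NaN is skipped by default (skipna=True)
print(counts)
print(means)
```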

Answers

If you want each column as a list, with those lists collected in a Series:

df.T.stack().groupby(level=0).apply(list) 

0 [847.0, 848.0, 566.0, 567.0, 426.0, 339.0, 340... 
1  [73.3, 24.3, 26.1, 0.5, 0.0, 37.9, 62.0, 0.1] 
2 [809.0, 810.0, 541.0, 542.0, 407.0, 324.0, 325... 
3   [74.9, 23.1, 7.8, 0.1, 0.0, 74.9, 25.1, 0.0] 
4     [655.0, 656.0, 438.0, 439.0, 330.0] 
5       [80.6, 18.2, 19.1, 0.2, 0.0] 
6   [694.0, 695.0, 463.0, 464.0, 348.0, 349.0] 
7     [45.5, 48.6, 45.5, 53.1, 98.6, 1.4] 
8     [647.0, 648.0, 433.0, 434.0, 326.0] 
9       [47.8, 47.3, 18.2, 0.2, 0.0] 
dtype: object 

Otherwise, if you want each row as a list:

df.stack().groupby(level=0).apply(list) 

0 [847.0, 73.3, 809.0, 74.9, 655.0, 80.6, 694.0,... 
1 [848.0, 24.3, 810.0, 23.1, 656.0, 18.2, 695.0,... 
2 [566.0, 26.1, 541.0, 7.8, 438.0, 19.1, 463.0, ... 
3 [567.0, 0.5, 542.0, 0.1, 439.0, 0.2, 464.0, 53... 
4 [426.0, 0.0, 407.0, 0.0, 330.0, 0.0, 348.0, 98... 
5    [339.0, 37.9, 324.0, 74.9, 349.0, 1.4] 
6       [340.0, 62.0, 325.0, 25.1] 
7        [341.0, 0.1, 326.0, 0.0] 
dtype: object
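Both one-liners above rely on `stack()` discarding NaN, which is its long-standing default. Newer pandas releases are changing `stack()` so that NaN may be kept, so it may be worth checking the behaviour of your installed version. A sketch that does not depend on `stack()` is shown below. It is an alternative to the answer's method, not the same method, and it uses a made-up miniature of the CSV with `header=None` assumed.

```python
import io
import pandas as pd

# Hypothetical miniature of the CSV in the question (assumed: no header row).
csv_text = """847,73.3,655,80.6
848,24.3,656,18.2
339,37.9,,
340,62.0,,
"""
df = pd.read_csv(io.StringIO(csv_text), header=None)

# One list per column, NaN removed, collected in a Series keyed by column label.
by_column = pd.Series({name: df[name].dropna().tolist() for name in df.columns})

# One list per row, NaN removed, collected in a Series keyed by row label.
by_row = pd.Series({label: row.dropna().tolist() for label, row in df.iterrows()})

print(by_column)
print(by_row)
```

Like the `stack()` versions, this drops every NaN rather than only the trailing ones, which gives the same result when the blanks appear only at the end of each column.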