2016-07-22

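How to shift a single value of a pandas DataFrame column

I use pandas first_valid_index() to get the index of the first non-null value of a column. How can I shift a single value of the column instead of the whole column?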

import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}

df = pd.DataFrame(data)
df = df.set_index('year')
print(df)
     columnA columnB total 
year       
2010  10  NaN 100 
2011  21  NaN 200 
2012  20  NaN 300 
2013  10  10 400 
2014  39  39 500 
2015  30  30 600 
2016  31  31 700 
2017  45  45 800 
2018  23  23 900 
2019  56  56 1000 

for col in df.columns:
    if col not in ['total']:
        idx = df[col].first_valid_index()
        # fails: df.loc[idx, 'total'] is a single scalar value, not a Series
        df.loc[idx, col] = df.loc[idx, col] + df.loc[idx, 'total'].shift(1)

print(df)

AttributeError: 'numpy.float64' object has no attribute 'shift' 

Desired result:

print(df)
     columnA columnB total 
year       
2010  10  NaN 100 
2011  21  NaN 200 
2012  20  NaN 300 
2013  10  310 400 
2014  39  39 500 
2015  30  30 600 
2016  31  31 700 
2017  45  45 800 
2018  23  23 900 
2019  56  56 1000 
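For reference, the AttributeError happens because df.loc[idx, 'total'] selects one cell, which comes back as a NumPy scalar rather than a Series, so there is nothing to call .shift() on. A minimal sketch (Python 3, same sample data) showing that the shift has to be applied to the whole total column first, and only then is the single row picked out:

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}
df = pd.DataFrame(data).set_index('year')

idx = df['columnB'].first_valid_index()  # 2013

# Selecting a single cell gives a NumPy scalar, which has no .shift() method
cell = df.loc[idx, 'total']
print(type(cell), hasattr(cell, 'shift'))

# Shift the whole Series first, then select the one row
prev_total = df['total'].shift(1).loc[idx]
print(prev_total)  # 300.0 (shift inserts a NaN, so the column becomes float)
```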

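Answers

You can restrict the loop to columns that contain at least one NaN value: build the list of columns to skip as the union of total with every column that has no NaN: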

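for col in df.columns:
    # skip 'total' and every column that contains no NaN
    if col not in pd.Index(['total']).union(df.columns[~df.isnull().any()]):
        idx = df[col].first_valid_index()
        # add the previous row's total to the first non-null value
        df.loc[idx, col] += df.total.shift().loc[idx]
print(df)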
     columnA columnB total 
year       
2010  10  NaN 100 
2011  21  NaN 200 
2012  20  NaN 300 
2013  10 310.0 400 
2014  39  39.0 500 
2015  30  30.0 600 
2016  31  31.0 700 
2017  45  45.0 800 
2018  23  23.0 900 
2019  56  56.0 1000 
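To see what the union in the if condition actually produces, here is a small sketch on the unmodified sample frame (Python 3). It prints the skipped columns (total plus every column without a NaN) and the columns the loop will update:

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}
df = pd.DataFrame(data).set_index('year')

# True for each column that holds at least one NaN
has_nan = df.isnull().any()
# Skip 'total' and every column with no NaN at all
skip = pd.Index(['total']).union(df.columns[~has_nan])
to_update = [col for col in df.columns if col not in skip]
print(list(skip))  # ['columnA', 'total']
print(to_update)   # ['columnB']
```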
Is total always the last column? – jezrael

Or better: can the 'Total' column contain NaN values? – jezrael

Yes, total can have NaN values. – ArchieTiger
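If total can contain NaN, the shifted value picked for a column may itself be NaN, and value + NaN is NaN. The thread does not settle what should happen then; one possible policy, assumed here, is to treat a missing previous total as 0 with fillna(0) before adding. A sketch on a small hypothetical frame with a NaN in total:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the 2012 total is missing
df = pd.DataFrame({'year': [2010, 2011, 2012, 2013],
                   'columnB': [None, None, None, 10],
                   'total': [100, 200, np.nan, 400]}).set_index('year')

idx = df['columnB'].first_valid_index()  # 2013
prev = df['total'].shift().loc[idx]      # NaN, because the 2012 total is missing

# Assumption: a missing previous total counts as 0
df.loc[idx, 'columnB'] += df['total'].shift().fillna(0).loc[idx]
print(df.loc[idx, 'columnB'])  # 10.0
```

Forward-filling total instead of filling with 0 would be an equally valid choice; which one is right depends on what a missing total means in the data.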

Is this what you want?

In [63]: idx = df.columnB.first_valid_index() 

In [64]: df.loc[idx, 'columnB'] += df.total.shift().loc[idx] 

In [65]: df 
Out[65]: 
     columnA columnB total 
year 
2010  10  NaN 100 
2011  21  NaN 200 
2012  20  NaN 300 
2013  10 310.0 400 
2014  39  39.0 500 
2015  30  30.0 600 
2016  31  31.0 700 
2017  45  45.0 800 
2018  23  23.0 900 
2019  56  56.0 1000 

UPDATE: starting with pandas 0.20.1, the .ix indexer is deprecated in favor of the more strict .iloc and .loc indexers.

Yes, but I get 'nan' for 'columnA'. – ArchieTiger

'for col in df.columns: if col not in ['total']: idx = df[col].first_valid_index() print df.ix[idx, col] + df.total.shift().ix[idx]' – ArchieTiger
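The nan for columnA is expected: columnA has no leading NaN, so its first valid index is 2010, the very first row, and shift() leaves NaN there; 10 + NaN is NaN. A sketch of the same loop written with .loc instead of the deprecated .ix, which skips a column when there is no previous total to add (the skip rule is an assumption, not something the answers specify):

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}
df = pd.DataFrame(data).set_index('year')

prev_total = df['total'].shift()  # NaN in the first row (2010)
for col in df.columns.drop('total'):
    idx = df[col].first_valid_index()
    # columnA starts at 2010, where there is no previous total: leave it alone
    if pd.isna(prev_total.loc[idx]):
        continue
    df.loc[idx, col] += prev_total.loc[idx]
print(df)
```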

@ArchieTiger, why are you using a for loop? – Merlin
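On the for-loop question: one possible loop-free variant (not from the thread, only a sketch) marks the first non-null cell of each column with a boolean mask and adds the previous row's total only at those cells. It assumes, as a labeled choice, that only columns whose first row is NaN get adjusted, which also avoids the columnA nan:

```python
import pandas as pd

data = {'year': [2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019],
        'columnA': [10, 21, 20, 10, 39, 30, 31, 45, 23, 56],
        'columnB': [None, None, None, 10, 39, 30, 31, 45, 23, 56],
        'total': [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}
df = pd.DataFrame(data).set_index('year')

cols = df.columns.drop('total')
valid = df[cols].notna()
# True only at the first non-null cell of each column
first_valid = valid & (valid.astype(int).cumsum() == 1)
# Assumption: only adjust columns whose first row is NaN
first_valid = first_valid & ~valid.iloc[0]
# Previous row's total at the marked cells, 0 everywhere else
addend = first_valid.astype(float).mul(df['total'].shift().fillna(0), axis=0)
df[cols] = df[cols] + addend
print(df)
```

Note that adding a float mask turns columnA into a float column even though its values do not change.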