2016-02-29

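Adding a column in pandas while editing values

Add a new column named ADJ_HDI to the homework2 DataFrame. It should equal the HDI value if HDI is greater than .5, and zero otherwise.

We have been trying for hours to get the syntax for this right, with no luck. Can anyone please help?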

Answers

Try this, assuming your HDI values are in a column named 'HDI' and you want to create a new column equal to HDI, or 0 if HDI is not greater than 0.5:

def adj_hdi(row):
    # Keep the HDI value when it exceeds 0.5, otherwise use 0.
    hdi = row['HDI']
    if hdi > .5:
        return hdi
    else:
        return 0

# axis=1 passes each row to adj_hdi in turn.
mydataframe['ADJ_HDI'] = mydataframe.apply(adj_hdi, axis=1)

It generates a warning, but when I display the DataFrame it is working. Thanks!

Alternative solution:

homework2['ADJ_HDI'] = 0 
homework2.loc[(homework2['HDI'] > 0.5), ['ADJ_HDI']] = homework2['HDI'] 
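
The two-step assignment above can also be collapsed into one expression. The following is a minimal sketch using `Series.where`, which was not part of the original answer; the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
homework2 = pd.DataFrame({"HDI": [0.9, 0.2, 0.6]})

# Series.where keeps values where the condition is True
# and substitutes the second argument (0) everywhere else.
homework2["ADJ_HDI"] = homework2["HDI"].where(homework2["HDI"] > 0.5, 0)

print(homework2)
```

Whether this reads better than the `.loc` version is largely a matter of taste; both leave the original HDI column untouched.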

I think you can use a very fast solution with numpy.where:

homework2['ADJ_HDI'] = np.where(homework2['HDI'] > .5, homework2['HDI'], 0) 
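
As a quick check of the edge cases, here is a small sketch with made-up values (an assumption for illustration, not data from the question) showing what `np.where` does with a missing value and with a value exactly at the threshold. Both comparisons evaluate to False, so both rows get 0.

```python
import numpy as np
import pandas as pd

# Hypothetical values: one above the threshold, one below,
# one missing, and one exactly equal to 0.5.
homework2 = pd.DataFrame({"HDI": [0.8, 0.3, np.nan, 0.5]})

# NaN > 0.5 and 0.5 > 0.5 are both False, so those rows become 0.
homework2["ADJ_HDI"] = np.where(homework2["HDI"] > 0.5, homework2["HDI"], 0)

print(homework2["ADJ_HDI"].tolist())  # [0.8, 0.0, 0.0, 0.0]
```

If missing HDI values should stay missing rather than become 0, the condition would need an extra `isna()` check; the question does not say which behavior is wanted.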

Timings

import pandas as pd 
import numpy as np 

homework2 = pd.DataFrame({"A": [10, 8, 1, 1, 2, 2, 2], 
          "HDI": [25, np.nan, 2.3, 2.4, 1.2, 0.3, 5.7]}) 

# To test with 7k rows, uncomment the line below
#homework2 = pd.concat([homework2]*1000).reset_index(drop=True)
print(homework2)
h = homework2.copy()
h1 = homework2.copy()

# a: row-wise apply
def a(mydataframe):
    def adj_hdi(row):
        hdi = row['HDI']
        if hdi > .5:
            return hdi
        else:
            return 0
    mydataframe['ADJ_HDI'] = mydataframe.apply(adj_hdi, axis=1)
    return mydataframe

# b: default of 0, then boolean-mask assignment with .loc
def b(homework2):
    homework2['ADJ_HDI'] = 0
    homework2.loc[(homework2['HDI'] > 0.5), ['ADJ_HDI']] = homework2['HDI']
    return homework2

# c: vectorized numpy.where
def c(homework2):
    homework2['ADJ_HDI'] = np.where(homework2['HDI'] > .5, homework2['HDI'], 0)
    return homework2

print(a(homework2))
print(b(h))
print(c(h1))

len(homework2) = 7

In [2]: %timeit a(homework2) 
1000 loops, best of 3: 376 µs per loop 

In [3]: %timeit b(h) 
The slowest run took 4.62 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 1.49 ms per loop 

In [4]: %timeit c(h1) 
The slowest run took 5.52 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 283 µs per loop 

len(homework2) = 7k

In [7]: %timeit a(homework2) 
10 loops, best of 3: 106 ms per loop 

In [8]: %timeit b(h) 
100 loops, best of 3: 2.63 ms per loop 

In [9]: %timeit c(h1) 
The slowest run took 5.30 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 324 µs per loop 
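
The measurements above were taken with IPython's `%timeit`. For anyone wanting to repeat the comparison in a plain script, the following is a rough sketch using the standard `timeit` module. The function names `with_apply`, `with_loc` and `with_where` are hypothetical stand-ins for a, b and c, rewritten to return a Series instead of modifying the frame; absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

# Same HDI values as the timing example, repeated to 7k rows.
base = pd.DataFrame({"HDI": [25, np.nan, 2.3, 2.4, 1.2, 0.3, 5.7]})
big = pd.concat([base] * 1000).reset_index(drop=True)

def with_apply(df):
    # Element-wise Python function call (slow path).
    return df["HDI"].apply(lambda hdi: hdi if hdi > 0.5 else 0)

def with_loc(df):
    # Start from zeros, then overwrite rows that pass the mask.
    out = pd.Series(0.0, index=df.index)
    out.loc[df["HDI"] > 0.5] = df["HDI"]
    return out

def with_where(df):
    # Fully vectorized selection.
    return pd.Series(np.where(df["HDI"] > 0.5, df["HDI"], 0), index=df.index)

timings = {}
for fn in (with_apply, with_loc, with_where):
    # Total seconds for 3 calls on the 7k-row frame.
    timings[fn.__name__] = timeit.timeit(lambda: fn(big), number=3)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 3 runs")
```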