2017-05-07
0

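Change the code to work with a NumPy array instead of a pandas DataFrame

I need to change my code so that it works with a 2D NumPy array instead of a pandas DataFrame: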

import numpy as np
import pandas as pd

df = pd.DataFrame(data=np.array([[np.nan, 2, 3], [4, 5, 6], [7, 8, 9]]), columns=["col1", "col2", "col3"])

list_of_NA_features = ["col1"]

for feature in list_of_NA_features:
    for index, row in df.iterrows():
        if pd.isnull(row[feature]):
            missing_value = 5  # for simplicity, let's put 5 instead of a function
            df.loc[index, feature] = missing_value  # .loc replaces the removed .ix indexer

What is the correct way to do for index, row in df.iterrows(): and pd.isnull(row[feature]) for a NumPy array?

This is what I have done so far:

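np_arr = df.values  # as_matrix has to be called, and it was removed in pandas 1.0; .values works across versions

for feature in list_of_NA_features:
    for feature in range(np_arr.shape[1]):  # range replaces Python 2's xrange
        # ???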

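How can I get the row index so that I can execute np_arr[irow, feature]? Also, what is the correct way to assign a value to a specific row and column in a NumPy array?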

UPDATE

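I simplified the code by removing the function fill_missing_values and replacing it with the value 5. However, in my real case I need to estimate the missing values.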

+1

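I think the correct approach is to use a vectorized method. But it is hard to help without seeing a small reproducible sample data set and the desired result... ;-) – MaxU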

+1

I suggest adding a simple example DataFrame. –

+0

@AndrasDeak: It is just a function that returns an integer. In fact, it does not matter in this case, so I did not explain the function. – Dinosaurius

Answer

-1

Setup

import numpy as np

# Set up a NumPy array equivalent to your DataFrame
a = np.array([[np.nan, 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

# list_of_NA_features now contains the column indices in the NumPy array
list_of_NA_features = [0]
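If you start from the DataFrame's column names rather than hard-coded positions, the mapping to integer indices could be sketched as below. This is a minimal sketch that assumes the column labels are unique; the names na_feature_names and na_feature_idx are illustrative only and do not appear in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    data=np.array([[np.nan, 2, 3], [4, 5, 6], [7, 8, 9]]),
    columns=["col1", "col2", "col3"],
)

# Hypothetical names, used only for this illustration.
na_feature_names = ["col1"]

# Index.get_loc returns an integer position when the label is unique.
na_feature_idx = [df.columns.get_loc(name) for name in na_feature_names]

# to_numpy() requires pandas >= 0.24; df.values works on older versions.
a = df.to_numpy()

print(na_feature_idx)
```

The resulting list of positions can then be used wherever the column names were used with the DataFrame.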

Solution:

# The same operations can be carried out on a NumPy array in the way you did them on the DataFrame.
# I'm not saying this is the best way of doing what you are trying to do.
for feature in list_of_NA_features:
    for index, row in enumerate(a):
        if np.isnan(row[feature]):
            missing_value = 5
            a[index, feature] = missing_value

Out[167]:
array([[ 5.,  2.,  3.],
       [ 4.,  5.,  6.],
       [ 7.,  8.,  9.]])
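
As MaxU's comment on the question suggests, the explicit loops can usually be replaced by a vectorized version built on a boolean mask. The following is a sketch under assumptions, not the asker's actual method: estimate_missing is a hypothetical stand-in for the real estimation function and simply returns the column mean with NaN entries ignored.

```python
import numpy as np

a = np.array([[np.nan, 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

list_of_NA_features = [0]  # column positions, as in the setup above


def estimate_missing(column):
    """Hypothetical estimator: mean of the non-NaN entries of the column."""
    return np.nanmean(column)


for feature in list_of_NA_features:
    mask = np.isnan(a[:, feature])   # True where the value is missing
    rows = np.flatnonzero(mask)      # row indices of the missing values
    if rows.size:                    # skip columns with nothing to fill
        a[mask, feature] = estimate_missing(a[:, feature])

print(a)
```

Here rows holds the row indices the question asks about, and a[mask, feature] = ... assigns to all of those rows in one step, so no per-row loop is needed.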
+0

Then how about showing the correct way? –

+0

That is not what the OP asked for. – Allen