Indexing a DataFrame with a MultiIndex

I have a pandas DataFrame that I need to fill.

Here is my code:

import numpy as np
import pandas as pd
from scipy.stats import pearsonr

trains = np.arange(1, 101)
# The above are example values; it's actually 900 integers between 1 and 20000
tresholds = np.arange(10, 70, 10)
tuples = []
for i in trains:
    for j in tresholds:
        tuples.append((i, j))

index = pd.MultiIndex.from_tuples(tuples, names=['trains', 'tresholds'])
df = pd.DataFrame(np.zeros((len(index), len(trains))), index=index, columns=trains, dtype=float)

metrics = dict()
for i in trains:
    m = binary_metric_train(True, i)
    # binary_metric_train (not shown) returns a binary array of length 35
    # Example: [1, 0, 0, 1, ...]
    metrics[i] = m

for i in trains:
    for j in tresholds:
        trA = binary_metric_train(True, i, tresh=j)
        for k in trains:
            if k != i:
                trB = metrics[k]
                corr = abs(pearsonr(trA, trB)[0])
                df[k][i][j] = corr      # chained indexing: column k, then train i, then threshold j
            else:
                df[k][i][j] = np.nan    # same chained indexing for the diagonal

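My problem is that when this code finally finishes, my DataFrame df still contains only zeros. Not even the NaNs are inserted. I believe my indexing is correct. I also tested my binary_metric_train function separately, and it does return arrays of length 35.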

Can anyone spot what I'm missing here?

Edit: for clarity, the DataFrame looks like this:

                  1  2  3  4  5 ...
trains tresholds
1      10
       20
       30
       40
       50
       60
2      10
       20
       30
       40
       50
       60
...
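The nested loop that builds `tuples` produces every (trains, tresholds) pair, i.e. the Cartesian product of the two arrays, so the same row index can be built in one call with `pd.MultiIndex.from_product`. A minimal sketch using the example values from the question (the real data has 900 train ids), filling the frame with a scalar instead of `np.zeros`:

```python
import numpy as np
import pandas as pd

trains = np.arange(1, 101)
tresholds = np.arange(10, 70, 10)

# Index built the question's way: one tuple per (train, threshold) pair
index_loop = pd.MultiIndex.from_tuples(
    [(i, j) for i in trains for j in tresholds], names=['trains', 'tresholds'])

# Same pairs, same order, built directly as a Cartesian product
index = pd.MultiIndex.from_product([trains, tresholds], names=['trains', 'tresholds'])

# A scalar fill value broadcasts over the given index and columns
df = pd.DataFrame(0.0, index=index, columns=trains)

print(index.equals(index_loop))  # True
print(df.shape)                  # (600, 100)
```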


2015-04-23 JNevens

You are doing chained indexing (http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy), which may or may not work. The recommended way to update or add values is to use the new 'iloc', 'ix' or 'loc' indexers; see the docs (http://pandas.pydata.org/pandas-docs/stable/indexing.html). – EdChum
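To make the comment concrete: in `df[k][i][j] = corr` each pair of brackets returns a new object, and the final assignment lands on the last one, which may be a temporary copy rather than `df` itself (under pandas' Copy-on-Write mode it never reaches `df`). A single `.loc` call, with the full (trains, tresholds) tuple as the row key and the column label `k` as the second argument, writes into `df` directly. A minimal sketch on a small frame shaped like the question's; the sizes and values are made up:

```python
import numpy as np
import pandas as pd

trains = [1, 2, 3]
tresholds = [10, 20]
index = pd.MultiIndex.from_product([trains, tresholds], names=['trains', 'tresholds'])
df = pd.DataFrame(0.0, index=index, columns=trains)

i, j, k = 2, 20, 3  # row train, threshold, column train (made-up example)

# Not this: df[k][i][j] = 0.5  (chained indexing, may only modify a copy)
df.loc[(i, j), k] = 0.5       # row key is the full (trains, tresholds) tuple
df.loc[(i, 10), i] = np.nan   # same pattern for the diagonal NaN case

print(df.loc[(2, 20), 3])  # 0.5
```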

@EdChum I understand that 'loc' is used for label-based indexing. How do I use 'loc' to index the correct values in my DataFrame with this MultiIndex? Something like 'df.loc[k].loc[i].loc[j]'? – JNevens
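There is no need to chain '.loc' calls: on a MultiIndex, '.loc' takes a tuple with one entry per index level for the row and a column label as the second argument. Partial keys, `pd.IndexSlice` and `DataFrame.xs` cover the other common selections. A minimal sketch, assuming the question's layout (rows indexed by (trains, tresholds), columns by train id) with made-up values:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[1, 2], [10, 20, 30]], names=['trains', 'tresholds'])
df = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2), index=index, columns=[1, 2])

idx = pd.IndexSlice
cell = df.loc[(2, 30), 1]                 # one value: row (train 2, threshold 30), column 1
one_train = df.loc[2]                     # all thresholds of train 2; 'trains' level is dropped
one_threshold = df.loc[idx[:, 20], 2]     # column 2 for every train at threshold 20
same_rows = df.xs(20, level='tresholds')  # cross-section selected by level name

print(cell)  # 10.0
```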

Can you explain what k, i and j represent with a simple example? Your code currently can't be run. – EdChum

As @EdChum pointed out, you should take a look at pandas indexing. Here is some test data for illustration purposes, which should clear things up.

import numpy as np
import pandas as pd

trains     = [ 1,  1,  1,  2,  2,  2]
thresholds = [10, 20, 30, 10, 20, 30]
data       = [ 1,  0,  1,  0,  1,  0]
df = pd.DataFrame({
    'trains'     : trains,
    'thresholds' : thresholds,
    'C1'         : data,
    'C2'         : data
}).set_index(['trains', 'thresholds'])

print(df)
# using the column position (old pandas: df.ix[(2, 30), 0]; .ix was removed in pandas 1.0)
df.iloc[df.index.get_loc((2, 30)), 0] = 3
# or...
df.loc[(2, 30), 'C1'] = 3  # using the column name (old pandas also allowed .ix here)
# but not...
df.loc[(2, 30), 1] = 3     # creates a new column labelled 1
print(df)

which outputs the DataFrame before and after the modification:

                   C1  C2
trains thresholds
1      10           1   1
       20           0   0
       30           1   1
2      10           0   0
       20           1   1
       30           0   0
                   C1  C2    1
trains thresholds
1      10           1   1  NaN
       20           0   0  NaN
       30           1   1  NaN
2      10           0   0  NaN
       20           1   1  NaN
       30           3   0    3
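Applied to the question, the fix is to replace each `df[k][i][j] = ...` with a single `df.loc[(i, j), k] = ...`. Below is a minimal sketch of the question's loop rewritten that way. Because `binary_metric_train` is not shown, it uses a hypothetical stand-in `fake_binary_metric` that returns a reproducible random 0/1 array of length 35, and it uses `np.corrcoef` in place of `scipy.stats.pearsonr` (both give the Pearson coefficient) so it needs only NumPy and pandas:

```python
import numpy as np
import pandas as pd

def fake_binary_metric(train, tresh=0):
    """Hypothetical stand-in for binary_metric_train: a reproducible
    random 0/1 array of length 35 for each (train, tresh) pair."""
    rng = np.random.default_rng([int(train), int(tresh)])
    return rng.integers(0, 2, size=35)

trains = np.arange(1, 11)          # small example; the question uses 900 ids
tresholds = np.arange(10, 70, 10)
index = pd.MultiIndex.from_product([trains, tresholds], names=['trains', 'tresholds'])
df = pd.DataFrame(0.0, index=index, columns=trains)

metrics = {k: fake_binary_metric(k) for k in trains}

for i in trains:
    for j in tresholds:
        trA = fake_binary_metric(i, tresh=j)
        for k in trains:
            if k != i:
                # Pearson coefficient = off-diagonal entry of the 2x2 correlation matrix
                corr = abs(np.corrcoef(trA, metrics[k])[0, 1])
                df.loc[(i, j), k] = corr   # one .loc call writes into df itself
            else:
                df.loc[(i, j), k] = np.nan

print(df.head(6))
```

With real data, keep `pearsonr` and the real `binary_metric_train`; only the assignment lines change.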


2015-04-23 09:53:56 Matt

The syntax for setting on multiple axes with a MultiIndex is 'df.loc[(1, 20), 1]'. – Jeff

Isn't 'df.loc[1,20][0]' integer (position) based? There is no column labelled 0. I'd prefer it to be label-based. – JNevens
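On the label-versus-position question: '.loc' is purely label-based, so 'df.loc[(1, 20), 'C1']' names the column, and writing the row key as an explicit tuple keeps row and column unambiguous. A positional column needs '.iloc' with integer positions on both axes; since '.ix' was deprecated and later removed from pandas, the positions can be looked up from the labels first. A minimal sketch with the answer's test data:

```python
import pandas as pd

df = pd.DataFrame({
    'trains': [1, 1, 1, 2, 2, 2],
    'thresholds': [10, 20, 30, 10, 20, 30],
    'C1': [1, 0, 1, 0, 1, 0],
    'C2': [1, 0, 1, 0, 1, 0],
}).set_index(['trains', 'thresholds'])

# Label-based: (trains, thresholds) tuple for the row, column name for the column
label_value = df.loc[(1, 20), 'C1']

# Position-based: translate labels to integer positions, then use .iloc
row_pos = df.index.get_loc((1, 20))   # position of the full key in a unique index
col_pos = df.columns.get_loc('C1')
pos_value = df.iloc[row_pos, col_pos]

print(label_value == pos_value)  # True
```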

@Jeff Thanks, fixed it. – Matt
