2017-02-10 69 views

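Return the index and column name of the nth largest value in a pandas DataFrame

How can I (efficiently, for matrices much larger than the example shown) return the column name and index (row name) of the nth largest or nth smallest value?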

import pandas as pd 
import numpy as np 

dates = pd.date_range('20130101', periods=6) 
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD')) 
matrix = df.corr() 
matrix 
          A         B         C         D
A  1.000000 -0.814913  0.495993 -0.880296
B -0.814913  1.000000 -0.211421  0.551441
C  0.495993 -0.211421  1.000000 -0.414037
D -0.880296  0.551441 -0.414037  1.000000

Then I would like to do something like:

def get_n_smallest(matrix, n): 
    # can return as two variables, list, tuple, whatever... 
    return row_name, col_name 

get_n_smallest(matrix,0) 
# would return D, A for the value -.880296 

@JohnGalt But then that only gives the lowest value, not the nth lowest. – thefoxrocks


True. How about `matrix.unstack().sort_values().index[n-1]` for the nth smallest? – Zero

Answer


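I think you can use stack to reshape the matrix into a Series, remove duplicate values with drop_duplicates, sort with sort_values, and then pick the (row, column) pair out of the resulting MultiIndex with index: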

np.random.seed(100) 
dates = pd.date_range('20130101', periods=6) 
df = pd.DataFrame(np.random.randn(6,4), index=dates, columns=list('ABCD')) 
matrix = df.corr() 
print (matrix) 
          A         B         C         D
A  1.000000  0.570860 -0.558334 -0.434793
B  0.570860  1.000000 -0.358834 -0.564178
C -0.558334 -0.358834  1.000000  0.170589
D -0.434793 -0.564178  0.170589  1.000000

print (matrix.stack().drop_duplicates().sort_values()) 
B  D   -0.564178
A  C   -0.558334
   D   -0.434793
B  C   -0.358834
C  D    0.170589
A  B    0.570860
   A    1.000000
dtype: float64

def get_n_smallest(matrix, n): 
    return matrix.stack().drop_duplicates().sort_values().index[n] 

print (get_n_smallest(matrix,0)) 
('B', 'D') 

print (get_n_smallest(matrix,1)) 
('A', 'C') 

print (get_n_smallest(matrix,2)) 
('A', 'D') 

def get_n_largest(matrix, n): 
    return matrix.stack().drop_duplicates().sort_values(ascending=False).index[n] 


print (get_n_largest(matrix,0)) 
('A', 'A') 

print (get_n_largest(matrix,1)) 
('A', 'B') 

print (get_n_largest(matrix,2)) 
('C', 'D')
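
For much larger matrices, a possible variation is sketched below. It is not part of the answer above, just one way to approach the same problem under a few assumptions: the matrix is symmetric with the same labels on rows and columns, it contains no NaN values, and the diagonal (always 1.0 for a correlation matrix) should be ignored. Note that drop_duplicates compares values only, so two different pairs that happen to share the same correlation would collapse into one; selecting the strict upper triangle with np.triu_indices avoids that, and np.argpartition finds the nth element without sorting everything. The function name nth_extreme_pair and its largest parameter are made up for this illustration.

```python
import numpy as np
import pandas as pd


def nth_extreme_pair(matrix, n, largest=False):
    """Return (row_label, col_label) of the n-th smallest (0-based)
    off-diagonal value of a symmetric DataFrame, or of the n-th
    largest when largest=True. Hypothetical helper for illustration."""
    values = matrix.to_numpy()
    # Strict upper triangle: every unordered pair once, diagonal excluded.
    rows, cols = np.triu_indices(len(matrix), k=1)
    flat = values[rows, cols]
    if largest:
        flat = -flat  # n-th largest == n-th smallest of the negated values
    # argpartition puts the index of the n-th smallest element at
    # position n without fully sorting the array.
    pos = np.argpartition(flat, n)[n]
    return matrix.index[rows[pos]], matrix.columns[cols[pos]]


# Same correlation matrix as printed in the answer, typed in by hand
# so the example does not depend on the random seed.
labels = list('ABCD')
matrix = pd.DataFrame(
    [[1.000000, 0.570860, -0.558334, -0.434793],
     [0.570860, 1.000000, -0.358834, -0.564178],
     [-0.558334, -0.358834, 1.000000, 0.170589],
     [-0.434793, -0.564178, 0.170589, 1.000000]],
    index=labels, columns=labels)

print(nth_extreme_pair(matrix, 0))                # ('B', 'D')
print(nth_extreme_pair(matrix, 1))                # ('A', 'C')
print(nth_extreme_pair(matrix, 0, largest=True))  # ('A', 'B')
```

Unlike get_n_largest above, this sketch never returns ('A', 'A'), because the diagonal is skipped. If several values are tied, which of the tied pairs comes back is not specified.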