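Correlation between a pandas Series and a whole DataFrame

I have a Series of values, and I want to compute its Pearson correlation with each row of a given table.

How can I do that?

Example: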

import pandas as pd 

v = [-1, 5, 0, 0, 10, 0, -7] 
v1 = [1, 0, 0, 0, 0, 0, 0] 
v2 = [0, 1, 0, 0, 1, 0, 0] 
v3 = [1, 1, 0, 0, 0, 0, 1] 

s = pd.Series(v) 
df = pd.DataFrame([v1, v2, v3], columns=['a', 'b', 'c', 'd', 'e', 'f', 'g']) 

# Here I expected to do df.corrwith(s), but it won't work

Computing it with Series.corr(), the expected output is:

-0.1666666666666666 # correlation with the first row 
0.83914639167827343 # correlation with the second row 
-0.35355339059327379 # correlation with the third row
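For reference, here is a minimal sketch of how those expected values could be reproduced row by row with Series.corr(). The names `rows` and `expected` are only illustrative and are not part of the original post.

```python
import pandas as pd

v = [-1, 5, 0, 0, 10, 0, -7]
rows = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 0, 0, 0, 1],
]

s = pd.Series(v)

# Correlate s with each row separately. Both Series carry the default
# integer index 0..6, so they align position by position.
expected = [s.corr(pd.Series(row)) for row in rows]

for value in expected:
    print(value)
```

This works but loops in Python, which is why a single corrwith call would be preferable.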

Source

2017-01-23 bluesummers

The Series needs the same index as the DataFrame's columns so that it aligns with the DataFrame. Then call corrwith with axis=1 to get row-wise correlations:

s1 = pd.Series(s.values, index=df.columns) 
print (s1) 
a    -1
b     5
c     0
d     0
e    10
f     0
g    -7
dtype: int64 

print (df.corrwith(s1, axis=1)) 
0 -0.166667 
1 0.839146 
2 -0.353553 
dtype: float64

print (df.corrwith(pd.Series(v, index=df.columns), axis=1)) 
0 -0.166667 
1 0.839146 
2 -0.353553 
dtype: float64
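To see why the index matters, the sketch below compares the original Series (default index 0..6) with the relabelled one. As far as I understand, corrwith aligns on labels, so a Series whose index shares nothing with the column labels a..g should give NaN for every row. The names `mismatched` and `aligned` are only illustrative.

```python
import pandas as pd

v = [-1, 5, 0, 0, 10, 0, -7]
df = pd.DataFrame(
    [[1, 0, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 1, 0, 0],
     [1, 1, 0, 0, 0, 0, 1]],
    columns=list("abcdefg"),
)

# Default index 0..6: no label in common with the columns a..g.
s = pd.Series(v)
mismatched = df.corrwith(s, axis=1)

# Same values, but labelled with the DataFrame's column names.
s1 = pd.Series(v, index=df.columns)
aligned = df.corrwith(s1, axis=1)

print(mismatched)
print(aligned)
```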

Edit:

You can specify the columns and use a subset:

cols = ['a','b','e'] 

print (df[cols]) 
   a  b  e
0  1  0  0
1  0  1  1
2  1  1  0

print (df[cols].corrwith(pd.Series(v, index=df.columns), axis=1)) 
0 -0.891042 
1 0.891042 
2 -0.838628 
dtype: float64
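The opposite case, where the DataFrame has columns that the Series does not know about, can be checked the same way. In this sketch the extra column `h` and the names `wider`, `base` and `with_extra` are hypothetical. Since alignment only keeps the labels present on both sides, `h` should simply be left out of the calculation.

```python
import numpy as np
import pandas as pd

v = [-1, 5, 0, 0, 10, 0, -7]
df = pd.DataFrame(
    [[1, 0, 0, 0, 0, 0, 0],
     [0, 1, 0, 0, 1, 0, 0],
     [1, 1, 0, 0, 0, 0, 1]],
    columns=list("abcdefg"),
)
s1 = pd.Series(v, index=df.columns)

# Hypothetical extra column 'h' with no matching label in s1.
wider = df.assign(h=[3, 1, 4])

base = df.corrwith(s1, axis=1)
with_extra = wider.corrwith(s1, axis=1)

# Compare the two results row by row.
print(np.allclose(base.to_numpy(), with_extra.to_numpy()))
```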

Source

2017-01-23 12:46:22 jezrael

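Thanks, what a rookie mistake... exactly what I needed – bluesummers

OK, but what if the DataFrame has more columns, would they be ignored? That is, I only want to compute the correlation over the columns that match the index, and ignore the others. – bluesummers

Please check whether the edit is what you want. – jezrael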
