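Finding the correlation of non-linear functions with numpy.corr()

I wrote a program that reads a CSV file and computes the correlation between two columns. The problem is that the standard way of finding correlation does not work on curves and other non-linear functions. Is there another function, or a simple way to transform the data, that would let me determine the correlation? Below are my code, the CSV input, and the current output.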
import pandas as pd

def findCorrelation(csvFileName):
    # Read the CSV and make sure every column is numeric
    df = pd.read_csv(csvFileName).astype(float)
    # DataFrame.corr() returns the Pearson correlation matrix;
    # entry [0][1] is the first column against the second
    corr = df.corr().values
    return corr[0][1]

def correlationMeaning(corr):
    # Map the coefficient to a rough verbal description
    if corr == 1:
        return ['perfect', 'positive', str(corr)]
    elif corr > 0.9:
        return ['high', 'positive', str(corr)]
    elif corr > 0.5:
        return ['medium', 'positive', str(corr)]
    elif corr > 0.1:
        return ['low', 'positive', str(corr)]
    elif corr > -0.1:
        return ['no', str(corr)]
    elif corr > -0.5:
        return ['low', 'negative', str(corr)]
    elif corr > -0.9:
        return ['medium', 'negative', str(corr)]
    elif corr > -1:
        return ['high', 'negative', str(corr)]
    elif corr == -1:
        return ['perfect', 'negative', str(corr)]
    else:
        # Reached for NaN or out-of-range values
        return ['error']

print(correlationMeaning(findCorrelation('CurveData.csv')))
CSV input:
Temp,Sales
30,50
34,52
38,54
42,56
46,58
50,60
54,62
58,62
62,60
66,58
70,56
74,54
78,52
82,50
Output:
['no', '0.0']
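
For reference, here is a rough sketch of two ways the data above could be checked for a non-linear relationship. It is not a definitive answer, only an illustration under stated assumptions. The helper names `pearson` and `polynomial_r_squared` are hypothetical, the arrays are typed in by hand from the CSV input, and the peak location of 56 used for the "folded" transform is read off the data by eye (the midpoint between Temp 54 and 58). The first approach fits a quadratic with `np.polyfit` and reports R², the second transforms Temp into its distance from the peak before computing the ordinary Pearson coefficient.

```python
import numpy as np

# Data copied by hand from the CSV input above.
temp = np.array([30, 34, 38, 42, 46, 50, 54, 58, 62, 66, 70, 74, 78, 82], dtype=float)
sales = np.array([50, 52, 54, 56, 58, 60, 62, 62, 60, 58, 56, 54, 52, 50], dtype=float)


def pearson(x, y):
    """Ordinary Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]


def polynomial_r_squared(x, y, degree=2):
    """Fit a polynomial of the given degree and return R^2 of the fit.

    R^2 is the fraction of the variance in y explained by the fitted curve.
    """
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    ss_res = np.sum((y - fitted) ** 2)    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot


# Linear correlation: close to zero because the rise and the fall cancel out.
linear_r = pearson(temp, sales)

# Option 1: goodness of fit of a quadratic curve.
quadratic_r2 = polynomial_r_squared(temp, sales, degree=2)

# Option 2: transform Temp into "distance from the peak" (assumed to be at 56),
# then use the ordinary Pearson coefficient on the transformed column.
folded_r = pearson(np.abs(temp - 56.0), sales)

print("linear r:", linear_r)
print("quadratic R^2:", quadratic_r2)
print("r after folding around 56:", folded_r)
```

On this data set the linear coefficient stays near zero, while both alternatives indicate a strong relationship. Which one is appropriate depends on what shape the relationship is expected to have: the polynomial fit needs a chosen degree, and the folding transform needs a known or estimated peak.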