2011-02-02 168 views
0

我有一個默認的示例詞典,看起來像這樣:爲什麼這個功能不能用於我的輸入?

critics = {'Lisa Rose': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.5, 
'Just My Luck': 3.0, 'Superman Returns': 3.5, 'You, Me and Dupree': 2.5, 
'The Night Listener': 3.0}, 
      'Gene Seymour': {'Lady in the Water': 3.0, 'Snakes on a Plane': 3.5, 
'Just My Luck': 1.5, 'Superman Returns': 5.0, 'The Night Listener': 3.0, 
'You, Me and Dupree': 3.5}, 
      'Michael Phillips': {'Lady in the Water': 2.5, 'Snakes on a Plane': 3.0, 
'Superman Returns': 3.5, 'The Night Listener': 4.0}, 
'Claudia Puig': {'Snakes on a Plane': 3.5, 'Just My Luck': 3.0, 
'The Night Listener': 4.5, 'Superman Returns': 4.0, 
'You, Me and Dupree': 2.5}, 
      'Mick LaSalle': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 
'Just My Luck': 2.0, 'Superman Returns': 3.0, 'The Night Listener': 3.0, 
'You, Me and Dupree': 2.0}, 
'Jack Matthews': {'Lady in the Water': 3.0, 'Snakes on a Plane': 4.0, 
'The Night Listener': 3.0, 'Superman Returns': 5.0, 'You, Me and Dupree': 3.5}, 
      'Toby': {'Snakes on a Plane':4.5,'You, Me and Dupree':1.0,'Superman Returns':4.0}} 

我使用,使用Pearson相關係數,看起來像這樣返回最相似的人在字典中的功能:

from math import sqrt 
def sim_pearson(prefs,p1,p2): 
# lista na zaednichki tochki 
    si={} 
    for item in prefs[p1]: 
     if item in prefs[p2]: si[item]=1 
# najdi go brojot na elementi 
    n=len(si) 
# ako nemaat zaednichki tochki vrati 0 
    if n==0: return 0 
# dodadi gi site 
    sum1=sum([prefs[p1][it] for it in si]) 
    sum2=sum([prefs[p2][it] for it in si]) 
# sumiraj gi kvadratite 
    sum1Sq=sum([pow(prefs[p1][it],2) for it in si]) 
    sum2Sq=sum([pow(prefs[p2][it],2) for it in si]) 
# sumiraj gi proizvodite 
    pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si]) 
# presmetka na Pirsonoviot koeficient 
    num=pSum-(sum1*sum2/n) 
    den=sqrt((sum1Sq-pow(sum1,2)/n)*(sum2Sq-pow(sum2,2)/n)) 
    if den==0: return 0 
    r=num/den 
    return r 

它的工作原理。例如,對於呼叫print sim_pearson(critics, 'Toby', 'Lisa Rose'),我得到係數0.991240707162。

然而,當我嘗試用我的字典相同的功能是:

tests = {'dzam': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0}, 
     'kex': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0}, 
     'rokoko': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0}, 
     '[email protected]': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj3AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiMAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiBAgw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjtAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj_AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiIAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj9AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiqAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjzAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxikAgw': 3.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiaAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxj1AQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjxAQw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiYAgw': 5.0}, 
     'seljak': {'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxiKAgw': 5.0, 
'ag1yYW5kb20tcmFuZG9tcg8LEghib29rbWFyaxjvAQw': 1.0, }} 

我總是得到1.0,不管我有匹配的字典,這是爲什麼呢?

順便說一下,我使用散列,所以我的字典必須有這個長字符串。 :)

+0

整數和分組不相處得很好。請我們從__future__進口部門`看看是否是這個問題。 – 2011-02-02 12:32:10

+2

你所有的字符串在你的失敗測試中都是一樣的 - 是你想要的嗎?如果是這樣,這就是爲什麼你得到1.0,表明完美的相關性,因爲一切都是相同的。 – payne 2011-02-02 12:33:29

回答

1

你很可能被隱藏在眼睛裏的長鑰匙所迷惑,這些鑰匙串是不同的。

嘗試在測試'seljak'中將所有值設置爲0並與其運行關聯。你會看到一個0相關:

print sim_pearson(tests, '[email protected]', 'seljak') 

變化試驗'seljak'的最後一個值1,你會看到一個負相關關係重新運行該腳本。