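I'm new to Solr and am trying to index some documents, each of which is a JSON object. Some documents that should score highly are getting very low scores. The field I query is of type text_general. I need to understand how factors such as tfNorm and field length feed into Solr's scoring.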
The debug query output follows.
"718152d81b4db95f":"\n1.0891073 = sum of:\n 0.5578956 = weight(channel_genre:sports in 53) [SchemaSimilarity], result of:\n 0.5578956 = score(doc=53,freq=11.0 = termFreq=11.0\n), product of:\n 0.29769886 = idf(docFreq=223, docCount=300)\n 1.8740268 = tfNorm, computed from:\n 11.0 = termFreq=11.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 256.0 = fieldLength\n 0.53121173 = weight(channel_genre:kids in 53) [SchemaSimilarity], result of:\n 0.53121173 = score(doc=53,freq=12.0 = termFreq=12.0\n), product of:\n 0.27996004 = idf(docFreq=227, docCount=300)\n 1.8974556 = tfNorm, computed from:\n 12.0 = termFreq=12.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 256.0 = fieldLength\n",
"7071fa048f60603":"\n1.0834496 = sum of:\n 0.5491592 = weight(channel_genre:sports in 75) [SchemaSimilarity], result of:\n 0.5491592 = score(doc=75,freq=23.0 = termFreq=23.0\n), product of:\n 0.29769886 = idf(docFreq=223, docCount=300)\n 1.8446804 = tfNorm, computed from:\n 23.0 = termFreq=23.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 655.36 = fieldLength\n 0.53429043 = weight(channel_genre:kids in 75) [SchemaSimilarity], result of:\n 0.53429043 = score(doc=75,freq=29.0 = termFreq=29.0\n), product of:\n 0.27996004 = idf(docFreq=227, docCount=300)\n 1.9084525 = tfNorm, computed from:\n 29.0 = termFreq=29.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 655.36 = fieldLength\n",
"17e4a205707dc974":"\n1.0824875 = sum of:\n 0.62048614 = weight(channel_genre:sports in 64) [SchemaSimilarity], result of:\n 0.62048614 = score(doc=64,freq=24.0 = termFreq=24.0\n), product of:\n 0.29769886 = idf(docFreq=223, docCount=300)\n 2.0842745 = tfNorm, computed from:\n 24.0 = termFreq=24.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 163.84 = fieldLength\n 0.46200132 = weight(channel_genre:kids in 64) [SchemaSimilarity], result of:\n 0.46200132 = score(doc=64,freq=4.0 = termFreq=4.0\n), product of:\n 0.27996004 = idf(docFreq=227, docCount=300)\n 1.6502403 = tfNorm, computed from:\n 4.0 = termFreq=4.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 163.84 = fieldLength\n",
"1a48c3a658cc07af":"\n1.0820175 = sum of:\n 0.58498204 = weight(channel_genre:sports in 59) [SchemaSimilarity], result of:\n 0.58498204 = score(doc=59,freq=16.0 = termFreq=16.0\n), product of:\n 0.29769886 = idf(docFreq=223, docCount=300)\n 1.9650128 = tfNorm, computed from:\n 16.0 = termFreq=16.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 256.0 = fieldLength\n 0.49703547 = weight(channel_genre:kids in 59) [SchemaSimilarity], result of:\n 0.49703547 = score(doc=59,freq=8.0 = termFreq=8.0\n), product of:\n 0.27996004 = idf(docFreq=227, docCount=300)\n 1.7753801 = tfNorm, computed from:\n 8.0 = termFreq=8.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 256.0 = fieldLength\n",
"e073dacae12f494b":"\n1.0804946 = sum of:\n 0.5613358 = weight(channel_genre:sports in 17) [SchemaSimilarity], result of:\n 0.5613358 = score(doc=17,freq=19.0 = termFreq=19.0\n), product of:\n 0.29769886 = idf(docFreq=223, docCount=300)\n 1.8855827 = tfNorm, computed from:\n 19.0 = termFreq=19.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 455.1111 = fieldLength\n 0.51915884 = weight(channel_genre:kids in 17) [SchemaSimilarity], result of:\n 0.51915884 = score(doc=17,freq=17.0 = termFreq=17.0\n), product of:\n 0.27996004 = idf(docFreq=227, docCount=300)\n 1.8544034 = tfNorm, computed from:\n 17.0 = termFreq=17.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 455.1111 = fieldLength\n",
"c69628bbb1d9f3ca":"\n1.0785265 = sum of:\n 0.55884564 = weight(channel_genre:sports in 96) [SchemaSimilarity], result of:\n 0.55884564 = score(doc=96,freq=14.0 = termFreq=14.0\n), product of:\n 0.29769886 = idf(docFreq=223, docCount=300)\n 1.877218 = tfNorm, computed from:\n 14.0 = termFreq=14.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 334.36734 = fieldLength\n 0.51968086 = weight(channel_genre:kids in 96) [SchemaSimilarity], result of:\n 0.51968086 = score(doc=96,freq=13.0 = termFreq=13.0\n), product of:\n 0.27996004 = idf(docFreq=227, docCount=300)\n 1.8562679 = tfNorm, computed from:\n 13.0 = termFreq=13.0\n 1.2 = parameter k1\n 0.75 = parameter b\n 142.80667 = avgFieldLength\n 334.36734 = fieldLength\n",
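As I understand it, for the query I submitted, the score of "c69628bbb1d9f3ca" should be higher than that of the other documents. What am I missing here? Please explain.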
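The query in Solr is channel_genre:"sports" AND channel_genre:"kids". Number of documents returned (users who predominantly watch kids and sports): 150. Max score: 1.2256454. I specifically added 100 users who frequently watch Kids and Sports to verify that they land in the top 100. However, 6 of those users fell below rank 100, and "c69628bbb1d9f3ca" is one of them. I just want to understand whether field length has a large impact on the score. – annu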
Given how close the scores you posted are to each other, I'd say that in this case it does. –
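To see how field length enters the score, the tfNorm and idf values in the explain output can be reproduced by hand. The sketch below assumes the classic Lucene BM25 formulas, tfNorm = freq * (k1 + 1) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) and idf = ln(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)). The function names are illustrative only and are not part of Solr; the formulas do agree with the numbers posted above.

```python
import math

def bm25_idf(doc_freq, doc_count):
    # Inverse document frequency as reported in the explain output.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    # Fields longer than average get length_factor > 1, which lowers tfNorm.
    length_factor = 1 - b + b * field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * length_factor)

AVG_LEN = 142.80667  # avgFieldLength from the explain output

# "sports" clause for c69628bbb1d9f3ca (doc 96) and 17e4a205707dc974 (doc 64)
idf_sports = bm25_idf(223, 300)
doc96 = idf_sports * bm25_tf_norm(14, 334.36734, AVG_LEN)
doc64 = idf_sports * bm25_tf_norm(24, 163.84, AVG_LEN)
print(round(doc96, 4), round(doc64, 4))
```

Two things stand out. tfNorm can never exceed k1 + 1 = 2.2, so going from 11 to 29 occurrences barely changes it, and at that point the fieldLength / avgFieldLength ratio is what separates documents whose scores differ only in the third decimal place.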
By the way, have you considered trying omitNorms="true" on your field? That should disable length normalization. –
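As a rough way to gauge what disabling length normalization would do to the six documents above, the sketch below re-scores them with b = 0, which treats every field as if it had the average length. That is an assumption used for illustration: the exact behaviour of omitNorms="true" depends on the Lucene/Solr version, and the field would need to be re-indexed for the change to take effect. The term frequencies and field lengths are copied from the explain output; the helper names are hypothetical.

```python
import math

K1 = 1.2
AVG_LEN = 142.80667
IDF = {
    "sports": math.log(1 + (300 - 223 + 0.5) / (223 + 0.5)),
    "kids": math.log(1 + (300 - 227 + 0.5) / (227 + 0.5)),
}

# (document id, sports termFreq, kids termFreq, fieldLength) from the explain output
DOCS = [
    ("718152d81b4db95f", 11, 12, 256.0),
    ("7071fa048f60603", 23, 29, 655.36),
    ("17e4a205707dc974", 24, 4, 163.84),
    ("1a48c3a658cc07af", 16, 8, 256.0),
    ("e073dacae12f494b", 19, 17, 455.1111),
    ("c69628bbb1d9f3ca", 14, 13, 334.36734),
]

def score(sports_tf, kids_tf, field_length, b):
    # Sum of the two BM25 clauses; b = 0 removes the field-length effect.
    length_factor = 1 - b + b * field_length / AVG_LEN
    def clause(term, tf):
        return IDF[term] * tf * (K1 + 1) / (tf + K1 * length_factor)
    return clause("sports", sports_tf) + clause("kids", kids_tf)

def ranking(b):
    ordered = sorted(DOCS, key=lambda d: score(d[1], d[2], d[3], b), reverse=True)
    return [doc_id for doc_id, *_ in ordered]

print(ranking(0.75))  # order with length normalization, as in the explain output
print(ranking(0.0))   # order when field length is ignored
```

Under this approximation "c69628bbb1d9f3ca" moves from last to third among these six documents, which is consistent with field length being what pushes it down.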