2016-02-26

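Postgres full-text search: ordering by word position

I have a table of movies, and I want to search the titles and return the closest match.

I thought full-text search might help, but there seems to be no way to order by the position of the words, even though Postgres knows the positions. Is this possible in Postgres?

Here is my query: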

SELECT collectibles.id, collectibles.title, ts_rank_cd(to_tsvector('english', collectibles.title), plainto_tsquery('old school')) AS score 
FROM collectibles WHERE to_tsvector('english', collectibles.title) @@ plainto_tsquery('old school') 
ORDER BY score DESC; 

Here are some of the results (this is the best formatting I could figure out, sorry!):

id | title | score 

- 277568 | Wilson Meadows: Live At The 15th Old School & Blues Festival | 0.1 
- 3545 | 5 Film Collection: Will Ferrell: Campaign/Old School (Unrtated Version)/Blades Of Glory/Roxbury/Semi-Pro | 0.1 
- 10366 | Alice Cooper: Old School: 1964-1974 (DVD/CD Combo) | 0.1 
- 13004 | American Classics: Old School (3-Disc Set) | 0.1 
- 13005 | American Classics: Old School: Classic Chevrolets | 0.1 
- 13006 | American Classics: Old School: Classic Travel Trailers | 0.1 
- 13007 | American Classics: Old School: Kings Of Kustomizing | 0.1 
- 14592 | Anchorman: The Legend Of Ron Burgundy (Widescreen/ Extended Edition)/Old School (R-Rated Version) (Back-To-Back) | 0.1 
- 14593 | Anchorman: The Legend Of Ron Burgundy (Widescreen/ Extended Edition)/Old School (R-Rated Version) (Side-By-Side) | 0.1 
- 20242 | Audiovisualize: Mixed By Addictive TV: Snake Worship Island/Corp. Inc./Old School Futures/These Melodies/Robot War/... | 0.1 
- 192057 | Old School (DreamWorks/ Widescreen/ Unrated Version/ Special Edition) | 0.1 
- 192058 | Old School (DreamWorks/ Widescreen/ Unrated Version/ Special Edition)/Road Trip (R-Rated) (Back-To-Back) | 0.1 
- 192059 | Old School (DreamWorks/ Widescreen/ Unrated Version/ Special Edition)/Road Trip (R-Rated) (Side-By-Side) | 0.1 
- 192060 | Old School (DreamWorks/ Widescreen/ Unrated Version/ Special Edition)/Road Trip (Unrated) (Back-To-Back) | 0.1 
- 192061 | Old School (DreamWorks/ Widescreen/ Unrated Version/ Special Edition)/Road Trip (Unrated) (Side-By-Side) | 0.1 
- 192062 | Old School (Warner Brothers/ R-Rated Version) | 0.1 
- 192063 | Old School (Warner Brothers/ R-Rated Version/ Blu-ray) | 0.1 
- 192064 | Old School (Warner Brothers/ Unrated Version) | 0.1 
- 192065 | Old School (Warner Brothers/ Unrated Version/ Blu-ray) | 0.1 
- 192066 | Old School Comedy (4-Pack): Atoll K/Jack And The Beanstalk/The Flying Deuces/Africa Screams | 0.1 
- 192067 | Old School Hip Hop Dance #1: Beginner | 0.1 
- 192068 | Old School Hip Hop Greatest | 0.1 
- 192069 | Old School Hip Hop: Run DMC & Flava Flav (2-Disc) | 0.1 
- 192070 | Old School Hits Movie Marathon Collection (3-Disc) | 0.1 
- 192071 | Old School Returns | 0.1 

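The score for all of these is 0.1, but in many of the titles the words appear closer to the front of the string. Is there a way to rank those higher? Unfortunately, neither the length of the string nor the ID is a good ranking qualifier.

Answers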

0

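Here you need the normalization option of ts_rank(tsvector, tsquery, normalization). In the snippet below I used normalization = 1 (which divides the rank by 1 + the logarithm of the document length), but you can adjust it to whatever you actually need. Here is an example: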

WITH s(id,tsv) AS (VALUES 
    (1,to_tsvector('english','Alice Cooper: Old School: 1964-1974 (DVD/CD Combo)')), 
    (2,to_tsvector('english','American Classics: Old School: Kings Of Kustomizing')), 
    (3,to_tsvector('english','Old School Hip Hop Greatest')), 
    (4,to_tsvector('english','Old School Returns')) 
) 
SELECT id,ts_rank(tsv,tsq,1) AS rank 
FROM s,to_tsquery('english','old & school') tsq 
ORDER BY rank DESC; 

Result:

 id |   rank
----+-----------
  4 | 0.0495516
  3 | 0.0383384
  2 | 0.0353013
  1 | 0.0312636
(4 rows)
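
To see how the normalization argument changes the ordering, here is a small sketch that puts several flag values side by side. The flags are a bit mask: 0 ignores document length, 1 divides by 1 + log(length), 2 divides by the length, and 8 divides by the number of unique words; they can be combined with |. The column aliases are only illustrative names, and the two sample titles are taken from the question.

```sql
-- Compare a few normalization settings for ts_rank on two sample titles.
WITH s(id, tsv) AS (VALUES
    (3, to_tsvector('english', 'Old School Hip Hop Greatest')),
    (4, to_tsvector('english', 'Old School Returns'))
)
SELECT id,
       ts_rank(tsv, tsq, 0)     AS rank_raw,       -- no length normalization
       ts_rank(tsv, tsq, 1)     AS rank_log_len,   -- divide by 1 + log(length)
       ts_rank(tsv, tsq, 2)     AS rank_len,       -- divide by length
       ts_rank(tsv, tsq, 2 | 8) AS rank_len_uniq   -- length and unique-word count
FROM s, to_tsquery('english', 'old & school') AS tsq
ORDER BY rank_log_len DESC;
```

All of these settings penalize longer documents; none of them looks at where in the title the match occurs.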

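Normalization is good, but I was hoping to rank by the placement of the words. Consider searching for 'Silicon Valley': Pirates of Silicon Valley - Silicon Valley: American Experience - Silicon Valley: The Complete First Season. Ideally, the titles with Silicon Valley at the front would come first, but they are not the shortest strings. If Postgres can't do this, I'll probably use normalization for now and switch to a different search system later. – d3vkit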


@d3vkit add "(5, to_tsvector('english', 'Some Old School'))" and it will get the highest rank – mpugach
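
One possible way to rank by word placement, as asked in the comment above, is to read the positions out of the tsvector directly. This is only a sketch under some assumptions: it needs PostgreSQL 9.6 or later (where unnest(tsvector) and tsvector_to_array are available), it orders by the earliest position at which any query lexeme appears, and the alias names (u, p, first_pos) are made up for the illustration. It uses the three titles from the comment.

```sql
-- Order matches by the earliest position of any search lexeme in the title.
WITH s(id, title) AS (VALUES
    (1, 'Pirates of Silicon Valley'::text),
    (2, 'Silicon Valley: American Experience'::text),
    (3, 'Silicon Valley: The Complete First Season'::text)
)
SELECT s.id, s.title, min(p.pos) AS first_pos
FROM s
-- One row per lexeme, with the array of positions where it occurs.
CROSS JOIN LATERAL unnest(to_tsvector('english', s.title))
           AS u(lexeme, positions, weights)
-- One row per individual position.
CROSS JOIN LATERAL unnest(u.positions) AS p(pos)
WHERE to_tsvector('english', s.title) @@ plainto_tsquery('english', 'silicon valley')
  -- Keep only lexemes that also occur in the normalized search text.
  AND u.lexeme = ANY (tsvector_to_array(to_tsvector('english', 'silicon valley')))
GROUP BY s.id, s.title
ORDER BY first_pos, s.id;
```

Titles that start with the search words should sort ahead of 'Pirates of Silicon Valley', independent of their length. Because positions count stop words too, a leading "The" still pushes a match back by one. On a large table this recomputes the tsvector for every matching row, so a stored tsvector column would probably be worth considering.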

0

The documentation says:

Also, * can be attached to a lexeme to specify prefix matching

to_tsquery can also accept single-quoted phrases

So you can do this:

SELECT to_tsquery('''old school'':*'); 
     to_tsquery  
---------------------- 
'old':* & 'school':* 
(1 row) 

So in your case it would look like this:

SELECT
    collectibles.id,
    collectibles.title,
    ts_rank_cd(
        to_tsvector('english', collectibles.title),
        to_tsquery('''old school'':*')
    ) AS score
FROM collectibles
WHERE to_tsvector('english', collectibles.title) @@ to_tsquery('''old school'':*')
ORDER BY score DESC;
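
To make the effect of prefix matching concrete, here is a minimal sketch comparing a plain query with a prefix query. 'The Old Schoolhouse' is a hypothetical title, not one from the question's data, and the prefix query is spelled out as 'old:* & school:*' so the example does not depend on how the quoted-phrase form is parsed.

```sql
-- Compare plain matching with prefix matching on two sample titles.
SELECT title,
       to_tsvector('english', title)
           @@ plainto_tsquery('english', 'old school')   AS plain_match,
       to_tsvector('english', title)
           @@ to_tsquery('english', 'old:* & school:*')  AS prefix_match
FROM (VALUES
    ('Old School Returns'::text),
    ('The Old Schoolhouse'::text)   -- hypothetical title for illustration
) AS t(title);
```

The second title should match only under the prefix query, since 'school' is a prefix of the lexeme produced for 'Schoolhouse'. Note that prefix matching widens what matches; it does not by itself make the score depend on where in the title the words appear.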