2012-01-11

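Ideas for reducing search time in Sphinx

I'm using the Thinking Sphinx gem, and my queries take about 45 seconds to complete (13 million records; the folder containing the index is 1.1GB). I assume I have something configured incorrectly (first-time Sphinx user). Anyway, let me know if you see anything that looks off. Here is my configuration: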

define_index do 
    indexes :name 
    indexes :summary 
    indexes :tag_list 

    indexes categories.name, :as => :category_name 

    has "RADIANS(lat)", :as => :latitude, :type => :float 
    has "RADIANS(lng)", :as => :longitude, :type => :float 

    set_property :field_weights => {
      :name          => 8,
      :summary       => 6,
      :category_name => 5,
      :tag_list      => 3
    }
    set_property :delta => ThinkingSphinx::Deltas::ResqueDelta 
    set_property :ignore_chars => %w(' -) 
end 

Here is an example query:

Location.search('Restaurant', 
       :geo => [0.5837843098436726,-1.9560609568879357], 
       :latitude_attr => "latitude", 
       :longitude_attr => "longitude", 
       :with => {"@geodist" => 0.0..4000.0}, 
       :include => :categories, 
       :page => 1, 
       :per_page => 100) 
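A note on the units in this query: the `:geo` pair is given in radians, which matches the `RADIANS(lat)` and `RADIANS(lng)` attributes in the index definition, and Sphinx reports `@geodist` in meters, so `0.0..4000.0` is a 4 km radius. A minimal sketch of the conversion follows; the helper names `to_radians`, `to_degrees` and `geo_pair` are made up for illustration and are not part of Thinking Sphinx.

```ruby
# Hypothetical helpers (not part of Thinking Sphinx) for building the :geo option.

# Converts an angle in degrees to radians.
def to_radians(degrees)
  degrees * Math::PI / 180.0
end

# Converts an angle in radians back to degrees, handy for sanity-checking
# a pair that already appears in a query.
def to_degrees(radians)
  radians * 180.0 / Math::PI
end

# Builds the [latitude, longitude] pair in radians from coordinates in degrees.
def geo_pair(lat_degrees, lng_degrees)
  [to_radians(lat_degrees), to_radians(lng_degrees)]
end
```

With these in place, a call such as `Location.search('Restaurant', :geo => geo_pair(lat, lng), ...)` would avoid hand-converted constants, assuming `lat` and `lng` are held in degrees.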

My log shows:

Sphinx Query (43066.3ms) restaurant 
Sphinx Found 467 results 

I'll keep digging through the docs and trying things!

UPDATE: my development.sphinx.conf

indexer 
{ 
} 

searchd 
{ 
    listen = 127.0.0.1:9312 
    log = /project_path/log/searchd.log 
    query_log = /project_path/log/searchd.query.log 
    pid_file = /project_path/log/searchd.development.pid 
} 

source location_core_0 
{ 
    type = pgsql 
    sql_host = localhost 
    sql_user = user 
    sql_pass = pass 
    sql_db = db_name 
    sql_query_pre = UPDATE "business_entities" SET "delta" = FALSE WHERE "delta" = TRUE 
    sql_query_pre = SET TIME ZONE 'UTC' 
    sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = FALSE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type" 
    sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = FALSE 
    sql_attr_uint = sphinx_internal_id 
    sql_attr_uint = sphinx_deleted 
    sql_attr_uint = class_crc 
    sql_attr_float = latitude 
    sql_attr_float = longitude 
    sql_attr_string = sphinx_internal_class 
    sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0)/1) 
} 

index location_core 
{ 
    source = location_core_0 
    path = /project_path/db/sphinx/development/location_core 
    morphology = stem_en 
    charset_type = utf-8 
    ignore_chars = ', - 
    enable_star = 1 
} 

source location_delta_0 : location_core_0 
{ 
    type = pgsql 
    sql_host = localhost 
    sql_user = user 
    sql_pass = pass 
    sql_db = db_name 
    sql_query_pre = 
    sql_query_pre = SET TIME ZONE 'UTC' 
    sql_query = SELECT "business_entities"."id" * 1::INT8 + 0 AS "id" , "business_entities"."name" AS "name", "business_entities"."summary" AS "summary", "business_entities"."tag_list" AS "tag_list", "business_entities"."id" AS "sphinx_internal_id", 0 AS "sphinx_deleted", CASE COALESCE("business_entities"."type", '') WHEN 'Location' THEN 2817059741 WHEN 'Group' THEN 2885774273 WHEN 'BraintreeBusiness' THEN 28779289 WHEN 'InvoicedBusiness' THEN 1440117572 ELSE 2817059741 END AS "class_crc", COALESCE("business_entities"."type", '') AS "sphinx_internal_class", RADIANS(lat) AS "latitude", RADIANS(lng) AS "longitude" FROM "business_entities" WHERE ("business_entities"."type" = 'Location') AND ("business_entities"."id" >= $start AND "business_entities"."id" <= $end AND "business_entities"."delta" = TRUE AND "business_entities"."type" = 'Location') GROUP BY "business_entities"."id", "business_entities"."name", "business_entities"."summary", "business_entities"."tag_list", "business_entities"."id", "business_entities"."type" 
    sql_query_range = SELECT COALESCE(MIN("id"), 1::bigint), COALESCE(MAX("id"), 1::bigint) FROM "business_entities" WHERE "business_entities"."delta" = TRUE 
    sql_attr_uint = sphinx_internal_id 
    sql_attr_uint = sphinx_deleted 
    sql_attr_uint = class_crc 
    sql_attr_float = latitude 
    sql_attr_float = longitude 
    sql_attr_string = sphinx_internal_class 
    sql_query_info = SELECT * FROM "business_entities" WHERE "id" = (($id - 0)/1) 
} 

index location_delta : location_core 
{ 
    source = location_delta_0 
    path = /project_path/db/sphinx/development/location_delta 
} 

index location 
{ 
    type = distributed 
    local = location_delta 
    local = location_core 
} 

Could you please post your sphinx.conf here? – 2012-01-12 09:48:59

If you do post the config file, make sure you remove the database credentials (username and password) from it. – pat 2012-01-12 12:32:05

OK, I've posted my development.sphinx.conf. – 2012-01-12 16:10:52

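Answers

I found my problem. These records happen to live in an STI table, but I only want to index the Location type (Location has no descendants). Of the 13 million records in this table, 99.99984% (seriously) are of type Location. The SELECT DISTINCT type FROM business_entities query takes far too long (even with an index). The trickiest part of spotting this was that the log reported the Sphinx query as lasting 84 seconds, when it was really these SQL queries that accounted for the time: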

SQL (43647.1ms) SELECT DISTINCT type FROM business_entities 
SQL (39857.7ms) SELECT DISTINCT type FROM business_entities 

Sphinx Query (84173.0ms) restaurant 
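The attribution can be checked with quick arithmetic on the three log lines above. The sketch below parses the durations and compares the SQL total with the reported Sphinx time; the `durations_ms` helper is made up for illustration, and the log text is copied from this answer.

```ruby
# Log lines copied from the answer above.
LOG = <<~LOG
  SQL (43647.1ms) SELECT DISTINCT type FROM business_entities
  SQL (39857.7ms) SELECT DISTINCT type FROM business_entities
  Sphinx Query (84173.0ms) restaurant
LOG

# Hypothetical helper: returns the millisecond values of every line that
# starts with the given label, e.g. "SQL" or "Sphinx Query".
def durations_ms(log, label)
  pattern = /\A#{Regexp.escape(label)} \((\d+(?:\.\d+)?)ms\)/
  log.each_line.map { |line| line[pattern, 1] }.compact.map(&:to_f)
end

sql_ms    = durations_ms(LOG, "SQL").sum
sphinx_ms = durations_ms(LOG, "Sphinx Query").sum

puts format("SQL total:    %.1f ms", sql_ms)
puts format("Sphinx total: %.1f ms", sphinx_ms)
puts format("Remainder:    %.1f ms", sphinx_ms - sql_ms)
```

The two DISTINCT queries add up to nearly all of the reported 84 seconds, which is consistent with the logged "Sphinx Query" time including the SQL lookups rather than measuring searchd alone.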

So I monkey-patched Thinking Sphinx in an initializer to return only the type I care about:

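# Loaded from a Rails initializer: override Thinking Sphinx's STI type lookup
# so it no longer runs SELECT DISTINCT type against the whole table.
module ThinkingSphinx
  class Source
    module SQL
      # Only Location records are indexed, so return that type directly.
      def type_values
        ['Location']
      end
    end
  end
end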

https://gist.github.com/1603565

You could also add this as part of the WHERE clause in your Sphinx configuration. The following inside the define_index block should do it: where "business_entities.type = 'Location'" – pat 2012-01-17 04:22:58

Also: I'd recommend putting a database index on that type column. – pat 2012-01-17 04:23:26

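I'm not sure why the search is running so slowly, but I'd start by simplifying the query and then adding the complexity back bit by bit, to see whether anything in particular is the cause. So, first: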

Location.search('Restaurant') 

Then perhaps:

Location.search('Restaurant', :per_page => 100) 

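And so on. Don't forget that the :field_weights in your index definition will also have an effect.

All that said, I don't see anything particularly odd in what you're doing, and a 43-second search (or anything close to it) is something I haven't come across before.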
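The incremental approach described in this answer can be sketched as a small timing harness. The `LAYERS` grouping and the `cumulative_options` and `time_each` helpers are made up for illustration; the option values come from the query in the question. The final block assumes a `Location` model with Thinking Sphinx's `search` method, and that calling `to_a` forces the lazy search to run, so treat it as a starting point rather than a definitive benchmark.

```ruby
require "benchmark"

# Option groups from the query in the question, added back one group at a time.
LAYERS = [
  {},
  { :page => 1, :per_page => 100 },
  { :include => :categories },
  { :geo => [0.5837843098436726, -1.9560609568879357],
    :latitude_attr => "latitude", :longitude_attr => "longitude" },
  { :with => { "@geodist" => 0.0..4000.0 } }
].freeze

# Returns one options hash per layer, each containing every layer up to that point.
def cumulative_options(layers)
  merged = {}
  layers.map { |layer| merged = merged.merge(layer) }
end

# Runs the given block once per options hash and returns [option_keys, seconds] pairs.
def time_each(option_sets)
  option_sets.map do |options|
    seconds = Benchmark.realtime { yield options }
    [options.keys, seconds]
  end
end

# Only runs inside an app where the Location model is loaded (an assumption).
if defined?(Location) && Location.respond_to?(:search)
  timings = time_each(cumulative_options(LAYERS)) do |options|
    Location.search("Restaurant", options).to_a
  end
  timings.each do |keys, seconds|
    puts format("%-70s %.1f ms", keys.inspect, seconds * 1000)
  end
end
```

A sudden jump between two consecutive rows would point at the option group added in the later row.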

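Thanks for the reply, Pat. I just tried the simple query and it took even longer. I tried removing the field weights from the index, and it had no effect. I removed the indexing of the association, which made building the index take less time. I'll keep trying... – 2012-01-12 16:15:48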