2016-05-17 62 views

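Elasticsearch fuzzy searching of phrases with dashes

I want to find a way to index a document's description, such as "In-N-Out Burger", and run a search like "in n out", "in and out", or just "in-n-out" directly, and have it return the "In-N-Out Burger" document. Looking through the docs, I'm confused about how dashes are handled at index time and at search time. Any suggestions?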

My current settings and mapping:

curl -XPUT http://localhost:9200/objects -d '{
  "settings": {
    "analysis": {
      "analyzer": {
        "lower": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [ "lowercase" ]
        }
      }
    }
  }
}'

curl -XPUT http://localhost:9200/objects/object/_mapping -d '{
  "object" : {
    "properties" : {
      "objectDescription" : {
        "type" : "string",
        "fields" : {
          "lower": {
            "type": "string",
            "analyzer": "lower"
          }
        }
      },
      "suggest" : {
        "type" : "completion",
        "analyzer" : "simple",
        "search_analyzer" : "simple",
        "payloads" : true
      }
    }
  }
}'

Any luck with my answer?

I'm sorry! I'm out of the country right now and can't play with it. I'll let you know as soon as I get home :)

Answer

I don't see any problem when I create the index with your settings and put a document:

curl -XPUT http://localhost:9200/objects/object/001 -d '{ 
    "description": "In-N-Out Burger", 
    "name" : "first_document" 
}' 

and then try to find it:

curl -XGET 'localhost:9200/objects/object/_search?q=in+and+out&pretty' 
{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.05038611,
    "hits" : [ {
      "_index" : "objects",
      "_type" : "object",
      "_id" : "001",
      "_score" : 0.05038611,
      "_source" : {
        "description" : "In-N-Out Burger",
        "name" : "first_document"
      }
    } ]
  }
}

curl -XGET 'localhost:9200/objects/object/_search?pretty&q=in-n-out' 
{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.23252454,
    "hits" : [ {
      "_index" : "objects",
      "_type" : "object",
      "_id" : "001",
      "_score" : 0.23252454,
      "_source" : {
        "description" : "In-N-Out Burger",
        "name" : "first_document"
      }
    } ]
  }
}

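As you can see, the document is found in both cases. The default (standard) analyzer treats '-' as a delimiter and splits the phrase into tokens, both when the document is indexed and when the query is analyzed. The "in and out" query still matches because query terms are combined with OR by default, so the "in" and "out" tokens are enough for a hit. You can see the tokenization at work with the _analyze API: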

curl -XGET 'localhost:9200/objects/_analyze?pretty=true' -d 'In-N-Out Burger' 
{
  "tokens" : [ {
    "token" : "in",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "<ALPHANUM>",
    "position" : 0
  }, {
    "token" : "n",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "out",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 2
  }, {
    "token" : "burger",
    "start_offset" : 9,
    "end_offset" : 15,
    "type" : "<ALPHANUM>",
    "position" : 3
  } ]
}
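
If it helps to reason about this without a running cluster, here is a rough Python sketch of what is going on. It is only an approximation, not Elasticsearch's actual code: standard_like_tokens is a hypothetical stand-in that lowercases and splits on anything that is not a letter or digit (the real standard tokenizer follows Unicode word-break rules), keyword_lower_tokens imitates the custom "lower" analyzer from the question, and matches_any assumes OR semantics between query terms.

```python
import re


def standard_like_tokens(text):
    """Hypothetical stand-in for the default analyzer:
    lowercase, then split on any run of non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_lower_tokens(text):
    """Stand-in for the question's "lower" analyzer:
    keyword tokenizer (one token for the whole input) plus lowercase filter."""
    return [text.lower()]


def matches_any(query, doc_tokens, analyze):
    """Assumed OR semantics: a hit if at least one query token is indexed."""
    return any(token in doc_tokens for token in analyze(query))


doc = "In-N-Out Burger"
doc_tokens = standard_like_tokens(doc)
print(doc_tokens)  # ['in', 'n', 'out', 'burger']

# All three searches from the question share at least one token with the document.
for query in ["in n out", "in and out", "in-n-out"]:
    print(query, "->", matches_any(query, set(doc_tokens), standard_like_tokens))

# With the keyword-based analyzer the whole description is a single token,
# so only the full lowercased string would match.
lower_tokens = set(keyword_lower_tokens(doc))
print(matches_any("in-n-out", lower_tokens, keyword_lower_tokens))
print(matches_any("in-n-out burger", lower_tokens, keyword_lower_tokens))
```

The last two calls suggest why a search aimed only at the objectDescription.lower sub-field would probably behave differently from the searches above: under this sketch's assumptions, that field holds one token for the entire description rather than one token per word.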

Yes, you're right. Thanks!