2016-08-20

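I am trying to apply the html_strip char filter and the lowercase filter to a field analyzed with the keyword tokenizer. When searching, I noticed that the results are not what I expected.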

This is the index we are trying to create:

PUT /test_index
{
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 0,
        "analysis": {
            "analyzer": {
                "ExportPrimaryAnalyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": "lowercase",
                    "char_filter": "html_strip"
                },
                "ExportRawAnalyzer": {
                    "type": "custom",
                    "buffer_size": "1000",
                    "tokenizer": "keyword",
                    "filter": "lowercase",
                    "char_filter": "html_strip"
                }
            }
        }
    },
    "mappings": {
        "test_type": {
            "properties": {
                "city": {
                    "type": "string",
                    "analyzer": "ExportPrimaryAnalyzer"
                },
                "city_raw": {
                    "type": "string",
                    "analyzer": "ExportRawAnalyzer"
                }
            }
        }
    }
}

Here is an example document:

PUT test_index/test_type/4 
{ 
    "city": "<p>I am from Pune</p>", 
    "city_raw": "<p>I am from Pune</p>" 
} 

When we run a wildcard query against it, we get no results. This is the query we are running:

{
    "query": {
        "wildcard": {
            "city_raw": "i am*"
        }
    }
}

Any help understanding this would be appreciated.

Answer


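The html_strip char filter replaces HTML block-level elements such as <p> with newlines, so "<p>I am from Pune</p>" becomes "\nI am from Pune\n" before tokenization. The whitespace tokenizer simply splits on those newlines, but the keyword tokenizer keeps the whole input as a single token, so city_raw is indexed as "\ni am from pune\n". The wildcard query is not analyzed, so the pattern "i am*" is compared against that term as-is and fails on the leading newline. If you use the keyword tokenizer, you therefore need an additional char filter that replaces the newlines with an empty string.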
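To see the mechanism without a running cluster, here is a minimal Python sketch that imitates the two analysis chains on city_raw. It is an approximation under stated assumptions: html_strip is reduced to a regex that turns a few block-level tags into "\n" and drops all other tags (the real Lucene HTMLStripCharFilter handles far more), and the helper names html_strip, remove_new_line, keyword_analyzer and wildcard_match are hypothetical, not Elasticsearch APIs.

```python
import re

# Hypothetical, simplified stand-in for Elasticsearch's html_strip char filter.
# Assumption: only these block-level tags become "\n"; any other tag is dropped.
BLOCK_TAG = re.compile(r"</?(?:p|div|br|li|h[1-6])\b[^>]*>", re.IGNORECASE)
ANY_TAG = re.compile(r"<[^>]+>")


def html_strip(text: str) -> str:
    text = BLOCK_TAG.sub("\n", text)  # block elements -> newline
    return ANY_TAG.sub("", text)      # remaining tags removed


def remove_new_line(text: str) -> str:
    # Mirrors the answer's mapping char filter "\\n =>": newline -> empty string.
    return text.replace("\n", "")


def keyword_analyzer(text: str, char_filters) -> list:
    # Char filters run first, then the keyword tokenizer emits the whole
    # string as one token, then the lowercase token filter is applied.
    for char_filter in char_filters:
        text = char_filter(text)
    return [text.lower()]


def wildcard_match(pattern: str, term: str) -> bool:
    # Wildcard patterns are not analyzed: '*' matches any sequence,
    # '?' matches one character, everything else must match literally.
    regex = "".join(
        ".*" if c == "*" else "." if c == "?" else re.escape(c) for c in pattern
    )
    return re.fullmatch(regex, term, flags=re.DOTALL) is not None


doc = "<p>I am from Pune</p>"
for chain in ([html_strip], [html_strip, remove_new_line]):
    token = keyword_analyzer(doc, chain)[0]
    print(repr(token), wildcard_match("i am*", token))
# '\ni am from pune\n' False
# 'i am from pune' True
```

With only html_strip the single indexed term starts with a newline, so "i am*" cannot match; adding the newline-removing char filter makes the term begin with "i am" and the same pattern matches.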

Example:

PUT test
{
    "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 0,
        "analysis": {
            "char_filter": {
                "remove_new_line": {
                    "type": "mapping",
                    "mappings": [
                        "\\n =>"
                    ]
                }
            },
            "analyzer": {
                "ExportPrimaryAnalyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": [
                        "lowercase"
                    ],
                    "char_filter": [
                        "html_strip"
                    ]
                },
                "ExportRawAnalyzer": {
                    "type": "custom",
                    "buffer_size": "1000",
                    "tokenizer": "keyword",
                    "filter": [
                        "lowercase"
                    ],
                    "char_filter": [
                        "html_strip",
                        "remove_new_line"
                    ]
                }
            }
        }
    },
    "mappings": {
        "test_type": {
            "properties": {
                "city": {
                    "type": "string",
                    "analyzer": "ExportPrimaryAnalyzer"
                },
                "city_raw": {
                    "type": "string",
                    "analyzer": "ExportRawAnalyzer"
                }
            }
        }
    }
}

PUT test/test_type/4 
{ 
    "city": "<p>I am from Bangalore I like Pune too</p>", 
    "city_raw": "<p>I am from Bangalore I like Pune too</p>" 
} 

POST test/_search
{
    "query": {
        "wildcard": {
            "city_raw": "i am *"
        }
    }
}

Result:

"hits": [ 
    { 
     "_index": "test", 
     "_type": "test_type", 
     "_id": "4", 
     "_score": 1, 
     "_source": { 
      "city": "<p>I am from Bangalore I like Pune too</p>", 
      "city_raw": "<p>I am from Bangalore I like Pune too</p>" 
     } 
    } 
    ]