Can't get any results from a valid index with PhraseQuery or WildcardQuery?

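For some reason, I can't get any results from my valid index of 3552 items.

Please see the code below, followed by the program's console output when I run it. 3552 is the number of indexed documents. /c:/test/stuff.txt is the correctly indexed path, retrieved from document 5 as a test. All the text at the bottom is the full text of the test file (in XML-type output). What am I missing that makes my simple query return no results?

Maybe my WildcardQuery syntax is bad? I would expect it to be inefficient (because of the wildcards at the start and the end), but it should at least return the hits from the index...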

import java.io.File; 
import java.io.IOException; 

import org.apache.lucene.document.Document; 
import org.apache.lucene.document.Fieldable; 
import org.apache.lucene.index.CorruptIndexException; 
import org.apache.lucene.index.IndexReader; 
import org.apache.lucene.index.Term; 
import org.apache.lucene.search.IndexSearcher; 
import org.apache.lucene.search.ScoreDoc; 
import org.apache.lucene.search.TopDocs; 
import org.apache.lucene.search.WildcardQuery; 
import org.apache.lucene.store.FSDirectory; 


public class Searcher 
{ 

    /** 
    * @param args 
    * @throws IOException 
    * @throws CorruptIndexException 
    */ 
    public static void main(String[] args) throws CorruptIndexException, IOException 
    { 

     System.out.println("Begin searching test..."); 

     IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File(args[0]))); 

     // termContainsWildcard is shown to be true here when debugging 
     // numberOfTerms is 0 
     WildcardQuery query = new WildcardQuery(new Term("contents", "*stuff*")); 

     System.out.println("Query field is: " + query.getTerm().field()); 
     System.out.println("Query field contents is: " + query.getTerm().text()); 

     TopDocs results = searcher.search(query, 5000); 

     // no results returned :(
     System.out.println("Total results from index " + args[0] + ": " + results.totalHits); 

     for (ScoreDoc sd : results.scoreDocs) 
     { 
      System.out.println("Document matched. Number: " + sd.doc); 
     } 

     System.out.println(); 

     System.out.println("Begin reading test..."); 

     // now read from the index to see if I am crazy 
     IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0]))); 

     // correctly shows the number of documents in the local index 
     System.out.println("Number of indexed documents: " + reader.numDocs()); 

     // pick out a random, small document and check its fields 
     Document d = reader.document(5); 

     for (Fieldable f : d.getFields()) 
     { 
      System.out.println("Field name is: " + f.name()); 
      System.out.println(new String(f.getBinaryValue())); 
     } 
    } 
}

Console output when run

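Begin searching test...
Query field is: contents
Query field contents is: *stuff*
Total results from index C:\index2: 0

Begin reading test...
Number of indexed documents: 3552
Field name is: path
/c:/test/stuff.txt
Field name is: contents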
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="Content-Length" content="8"/>
<meta name="Content-Encoding" content="UTF-8"/>
<meta name="Content-Type" content="text/plain"/>
<meta name="resourceName" content="stuff.txt"/>
<title/>
</head>
<body>
<p>stuff 
</p>
</body>
</html>

Source

2011-05-10 asteroid

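You could try running the query in Luke and experimenting with a few different queries. You can also use Luke to browse the terms in the index, which might give you an idea of what is going on. The code you used to index the documents might also provide some hints: for example, is your field actually indexed? You are reading a binary value from contents, which may mean the field was never tokenized and therefore never indexed.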
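To illustrate that last point, here is a toy model in plain Java. It does not use Lucene, and `ToyIndex`, `add`, and `wildcardSearch` are made-up names for this sketch. It assumes a wildcard query can only match terms that were written to the term dictionary at indexing time, so a field that is stored but never tokenized can be read back by document number yet never matches a search, which is the symptom in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Toy model of a search index. This is NOT Lucene code. */
public class ToyIndex {
    // Stored values: what you get back when loading a document by number.
    private final List<String> stored = new ArrayList<>();
    // Term dictionary: term -> numbers of the documents that contain it.
    private final Map<String, Set<Integer>> terms = new TreeMap<>();

    /** Stores the text; tokenizes it into terms only when analyzed is true. */
    public int add(String text, boolean analyzed) {
        int docId = stored.size();
        stored.add(text);
        if (analyzed) {
            for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!token.isEmpty()) {
                    terms.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
                }
            }
        }
        return docId;
    }

    /** Returns the stored text, whether or not it was ever tokenized. */
    public String document(int docId) {
        return stored.get(docId);
    }

    /** Matches a pattern containing * wildcards against the term dictionary. */
    public Set<Integer> wildcardSearch(String pattern) {
        // Quote the literal pieces and join them with ".*" in place of each *.
        String regex = Arrays.stream(pattern.toLowerCase(Locale.ROOT).split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
        Pattern compiled = Pattern.compile(regex);
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> entry : terms.entrySet()) {
            if (compiled.matcher(entry.getKey()).matches()) {
                hits.addAll(entry.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        int storedOnly = index.add("<p>stuff</p>", false); // like a binary field
        int analyzed = index.add("<p>stuff</p>", true);    // like a tokenized field
        System.out.println("Stored-only doc " + storedOnly + " reads back: "
                + index.document(storedOnly));
        System.out.println("Analyzed doc is number " + analyzed);
        System.out.println("Hits for *stuff*: " + index.wildcardSearch("*stuff*"));
    }
}
```

Only the analyzed document shows up in the hits, even though both documents can be read back. In real Lucene 3.x code the corresponding fix would be to add the text as an analyzed field (for example with `Field.Index.ANALYZED`) rather than as a binary value; check the `Field` Javadoc for your version for the exact constructor.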

Source

2011-05-15 06:06:17

Thanks to your Luke suggestion, I was able to figure this out! You were right: the binary field was never actually tokenized. Very confusing for a newbie. – asteroid 2011-05-26 18:59:56

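By default, leading wildcard queries (wildcard queries that start with a *) are disabled in Lucene. See the Lucene FAQ for more information. If you want to enable leading wildcard queries, try: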

QueryParser.setAllowLeadingWildcard(true)
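As a rough illustration of why a leading * is treated as expensive, here is a toy sorted term dictionary in plain Java. It is not Lucene, and `WildcardCost` and its methods are hypothetical names. It assumes terms are kept in sorted order, so a trailing wildcard can seek straight to its prefix while a leading wildcard has to check every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy sorted term dictionary. This is NOT Lucene code. */
public class WildcardCost {
    private final TreeSet<String> terms = new TreeSet<>();
    // Number of terms the most recent query had to look at.
    private int examined;

    public void addTerm(String term) {
        terms.add(term);
    }

    public int examined() {
        return examined;
    }

    /** Trailing wildcard ("stu*"): seek to the prefix, stop at the first miss. */
    public List<String> prefixMatches(String prefix) {
        examined = 0;
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            examined++;
            if (!term.startsWith(prefix)) {
                break; // sorted order: nothing after this can match
            }
            hits.add(term);
        }
        return hits;
    }

    /** Leading wildcard ("*uff"): sort order does not help, so scan everything. */
    public List<String> suffixMatches(String suffix) {
        examined = 0;
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            examined++;
            if (term.endsWith(suffix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        WildcardCost dict = new WildcardCost();
        for (String t : new String[] {"apple", "banana", "cherry", "stuff", "stuffing", "zebra"}) {
            dict.addTerm(t);
        }
        System.out.println(dict.prefixMatches("stu") + " after examining " + dict.examined() + " terms");
        System.out.println(dict.suffixMatches("uff") + " after examining " + dict.examined() + " terms");
    }
}
```

Two caveats, as far as I know for the 3.x API, so verify them against the Javadoc for your version: `setAllowLeadingWildcard` is an instance method that you call on a `QueryParser` object, and the restriction applies to query strings parsed by `QueryParser`, not to a `WildcardQuery` that you construct yourself as in the question.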

Source

2011-05-12 13:21:44 bajafresh4life

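Thanks for the answer... is this for Lucene version 2? I'm running 3.1.0 and don't see this as a static method =( – asteroid 2011-05-12 16:49:41

For what it's worth: after removing the second wildcard (so that it reads WildcardQuery query = new WildcardQuery(new Term("contents", "*stuff"));), debugging still shows termContainsWildcard equal to true, which suggests it at least recognizes the wildcard. – asteroid 2011-05-12 16:55:43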
