Indexing a txt file in Lucene

I want to create a small search engine for tweets. I have a txt file containing 20000 tweets. The file format looks like this:

TommyFrench1
851
85170333395811123
Lurgan, Moira, Armagh. Derry
This week we are double delight on first goalscorers on the four Champions League matches in shop. ChampionsLeague

Im_Aarkay
175
851703414300037122
Paris
@ChampionsLeague @AS_Monaco @AS_Monaco_EN Nopes, it's when City knocked outta Champions league. .
.
etc

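The first line is the username, followed by the followers count, then the id, the location, and finally the text (the tweet).

I think each tweet should be one document. So I need 20000 documents, and each document must have 5 fields (username, followers, id, etc.).

How can I index this?

I have seen some tutorials, but I did not find anything similar.

Edit: this is my code.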

import java.io.BufferedReader; 
import java.io.File; 
import java.io.FileReader; 
import java.io.IOException; 
import java.nio.file.Paths; 
import java.text.ParseException; 

import org.apache.lucene.analysis.Analyzer; 
import org.apache.lucene.analysis.standard.StandardAnalyzer; 
import org.apache.lucene.document.Document; 
import org.apache.lucene.document.Field; 
import org.apache.lucene.document.StringField; 
import org.apache.lucene.document.TextField; 
import org.apache.lucene.index.DirectoryReader; 
import org.apache.lucene.index.IndexReader; 
import org.apache.lucene.index.IndexWriter; 
import org.apache.lucene.index.IndexWriterConfig; 
import org.apache.lucene.queryparser.classic.QueryParser; 
import org.apache.lucene.search.IndexSearcher; 
import org.apache.lucene.search.Query; 
import org.apache.lucene.search.ScoreDoc; 
import org.apache.lucene.search.TopScoreDocCollector; 
import org.apache.lucene.store.Directory; 
import org.apache.lucene.store.FSDirectory; 
import org.apache.lucene.store.RAMDirectory; 
import org.apache.lucene.util.Version; 

public class MyProgram { 

    public static void main(String[] args) throws IOException, ParseException { 
     FileReader fileReader = new FileReader(new File("myfile.txt")); 
     BufferedReader br = new BufferedReader(fileReader); 
     String line = null; 

     String indexPath = "C:\\Desktop\\myfolder"; 
     Directory dir = FSDirectory.open(Paths.get(indexPath)); 

     Analyzer analyzer = new StandardAnalyzer(); 
     IndexWriterConfig iwc = new IndexWriterConfig(analyzer); 

     IndexWriter writer = new IndexWriter(dir, iwc); 


     while ((line = br.readLine()) != null) { 
      // reading lines until the end of the file 
      Document doc = new Document(); 
      String username = br.readLine(); 
      doc.add(new Field("username", username, Field.Store.YES, Field.Index.ANALYZED)); // adding title field 
      String followers = br.readLine(); 
      doc.add(new Field("followers", followers, Field.Store.YES, Field.Index.ANALYZED)); 
      String id = br.readLine(); 
      doc.add(new Field("id", id, Field.Store.YES, Field.Index.ANALYZED)); 
      String location = br.readLine(); 
      doc.add(new Field("location", location, Field.Store.YES, Field.Index.ANALYZED)); 
      String text = br.readLine(); 
      doc.add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED)); 
      writer.addDocument(doc); // writing new document to the index 


      br.readLine(); 
     } 

    } 
}

I'm getting the following error: Index cannot be resolved or is not a field.

How can I fix this?

2017-04-26 Lee Yaan

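What do you mean by "index"? What are you trying to achieve?

I have a project to create a small search engine for 20000 tweets. Indexing is one of the core features Lucene provides. I have to read the txt file, and each tweet must be a document. Each document must then have the fields username, id, location, etc. I have an idea of how it works, but I'm a beginner with Lucene and I can't find anything similar to this.

Have you looked at this question: http://stackoverflow.com/questions/4091441/how-do-i-index-and-search-text-files-in-lucene-3-0-2?rq=1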

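From your question it is hard to tell that you are actually facing a compile-time error rather than a runtime error.

I copied your code to understand it, and it is a compile-time error on the Field.Index.ANALYZED argument of the Field constructor.

Refer to the documentation: in 6.5.0 there is no such constructor anymore.

This is one of the reasons people use higher-level tools like Solr, because changes like this keep happening in the low-level Lucene API.

Anyway, the documentation mentioned above also says what you should do: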

Expert: directly create a field for a document. Most users should use one of the sugar subclasses:

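For your case, TextField and StringField are the relevant classes. There is a subtle difference between the two: a StringField is indexed as a single exact token without analysis, while a TextField is passed through the analyzer and tokenized.

So I would use constructors like new StringField(fieldName, fieldValue, Store.YES) instead of constructing a Field directly.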
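As a minimal sketch of that suggestion, the five values of one tweet could be turned into a Document as below. The class name TweetFields and the helper toDocument are hypothetical, and the choice of StringField for username, followers and id versus TextField for location and text is an assumption about how you want to search, not something the question specifies. The temporary directory only stands in for your real index path.

```java
import java.io.IOException;
import java.nio.file.Files;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class TweetFields {

    // Hypothetical helper: builds one Document from the five values of a tweet.
    static Document toDocument(String username, String followers, String id,
                               String location, String text) {
        Document doc = new Document();
        // StringField: indexed as one exact token, not analyzed (assumed choice).
        doc.add(new StringField("username", username, Field.Store.YES));
        doc.add(new StringField("followers", followers, Field.Store.YES));
        doc.add(new StringField("id", id, Field.Store.YES));
        // TextField: analyzed, so individual words become searchable.
        doc.add(new TextField("location", location, Field.Store.YES));
        doc.add(new TextField("text", text, Field.Store.YES));
        return doc;
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the writer first, which commits the document.
        try (Directory dir = FSDirectory.open(Files.createTempDirectory("tweet-index"));
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            writer.addDocument(toDocument("TommyFrench1", "851", "85170333395811123",
                    "Lurgan, Moira, Armagh. Derry", "Champions League matches in shop."));
        }
    }
}
```

Note that the code in the question never closes the IndexWriter; without a close or commit the added documents are not made durable, which is why the sketch uses try-with-resources.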

You can also use Field directly, like new Field(fieldName, fieldValue, fieldType), where fieldType is a FieldType.

You can initialize a FieldType like FieldType txtFieldType = new FieldType(TextField.TYPE_STORED) or FieldType strFieldType = new FieldType(StringField.TYPE_STORED), etc.

In short, the way to create a Field in Lucene has changed in recent versions, so create Field instances according to the documentation of the Lucene version you are using.

Something like doc.add(new Field("username", username, new FieldType(TextField.TYPE_STORED))), etc.
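Separately from the compile error, the reading loop in the question looks like it would misalign the fields: the while condition already consumes the username line, so the next br.readLine() returns the followers line instead. Below is a minimal sketch of grouping the file into five-line records using only the standard library. It assumes records are separated by blank lines and that each tweet's text sits on a single line, as in the sample; TweetRecordReader and readRecords are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class TweetRecordReader {

    // Reads records of five lines (username, followers, id, location, text)
    // separated by blank lines. An incomplete trailing record is skipped.
    static List<String[]> readRecords(BufferedReader br) throws IOException {
        List<String[]> records = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // blank separator between tweets
            }
            String[] rec = new String[5];
            rec[0] = line; // the line read by the loop condition is the username
            boolean complete = true;
            for (int i = 1; i < 5; i++) {
                rec[i] = br.readLine();
                if (rec[i] == null) { // file ended in the middle of a record
                    complete = false;
                    break;
                }
            }
            if (complete) {
                records.add(rec);
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Inline sample data; a FileReader on the real file would be used instead.
        String sample = "Im_Aarkay\n175\n851703414300037122\nParis\nSome tweet text\n\n"
                + "TommyFrench1\n851\n85170333395811123\nLurgan\nAnother tweet\n";
        for (String[] rec : readRecords(new BufferedReader(new StringReader(sample)))) {
            System.out.println(rec[0] + " -> " + rec[4]);
        }
    }
}
```

Each String[] can then be turned into one Lucene Document with five fields and passed to writer.addDocument.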

2017-05-01 07:32:03
