2014-11-03

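How to index large text fields in Lucene 4.10.1

I'm taking my first steps with Lucene (version 4.10.1), and my current goal is to index text fields from files of about 100 KB. Because I assumed the text would not fit into a String, I read the file contents into a byte array. But when I run the program, Lucene says "Fields with BytesRef values cannot be indexed".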

So the question is: how do I index large text fields?

The code:

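import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Main {

    public static void main(String[] args) {
        try {
            Directory indexDir = FSDirectory.open(new File("testIndex"));
            Analyzer analyzer = new StandardAnalyzer();
            IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_4_10_1, analyzer);
            IndexWriter indexWriter = new IndexWriter(indexDir, conf);
            Path path = Paths.get("text.txt");
            // The whole file is read as raw bytes
            byte[] text = Files.readAllBytes(path);

            long startTime = System.currentTimeMillis();
            for (int i = 0; i < 100; i++) {
                Document doc = new Document();
                FieldType fieldType = new FieldType();
                fieldType.setIndexed(true);
                fieldType.setTokenized(true);
                fieldType.setStored(true);
                fieldType.setOmitNorms(true);
                fieldType.setStoreTermVectors(false);
                fieldType.setStoreTermVectorOffsets(false);
                fieldType.setStoreTermVectorPayloads(false);
                fieldType.setStoreTermVectorPositions(false);
                // A byte[] value with an indexed field type is what triggers
                // "Fields with BytesRef values cannot be indexed"
                Field title = new Field("text" + i, text, fieldType);

                doc.add(title);

                indexWriter.addDocument(doc);
            }
            long endTime = System.currentTimeMillis();
            long elapsedTime = endTime - startTime;
            System.out.println("Elapsed Time in Ms: " + elapsedTime);

            indexWriter.close();

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}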

Answer

Reading the file into a String with a StringBuilder solved it.

Code:

Path path = Paths.get("text.txt");
StringBuilder stringBuilder = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader reader = Files.newBufferedReader(path, Charset.defaultCharset())) {
    String line;
    while ((line = reader.readLine()) != null) {
        stringBuilder.append(line).append("\n");
    }
}
// A String value can be indexed, unlike the byte[] used in the question
String text = stringBuilder.toString();
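
For what it's worth, a 100 KB file fits comfortably in a Java String, so the bytes can also be decoded in one step instead of line by line. The following is only a minimal sketch using the standard library: the class name LargeTextReader and the helper readText are made up for illustration, and UTF-8 is an assumption about the file's encoding (the loop above uses the platform default charset instead). Unlike the readLine() loop, this keeps the original line endings unchanged.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LargeTextReader {

    // Hypothetical helper: reads the whole file and decodes it into one String.
    public static String readText(Path path, Charset charset) throws IOException {
        return new String(Files.readAllBytes(path), charset);
    }

    public static void main(String[] args) throws IOException {
        // "text.txt" is the file name used in the question; pass another path as the first argument.
        Path path = Paths.get(args.length > 0 ? args[0] : "text.txt");
        if (!Files.exists(path)) {
            System.out.println("No such file: " + path);
            return;
        }
        // Assumption: the file is UTF-8 encoded.
        String text = readText(path, StandardCharsets.UTF_8);
        System.out.println("Read " + text.length() + " characters");
    }
}
```

The resulting String can then be handed to Lucene in place of the byte array, for example with new Field("text" + i, text, fieldType) from the question's code, or with new TextField("text" + i, text, Field.Store.YES), which is tokenized and indexed by default.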