2014-08-27 119 views

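SolrJ - Indexing documents asynchronously using ContentStreamUpdateRequest

I am using the SolrJ API 4.8 to index rich documents into Solr, but I would like to index these documents asynchronously. The function I wrote sends the documents synchronously, and I don't know how to change it to make it asynchronous. Any ideas?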

The function:

// Note: the 'external' parameter is not used anywhere in this snippet.
public Boolean indexDocument(HttpSolrServer server, String pathFile, InputReader external)
{
    // Send the file to the extracting request handler.
    ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");

    try {
        up.addFile(new File(pathFile), "text");
    } catch (IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    }

    // Commit right after this document (waitFlush = true, waitSearcher = true).
    up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

    try {
        // Blocks until Solr has answered, which is what makes this synchronous.
        server.request(up);
    } catch (SolrServerException | IOException e) {
        Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
        return false;
    }
    return true;
}

Solr server: version 4.8

Answer


It sounds like you may want to look at using an ExecutorService and FutureTask to do this:

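import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ANOIndexer {

    // Placeholder URL (an assumption, not from the question): point this at your own Solr core.
    private static final HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    private static int threadPoolSize = 4; // Set this to something appropriate for your environment

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(threadPoolSize);
        List<FutureTask<Boolean>> taskList = new ArrayList<FutureTask<Boolean>>();
        List<String> paths = new ArrayList<String>();
        // Initialize your list of paths here

        // Queue one task per file; the pool runs up to threadPoolSize of them at once.
        for (String path : paths) {
            FutureTask<Boolean> futureTask = new FutureTask<Boolean>(new IndexDocumentTask(path));
            taskList.add(futureTask);
            executor.execute(futureTask);
        }

        // Wait for each task in submission order and report its outcome.
        for (int i = 0; i < taskList.size(); i++) {
            FutureTask<Boolean> futureTask = taskList.get(i);

            try {
                System.out.println("Index Task " + i + (futureTask.get() ? " finished successfully." : " encountered an error."));
            } catch (ExecutionException e) {
                System.out.println("An Execution Exception occurred with Index Task " + i);
            } catch (InterruptedException e) {
                System.out.println("An Interrupted Exception occurred with Index Task " + i);
                // Restore the interrupt flag and stop waiting for the remaining tasks.
                Thread.currentThread().interrupt();
                break;
            }
        }

        executor.shutdown();
    }

    static class IndexDocumentTask implements Callable<Boolean> {

        private final String pathFile;

        public IndexDocumentTask(String pathFile) {
            this.pathFile = pathFile;
        }

        @Override
        public Boolean call() {
            return indexDocument(pathFile);
        }

        public Boolean indexDocument(String pathFile) {
            ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");

            try {
                up.addFile(new File(pathFile), "text");
            } catch (IOException e) {
                Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
                return false;
            }

            // Commit after each document, as in the original function.
            up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

            try {
                server.request(up);
            } catch (SolrServerException | IOException e) {
                Logger.getLogger(ANOIndexer.class.getName()).log(Level.SEVERE, null, e);
                return false;
            }
            return true;
        }
    }
}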

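This is untested code, so I don't know whether calling server.request(up) like this is thread-safe. I think using a single HttpSolrServer instance is cleaner, but you could also create a new HttpSolrServer instance in each task.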

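If you like, you can extend IndexDocumentTask to implement Callable<Tuple<String, Boolean>>, so that you can retrieve the file name of each document along with whether it was indexed successfully.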
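A minimal sketch of that idea follows. Java has no built-in Tuple type, so it assumes a small hypothetical IndexResult class in its place. DocumentIndexer and NamedIndexDocumentTask are likewise illustrative names, not part of SolrJ. The DocumentIndexer hook stands in for the server.request(up) call so that the sketch runs without a Solr server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical result type: pairs a file path with the outcome of indexing it. */
final class IndexResult {
    final String pathFile;
    final boolean success;

    IndexResult(String pathFile, boolean success) {
        this.pathFile = pathFile;
        this.success = success;
    }
}

/** Hypothetical hook standing in for the SolrJ request made in indexDocument. */
interface DocumentIndexer {
    boolean index(String pathFile);
}

/** Variant of IndexDocumentTask that also reports which file it handled. */
final class NamedIndexDocumentTask implements Callable<IndexResult> {
    private final String pathFile;
    private final DocumentIndexer indexer;

    NamedIndexDocumentTask(String pathFile, DocumentIndexer indexer) {
        this.pathFile = pathFile;
        this.indexer = indexer;
    }

    @Override
    public IndexResult call() {
        boolean ok;
        try {
            ok = indexer.index(pathFile);
        } catch (RuntimeException e) {
            ok = false; // treat an unexpected failure as "not indexed"
        }
        return new IndexResult(pathFile, ok);
    }
}

final class NamedIndexDemo {
    /** Submits one task per path and collects the results in submission order. */
    static List<IndexResult> indexAll(List<String> paths, DocumentIndexer indexer, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        List<Future<IndexResult>> futures = new ArrayList<>();
        for (String path : paths) {
            futures.add(executor.submit(new NamedIndexDocumentTask(path, indexer)));
        }
        List<IndexResult> results = new ArrayList<>();
        for (int i = 0; i < futures.size(); i++) {
            try {
                results.add(futures.get(i).get());
            } catch (ExecutionException e) {
                results.add(new IndexResult(paths.get(i), false));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                results.add(new IndexResult(paths.get(i), false));
            }
        }
        executor.shutdown();
        return results;
    }

    public static void main(String[] args) {
        // Stand-in indexer: pretend that only .pdf files index successfully.
        DocumentIndexer fake = path -> path.endsWith(".pdf");
        for (IndexResult r : indexAll(Arrays.asList("a.pdf", "b.tmp"), fake, 2)) {
            System.out.println(r.pathFile + (r.success ? " indexed" : " failed"));
        }
    }
}
```

In real use, the DocumentIndexer passed in would wrap the indexDocument logic from the code above, and the caller could log or retry every path whose result has success set to false.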

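Although I don't think sending several requests to the Solr server at once should be a problem, you may want to throttle your requests so that you don't overload the Solr server.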
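One way to throttle, sketched under assumptions: wrap each task so that it must acquire a permit from a java.util.concurrent.Semaphore before it runs. ThrottledRunner and ThrottleDemo are hypothetical names, and the sleeping task only stands in for the real server.request(up) call. The peak counter is there purely to make the limit observable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that caps how many wrapped tasks run at the same time. */
final class ThrottledRunner {
    private final Semaphore permits;
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peak = new AtomicInteger();

    ThrottledRunner(int maxConcurrentRequests) {
        this.permits = new Semaphore(maxConcurrentRequests);
    }

    /** Wraps a task so that it waits for a free permit before running. */
    <T> Callable<T> throttle(final Callable<T> task) {
        return () -> {
            permits.acquire();
            try {
                // Record the highest number of tasks seen running at once.
                peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                return task.call();
            } finally {
                inFlight.decrementAndGet();
                permits.release();
            }
        };
    }

    int peakConcurrency() {
        return peak.get();
    }
}

final class ThrottleDemo {
    /** Runs taskCount fake requests and returns the peak concurrency observed. */
    static int run(int taskCount, int poolSize, int maxConcurrent) {
        ThrottledRunner runner = new ThrottledRunner(maxConcurrent);
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        List<Future<Boolean>> futures = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            Callable<Boolean> fakeRequest = () -> {
                Thread.sleep(20); // stands in for server.request(up)
                return true;
            };
            futures.add(executor.submit(runner.throttle(fakeRequest)));
        }
        for (Future<Boolean> f : futures) {
            try {
                f.get();
            } catch (ExecutionException e) {
                // A failed fake request is ignored in this sketch.
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        executor.shutdown();
        return runner.peakConcurrency();
    }

    public static void main(String[] args) {
        // Eight worker threads, but at most two requests in flight at any moment.
        System.out.println("Peak concurrent requests: " + run(10, 8, 2));
    }
}
```

If the thread pool is used only for indexing, lowering threadPoolSize in the code above gives the same cap with less machinery. The semaphore is mainly useful when the pool is shared with other work.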