Get all document URIs in MarkLogic using the Java Client API

I am trying to fetch all documents from the database without knowing their exact URIs. This is the query I have:

DocumentPage documents =docMgr.read(); 
while (documents.hasNext()) { 
    DocumentRecord document = documents.next(); 
    System.out.println(document.getUri()); 
}

But I don't have specific URIs; I want all of the documents.


2015-10-20 Ankita Bhowmik

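What exactly are you trying to accomplish? If you want to export content, MLCP would be easier. If you want to do some number crunching, it may be easier to do that inside MarkLogic.

The first step is to enable the URI lexicon on the database.

You can eval some XQuery and run cts:uris() (or server-side JavaScript and run cts.uris()):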

ServerEvaluationCall call = client.newServerEval()
    .xquery("cts:uris()");
for (EvalResult result : call.eval()) {
    String uri = result.getString();
    System.out.println(uri);
}

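Two drawbacks are: (1) you need a user with the privileges required for server-side eval and (2) there is no pagination.

If you have a small number of documents, you don't need pagination. For a large number of documents, pagination is recommended. Here is some code that uses the search API with pagination; the transaction it opens is only needed if you want a guaranteed "snapshot" list isolated from adds/deletes occurring concurrently with this process: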

// do the next eight lines just once
String options =
    "<options xmlns='http://marklogic.com/appservices/search'>" +
    " <values name='uris'>" +
    " <uri/>" +
    " </values>" +
    "</options>";
QueryOptionsManager optionsMgr = client.newServerConfigManager().newQueryOptionsManager();
optionsMgr.writeOptions("uriOptions", new StringHandle(options));

// run the following each time you need to list all uris
QueryManager queryMgr = client.newQueryManager();
long pageLength = 10000;
queryMgr.setPageLength(pageLength);
ValuesDefinition query = queryMgr.newValuesDefinition("uris", "uriOptions");
// the following "and" query just matches all documents
query.setQueryDefinition(new StructuredQueryBuilder().and());
int start = 1;
boolean hasMore = true;
Transaction transaction = client.openTransaction();
try {
    while (hasMore) {
        CountedDistinctValue[] uriValues =
            queryMgr.values(query, new ValuesHandle(), start, transaction).getValues();
        for (CountedDistinctValue uriValue : uriValues) {
            String uri = uriValue.get("string", String.class);
            //System.out.println(uri);
        }
        start += uriValues.length;
        // this is the last page if uriValues is smaller than pageLength
        hasMore = uriValues.length == pageLength;
    }
} finally {
    transaction.commit();
}

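The transaction is only necessary if you need a guaranteed snapshot of the URI list. Since it adds some overhead, feel free to remove it if you don't need such exactness.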
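The stop condition used in the loop above (a page shorter than the page length is the last page) can be tried out without a server. The following is only a minimal sketch: `PagedUriLister`, `PageFetcher`, and `inMemory` are hypothetical names that are not part of the MarkLogic Java Client API, and the fetcher simply stands in for one paged request such as the `queryMgr.values(...)` call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PagedUriLister {

    /** Hypothetical stand-in for one paged request; start is 1-based. */
    interface PageFetcher {
        List<String> fetch(long start, int pageLength);
    }

    /** Collects pages until one comes back shorter than pageLength. */
    static List<String> listAll(PageFetcher fetcher, int pageLength) {
        if (pageLength <= 0) {
            throw new IllegalArgumentException("pageLength must be positive");
        }
        List<String> all = new ArrayList<>();
        long start = 1;
        boolean hasMore = true;
        while (hasMore) {
            List<String> page = fetcher.fetch(start, pageLength);
            all.addAll(page);
            start += page.size();
            // a short (or empty) page means there is nothing left to fetch
            hasMore = page.size() == pageLength;
        }
        return all;
    }

    /** In-memory fetcher, used only to exercise the loop without a server. */
    static PageFetcher inMemory(List<String> uris) {
        return (start, pageLength) -> {
            int from = (int) Math.min(start - 1, uris.size());
            int to = Math.min(from + pageLength, uris.size());
            return new ArrayList<>(uris.subList(from, to));
        };
    }

    public static void main(String[] args) {
        List<String> uris = Arrays.asList("/a.xml", "/b.xml", "/c.xml", "/d.xml", "/e.xml");
        System.out.println(listAll(inMemory(uris), 2));
    }
}
```

One consequence of this stop condition: when the total number of URIs is an exact multiple of the page length, the loop makes one extra request that returns an empty page before it stops.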


2015-10-20 20:28:05

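Can you tell me whether we can specify which collection it should fetch the URIs from?

You can specify the query using StructuredQueryBuilder.collection instead of StructuredQueryBuilder.and: https://docs.marklogic.com/javadoc/client/com/marklogic/client/query/StructuredQueryBuilder.html#collection%28java.lang.String...%29

Found a better option using this: ServerEvaluationCall call = client.newServerEval().xquery("for $x in collection(\"RexUserProfiles\") return (fn:document-uri($x))");

Find the page length and specify the starting point in queryMgr. Keep increasing the starting point and loop through all the URIs. I was able to fetch all the URIs this way. It may not be a great approach, but it works.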
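If the collection name comes from a variable rather than a hard-coded literal, the query string in the comment above has to be assembled with care. The sketch below is an illustration only: `CollectionUriQuery`, `quote`, and `forCollection` are hypothetical helper names. It relies on the XQuery rule that, inside a double-quoted string literal, a double quote is written as two double quotes and an ampersand as `&amp;`.

```java
final class CollectionUriQuery {

    /** Wraps a value in a double-quoted XQuery string literal. */
    static String quote(String value) {
        String escaped = value
            .replace("&", "&amp;")   // ampersands must be entity references
            .replace("\"", "\"\"");  // a double quote is escaped by doubling it
        return "\"" + escaped + "\"";
    }

    /** Builds a query that returns the URI of every document in one collection. */
    static String forCollection(String collection) {
        return "for $x in collection(" + quote(collection) + ") return fn:document-uri($x)";
    }

    public static void main(String[] args) {
        System.out.println(forCollection("RexUserProfiles"));
    }
}
```

The resulting string could then be passed to client.newServerEval().xquery(...) as in the comment above.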

List<String> uriList = new ArrayList<>();
QueryManager queryMgr = client.newQueryManager();
StructuredQueryBuilder qb = new StructuredQueryBuilder();
StructuredQueryDefinition querydef = qb.and(qb.collection("xxxx"), qb.collection("whatever"), qb.collection("whatever")); // outputs 241152
// the first search is only used to read the page length and the total
SearchHandle results = queryMgr.search(querydef, new SearchHandle(), 1);
long pageLength = results.getPageLength();
long totalResults = results.getTotalResults();
System.out.println("Total Results: " + totalResults);
// start positions are 1-based, so step through 1, 1 + pageLength, ...
for (long start = 1; start <= totalResults; start += pageLength) {
    System.out.println("Printing Results from: " + start + " to: " + (start + pageLength - 1));
    results = queryMgr.search(querydef, new SearchHandle(), start);
    MatchDocumentSummary[] summaries = results.getMatchResults(); // 10 results because page length is 10
    for (MatchDocumentSummary summary : summaries) {
        // System.out.println("Extracted from URI-> " + summary.getUri());
        uriList.add(summary.getUri());
    }
    if (start >= 1000) { // cap on the number of URIs to store/retrieve, plus 10; remove it to fetch everything
        break;
    }
}
uriList = uriList.stream().distinct().collect(Collectors.toList());
return uriList;
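The paging arithmetic in this approach is easy to get wrong, because start positions are 1-based (as the `start = 1` in the earlier answer suggests) and the last page may be only partly filled. Here is a minimal, server-free sketch of that arithmetic; `PageStarts`, `pageStarts`, and `pageCount` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class PageStarts {

    /** Returns the 1-based start position of every page needed to cover totalResults. */
    static List<Long> pageStarts(long totalResults, long pageLength) {
        if (pageLength <= 0) {
            throw new IllegalArgumentException("pageLength must be positive");
        }
        List<Long> starts = new ArrayList<>();
        for (long start = 1; start <= totalResults; start += pageLength) {
            starts.add(start);
        }
        return starts;
    }

    /** Number of pages, rounded up so that a final partial page is counted. */
    static long pageCount(long totalResults, long pageLength) {
        if (pageLength <= 0) {
            throw new IllegalArgumentException("pageLength must be positive");
        }
        return (totalResults + pageLength - 1) / pageLength;
    }

    public static void main(String[] args) {
        System.out.println(pageStarts(25, 10)); // prints [1, 11, 21]
        System.out.println(pageCount(25, 10));  // prints 3
    }
}
```

Plain integer division (`totalResults / pageLength`) would give 2 for 25 results with a page length of 10 and so miss the last five results, which is why the count is rounded up here.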


2018-02-01 20:16:05 chetan
