How to parse multiple PDF files from a folder in Java

I have a folder containing many PDF files. I need to convert all of them to text and save the resulting .txt files in another folder, and I would like to do this in Java.

I have this code to parse a PDF, but it only handles one file at a time, and I need to process a folder with thousands of PDF files.

PDFTextStripper pdfStripper = null;
PDDocument pdDoc = null;
COSDocument cosDoc = null;
File file = new File("C:/my.pdf");

try {
    PDFParser parser = new PDFParser(new FileInputStream(file));
    parser.parse();
    cosDoc = parser.getDocument();
    pdfStripper = new PDFTextStripper();
    pdDoc = new PDDocument(cosDoc);
    pdfStripper.setStartPage(1);
    pdfStripper.setEndPage(20);
    String parsedText = pdfStripper.getText(pdDoc);
} catch (IOException e) {
    e.printStackTrace();
}

Any ideas?

2017-04-24 fluxing23

Put the code above in a loop that iterates over the files.

Try pointing File at the folder instead of a single file, and use its listFiles() method to get the files inside it.
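As a rough illustration of the listFiles() suggestion: the sketch below collects only the PDF files in a folder, so that subdirectories and other file types never reach the parser. The class name PdfLister and the method listPdfFiles are made up for this example, and matching on a ".pdf" file name suffix is an assumption about how the files are named.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of PDFBox or the original post.
final class PdfLister {

    /** Returns the regular files in folder whose names end in ".pdf" (any case), sorted by path. */
    static File[] listPdfFiles(File folder) {
        // A one-argument lambda selects the listFiles(FileFilter) overload.
        File[] found = folder.listFiles(
                f -> f.isFile() && f.getName().toLowerCase(Locale.ROOT).endsWith(".pdf"));
        if (found == null) {
            // listFiles() returns null when the folder is missing, not a directory, or unreadable.
            return new File[0];
        }
        Arrays.sort(found); // listFiles() makes no guarantee about ordering
        return found;
    }

    public static void main(String[] args) {
        // Folder comes from the first argument; defaults to the current directory.
        File folder = new File(args.length > 0 ? args[0] : ".");
        for (File pdf : listPdfFiles(folder)) {
            System.out.println(pdf.getName());
        }
    }
}
```

Checking for null matters here: looping directly over the result of listFiles() throws a NullPointerException if the folder path is wrong.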

You can try something like this:

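import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.pdfbox.cos.COSDocument;
import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;

// Uses the PDFBox 1.8.x parser API, as in the question.
StringBuilder parsedText = new StringBuilder(); // text from every PDF is appended here
File folder = new File("/yourFolder"); // put all the PDF files in this folder
File[] listOfFiles = folder.listFiles(); // null if the folder is missing or unreadable

if (listOfFiles != null) {
    for (File file : listOfFiles) { // cycle through the array
        if (!file.isFile()) {
            continue; // skip subdirectories
        }
        PDDocument pdDoc = null;
        try { // same steps as for a single file
            PDFParser parser = new PDFParser(new FileInputStream(file));
            parser.parse();
            COSDocument cosDoc = parser.getDocument();
            pdDoc = new PDDocument(cosDoc);
            PDFTextStripper pdfStripper = new PDFTextStripper();
            pdfStripper.setStartPage(1);
            pdfStripper.setEndPage(pdDoc.getNumberOfPages()); // always read to the last page
            parsedText.append(pdfStripper.getText(pdDoc)).append(System.lineSeparator());
        } catch (IOException e) {
            e.printStackTrace(); // report the failure and move on to the next file
        } finally {
            if (pdDoc != null) {
                try {
                    pdDoc.close(); // also closes the underlying COSDocument
                } catch (IOException ignored) {
                    // nothing useful to do if closing fails
                }
            }
        }
    }
}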

2017-04-24 15:21:18 Yahya

Thank you very much! As a follow-up, is there a way to save each parsed file separately rather than as one big text file? – fluxing23

Glad I could help :) Instead of appending to "parsedText", you can write it out to its own text file at the end of each loop iteration. – Yahya

Thanks! I'll try that. – fluxing23
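For anyone wanting to see what the one-text-file-per-PDF suggestion could look like, here is a minimal sketch using java.nio. It is not from the thread: the class PdfFolderToTxt, the TextExtractor interface, and the methods txtNameFor and convertAll are all hypothetical names. The PDF parsing itself is left behind the TextExtractor hook so the sketch compiles without PDFBox; in practice the hook would wrap the parser and stripper calls from the answer above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of PDFBox or the original post.
final class PdfFolderToTxt {

    /** Hypothetical hook: wrap your existing PDF-to-text code in this. */
    interface TextExtractor {
        String extract(Path pdf) throws IOException;
    }

    /** "report.pdf" becomes "report.txt"; other names just get ".txt" appended. */
    static String txtNameFor(String pdfName) {
        String base = pdfName.toLowerCase(Locale.ROOT).endsWith(".pdf")
                ? pdfName.substring(0, pdfName.length() - 4)
                : pdfName;
        return base + ".txt";
    }

    /** Writes one .txt file per PDF into outDir and returns how many were written. */
    static int convertAll(Path inDir, Path outDir, TextExtractor extractor) throws IOException {
        Files.createDirectories(outDir);
        List<Path> pdfs;
        try (Stream<Path> entries = Files.list(inDir)) { // the stream must be closed after use
            pdfs = entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase(Locale.ROOT).endsWith(".pdf"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        int written = 0;
        for (Path pdf : pdfs) {
            try {
                String text = extractor.extract(pdf);
                Path target = outDir.resolve(txtNameFor(pdf.getFileName().toString()));
                Files.write(target, text.getBytes(StandardCharsets.UTF_8));
                written++;
            } catch (IOException e) {
                // One unreadable PDF should not stop a batch of thousands: report it and continue.
                System.err.println("Skipping " + pdf + ": " + e.getMessage());
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: PdfFolderToTxt <inputFolder> <outputFolder>");
            return;
        }
        // Placeholder extractor so the sketch runs as-is; replace it with real PDF parsing.
        TextExtractor placeholder = pdf -> "text of " + pdf.getFileName();
        int count = convertAll(Paths.get(args[0]), Paths.get(args[1]), placeholder);
        System.out.println("Wrote " + count + " text files");
    }
}
```

Writing each result straight to disk also avoids holding the text of thousands of PDFs in memory at once, which the single accumulated "parsedText" would do.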
