What is the best way to extract all Unicode text from a PDF file in .NET?

I am using iTextSharp 5.1.1 to extract all the words from every page of a PDF file with the code below:

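using System.IO;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static string GetTextFromAllPages(string pdfPath)
{
    PdfReader reader = new PdfReader(pdfPath);
    StringWriter output = new StringWriter();
    // iTextSharp pages are 1-based; use a fresh strategy for each page.
    for (int i = 1; i <= reader.NumberOfPages; i++)
        output.WriteLine(PdfTextExtractor.GetTextFromPage(reader, i, new SimpleTextExtractionStrategy()));

    reader.Close(); // release the underlying file
    return output.ToString();
}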

However, across different languages (English, French, ...) and input files, it mostly returns results that differ from the text I expect.

2010-05-24 Iman Abidi

iTextSharp (http://sourceforge.net/projects/itextsharp/) has a powerful API for manipulating PDFs.

2010-05-24 12:14:41 etc

But does it give you the number of words, paragraphs and lines in a PDF file? I think you'll find the answer is... no. – Rowan 2010-05-25 08:20:57

I think iTextSharp cannot do the counting, but I'm not sure yet. – 2010-05-25 11:07:52
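
Regarding the counting raised in the comments above: one possible workaround is to count on the extracted string itself rather than ask the PDF library for counts. The following is only a rough sketch; `TextStats`, `CountWords` and `CountLines` are hypothetical names invented for illustration and are not part of iTextSharp. It assumes that words are separated by whitespace and that the extraction strategy emits a line break per visual line, which may not hold for every PDF.

```csharp
using System;

public static class TextStats
{
    // Hypothetical helper (not an iTextSharp API): counts whitespace-separated tokens.
    public static int CountWords(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0;
        // A null separator array makes Split use any Unicode whitespace as the delimiter.
        return text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries).Length;
    }

    // Hypothetical helper: counts lines that contain at least one character.
    public static int CountLines(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0;
        return text.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

It could be used as `int words = TextStats.CountWords(GetTextFromAllPages(pdfPath));`. Paragraphs are harder, since plain extracted text generally carries no paragraph markers, so any paragraph count would have to rely on a heuristic.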
