我正在使用iText(for .net)來閱讀pdf文件。它讀取文檔,但是當有空格時,它只讀取一個空格。如何閱讀帶有空格的pdf文件(實際上是)在c#.net中使用iTextsharp行代碼行
這使得無法通過獲取子字符串來提取數據。我想用空格逐行讀取數據,所以我知道文本的實際位置,因爲我想將數據寫入數據庫。
該文件是一個銀行對賬單,我想它轉儲到用於設計覈對系統的數據庫,
這裏是一個文件
以下的屏幕截圖是我使用的代碼
For page As Integer = 1 To pdfReader.NumberOfPages
' Dim strategy As ITextExtractionStrategy = New SimpleTextExtractionStrategy()
Dim Strategy As ITextExtractionStrategy = New iTextSharp.text.pdf.parser.LocationTextExtractionStrategy()
Dim currentText As String = PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy)
currentText = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.[Default], Encoding.UTF8, Encoding.[Default].GetBytes(currentText)))
Dim delimiterChars As Char() = {ControlChars.Lf}
Dim lines As String() = currentText.Split(delimiterChars)
Dim Bnk_Name As Boolean = True
Dim Br_Name As Boolean = False
Dim Name_acc As Boolean = False
Dim statment As Boolean = False
Dim Curr As Boolean = False
Dim Open As Boolean = False
Dim BankName = ""
Dim Branch = ""
Dim AccountNo = ""
Dim CompName = ""
Dim Currency = ""
Dim Statement_from = ""
Dim Statement_to = ""
Dim Opening_Balance = ""
Dim Closing_Balance = ""
Dim Narration As String = ""
For Each line As String In lines
line.Trim()
'BANK NAME
If Bnk_Name Then
If line.Trim() <> "" Then
BankName = line.Substring(0, 21)
Bnk_Name = False
Else
Bnk_Name = False
End If
End If
但我想,因爲它是爲空格閱讀位置
您可以實施文本提取策略,嘗試通過爲大間隙插入多個空格字符來反映文本的水平佈局。對於iText/Java,在[本答案](https://stackoverflow.com/a/24911617/1729265)中已經描述了基於「LocationTextExtractionStrategy」的內容。 – mkl