C# iTextSharp: finding words (not chunks) on a page in order to position sticky notes

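I have read all the related Stack Overflow posts and have not yet found a suitable solution. I want to open a PDF file, extract the text (words) together with their coordinates, and then add sticky notes to some of them.

It seems like mission impossible, and I am stuck.

This code correctly finds all the words on the page (but not their coordinates):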

using (PdfReader reader = new PdfReader(path))
{
    ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
    for (int page = 5; page <= 5; page++)
    {
        string text = PdfTextExtractor.GetTextFromPage(reader, page, strategy);
        Console.WriteLine(text);
    }
}

This one, on the other hand, does get coordinates, but only for "chunks", and I cannot rely on them being in the right order:

PdfReader reader = new PdfReader(path);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);

LocationTextExtractionStrategyEx strategy;

for (int i = 5; i <= 5; i++) // reader.NumberOfPages
{
    //strategy = parser.ProcessContent(i, new SimpleTextExtractionStrategy());
    // new MyLocationTextExtractionStrategy("sample", System.Globalization.CompareOptions.None)
    strategy = parser.ProcessContent(i, new LocationTextExtractionStrategyEx("MCU_MOSI", 0));

    foreach (LocationTextExtractionStrategyEx.ExtendedTextChunk chunk in strategy.m_DocChunks)
    {
        if (chunk.m_text.Trim() == "MCU_MOSI")
            Console.WriteLine("Bingo"); // <-- NEVER HIT
    }

    //Console.WriteLine(strategy.m_SearchResultsList.ToString()); // strategy.GetResultantText() +
}

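It uses a class from this post (slightly modified by me): Getting Coordinates of string using ITextExtractionStrategy and LocationTextExtractionStrategy in Itextsharp

But it only finds useless "chunks".

So the question is: can iTextSharp really find words on a page, so that I can add sticky notes next to them? Thanks.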

Source

2017-06-02 user1797147

Chunks carry position and size information, so you can sort them and infer spaces as needed. – mkl
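The sorting-and-gap idea from the comment above can be sketched without any PDF library. This is only an illustration under assumptions: `ChunkBox`, `WordBox`, `WordBuilder` and both tolerance values are hypothetical names and numbers, not part of iTextSharp, and each chunk is assumed to hold text with no internal whitespace. The real chunk coordinates would have to be copied out of whatever extraction strategy is in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an extracted chunk: its text plus the start/end
// x coordinates and the baseline y coordinate, in PDF points.
public sealed class ChunkBox
{
    public string Text;
    public float StartX, EndX, Y;
}

// A reassembled word and the horizontal span it covers.
public sealed class WordBox
{
    public string Text;
    public float StartX, EndX, Y;
}

public static class WordBuilder
{
    // Groups chunks into lines (top to bottom), then merges neighbours on a
    // line into words while the horizontal gap stays within gapTolerance.
    public static List<WordBox> Build(IEnumerable<ChunkBox> chunks,
                                      float lineTolerance = 2f,
                                      float gapTolerance = 1f)
    {
        var words = new List<WordBox>();
        var line = new List<ChunkBox>();

        // PDF y coordinates grow upwards, so descending y is top to bottom.
        foreach (var chunk in chunks.OrderByDescending(c => c.Y))
        {
            // A chunk too far below the first chunk of the line starts a new line.
            if (line.Count > 0 && Math.Abs(line[0].Y - chunk.Y) > lineTolerance)
            {
                FlushLine(line, gapTolerance, words);
                line.Clear();
            }
            line.Add(chunk);
        }
        if (line.Count > 0)
            FlushLine(line, gapTolerance, words);

        return words;
    }

    private static void FlushLine(List<ChunkBox> line, float gapTolerance,
                                  List<WordBox> words)
    {
        WordBox current = null;
        foreach (var chunk in line.OrderBy(c => c.StartX))
        {
            bool isSpace = string.IsNullOrWhiteSpace(chunk.Text);
            bool hasGap = current != null && chunk.StartX - current.EndX > gapTolerance;

            // A whitespace chunk or a wide gap closes the word in progress.
            if (current != null && (isSpace || hasGap))
            {
                words.Add(current);
                current = null;
            }
            if (isSpace)
                continue;

            if (current == null)
            {
                current = new WordBox
                {
                    Text = chunk.Text, StartX = chunk.StartX, EndX = chunk.EndX, Y = chunk.Y
                };
            }
            else
            {
                current.Text += chunk.Text;
                current.EndX = chunk.EndX;
            }
        }
        if (current != null)
            words.Add(current);
    }
}
```

Calling `WordBuilder.Build(chunks)` returns the words in reading order, each with the x range it spans. The tolerances would probably need tuning per document, since a sensible gap depends on the font size.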

It looks like chunk.m_text contains only one letter at a time, which is why this is never true:

if (chunk.m_text.Trim() == "MCU_MOSI")

What you can do instead is append each chunk's text to a string and check whether that string contains your text.

PdfReader reader = new PdfReader(path);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);

LocationTextExtractionStrategyEx strategy;
string str = string.Empty;

for (int i = 5; i <= 5; i++) // reader.NumberOfPages
{
    strategy = parser.ProcessContent(i, new LocationTextExtractionStrategyEx("MCU_MOSI", 0));
    foreach (LocationTextExtractionStrategyEx.ExtendedTextChunk chunk in strategy.m_DocChunks)
    {
        // Accumulate the chunk texts until the search term shows up
        str += chunk.m_text;
        if (str.Contains("MCU_MOSI"))
        {
            // Reset the buffer so that the next occurrence can be found
            str = string.Empty;
            // The current chunk is the last one of the match
            Vector location = chunk.m_endLocation;
            Console.WriteLine("Bingo");
        }
    }
}

Note that to get the location in this example, I made m_endLocation public.
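The loop above only knows where the match ends. A possible variant, sketched here without iTextSharp, also remembers which chunk each character came from, so the start of the match is available too. `PlacedChunk`, `MatchSpan` and `ChunkSearch` are hypothetical names, the chunks are assumed to arrive in reading order already, and the coordinates are assumed to be copied from the strategy's chunk list.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a chunk with its horizontal extent and baseline.
public sealed class PlacedChunk
{
    public string Text;
    public float StartX, EndX, Y;
}

// Where a search term starts and ends on the page.
public sealed class MatchSpan
{
    public float StartX, EndX, Y;
}

public static class ChunkSearch
{
    // Concatenates the chunk texts, searches the result, and maps each hit
    // back to the first and last chunk that contributed to it.
    public static List<MatchSpan> Find(IReadOnlyList<PlacedChunk> chunks, string needle)
    {
        var matches = new List<MatchSpan>();
        if (string.IsNullOrEmpty(needle))
            return matches;

        var text = new StringBuilder();
        var owner = new List<int>(); // owner[k] = index of the chunk holding character k
        for (int i = 0; i < chunks.Count; i++)
        {
            foreach (char ch in chunks[i].Text)
            {
                text.Append(ch);
                owner.Add(i);
            }
        }

        string all = text.ToString();
        int pos = all.IndexOf(needle, StringComparison.Ordinal);
        while (pos >= 0)
        {
            PlacedChunk first = chunks[owner[pos]];
            PlacedChunk last = chunks[owner[pos + needle.Length - 1]];
            matches.Add(new MatchSpan { StartX = first.StartX, EndX = last.EndX, Y = first.Y });

            // Continue after this hit so that later occurrences are found too.
            pos = all.IndexOf(needle, pos + needle.Length, StringComparison.Ordinal);
        }
        return matches;
    }
}
```

If a chunk holds more than one character, the reported span is the extent of the whole first and last chunk, so it may be slightly wider than the word itself. For placing a note next to a word, that is probably precise enough.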

Source

2017-06-02 13:50:40 ktyson

I don't know how to thank you! Works like a charm! – user1797147
