2010-11-19 53 views
7

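Using iTextSharp (or any C# PDF library), how do I open a PDF, replace some text, and save it again?

Using iTextSharp (or any C# PDF library), I need to open a PDF, replace some placeholder text with actual values, and return the result as a byte[].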

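Can anyone suggest how to do this? I have looked through the iText documentation and can't work out where to start. So far I'm stuck on how to get the source PDF from a PdfReader into a Document object, and I suspect I may be approaching this the wrong way.

Many thanks.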

+0

Found this so far: http://www.johnnycode.com/blog/2010/03/05/using-a-template-to-programmatically-create-pdfs-with-c-and-itextsharp/ – Chris 2010-11-19 02:24:19

Answers

5

In the end, I opened my existing PDF in PDFescape, placed form fields where my values need to go, and saved it again as my template PDF.

http://www.pdfescape.com

I then found this blog entry on how to replace form fields:

http://www.johnnycode.com/blog/2010/03/05/using-a-template-to-programmatically-create-pdfs-with-c-and-itextsharp/

It all works great! Here's the code:

// Requires: using System.IO; using System.Web; using iTextSharp.text.pdf;
public static byte[] Generate()
{
    var templatePath = HttpContext.Current.Server.MapPath("~/my_template.pdf");

    // Based on:
    // http://www.johnnycode.com/blog/2010/03/05/using-a-template-to-programmatically-create-pdfs-with-c-and-itextsharp/
    var reader = new PdfReader(templatePath);
    var outStream = new MemoryStream();
    var stamper = new PdfStamper(reader, outStream);

    var form = stamper.AcroFields;
    var fieldKeys = form.Fields.Keys;

    // Find each field by its current (placeholder) value and overwrite it
    foreach (string fieldKey in fieldKeys)
    {
        if (form.GetField(fieldKey) == "MyTemplatesOriginalTextFieldA")
            form.SetField(fieldKey, "1234");
        if (form.GetField(fieldKey) == "MyTemplatesOriginalTextFieldB")
            form.SetField(fieldKey, "5678");
    }

    // "Flatten" the form so it won't be editable/usable anymore
    stamper.FormFlattening = true;

    // Close the stamper first so the finished PDF is written to outStream
    stamper.Close();
    reader.Close();

    return outStream.ToArray();
}
+0

I don't think you need to loop over the field keys; you can just use: form.SetField("MyTemplatesOriginalTextFieldA", "1234"); etc. – Lachlan 2010-12-13 02:30:44

+0

Ah yes, that's how I would do it now. – Chris 2010-12-13 23:51:06

+0

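When I wrote the code it was done that way because I was replacing the actual values in (unnamed) fields, rather than giving the fields names and setting them by name, which, as you rightly suggest, is the better option. – Chris 2010-12-13 23:51:50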
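
The approach suggested in the comments above can be sketched roughly as follows, assuming the template's form fields have been given names. The class name `TemplateFiller`, the method `Fill`, and the field names in the usage note are hypothetical, not something from the original answer; the iTextSharp calls are the same ones the answer uses.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text.pdf;

public static class TemplateFiller
{
    // Hypothetical helper: sets named AcroForm fields and returns the PDF bytes.
    public static byte[] Fill(string templatePath, IDictionary<string, string> values)
    {
        var reader = new PdfReader(templatePath);
        try
        {
            using (var outStream = new MemoryStream())
            {
                var stamper = new PdfStamper(reader, outStream);
                var form = stamper.AcroFields;

                foreach (var pair in values)
                {
                    // SetField returns false when no field has that name.
                    if (!form.SetField(pair.Key, pair.Value))
                        throw new KeyNotFoundException(
                            "No form field named '" + pair.Key + "'.");
                }

                // Flatten so the filled values become regular page content.
                stamper.FormFlattening = true;

                // The stamper must be closed before the bytes are complete.
                stamper.Close();
                return outStream.ToArray();
            }
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Usage would look something like `TemplateFiller.Fill(path, new Dictionary<string, string> { { "FieldA", "1234" }, { "FieldB", "5678" } })`, where `FieldA` and `FieldB` stand in for whatever names the fields were given in the template.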

1

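Unfortunately, I was looking for something similar and couldn't figure it out. Below is as far as I got; maybe you can use it as a starting point. The problem is that a PDF doesn't actually store its text as plain strings; it uses lookup tables and some other arcane wizardry. This method reads the byte values of a page and tries to convert them to a string, but as far as I can tell it only handles English and misses some special characters, so I abandoned my project and moved on.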

// Requires: using System.IO; using iTextSharp.text; using iTextSharp.text.pdf;
string contents = string.Empty;
Document doc = new Document();
PdfReader reader = new PdfReader("pathToPdf.pdf");
using (MemoryStream memoryStream = new MemoryStream())
{
    PdfWriter writer = PdfWriter.GetInstance(doc, memoryStream);
    doc.Open();
    PdfContentByte cb = writer.DirectContent;
    for (int p = 1; p <= reader.NumberOfPages; p++)
    {
        // start a new output page with the same size as the source page
        doc.SetPageSize(reader.GetPageSize(p));
        doc.NewPage();

        // pick up here with something like this:
        byte[] bt = reader.GetPageContent(p);
        contents = ExtractTextFromPDFBytes(bt);

        if (contents.IndexOf("something") != -1)
        {
            // make your own pdf page and add to cb (contentbyte)
        }
        else
        {
            // copy the source page unchanged, taking its rotation into account
            PdfImportedPage page = writer.GetImportedPage(reader, p);
            int rot = reader.GetPageRotation(p);
            if (rot == 90 || rot == 270)
                cb.AddTemplate(page, 0, -1.0F, 1.0F, 0, 0, reader.GetPageSizeWithRotation(p).Height);
            else
                cb.AddTemplate(page, 1.0F, 0, 0, 1.0F, 0, 0);
        }
    }
    // close the document before the reader so imported pages can still be written
    doc.Close();
    reader.Close();
    File.WriteAllBytes("pathToOutputOrSamePathToOverwrite.pdf", memoryStream.ToArray());
}

The method below is taken from this site:

private string ExtractTextFromPDFBytes(byte[] input)
{
    if (input == null || input.Length == 0) return "";

    try
    {
        string resultString = "";

        // Flag showing whether we are currently inside a text object
        bool inTextObject = false;

        // Flag showing whether the next character is a literal,
        // e.g. '\\' to get a '\' character or '\(' to get '('
        bool nextLiteral = false;

        // () bracket nesting level. Text appears inside ()
        int bracketDepth = 0;

        // Keep the most recent characters so operators can be recognized
        char[] previousCharacters = new char[_numberOfCharsToKeep];
        for (int j = 0; j < _numberOfCharsToKeep; j++) previousCharacters[j] = ' ';

        for (int i = 0; i < input.Length; i++)
        {
            char c = (char)input[i];

            if (inTextObject)
            {
                // Text-positioning and text-showing operators
                if (bracketDepth == 0)
                {
                    if (CheckToken(new string[] { "TD", "Td" }, previousCharacters))
                        resultString += "\n\r";
                    else if (CheckToken(new string[] { "'", "T*", "\"" }, previousCharacters))
                        resultString += "\n";
                    else if (CheckToken(new string[] { "Tj" }, previousCharacters))
                        resultString += " ";
                }

                if (bracketDepth == 0 &&
                    CheckToken(new string[] { "ET" }, previousCharacters))
                {
                    // End of a text object: leave it and append a space
                    inTextObject = false;
                    resultString += " ";
                }
                else if ((c == '(') && (bracketDepth == 0) && (!nextLiteral))
                {
                    // Start outputting text
                    bracketDepth = 1;
                }
                else if ((c == ')') && (bracketDepth == 1) && (!nextLiteral))
                {
                    // Stop outputting text
                    bracketDepth = 0;
                }
                else if (bracketDepth == 1)
                {
                    // Just a normal text character
                    if (c == '\\' && !nextLiteral)
                    {
                        // Backslash: emit the next character without interpreting it
                        nextLiteral = true;
                    }
                    else
                    {
                        if (((c >= ' ') && (c <= '~')) ||
                            ((c >= 128) && (c < 255)))
                        {
                            resultString += c.ToString();
                        }

                        nextLiteral = false;
                    }
                }
            }

            // Store the recent characters for when we have to look back
            for (int j = 0; j < _numberOfCharsToKeep - 1; j++)
            {
                previousCharacters[j] = previousCharacters[j + 1];
            }
            previousCharacters[_numberOfCharsToKeep - 1] = c;

            // Start of a text object
            if (!inTextObject && CheckToken(new string[] { "BT" }, previousCharacters))
            {
                inTextObject = true;
            }
        }
        return resultString;
    }
    catch
    {
        return "";
    }
}

private bool CheckToken(string[] tokens, char[] recent)
{
    int last = _numberOfCharsToKeep - 1;
    foreach (string token in tokens)
    {
        // The token must sit just before the last character, with a
        // delimiter on both sides. Works for one-character operators
        // (' and ") as well as two-character ones.
        int start = last - token.Length;
        if (start < 1 || !IsDelimiter(recent[last]) || !IsDelimiter(recent[start - 1]))
            continue;

        bool match = true;
        for (int k = 0; k < token.Length && match; k++)
            match = recent[start + k] == token[k];
        if (match) return true;
    }
    return false;
}

// Space, carriage return and line feed separate operators in this parser
private static bool IsDelimiter(char c)
{
    return c == ' ' || c == 0x0d || c == 0x0a;
}
+0

What is _numberOfCharsToKeep? Its declaration is missing, so please tell me how to define it. – 2013-09-13 04:57:18
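
The answer's code never declares `_numberOfCharsToKeep`; from the way it is used, it appears to be a field of the containing class that sets how many recent characters are remembered. Matching a two-character operator with a delimiter on each side needs at least 4 characters of history, so any value of 4 or more should do. The sketch below assumes 15, which is an arbitrary choice, and the class `PdfTextWindowDemo` with its `Feed` and `EndsWithToken` helpers is hypothetical, written only to show how the sliding window behaves.

```csharp
using System;

public static class PdfTextWindowDemo
{
    // Assumed value: anything >= 4 leaves room for
    // delimiter + two-character operator + delimiter.
    private const int _numberOfCharsToKeep = 15;

    private static bool IsDelimiter(char c)
    {
        return c == ' ' || c == '\r' || c == '\n';
    }

    // Pushes each character of 'content' through the window, dropping the oldest.
    public static char[] Feed(string content)
    {
        char[] window = new char[_numberOfCharsToKeep];
        for (int j = 0; j < window.Length; j++) window[j] = ' ';

        foreach (char c in content)
        {
            Array.Copy(window, 1, window, 0, window.Length - 1);
            window[window.Length - 1] = c;
        }
        return window;
    }

    // True when the window ends with: delimiter, two-character token, delimiter.
    public static bool EndsWithToken(char[] window, string token)
    {
        int n = window.Length;
        return token.Length == 2
            && window[n - 3] == token[0]
            && window[n - 2] == token[1]
            && IsDelimiter(window[n - 1])
            && IsDelimiter(window[n - 4]);
    }

    public static void Main()
    {
        char[] window = Feed("(Hello) Tj\n");
        Console.WriteLine(EndsWithToken(window, "Tj")); // prints True
        Console.WriteLine(EndsWithToken(window, "ET")); // prints False
    }
}
```

In the answer's own class, the equivalent declaration would be a single line such as `private const int _numberOfCharsToKeep = 15;` placed next to the two methods.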
