iText: parse HTML chunks in a loop with different pages

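I currently have a working version that creates a PDF from several rows of data in a database. For each row, it creates a new page in the PDF, and that all works fine. Now I need to parse some of the fields in each row so that their HTML renders correctly. I found an example that parses an entire document, but there the whole document is one complete string that gets parsed in one go.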

What I need is to create individually formatted pages where only specific HTML fields are parsed. Is it possible to do this?

Here is the code I have that creates the pages:

PdfFont fTimes = PdfFontFactory.CreateFont(FontConstants.TIMES_ROMAN); 
PdfFont fTimesBold = PdfFontFactory.CreateFont(FontConstants.TIMES_BOLD);      

// create the first page here 
doc.Add(new Paragraph("Abstract Submissions for " + eventName).SetFont(fTimes).SetFontSize(18).SetFontColor(Color.BLACK)); 
doc.Add(new Paragraph("Section Name: " + GetSectionName(ddlSections.SelectedValue)).SetFont(fTimes).SetFontSize(14).SetFontColor(Color.BLACK)); 
doc.Add(new Paragraph("Created: " + DateTime.Now.ToString("dddd, MMMM d, yyyy h:mm tt")).SetFont(fTimes).SetFontSize(11).SetFontColor(Color.BLACK)); 

// iterate through each of the items 
foreach (DataRow row in dsItems.Tables[0].Rows) 
{ 
    // create a new page for each abstract submission 
    doc.Add(new AreaBreak(iText.Layout.Properties.AreaBreakType.NEXT_PAGE)); 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["PresentationType"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK)); 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["PresentationTitle"], "")).SetFont(fTimes).SetFontSize(16).SetFontColor(Color.BLACK)); 
    // html field 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["Authors"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK)); 
    // html field 
    doc.Add(new Paragraph(ValidationHelper.GetString(row["Abstract"], "")).SetFont(fTimes).SetFontSize(12).SetFontColor(Color.BLACK)); 
} 

doc.Close();

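I should note that I'm using a MemoryStream rather than a FileStream, so the client can download the file immediately without it having to be saved to the file system.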
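On the MemoryStream point: iText typically closes the output stream when the document is closed, which can look like it rules out reading the bytes afterward. It does not, because `MemoryStream.ToArray()` is documented to work on a closed stream. The minimal sketch below uses a `StreamWriter` as a stand-in for the PDF writer (no iText involved); `MemoryStreamSketch` and `WriteAndClose` are hypothetical names.

```csharp
using System.IO;
using System.Text;

public static class MemoryStreamSketch
{
    // Hypothetical helper: the StreamWriter plays the role of the PDF writer.
    public static byte[] WriteAndClose(string content)
    {
        var ms = new MemoryStream();
        var writer = new StreamWriter(ms, new UTF8Encoding(false)); // UTF-8 without BOM
        writer.Write(content);
        writer.Dispose(); // flushes, then also closes the underlying MemoryStream

        // Still valid: ToArray() works even after the MemoryStream has been closed.
        return ms.ToArray();
    }
}
```

In a Web Forms page, the returned array could then be sent to the client, e.g. with `Response.BinaryWrite(bytes)`.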

**EDIT - added sample data**

<table> 
    <tr> 
     <td>Poster</td> 
     <td>Abstract 1</td> 
     <td><strong><em>Doctor Name 1</em></strong> <strong>Doctor Name 2</strong></td> 
     <td><p>Some really long text <strong>which can have</strong> some different basic HTML <u>formatting in it</u></p></td> 
    </tr> 
    <tr> 
     <td>Presentation</td> 
     <td>Abstract 2</td> 
     <td><strong>Doctor Name 15 </strong><em>Doctor 3</em></td> 
     <td><p>Some really long text which can have some different basic HTML <em>formatting in it</em></p></td> 
    </tr> 
</table>


2017-04-19 Brenden Kehren

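Can you share a sample of the content to be parsed/rendered? Is this content a small subset of formatting, like what a rich-text editor produces, or can it be any generic HTML/CSS? – COeDev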

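@COeDev I added some sample data; sorry for the poor formatting. Basically, the content of each `<td>` is one database column. That was the only way I could show the full markup without the editor formatting it. –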

If there is nothing besides `strong`, `p`, and `em`, and the HTML is valid XML, you can easily parse it and create iText elements from it. – COeDev
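One way to check that precondition (only a few tags, and valid XML) before taking this route: wrap each field in a root element, since a field like the Authors sample has two top-level `<strong>` elements, then load it with `XmlDocument`. This is a minimal sketch; the tag whitelist (`p`, `strong`, `em`, plus `u` from the sample data) and the names `SimpleHtmlCheck`/`IsSimpleXhtml` are assumptions for illustration. HTML-only entities such as `&nbsp;` and unclosed tags such as `<br>` are not valid XML, so they make the check fail.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class SimpleHtmlCheck
{
    // Assumed whitelist: the tags mentioned in the comment plus <u> from the sample data.
    private static readonly HashSet<string> AllowedTags =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "p", "strong", "em", "u" };

    // True if the field is well-formed XML (once wrapped in a root element)
    // and contains only whitelisted tags.
    public static bool IsSimpleXhtml(string fragment)
    {
        var doc = new XmlDocument();
        try
        {
            // Wrapping is needed because a field may have several top-level elements.
            doc.LoadXml("<root>" + fragment + "</root>");
        }
        catch (XmlException)
        {
            return false; // e.g. unclosed <br> or an undeclared entity like &nbsp;
        }

        // "*" matches every descendant element of the wrapper.
        foreach (XmlElement el in doc.DocumentElement.GetElementsByTagName("*"))
        {
            if (!AllowedTags.Contains(el.Name)) return false;
        }
        return true;
    }
}
```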

With a schema like that, you can write your own XML/HTML-to-iText translation. You only need to implement the tags you actually use:

// Sketch written against the iTextSharp 5 API (IElement, Chunks, PdfPCell);
// note that the question's code uses iText 7.
using System.Collections.Generic;
using System.IO;
using System.Xml;
using iTextSharp.text;      // Document, Paragraph, IElement
using iTextSharp.text.pdf;  // PdfWriter

internal interface ICustomElement { IEnumerable<IElement> GetContent(); }

internal class CustomElementFactory {
    public ICustomElement GetElement(XmlNode node) {
        switch (node.Name) {
            case "p": return new CustomParagraph(node, this);
            // implement the tags you need using the ICustomElement interface
            default: return new CustomParagraph(node, this); // e.g. treat unknown nodes as plain text
        }
    }
}

public class PdfCreator {
    public byte[] GetPdf(XmlDocument template) {
        var ms = new MemoryStream();
        var doc = new Document();
        PdfWriter.GetInstance(doc, ms);
        doc.Open();
        var factory = new CustomElementFactory();
        foreach (XmlNode node in template.ChildNodes) {
            if (node.NodeType != XmlNodeType.Element) continue; // skip the XML declaration, comments, etc.
            foreach (IElement element in factory.GetElement(node).GetContent()) {
                doc.Add(element);
            }
            // This works in such an easy, generic way because almost every iText element implements
            // the IElement interface and can therefore be added to the document like this.
            // Containers like PdfPCell take IElements as well. Good job, iText guys! ;)
        }
        doc.Close();
        return ms.ToArray();
    }
}

// here comes the magic:

internal class CustomParagraph : ICustomElement {
    private readonly XmlNode node;
    private readonly CustomElementFactory factory; // for the recursive processing described below

    public CustomParagraph(XmlNode node, CustomElementFactory factory) {
        this.node = node;
        this.factory = factory;
    }

    public IEnumerable<IElement> GetContent() {
        Paragraph p = new Paragraph();
        p.Add(node.InnerText); // create an underlined, bold, or other font here when you implement the special HTML tags

        // If the node has child elements, get their content by calling factory.GetElement(child).GetContent()
        // for each child. Then loop over the IElement.Chunks collection of each IElement and add those chunks
        // to the paragraph of this scope. This way you can process nested HTML tags recursively.
        // Find a way to pass this scope's style information to the factory when processing child nodes,
        // so that you can render things like <strong>bold<u>underlinedANDbold</u></strong> correctly.

        return new List<IElement> { p };
    }
}
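To make the recursion and style passing described in the comments above more concrete, here is a minimal, library-free sketch of just that part: it walks one field's XML depth-first and hands the inherited bold/italic/underline state down to the child nodes, producing a flat list of styled runs. Each run would then become one styled chunk (iTextSharp 5) or text element (iText 7); that mapping is left out. `StyledRun` and `HtmlFieldFlattener` are hypothetical names, and the tag-to-style mapping (`strong`/`b`, `em`/`i`, `u`) is an assumption based on the sample data.

```csharp
using System.Collections.Generic;
using System.Xml;

// A piece of text plus the inline styles in effect for it.
public sealed class StyledRun
{
    public string Text { get; }
    public bool Bold { get; }
    public bool Italic { get; }
    public bool Underline { get; }

    public StyledRun(string text, bool bold, bool italic, bool underline)
    {
        Text = text;
        Bold = bold;
        Italic = italic;
        Underline = underline;
    }
}

public static class HtmlFieldFlattener
{
    // Parses one HTML field into styled runs. Assumes the field is
    // well-formed XML once wrapped in a root element.
    public static List<StyledRun> Flatten(string fragment)
    {
        var doc = new XmlDocument { PreserveWhitespace = true }; // keep spaces between tags
        doc.LoadXml("<root>" + fragment + "</root>");
        var runs = new List<StyledRun>();
        Walk(doc.DocumentElement, false, false, false, runs);
        return runs;
    }

    // Depth-first walk: each element adds its own style to the style it
    // inherited, so <strong><u>x</u></strong> yields a bold + underlined run.
    private static void Walk(XmlNode node, bool bold, bool italic, bool underline,
                             List<StyledRun> runs)
    {
        foreach (XmlNode child in node.ChildNodes)
        {
            switch (child.NodeType)
            {
                case XmlNodeType.Text:
                case XmlNodeType.Whitespace:
                case XmlNodeType.SignificantWhitespace:
                    runs.Add(new StyledRun(child.Value, bold, italic, underline));
                    break;
                case XmlNodeType.Element:
                {
                    string name = child.Name.ToLowerInvariant();
                    Walk(child,
                         bold || name == "strong" || name == "b",
                         italic || name == "em" || name == "i",
                         underline || name == "u",
                         runs);
                    break;
                }
                default:
                    break; // comments, processing instructions, etc. are ignored
            }
        }
    }
}
```

For example, `HtmlFieldFlattener.Flatten("<strong>bold<u>underlinedANDbold</u></strong>")` returns two runs: `bold` (bold) and `underlinedANDbold` (bold and underlined). Block tags such as `<p>` contribute no inline style in this sketch; a fuller version would start a new paragraph for them.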

It takes some work and fine-tuning, but it can be done.


2017-04-21 05:36:48 COeDev
