2017-09-24 348 views

Docx4j: converting HTML to DOCX

I have a new problem: when I convert HTML to DOCX, it throws an exception:

org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 73; The entity "nbsp" was referenced, but not declared.

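As I understand it, this happens because docx4j treats my document as XML when converting it to DOCX, and XML has only five predefined entities (amp, lt, gt, apos, quot); nbsp is not one of them. How can I make docx4j convert HTML to DOCX without declaring the nbsp entity in the doctype?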

Is docx4j working incorrectly, or is this a limitation of the library?
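The XML behaviour described above can be reproduced without docx4j, using only the JDK's own parser. The sketch below is illustrative: the class `EntityParseDemo` and its method `isWellFormedXml` are hypothetical names, not part of docx4j. It assumes the JDK's default parser settings, which permit an internal DTD subset.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class (not part of docx4j).
final class EntityParseDemo {

    /** Returns true if the JDK XML parser accepts the markup as well-formed XML. */
    static boolean isWellFormedXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Includes SAXParseException for an undeclared entity such as &nbsp;
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Named HTML entity with no declaration: rejected by the XML parser.
        System.out.println(isWellFormedXml("<p>a&nbsp;b</p>"));
        // Numeric character reference: needs no declaration.
        System.out.println(isWellFormedXml("<p>a&#160;b</p>"));
        // Entity declared in an internal DTD subset: accepted.
        System.out.println(isWellFormedXml(
                "<!DOCTYPE p [<!ENTITY nbsp \"&#160;\">]><p>a&nbsp;b</p>"));
    }
}
```

The point of the sketch is that the exception comes from the XML parser's well-formedness rules, so any input containing undeclared named entities fails in the same way before the conversion to DOCX starts.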

Here is my code:

package ru.simplexsoftware.constructorOfDocuments.web.rest; 
import org.docx4j.convert.in.xhtml.XHTMLImporterImpl; 
import org.docx4j.openpackaging.exceptions.Docx4JException; 
import org.docx4j.openpackaging.exceptions.InvalidFormatException; 
import org.docx4j.openpackaging.packages.WordprocessingMLPackage; 
import org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart; 
import org.springframework.beans.factory.annotation.Autowired; 
import org.springframework.web.HttpRequestHandler; 
import ru.simplexsoftware.constructorOfDocuments.dao.TemplateDao; 
import javax.servlet.ServletException; 
import javax.servlet.http.HttpServletRequest; 
import javax.servlet.http.HttpServletResponse; 
import javax.xml.bind.JAXBException; 
import java.io.ByteArrayInputStream; 
import java.io.ByteArrayOutputStream; 
import java.io.IOException; 
import java.io.InputStream; 
import java.nio.charset.StandardCharsets; 


public class DocxFileDownloadServlet implements HttpRequestHandler { 

@Autowired 
TemplateDao templateDao; 
@Override 
public void handleRequest(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { 

    String parameter = request.getParameter("documentId"); 

    Long documentId = Long.parseLong(parameter); 

    WordprocessingMLPackage wordMLPackage = null; 
    try { 
     wordMLPackage = WordprocessingMLPackage.createPackage(); 
    } catch (InvalidFormatException e) { 
     e.printStackTrace(); 
    } 

    NumberingDefinitionsPart ndp = null; 
    try { 
     ndp = new NumberingDefinitionsPart(); 
    } catch (InvalidFormatException e) { 
     e.printStackTrace(); 
    } 
    try { 
     wordMLPackage.getMainDocumentPart().addTargetPart(ndp); 
    } catch (InvalidFormatException e) { 
     e.printStackTrace(); 
    } 
    try { 
     ndp.unmarshalDefaultNumbering(); 
    } catch (JAXBException e) { 
     e.printStackTrace(); 
    } 

    XHTMLImporterImpl xHTMLImporter = new XHTMLImporterImpl(wordMLPackage); 
    xHTMLImporter.setHyperlinkStyle("Hyperlink"); 

    String htmlString=templateDao.get(documentId).html; 
    htmlString = htmlString.replaceAll("<br>","<br/>"); 
    InputStream stream = new ByteArrayInputStream(htmlString.getBytes(StandardCharsets.UTF_8));
    // Convert the XHTML, and add it into the empty docx we made 
    try { 
     wordMLPackage.getMainDocumentPart().getContent().addAll(
       xHTMLImporter.convert(htmlString, null)); 
    } catch (Docx4JException e) { 
     e.printStackTrace(); 
    } 

    ByteArrayOutputStream outputStream = new ByteArrayOutputStream(); 

    try { 
     wordMLPackage.save(outputStream); 
    } catch (Docx4JException e) { 
     e.printStackTrace(); 
    } 


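    // A DOCX file is binary: send the raw bytes with the OOXML MIME type instead of a re-encoded String.
    response.setContentType("application/vnd.openxmlformats-officedocument.wordprocessingml.document");
    response.getOutputStream().write(outputStream.toByteArray());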
    response.flushBuffer(); 

} 
} 

Either declare the entity manually or pre-process the input with a tidy program. docx4j-ImportXHTML expects well-formed XML input. – JasonPlutext


JasonPlutext, is there some way to declare all the entities at once? I just don't want to declare every HTML entity manually.
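One possible way to avoid declaring entities one by one is to rewrite named entities as numeric character references before passing the string to the importer. The sketch below is only an illustration of that pre-processing idea, not a docx4j feature: the class `HtmlEntityPreprocessor`, the method `toNumericReferences`, and the small entity table are assumptions made for the example. A real table would need to cover every HTML named entity that can appear in the input.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of docx4j).
final class HtmlEntityPreprocessor {

    // Illustrative subset only; extend it to the entities your HTML actually uses.
    private static final Map<String, Integer> CODE_POINTS = Map.of(
            "nbsp", 160,
            "copy", 169,
            "laquo", 171,
            "raquo", 187,
            "mdash", 8212,
            "hellip", 8230);

    private static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Replaces known named entities with numeric references; leaves everything else as-is. */
    static String toNumericReferences(String html) {
        Matcher matcher = NAMED_ENTITY.matcher(html);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            Integer codePoint = CODE_POINTS.get(matcher.group(1));
            // Entities outside the table (including XML's predefined &amp;, &lt;, ...) stay unchanged.
            String replacement = (codePoint == null) ? matcher.group() : "&#" + codePoint + ";";
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumericReferences("a&nbsp;b &amp; c&copy;"));
    }
}
```

In the servlet from the question, such a helper would be applied to `htmlString` just before the call to `xHTMLImporter.convert(htmlString, null)`. Input that is malformed in other ways (unclosed tags, unquoted attributes) would still need a tidy step as suggested in the comment above.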

Answer


You can try using AltChunkType to insert the HTML string into the docx as an altChunk:

wordMLPackage.getMainDocumentPart().addAltChunk(AltChunkType.Xhtml, htmlString.getBytes(StandardCharsets.UTF_8));