Chinese characters replaced with # when converting Docx to PDF

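I have a docx file that contains Chinese characters and other Asian languages. On my laptop I can convert the docx file to PDF perfectly, and the Chinese characters are correctly embedded in the PDF. But when the same code runs as a runnable jar on a Linux server, the Chinese characters are replaced with # symbols. Can someone guide me on how to solve this? Thanks in advance for your help. My code is below: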

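// Package names below assume docx4j 2.8.x with log4j 1.x; adjust them for your docx4j version
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.log4j.Level;
import org.docx4j.Docx4jProperties;
import org.docx4j.convert.out.pdf.PdfConversion;
import org.docx4j.convert.out.pdf.viaXSLFO.Conversion;
import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings;
import org.docx4j.fonts.IdentityPlusMapper;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.utils.Log4jConfigurator;

// The question showed only the two methods; the class name is illustrative
public class DocxToPdf {

    public static void main(String[] args) throws Exception {
        try {
            Docx4jProperties.getProperties().setProperty("docx4j.Log4j.Configurator.disabled", "true");
            Log4jConfigurator.configure();
            // Turns off all conversion logging, including any missing-glyph warnings
            Conversion.log.setLevel(Level.OFF);

            System.out.println("Getting input Docx File");
            WordprocessingMLPackage wordMLPackage;
            try (InputStream is = new FileInputStream(new File(
                    "C:/Users/nithins/Documents/plugin docx to pdf/other documents/Contains Complex Fonts Verified.docx"))) {
                wordMLPackage = WordprocessingMLPackage.load(is);
            }
            // Maps each font name used in the document to the installed font of the same name
            wordMLPackage.setFontMapper(new IdentityPlusMapper());

            System.out.println("Setting File Encoding");
            // Has no effect on the PDF: "file.encoding" concerns text I/O, Identity-H is PDF-internal
            System.setProperty("file.encoding", "Identity-H");
            System.out.println("Generating PDF file");

            PdfConversion c = new Conversion(wordMLPackage);
            File outFile = new File(
                    "C:/Users/nithins/Documents/plugin docx to pdf/other documents/Contains Complex Fonts Verified.pdf");
            // try-with-resources closes the stream even if the conversion throws
            try (OutputStream os = new FileOutputStream(outFile)) {
                c.output(os, new PdfSettings());
            }

            System.out.println("Output pdf file generated");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Replaces a trailing ".docx" extension with ".pdf"
    public static String changeExtensionToPdf(String path) {
        int markerIndex = path.lastIndexOf(".docx");
        if (markerIndex < 0) {
            // No ".docx" in the path: append ".pdf" instead of throwing StringIndexOutOfBoundsException
            return path + ".pdf";
        }
        return path.substring(0, markerIndex) + ".pdf";
    }
}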

2017-05-04 nithin subramanian

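You use some Java solution for converting that docx to PDF. That's all you've told us. So all we can say is that you seem to be doing something wrong in that solution. – mkl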

Sorry, I have just edited the question to include my Java code –

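OK, so you use [tag:docx4j]. I added that tag. Unfortunately I don't know that product at all. Just one remark: `System.setProperty("file.encoding", "Identity-H")` should not have any effect. **Identity-H** is something PDF-internal; the system property "file.encoding" usually refers to text files and therefore *not* to PDFs, which after all are binary files, not text files. Furthermore, it is strange that you set the log level to off while you are still having trouble; after all, there may be log output that could help you. – mkl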
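To illustrate mkl's point in plain Java (JDK only, no docx4j), the sketch below checks two things: that "Identity-H" is not a charset the JDK knows at all, and that calling `System.setProperty("file.encoding", ...)` at runtime leaves `Charset.defaultCharset()` unchanged, because the JDK determines the default charset once and caches it. The class and method names are illustrative.

```java
import java.nio.charset.Charset;

public class FileEncodingCheck {

    // "Identity-H" names a predefined PDF CMap (2-byte codes mapped directly to CIDs),
    // not a Java charset, so Charset.isSupported returns false for it.
    static boolean isJavaCharset(String name) {
        return Charset.isSupported(name);
    }

    // Returns true if setting "file.encoding" at runtime left the default charset unchanged.
    static boolean defaultCharsetUnchangedBySetProperty() {
        String before = Charset.defaultCharset().name();
        System.setProperty("file.encoding", "Identity-H");
        String after = Charset.defaultCharset().name();
        return before.equals(after);
    }

    public static void main(String[] args) {
        System.out.println("Identity-H is a Java charset: " + isJavaCharset("Identity-H"));
        System.out.println("Default charset unchanged: " + defaultCharsetUnchangedBySetProperty());
    }
}
```

Running `main` prints `false` for the first check and `true` for the second, so that line in the question can be dropped without affecting the output.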

From the docx4j "Getting Started" document:

docx4j can only use fonts which are available to it. 

These fonts come from 2 sources: 
• those installed on the computer 
• those embedded in the document 
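Since the symptom appears only on the Linux server, a first check is whether any font covering Chinese is installed there at all. A minimal sketch, assuming the common Linux font directories `/usr/share/fonts`, `/usr/local/share/fonts` and `~/.fonts` (docx4j's own font discovery may scan a different set of directories): it lists the TrueType/OpenType font files it finds, which you can then inspect for a CJK font. `FontFileScan` and `findFontFiles` are illustrative names.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FontFileScan {

    // Collects .ttf, .ttc and .otf files under the given roots; missing roots are skipped.
    static List<Path> findFontFiles(List<Path> roots) throws IOException {
        List<Path> found = new ArrayList<>();
        for (Path root : roots) {
            if (!Files.isDirectory(root)) {
                continue;
            }
            try (Stream<Path> files = Files.walk(root)) {
                found.addAll(files
                        .filter(Files::isRegularFile)
                        .filter(p -> {
                            String n = p.getFileName().toString().toLowerCase(Locale.ROOT);
                            return n.endsWith(".ttf") || n.endsWith(".ttc") || n.endsWith(".otf");
                        })
                        .collect(Collectors.toList()));
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Assumed locations: common Linux font directories
        List<Path> roots = Arrays.asList(
                Paths.get("/usr/share/fonts"),
                Paths.get("/usr/local/share/fonts"),
                Paths.get(System.getProperty("user.home"), ".fonts"));
        for (Path p : findFontFiles(roots)) {
            System.out.println(p);
        }
    }
}
```

Comparing this list on the laptop and on the server shows quickly whether the fonts the document uses exist on both machines.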

Note that Word silently performs font substitution. When you open an existing document in 
Word, and select text in a particular font, the actual font you see on the screen won't be 
the font reported in the ribbon if it is not installed on your computer or embedded in the 
document. To see whether Word 2007 is substituting a font, go into Word Options 
> Advanced > Show Document Content and press the "Font Substitution" button. 

Word's font substitution information is not available to docx4j. As a developer, you have 3 
options: 
• ensure the font is installed or embedded 
• tell docx4j which font to use instead, or 
• allow docx4j to fallback to a default font 
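The three options amount to a simple resolution order. The sketch below models that order with plain Java collections; it is not docx4j's `FontMapper` API, and the font names (`SimSun`, `Noto Sans CJK SC`, `DejaVu Sans`) are hypothetical examples of a font typically found only on Windows, an installed CJK font, and a Latin-only default. If the default font has no CJK glyphs, falling back to it is one plausible way to end up with `#` characters in the PDF.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FontResolution {

    // Resolution order modelled on the three options:
    //   1. the requested font is available (installed or embedded) -> use it
    //   2. an explicit mapping points to an available font          -> use that
    //   3. otherwise                                                -> use the default font
    // Plain-Java model only; not docx4j's FontMapper API.
    static String resolve(String requested, Set<String> available,
                          Map<String, String> mappings, String defaultFont) {
        if (available.contains(requested)) {
            return requested;
        }
        String mapped = mappings.get(requested);
        if (mapped != null && available.contains(mapped)) {
            return mapped;
        }
        return defaultFont;
    }

    public static void main(String[] args) {
        // Hypothetical Linux server: no Windows fonts, one CJK font installed
        Set<String> available = new HashSet<>(Arrays.asList("DejaVu Sans", "Noto Sans CJK SC"));
        Map<String, String> mappings = new HashMap<>();
        mappings.put("SimSun", "Noto Sans CJK SC");

        System.out.println(resolve("SimSun", available, mappings, "DejaVu Sans"));  // mapped
        System.out.println(resolve("Calibri", available, mappings, "DejaVu Sans")); // fallback
    }
}
```

`main` prints `Noto Sans CJK SC` (resolved through the mapping) and then `DejaVu Sans` (the fallback).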

To embed a font in a document, open it in Word on a computer which has the font installed 
(check that no substitution is occurring), and go to Word Options > Save > Embed Fonts in File. 

If you want to tell docx4j to use a different font, you need to add a font mapping. The 
FontMapper interface is used to do this. 

On a Windows computer, font names for installed fonts are mapped 1:1 to the corresponding 
physical fonts via the IdentityPlusMapper. 

A font mapper contains a Map<String, PhysicalFont>; to add a font mapping, follow the example in the ConvertOutPDF sample: 
    // Set up font mapper 
    Mapper fontMapper = new IdentityPlusMapper(); 
    wordMLPackage.setFontMapper(fontMapper); 

    // .. example of mapping font Times New Roman which doesn't have certain Arabic glyphs 
    // eg Glyph "ي" (0x64a, afii57450) not available in font "TimesNewRomanPS-ItalicMT". 
    // eg Glyph "ج" (0x62c, afii57420) not available in font "TimesNewRomanPS-ItalicMT". 
    // to a font which does 
    PhysicalFont font 
      = PhysicalFonts.get("Arial Unicode MS"); 
     // make sure this is in your regex (if any)!!! 
    if (font!=null) { 
     fontMapper.put("Times New Roman", font); 
     fontMapper.put("Arial", font); 
    } 

You'll see the font names if you configure log4j debug-level logging for 
org.docx4j.fonts.PhysicalFonts.

If you turn on logging for org.docx4j.fonts, it should tell you about missing glyphs. See https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/fonts/GlyphCheck.java
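GlyphCheck is the place to look for what docx4j actually does. The underlying idea of a missing-glyph check can be sketched in plain Java: walk the text's code points and report those the font does not cover. Here the font's coverage is modelled as a `Set<Integer>` of code points, a hypothetical stand-in for reading a real font's character map; this is not GlyphCheck's implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MissingGlyphs {

    // Returns the code points in text that the font does not cover, each reported
    // once, in order of first appearance. fontCoverage is a hypothetical stand-in
    // for the set of characters a real font's character map provides.
    static List<Integer> missing(String text, Set<Integer> fontCoverage) {
        List<Integer> result = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        text.codePoints().forEach(cp -> {
            if (!fontCoverage.contains(cp) && seen.add(cp)) {
                result.add(cp);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical Latin-only font covering printable ASCII (0x20..0x7E)
        Set<Integer> latinOnly = new HashSet<>();
        for (int cp = 0x20; cp <= 0x7E; cp++) {
            latinOnly.add(cp);
        }
        // "Hello " followed by the Chinese characters U+4E2D and U+6587
        for (int cp : missing("Hello \u4e2d\u6587", latinOnly)) {
            System.out.printf("Glyph \"%s\" (0x%x) not available%n",
                    new String(Character.toChars(cp)), cp);
        }
    }
}
```

With the ASCII-only coverage in `main`, only the two Chinese characters (0x4e2d and 0x6587) are reported, which is the kind of information the docx4j log would give you if it were not switched off.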

2017-05-07 11:06:16 JasonPlutext
