2012-07-16

Java HtmlCleaner crashes during cleaning

Hi, I am running the following code, but it crashes during execution.

ByteArrayInputStream input = new ByteArrayInputStream(fileContent); 

final HtmlCleaner cleaner = new HtmlCleaner(); 
CleanerProperties props = cleaner.getProperties(); 

DomSerializer doms = new DomSerializer(props, true); 

org.w3c.dom.Document xmlDoc = null; 

try { 
    TagNode node = cleaner.clean(input); 
    xmlDoc = doms.createDOM(node); 
} catch (Exception e) { 
    System.out.println("Tiding error "); 
    e.printStackTrace(); 
} 

This is the stack trace of the error:

NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces. 
    at com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.checkDOMNSErr(CoreDocumentImpl.java:2535) 
    at com.sun.org.apache.xerces.internal.dom.AttrNSImpl.setName(AttrNSImpl.java:113) 
    at com.sun.org.apache.xerces.internal.dom.AttrNSImpl.<init>(AttrNSImpl.java:74) 
    at com.sun.org.apache.xerces.internal.dom.CoreDocumentImpl.createAttributeNS(CoreDocumentImpl.java:2138) 
    at com.sun.org.apache.xerces.internal.dom.ElementImpl.setAttributeNS(ElementImpl.java:656) 
    at org.htmlcleaner.DomSerializer.setAttributes(DomSerializer.java:97) 
    at org.htmlcleaner.DomSerializer.createDOM(DomSerializer.java:37) 

Can anyone help me figure out why this happens?

Sincerely, Zoli

Answer

HtmlCleaner has trouble handling namespaces. Here is an example of an XML namespace declaration that gives it trouble:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="de" lang="de" 
    xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml" 
    itemscope itemtype="http://schema.org/CreativeWork"> 

As you can see, the itemscope attribute is malformed, which causes HtmlCleaner to throw NAMESPACE_ERR.

One way to avoid this problem is to add the following line:

props.setNamespacesAware(false); 

This turns namespace handling off.
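
To see why switching off namespace handling can help, here is a minimal sketch that uses only the JDK's DOM API and does not involve HtmlCleaner at all. The stack trace in the question passes through Element.setAttributeNS, and the DOM specification requires that call to raise NAMESPACE_ERR when, for example, a prefixed attribute name is given a null namespace URI. The class and method names below are hypothetical, and whether HtmlCleaner's DomSerializer hits exactly this case for the markup above is an assumption, not something the thread confirms.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; not part of HtmlCleaner.
class NamespaceErrDemo {

    /** Builds an empty DOM document with a single <html> root element. */
    public static Element newRoot() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element root = doc.createElement("html");
        doc.appendChild(root);
        return root;
    }

    /**
     * Sets an attribute through the namespace-aware setter with a null
     * namespace URI. Returns the DOMException code, or -1 if no exception
     * was thrown.
     */
    public static short setWithNamespaces(String qualifiedName, String value) throws Exception {
        Element root = newRoot();
        try {
            root.setAttributeNS(null, qualifiedName, value);
            return -1;
        } catch (DOMException e) {
            return e.code;
        }
    }

    /**
     * Sets the same attribute through the plain setter, which treats the
     * name as an opaque string and performs no namespace checks.
     */
    public static String setWithoutNamespaces(String name, String value) throws Exception {
        Element root = newRoot();
        root.setAttribute(name, value);
        return root.getAttribute(name);
    }

    public static void main(String[] args) throws Exception {
        short code = setWithNamespaces("xml:lang", "de");
        System.out.println("NAMESPACE_ERR raised: " + (code == DOMException.NAMESPACE_ERR));
        System.out.println("plain setter stored: " + setWithoutNamespaces("xml:lang", "de"));
    }
}
```

The plain setter accepts the same name because it never looks at the prefix. Avoiding the namespace-aware DOM calls is presumably what props.setNamespacesAware(false) achieves; note that in the question's code the line would have to come before the DomSerializer is constructed from props.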