2012-03-20

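I need to validate a large XML file against an XSD schema while using only limited memory. Every piece of code I have found so far gives me out-of-memory errors. How can I validate a large XML file against an XSD schema?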

Approaches I have tried:

    // method 1: SAX parser with the schema attached
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(false);     // DTD validation off; XSD validation comes from setSchema()
    factory.setNamespaceAware(true);

    SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
    factory.setSchema(schemaFactory.newSchema(new Source[] {new StreamSource(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile())}));
    SAXParser parser = factory.newSAXParser();
    XMLReader reader = parser.getXMLReader();
    reader.setErrorHandler(new SimpleErrorHandler());
    reader.parse(new InputSource(inputXml));   // no ContentHandler is set, so content events are discarded
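`SimpleErrorHandler` is not shown in the question. A minimal sketch of what such a handler might look like, assuming it should collect schema violations instead of stopping at the first one. The class body and the `validate` helper (which takes the XSD and the XML as strings, purely for illustration) are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

// Hypothetical handler: records schema violations and lets parsing continue.
public class SimpleErrorHandler implements ErrorHandler {
    private final List<String> errors = new ArrayList<>();

    @Override
    public void warning(SAXParseException e) {
        // ignored in this sketch
    }

    @Override
    public void error(SAXParseException e) {
        // recoverable errors, including XSD violations: keep a short message only
        errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
    }

    @Override
    public void fatalError(SAXParseException e) throws SAXParseException {
        // the document is not well-formed, so parsing cannot continue
        throw e;
    }

    public List<String> getErrors() {
        return errors;
    }

    // Hypothetical helper: the same setup as method 1, with XSD and XML given as text.
    public static List<String> validate(String xsd, String xml) throws Exception {
        SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false);   // DTD validation off; XSD validation via setSchema
        factory.setNamespaceAware(true);
        factory.setSchema(schema);

        XMLReader reader = factory.newSAXParser().getXMLReader();
        SimpleErrorHandler handler = new SimpleErrorHandler();
        reader.setErrorHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.getErrors();
    }
}
```

Keeping only short messages means the handler's own memory grows with the number of errors, not with the size of the document.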
    // method 2: Woodstox (Stax2 API) validating stream reader
    XMLValidationSchemaFactory sf = XMLValidationSchemaFactory.newInstance(XMLValidationSchema.SCHEMA_ID_W3C_SCHEMA);
    XMLValidationSchema vs = sf.createSchema(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd"));
    XMLStreamReader2 sr = (XMLStreamReader2) XMLInputFactory2.newInstance().createXMLStreamReader(new FileInputStream(inputXml));
    sr.validateAgainst(vs);   // validation happens while events are pulled
    try {
        while (sr.hasNext()) {
            sr.next();        // just drain the events; nothing is kept
        }
        System.out.println("Validated ok!");
    } catch (XMLValidationException ve) {
        System.err.println("Validation problem: " + ve);
        isValid = false;
    } finally {
        sr.close();
    }

    // method 3: javax.xml.validation.Validator on a StreamSource
    SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
    String fileName = Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile();

    Schema schema = factory.newSchema(new File(fileName));
    Validator validator = schema.newValidator();

    // create a source from the XML file
    StreamSource source = new StreamSource(new File(inputXml));

    // validate the input; a SAXException is thrown on the first violation
    validator.validate(source);
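Method 3 is already a streaming approach: `Validator.validate` with a `StreamSource` and no result does not build a document tree. A self-contained sketch of the same approach, wrapped in hypothetical helpers `loadSchema` and `isValid`. It loads the XSD through the resource `URL` instead of `new File(url.getFile())`, which breaks when the path contains URL-encoded characters (such as spaces) or when the XSD is packaged inside a jar:

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class FileValidation {

    // Hypothetical helper: compile an XSD found on the classpath,
    // e.g. loadSchema("xmlresource/XSD_final2.xsd").
    public static Schema loadSchema(String resourceName) throws SAXException {
        URL url = Thread.currentThread().getContextClassLoader().getResource(resourceName);
        if (url == null) {
            throw new IllegalArgumentException("XSD not found on classpath: " + resourceName);
        }
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        return factory.newSchema(url);   // the URL works for files and jar entries alike
    }

    // Hypothetical helper: stream the file through a Validator without building a tree.
    // With no ErrorHandler set, the first violation is thrown as a SAXException.
    public static boolean isValid(Schema schema, File xmlFile) throws IOException {
        Validator validator = schema.newValidator();   // Validators are not thread-safe; one per call
        try {
            validator.validate(new StreamSource(xmlFile));
            return true;
        } catch (SAXException e) {
            System.err.println("Validation problem: " + e.getMessage());
            return false;
        }
    }
}
```

The compiled `Schema` is thread-safe and can be reused for every file; only the `Validator` needs to be created per validation.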

Every time, I get an OutOfMemoryError.

EDIT

With XOM:

    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setValidating(false);
    factory.setNamespaceAware(true);

    SchemaFactory schemaFactory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
    factory.setSchema(schemaFactory.newSchema(new Source[] {new StreamSource(Thread.currentThread().getContextClassLoader().getResource("xmlresource/XSD_final2.xsd").getFile())}));
    SAXParser parser = factory.newSAXParser();
    XMLReader reader = parser.getXMLReader();
    reader.setErrorHandler(new SimpleErrorHandler());

    // XOM's Builder uses the validating reader, but it also builds the whole document tree in memory
    Builder builder = new Builder(reader);
    builder.build(new FileInputStream(new File(inputXml)));
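Building a XOM `Document` keeps the entire document in memory, on top of whatever the validator itself buffers. If only a valid/invalid answer is needed, the same schema-aware SAX parser can be driven by a handler that retains nothing. A sketch, assuming the `Schema` has already been compiled (`TreelessValidation` and `isValid` are hypothetical names). Note that `DefaultHandler.error()` does nothing by default, so it must be overridden or schema violations are silently ignored:

```java
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.validation.Schema;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class TreelessValidation {

    // Ignores all content events (nothing is retained) but fails on schema errors.
    static final class FailFastHandler extends DefaultHandler {
        @Override
        public void error(SAXParseException e) throws SAXException {
            // DefaultHandler.error() is a no-op; without this override
            // XSD violations would be silently ignored.
            throw e;
        }
        // fatalError() already throws in DefaultHandler.
    }

    // Hypothetical helper: true if the input is well-formed and valid against the schema.
    public static boolean isValid(Schema schema, InputSource input)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema);
        SAXParser parser = factory.newSAXParser();   // configuration errors propagate
        try {
            parser.parse(input, new FailFastHandler());
            return true;
        } catch (SAXException e) {
            return false;   // schema violation or malformed XML
        }
    }
}
```

This removes the tree from the picture, but the validator's own buffering of element text still applies.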

Memory usage is still very high: with a 15 MB XML file and a 250 MB heap, I get this stack trace:

    Exception in thread "AWT-EventQueue-0" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    at java.lang.StringBuffer.append(StringBuffer.java:322)
    at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.handleCharacters(XMLSchemaValidator.java:1574)
    at com.sun.org.apache.xerces.internal.impl.xs.XMLSchemaValidator.characters(XMLSchemaValidator.java:789)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:441)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
    at nu.xom.Builder.build(Unknown Source)
    at nu.xom.Builder.build(Unknown Source)

EDIT: My XML contains large base64 strings.
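The stack trace points at `XMLSchemaValidator.handleCharacters` appending to a `StringBuffer`. For an element with simple content (such as an `xs:base64Binary` value), Xerces appears to collect the element's whole text in one buffer so it can check the value against its type. A rough model of what that costs, assuming the JDK 7/8 growth rule of `AbstractStringBuilder` (new capacity = max(2 × old + 2, needed)), a default-capacity buffer, and a hypothetical scanner chunk size of 8192 chars. It is a back-of-the-envelope model, not Xerces code:

```java
// Rough model (an assumption, not Xerces code) of how a StringBuffer grows
// while one large text node is appended to it chunk by chunk.
public class BufferGrowthEstimate {

    // JDK 7/8 rule: new capacity = max(2 * old + 2, needed).
    // Returns the most chars held at once, counting the moment inside
    // Arrays.copyOf when the old and the new array both exist.
    public static long peakChars(long textLength, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        long capacity = 16;          // capacity of new StringBuffer()
        long length = 0;
        long peak = capacity;
        while (length < textLength) {
            long needed = Math.min(length + chunkSize, textLength);
            if (needed > capacity) {
                long newCapacity = Math.max(2 * capacity + 2, needed);
                peak = Math.max(peak, capacity + newCapacity);   // old + new array during the copy
                capacity = newCapacity;
            }
            length = needed;
        }
        return peak;
    }

    public static void main(String[] args) {
        long n = 15_000_000L;                 // hypothetical: one base64 value filling most of a 15 MB file
        long chars = peakChars(n, 8192);      // 8192: assumed scanner chunk size
        System.out.printf("peak about %d MB at 2 bytes per char%n", chars * 2 / 1_000_000);
    }
}
```

Under these assumptions the buffer alone peaks at roughly 1.5 to 3 times the text length in chars, i.e. 3 to 6 bytes per base64 character with 2-byte chars, before counting any copy of the text kept elsewhere (JDK 9+ stores Latin-1 text at one byte per char, which roughly halves this).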

Answers


Have a look at this article by Marco Tedone on XML unmarshalling (see here). Based on his conclusions, I would suggest StAX, which has low memory consumption:

    // schema: a compiled javax.xml.validation.Schema; fileInputStream: the XML file
    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(fileInputStream);
    Validator validator = schema.newValidator();
    validator.validate(new StAXSource(xmlStreamReader));

Thanks for your response. This still uses Xerces, so I still get an OutOfMemoryError with `-Xmx250m`. So far, Woodstox has worked best for me. – bunnyjesse112 2012-03-21 07:53:40
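A self-contained version of the answer's StAX snippet, with the stream reader closed afterwards (`StaxValidation` and `isValid` are hypothetical names). As the comment notes, the `Validator` is still Xerces underneath, so this only swaps the parser front-end; the per-element text buffering seen in the stack trace still applies:

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.stax.StAXSource;
import javax.xml.validation.Schema;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class StaxValidation {

    // Hypothetical helper: feed a StAX pull parser into javax.xml.validation.
    public static boolean isValid(Schema schema, InputStream in)
            throws XMLStreamException, IOException {
        XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
        XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(in);
        try {
            Validator validator = schema.newValidator();
            // StAXSource requires the reader to be at START_DOCUMENT or START_ELEMENT,
            // which a freshly created reader is.
            validator.validate(new StAXSource(xmlStreamReader));
            return true;
        } catch (SAXException e) {
            return false;
        } finally {
            xmlStreamReader.close();   // releases the reader; the caller still closes 'in'
        }
    }
}
```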


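It is possible that the memory is being used by the schema rather than by the source document. You haven't said anything about the schema. Some schemas can use a very large amount of memory, for example if the content models contain large finite values of minOccurs or maxOccurs. At what point does the out-of-memory error occur?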


Thanks for the reply. The XSD has a number of minOccurs/maxOccurs values, but it isn't complex. My XML contains base64 strings, and I see the OutOfMemoryError in `AbstractStringBuilder`. – bunnyjesse112 2012-03-20 18:16:13
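One way to check the answer's suggestion that the schema, not the document, is what consumes the memory is to compile the schema on its own and compare the used heap before and after. A rough diagnostic sketch (`SchemaMemoryProbe` is a hypothetical name); `System.gc()` is only a hint to the JVM, so the reported number is approximate:

```java
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical diagnostic: roughly how much heap does compiling the schema alone take?
public class SchemaMemoryProbe {

    static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static Schema compileAndReport(Source xsd) throws SAXException {
        System.gc();                          // only a hint; results are approximate
        long before = usedHeap();
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(xsd);
        System.gc();
        long after = usedHeap();
        // 'schema' is still referenced (it is returned below), so it is counted in 'after'.
        System.out.printf("heap change after compiling schema: about %d KB%n", (after - before) / 1024);
        return schema;
    }
}
```

If the reported change is small compared with the 250 MB heap, the memory is going into validating the document (for example, the base64 text buffering) rather than into the compiled schema.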