2017-10-16 58 views
0

SchematronParseError: invalid schematron schema (for the ISOSTS schema)

I am trying to validate a document with Schematron, using the schema for the ISOSTS standard.

from lxml import etree
from lxml.isoschematron import Schematron


def validate(filename: str) -> bool:
    schema_filename = '/path/to/ISOSTS_validation.sch'

    # Binary mode lets the XML parser handle each file's declared encoding.
    with open(filename, 'rb') as file, open(schema_filename, 'rb') as schema_file:
        # FIXME: the inline schema below works, but the ISOSTS schema fails.
        # (Uncommenting it also requires `from io import StringIO`.)
        # schema_file = StringIO('''\
        #  <schema xmlns="http://purl.oclc.org/dsdl/schematron" >
        #  <pattern id="sum_equals_100_percent">
        #   <title>Sum equals 100%.</title>
        #   <rule context="Total">
        #   <assert test="sum(//Percent)=100">Sum is not 100%.</assert>
        #   </rule>
        #  </pattern>
        #  </schema>
        # ''')

        sct_doc = etree.parse(schema_file)
        schematron = Schematron(sct_doc)  # <- fails here with SchematronParseError

        doc = etree.parse(file)
        return schematron.validate(doc)


validate('/path/to/feature_doc.xml')

Error message:

File "/var/www/.../venv/lib/python3.5/site-packages/lxml/isoschematron/__init__.py", line 279, in __init__ 
    schematron_schema_valid.error_log) 
lxml.etree.SchematronParseError: invalid schematron schema: <string>:553:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element function has extra content: param 
<string>:560:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element schema, got variable 
<string>:0:0:ERROR:RELAXNGV:RELAXNG_ERR_INTEREXTRA: Extra element function in interleave 
<string>:42:0:ERROR:RELAXNGV:RELAXNG_ERR_CONTENTVALID: Element schema failed to validate content 

How can I fix this?

Answers

1

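I'm not sure how helpful this is, but I don't think the problem is in your code. I think the problem is that lxml does not support XSLT 2.

The schema you are using requires ISO Schematron with the XSLT 2 query binding (2010) [1].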

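Opening the schema in oXygen and removing the queryBinding="xslt2" attribute produces a large number of problems. These include a validation error on line 553 (<xsl:param name="num-cols" as="xs:integer"/>): the 'as' attribute is not allowed on this element. This is the same line on which lxml raises the parse error shown in [2].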

lxml does not implement XSLT 2, and it explicitly states that it only supports the "pure-XSLT-1.0 skeleton implementation" of Schematron (from http://lxml.de/validation.html#id2).
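Since the query binding decides whether lxml can handle a schema at all, it can be checked before constructing the validator. The following is a minimal sketch, not part of lxml: the helper names `get_query_binding` and `is_xslt1_schema` are hypothetical, it uses the standard library's ElementTree only to read the root element, and it assumes that a schema without a queryBinding attribute falls back to the default "xslt" binding.

```python
import io
import xml.etree.ElementTree as ET

SCHEMATRON_NS = "http://purl.oclc.org/dsdl/schematron"


def get_query_binding(schema_source) -> str:
    """Return the queryBinding declared on the root <schema> element, lowercased.

    `schema_source` is a file path or a binary file object.
    """
    root = ET.parse(schema_source).getroot()
    if root.tag != "{%s}schema" % SCHEMATRON_NS:
        raise ValueError("not an ISO Schematron schema: root is %s" % root.tag)
    # Assumption: a missing attribute means the default "xslt" binding.
    return root.get("queryBinding", "xslt").lower()


def is_xslt1_schema(schema_source) -> bool:
    """True if the schema declares (or defaults to) the XSLT 1.0 binding."""
    return get_query_binding(schema_source) == "xslt"


# Demo with an in-memory schema header like the one quoted in [1].
sample = io.BytesIO(
    b'<schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"/>'
)
print(get_query_binding(sample))
```

If `is_xslt1_schema` returns False for the ISOSTS schema, handing it to `lxml.isoschematron.Schematron` is unlikely to work, and a different validator is needed.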

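You may be out of luck trying to get this to work with lxml. As far as I know, there is no XSLT-2-compatible Python XML parser yet (if anyone knows of one, that would be great).

It's a bit of a hack, but you could use subprocess to call an external tool (perhaps crux + libsaxon) to perform the validation. That may be the only solution here.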
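The subprocess approach could look roughly like the sketch below. It deliberately does not name a real validator command line, because the exact invocation depends on the tool you install; `run_external_validator` is a hypothetical helper, and treating exit code 0 as "valid" is an assumption that must be checked against the tool's documentation. The demo uses the Python interpreter itself as a stand-in command.

```python
import subprocess
import sys
from typing import List, Tuple


def run_external_validator(command: List[str], timeout: float = 60.0) -> Tuple[bool, str]:
    """Run an external validator and return (is_valid, combined_output).

    Assumption: the tool signals a valid document with exit code 0.
    """
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout for one report
        universal_newlines=True,    # decode output as text
        timeout=timeout,
        check=False,                # a non-zero exit is a result, not an exception
    )
    return completed.returncode == 0, completed.stdout


if __name__ == "__main__":
    # Stand-in command; replace with your XSLT-2-capable validator's command line.
    ok, report = run_external_validator(
        [sys.executable, "-c", "print('stand-in validator ran')"]
    )
    print(ok, report.strip())
```

Passing the command as a list avoids shell quoting problems with file paths, and the timeout keeps a hung validator from blocking the caller.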

[1] Line 35 of the linked schema: <schema xmlns="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2"

[2] lxml.etree.SchematronParseError: invalid schematron schema: <string>:553:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element function has extra content: param
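The entries in an error log like [2] follow the layout filename:line:column:level:domain:type: message, so they can be split into fields to find the schema lines that lxml rejects. This is a small sketch using only the standard library; `LogEntry` and `parse_error_log` are hypothetical names, and the regular expression assumes file names without spaces.

```python
import re
from typing import List, NamedTuple


class LogEntry(NamedTuple):
    filename: str
    line: int
    column: int
    level: str
    domain: str
    type_name: str
    message: str


# filename:line:column:LEVEL:DOMAIN:TYPE: message
_ENTRY_RE = re.compile(
    r"(?P<filename>\S+?):(?P<line>\d+):(?P<column>\d+):"
    r"(?P<level>[A-Z]+):(?P<domain>\w+):(?P<type_name>\w+): (?P<message>.*)"
)


def parse_error_log(text: str) -> List[LogEntry]:
    """Extract every log entry found in `text`, in order of appearance."""
    entries = []
    for match in _ENTRY_RE.finditer(text):
        fields = match.groupdict()
        fields["line"] = int(fields["line"])
        fields["column"] = int(fields["column"])
        entries.append(LogEntry(**fields))
    return entries


# The error text from the question, including the exception prefix.
SAMPLE = """\
lxml.etree.SchematronParseError: invalid schematron schema: <string>:553:0:ERROR:RELAXNGV:RELAXNG_ERR_EXTRACONTENT: Element function has extra content: param
<string>:560:0:ERROR:RELAXNGV:RELAXNG_ERR_ELEMNAME: Expecting element schema, got variable
<string>:0:0:ERROR:RELAXNGV:RELAXNG_ERR_INTEREXTRA: Extra element function in interleave
<string>:42:0:ERROR:RELAXNGV:RELAXNG_ERR_CONTENTVALID: Element schema failed to validate content
"""

for entry in parse_error_log(SAMPLE):
    print(entry.line, entry.type_name, entry.message)
```

With the line numbers extracted, each one can be looked up in the .sch file to see which construct the XSLT 1.0 based validation does not accept.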

0

Solved by using the XSD schema from the archive together with lxml.etree.XMLSchema:

from lxml import etree


def validate(filename: str) -> bool:
    schema_filename = '/path/to/ISOSTS.xsd'

    # Binary mode lets the XML parser handle each file's declared encoding.
    with open(filename, 'rb') as file, open(schema_filename, 'rb') as schema_file:
        xsd_doc = etree.parse(schema_file)
        xmlschema = etree.XMLSchema(xsd_doc)

        doc = etree.parse(file)
        return xmlschema.validate(doc)