2014-12-05 51 views
2

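Copy complex XML to TSV using XSLT

I have found several partial solutions to my problem (see here and here), but I am having trouble integrating them. I have a set of XML records that I want to convert to tab-separated format. However, not all of the XML records have all of the fields, and some contain multiple instances of a field.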

Two sample XML records:

<?xml version="1.0" encoding="UTF-8" ?> 
<marc:collection xmlns:marc="http://www.loc.gov/MARC21/slim" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd"> 
    <marc:record> 
     <marc:leader>02179 am a 002893u  </marc:leader> 
     <marc:controlfield tag="001">12789</marc:controlfield> 
     <marc:controlfield tag="005">20120521</marc:controlfield> 
     <marc:controlfield tag="007">cuuuu---auuuu</marc:controlfield> 
     <marc:controlfield tag="008">120521s|||| xx  o  0 u ||| |</marc:controlfield> 
     <marc:datafield tag="020" ind1=" " ind2=" "> 
      <marc:subfield code="a">9789089640574</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="100" ind1="1" ind2=" "> 
      <marc:subfield code="a">Rooij van ,Robert</marc:subfield> 
      <marc:subfield code="4">aut</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="245" ind1="1" ind2=" "> 
      <marc:subfield code="a">New Perspectives on Games and Interaction</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="260" ind1=" " ind2=" "> 
      <marc:subfield code="b">Amsterdam University Press</marc:subfield> 
      <marc:subfield code="c">2008</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="300" ind1=" " ind2=" "> 
      <marc:subfield code="a">1 electronic resource (330 p.)</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="520" ind1=" " ind2=" "> 
      <marc:subfield code="a">This volume is a collection of papers ...</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="650" ind1=" " ind2="0"> 
      <marc:subfield code="a">Mathematics</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="650" ind1=" " ind2="0"> 
      <marc:subfield code="a">Philosophy (General)</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="650" ind1=" " ind2="0"> 
      <marc:subfield code="a">Economic theory. Demography</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="653" ind1=" " ind2=" "> 
      <marc:subfield code="a">Economics</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="653" ind1=" " ind2=" "> 
      <marc:subfield code="a">Philosophy</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="653" ind1=" " ind2=" "> 
      <marc:subfield code="a">Mathematics</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="653" ind1=" " ind2=" "> 
      <marc:subfield code="a">Economie</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="653" ind1=" " ind2=" "> 
      <marc:subfield code="a">Filosofie</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="653" ind1=" " ind2=" "> 
      <marc:subfield code="a">Wiskunde</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="700" ind1="1" ind2=" "> 
      <marc:subfield code="a">Apt ,Krzysztof</marc:subfield> 
      <marc:subfield code="4">aut</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="856" ind1="4" ind2="0"> 
      <marc:subfield code="u">http://www.doabooks.org/doab?func=fulltext&amp;rid=12789</marc:subfield> 
      <marc:subfield code="z">Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial (CC by-nc)</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="856" ind1="4" ind2="0"> 
      <marc:subfield code="u">http://www.oapen.org/download?type=document&amp;docid=340074</marc:subfield> 
     </marc:datafield> 
    </marc:record> 
    <marc:record> 
     <marc:leader>01452 am a 001933u  </marc:leader> 
     <marc:controlfield tag="001">15497</marc:controlfield> 
     <marc:controlfield tag="005">20140217</marc:controlfield> 
     <marc:controlfield tag="007">cuuuu---auuuu</marc:controlfield> 
     <marc:controlfield tag="008">140217s|||| xx  o  0 u ||| |</marc:controlfield> 
     <marc:datafield tag="020" ind1=" " ind2=" "> 
      <marc:subfield code="a">9788867050673</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="100" ind1="1" ind2=" "> 
      <marc:subfield code="a">Emanuele Haus</marc:subfield> 
      <marc:subfield code="4">aut</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="245" ind1="1" ind2=" "> 
      <marc:subfield code="a">Dynamics of an elastic satellite with internal friction.</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="260" ind1=" " ind2=" "> 
      <marc:subfield code="b">Ledizioni - LediPublishing</marc:subfield> 
      <marc:subfield code="c">2013</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="300" ind1=" " ind2=" "> 
      <marc:subfield code="a">1 electronic resource (p.)</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="520" ind1=" " ind2=" "> 
      <marc:subfield code="a">n this thesis, we study the dynamics...</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="546" ind1=" " ind2=" "> 
      <marc:subfield code="a">english</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="650" ind1=" " ind2="0"> 
      <marc:subfield code="a">Mathematics</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="856" ind1="4" ind2="0"> 
      <marc:subfield code="u">http://www.doabooks.org/doab?func=fulltext&amp;rid=15497</marc:subfield> 
      <marc:subfield code="z">Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial Share Alike (CC by-nc-sa)</marc:subfield> 
     </marc:datafield> 
     <marc:datafield tag="856" ind1="4" ind2="0"> 
      <marc:subfield code="u">http://www.ledizioni.it/stag/wp-content/uploads/2014/02/tesi_haus.pdf</marc:subfield> 
     </marc:datafield> 
    </marc:record> 
</marc:collection> 

I have been trying to adapt the XSLT from this previous answer, so far with little luck:

<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xpath-default-namespace="http://www.loc.gov/MARC21/slim"> 
    <xsl:output method="text"/> 
    <xsl:variable name="delimiter" select="'&#09;'"/> 

    <xsl:strip-space elements="*"/> 
    <xsl:output method="text"/> 

    <xsl:key name="field" 
     match="/collection/record/datafield/subfield" 
     use="concat(../@tag,@code)"/> 

    <!-- variable containing the first occurrence of each field --> 
    <xsl:variable name="allFields" 
     select="/collection/record/datafield/subfield 
       [generate-id() 
       =generate-id(key('field', 
            concat(../@tag,@code))[1])]" /> 

    <xsl:template match="/"> 

     <xsl:for-each select="$allFields"> 
      <xsl:sort select="substring(concat(../@tag,@code),1,3)" 
         data-type="number"/> 
      <xsl:value-of select="concat(../@tag,@code)" /> 
      <xsl:if test="position() &lt; last()"> 
       <xsl:value-of select="$delimiter" /> 
      </xsl:if> 
     </xsl:for-each> 
     <xsl:text>&#10;</xsl:text> 
     <xsl:apply-templates select="*/*" /> 
    </xsl:template> 

    <xsl:template match="*"> 
     <xsl:variable name="this" select="." /> 

     <xsl:for-each select="$allFields"> 
      <xsl:sort 
       select="substring(concat(../@tag,@code),1,3)" 
       data-type="number"/> 
      <xsl:value-of 
       select="$this/*[@code = current()/@code]" /> 
      <xsl:if test="position() &lt; last()"> 
       <xsl:value-of select="$delimiter" /> 
      </xsl:if> 
     </xsl:for-each> 
     <xsl:text>&#10;</xsl:text> 
    </xsl:template> 
</xsl:stylesheet> 

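In the output I want to achieve, the header would consist of leader followed by the unique values of @tag (concatenated with subfield/@code for fields that have subfields), sorted in ascending order by tag: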

leader 001 005 007 008 020a 100a 1004 245a 260b 260c 300a 520a 546a 650a 653a 700a 7004 856u 856z 

If a record has multiple values for a field/subfield combination, I want to concatenate them, for example:

653a 
Economics|Philosophy|Mathematics 

However, if a record lacks a particular field, I just want to output a tab, to keep the columns aligned.

Full sample TSV output:

leader 001 005 007 008 020a 100a 1004 245a 260b 260c 300a 520a 546a 650a 653a 700a 7004 856u 856z           
02179 am a 002893u   12789 20120521 cuuuu---auuuu 120521s|||| xx  o  0 u ||| | 9789089640574 Rooij van ,Robert aut New Perspectives on Games and Interaction Amsterdam University Press 2008 1 electronic resource (330 p.) This volume is a collection of papers  Mathematics|Philosophy (General)|Economic theory. Demography Economics|Philosophy|Mathematics|Economie|Filosofie|Wiskunde Apt ,Krzysztof< aut http://www.doabooks.org/doab?func=fulltext&amp;rid=12789|http://www.oapen.org/download?type=document&amp;docid=340074 Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial (CC by-nc)          
01452 am a 001933u   15497 20140217 cuuuu---auuuu 140217s|||| xx  o  0 u ||| | 9788867050673 Emanuele Haus aut Dynamics of an elastic satellite with internal friction. Ledizioni - LediPublishing 2013 1 electronic resource (p.) In this thesis, we study the dynamics of an elastic body english Mathematics    http://www.doabooks.org/doab?func=fulltext&amp;rid=15497|http://www.ledizioni.it/stag/wp-content/uploads/2014/02/tesi_haus.pdf Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial Share Alike (CC by-nc-sa)           
+0

Can you post the exact, complete output you expect from your (hopefully representative) example? – 2014-12-05 16:25:45

Answers

2

I suggest you try it this way:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:marc="http://www.loc.gov/MARC21/slim" 
exclude-result-prefixes="marc"> 
<xsl:output method="text" encoding="UTF-8"/> 

<xsl:variable name="fields"> 
    <xsl:for-each-group select="/marc:collection/marc:record/marc:datafield" group-by="@tag"> 
     <xsl:sort select="@tag"/> 
      <xsl:for-each select="marc:subfield"> 
       <xsl:sort/> 
       <field tag="{current-grouping-key()}" code="{@code}">a</field> 
      </xsl:for-each> 
    </xsl:for-each-group> 
</xsl:variable> 

<xsl:template match="/"> 
    <!-- header --> 
    <xsl:for-each select="$fields/field"> 
     <xsl:value-of select="@tag"/> 
     <xsl:value-of select="@code"/> 
     <xsl:if test="position()!=last()"> 
      <xsl:text>&#9;</xsl:text> 
     </xsl:if> 
    </xsl:for-each> 
    <xsl:text>&#10;</xsl:text> 
    <!-- data --> 
    <xsl:for-each select="marc:collection/marc:record"> 
     <xsl:variable name="current-record" select="." /> 
     <xsl:for-each select="$fields/field"> 
      <xsl:value-of select="$current-record/marc:datafield[@tag=current()/@tag]/marc:subfield[@code=current()/@code]" separator="|"/> 
      <xsl:if test="position()!=last()"> 
       <xsl:text>&#9;</xsl:text> 
      </xsl:if> 
     </xsl:for-each> 
     <xsl:if test="position()!=last()"> 
      <xsl:text>&#10;</xsl:text> 
     </xsl:if> 
    </xsl:for-each> 
</xsl:template> 

</xsl:stylesheet> 

Result, when applied to your example input:

020a 100a 1004 245a 260c 260b 300a 520a 546a 650a 653a 700a 7004 856z 856u 
9789089640574 Rooij van ,Robert aut New Perspectives on Games and Interaction 2008 Amsterdam University Press 1 electronic resource (330 p.) This volume is a collection of papers ...  Mathematics|Philosophy (General)|Economic theory. Demography Economics|Philosophy|Mathematics|Economie|Filosofie|Wiskunde Apt ,Krzysztof aut Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial (CC by-nc) http://www.doabooks.org/doab?func=fulltext&rid=12789|http://www.oapen.org/download?type=document&docid=340074 
9788867050673 Emanuele Haus aut Dynamics of an elastic satellite with internal friction. 2013 Ledizioni - LediPublishing 1 electronic resource (p.) n this thesis, we study the dynamics... english Mathematics    Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial Share Alike (CC by-nc-sa) http://www.doabooks.org/doab?func=fulltext&rid=15497|http://www.ledizioni.it/stag/wp-content/uploads/2014/02/tesi_haus.pdf 

Note: I can't work out what role the "leader" plays in either the input or the output.
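The stylesheet above builds its column list from marc:datafield only, so the leader and the control fields never reach the output. A minimal sketch of one way to include them, assuming an XSLT 2.0 processor: the column keys are collected with xsl:for-each-group over control fields and subfields together, then sorted by the three-character tag. The variable names $records, $columns and $rec are illustrative and do not come from the answer.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:marc="http://www.loc.gov/MARC21/slim"
    exclude-result-prefixes="xs marc">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:variable name="records" select="/marc:collection/marc:record"/>

  <!-- One key per column: the tag for control fields, tag + code for subfields.
       Groups come out in order of first appearance; the stable sort on the tag
       keeps that order within a tag (for example 100a before 1004). -->
  <xsl:variable name="columns" as="xs:string*">
    <xsl:for-each-group
        select="$records/(marc:controlfield | marc:datafield/marc:subfield)"
        group-by="concat(@tag, ../@tag, @code)">
      <xsl:sort select="substring(current-grouping-key(), 1, 3)"/>
      <xsl:sequence select="current-grouping-key()"/>
    </xsl:for-each-group>
  </xsl:variable>

  <xsl:template match="/">
    <!-- header row -->
    <xsl:value-of select="('leader', $columns)" separator="&#9;"/>
    <xsl:text>&#10;</xsl:text>
    <!-- one row per record; an absent column yields an empty string,
         so the surrounding tabs are still written -->
    <xsl:for-each select="$records">
      <xsl:variable name="rec" select="."/>
      <xsl:value-of separator="&#9;" select="
          string($rec/marc:leader),
          for $c in $columns return string-join(
              $rec/(marc:controlfield | marc:datafield/marc:subfield)
                  [concat(@tag, ../@tag, @code) = $c]/string(), '|')"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because the header and every data row are driven by the same $columns sequence, the columns cannot drift apart, whichever fields a given record happens to contain.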

2

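You say "if a record lacks a particular field". From this I infer that you must have a list of fields you want to export. (All of MARC? Every theoretically possible field from 000 to 999? Only you can say, and you haven't said.) If you don't have a list of the fields you want to export, then your problem statement is self-contradictory, and you need to understand the problem better.

Let's say, for example, that you want to export the fields listed in the variable $fields.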

<xsl:variable name="fields" as="xs:string*" 
    select="tokenize('001 005 007 008 020 
        100 245 260 300 
        520 546 650 653 700 
        856', '\s+')"/> 

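Your current problem is that the shape of your output is determined by the fields present in the input, which is what many XSLT programmers call a "push" stylesheet. You want the output to be shaped by the list of fields in $fields, not by the input: you want what those XSLT programmers call a "pull" stylesheet. Pull stylesheets are common when we are preparing data for non-XML systems (such as spreadsheets) that do not cope well with variations in structure; they are also common among procedural programmers who know no other way to think about a problem. Both facts lead some XSLT programmers to look down their noses at pull stylesheets, but if you have described your problem correctly, a pull stylesheet is what you need.

From what has been said so far, you should be able to see that your problem is that the template for / builds the output by processing the input, using <xsl:apply-templates select="*/*" />. If the input has no 546 field, there is no opportunity to insert a tab where it would have appeared, at least not without a lot of unnecessary effort.

You want to replace the current apply-templates over the grandchildren with a construct that iterates over the field numbers in $fields and emits, for each field number, a tab and any other appropriate information, where the other appropriate information depends on whether a field with that number is present in the input. In XSLT 3.0 you can apply templates to a sequence of values, so you could write <xsl:apply-templates select="$fields"/>, but in 2.0 that is not an option. The options available in 2.0 include: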

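  • Represent $fields not as a sequence of strings but as a sequence of elements; call <xsl:apply-templates select="$fields"/> to iterate over the desired field numbers. You will need to remember to pass a node from the input document as a parameter (the root is a good choice), so that you can get back to the input from within the template for the field numbers.

  • Call a named template with $fields as a parameter; in the named template, take the first field number from the list, process it, and then call the same named template recursively with the rest of the list. If there is no first field number, the sequence of fields is empty and you are done.

  • Write a recursive function that works the same way as the named template just described.

  • Write a function that handles one field number for one MARC record, and call it from an XPath for expression: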

    <xsl:template match="marc:record"> 
        ... 
        <xsl:sequence select="for $fn in $fields 
        return my:one-field-one-record($fn, .) 
        "/> 
        ... 
    </xsl:template> 
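The answer leaves the body of my:one-field-one-record open. A minimal sketch of the last option, assuming an XSLT 2.0 processor: the my namespace URI is a placeholder, and the choice to join every value of a field (all of its subfields, across repeated fields) with '|' is an assumption made here, not something the answer specifies.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:marc="http://www.loc.gov/MARC21/slim"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="xs marc my">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- The fixed list of field numbers to export. -->
  <xsl:variable name="fields" as="xs:string*"
      select="tokenize('001 005 007 008 020 100 245 260 300 520 546 650 653 700 856', '\s+')"/>

  <!-- Hypothetical body: one cell for field $fn of record $rec.
       Returns the empty string when the record has no such field. -->
  <xsl:function name="my:one-field-one-record" as="xs:string">
    <xsl:param name="fn" as="xs:string"/>
    <xsl:param name="rec" as="element(marc:record)"/>
    <xsl:sequence select="string-join(
        $rec/(marc:controlfield[@tag = $fn]
              | marc:datafield[@tag = $fn]/marc:subfield)/string(), '|')"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- header row, driven by $fields -->
    <xsl:value-of select="$fields" separator="&#9;"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="marc:collection/marc:record"/>
  </xsl:template>

  <xsl:template match="marc:record">
    <!-- data row, driven by the same $fields, so a missing field still gets its tab -->
    <xsl:value-of separator="&#9;"
        select="for $fn in $fields return my:one-field-one-record($fn, .)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Since the list holds field numbers rather than field/subfield pairs, a cell such as 100 would contain both the name and the relator code; splitting columns by subfield would need tag-plus-code keys in $fields and a matching predicate in the function.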
    
+0

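"*If you don't have a list of the fields you want to export*": I think the OP made it quite clear that they want all the unique fields used by any record? – 2014-12-05 16:30:04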

+0

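@C.M.Sperberg-McQueen, thank you. The list of fields will be the set of unique field/subfield pairs present in a given file of MARC records. I don't have a predefined list. If one record in the file has a 653 field but another record lacks it, then that field is "missing" from the second record in the context of the current file of MARC records. – tat 2014-12-05 16:37:40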

+0

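@michael.hor257k, you seem to have understood the OP better than I did. Even after the clarification, I am unable to find that information in the problem description. – 2014-12-06 02:15:20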

2

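This is possible in XSLT 1.0 as well.

The following solution is built around a document-wide list of unique tags and iterates over that list for each record. This way, a delimiter is output even if a particular tag is not present in a record.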

<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:marc="http://www.loc.gov/MARC21/slim" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
> 
    <xsl:output method="text" encoding="Windows-1252" /> 

    <xsl:param name="hDelim" select="'&#x9;'" /><!-- horizontal (column) delimiter --> 
    <xsl:param name="vDelim" select="'&#xA;'" /><!-- vertical (row) delimiter --> 
    <xsl:param name="sDelim" select="'|'" /><!-- subfield delimiter --> 

    <!-- group tags by @tag + @code --> 
    <xsl:key name="kAllTags" match="marc:controlfield | marc:subfield" use=" 
    concat(@tag, ../@tag, @code) 
    " /> 
    <!-- group tags by record ID + @tag + @code --> 
    <xsl:key name="kRecordTags" match="marc:controlfield | marc:subfield" use=" 
    concat(generate-id(ancestor::marc:record), ':', @tag|../@tag, @code) 
    " /> 
    <!-- build a list of unique tags to iterate over --> 
    <xsl:variable name="uniqueTags" select=" 
    (//marc:controlfield | //marc:subfield)[ 
     generate-id() = generate-id(key('kAllTags', concat(@tag | ../@tag, @code))) 
    ] 
    " /> 

    <xsl:template match="marc:collection"> 
    <!-- write header line --> 
    <xsl:text>leader</xsl:text> 
    <xsl:value-of select="$hDelim" /> 

    <xsl:apply-templates select="$uniqueTags" mode="head"> 
     <xsl:sort select="concat(@tag|../@tag, @code)" /> 
    </xsl:apply-templates> 
    <xsl:value-of select="$vDelim" /> 

    <!-- write individual records --> 
    <xsl:apply-templates select="marc:record" /> 
    </xsl:template> 

    <xsl:template match="marc:record"> 
    <xsl:variable name="recordId" select="generate-id()" /> 

    <xsl:value-of select="marc:leader" /> 
    <xsl:value-of select="$hDelim" /> 

    <!-- for each unique tag, find the fields that have that tag on this record --> 
    <xsl:for-each select="$uniqueTags"> 
     <xsl:variable name="tagKey" select="concat($recordId, ':', @tag|../@tag, @code)" /> 
     <xsl:apply-templates select="key('kRecordTags', $tagKey)" mode="data" /> 
     <xsl:if test="position() != last()"><xsl:value-of select="$hDelim" /></xsl:if> 
    </xsl:for-each> 
    <xsl:if test="position() != last()"><xsl:value-of select="$vDelim" /></xsl:if> 
    </xsl:template> 

    <xsl:template match="marc:controlfield | marc:subfield" mode="head"> 
    <xsl:value-of select="concat(@tag|../@tag, @code)" /> 
    <xsl:if test="position() != last()"><xsl:value-of select="$hDelim" /></xsl:if> 
    </xsl:template> 

    <xsl:template match="marc:controlfield | marc:subfield" mode="data"> 
    <xsl:value-of select="normalize-space()" /> 
    <xsl:if test="position() != last()"><xsl:value-of select="$sDelim" /></xsl:if> 
    </xsl:template> 
</xsl:stylesheet> 

With the input data, this template produces:

 
leader 001 005 007 008 020a 1004 100a 245a 260b 260c 300a 520a 546a 650a 653a 7004 700a 856u 856z 
02179 am a 002893u   12789 20120521 cuuuu---auuuu 120521s|||| xx o 0 u ||| | 9789089640574 Rooij van ,Robert aut New Perspectives on Games and Interaction Amsterdam University Press 2008 1 electronic resource (330 p.) This volume is a collection of papers ... Mathematics|Philosophy (General)|Economic theory. Demography Economics|Philosophy|Mathematics|Economie|Filosofie|Wiskunde Apt ,Krzysztof aut http://www.doabooks.org/doab?func=fulltext&rid=12789|http://www.oapen.org/download?type=document&docid=340074 Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial (CC by-nc) 
01452 am a 001933u   15497 20140217 cuuuu---auuuu 140217s|||| xx o 0 u ||| | 9788867050673 Emanuele Haus aut Dynamics of an elastic satellite with internal friction. Ledizioni - LediPublishing 2013 1 electronic resource (p.) n this thesis, we study the dynamics... Mathematics    http://www.doabooks.org/doab?func=fulltext&rid=15497|http://www.ledizioni.it/stag/wp-content/uploads/2014/02/tesi_haus.pdf Description of rights in Directory of Open Access Books (DOAB): Attribution Non-commercial Share Alike (CC by-nc-sa) english 
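One thing to watch in the stylesheet above: the header loop sorts $uniqueTags, whereas the per-record loop visits them in document order, so header and data columns can fall out of step (in the output shown, "english" ends up in the last column rather than under 546a). A minimal sketch of one way to keep them aligned, assuming XSLT 1.0 and reusing the answer's key and variable names: both loops use the same xsl:sort key. This is a condensed variant written for illustration, not the answer's own code.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- all control fields and subfields, by tag + code -->
  <xsl:key name="kAllTags" match="marc:controlfield | marc:subfield"
           use="concat(@tag, ../@tag, @code)"/>
  <!-- the same, scoped to one record -->
  <xsl:key name="kRecordTags" match="marc:controlfield | marc:subfield"
           use="concat(generate-id(ancestor::marc:record), ':', @tag, ../@tag, @code)"/>

  <!-- first occurrence of every tag + code combination (Muenchian grouping) -->
  <xsl:variable name="uniqueTags" select="
      (//marc:controlfield | //marc:subfield)[
        generate-id() = generate-id(key('kAllTags', concat(@tag, ../@tag, @code))[1])]"/>

  <xsl:template match="/marc:collection">
    <xsl:text>leader</xsl:text>
    <xsl:for-each select="$uniqueTags">
      <xsl:sort select="concat(@tag, ../@tag, @code)"/>
      <xsl:value-of select="concat('&#9;', @tag, ../@tag, @code)"/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="marc:record"/>
  </xsl:template>

  <xsl:template match="marc:record">
    <xsl:variable name="recordId" select="generate-id()"/>
    <xsl:value-of select="marc:leader"/>
    <xsl:for-each select="$uniqueTags">
      <!-- same sort key as the header loop, so the columns line up -->
      <xsl:sort select="concat(@tag, ../@tag, @code)"/>
      <xsl:text>&#9;</xsl:text>
      <xsl:for-each select="key('kRecordTags',
                                concat($recordId, ':', @tag, ../@tag, @code))">
        <xsl:if test="position() != 1">|</xsl:if>
        <xsl:value-of select="normalize-space()"/>
      </xsl:for-each>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The text sort decides the order of keys that share a tag, so it may not match the order asked for in the question (100a before 1004); what matters for alignment is only that the header and the rows share one ordering.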
+0

"*This is possible in XSLT 1.0 as well.*" Of course it is, but the question is tagged XSLT 2.0? – 2014-12-05 19:08:02

+4

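It is, but I didn't want to throw the answer away once I had started on it. Besides, XSLT 2.0 processors can run 1.0 stylesheets, so it is not entirely off-topic. :) – Tomalak 2014-12-05 19:13:15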