2016-03-02

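XQuery tumbling window: how to match the first occurrence of a grouping key?

I have a CSV file from an SQL dump that I'm working with in BaseX 8.4. The CSV header contains a flattened representation of the SQL table structure.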

The CSV, with its header and first row:

country_id,country_code,country_name,publisher_id,publisher_name,country_id,year_began,year_ended,series_id,series_name,sort_name,publisher_id
2,us,United States,78,Harvard University Press,2,1950,NULL,15,A New Series,New Series,78 

The BaseX CSV parser produces the following XML representation for it:

<csv>
    <record>
        <country_id>2</country_id>
        <country_code>us</country_code>
        <country_name>United States</country_name>
        <publisher_id>78</publisher_id>
        <publisher_name>Harvard University Press</publisher_name>
        <country_id>2</country_id>
        <year_began>1950</year_began>
        <year_ended>NULL</year_ended>
        <series_id>15</series_id>
        <series_name>A New Series</series_name>
        <sort_name>New Series</sort_name>
        <publisher_id>78</publisher_id>
    </record>
</csv>

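From the original data, I know that each table starts with its unique ID column, but the same ID names also reappear in other tables as foreign keys.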

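I want to create windows/groups that start at the first occurrence of each table's unique ID (ignoring every later occurrence), so that the original table structure is reconstructed. What I have so far doesn't work, because it matches every occurrence of an ID, not just the first one: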

Output:

<tables>
    <table id_name="country_id">
        <country_id>2</country_id>
        <country_code>us</country_code>
        <country_name>United States</country_name>
    </table>
    <table id_name="publisher_id">
        <publisher_id>78</publisher_id>
        <publisher_name>Harvard University Press</publisher_name>
    </table>
    <table id_name="country_id">
        <country_id>2</country_id>
        <year_began>1950</year_began>
        <year_ended>NULL</year_ended>
    </table>
    <table id_name="series_id">
        <series_id>15</series_id>
        <series_name>A New Series</series_name>
        <sort_name>New Series</sort_name>
    </table>
    <table id_name="publisher_id">
        <publisher_id>78</publisher_id>
    </table>
</tables>

Desired output:

<tables>
    <table id_name="country_id">
        <country_id>2</country_id>
        <country_code>us</country_code>
        <country_name>United States</country_name>
    </table>
    <table id_name="publisher_id">
        <publisher_id>78</publisher_id>
        <publisher_name>Harvard University Press</publisher_name>
        <country_id>2</country_id>
        <year_began>1950</year_began>
        <year_ended>NULL</year_ended>
    </table>
    <table id_name="series_id">
        <series_id>15</series_id>
        <series_name>A New Series</series_name>
        <sort_name>New Series</sort_name>
        <publisher_id>78</publisher_id>
    </table>
</tables>
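The grouping rule described in the question (a table starts only at the first occurrence of its ID name; later occurrences are foreign keys that stay in the current table) can be sketched outside XQuery to check the logic. This is a minimal Python sketch, not the asker's code: it assumes one record is modelled as an ordered list of (name, value) pairs and that `ID_NAMES` is a hypothetical subset of the ID columns. In XQuery terms, the same test could presumably go into the window's start condition, e.g. `start $s when $s/name() = $ids and not($s/preceding-sibling::*[name() = $s/name()])`, where `$ids` is a hypothetical variable holding the ID names; that variant is untested here.

```python
# Minimal sketch: split one flattened record into tables, opening a new table
# only at the FIRST occurrence of an ID column. The (name, value) model of a
# record and ID_NAMES are assumptions, not part of the original question.

ID_NAMES = {"country_id", "publisher_id", "series_id"}  # hypothetical subset


def split_tables(record, id_names=ID_NAMES):
    """Return a list of (id_name, [(name, value), ...]) tables."""
    tables = []
    seen = set()  # ID names that have already opened a table in this record
    for name, value in record:
        if name in id_names and name not in seen:
            # First occurrence of this ID: start a new table (window).
            seen.add(name)
            tables.append((name, [(name, value)]))
        elif tables:
            # Ordinary column or repeated ID (foreign key): stay in the table.
            tables[-1][1].append((name, value))
        # Columns before the first ID are skipped, like a tumbling window
        # that has not started yet.
    return tables


record = [
    ("country_id", "2"), ("country_code", "us"),
    ("country_name", "United States"),
    ("publisher_id", "78"), ("publisher_name", "Harvard University Press"),
    ("country_id", "2"), ("year_began", "1950"), ("year_ended", "NULL"),
    ("series_id", "15"), ("series_name", "A New Series"),
    ("sort_name", "New Series"), ("publisher_id", "78"),
]

for id_name, columns in split_tables(record):
    print(id_name, [name for name, _ in columns])
```

Because `seen` is reset on each call, the function is meant to be called once per record; running it over all records at once would treat every record after the first as foreign keys only.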

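It would help to show us some actual XML input (preferably simplified; we don't need to see all 15 columns) rather than expecting us to work out what it looks like after conversion from CSV. Your current query starts a new group whenever it encounters a particular element name, and I can't see any correspondence between that and your description of the problem. –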


@MichaelKay I've added the XML input. – tat

Answers


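I think you may need to use a windowing solution to do the initial segmentation, and then use "group by" on the result to merge segments that have the same key.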
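The two-step idea in this answer (window first, then merge) can be illustrated with a small Python sketch. It is a hedged illustration, not the answerer's code: the (name, value) record model and `ID_NAMES` are assumptions, and the function names are hypothetical. Step 1 starts a segment at every ID occurrence; step 2 is shown both as a literal group-by on the key and as an alternative merge rule, because the answer does not spell out which merge is meant.

```python
# Sketch of "window, then merge". The record model, ID_NAMES and all function
# names are illustrative assumptions.

ID_NAMES = {"country_id", "publisher_id", "series_id"}


def window_segments(record, id_names=ID_NAMES):
    """Step 1: tumbling window that starts at EVERY occurrence of an ID."""
    segments = []
    for name, value in record:
        if name in id_names:
            segments.append((name, [(name, value)]))
        elif segments:
            segments[-1][1].append((name, value))
    return segments


def group_by_key(segments):
    """Step 2a: literal 'group by' on the segment key, in first-seen order."""
    groups = {}
    for key, columns in segments:
        groups.setdefault(key, []).extend(columns)
    return list(groups.items())


def merge_repeats_into_previous(segments):
    """Step 2b (alternative rule): a segment whose key was already seen is
    appended to the segment before it instead of forming its own table."""
    merged = []
    seen = set()
    for key, columns in segments:
        if key in seen:
            merged[-1][1].extend(columns)
        else:
            seen.add(key)
            merged.append((key, list(columns)))
    return merged


record = [
    ("country_id", "2"), ("country_code", "us"),
    ("country_name", "United States"),
    ("publisher_id", "78"), ("publisher_name", "Harvard University Press"),
    ("country_id", "2"), ("year_began", "1950"), ("year_ended", "NULL"),
    ("series_id", "15"), ("series_name", "A New Series"),
    ("sort_name", "New Series"), ("publisher_id", "78"),
]

segments = window_segments(record)
for key, columns in group_by_key(segments):
    print("group by:", key, [n for n, _ in columns])
for key, columns in merge_repeats_into_previous(segments):
    print("merge:   ", key, [n for n, _ in columns])
```

On the sample record, the literal group-by puts `year_began` and `year_ended` into the `country_id` group, which differs from the desired output; the alternative rule in step 2b reproduces the desired grouping.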


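After working on this for a while, I gave up and simply decided to mark every later occurrence of an ID name with a leading underscore, like this: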

<csv>
    <record>
        <country_id>2</country_id>
        <country_code>us</country_code>
        <country_name>United States</country_name>
        <publisher_id>78</publisher_id>
        <publisher_name>Harvard University Press</publisher_name>
        <_country_id>2</_country_id>
        <year_began>1950</year_began>
        <year_ended>NULL</year_ended>
        <series_id>15</series_id>
        <series_name>A New Series</series_name>
        <sort_name>New Series</sort_name>
        <_publisher_id>78</_publisher_id>
    </record>
</csv>
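The answer does not say how the underscores were added. A minimal sketch, assuming the renaming is done on the CSV header line before the file is loaded into BaseX: every ID name after its first occurrence gets a `_` prefix. `ID_NAMES` and `mark_repeated_ids` are hypothetical names used only for illustration.

```python
# Hypothetical preprocessing step: prefix every repeated ID column with "_" so
# that a window starting on the ID names only fires at the first occurrence.

ID_NAMES = {"country_id", "publisher_id", "series_id"}  # assumed ID columns


def mark_repeated_ids(names, id_names=ID_NAMES):
    """Return the column names with later occurrences of IDs prefixed by '_'."""
    seen = set()
    marked = []
    for name in names:
        if name in id_names:
            if name in seen:
                name = "_" + name  # later occurrence: treat as a foreign key
            else:
                seen.add(name)  # first occurrence: the table's own key
        marked.append(name)
    return marked


header = ("country_id,country_code,country_name,publisher_id,publisher_name,"
          "country_id,year_began,year_ended,series_id,series_name,sort_name,"
          "publisher_id")
print(",".join(mark_repeated_ids(header.split(","))))
```

Renaming only the header keeps the data rows untouched, and names starting with `_` are valid XML element names, so the parser can still produce elements like `<_country_id>`.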

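With this, the window expression works as expected; afterwards, I only need to strip the underscore from the element names to restore their original form: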

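<tables>{
  (: Start a new window at every element whose name is a table ID.
     Repeated IDs were renamed with a leading "_", so they no longer match. :)
  for tumbling window $w in /csv/record/*
  start $s when $s/name() = ("country_id",
                             "publisher_id",
                             "series_id",
                             "issue_id",
                             "id_activity_fact",
                             "id_person_dim",
                             "id_location_dim",
                             "id_phys_loc_dim",
                             "id_letter_dim")
  return
    <table id_name="{$s/name()}">{
      for $e in $w
      return
        (: Restore the original name of a marked foreign-key column. :)
        if (starts-with($e/name(), "_")) then
          element {$e/substring-after(name(), "_")} { $e/string() }
        else $e
    }</table>
}</tables>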
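To check the logic of this query outside BaseX, here is a rough Python equivalent using the standard `xml.etree.ElementTree` module. It is a sketch under assumptions: `ID_NAMES` is a hypothetical subset of the query's ID list, `rebuild_tables` is an illustrative name, and only element names and text are copied, which is enough for flat CSV records.

```python
# Sketch mirroring the XQuery tumbling window above with ElementTree.
import xml.etree.ElementTree as ET

ID_NAMES = {"country_id", "publisher_id", "series_id"}  # assumed subset


def rebuild_tables(csv_xml, id_names=ID_NAMES):
    """Tumbling window over /csv/record/*, starting at each unmarked ID."""
    root = ET.fromstring(csv_xml)
    tables = ET.Element("tables")
    current = None
    for e in root.iterfind("record/*"):
        if e.tag in id_names:
            # Window start: counterpart of `start $s when $s/name() = (...)`.
            current = ET.SubElement(tables, "table", id_name=e.tag)
        if current is None:
            continue  # items before the first start belong to no window
        # Strip the "_" marker, like substring-after(name(), "_").
        tag = e.tag[1:] if e.tag.startswith("_") else e.tag
        ET.SubElement(current, tag).text = e.text
    return tables


SAMPLE = """<csv><record>
<country_id>2</country_id><country_code>us</country_code>
<country_name>United States</country_name>
<publisher_id>78</publisher_id>
<publisher_name>Harvard University Press</publisher_name>
<_country_id>2</_country_id><year_began>1950</year_began>
<year_ended>NULL</year_ended><series_id>15</series_id>
<series_name>A New Series</series_name><sort_name>New Series</sort_name>
<_publisher_id>78</_publisher_id>
</record></csv>"""

print(ET.tostring(rebuild_tables(SAMPLE), encoding="unicode"))
```

Like `/csv/record/*` in the query, `iterfind("record/*")` walks the children of all records as one sequence, so a window is closed only by the next ID element, not by a record boundary.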

Final result:

<tables>
    <table id_name="country_id">
        <country_id>2</country_id>
        <country_code>us</country_code>
        <country_name>United States</country_name>
    </table>
    <table id_name="publisher_id">
        <publisher_id>78</publisher_id>
        <publisher_name>Harvard University Press</publisher_name>
        <country_id>2</country_id>
        <year_began>1950</year_began>
        <year_ended>NULL</year_ended>
    </table>
    <table id_name="series_id">
        <series_id>15</series_id>
        <series_name>A New Series</series_name>
        <sort_name>New Series</sort_name>
        <publisher_id>78</publisher_id>
    </table>
</tables>