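Scriptella - Dynamically naming CSV files

2015-02-23

I want to create a CSV file whose name includes a value obtained while the ETL script is running. For example, I get a new value from a sequence and want to append it to the CSV file name. Sounds simple, but I'm stuck...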

My script:

<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd"> 
<etl> 
    <description>Scriptella ETL</description> 
    <properties> 
     <include href="etl.properties"/> <!--Load from external properties file--> 
    </properties> 
    <!-- Connection declarations --> 
    <connection id="mypostgres" driver="$driver" url="$url" user="$user" password="$password" classpath="$classpath"/> 
    <connection driver="jexl" id="jexl"/> 
    <connection id="log" driver="text"/> 

    <query connection-id="mypostgres"> 
     select nextval('transfer_id_seq') as tid 
     <script connection-id="jexl"> 
      etl.globals['transferID'] = tid; 
     </script> 
     <script connection-id="log"> 
      TransferID: ${etl.globals['transferID']} 
     </script> 
    </query> 

    <script connection-id="log"> 
     TransferID (Outside query): ${etl.globals['transferID']} 
    </script> 

    <connection id="transfer-csv" driver="csv" url="transfer_${etl.globals['transferID']}.csv"> 
     null_string= 
     quote= 
    </connection> 

    <script connection-id="transfer-csv"> 
     col1, col2, col3 
    </script> 
</etl> 

My output:

C:\scriptella>scriptella 
C:\java\jdk1.8\bin\java.exe -cp ;C:\dev\scriptella-1.1\lib\commons-compiler-jdk.jar;C:\dev\scriptella-1.1\lib\commons-compiler.jar;C:\dev\scriptella-1.1\lib\commons-jexl.jar;C:\dev\scriptella-1.1\lib\commons-logging.jar;C:\dev\scriptella-1.1\lib\janino.jar;C:\dev\scriptella-1.1\lib\scriptella-core.jar;C:\dev\scriptella-1.1\lib\scriptella-drivers.jar;C:\dev\scriptella-1.1\lib\scriptella-tools.jar scriptella.tools.launcher.EtlLauncher 
23.02.2015 17:33:58 <WARNING> XML configuration warning in file:/C:/scriptella/etl.xml(35:7): The content of element type "etl" must match "(description?,properties?,connection*,(script*,query*)*)". 
23.02.2015 17:33:58 <INFO> Execution Progress.Initializing properties: 1% 
23.02.2015 17:33:58 <INFO> Execution Progress.Initialized connection id=mypostgres, JdbcConnection{org.postgresql.jdbc4.Jdbc4Connection}, Dialect{PostgreSQL 9.3.2}, properties {}: 2% 
23.02.2015 17:33:58 <INFO> Execution Progress.Initialized connection id=jexl, JexlConnection, Dialect{JEXL 2.0}, properties {}: 3% 
23.02.2015 17:33:58 <INFO> Execution Progress.Initialized connection id=log, TextConnection, Dialect{Text 1.0}, properties {}: 4% 
23.02.2015 17:33:58 <INFO> Execution Progress.Initialized connection id=transfer-csv, CsvConnection, Dialect{CSV 1.0}, properties {null_string=, quote=}: 5% 
23.02.2015 17:33:58 <INFO> Execution Progress./etl/query[1] prepared: 6% 
23.02.2015 17:33:58 <INFO> Execution Progress./etl/script[1] prepared: 7% 
23.02.2015 17:33:58 <INFO> Execution Progress./etl/script[2] prepared: 10% 
23.02.2015 17:33:58 <INFO> Registered JMX mbean: scriptella:type=etl,url="file:/C:/scriptella/etl.xml" 
TransferID: 171 
23.02.2015 17:33:58 <INFO> Execution Progress./etl/query[1] executed: 38% 
TransferID (Outside query): 171 
23.02.2015 17:33:58 <INFO> Execution Progress./etl/script[1] executed: 66% 
23.02.2015 17:33:58 <INFO> Execution Progress./etl/script[2] executed: 95% 
23.02.2015 17:33:58 <INFO> Execution Progress.Complete 
23.02.2015 17:33:58 <INFO> Execution statistics: 
Executed 1 query, 4 scripts, 4 statements 
/etl/query[1]: Element successfully executed (1 statement). Working time 11 milliseconds. Avg throughput: 89,63 statements/sec. 
/etl/query[1]/script[1]: Element successfully executed. Working time 9 milliseconds. 
/etl/query[1]/script[2]: Element successfully executed (1 statement). Working time 4 milliseconds. Avg throughput: 206,37 statements/sec. 
/etl/script[1]: Element successfully executed (1 statement). Working time 2 milliseconds. Avg throughput: 432,13 statements/sec. 
/etl/script[2]: Element successfully executed (1 statement). Working time 2 milliseconds. Avg throughput: 447,04 statements/sec. 
Total working time: 0,26 second 
23.02.2015 17:33:58 <INFO> Successfully executed ETL file C:\scriptella\etl.xml 

As you can see, the CSV file name is wrong:

Directory of C:\scriptella 

23.02.2015 17:33 <DIR>   . 
23.02.2015 17:33 <DIR>   .. 
23.02.2015 11:28    282 etl.properties 
23.02.2015 17:32    1.239 etl.xml 
23.02.2015 17:33    133 transfer_transferID.csv 
       3 File(s)   1.654 bytes 
       2 Dir(s)  741.036.032 bytes free 

Answer

Because Scriptella initializes all connections at startup (as of the 5% line in your log), you can't have a dynamic connection element:

23.02.2015 17:33:58 <INFO> Execution Progress.Initialized connection id=transfer-csv, CsvConnection, Dialect{CSV 1.0}, properties {null_string=, quote=}: 5% 
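Relatedly, the WARNING at the top of your log comes from the DTD content model it quotes, `(description?,properties?,connection*,(script*,query*)*)`: every `<connection>` must be declared before the first `<query>` or `<script>`. The sketch below is your file reordered that way (the log scripts are dropped for brevity). It should clear the warning, but given the startup initialization described above, the CSV `url` is still resolved before the query has stored `transferID`, so on its own this does not make the file name dynamic.

```xml
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
    <description>Scriptella ETL</description>
    <properties>
        <include href="etl.properties"/> <!--Load from external properties file-->
    </properties>
    <!-- All connection declarations first, as the DTD requires -->
    <connection id="mypostgres" driver="$driver" url="$url" user="$user" password="$password" classpath="$classpath"/>
    <connection driver="jexl" id="jexl"/>
    <!-- Still initialized at startup: this URL is resolved before the query below runs -->
    <connection id="transfer-csv" driver="csv" url="transfer_${etl.globals['transferID']}.csv">
        null_string=
        quote=
    </connection>

    <!-- Queries and scripts only after the connections -->
    <query connection-id="mypostgres">
        select nextval('transfer_id_seq') as tid
        <script connection-id="jexl">
            etl.globals['transferID'] = tid;
        </script>
    </query>

    <script connection-id="transfer-csv">
        col1, col2, col3
    </script>
</etl>
```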

Your best bet is to use the scriptella driver, which lets you call another ETL file as a subroutine (and, in fact, removes the need for globals):

etl.xml:

<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd"> 
<etl> 
    <description>Scriptella ETL</description> 
    <properties> 
     <include href="etl.properties"/> <!--Load from external properties file--> 
    </properties> 
    <!-- Connection declarations --> 
    <connection id="mypostgres" driver="$driver" url="$url" user="$user" password="$password" classpath="$classpath"/> 
    <connection id="log" driver="text"/> 
    <connection id="scriptella" driver="scriptella"/> 
    <query connection-id="mypostgres"> 
     select nextval('transfer_id_seq') as tid 
     <script connection-id="log"> 
      TransferID: $tid 
     </script> 
     <script connection-id="scriptella"> 
      dynamic.xml 
     </script> 
    </query> 
</etl> 

dynamic.xml:

<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd"> 
<etl> 
    <connection id="transfer-csv" driver="csv" url="transfer_${tid}.csv"> 
     null_string= 
     quote= 
    </connection> 
    <script connection-id="transfer-csv"> 
     col1, col2, col3 
    </script> 
</etl> 

Note: the ${var} syntax is required in the connection URL in dynamic.xml.
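If the transfer ID should also appear inside the rows, the child file can reference `tid` in the script body too, the same way the text driver in etl.xml prints `$tid`. This is a sketch that assumes the CSV driver performs the same property substitution in script content (the usual Scriptella export pattern); putting `tid` in the first column is only an illustration.

```xml
<!DOCTYPE etl SYSTEM "http://scriptella.javaforge.com/dtd/etl.dtd">
<etl>
    <!-- tid is visible here because the calling query in etl.xml selected it -->
    <connection id="transfer-csv" driver="csv" url="transfer_${tid}.csv">
        null_string=
        quote=
    </connection>
    <!-- Assumption: $tid is substituted in the CSV script body as well -->
    <script connection-id="transfer-csv">
        $tid, col2, col3
    </script>
</etl>
```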

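Also, there is no way to make Scriptella append to a CSV file (it truncates the file every time), so what you are trying to accomplish may require rethinking your process. The Scriptella FAQ on Working with CSV Data suggests using HSQLDB text tables, which may help: staging the exported data in HSQLDB or H2 could bring performance gains and make your process easier to maintain in the long run.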

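Sean, thanks for your answer. Very creative, and it looks like a good solution. As for rethinking my process... the result is that I no longer use Scriptella, because it made the work very tedious and its development has stopped. Groovy is my choice, but another scripting tool would also work. – Buka 2015-05-04 12:16:35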