2013-05-13

Using AWK to create XML from a spreadsheet?

I have this spreadsheet, and I want to generate some XML manifests from it.

Here is part of the spreadsheet:

[image: spreadsheet excerpt]

Below is the XML I want to generate, named "MST-5.3_tmp.xml" (the file name is based on the Section value):

<?xml version="1.0" encoding="iso-8859-1"?> 

<activity type='cxp:jsp'> 
<handler>mindtap_mastery</handler> 

<!-- Section 5.3 Mastery --> 
<group threshold="1" name="Energy and Temperature Change to Specific Heat"> 
<items> 
    <item src="owms01h/gen.question.32027" title="Mastery Item 1"/> 
</items> 
</group> 
<group threshold="3" name="Specific Heat to Energy or Temperature"> 
<items> 
    <item src="owms01h/gen.question.32040" title="Mastery Item 1"/> 
    <item src="owms01h/gen.question.32041" title="Mastery Item 2"/> 
    <item src="owms01h/gen.question.32046" title="Mastery Item 3"/> 
    <item src="owms01h/gen.question.32048" title="Mastery Item 4"/> 
</items> 
</group> 
<group threshold="2" name="Thermal Equilibrium"> 
<items> 
    <item src="owms01h/gen.question.32378" title="Mastery Item 1"/> 
    <item src="owms01h/gen.question.32380" title="Mastery Item 2"/> 
</items> 
</group> 
<group threshold="2" name="Phase Change Energetics"> 
<items> 
    <item src="owms01h/gen.question.3737" title="Mastery Item 1"/> 
    <item src="owms01h/gen.question.3741" title="Mastery Item 2"/> 
    <item src="owms01h/gen.question.3752" title="Mastery Item 3"/> 
    <item src="owms01h/gen.question.3753" title="Mastery Item 4"/> 
</items> 
</group> 
<group threshold="2" name="Heating Curves - Calculations"> 
<items> 
    <item src="owms01h/gen.question.5640" title="Mastery Item 1"/> 
    <item src="owms01h/gen.question.5641" title="Mastery Item 2"/> 
    <item src="owms01h/gen.question.5642" title="Mastery Item 1"/> 
    <item src="owms01h/gen.question.5643" title="Mastery Item 2"/> 
</items> 
</group> 

</activity> 

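My goal is to export the spreadsheet as a tab-delimited text file and use AWK to create the XML. A new file should be created whenever there is a value in the "Section" column. The adjacent "Instructional Unit" column contains the name of the first "group" element, and that group's "items" begin with the entry in the adjacent item-name column. If the next row has neither a "Section" nor an "Instructional Unit" value, it should be added as an item to the current group. If it has an "Instructional Unit" value but no "Section", a new group should be created. The parts I'm stuck on are starting and ending each new file, and how to skip columns/rows with AWK.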

What I have so far is a script that creates a single file, nested the way I described above:

#!/bin/bash 

awk -F "\t" '{ 
    if ($2) { 
    print "</items>"; 
    print "</group>"; 
    print "</activity>"; 
    print "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>" 
    print "<activity type='cxp:jsp'>"; 
    print "<handler>mindtap_mastery</handler>"; 
    print "<!--" $2 "-->"; 
    } 
    if ($3) { 
    print "<group threshold=\"1\" name=\"" $3 "\">"; 
    print "<items>"; 
    print "<item src=\"owms01h/" $4 "\" title=\"Mastery Item 1\"/>"; 
    } else { 
    print "<item src=\"owms01h/" $4 "\" title=\"Mastery Item 1\"/>"; 
    } 

}' 'Media Grid_Units 1-5.txt' >> master.xml 
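
I believe the answer will change completely depending on the format the spreadsheet is downloaded in: XLS/XLSX/ODF ... Perhaps you are downloading it in csv/tab-delimited format, which is fine. – anishsane 2013-05-13 15:06:07

Yes, I'm parsing the data as a tab-delimited text file. – 2013-05-13 17:52:05

Doing this in awk is hard work. Have you considered switching to a scripting language that can parse Excel files directly? For example http://stackoverflow.com/questions/6157114/easiest-way-to-read-excel-files-in-groovy – 2013-05-13 19:54:39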

Answer

You can save this as somefile.awk and run it with awk -F"\t" -f somefile.awk spreadsheet.tab:

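NR == 1 || $4 == "" { next }   # Skip the header row and rows without an item

$2 != "" {   # New section: finish the previous file and start a new one
    if (printingitems) {   # close the tags left open by the previous section
        print "</items>" > filename
        print "</group>" > filename
        print "</activity>" > filename
        close(filename)   # avoid running out of open file handles
    }
    # Build the new filename from the second word of the section,
    # e.g. "Section 5.3 Mastery" gives MST-5.3_tmp.xml
    split($2, part, " ")
    filename = "MST-" part[2] "_tmp.xml"

    # ">" truncates the file the first time it is opened in this run,
    # so re-running the script does not append to old output
    print "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>" > filename
    print "<activity type='cxp:jsp'>" > filename
    print "<handler>mindtap_mastery</handler>" > filename
    print "<!-- " $2 " -->" > filename
    printingitems = 0
}

$3 != "" {   # New group
    if (printingitems) {   # close the previous group in this file
        print "</items>" > filename
        print "</group>" > filename
    }
    groupname = substr($3, 5)   # drop the first four characters of the unit name
    print "<group threshold=\"1\" name=\"" groupname "\">" > filename
    print "<items>" > filename
    printingitems = 1   # doubles as the number of the next item
}

{   # New item (every row that reaches this point has one)
    print "    <item src=\"owms01h/" $4 "\" title=\"Mastery Item " printingitems++ "\"/>" > filename
}

END {   # close the last file, if anything was written
    if (printingitems) {
        print "</items>" > filename
        print "</group>" > filename
        print "</activity>" > filename
    }
}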
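
To try the approach without the real spreadsheet, here is a cut-down sketch of the same state machine run against made-up rows. The sample data, the leading unused column, and the four-character "5.3 " prefix on unit names are all assumptions, since the spreadsheet itself isn't shown. The XML is reduced to the comment, group and item lines to keep it short.

```shell
#!/bin/bash
# Sketch only: sample rows and the "5.3 " unit-name prefix are assumptions.
demo() {
    workdir=$(mktemp -d) || return 1
    (
        cd "$workdir" || exit 1
        # Columns: (unused), Section, Instructional Unit, item name.
        # printf reuses the format for every group of four arguments.
        printf '%s\t%s\t%s\t%s\n' \
            'Chapter' 'Section' 'Instructional Unit' 'Item' \
            '5' 'Section 5.3 Mastery' '5.3 Thermal Equilibrium' 'gen.question.32378' \
            ''  '' '' 'gen.question.32380' \
            ''  '' '5.3 Phase Change Energetics' 'gen.question.3737' \
            > sample.tab

        awk -F "\t" '
            NR == 1 || $4 == "" { next }        # header row or no item
            $2 != "" {                          # new section => new file
                split($2, part, " ")
                filename = "MST-" part[2] "_tmp.xml"
                print "<!-- " $2 " -->" > filename
                n = 0
            }
            $3 != "" {                          # new group
                if (n) print "</group>" > filename
                print "<group name=\"" substr($3, 5) "\">" > filename
                n = 1
            }
            {                                   # every remaining row is an item
                print "  <item src=\"owms01h/" $4 "\" title=\"Mastery Item " (n++) "\"/>" > filename
            }
            END { if (n) print "</group>" > filename }
        ' sample.tab

        cat MST-5.3_tmp.xml
    )
    rm -rf "$workdir"
}
demo
```

The row with only an item name joins the current group as "Mastery Item 2", and the row with a unit but no section closes that group and opens a new one in the same file.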