2013-05-13 35 views
0

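Format text into a separate file

I have a small problem and I don't know where to start. I have a text file that contains the following information.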

MINI COOPER 2007, 30,000 miles, British Racing Green, full service history, metallic paint, alloys. Great condition. £5,995 ono Telephone xxxxx xxxxx 

I need to fill the information above into the following format:

<advert> 
    <manufacturer></manufacturer> 
    <make></make> 
    <model></model>
    <price></price> 
    <miles></miles> 
    <image></image> 
    <desc><![CDATA[]]></desc>
    <expiry></expiry> // Any point in the future 
    <url></url> // Optional 
</advert> 

The output should be:

<advert> 
    <manufacturer>MINI</manufacturer> 
    <make></make> 
    <model></model>
    <price>5,995</price> 
    <miles>30000</miles> 
    <image></image> 
    <desc><![CDATA[2007, British Racing Green, full service history, metallic paint, alloys. Great condition.Telephone xxxxxx xxxxxx]]></desc> 
    <expiry>Todays date 13/05/2013</expiry> 
    <url></url> 
</advert> 

Any help would be greatly appreciated.

+0

A 'python' script, or a 'gawk' script using '-F,', could help. What have you tried? If you don't show the code you have tried, you won't get help... – 2013-05-13 11:00:41

+0

I was hoping someone could point me in the right direction.. – 2013-05-13 11:11:40

+0

I did point you in some directions... but you have to learn enough to follow them. – 2013-05-13 11:14:35

Answers

1

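Since commas are sometimes part of a field and sometimes not, you can't use a comma (or any other character) as the field separator. You need something like this in GNU awk (required for gensub() and strftime()):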

gawk '{ 
    print "<advert>" 
    printf "\t<manufacturer>%s</manufacturer>\n", $1 
    printf "\t<make></make>\n" 
    printf "\t<model></model>\n" 
    printf "\t<price>%s</price>\n", gensub(/.*£([[:digit:],]+).*/,"\\1",1)
    printf "\t<miles>%s</miles>\n", gensub(/.*[[:space:]]([[:digit:],]+)[[:space:]]+miles.*/,"\\1",1)
    printf "\t<image></image>\n"
    printf "\t<desc><![CDATA[%s]]></desc>\n", gensub(/.*[[:space:]]+miles[[:space:]]*,[[:space:]]*(.*)/,"\\1",1)
    printf "\t<expiry>Todays date %s</expiry>\n", strftime("%d/%m/%Y") 
    printf "\t<url></url>\n" 
    print "</advert>" 
}' file 

My editor seems to choke on the pound sign, so here is a run of the script above using a # symbol instead:

$ cat file 
MINI COOPER 2007, 30,000 miles, British Racing Green, full service history, metallic paint, alloys. Great condition. #5,995 ono Telephone xxxxx xxxxx 

$ gawk '{ 
    print "<advert>" 
    printf "\t<manufacturer>%s</manufacturer>\n", $1 
    printf "\t<make></make>\n" 
    printf "\t<model></model>\n" 
    printf "\t<price>%s</price>\n", gensub(/.*#([[:digit:],]+).*/,"\\1",1)
    printf "\t<miles>%s</miles>\n", gensub(/.*[[:space:]]([[:digit:],]+)[[:space:]]+miles.*/,"\\1",1)
    printf "\t<image></image>\n"
    printf "\t<desc><![CDATA[%s]]></desc>\n", gensub(/.*[[:space:]]+miles[[:space:]]*,[[:space:]]*(.*)/,"\\1",1)
    printf "\t<expiry>Todays date %s</expiry>\n", strftime("%d/%m/%Y") 
    printf "\t<url></url>\n" 
    print "</advert>" 
}' file 
<advert> 
     <manufacturer>MINI</manufacturer> 
     <make></make> 
     <model></model> 
     <price>5,995</price> 
     <miles>30,000</miles> 
     <image></image> 
     <desc><![CDATA[British Racing Green, full service history, metallic paint, alloys. Great condition. #5,995 ono Telephone xxxxx xxxxx]]></desc>
     <expiry>Todays date 13/05/2013</expiry> 
     <url></url> 
</advert> 
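
For comparison, the same regex-based extraction can be sketched in Python, which the first comment mentions as an option. This is a minimal sketch rather than a port of the gawk script: the function name `advert_xml` and its `today` parameter are made up for illustration, `re.search` takes the first match where the greedy gawk patterns take the last one, and the expiry holds only the date.

```python
import re
from datetime import date


def advert_xml(line, today=None):
    """Return one <advert> block for a single advert line (hypothetical helper)."""
    today = today or date.today()

    def first_group(pattern):
        # Return the first capture group, or "" when the pattern is absent.
        match = re.search(pattern, line)
        return match.group(1) if match else ""

    words = line.split()
    manufacturer = words[0] if words else ""
    price = first_group(r"£([\d,]+)")            # digits/commas after the pound sign
    miles = first_group(r"([\d,]+)\s+miles\b")   # digits/commas just before "miles"
    desc = first_group(r"\bmiles\s*,\s*(.*)")    # everything after "miles,"

    rows = [
        f"<manufacturer>{manufacturer}</manufacturer>",
        "<make></make>",
        "<model></model>",
        f"<price>{price}</price>",
        f"<miles>{miles}</miles>",
        "<image></image>",
        f"<desc><![CDATA[{desc.strip()}]]></desc>",
        f"<expiry>{today.strftime('%d/%m/%Y')}</expiry>",
        "<url></url>",
    ]
    return "\n".join(["<advert>"] + ["\t" + row for row in rows] + ["</advert>"])


sample = ("MINI COOPER 2007, 30,000 miles, British Racing Green, full service "
          "history, metallic paint, alloys. Great condition. £5,995 ono "
          "Telephone xxxxx xxxxx")
print(advert_xml(sample, today=date(2013, 5, 13)))
```

As in the gawk version, the price and the word after it stay inside the description; removing them would need an extra substitution.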
0

Here is some example code that should at least get you started. Run it like this:

awk -f script.awk file.txt 

Contents of script.awk:

{ 
    for (i=1;i<=NF;i++) { 

     if ($i == "miles,") { 
      miles = $(i - 1) 

      $i = $(i - 1) = "" 
     } 

     if ($i ~ /£/) { 
      price = substr($i, 2) 

      $i = $(i + 1) = "" 
     } 
    } 

    gsub(/ +/, " "); 

    print "<advert>" 
    print "\t<manufacturer>" $1 "</manufacturer>" 
    print "\t<make></make>" 
    print "\t<model></model>"
    print "\t<price>" price "</price>"
    print "\t<miles>" miles "</miles>"
    print "\t<image></image>"
    print "\t<desc><![CDATA[" $0 "]]></desc>"
    print "\t<expiry>" strftime("%d/%m/%Y") "</expiry>"
    print "\t<url></url>"
    print "</advert>"
}

Results:

<advert>
    <manufacturer>MINI</manufacturer>
    <make></make>
    <model></model>
    <price>5,995</price>
    <miles>30,000</miles>
    <image></image>
    <desc><![CDATA[MINI COOPER 2007, British Racing Green, full service history, metallic paint, alloys. Great condition. Telephone xxxxx xxxxx]]></desc>
    <expiry>13/05/2013</expiry> 
    <url></url> 
</advert> 
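
The word-by-word scan in this answer can also be sketched in Python. `split_advert` is a hypothetical helper name, and the sketch rests on the same assumptions as the awk script: the mileage is the word right before `miles,`, and the price word is followed by exactly one word to discard (`ono` in the sample). Unlike the awk test `$i ~ /£/`, it only accepts words that start with `£`.

```python
def split_advert(line):
    """Pull mileage and price out of the word list; the rest is the description."""
    words = line.split()
    miles = price = ""
    skip = set()  # indexes of words that must not reach the description

    for i, word in enumerate(words):
        if word == "miles," and i > 0:
            miles = words[i - 1]
            skip.update({i - 1, i})
        elif word.startswith("£"):
            price = word[1:]          # drop the leading pound sign
            skip.update({i, i + 1})   # also drop the word after the price

    # Joining on single spaces also collapses any repeated whitespace.
    desc = " ".join(w for i, w in enumerate(words) if i not in skip)
    return miles, price, desc


sample = ("MINI COOPER 2007, 30,000 miles, British Racing Green, full service "
          "history, metallic paint, alloys. Great condition. £5,995 ono "
          "Telephone xxxxx xxxxx")
miles, price, desc = split_advert(sample)
print(miles, price, desc, sep="\n")
```

Stripping the commas with `miles.replace(",", "")` gives the `30000` form shown in the question.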
+0

Thank you very much, everyone. At least I now have somewhere to start. – 2013-05-13 12:50:27

+0

Steve, thank you very much for the information. – 2013-05-13 13:08:47
