2011-02-16 164 views
1

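How to convert a Python list comprehension to XML

I need some help finding a tutorial or example that takes a list comprehension, merges it with data from a CSV file, and turns all of it into an XML file. From reading various Python books and PDFs (DITP, IYOCGwP, Learn Python the Hard Way, the lxml tutorial, Think Python) and searching online, I'm most of the way there, or so I think. I just need a push to tie everything together.

I basically took an Excel spreadsheet and exported it as a CSV file. The CSV contains rows of records that I need to map into an XML file. I'm very new to Python and thought I'd use this little project to learn the language. The code listed isn't pretty, but it works. I can read a CSV file and dump it into a list. I can merge 3 lists and output the resulting list, and I can get my program to spit out a skeleton XML laid out almost in the format I need. Below the code I list a small sample of the actual output, followed by the XML I'm trying to produce. Sorry if this is too long-winded; this is my first post.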

import csv
import sys
from lxml import etree

fh = "SO.csv"
# newline="" lets the csv module handle line endings itself
rh = open(fh, newline="")

rows = 0
reports = []
try:
    rlist = csv.reader(rh)
    for row in rlist:
        # strip stray spaces around each field
        rowStripped = [x.strip(' ') for x in row]
        reports.append(rowStripped)
        rows += 1
except csv.Error as e:
    sys.exit('file %s, line %d: %s' % (fh, rlist.line_num, e))
finally:
    rh.close()

# build the empty skeleton: one element per nesting level
root = etree.Element("co_ehs")
object = etree.SubElement(root, "object")
event = etree.SubElement(object, "event")
facets = etree.SubElement(event, "facets")
categories = etree.SubElement(facets, "categories")
instance = etree.SubElement(categories, "instance")
property = etree.SubElement(instance, "property")

# the three names below are rebound from elements to plain lists;
# the elements themselves stay attached to the tree
facets = ['header','header','header','header','informational','header','informational']

categories = ['processing','processing','processing','processing','short_title','file_num','short_narrative']

property = ['REPORT ID','NEXT REPORT ID','initial-event-date','number','title','summary-docket-num','description-story']

print('----------Printing Reports from CSV Data----------')
print(reports)
print('---------END OF CSV DATA-------------')
print()
# zip() returns an iterator in Python 3, so materialise it before printing
mappings = list(zip(facets, categories, property))
print('----------Printing Mappings from the zip of facets, categories, property ----------')
print(mappings)
print('---------END OF List Comprehension-------------')
print()
print('----------Printing the xml skeleton that will contain the mappings and the csv data ----------')
# tostring() returns bytes when an encoding is given, so decode for printing
print(etree.tostring(root, xml_declaration=True, encoding='UTF-8', pretty_print=True).decode('UTF-8'))
print('---------END OF XML Skeleton-------------')


----My OUTPUT--- 
----------Printing Reports from CSV Data---------- 
[['1', '12-Dec-04', 'Vehicle Collision', '786689', 'No fault collision due to ice', '-1', '545671'], ['3', '15-Dec-04', 'OJT Injury', '87362', 'Paint fumes combusted causing 2nd degree burns', '4', '588456'], ['4', '17-Dec-04', 'OJT Injury', '87362', 'Paint fumes combusted causing 2nd degree burns', '-1', '58871'], ['1000', '12-Nov-05', 'Back Injury', '9854231', 'Lifting without a support device', '-1', '545671'], ['55555', '12-Jan-06', 'Foot Injury', '7936547', 'Office injury - heavy item dropped on foot', '-1', '545671']] 
---------END OF CSV DATA------------- 
----------Printing Mappings from the zip of facets, categories, property ---------- 
[('header', 'processing', 'REPORT ID'), ('header', 'processing', 'NEXT REPORT ID'), ('header', 'processing', 'initial-event-date'), ('header', 'processing', 'number'), ('informational', 'short_title', 'title'), ('header', 'file_num', 'summary-docket-num'), ('informational', 'short_narrative', 'description-story')] 
---------END OF List Comprehension------------- 
----------Printing the xml skeleton that will contain the mappings and the csv data ---------- 

    <?xml version='1.0' encoding='UTF-8'?> 
    <co_ehs> 
     <object> 
     <event> 
      <facets> 
      <categories> 
       <instance> 
       <property/> 
       </instance> 
      </categories> 
      </facets> 
     </event> 
     </object> 
</co_ehs> 

---------END OF XML Skeleton------------- 
----------CSV DATA------------------ 
C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION 
1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice" 
3,4,15-Dec-04,588456,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns" 
4,-1,17-Dec-04,58871,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns" 
1000,-1,12-Nov-05,545671,Back Injury,9854231,"Lifting without a support device" 
55555,-1,12-Jan-06,545671,Foot Injury,7936547,"Office injury - heavy item dropped on foot" 

-----------What I want the xml output to look like---------------------- 
    <?xml version="1.0" encoding="UTF-8"?> 
    <co_ehs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="co_ehs.xsd"> 
     <object id="3" object-type="ehs_report"> 
     <event event-tag="0"> 
      <facets name="header"> 
      <categories name="processing"> 
       <instance instance-tag="0"> 
       <property name="REPORT ID" value="1"/> 
       <property name="NEXT REPORT ID" value="-1"/> 
       <property name="initial-event-date" value="12-Dec-04"/> 
       <property name="number" value="545671"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="informational"> 
      <categories name="short_title"> 
       <instance instance-tag="0">
       <property name="title" value="Vehicle Collision"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="header"> 
      <categories name="file_num"> 
       <instance instance-tag="0">
       <property name="summary-docket-num" value="786689"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="informational"> 
      <categories name="short_narrative"> 
       <instance instance-tag="0">
       <property name="description-story" value="No fault collision due to ice"/> 
       </instance> 
      </categories> 
      </facets> 
     </event> 
     </object> 
    </co_ehs> 

What are the rules for the id attribute of object and the event-tag attribute of event? I assume event-tag is just a counter? – 2011-02-22 18:27:44


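@ocaso-protal As I understand it, the object's id attribute is a unique id or integer for that record. The next record of that object type would be equal to or greater than 4. I believe event-tag is used in a similar way: when the data is inserted into the RDBMS, each event-tag is given a unique identifier. I have been tasked with taking the CSV file and serializing it into schema-conformant XML so that the final XML can be fed into the RDBMS. I'd like to know how to iterate over the mappings and insert each one into the tree before generating the whole XML document. – MWR 2011-02-23 13:12:53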


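@ocaso-protal I converted the CSV from a dictionary to a list, so my goal is to iterate over each mapping and generate the corresponding tags, then iterate over the whole list (all of the CSV data) and insert the appropriate data items into those tags. Each data item has a facet, a category, an instance tag, and a property. – MWR 2011-02-23 13:29:30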
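
The loop described in these comments could be sketched roughly as follows. This is only an illustration under some assumptions: it targets Python 3, it uses the standard-library xml.etree.ElementTree (whose Element/SubElement calls lxml mirrors) so it runs without extra installs, the helper name add_event is made up, and instance-tag is fixed at "0" as in the desired output. Consecutive columns that share the same (facet, category) pair are grouped under one instance with itertools.groupby.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# (facet, category, property) triples, one per CSV column, as in the question
facets = ['header', 'header', 'header', 'header',
          'informational', 'header', 'informational']
categories = ['processing', 'processing', 'processing', 'processing',
              'short_title', 'file_num', 'short_narrative']
names = ['REPORT ID', 'NEXT REPORT ID', 'initial-event-date', 'number',
         'title', 'summary-docket-num', 'description-story']
mappings = list(zip(facets, categories, names))


def add_event(parent, tag, row):
    """Append one <event> for a CSV row (hypothetical helper)."""
    event = ET.SubElement(parent, "event", {"event-tag": str(tag)})
    paired = zip(mappings, row)
    # consecutive columns sharing (facet, category) go under one <instance>
    for (facet, category), group in groupby(paired, key=lambda p: p[0][:2]):
        f = ET.SubElement(event, "facets", name=facet)
        c = ET.SubElement(f, "categories", name=category)
        inst = ET.SubElement(c, "instance", {"instance-tag": "0"})
        for (_, _, prop), value in group:
            ET.SubElement(inst, "property", name=prop, value=value.strip())
    return event


root = ET.Element("co_ehs")
obj = ET.SubElement(root, "object", {"id": "3", "object-type": "ehs_report"})
# first data row of SO.csv, in CSV column order
row = ['1', '-1', '12-Dec-04', '545671', 'Vehicle Collision', '786689',
       'No fault collision due to ice']
add_event(obj, 0, row)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Calling add_event once per CSV row, with enumerate supplying the tag, would produce one event per record. The mapping lists then drive the structure, so adding a column means extending the three lists rather than touching the loop.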

Answers

0

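Here is my solution. I use lxml because generating XML with a framework is usually better than using strings or template files.

The attributes of co_ehs are missing, but that can easily be fixed with a few set() calls. I'll leave that to you.

BTW: you can accept an answer by clicking its check mark.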

import csv
import sys
from lxml import etree

def makeFacet(event, newheaders, ev, facetname, catname, count, nhposstart, nhposend):
    # one <facets>/<categories>/<instance> chain per call
    facets = etree.SubElement(event, "facets", name=facetname)
    categories = etree.SubElement(facets, "categories", name=catname)
    instance = etree.SubElement(categories, "instance")
    instance.set("instance-tag", count)

    # one <property> per csv column in [nhposstart, nhposend)
    for i in range(nhposstart, nhposend):
        property = etree.SubElement(instance, "property")
        property.set("name", newheaders[i])
        property.set("value", ev[i].strip())


# read the csv
fh = "SO.csv"
rh = open(fh, newline="")

try:
    reader = csv.reader(rh)
    rlist = list(reader)
except csv.Error as e:
    sys.exit("file %s, line %d: %s" % (fh, reader.line_num, e))
finally:
    rh.close()

# generate the xml

# newheaders maps the csv columns to property names, because the csv headers don't correspond to the XML
newheaders = ["REPORT ID","NEXT REPORT ID","initial-event-date","number","title","summary-docket-num", "description-story"]

root = etree.Element("co_ehs")

object = etree.SubElement(root, "object")

object.set("id", "3") # Not sure about this one
object.set("object-type", "ehs_report")

# rlist[0] is the csv header row, so start at the first data row
for c, ev in enumerate(rlist[1:]):
    event = etree.SubElement(object, "event")
    event.set("event-tag", "%s"%c)
    makeFacet(event, newheaders, ev, "header", "processing", "%s"%c, 0, 4)
    makeFacet(event, newheaders, ev, "informational", "short_title", "%s"%c, 4, 5)
    makeFacet(event, newheaders, ev, "header", "file_num", "%s"%c, 5, 6)
    makeFacet(event, newheaders, ev, "informational", "short_narrative", "%s"%c, 6, 7)

# tostring() returns bytes when an encoding is given, so decode for printing
print(etree.tostring(root, xml_declaration=True, encoding="UTF-8", pretty_print=True).decode("UTF-8"))
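
For the co_ehs attributes this answer leaves open, a minimal sketch of the set() call might look like the following. It assumes Python 3 and uses the standard-library xml.etree.ElementTree so it runs without lxml. Namespaced attributes are addressed as "{namespace-uri}local-name", and register_namespace asks the serializer to use the xsi prefix. With lxml, the prefix would instead normally be declared by passing nsmap={"xsi": XSI} to etree.Element.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# make the serializer emit the "xsi" prefix instead of a generated one
ET.register_namespace("xsi", XSI)

root = ET.Element("co_ehs")
# a namespaced attribute is addressed as "{namespace-uri}local-name"
root.set("{%s}noNamespaceSchemaLocation" % XSI, "co_ehs.xsd")

header = ET.tostring(root, encoding="unicode")
print(header)
```

The xmlns:xsi declaration is written automatically because the attribute uses that namespace, so only the one set() call is needed here.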
0

I created a file named 'pattern.txt' with the following content (with this indentation).

Note the 8 %s placeholders at strategic positions.

 <event event-tag="%s"> 
      <facets name="header"> 
      <categories name="processing"> 
       <instance instance-tag="0"> 
       <property name="REPORT ID" value="%s"/> 
       <property name="NEXT REPORT ID" value="%s"/> 
       <property name="initial-event-date" value="%s"/> 
       <property name="number" value="%s"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="informational"> 
      <categories name="short_title"> 
       <instance instance-tag="0">
       <property name="title" value="%s"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="header"> 
      <categories name="file_num"> 
       <instance instance-tag="0">
       <property name="summary-docket-num" value="%s"/> 
       </instance> 
      </categories> 
      </facets> 
      <facets name="informational"> 
      <categories name="short_narrative"> 
       <instance instance-tag="0">
       <property name="description-story" value="%s"/> 
       </instance> 
      </categories> 
      </facets> 
     </event> 

I created the file 'SO.csv' with the following content:

C_ID,NEXT_C_ID,C_DATE,C_NUMBER,C_EVENT,C_DOCKETNUM,C_DESCRIPTION 
1,-1,12-Dec-04,545671,Vehicle Collision,786689,"No fault collision due to ice" 
3,4,15-Dec-04,588456,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns" 
4,-1,17-Dec-04,58871,OJT Injury,87362,"Paint fumes combusted causing 2nd degree burns" 
1000,-1,12-Nov-05,545671,Back Injury,9854231,"Lifting without a support device" 
55555,-1,12-Jan-06,545671,Foot Injury,7936547,"Office injury - heavy item dropped on foot" 

and I ran the following code:

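import csv

with open('pattern.txt') as f:
    pati = f.read()

# the XML declaration must be the very first thing in the output
xmloutput = ['<?xml version="1.0" encoding="UTF-8"?>',
             ' <co_ehs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '\
             'xsi:noNamespaceSchemaLocation="co_ehs.xsd">',
             '  <object id="3" object-type="ehs_report">']

with open('SO.csv', newline='') as csvfile:
    rid = csv.reader(csvfile)
    next(rid)  # skip the header row

    for i, row in enumerate(rid):
        # prepend the event counter as ONE field; row[0:0] = str(i) would
        # insert one field per digit once i reaches 10
        row.insert(0, str(i))
        xmloutput.append(pati % tuple(row))

# close the elements opened in the header lines
xmloutput.extend(['  </object>', ' </co_ehs>'])

print('\n'.join(xmloutput))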

Does this help you?
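
One caveat with the template approach is that CSV values are pasted into the XML verbatim, so a field containing &, <, > or a double quote would produce a malformed document. A minimal sketch of escaping each value first, assuming Python 3 and using xml.sax.saxutils.escape from the standard library, is shown here. The helper name xml_attr, the one-line pattern, and the sample row are made up for illustration.

```python
from xml.sax.saxutils import escape


def xml_attr(value):
    """Escape a string for use inside a double-quoted XML attribute."""
    # escape() handles &, < and >; the extra entity covers double quotes
    return escape(value, {'"': "&quot;"})


# one-line stand-in for pattern.txt (hypothetical)
pattern = '<property name="title" value="%s"/>'
row = ['Slip & fall, "wet floor" sign <missing>']

line = pattern % tuple(xml_attr(v) for v in row)
print(line)
```

In the script above, the same idea would mean formatting with pati % tuple(xml_attr(v) for v in row). The lxml-based answer does not need this step because the library escapes attribute values itself.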