2013-10-17

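Get paths from XML files (Python)

I have 300 XML files, and each one contains a path (see the excerpt below). I want to use Python to collect these paths into a list saved as a .csv file.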

<da:AdminData> 
    <da:Datax /> 
    <da:DataID>223</da:DataID> 
    <da:Date>2013-08-19</da:Date> 
    <da:Time>13:27:25</da:Time> 
    <da:Modification>2013-08-19</da:Modification> 
    <da:ModificationTime>13:27:25</da:ModificationTime> 
    <da:Path>D:\08\06\xxx-aaa_20130806_111339.dat</da:Path> 
    <da:ID>xxx-5225-fff</da:ID> 

I wrote the code below, but it does not search subdirectories.

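import os, glob, re

xmlpath = 'D:\\'   # drive root; 'D:' alone means "current directory on drive D"
outfilename = "result.csv"

# glob only looks at the top level of xmlpath; it does not descend into subdirectories
xmlfiles = glob.glob(os.path.join(xmlpath, '*.xml'))

pattern = re.compile(r"<da:Path>(.*)</da:Path>")

output = ""
for filename in xmlfiles:
    with open(filename) as fh:
        text = fh.read()
    match = pattern.search(text)
    if match:
        output += match.group(1) + '\n'

# the original wrote to an undefined name (outfile); use outfilename
with open(outfilename, "w") as logfile:
    logfile.write(output)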
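The goal is a .csv list, and the loop above writes bare lines. If a path could ever contain a comma or a quote, the standard-library `csv` module handles the quoting. A minimal sketch, assuming Python 3; `write_paths_csv` is a hypothetical helper name, the header row is optional, and the second sample path is made up purely to show quoting.

```python
import csv
import os
import tempfile


def write_paths_csv(paths, outfilename):
    """Hypothetical helper: write one path per CSV row, with a header row."""
    # newline="" is what the csv module docs recommend for files it writes.
    with open(outfilename, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Path"])  # header; drop this line for bare paths
        for path in paths:
            writer.writerow([path])  # csv quotes the field if it contains a comma


# Demo with sample paths (the second one is made up), written to a temp dir.
paths = [r"D:\08\06\xxx-aaa_20130806_111339.dat", r"D:\08\07\with,comma.dat"]
outfile = os.path.join(tempfile.mkdtemp(), "result.csv")
write_paths_csv(paths, outfile)
```

Reading the file back with `csv.reader` returns each path intact, including the one containing a comma.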

You shouldn't use regular expressions to parse XML. Use a proper XML parser instead. What does a sample entry in a subdirectory look like? – pajton
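Following that advice, here is a minimal sketch using the standard-library `xml.etree.ElementTree` parser instead of a regex. It assumes each file is well-formed XML; the excerpt in the question omits the `xmlns:da` declaration and the closing tag, which the real files presumably contain. Because the namespace URI bound to `da` is not shown, the sketch compares only the local tag name. `extract_paths` is a hypothetical helper, and `urn:example:da` is a made-up namespace used only for the demo.

```python
import io
import xml.etree.ElementTree as ET


def extract_paths(xml_file):
    """Hypothetical helper: return the text of every <...:Path> element.

    Only the local tag name is compared, so the namespace URI bound to
    the "da" prefix does not need to be known in advance.
    """
    paths = []
    for elem in ET.parse(xml_file).getroot().iter():
        # Namespaced tags look like "{namespace-uri}Path"; keep the part after "}".
        if elem.tag.rsplit("}", 1)[-1] == "Path" and elem.text:
            paths.append(elem.text.strip())
    return paths


# Demo on an in-memory document; urn:example:da is a made-up namespace.
sample = r"""<da:AdminData xmlns:da="urn:example:da">
    <da:DataID>223</da:DataID>
    <da:Path>D:\08\06\xxx-aaa_20130806_111339.dat</da:Path>
</da:AdminData>"""
paths = extract_paths(io.StringIO(sample))
```

`ET.parse` also accepts a filename, so in the loop over the XML files each call would be `extract_paths(filename)`.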

Answer


glob does not recurse into subdirectories. To glob recursively, it is best to combine os.walk and fnmatch.fnmatch. For example:

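import os
import fnmatch


def recursive_glob(rootdir, pattern):
    """Return the paths of all files under rootdir whose names match pattern."""
    matching_files = []
    for d, _, fnames in os.walk(rootdir):
        matching_files.extend(
            os.path.join(d, fname) for fname in fnames
            if fnmatch.fnmatch(fname, pattern)
        )
    return matching_files


# A raw string cannot end in a single backslash, so r"D:\" is a SyntaxError.
xmlfiles = recursive_glob("D:\\", "*.xml")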
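This answer predates Python 3.5. On 3.5 and later the standard library can do the recursion itself: `glob.glob` accepts a `**` wildcard together with `recursive=True`, and `pathlib.Path.rglob` searches every subdirectory. A sketch of both, assuming Python 3.5+; the function names are hypothetical, and either could stand in for `recursive_glob`.

```python
import glob
import os
from pathlib import Path


def recursive_glob_stdlib(rootdir, pattern):
    """Hypothetical name: "**" with recursive=True matches zero or more directories."""
    return glob.glob(os.path.join(rootdir, "**", pattern), recursive=True)


def recursive_glob_pathlib(rootdir, pattern):
    """Hypothetical name: Path.rglob(pattern) searches rootdir and all subdirectories."""
    return [str(p) for p in Path(rootdir).rglob(pattern)]
```

Both include files directly in `rootdir` as well as those in nested folders.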

I get an empty list :( – user2889987