2016-06-01 182 views

Answers


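The newer formats are usually just zipped XML, so you can unzip them and parse the XML with standard libraries. Some code for getting a document's creator was previously posted as an answer on Stack Overflow: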

import zipfile

import lxml.etree

# Open the document; a .docx file is an ordinary zip archive
zf = zipfile.ZipFile('my_doc.docx')
# Use lxml to parse the XML part we are interested in
doc = lxml.etree.fromstring(zf.read('docProps/core.xml'))
# Retrieve the creator (raises IndexError if the element is missing)
ns = {'dc': 'http://purl.org/dc/elements/1.1/'}
creator = doc.xpath('//dc:creator', namespaces=ns)[0].text
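
If lxml is not installed, roughly the same lookup can be done with only the Python standard library. The following is a minimal sketch, not the code from the original Stack Overflow answer: `read_creator` and `make_sample_package` are hypothetical helper names, and the sample package holds only `docProps/core.xml`, so it is not a valid .docx and exists purely to make the example self-contained.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Dublin Core namespace, the same one used in the lxml snippet above.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def read_creator(source):
    """Return the dc:creator text from docProps/core.xml, or None if absent.

    `source` may be a file path or a binary file-like object.
    (Hypothetical helper, named here for illustration only.)
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))
    node = root.find(".//dc:creator", NS)
    return node.text if node is not None else None


def make_sample_package(creator):
    """Build an in-memory zip containing only docProps/core.xml.

    This is demonstration data, not a complete Office document.
    """
    core = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<cp:coreProperties '
        'xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" '
        'xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f"<dc:creator>{creator}</dc:creator>"
        "</cp:coreProperties>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/core.xml", core)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(read_creator(make_sample_package("Jane Doe")))
```

With a real file you would call `read_creator('my_doc.docx')` instead. Unlike the `[0]` indexing above, this version returns `None` when the element is missing rather than raising.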

For the older formats, you might want to look at the hachoir-metadata library.