2013-01-11 68 views
3

Parsing multiple dates using dateutil

I am trying to parse multiple dates from a string in Python with the help of this code:

from dateutil.parser import _timelex, parser

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
p = parser()
info = p.info

def timetoken(token):
    # Numbers always count as date/time tokens.
    try:
        float(token)
        return True
    except ValueError:
        pass
    # Otherwise ask the parser's lookup tables whether the word is date-related.
    return any(f(token) for f in (info.jump, info.weekday, info.month, info.hms, info.ampm, info.pertain, info.utczone, info.tzoffset))

def timesplit(input_string):
    batch = []
    for token in _timelex(input_string):
        if timetoken(token):
            # Separators such as " ", "/" and "of" are skipped, not collected.
            if info.jump(token):
                continue
            batch.append(token)
        else:
            # A non-date word ends the current run of date tokens.
            if batch:
                yield " ".join(batch)
                batch = []
    if batch:
        yield " ".join(batch)

for item in timesplit(a):
    print "Found:", item
    print "Parsed:", p.parse(item)

but the code picks up "second" from the string as if it were a second date and gives me this error:

raise ValueError, "unknown string format" 

ValueError: unknown string format 
When I change "second half" to "third half" or "fourth half", it all works fine.

Can anyone help me parse this string?

Answers

2

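The parser cannot handle the "second" token that timesplit finds. If you set the fuzzy parameter to True, it no longer breaks, but it does not produce anything meaningful either.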

from cStringIO import StringIO 
for item in timesplit(StringIO(a)): 
    print "Found:", item 
    print "Parsed:", p.parse(StringIO(item),fuzzy=True) 

Output:

Found: 12 10 2012 
Parsed: 2012-12-10 00:00:00 
Found: second 
Parsed: 2013-01-11 00:00:00 
Found: 20 10 2012 
Parsed: 2012-10-20 00:00:00 

You have to either fix the time splitting or handle the errors:

OPT1:

Drop info.hms from timetoken, so that words like "second" are no longer treated as time tokens.
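A minimal sketch of this first option, written for Python 3 (so `print` is a function here, unlike in the snippets in this thread). It assumes a dateutil 2.x release in which `_timelex` accepts a plain string; `_timelex` is a private class, so the import may emit a deprecation warning or stop working in later versions. The `LOOKUPS` name is introduced here for illustration only.

```python
from dateutil.parser import _timelex, parser  # _timelex is private and may change

p = parser()
info = p.info

# Same lookups as the question's timetoken, with info.hms left out so that
# "second", "hour", "minute" and similar words no longer count as time tokens.
LOOKUPS = (info.jump, info.weekday, info.month, info.ampm,
           info.pertain, info.utczone, info.tzoffset)

def timetoken(token):
    # Anything that parses as a number counts as a date/time token.
    try:
        float(token)
        return True
    except ValueError:
        pass
    return any(f(token) for f in LOOKUPS)

def timesplit(input_string):
    batch = []
    for token in _timelex(input_string):
        if timetoken(token):
            # Separators (" ", "/", "of", ...) are skipped, not collected.
            if info.jump(token):
                continue
            batch.append(token)
        elif batch:
            # A non-date word closes the current run of date tokens.
            yield " ".join(batch)
            batch = []
    if batch:
        yield " ".join(batch)

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "
found = list(timesplit(a))
parsed = [p.parse(item) for item in found]
for item, value in zip(found, parsed):
    print("Found:", item)
    print("Parsed:", value)
```

The trade-off is that strings which really do spell out hours, minutes or seconds in words would no longer be picked up by timesplit.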

OPT2:

from cStringIO import StringIO 
for item in timesplit(StringIO(a)): 
    print "Found:", item 
    try:
        print "Parsed:", p.parse(StringIO(item))
    except ValueError:
        print 'Not Parsed!'

Output:

Found: 12 10 2012 
Parsed: 2012-12-10 00:00:00 
Found: second 
Not Parsed! 
Parsed: Found: 20 10 2012 
Parsed: 2012-10-20 00:00:00 
0

If you only need the dates, you can extract them with a regular expression and then work with those.

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 " 

import re 
pattern = re.compile(r'\d{2}/\d{2}/\d{4}')
pattern.findall(a) 
['12/10/2012', '20/10/2012']
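To actually work with the extracted strings as dates, they can be converted with the standard library alone. The sketch below is for Python 3 and assumes the dates are written day first (`%d/%m/%Y`), which is a guess based on `20/10/2012` not being valid month first; note that the dateutil output earlier in this thread read `12/10/2012` month first instead. The names `DATE_RE` and `extract_dates` are made up for this example.

```python
import re
from datetime import datetime

a = "Approve my leave from first half of 12/10/2012 to second half of 20/10/2012 "

# Two digits, slash, two digits, slash, four digits.
DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")

def extract_dates(text, fmt="%d/%m/%Y"):
    """Return a datetime for every dd/mm/yyyy-shaped match in text.

    fmt is an assumption about the input; pass "%m/%d/%Y" for month-first data.
    strptime raises ValueError if a match is not a real calendar date.
    """
    return [datetime.strptime(match, fmt) for match in DATE_RE.findall(text)]

dates = extract_dates(a)
for d in dates:
    print(d.date())
```

Passing the format explicitly avoids the ambiguity between day-first and month-first readings that a guessing parser has to resolve on its own.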