2014-09-23
0

Regular expression to extract the category/sub-category from a URL with Python

I have URLs of the form:

www.my-journal.com/category/sub-category/sub-sub-category/title 
www.my-journal.com/category/sub-category/sub-sub-category 
www.my-journal.com/category/sub-category/ 
www.my-journal.com/category/ 
www.my-journal.com 

where the category, sub-category and sub-sub-category vary.

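What regular expression can I use to extract the category, sub-category and sub-sub-category when they are present? Is there a better way to get at these values?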

Answers

3

Why don't you just split the string on "/"?

categories = url.split('/')[1:]
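
Note that a plain split keeps the title segment and yields an empty string for a trailing slash. A minimal sketch of one way to handle that, assuming at most three category levels as in the question; the function name split_categories and the max_levels parameter are made up for illustration:

```python
from urllib.parse import urlsplit

def split_categories(url, max_levels=3):
    """Return up to max_levels non-empty path segments of url."""
    # urlsplit only recognises the host when a scheme or '//' is present,
    # so prepend '//' to bare URLs such as 'www.my-journal.com/category'.
    if '//' not in url:
        url = '//' + url
    path = urlsplit(url).path
    # Drop the empty strings produced by leading and trailing slashes.
    segments = [s for s in path.split('/') if s]
    return segments[:max_levels]
```

Missing levels simply make the returned list shorter, so the caller can check its length instead of testing each variable separately.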

2

>>> import re
>>> txt = 'www.my-journal.com/category/sub-category/sub-sub-category/title'
>>> re.findall(r'/[^/]*', txt)
['/category', '/sub-category', '/sub-sub-category', '/title']

If you only need up to 3 levels, then maybe:

>>> matches = re.finditer(r'/([^/]*)', txt)  # avoid shadowing the built-in iter
>>> for _, m in zip(range(3), matches):
...     print(m.group(1))
...
category
sub-category
sub-sub-category
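
If you would rather get each level by name, a single pattern with optional named groups is another possibility. This is only a sketch, assuming the URLs carry no scheme (no "http://") as in the question; the names URL_PATTERN and parse_levels and the group names are invented for this example:

```python
import re

# Hypothetical pattern: the host, then up to three optional path segments.
URL_PATTERN = re.compile(
    r'[^/]+'                               # host, e.g. www.my-journal.com
    r'(?:/(?P<category>[^/]+))?'           # first level, if present
    r'(?:/(?P<sub_category>[^/]+))?'       # second level, if present
    r'(?:/(?P<sub_sub_category>[^/]+))?'   # third level, if present
)

def parse_levels(url):
    """Return a dict of the three levels; absent levels map to None."""
    match = URL_PATTERN.match(url)
    return match.groupdict() if match else None
```

For example, parse_levels('www.my-journal.com/category/sub-category/') leaves sub_sub_category as None, and anything after the third level (such as the title) is ignored.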