2015-12-30

The code below is a Ruby regular expression, and I want to convert it to Python. How do I change a Ruby regex into a Python regex?

add_zzim\(\'(.*?)\',\'(.*?)\',\'(?<param>.*?)\',.* 

Source HTML:

<li class="num" onClick="add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');" title="contents.">14913089</li> 
<li class="num" onClick="add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');" title="contents.">14913012</li> 
<li class="num" onClick="add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');" title="contents.">14913045</li> 

You can just take the third element; try it out on https://regex101.com – vks

Answers


Here is a non-regex approach.

To extract the onclick attribute values we use the BeautifulSoup HTML parser; to extract the add_zzim() argument values we use ast.literal_eval().

Complete working example:

from ast import literal_eval 

from bs4 import BeautifulSoup 

data = """ 
<ul> 
    <li class="num" onClick="add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');" title="contents.">14913089</li> 
    <li class="num" onClick="add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');" title="contents.">14913012</li> 
    <li class="num" onClick="add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');" title="contents.">14913045</li> 
</ul> 
""" 

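soup = BeautifulSoup(data, "html.parser")

# html.parser lowercases attribute names, so onClick is read back as "onclick"
for li in soup.select("li.num"):
    # strip the function name and the trailing ";" so what remains is a Python tuple literal
    args = literal_eval(li["onclick"].replace("add_zzim", "").rstrip(";"))
    print(args)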

Prints:

('BD_AD_08', '14913089', 'helloooo', '3586312774', 'test') 
('BD_AD_08', '14913012', 'helloooo', '3586312774', 'test') 
('BD_AD_08', '14913041', 'helloooo', '3586312774', 'test') 
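Below, the literal_eval trick from the loop above is applied to a single onclick value using only the standard library. The string is copied from the question's HTML. Since the result is an ordinary tuple, the "helloooo" value is simply its third element. This is a minimal sketch and assumes every onclick value has exactly this add_zzim('…', …); shape.

```python
from ast import literal_eval

# One onclick value copied from the question's sample HTML
onclick = "add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');"

# Removing "add_zzim" and the trailing ";" leaves "('BD_AD_08', ..., 'test')",
# a valid Python tuple literal that literal_eval can parse without running code
args = literal_eval(onclick.replace("add_zzim", "").rstrip(";"))
print(args[2])  # helloooo
```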
import re
# the "ur" string prefix is a SyntaxError in Python 3; a raw string is all the pattern needs
p = re.compile(r"add_zzim\('(.*?)','(.*?)','(.*?)',.*")
test_str = "<li class=\"num\" onClick=\"add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');\" title=\"contents.\">14913089</li>\n<li class=\"num\" onClick=\"add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');\" title=\"contents.\">14913012</li>\n<li class=\"num\" onClick=\"add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');\" title=\"contents.\">14913045</li>\n"

# findall returns one tuple of the three groups per line; index 2 is the third argument
for i in p.findall(test_str):
    print(i[2])

This gives you a list of match tuples. The third element of each tuple is what the Ruby pattern captured as 'param'.
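You can also keep the named group from the Ruby pattern. Python's re module writes a named group as (?P<name>...) and does not accept Ruby's (?<name>...) spelling, so that is the one change the pattern needs. Inside a double-quoted raw string, the \' escapes can also be dropped. Here is a sketch against the question's sample HTML. The variable names are illustrative.

```python
import re

# Ruby's (?<param>...) is written (?P<param>...) in Python's re module
pattern = re.compile(r"add_zzim\('(.*?)','(.*?)','(?P<param>.*?)',.*")

html = """<li class="num" onClick="add_zzim('BD_AD_08','14913089','helloooo','3586312774','test');" title="contents.">14913089</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913012','helloooo','3586312774','test');" title="contents.">14913012</li>
<li class="num" onClick="add_zzim('BD_AD_08','14913041','helloooo','3586312774','test');" title="contents.">14913045</li>
"""

# finditer yields match objects, so the captured value can be read by group name;
# ".*" does not cross newlines, so each <li> line produces exactly one match
params = [m.group("param") for m in pattern.finditer(html)]
print(params)  # ['helloooo', 'helloooo', 'helloooo']
```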


Thanks, but I only want to extract the "helloooo" text. –


@차재엽 There you go. –
