Modifying a group within a regular expression match

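So I have a function that is part of my Django (v1.5) model. It takes a body of text and finds all of my tags, converting the ones addressed to the current user and removing all of the others.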

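The function below currently works, but it requires me to use note_tags = '.*?\r\n', because tag group 0 finds all of the tags, regardless of whether the user's nickname is in there. So I'm curious how I would use the groups so that I can remove all of the unneeded tags without having to modify the regex.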

import re

def format_for_user(self, user):
    body = self.body
    # Matches any note, whoever it is addressed to
    note_tags = '<note .*?>.*?</note>\r\n'
    user_msg = False
    if user is not None:
        # Group 1 captures the opening tag addressed to this user
        user_tags = '(<note %s>).*?</note>' % user.nickname
        user_tags = re.compile(user_tags)
        for tag in user_tags.finditer(body):
            if tag.group(1):
                replacement = str(tag.group(1))
                body = body.replace(replacement, '<span>')
                # The last 7 characters of the whole match are '</note>'
                replacement = str(tag.group(0)[-7:])
                body = body.replace(replacement, '</span>')
                user_msg = True
                note_tags = '<note .*?>.*?</span>\r\n'
    note_tags = re.compile(note_tags)
    for tag in note_tags.finditer(body):
        body = body.replace(tag.group(0), '')
    return (body, user_msg)


2014-09-20 badisa
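
As a rough sketch of how the groups could do this work in a single pass: one pattern with named groups can match every note, and a replacement callback passed to `re.sub` can decide per match whether to keep or drop it. The function name `format_for_nickname`, the group names, and the sample text are made up for illustration, and the pattern assumes notes are never nested.

```python
import re

# One pattern for every note: who it is for, its text, and an optional line ending
NOTE_RE = re.compile(
    r"<note (?P<nick>[^>]*)>(?P<text>.*?)</note>(?P<eol>(?:\r\n)?)",
    re.DOTALL,
)


def format_for_nickname(body, nickname):
    """Turn notes addressed to `nickname` into <span>s and drop all other notes."""
    kept = []

    def swap(match):
        # Plain string comparison, so the nickname never has to be regex-escaped
        if nickname is not None and match.group("nick") == nickname:
            kept.append(match.group("text"))
            return "<span>%s</span>%s" % (match.group("text"), match.group("eol"))
        return ""  # someone else's note: remove it together with its line ending

    new_body = NOTE_RE.sub(swap, body)
    return new_body, bool(kept)


sample = "Intro\r\n<note alice>Hi Alice</note>\r\n<note bob>Hi Bob</note>\r\nOutro"
print(format_for_nickname(sample, "alice"))
```

Because each match is rewritten where it was found, nothing in the body is touched by accident, and the pattern never has to change depending on whether the user had a note.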

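Is there a reason you're using `re` to parse your HTML (http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) instead of an actual HTML library like `BeautifulSoup`? Not that what you're trying to do is impossible, but consider that it's trivial with an HTML library, that you don't know how to write the regex for it, that you're having to do clunky things like slicing off the last 7 characters of a string, that your code has a bug because you're using `str.replace` on things that could appear multiple times, and so on... – abarnert 2014-09-20 05:14:21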

Didn't realize there was an alternative. Will check out Beautiful Soup. – badisa 2014-09-20 19:31:21
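
The `str.replace` problem mentioned in the comment can be seen in isolation. In this small illustration (the sample text is invented), the last seven characters of any match are `</note>`, so replacing that string rewrites every closing tag in the body, not just the one belonging to the matched note:

```python
import re

body = (
    "<note alice>For Alice</note>\r\n"
    "<note bob>For Bob</note>\r\n"
)

match = re.search(r"(<note alice>).*?</note>", body)
closing = match.group(0)[-7:]  # '</note>'

# str.replace has no idea which match the text came from,
# so Bob's closing tag is rewritten as well
replaced = body.replace(closing, "</span>")
print(replaced.count("</span>"))  # 2
print("</note>" in replaced)      # False
```

This is why the function in the question has to switch `note_tags` to a pattern ending in `</span>` once a user note has been found.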

So abarnert is correct: I shouldn't be using regex to parse my HTML, and should instead use something along the lines of BeautifulSoup.

So I used BeautifulSoup. This is the resulting code, which solves a lot of the problems the regex version had.

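from bs4 import BeautifulSoup  # assumes Beautiful Soup 4, installed as bs4

def format_for_user(self, user):
    body = self.body
    # Naming the parser is an assumption; it keeps output the same across installs
    soup = BeautifulSoup(body, 'html.parser')
    user_msg = False
    if user is not None:
        # Notes carry the nickname as a class here, e.g. <note class="nick">
        user_tags = soup.findAll('note', {"class": "%s" % user.nickname})
        for tag in user_tags:
            tag.name = 'span'
            user_msg = True  # at least one note was addressed to this user
    # Anything still named 'note' belongs to someone else, so drop it
    all_tags = soup.findAll('note')
    for tag in all_tags:
        tag.decompose()
    soup = soup.prettify()
    return (soup, user_msg)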


2014-09-29 03:32:07 badisa
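
To try the same idea outside a Django model, here is a self-contained sketch. It assumes Beautiful Soup 4 is installed as `bs4` and that notes carry the nickname in a `class` attribute, as in the answer. The function name `format_notes` and the sample markup are hypothetical. It returns `str(soup)` rather than `prettify()` so the result is easier to compare.

```python
from bs4 import BeautifulSoup


def format_notes(body, nickname):
    """Rename notes for `nickname` to <span> and remove every other note."""
    soup = BeautifulSoup(body, "html.parser")
    user_msg = False
    if nickname is not None:
        for tag in soup.find_all("note", class_=nickname):
            tag.name = "span"  # renaming keeps the tag's contents and attributes
            user_msg = True
    # Whatever is still a <note> was addressed to someone else
    for tag in soup.find_all("note"):
        tag.decompose()
    return str(soup), user_msg


sample = (
    '<p>Hi</p>'
    '<note class="alice">For Alice</note>'
    '<note class="bob">For Bob</note>'
)
html, found = format_notes(sample, "alice")
print(html)
print(found)
```

Renaming the tag in place avoids the string slicing and global `str.replace` calls of the regex version, since each change applies only to the tag object it is made on.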
