Regex to capture a variable number of groups

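This is not a question about how to use re.findall(), the global modifier (?g), or \g. It asks how to match n groups with a single regular expression, where n is between 3 and 5, under the following rules: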

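Ignore any line whose first non-whitespace character is # (a comment).
Always capture at least three items, in this order: ITEM1, ITEM2, ITEM3
- class ITEM1(stuff)
- model = ITEM2
- fields = (ITEM3)
Also capture either of the following, if present (their order is unknown, and either may be missing):
- write_once_fields = (ITEM4)
- required_fields = (ITEM5)
I need to know which one matched, so either retrieve the matches in order (with None for a missing item) or retrieve (key, value) pairs.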

My question is whether this is possible, and if so, how?

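I've gotten this far, but it doesn't handle comments, the unknown order, or missing items, and it doesn't stop searching for this particular regex when it reaches the next class definition. https://www.regex101.com/r/cG5nV9/8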

(?s)\nclass\s(.*?)(?=\() 
.*? 
model\s=\s(.*?)\n 
.*? 
(?=fields.*?\((.*?)\)) 
.*? 
(?=write_once_fields.*?\((.*?)\)) 
.*? 
(?=required_fields.*?\((.*?)\))

Do I need a conditional?
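A hedged sketch (not from the thread) suggesting that a conditional may not be needed: after the class line, each field gets its own zero-width lookahead, each lookahead is kept inside the current class by a tempered token `(?:(?!^class\b).)*?`, and the two optional fields wrap their contents in `(?:...)?` so the lookahead always succeeds and leaves the group as `None` when the field is absent. It assumes unindented `class` lines, parenthesized field values, and the made-up `SAMPLE` input; `IN_BLOCK` and `field` are hypothetical names. Comment lines are skipped implicitly, because each field name must follow nothing but whitespace at the start of its line.

```python
import re

# Hypothetical input: two class blocks, one commented-out line, and
# optional fields that are present in one class but not the other.
SAMPLE = """\
class Foo(Base):
    model = FooModel
    # write_once_fields = (ignored,)
    fields = (a, b)
    required_fields = (a,)

class Bar(Base):
    model = BarModel
    fields = (c,)
    write_once_fields = (c,)
"""

# Tempered token: consume any character (DOTALL), but never step onto
# a line that starts a new top-level "class" definition.
IN_BLOCK = r"(?:(?!^class\b).)*?"


def field(name, group):
    # Pattern for "name = (...)" at the start of a line. Only spaces or
    # tabs may precede the name, so "# name = ..." comments never match.
    return rf"^[ \t]*{name}[ \t]*=[ \t]*\((?P<{group}>[^)]*)\)"


PATTERN = re.compile(
    r"^class[ \t]+(?P<item1>\w+)"
    # Required items: plain lookaheads, so the match fails without them.
    + rf"(?={IN_BLOCK}^[ \t]*model[ \t]*=[ \t]*(?P<item2>\S+))"
    + rf"(?={IN_BLOCK}{field('fields', 'item3')})"
    # Optional items: the inner group is optional, so the lookahead
    # always succeeds and the named group stays None if nothing matched.
    + rf"(?=(?:{IN_BLOCK}{field('write_once_fields', 'item4')})?)"
    + rf"(?=(?:{IN_BLOCK}{field('required_fields', 'item5')})?)",
    re.MULTILINE | re.DOTALL,
)

rows = [m.group("item1", "item2", "item3", "item4", "item5")
        for m in PATTERN.finditer(SAMPLE)]
for row in rows:
    print(row)
```

For this sample, `Foo` has no live `write_once_fields` (its only occurrence is commented out) and `Bar` has no `required_fields`, so those slots come back as `None`. The tempered token rescans the class body once per lookahead, which is fine for small files but is one reason the answers below prefer a line-by-line loop.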

Thanks for any tips.


2015-03-02 ehacinom

This doesn't look like a good fit for regular expressions. You should parse it instead. – 2015-03-02 20:34:29

@AdamSmith Is there a way to parse it without regex? – ehacinom 2015-03-02 21:02:09

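AdamSmith is right: just loop over the lines in the file, skip any that start with '#', then pick a function to parse each line based on its first word. You can build a list of pairs (or anything else) and validate the result. – jjm 2015-03-02 21:06:56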
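The approach described in that comment can be sketched as a dispatch table keyed on each line's first word, followed by a validation pass. This is a minimal sketch, not code from the thread: `PARSERS`, `parse_lines`, and `group_and_validate` are hypothetical names, and it assumes class names are plain identifiers and that every assignment line contains `=`.

```python
import re


def parse_class(rest):
    # " Foo(Base):" -> ("class", "Foo")
    return ("class", re.match(r"\s*(\w+)", rest).group(1))


def parse_assignment(key):
    # Build a parser for "key = value" lines; rest is the text after key.
    def parse(rest):
        return (key, rest.split("=", 1)[1].strip())
    return parse


# Hypothetical dispatch table: first word of a line -> parser function.
PARSERS = {
    "class": parse_class,
    "model": parse_assignment("model"),
    "fields": parse_assignment("fields"),
    "write_once_fields": parse_assignment("write_once_fields"),
    "required_fields": parse_assignment("required_fields"),
}


def parse_lines(lines):
    """Return (key, value) pairs for every recognized line."""
    pairs = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        first, rest = re.match(r"(\w*)(.*)", stripped).groups()
        parser = PARSERS.get(first)
        if parser is not None:
            pairs.append(parser(rest))
    return pairs


def group_and_validate(pairs):
    """Group pairs by class and require model and fields in each one."""
    classes, current = {}, None
    for key, value in pairs:
        if key == "class":
            current = classes[value] = {}
        elif current is not None:
            current[key] = value
    for name, data in classes.items():
        missing = [k for k in ("model", "fields") if k not in data]
        if missing:
            raise ValueError(f"class {name} is missing {missing}")
    return classes
```

Adding another recognized key only means adding one entry to `PARSERS`, which is the main advantage over a chain of `if` statements.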

I'd do something like this:

from collections import defaultdict
import re

comment_line = re.compile(r"\s*#")
matches = defaultdict(dict)

with open('path/to/file.txt') as inf:
    d = {}  # catches and discards any key lines that appear outside a class
    for line in inf:
        if comment_line.match(line):
            continue  # skip comment lines
        line = line.strip()  # class bodies are usually indented
        if line.startswith('class '):
            # "class ITEM1(stuff):" -> "ITEM1"
            classname = line.split()[1].split('(')[0]
            d = matches[classname]
        elif line.startswith('model'):
            d['model'] = line.split('=', 1)[1].strip()
        elif line.startswith('fields'):
            d['fields'] = line.split('=', 1)[1].strip()
        elif line.startswith('write_once_fields'):
            d['write_once_fields'] = line.split('=', 1)[1].strip()
        elif line.startswith('required_fields'):
            d['required_fields'] = line.split('=', 1)[1].strip()

You could probably do this more easily with regex matching:

from collections import defaultdict
import re

comment_line = re.compile(r"\s*#")
class_line = re.compile(r"\s*class\s+(?P<classname>\w+)")
possible_keys = ["model", "fields", "write_once_fields", "required_fields"]
data_line = re.compile(r"\s*(?P<key>" + "|".join(possible_keys) +
                       r")\s*=\s*(?P<value>.*)")
matches = defaultdict(dict)

with open(...) as inf:  # same file as in the first snippet
    d = {}  # default catcher, as above
    for line in inf:
        if comment_line.match(line):
            continue
        class_match = class_line.match(line)
        if class_match:
            d = matches[class_match.group('classname')]
            continue  # there won't be more than one match per line
        data_match = data_line.match(line)
        if data_match:
            key, value = data_match.group('key', 'value')
            d[key] = value.strip()

But that may be harder to follow. YMMV.
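Either loop leaves `matches` as a `{classname: {key: value}}` mapping. The question asked for the items in a fixed order with `None` for anything missing; a small helper like the hypothetical `as_rows` below could produce that. The `matches` literal is made-up example data standing in for a parsed file.

```python
# Keys in the order the question asks for: ITEM2 .. ITEM5.
KEYS = ("model", "fields", "write_once_fields", "required_fields")


def as_rows(matches):
    """Turn {classname: {key: value}} into (ITEM1, ..., ITEM5) tuples,
    using None for any key that never appeared in that class."""
    return [(name,) + tuple(fields.get(key) for key in KEYS)
            for name, fields in matches.items()]


# Hypothetical result of either loop above, for two classes.
matches = {
    "Foo": {"model": "FooModel", "fields": "(a, b)", "required_fields": "(a,)"},
    "Bar": {"model": "BarModel", "fields": "(c,)", "write_once_fields": "(c,)"},
}

for row in as_rows(matches):
    print(row)
```

A row with `None` in its second or third slot is a class that broke the "at least three items" rule, so it can be reported or filtered out at this point.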


2015-03-02 21:48:22
