Can't understand findall() and grouping (Python)

I'm currently reading the book "Automate the Boring Stuff with Python", but I'm stuck on one line of code in the Chapter 7 project. I can't follow the author's logic here.

The question itself is at the end. Project: Phone Number and Email Address Extractor. https://automatetheboringstuff.com/chapter7

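The project outline is:

Your phone number and email address extractor needs to do the following:

- Gets the text off the clipboard.

- Finds all phone numbers and email addresses in the text.

- Pastes them onto the clipboard.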

The code is below:

import re, pyperclip 

#extracts phone number 
phoneRegex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?    # area code -> either 561 or (561) 
    (\s|-|\.)?      # separator (if there is) 
    (\d{3})       # first 3 digits 
    (\s|-|\.)      # separator 
    (\d{4})       # last 4 digits 
    (\s*(ext|x|ext\.)\s*(\d{2,5}))? # extension 
    )''', re.VERBOSE) 

#extracts email 
emailRegex= re.compile(r'''(
    [a-zA-Z0-9._%+-]+    # username 
    @        # @symbol 
    [a-zA-Z0-9.-]+       # domain name 
    (\.[a-zA-Z]{2,4})    # dot something 
    )''',re.VERBOSE) 

# find matches in clipboard text. 
text = str(pyperclip.paste())    # read the clipboard contents into 'text' 
matches = [] 
for groups in phoneRegex.findall(text):    
    # groups[1] = area code, groups[3] = first 3 digits, groups[5] = last 4 digits
    phoneNum = '-'.join([groups[1], groups[3], groups[5]])
    if groups[8] != '':    # groups[8] = extension digits, '' when there is no extension
        phoneNum += ' x' + groups[8] 
    matches.append(phoneNum) 

for groups in emailRegex.findall(text): 
    matches.append(groups[0]) 

#Copy results to the clipboard. (our new string) 

if len(matches) > 0: 
    pyperclip.copy('\n'.join(matches)) 
    print('Copied to clipboard:') 
    print('\n'.join(matches)) 
else: 
    print('No phone numbers or email addresses found.')

The part where I'm stuck is this segment:

for groups in phoneRegex.findall(text):    
    phoneNum = '-'.join([groups[1], groups[3], groups[5]]) # area code, first 3 digits, last 4 digits of phone number 
    if groups[8] != '': 
        phoneNum += ' x' + groups[8] 
    matches.append(phoneNum)

The author explains that these are the area code, the first 3 digits, and the last 4 digits extracted from the phone number:

groups[1],groups[3],groups[5]

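But this doesn't make sense to me. Note that the for loop iterates over each element, so 'groups' is not the whole list, just one element of it. So groups[1] would be the second character of that element, not an actual element.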

To illustrate my problem better, here is another example:

num= re.compile(r'(\d+)') 
for groups in num.findall('Extract all 23 numbers 444 from 2414 at, 1'): 
    print(groups)

Output:

23
444
2414
1

for groups in num.findall('Extract all 23 numbers 444 from 2414 at, 1'): 
    print(groups[0])

Output:

2
4
2
1

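So groups[0] is not the element, just one digit of the element.
I hope this makes sense, because I'm having a hard time understanding his reasoning. Any help would be appreciated.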

UPDATE: It looks like groups[0] is the first element of a tuple:

num= re.compile(r'(\d+)\D+(\d+)\D+(\d+)') 
for groups in num.findall('Extract all 23 numbers 444 from 2414 at, 10,434,555'): 
    print(groups[0])

Output:

23 
10

2016-10-03 tadm123

Run your experiment with more than one group in the regex to see the difference, then read the [`re.findall` documentation](https://docs.python.org/2/library/re.html#re.findall). – user2357112

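The regexes for phone numbers and emails are completely bogus and fail to recognize valid email addresses and most phone numbers. E.g. the local part of an email address can contain [many more characters](http://stackoverflow.com/a/2049510/1307905). Only about 5% of the phone numbers in my address book match the phone regex. If the book doesn't explicitly mention this flaw, I wouldn't trust the rest of it. – Anthon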

I'd been scratching my head for hours. I just needed a little guidance, and now I see the logic. @user2357112 thanks a lot. – tadm123

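When the pattern contains more than one group, findall() returns a list of tuples, and a for loop hands you those tuples one at a time. (With a single group or no group at all, it returns a list of plain strings, which is why indexing gave you single digits in your first experiment.)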
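
To make the rule concrete, here is a minimal sketch of the three result shapes findall() can produce. The sample text and variable names are made up for illustration and do not come from the book.

```python
import re

text = 'Call 415-555-4242 or 212-555-0000'  # made-up sample text

# No groups: each item is the full matched string.
no_groups = re.findall(r'\d{3}-\d{3}-\d{4}', text)

# One group: each item is the string captured by that single group.
one_group = re.findall(r'(\d{3})-\d{3}-\d{4}', text)

# Two or more groups: each item is a tuple with one string per group.
two_groups = re.findall(r'(\d{3})-(\d{3}-\d{4})', text)

print(no_groups)   # ['415-555-4242', '212-555-0000']
print(one_group)   # ['415', '212']
print(two_groups)  # [('415', '555-4242'), ('212', '555-0000')]

# Indexing a string item yields a character; indexing a tuple item yields a group.
print(one_group[0][0])   # 4
print(two_groups[0][0])  # 415
```

The book's phone regex contains nine groups, so each item from findall() is a nine-element tuple: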

for groups in phoneRegex.findall(text):    
    phoneNum= '-'.join([groups[1],groups[3],groups[5]]) 
    print(groups) # add this line to see what each item looks like

The result is:

('800.420.7240', '800', '.', '420', '.', '7240', '', '', '') # one of the tuples returned by findall()
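
One way to see which tuple index holds which piece is to unpack the tuple into names. The sketch below reuses the pattern from the question (with the dot in `ext.` escaped) but skips the clipboard. The sample string and all variable names are assumptions for illustration only.

```python
import re

phone_regex = re.compile(r'''(
    (\d{3}|\(\d{3}\))?              # group 2: area code
    (\s|-|\.)?                      # group 3: separator
    (\d{3})                         # group 4: first 3 digits
    (\s|-|\.)                       # group 5: separator
    (\d{4})                         # group 6: last 4 digits
    (\s*(ext|x|ext\.)\s*(\d{2,5}))? # groups 7-9: extension
    )''', re.VERBOSE)

sample = 'Office: 800.420.7240, desk: 415-555-1234 x99'  # made-up text

results = []
for groups in phone_regex.findall(sample):
    # Tuple index i holds regex group i + 1, so groups[0] is the outer
    # group 1 (the whole number) and groups[8] is group 9 (extension digits).
    whole, area, sep1, first3, sep2, last4, ext, ext_word, ext_digits = groups
    phone = '-'.join([area, first3, last4])
    if ext_digits != '':  # groups that did not take part in the match are ''
        phone += ' x' + ext_digits
    results.append(phone)

print(results)  # ['800-420-7240', '415-555-1234 x99']
```

Here `area`, `first3` and `last4` correspond to groups[1], groups[3] and groups[5] in the book's loop.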

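On a match object, group(0) is always the text matched by the entire regex. In the findall() tuples above, groups[0] is group 1, which also covers the whole number only because the pattern is wrapped in an outer pair of parentheses: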

>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)') 
>>> mo = phoneNumRegex.search('My number is 415-555-4242.') 
>>> mo.group(0) 
'415-555-4242'
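
If counting parentheses gets confusing, one alternative is to loop over match objects with re.finditer() and use named groups. This is only a sketch: the group names `area` and `number` and the sample text are my own choices, not something the book uses.

```python
import re

# Named groups are an illustrative choice, not part of the book's code.
phone_regex = re.compile(r'(?P<area>\d{3})-(?P<number>\d{3}-\d{4})')

text = 'Cell: 415-555-9999 Work: 212-555-0000'  # made-up sample text

for mo in phone_regex.finditer(text):
    # group(0) is the whole match; named groups avoid counting parentheses.
    print(mo.group(0), mo.group('area'), mo.group('number'))

# findall() on the same pattern has no slot for group 0:
# it returns only the captured groups, as tuples.
print(phone_regex.findall(text))  # [('415', '555-9999'), ('212', '555-0000')]
```

With match objects you keep access to group(0), so the outer pair of parentheses that the book's pattern needs for findall() becomes unnecessary.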

2017-04-25 07:12:24 Handcho
