Python - Trying to capture the middle of a line, regex or split

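I have a text file that contains some names and emails along with other content. I want to capture the email addresses.

I don't know whether this is a split problem or a regex problem.

Here are some sample lines: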

[name]bill billy [email][email protected] [dob]01.01.81 
[name]mark hilly [email][email protected] [dob]02.11.80 
[name]gill silly [email][email protected] [dob]03.12.79

I would like to be able to write a loop that prints all the email addresses.

Thanks.

Source

2013-05-10 gjels

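Is getting the emails the only thing you will ever want to do, or is it possible you will want to do more with the information later? If the latter, I think you definitely want Blender's answer. Anything that relies on a plain `split` to break up the fields (as most of the answers here do) will never work for `name`; anything that relies on splitting around the bracketed tags will probably be more complicated than the regex (although I'd love to be proven wrong). – abarnert 2013-05-10 21:39:30

I think I might want to use the name later to write a more specific mail – gjels 2013-05-11 09:40:26

I would use a regex: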

import re 

data = '''[name]bill billy [email][email protected] [dob]01.01.81 
[name]mark hilly [email][email protected] [dob]02.11.80 
[name]gill silly [email][email protected] [dob]03.12.79''' 

group_matcher = re.compile(r'\[(.*?)\]([^\[]+)') 

for line in data.split('\n'): 
    o = dict(group_matcher.findall(line)) 
    print o['email']

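\[ is just a literal [.
(.*?) is a non-greedy capturing group. It "expands" only as far as needed to capture the text.
\] is a literal ].
( is the start of a capturing group.
[^\[] matches any character except [.
+ repeats the previous pattern one or more times.
) closes the capturing group.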
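To see what those pieces produce together, here is a minimal Python 3 sketch of the same pattern. The sample addresses are made-up placeholders, and `field_pattern`, `parse_line` and `records` are names chosen only for this illustration.

```python
import re

# Hypothetical sample lines; the addresses are made-up placeholders.
data = """[name]bill billy [email]bill@example.com [dob]01.01.81
[name]mark hilly [email]mark@example.com [dob]02.11.80
[name]gill silly [email]gill@example.com [dob]03.12.79"""

# Same pattern as above: a bracketed tag followed by everything up to the next '['.
field_pattern = re.compile(r'\[(.*?)\]([^\[]+)')

def parse_line(line):
    """Return a dict mapping each tag name to its value, stripped of whitespace."""
    return {tag: value.strip() for tag, value in field_pattern.findall(line)}

# findall returns one (tag, value) tuple per field on the line.
print(field_pattern.findall(data.splitlines()[0]))
# [('name', 'bill billy '), ('email', 'bill@example.com '), ('dob', '01.01.81')]

records = [parse_line(line) for line in data.splitlines()]
for record in records:
    print(record['email'])
```

Because every field ends up in the dict, the name and date of birth are available from the same pass, for example `records[0]['name']`.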

Source

2013-05-10 21:31:36 Blender

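Is there a website that explains in simple terms how (r'\[(.*?)\]([^\[]+)') works? I'd like to understand it rather than just copy it. But this works great! – gjels 2013-05-11 09:39:02

@gjels: Not that I know of. See my edit for a brief explanation. – Blender 2013-05-11 16:30:32

@gjels: Python has a [HOWTO](http://docs.python.org/2/howto/regex.html) in its documentation, but it may not be the most beginner-friendly tutorial; try Googling for others. Also get a regex explorer program (there are countless ones for every platform, plus plenty of online ones), which will help you play around with things. Just make sure you learn the Python syntax; perl/PCRE is close enough, but grep, emacs, etc. are very different. – abarnert 2013-05-12 10:18:51

You can split on whitespace and then look for the element that starts with [email]: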

line = '[name]bill billy [email][email protected] [dob]01.01.81' 
items = line.split() 
for item in items: 
    if item.startswith('[email]'): 
        print item.replace('[email]', '', 1)
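As a comment on the question points out, a whitespace split works for the email field but not for a field whose value contains spaces, such as the name. A small Python 3 sketch of that limitation, using a made-up placeholder address and a hypothetical helper called `field_by_whitespace_split`:

```python
# Hypothetical sample line; the address is a made-up placeholder.
line = '[name]bill billy [email]bill@example.com [dob]01.01.81'

def field_by_whitespace_split(line, field):
    """Return the text of the first whitespace-separated item that starts with [field]."""
    tag = '[{}]'.format(field)
    for item in line.split():
        if item.startswith(tag):
            return item[len(tag):]
    return None  # the tag was not found on this line

print(field_by_whitespace_split(line, 'email'))  # bill@example.com
print(field_by_whitespace_split(line, 'name'))   # bill (the surname is lost)
```

The surname ends up in a separate item of the split, so this approach is only safe for values that never contain whitespace.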

Source

2013-05-10 21:26:02

for line in lines: 
    print line.split("]")[2].split(" ")[0]

Source

2013-05-10 21:29:40 Jani

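@karthikr No, it isn't – cmd 2013-05-10 22:19:18

@karthikr: I think you forgot that `[name` will be the first value in the split. – abarnert 2013-05-10 22:33:03

@abarnert He had a different answer earlier, and my comment was about that one – karthikr 2013-05-10 22:53:03

You can split on a substring, not just on a single character, so: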

email = line.partition('[email]')[-1].partition('[')[0].rstrip()

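This has the advantage over the solutions that use a simple split that it works when values can contain spaces, on lines that have the fields in a different order (even if [email] is the last field), and so on.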
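A short Python 3 sketch of those cases, assuming made-up placeholder addresses; `get_email` is just a hypothetical wrapper around the one-liner above:

```python
def get_email(line):
    """Text between '[email]' and the next '[' (or the end of the line)."""
    return line.partition('[email]')[-1].partition('[')[0].rstrip()

# Hypothetical lines; the addresses are made-up placeholders.
original_order = '[name]bill billy [email]bill@example.com [dob]01.01.81'
email_last = '[name]mark hilly [dob]02.11.80 [email]mark@example.com'
no_email = '[name]gill silly [dob]03.12.79'

print(get_email(original_order))  # bill@example.com
print(get_email(email_last))      # mark@example.com
print(repr(get_email(no_email)))  # '' because partition found no [email] tag
```

When the tag is missing, `partition` returns the whole line followed by two empty strings, so the result is an empty string rather than an exception.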

To generalize it:

def get_field(line, field): 
    return line.partition('[{}]'.format(field))[-1].partition('[')[0].rstrip()

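However, I think it's more complicated than the regex solution. It also searches for only one field at a time rather than all fields at once (not without making it even more complicated). To get two fields, you end up parsing each line twice, like this: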

for line in data.splitlines(): 
    print '''{} "babysat" Dan O'Brien on {}'''.format(get_field(line, 'name'), 
                 get_field(line, 'dob'))

(I may have misunderstood the DOB field, of course.)

Source

2013-05-10 21:42:25 abarnert

Say you have a file of lines.

import re 

f = open("logfile", "r") 
data = f.read() 

for line in data.split("\n"): 
    # raw string so that \] and \[ reach the regex engine unchanged
    match = re.search(r"email\](?P<id>.*)\[dob", line) 
    if match: 
        # either store or print the emails as you like 
        print match.group('id').strip(), "\n"

That's all (try it; for Python 3 and above, remember that print is a function and change the code accordingly)!

Output from the sample data:

[email protected] 

[email protected] 

[email protected] 

>>>
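For reference, a Python 3 sketch of the same idea might look like the following. The `io.StringIO` object is a stand-in for the opened file so the snippet runs on its own, the addresses are made-up placeholders, and `email_pattern` and `emails` are names chosen for this illustration.

```python
import io
import re

# Stand-in for the opened log file; the addresses are made-up placeholders.
log_file = io.StringIO(
    "[name]bill billy [email]bill@example.com [dob]01.01.81\n"
    "[name]mark hilly [email]mark@example.com [dob]02.11.80\n"
    "[name]gill silly [email]gill@example.com [dob]03.12.79\n"
)

# Raw string, compiled once; the named group 'id' holds the text between the tags.
email_pattern = re.compile(r"email\](?P<id>.*)\[dob")

emails = []
for line in log_file:  # iterating a file object yields one line at a time
    match = email_pattern.search(line)
    if match:
        emails.append(match.group('id').strip())

for email in emails:
    print(email)  # print is a function in Python 3
```

With a real file, `with open("logfile") as log_file:` would replace the `io.StringIO` stand-in and also close the file when the loop finishes.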

Source

2013-05-10 21:43:46

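If you're just matching fixed strings like this, there's no real advantage to using a regex. Compare this with Blender's answer, which gets the names and values of all the fields rather than just one hardcoded field, and which is also more robust (for example, to the columns being rearranged so that `email` comes after `dob` instead of before it). – abarnert 2013-05-10 21:48:07

I think this is a specific problem to solve, not a general one. Generalizing expands the "scope", and robustness is all about being "fit to scope". – 2013-05-10 21:50:34

If you just want to solve the specific problem of matching fixed text strings, a regex doesn't add anything over a simple split; it just makes the code slower and needlessly hard to read. Compare JaniSOF's solution, for example. – abarnert 2013-05-10 21:53:53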
