Regex pattern to extract follower count

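I'm extracting follower counts from a string like the one below. The first pattern doesn't seem to work for a single digit. Is that because the first pattern checks for characters after the digit, and on the first line there is nothing between 4 and Followers except the space? The second pattern works fine.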

import re 

text = """ 
4 Followers 
330 Followers 
23.5k Followers 
67k Followers 
25m Followers 
""" 
print(re.compile(r'(\d.+) Followers').findall(text)) 
print(re.compile(r'(\d+|\d.+) Followers').findall(text))
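A per-line comparison makes the difference between the two patterns visible. This is a small sketch that assumes the same sample lines without the trailing spaces; the helper name `captures` is hypothetical and only used for illustration.

```python
import re

# Sample lines from the question (trailing spaces dropped for clarity).
lines = ["4 Followers", "330 Followers", "23.5k Followers", "67k Followers", "25m Followers"]

first = re.compile(r'(\d.+) Followers')
second = re.compile(r'(\d+|\d.+) Followers')

def captures(pattern, lines):
    """Return the group-1 capture for each line, or None where the pattern does not match."""
    results = []
    for line in lines:
        m = pattern.search(line)
        results.append(m.group(1) if m else None)
    return results

print(captures(first, lines))   # [None, '330', '23.5k', '67k', '25m']
print(captures(second, lines))  # ['4', '330', '23.5k', '67k', '25m']
```

Only the single-digit line fails with the first pattern; every multi-character count is found by both.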

Source

2017-09-25 Lukasz Salitra

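You should really just use https://regex101.com/ – Idos

Just split on the spaces and take the first element of the array. – StefansArya

@Idos That's what I used. I'm new to regex and was trying to understand the better pattern I came up with. –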

The \d.+ pattern matches a digit and then one or more characters other than a line break.
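A quick way to see what \d.+ grabs on its own: because `.` excludes `\n` by default, each greedy match runs to the end of its line, and `+` demands at least one character after the digit. The two-line `sample` string is an assumption made for the demo; `re.DOTALL` is the standard flag that lets `.` match newlines too.

```python
import re

sample = "4 Followers\n330 Followers"

# Without flags, '.' stops at '\n', so each greedy \d.+ match ends at its line break.
per_line = re.findall(r'\d.+', sample)
print(per_line)  # ['4 Followers', '330 Followers']

# With re.DOTALL, '.' also matches '\n', so one greedy match swallows the rest of the string.
whole = re.findall(r'\d.+', sample, re.DOTALL)
print(whole)  # ['4 Followers\n330 Followers']

# '+' needs at least one character after the digit, so a lone digit does not match.
print(re.fullmatch(r'\d.+', '4'))  # None
```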

It looks like you want to match a digit, then any non-whitespace characters up to a whitespace, and then Followers.

Use

import re

text = """
4 Followers 
330 Followers 
23.5k Followers 
67k Followers 
25m Followers 
""" 
print(re.findall(r'\b(\d\S*) Followers', text)) 
# => ['4', '330', '23.5k', '67k', '25m']

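See the regex demo and the Python demo.

Details

\b - word boundary
(\d\S*) - Group 1: a digit, then zero or more non-whitespace characters
 Followers - a literal space followed by the literal string Followers.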
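The token breakdown above can also live inside the pattern itself with `re.VERBOSE`. This is a sketch of the same regex, not a different one; `FOLLOWERS_RE` is a hypothetical name. In verbose mode unescaped whitespace is ignored, so the literal space has to be written as `[ ]` (whitespace inside a character class is kept).

```python
import re

# Same pattern as r'\b(\d\S*) Followers', annotated token by token.
FOLLOWERS_RE = re.compile(r"""
    \b          # word boundary
    (\d\S*)     # group 1: a digit, then zero or more non-whitespace characters
    [ ]         # one literal space (a bare space would be ignored in VERBOSE mode)
    Followers   # the literal word
""", re.VERBOSE)

text = "4 Followers\n330 Followers\n23.5k Followers\n67k Followers\n25m Followers\n"
counts = FOLLOWERS_RE.findall(text)
print(counts)  # ['4', '330', '23.5k', '67k', '25m']
```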

If the input is well formatted, you can also just split the string:

[x.split()[0] for x in text.splitlines() if x.strip()]

See the Python demo (output: ['4', '330', '23.5k', '67k', '25m']).
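The "well formatted" condition matters: splitting takes the first token of every line, while the regex only keeps numbers followed by " Followers". The `messy` input below, with a hypothetical "12 Following" line, is invented to show where the two approaches diverge.

```python
import re

# Hypothetical input with one line that is NOT a follower count.
messy = "4 Followers\n12 Following\n23.5k Followers\n"

# Split approach: first token of every non-blank line, whatever that line says.
by_split = [line.split()[0] for line in messy.splitlines() if line.strip()]

# Regex approach: only numbers that are followed by " Followers".
by_regex = re.findall(r'\b(\d\S*) Followers', messy)

print(by_split)  # ['4', '12', '23.5k']
print(by_regex)  # ['4', '23.5k']
```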


2017-09-25 23:04:52

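I'll use this pattern. The 'text' string is just a dummy template; the pattern is for pulling follower counts out of a scraped website, and regex happens to be much faster than bs4. Solid answer, thanks! –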

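I believe your question is why the two patterns give different results...

It's not that the first pattern fails to match a single digit; it's that
the first pattern needs a digit followed by at least 2 characters before Followers.

I don't see any ignore-whitespace modifier on the regex,
so the regex is really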

        (\d.+)[ ]Followers
           ^^ ^^^
           |   |
           |   +-- and this expects 1 more char (the space)
           +------ this expects at least 1 char
        ===================================
        total is a minimum of 2 chars between the digit and Followers
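The two-character minimum is easy to check directly. The inputs below are hypothetical strings chosen to have exactly 1 or 2 characters between the first digit and "Followers"; `capture` is an illustrative helper.

```python
import re

# The question's first pattern, with the literal space spelled [ ] as in the diagram.
first = re.compile(r'(\d.+)[ ]Followers')

def capture(s):
    """Return group 1 of the first match in s, or None if there is no match."""
    m = first.search(s)
    return m.group(1) if m else None

# 1 char, 2 chars and 2 chars between the first digit and "Followers".
samples = ["4 Followers", "42 Followers", "4x Followers"]
print([capture(s) for s in samples])  # [None, '42', '4x']
```

Any single character after the digit satisfies `.+`, which is why "4x" matches just as well as "42".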

The second regex works because its first alternative expects only 1 character (the space)
between the digit and Followers

     (\d+|\d.+)[ ]Followers
      ^^^      ^^^
       |        |
       |        +-- 1 char (the space)
       +----------- digit(s)
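To see which alternative fires, one can give each branch its own group. This is a hypothetical instrumented variant of the second pattern, not the pattern from the question; Python's `re` tries alternatives left to right, so `\d+` wins whenever the digits are followed directly by " Followers", and `\d.+` only takes over when that fails.

```python
import re

# Hypothetical variant of (\d+|\d.+) Followers with one group per alternative.
branches = re.compile(r'(?:(\d+)|(\d.+)) Followers')

def which_branch(s):
    """Return (branch_label, captured_text), or None if the pattern does not match."""
    m = branches.search(s)
    if m is None:
        return None
    if m.group(1) is not None:
        return ('digits only', m.group(1))
    return ('digit + any chars', m.group(2))

for s in ["4 Followers", "330 Followers", "23.5k Followers"]:
    print(s, '->', which_branch(s))
```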


2017-09-26 02:07:03 sln
