Regex to match all sequences of words

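I need a Python regex that matches every (non-empty) sequence of words in a string, assuming a word is any non-empty sequence of non-whitespace characters.

Something that would work like this: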

s = "ab cd efg" 
re.findall(..., s) 
# ['ab', 'cd', 'efg', 'ab cd', 'cd efg', 'ab cd efg']

The closest I've come is this, using the regex module, but it's still not what I want:

regex.findall(r"\b\S.+\b", s, overlapped=True) 
# ['ab cd efg', 'cd efg', 'efg']

Also, just to be clear, I don't want 'ab efg' in there; only contiguous runs of words.
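For illustration, a sketch that stays close to the regex idea while using only the standard re module: the capture sits inside a zero-width lookahead, so overlapping matches are allowed, and one findall runs per run length. The helper name word_sequences is hypothetical, and the approach assumes a loop over run lengths is acceptable rather than a single pattern.

```python
import re


def word_sequences(s):
    """Return every contiguous run of whitespace-separated words in s.

    Hypothetical helper: one re.findall per run length, with the capture
    inside a zero-width lookahead so overlapping runs are all found.
    """
    word_count = len(s.split())
    result = []
    for extra in range(word_count):
        # (?<!\S)  : only start at the beginning of a word
        # (?=(...)): capture `extra + 1` words without consuming them
        pattern = r"(?<!\S)(?=(\S+(?:\s+\S+){%d}))" % extra
        result.extend(re.findall(pattern, s))
    return result


print(word_sequences("ab cd efg"))
# ['ab', 'cd', 'efg', 'ab cd', 'cd efg', 'ab cd efg']
```

Because the text is captured straight from the input, any whitespace between the words is kept as written.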

Source

2017-09-26 machaerus

Since regex is greedy, you can't match 'ab cd', because any repeating regex will match all the way to the end. – HyperNeutrino
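A small sketch with the standard re module illustrating the point about greediness: from a given starting position a pattern yields a single match, the longest one with greedy .+ and the shortest one with lazy .+?, so neither returns 'ab cd' here.

```python
import re

s = "ab cd efg"
# Greedy: .+ runs to the end of the string, then backtracks to the last \b
greedy = re.match(r"\S.+\b", s).group()
# Lazy: .+? stops at the first position where \b succeeds
lazy = re.match(r"\S.+?\b", s).group()
print(greedy)  # ab cd efg
print(lazy)    # ab
```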

Why doesn't 's.split()' meet your needs? – wwii

Something like:

matches = "ab cd efg".split() 
matches2 = [" ".join(matches[i:j]) 
      for i in range(len(matches)) 
      for j in range(i + 1, len(matches) + 1)] 
print(matches2)

Output:

['ab', 'ab cd', 'ab cd efg', 'cd', 'cd efg', 'efg']
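The nested loops enumerate every pair of slice bounds i < j. A sketch of an equivalent formulation (renaming matches to words) takes those pairs from itertools.combinations, which yields them in the same order; for n words there are n(n + 1)/2 of them.

```python
import itertools

words = "ab cd efg".split()
# Each pair (i, j) with 0 <= i < j <= len(words) is one contiguous slice words[i:j]
matches2 = [" ".join(words[i:j])
            for i, j in itertools.combinations(range(len(words) + 1), 2)]
print(matches2)
# ['ab', 'ab cd', 'ab cd efg', 'cd', 'cd efg', 'efg']
```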


2017-09-26 16:51:12

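What you can do is match every word together with the whitespace that follows it, then join contiguous slices together. (This is similar to Maxim's approach, although I did come up with it independently; unlike his, it preserves whitespace.)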

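import regex

s = "ab cd efg"
# Each match is one word plus any whitespace that follows it
subs = regex.findall(r"\S+\s*", s)

def combos(l):
    out = []
    # Join every contiguous slice l[i:j] and drop the trailing whitespace
    for i in range(len(l)):
        for j in range(i + 1, len(l) + 1):
            out.append("".join(l[i:j]).rstrip())
    return out

print(combos(subs))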


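This first finds every match of \S+\s*, i.e. a word followed by any amount of whitespace, then takes all contiguous slices, joins each one, and strips the whitespace from the right.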
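A variant sketch using the standard re module instead of the third-party regex module (the helper name word_runs is hypothetical): record the start and end offsets of every \S+ match and slice the original string between them, which keeps the inner whitespace exactly as written without needing a final strip.

```python
import re


def word_runs(s):
    # Hypothetical helper: (start, end) offsets of every word (\S+) in s
    spans = [m.span() for m in re.finditer(r"\S+", s)]
    out = []
    for i in range(len(spans)):
        for j in range(i, len(spans)):
            # Slice from word i's start to word j's end, so the whitespace
            # between the words is kept exactly as it appears in s
            out.append(s[spans[i][0]:spans[j][1]])
    return out


print(word_runs("ab  cd\tefg"))
# ['ab', 'ab  cd', 'ab  cd\tefg', 'cd', 'cd\tefg', 'efg']
```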

If the whitespace is always a single space, just use Maxim's approach; it's simpler and faster, but it doesn't preserve whitespace.


2017-09-26 16:58:22 HyperNeutrino

Without regex:

import itertools

def n_wise(iterable, n=2):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    iterables = itertools.tee(iterable, n)
    # Advance the k-th copy by k items, so zipping the copies yields windows of n
    for k, it in enumerate(iterables):
        for _ in range(k):
            next(it, None)
    return zip(*iterables)

def foo(s):
    s = s.split()
    for n in range(1, len(s) + 1):
        for thing in n_wise(s, n=n):
            yield ' '.join(thing)

s = "ab cd efg hj"
result = list(foo(s))
print(result)

Output:

['ab', 'cd', 'efg', 'hj', 'ab cd', 'cd efg', 'efg hj', 'ab cd efg', 'cd efg hj', 'ab cd efg hj']
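When the input is already a list, the same windows, in the same order, can also be taken with plain slicing instead of itertools.tee; a sketch with a hypothetical helper name:

```python
def windows_by_length(s):
    # Hypothetical helper: for each window length n, slide over the word list
    words = s.split()
    return [" ".join(words[i:i + n])
            for n in range(1, len(words) + 1)
            for i in range(len(words) - n + 1)]


print(windows_by_length("ab cd efg hj"))
# ['ab', 'cd', 'efg', 'hj', 'ab cd', 'cd efg', 'efg hj', 'ab cd efg', 'cd efg hj', 'ab cd efg hj']
```

The tee version works on any iterable; slicing needs a sequence but is easier to read.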


2017-09-26 17:15:35 wwii
