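Regex to split message_txt longer than 160 characters

I want to split the message text for a messaging system into sequences of at most 160 characters, each ending in a space, unless it is the last sequence, which may end in anything as long as it is 160 characters or fewer.

The regex '.{1,160}\s' almost works, but it cuts off the last word of the message, because the last character of a message is usually not a space.

I also tried '.{1,160}\s|.{1,160}', but that doesn't work either, because the final sequence is then just whatever text remains after the last space. Does anyone have an idea how to do this?

Example: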

two_cities = ("It was the best of times, it was the worst of times, it was " + 
     "the age of wisdom, it was the age of foolishness, it was the " + 
     "epoch of belief, it was the epoch of incredulity, it was the " + 
     "season of Light, it was the season of Darkness, it was the " + 
     "spring of hope, it was the winter of despair, we had " + 
     "everything before us, we had nothing before us, we were all " + 
     "going direct to Heaven, we were all going direct the other " + 
     "way-- in short, the period was so far like the present period," + 
     " that some of its noisiest authorities insisted on its being " + 
     "received, for good or for evil, in the superlative degree of " + 
     "comparison only.") 


import re

# Raw string so that \s reaches the regex engine unchanged.
chunks = re.findall(r'.{1,160}\s|.{1,160}', two_cities)
print(chunks)

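returns

['It was the best of times, it was the worst of times, it was the age of wisdom, it was the age of foolishness, it was the epoch of belief, it was the epoch of ',
 'incredulity, it was the season of Light, it was the season of Darkness, it was the spring of hope, it was the winter of despair, we had everything before us, we ',
 'had nothing before us, we were all going direct to Heaven, we were all going direct the other way-- in short, the period was so far like the present period, ',
 'that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison ',
 'only.']

where the final element of the list should be

'that some of its noisiest authorities insisted on its being received, for good or for evil, in the superlative degree of comparison only.'

not 'only.'.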

2015-01-09 money_dance

You could add some examples. – anubhava 2015-01-09 22:28:38

[No regex needed.](https://docs.python.org/2/library/textwrap.html#textwrap.wrap) – 2015-01-09 22:28:50

Try this - .{1,160}(?:(?<=[ ])|$)

.{1,160}      # 1 - 160 chars 
(?: 
     (?<= [ ])     # Lookbehind, must end with a space 
    | $        # or, be at End of String 
)

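Info -

By default, the engine tries to match 160 characters (greedy).
It then checks the next part of the expression.

The lookbehind forces the last character matched by .{1,160} to be a space.
Or, if the match is at the end of the string, no space is required.

If the lookbehind fails and the match is not at the end of the string, the engine backtracks to 159 characters and checks again. This repeats until one of the assertions passes.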
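The behaviour described above can be checked with a small sketch. The helper name `split_message` and the sample message are made up for illustration; only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer: up to 160 characters that either end with
# a space or reach the end of the string.
CHUNK_RE = re.compile(r".{1,160}(?:(?<=[ ])|$)")


def split_message(text):
    """Return chunks of at most 160 characters, breaking after spaces."""
    # The group is non-capturing, so findall returns the whole matches.
    return CHUNK_RE.findall(text)


# Hypothetical sample: 50 copies of "word " plus a final word (253 characters).
message = "word " * 50 + "end"
chunks = split_message(message)

for chunk in chunks:
    print(len(chunk), repr(chunk[-10:]))
```

Two caveats, stated as observations rather than as part of the answer: `.` does not match newlines unless `re.DOTALL` is set, and a run of more than 160 characters without a space cannot satisfy either assertion from its start, so `findall` skips ahead and silently drops the leading characters of that run.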

2015-01-09 22:37:46 sln

Ayyy, that works, thanks a bunch! Can you explain it? – 2015-01-09 22:45:11

@money_dance - added more explanation. – sln 2015-01-09 22:54:14

Why do you need a lookbehind? This seems to work fine: '.{1,160}(?:\s|$)' – 2015-01-09 23:11:01
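One difference between the two patterns shows up when the 161st character is a space. The sketch below uses a made-up test string; whether the trailing space counts toward the 160-character limit is an assumption about the requirement, not something the thread settles.

```python
import re

# Lookbehind version: the trailing space is part of the 1-160 characters.
with_lookbehind = re.compile(r".{1,160}(?:(?<=[ ])|$)")
# Version from the comment: the space is consumed after the 1-160 characters.
with_consumed_space = re.compile(r".{1,160}(?:\s|$)")

# Hypothetical input: 160 characters, then a space at position 161, then a tail.
text = "a" * 79 + " " + "b" * 80 + " " + "c" * 10

print([len(c) for c in with_lookbehind.findall(text)])
print([len(c) for c in with_consumed_space.findall(text)])
```

With this input the lookbehind pattern backs off to the earlier space and never exceeds 160 characters, while the other pattern returns a first chunk of 161 characters (160 plus the consumed space). If the trailing space is stripped before sending, both behave acceptably.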

You should avoid regular expressions because they are inefficient.

I suggest something like this:

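words = two_cities.split(" ")
chunks = []
parts, length = [], 0  # pieces of the current chunk and their total length

for i, word in enumerate(words):
    # Every word except the last keeps the space that followed it.
    piece = word if i == len(words) - 1 else word + " "
    # Start a new chunk when this piece would push the current one past 160.
    if parts and length + len(piece) > 160:
        chunks.append("".join(parts))
        parts, length = [], 0
    parts.append(piece)
    length += len(piece)

if parts:
    chunks.append("".join(parts))

print(chunks)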

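This creates a list of all the words by splitting on spaces.

If a word fits in the current string, it is added to it. When it doesn't, the current string is appended to the list and a new string is started. At the end, you have a list of results.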

2015-01-09 23:07:21 mbomb007

Or just use 'import textwrap; textwrap.wrap(two_cities, 160)'. But I don't see anything wrong with using a regex. In many cases it's the simplest, most elegant solution. – 2015-01-09 23:16:23
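For comparison, here is a minimal sketch of the `textwrap.wrap` suggestion, again with a made-up sample message. Note that with its default settings `textwrap.wrap` drops the whitespace at each break, so the chunks do not end in a space as the question asks.

```python
import textwrap

# Hypothetical sample: 50 copies of "word " plus a final word (253 characters).
message = "word " * 50 + "end"

# width is the maximum length of each returned line.
lines = textwrap.wrap(message, width=160)

for line in lines:
    print(len(line), repr(line[-10:]))
```

Because the break whitespace is dropped, rejoining the lines with single spaces recovers this particular message. By default `textwrap.wrap` also converts other whitespace (tabs, newlines) to spaces and splits words longer than the width, which may or may not be wanted for SMS-style text.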

Edited to build the string from a list, since that's faster. Does anyone know whether my code can be rewritten with a list comprehension? – mbomb007 2015-01-10 04:45:21
