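Parsing text in Python with the regular expression function re.findall
I have a very long string that I need to parse into groups, but I need more control over how it is split.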
import re
RAW_Data = "Name Multiple Words Testing With 1234 Numbers and this stuff* ((Bla Bla Bla (Bla Bla) A40 & A41)) Name Multiple Words Testing With 3456 Numbers and this stuff2* ((Bla Bla Bla (Bla Bla) A42 & A43)) Name Multiple Words Testing With 78910 Numbers and this stuff3* ((Bla Bla Bla (Bla Bla) A44 & A45)) Name Multiple Words Testing With 1234 Numbers and this stuff4* ((Bla Bla Bla (Bla Bla) A46 & A47)) Name Multiple Words Testing With 1234 Numbers and this stuff5* ((Bla Bla Bla (Bla Bla) A48 & A49)) Name Multiple Words Testing With 1234 Numbers and this stuff6* ((Bla Bla Bla (Bla Bla) A50 & A51)) Name Multiple Words Testing With 1234 Numbers and this stuff7* ((Bla Bla Bla (Bla Bla) A52 & A53)) Name Multiple Words Testing With 1234 Numbers and this stuff8* ((Bla Bla Bla (Bla Bla) A54 & A55)) Name Multiple Words Testing With 1234 Numbers and this stuff9* ((Bla Bla Bla (Bla Bla) A56 & A57)) Name Multiple Words Testing With 1234 Numbers and this stuff10* ((Bla Bla Bla (Bla Bla) A58 & A59)) Name Multiple Words Testing With 1234 Numbers and this stuff11* ((Bla Bla Bla (Bla Bla) A60 & A61)) Name Multiple Words Testing With 1234 Numbers and this stuff12* ((Bla Bla Bla (Bla Bla) A62 & A63)) Name Multiple Words Testing With 1234 Numbers and this stuff13* ((Bla Bla Bla (Bla Bla) A64 & A65)) Name Multiple Words Testing With 1234 Numbers and this stuff14* ((Bla Bla Bla (Bla Bla) A66 & A67)) Name Multiple Words Testing With 1234 Numbers and this stuff15* ((Bla Bla Bla (Bla Bla) A68 & A69)) Name Multiple Words Testing With 1234 Numbers and this stuff16*"
# Lazily capture everything up to (but not including) each "*" followed by whitespace
fromnode = re.findall(r'(.*?)(?=\*\s)', RAW_Data)
print(fromnode)
del fromnode
del RAW_Data
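The result is: 'Name Multiple Words Testing With 1234 Numbers and this stuff', '', '((Bla Bla Bla (Bla Bla) A40 & A41)) Name Multiple Words Testing With 3456 Numbers and this stuff2' ... and so on.
I can't seem to capture only the strings like 'Name Multiple Words Testing With 3456 Numbers and this stuff2' while leaving out the strings like '((Bla Bla Bla (Bla Bla) A40 & A41))'. Any help would be greatly appreciated.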
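Is the 'Bla ...' stuff always inside parentheses, and are the 'Name Mul...' words always the same? – schwobaseggl
Do you only want the stuff outside the parentheses? – Laurel
Yes, the Bla Bla Bla stuff is always enclosed in double parentheses. There is also one set of single parentheses inside. I used another call, re.findall('\(\((.*?)\)', RAW_Data), to capture those parts, and now I want to ignore them. As for the rest, I just threw some text in here: it is multiple words, spaces, and numbers, so it is basically a catch-all. – user1457123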
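Going by the clarification above, one possible approach is to delete the double-parenthesized blocks first and then split what is left on the trailing `*`. This is only a sketch under two assumptions that the thread does not confirm: `))` never occurs inside a block, and `*` occurs only as the end-of-record marker. The names `sample` and `extract_names` are made up for illustration, and `sample` is a shortened stand-in for `RAW_Data`.

```python
import re

# Shortened stand-in for RAW_Data, in the same shape as the string in the question.
sample = (
    "Name Multiple Words Testing With 1234 Numbers and this stuff* "
    "((Bla Bla Bla (Bla Bla) A40 & A41)) "
    "Name Multiple Words Testing With 3456 Numbers and this stuff2* "
    "((Bla Bla Bla (Bla Bla) A42 & A43)) "
    "Name Multiple Words Testing With 78910 Numbers and this stuff3*"
)


def extract_names(text):
    """Return the parts of text outside the ((...)) blocks (hypothetical helper)."""
    # Assumption: a block starts at "((" and ends at the first "))" after it.
    # The inner "(Bla Bla)" is closed by a single ")", so it does not end the block early.
    without_blocks = re.sub(r"\(\(.*?\)\)", "", text)
    # Assumption: "*" only appears as the end-of-record marker.
    # Split on it, trim the surrounding whitespace, and drop empty pieces.
    return [part.strip() for part in without_blocks.split("*") if part.strip()]


names = extract_names(sample)
for name in names:
    print(name)
```

Splitting on `*` instead of using the lookahead `(?=\*\s)` also keeps the last record, which ends in `*` with no whitespace after it and therefore cannot satisfy that lookahead.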