2017-07-26 103 views
3

How do I split a string and return its delimiters in Python?

I have a string that looks like this:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H" 

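I want to split it on the letters (a-z or A-Z) and collect the associated values in a dictionary of lists. Each group of digits is associated with the letter that follows it. For example,

'M' is associated with 47482, 14, 7I7, etc.

'I' is associated with 4, 1, etc.

'H' is associated with 236792

My final data structure would look like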

dict = { 
     M:[47482, 14, 717], 
     I:[4, 1], 
     H:[236792] 

    } 

What I tried:

import re 
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H" 
tmp = re.split('[a-zA-Z]', string1) 
print(tmp) 

The letters are used as separators and discarded, so I can't get them back. I need help building the data structure.
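
On the title question itself: `re.split` drops the separators unless the pattern is wrapped in a capturing group, in which case the matched separators are returned in the result list as well. A minimal sketch on a shortened sample of the string above (the names `s` and `parts` are only illustrative):

```python
import re

s = "47482M4I14M7I7M"  # shortened sample of string1

# Without a group, the letters are consumed as separators and lost.
print(re.split(r"[a-zA-Z]", s))
# ['47482', '4', '14', '7', '7', '']

# With a capturing group, each matched letter is kept in the output list.
parts = re.split(r"([a-zA-Z])", s)
print(parts)
# ['47482', 'M', '4', 'I', '14', 'M', '7', 'I', '7', 'M', '']
```

The trailing empty string appears because the sample ends with a separator, so the list still needs some post-processing before it can be turned into a dictionary.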

+0

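Your regex throws away the letter (for example 'M') at the split stage. Adjust the regex so that it includes the letter at the end of each piece, and you will get '47482M' back as the first item. –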

+1

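Your string doesn't say '717M' but '7I7M', i.e. 717 is not associated with M; instead there is one 7 associated with I and another 7 associated with M. – Raniz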

Answers

6

You're on the right track, but you should use a slightly different regex together with re.findall. Like this:

In [1]: string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H" 

In [2]: import re, collections 

In [3]: p = re.compile("([0-9]+)([A-Za-z])") 

In [4]: dct = collections.defaultdict(list) 

In [5]: for number, letter in p.findall(string1): 
    ...:  dct[letter].append(number) 
    ...:  

In [6]: dct 
Out[6]: 
defaultdict(list, 
      {'D': ['8', '1', '17', '5', '7', '1', '5', '6', '3'], 
      'H': ['236792'], 
      'I': ['4', '7', '1', '4', '2', '7', '7', '22', '3', '3', '2', '4', '11', '3', '3', '15'], 
      'M': ['47482', '14', '7', '26', '25', '20', '11', '17', '7', '14', '35', '30', '15', '16', '4', '15', '37', '24', '5', '27', '35', '10', '5', '24', '175', '13']}) 

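This finds every number that is followed by a letter in the string and stores each number in a dictionary under its letter as the key. Duplicate numbers are kept. Note that the numbers are stored as strings here.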
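
Since the question asks for numbers rather than strings, here is a small variation of the same `re.findall` approach that converts each match with `int()`. It is only a sketch on a shortened sample; the names `s`, `pairs` and `by_letter` are illustrative.

```python
import re
from collections import defaultdict

s = "47482M4I14M7I7M1I26M8D"  # shortened sample of string1

# Each match is a (digits, letter) tuple because the pattern has two groups.
pairs = re.findall(r"([0-9]+)([A-Za-z])", s)

by_letter = defaultdict(list)
for number, letter in pairs:
    by_letter[letter].append(int(number))  # store integers, not strings

print(dict(by_letter))
# {'M': [47482, 14, 7, 26], 'I': [4, 7, 1], 'D': [8]}
```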

0

Without using a regex:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H" 


d = {} 
str_num = '' 
for c in string1: 
    if c.isdigit(): 
        # keep collecting digits until a letter is reached
        str_num += c 
    else: 
        if c not in d: 
            d[c] = [] 
        d[c].append(int(str_num)) 
        str_num = '' 

print(d) 
# Output (values are ints; keys in insertion order on Python 3.7+):
# {'M': [47482, 14, 7, 26, 25, 20, 11, 17, 7, 14, 35, 30, 15, 16, 4, 15, 37, 24, 5, 27, 35, 10, 5, 24, 175, 13], 'I': [4, 7, 1, 4, 2, 7, 7, 22, 3, 3, 2, 4, 11, 3, 3, 15], 'D': [8, 1, 17, 5, 7, 1, 5, 6, 3], 'H': [236792]} 

1

Another solution that does not use a regex:

import string 
string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H" 

result = dict() 
tempValue = '' 
for char in string1: 
    if char not in string.ascii_letters: 
        tempValue += char 
    else: 
        if char not in result: 
            result[char] = [] 

        result[char].append(int(tempValue)) 
        tempValue = '' 

print(result) 

Result:

{ 
    'M': [47482, 14, 7, 26, 25, 20, 11, 17, 7, 14, 35, 30, 15, 16, 4, 15, 37, 24, 5, 27, 35, 10, 5, 24, 175, 13], 
    'I': [4, 7, 1, 4, 2, 7, 7, 22, 3, 3, 2, 4, 11, 3, 3, 15], 
    'D': [8, 1, 17, 5, 7, 1, 5, 6, 3], 
    'H': [236792] 
} 
1

If you don't want to use a regex, you can write your own method.

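# string1 is the string from the question
myDict = {} 
num_string = '' 

for char in string1: 
    if char.isalpha(): 
        # a letter closes the current number: file it under that letter
        myDict.setdefault(char, []).append(int(num_string)) 
        num_string = '' 
    elif char.isdigit(): 
        num_string += char 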

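Note: please don't use dict as a variable name. It is not a reserved keyword, but it is the name of the built-in type, and assigning to it hides that type.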
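
A minimal sketch of what goes wrong when the name is reused. The function `shadowing_demo` is hypothetical and exists only to keep the shadowing local to one scope.

```python
import keyword

def shadowing_demo():
    dict = {"M": [47482]}  # this local name now hides the built-in dict type
    try:
        dict(H=[236792])   # attempts to call the dictionary object itself
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()
print(message)                    # 'dict' object is not callable
print(keyword.iskeyword("dict"))  # False: a built-in name, not a keyword
```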

0

Also without regexp:

string1 = "47482M4I14M7I7M1I26M8D25M4I20M2I11M7I17M7I7M22I14M3I35M3I30M1D15M2I16M17D4M5D15M7D37M1D24M5D5M6D27M4I35M11I10M3I5M3I24M15I175M3D13M236792H" 
abc = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' 

s = '' 
for k in string1: 
    if k.isalpha(): 
        print('found', k, 'value', s) 
        # add to dict here 
        s = '' 
    else: 
        s += k 
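
The "add to dict here" placeholder in the loop above could be filled in along these lines. This is only a sketch: the function name `group_by_letter` is hypothetical, and it assumes every letter is preceded by at least one digit (otherwise `int('')` raises `ValueError`).

```python
def group_by_letter(text):
    """Group each run of digits under the letter that follows it."""
    groups = {}
    digits = ""
    for ch in text:
        if ch.isalpha():
            # The letter ends the current number; store it as an int.
            groups.setdefault(ch, []).append(int(digits))
            digits = ""
        else:
            digits += ch
    return groups

result = group_by_letter("47482M4I14M7I7M236792H")  # shortened sample
print(result)
# {'M': [47482, 14, 7], 'I': [4, 7], 'H': [236792]}
```

Wrapping the loop in a function keeps the accumulator variables out of the module namespace and makes the logic easy to reuse on other strings.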