Faster single-letter swaps on strings

I need a faster way to generate all single-letter swaps of a string of 8 to 11 characters, in the following manner:

Given a string 'STDILGNLYE', find all single-letter swaps using the letters in:

list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 
      'F', 'P', 'S', 'T', 'W', 'Y', 'V']

That is, for each position in the string, replace the letter at that position with each letter in list_AA. The output would be:

ATDILGNLYE 
RTDILGNLYE 
NTDILGNLYE 
... 
SADILGNLYE 
SRDILGNLYE 
SNDILGNLYE 
... 
... 
STDILGNLYV

For a total of 200 new strings (20 swaps at each of the 10 positions in the string). Here is what I have so far:

def _create_swaps(original_str):
    list_peps = []
    for i in range(len(original_str)):
        for k in range(len(list_AA)):
            list_peps.append(_insert_aa(original_str, i, list_AA[k]))

    # remove original string
    return [i for i in list_peps if i != original_str]


def _insert_aa(string, index, aa):
    # replace the character at `index` with `aa`
    list_string_elements = list(string)
    del list_string_elements[index]
    list_string_elements.insert(index, aa)
    return "".join(list_string_elements)

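Because this has to be repeated ~10**6 times, it is the slowest step in a larger project. Is there a faster way to find such swaps (by eliminating the "".join and insert steps, or by finding the swaps on the fly)?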

For reference:

ncalls tottime percall cumtime percall filename:lineno(function) 
185275200 330.286 0.000 429.295 0.000 models.py:233(_insert_aa) 
975240  147.322 0.000 616.979 0.001 models.py:225(_create_swaps) 
185280201/185280197 59.137 0.000 59.138 0.000 {method 'join' of 'str' objects} 
185275208 39.875 0.000 39.875 0.000 {method 'insert' of 'list' objects} 
975240  21.027 0.000 21.027 0.000 models.py:231(<listcomp>) 
186746064 18.516 0.000 18.516 0.000 {method 'append' of 'list' objects}


2016-12-02 Carlo Mazzaferro

Do you need to emit all of the generated strings, or just count them? – Steve

@Steve I need all of the strings. As you can see from the return statement of '_create_swaps', it returns every created string except the original. –

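You may want to find a way to replace one of the operations with 'map()'... see [this essay](https://www.python.org/doc/essays/list2str/) on loop efficiency... Of course, measuring is always better than theorizing, though... –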

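Although you have already chosen an answer (and it is not the most Pythonic one), here is a cleaner version of what you are looking for.

You should not use range to get the index while iterating; if you want to be Pythonic about it, use enumerate.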

>>> def swaps(s, lst):
...     for index, _ in enumerate(s):
...         for letter in lst:
...             temp = list(s)
...             temp[index] = letter
...             yield ''.join(temp)
...
>>> list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V']
>>> s = 'STDILGNLYE'
>>>
>>> for variant in swaps(s, list_AA):
...     print(variant)
...
ATDILGNLYE 
RTDILGNLYE 
NTDILGNLYE 
.......... 
GTDILGNLYE 
HTDILGNLYE 
ITDILGNLYE

Also, a simple way in Python 3:

>>> def swaps(s, lst):
...     for i, _ in enumerate(s):
...         yield from ['%s%s%s' % (s[:i], x, s[i+1:]) for x in lst]
...
>>> swaps(s,list_AA) 
<generator object swaps at 0x10c9205c8> 
>>> a=_ 
>>> next(a) 
'ATDILGNLYE' 
>>> next(a) 
'RTDILGNLYE' 
>>> next(a) 
'NTDILGNLYE' 
>>> next(a) 
'DTDILGNLYE'

EDIT: a compromise between speed and readability

def swap3(s, lst):
    for i, _ in enumerate(s):
        head, tail = s[:i], s[i+1:]
        yield from ['%s%s%s' % (head, c, tail) for c in lst]

And here is a benchmark test of all three:

s='STDILGNLYE' 
list_AA=['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 
     'P', 'S', 'T', 'W', 'Y', 'V'] 

# the correct sample size 
list_new = list_AA * (10**6 // len(list_AA)) 

def swaps0(string, replacements):
    for i in range(len(string)):
        head = string[:i]
        tail = string[i+1:]
        for letter in replacements:
            yield head + letter + tail

def swaps1(s, lst):
    for i, _ in enumerate(s):
        yield from ['%s%s%s' % (s[:i], x, s[i+1:]) for x in lst]

def swaps2(s, lst):
    for index, _ in enumerate(s):
        for letter in lst:
            temp = list(s)
            temp[index] = letter
            yield ''.join(temp)

timeit [_ for _ in swaps0(s, list_new)]
timeit [_ for _ in swaps1(s, list_new)]
timeit [_ for _ in swaps2(s, list_new)]


In [9]: timeit [_ for _ in swaps0(s, list_new)] 
1 loop, best of 3: 2.61 s per loop 
In [10]: timeit [_ for _ in swaps1(s, list_new)] 
1 loop, best of 3: 6.57 s per loop 
In [11]: timeit [_ for _ in swaps2(s, list_new)] 
1 loop, best of 3: 8.61 s per loop

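Is it worth it? I would say that depends on how much you expect the sample size to grow and how often you run the code.

If the code will not run frequently (say, a few hundred times per hour) and the sample size will not grow exponentially (to the order of 10^50 or 10^100), then I would say go for readability.

If this will be computed often with an increasing sample size, go for performance.

Finally, we are left with a compromise solution that combines enumerate with the head/tail split: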

def swap3(s, lst):
    for i, _ in enumerate(s):
        head, tail = s[:i], s[i+1:]
        yield from ['%s%s%s' % (head, c, tail) for c in lst]

In [16]: timeit [_ for _ in swap3(s, list_new)] 
1 loop, best of 3: 3.99 s per loop
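
The bare `timeit` lines above rely on IPython. Below is a minimal sketch for reproducing the comparison with the standard-library `timeit` module instead. The `VARIANTS` table and the `benchmark` helper are names assumed here, not part of the answer, and the workload is the plain 20-letter list rather than `list_new`, so absolute numbers will not match the ones quoted.

```python
import timeit

s = 'STDILGNLYE'
list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M',
           'F', 'P', 'S', 'T', 'W', 'Y', 'V']

def swaps0(string, replacements):
    # slice head and tail once per position, then concatenate with +
    for i in range(len(string)):
        head = string[:i]
        tail = string[i+1:]
        for letter in replacements:
            yield head + letter + tail

def swaps1(s, lst):
    # re-slice for every replacement letter, format with %
    for i, _ in enumerate(s):
        yield from ['%s%s%s' % (s[:i], x, s[i+1:]) for x in lst]

def swaps2(s, lst):
    # rebuild a list of characters for every variant, then join
    for index, _ in enumerate(s):
        for letter in lst:
            temp = list(s)
            temp[index] = letter
            yield ''.join(temp)

def swap3(s, lst):
    # slice once per position, format with %
    for i, _ in enumerate(s):
        head, tail = s[:i], s[i+1:]
        yield from ['%s%s%s' % (head, c, tail) for c in lst]

# hypothetical lookup table, used only by this sketch
VARIANTS = {'swaps0': swaps0, 'swaps1': swaps1,
            'swaps2': swaps2, 'swap3': swap3}

def benchmark(repeat=3, number=1000):
    """Return the best average time per call, in seconds, for each variant."""
    results = {}
    for name, func in VARIANTS.items():
        timings = timeit.repeat(lambda: list(func(s, list_AA)),
                                repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results

if __name__ == '__main__':
    for name, seconds in benchmark().items():
        print('%s: %.1f us per call' % (name, seconds * 1e6))
```

Taking the minimum over the repeats follows the same "best of 3" convention as the IPython output.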


2016-12-02 18:27:30

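I like the enumerate idea. However, slicing and concatenating is faster. timeit variants = [v for v in generate_all_variants(s, list_AA)] gives 10000 loops, best of 3: 34.3 µs per loop; timeit variants = [v for v in swaps(s, list_AA)] gives 1000 loops, best of 3: 271 µs per loop – Steve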

@steve I updated my answer with a simpler approach using Python 3 –

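Also, the Zen of Python says that simple, readable code is better than ugly code with micro-optimizations. This optimization is exactly that kind of micro-optimization. You would need to replace a very large number of characters for it to be significantly faster. –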

This should be faster:

def _insert_aa(string, index, aa): 
    return string[0:index] + aa + string[index+1:]

Edit: You can slice the head and tail just once per position and reuse them, like this:

def generate_all_variants(string, replacements):
    for i in range(len(string)):
        head = string[:i]
        tail = string[i+1:]
        for letter in replacements:
            yield head + letter + tail

for variant in generate_all_variants("abcd", ['1', '2', '3']):
    print(variant)
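
The question's `_create_swaps` also filters out copies of the original string afterwards. A possible variation on the generator above, sketched under the assumption that those copies are never wanted, skips the letter already present at each position so that no filtering pass is needed. The name `generate_variants_without_original` is hypothetical.

```python
def generate_variants_without_original(string, replacements):
    """Yield every single-letter swap that differs from `string`."""
    for i, current in enumerate(string):
        head = string[:i]
        tail = string[i + 1:]
        for letter in replacements:
            # skipping the current letter avoids re-creating the original
            if letter != current:
                yield head + letter + tail

list_AA = ['A', 'R', 'N', 'D', 'C', 'Q', 'E', 'G', 'H', 'I', 'L', 'K', 'M',
           'F', 'P', 'S', 'T', 'W', 'Y', 'V']

variants = list(generate_variants_without_original('STDILGNLYE', list_AA))
print(len(variants))  # 190: 10 positions x 19 remaining letters
```

Every letter of 'STDILGNLYE' appears in list_AA, so each of the 10 positions contributes 19 variants rather than 20.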


2016-12-02 17:58:21 Steve

'"".join' is always faster than concatenation using '+' –

Your edit seems to be the solution I was looking for. Still, why stick with '+' rather than '"".join'? –

The join function takes a single argument, usually a list, and creating that list takes time. – Steve
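
One way to check the point about list creation on a given interpreter is to time both forms directly. This is a minimal sketch using the standard-library `timeit` module; the helper names `concat_plus` and `concat_join` are made up for the comparison, and no timings are quoted because they vary with the Python version and the machine.

```python
import timeit

# three short pieces, as in the head/letter/tail case discussed above
head, letter, tail = 'STDI', 'A', 'GNLYE'

def concat_plus():
    # two + operations, no intermediate list
    return head + letter + tail

def concat_join():
    # builds a three-element list first, then joins it
    return ''.join([head, letter, tail])

if __name__ == '__main__':
    for func in (concat_plus, concat_join):
        seconds = timeit.timeit(func, number=1000000)
        print('%s: %.3f s per 1,000,000 calls' % (func.__name__, seconds))
```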
