優化一種查找字符串的所有排列的方法

我解決了一個難題，但需要優化我的解決方案。難題說我要取一個字符串S，查找其字符的所有排列，對結果進行排序，然後返回該列表中出現的其中一個基於索引的索引號S。優化一種查找字符串的所有排列的方法

例如，字符串'bac'出現在其排列列表中的第3個位置：['abc', 'acb', 'bac', 'bca', 'cab', 'cba']。

我的問題是，謎題限制我的執行時間爲500毫秒。其中一個測試用例通過「BOOKKEEPER」作爲輸入，需要4.2秒才能完成。

我採用了一種（可能是天真的）動態編程方法，使用memoization，使用一個由特定字符集的特定排列組成的字典，但這還不夠。

我的瓶頸是什麼？

我正在分析，看看我能否回答我自己的問題，但我邀請那些直接看到問題的人幫助我理解我是如何減緩這種情況的。

編輯：我的解決方案似乎超過itertools.permutations。 10秒以上輸入「問題」。但公平地說，這包括時間打印，所以這可能不是一個公平的比較。即便如此，我寧願提交具有競爭力表現的手寫解決方案，以瞭解爲什麼我的選擇不如選擇模塊。

memo = {} 

def hash(word): 
    return ''.join(sorted(word)) 

def memoize(word, perms): 
    memo[hash(word)] = perms 
    return perms 

def permutations(word, prefix = None): 
    """Return list of all possible permutatons of given characters""" 
    H = hash(word) 

    if H in memo: 
     return [s if prefix is None else prefix + s for s in memo[H]] 

    L = len(word) 

    if L == 1: 
     return [word] if prefix is None else [prefix + word] 

    elif L == 2: 
     a = word[0] + word[1] 
     b = word[1] + word[0] 

     memoize(word, [a, b]) 

     if prefix is not None: 
      a = prefix + a 
      b = prefix + b 

     return [a, b] 

    perms = [] 
    for i in range(len(word)): 
     perms = perms + permutations(word[:i] + word[i+1:], word[i]) 

    memoize(word, perms) 

    return [prefix + s for s in perms] if prefix is not None else perms 


def listPosition(word): 
    """Return the anagram list position of the word""" 
    return sorted(list(set(permutations(word)))).index(word) + 1 

print listPosition('AANZ')

來源

2016-09-22 Sage Gerard

'itertools.permutations'如何執行？請參閱'timeit'模塊或IPython魔術'％timeit' –

@WayneWerner哦，忘記提及了。更糟的是。 10s +用於輸入「問題」 –

@WayneWerner請參閱編輯 –

提供我自己的答案，假設優化代碼的好方法是首先不使用它。由於我強烈地強調了如何加快我發佈的代碼的速度，所以我鼓勵其他人改善這一點。

@Evert發佈的評論：

我想你能拿出一個公式來計算輸入字的位置的基礎上，字母排序（因爲列表是按字母順序排序）的這些信。如果我正確理解這個難題，它只會要求返回輸入的位置，而不是所有的排列。所以你要抓住一些筆和紙，並找出這個問題的表述。

按照這種推理，從其他類似的建議中，我想更多的計數組合基於一種方法：

from math import factorial as F 
from operator import mul 

def permutations(s): 
    return F(len(s))/reduce(mul, [F(s.count(c)) for c in set(s)], 1) 

def omit(s,index): 
    return s[:index] + s[index+1:] 

def listPosition(s): 
    if (len(s) == 1): 
     return 1 

    firstletter = s[0] 
    predecessors = set([c for c in s[1:] if c < firstletter]) 
    startIndex = sum([permutations(omit(s, s.index(c))) for c in predecessors]) 

    return startIndex + listPosition(s[1:])

這產生正確的輸出，在高速通過拼圖（業績指標不記錄，但明顯不同）。實際上並未產生單個字符串排列。

以一個例子輸入QUESTION：

我們知道，無論出現在列表中的「問題」，它將以「Q」之前來字母開頭的所有排列後出現。同樣可以說是子線下的線。

我找到firstletter = 'Q'之前的字母，它存儲在predecessors。 set可防止對重複字母的輸入進行重複計數。

然後，我們假設predecessors中的每個字母都充當前綴。如果我從字符串中省略前綴並查找其餘字母的排列總和，則我們發現在初始輸入的第一個字母之前必須出現的排列數。遞歸，然後對結果進行求和，最終得到開始位置。

來源

2016-09-22 02:17:38

您的瓶頸在於N項列表的排列數爲N！（N因子）。隨着輸入量的增加，這個數字會非常快速地增長。

您可以做的第一個優化是您不必存儲所有排列。這是一個遞歸解決方案，可以生成已排序的所有排列。「訣竅」是在生成排列之前對單詞的字母進行排序。

def permutations_sorted(list_chars): 
    if len(list_chars) == 1: # only one permutation for a 1-character string  
    yield list_chars 
    elif len(list_chars) > 1: 
    list_chars.sort() 
    for i in range(len(list_chars)): 
     # use each character as first position (i=index)       
     head_char = None 
     tail_list = [] 
     for j,c in enumerate(list_chars): 
     if i==j: 
      head_char = c 
     else: 
      tail_list.append(c) 
     # recursive call, find all permutations of remaining      
     for tail_perm in permutations_sorted(tail_list): 
     yield [ head_char ] + tail_perm 

def puzzle(s): 
    print "puzzle %s" % s 
    results = [] 
    for i,p_list in enumerate(permutations_sorted(list(s))): 
    p_str = "".join(p_list) 
    if p_str == s: 
     results.append(i+1) 
    print "string %s was seen at position%s %s" % (
    s, 
    "s" if len(results) > 1 else "", 
    ",".join(["%d" % i for i in results]) 
) 
    print "" 


if __name__ == '__main__': 
    puzzle("ABC")

但是，當輸入較大時，該程序需要很長時間才能運行。在我的計算機（2.5 GHz英特爾核i5）

輸入= 「ABC」（3個字符）：0.03秒
輸入= 「問題」（8位）：0.329秒
輸入=「的問題「（9個字符）：2.848秒
輸入=‘會計’（10個字符）：30.47秒

到的唯一方法‘的時鐘節拍’是要弄清楚的方式來計算所述串的位置沒有生成所有的排列。

查看上面Evert的評論。

N.B.當輸入包含重複的字母時，初始字符串會在多個位置出現。我假設你只需要報告第一次發生。

來源

2016-09-22 01:13:05

請參閱@cdlane的評論。我的程序沒有給出正確的答案。但是，結論仍然成立。 –

我相信答案是不會產生所有的排列或排序。讓我們保持它的簡單，看看它是如何比較的性能代價：

import itertools 

def listPosition(string): 
    seen = set() 

    target = tuple(string) 

    count = 1; 

    for permutation in itertools.permutations(sorted(string)): 
     if permutation == target: 
      return count 
     if permutation not in seen: 
      count += 1 
      seen.add(permutation) 

print(listPosition('BOOKKEEPER'))

的時間設置（單位：秒）

  Sage/Evert Mine Sage  Answer 
QUESTIONS  0.02  0.18 0.45  98559 
BOOKKEEPER 0.03  0.11 2.10  10743 
ZYGOTOBLAST 0.03  24.4 117(*) 9914611 

(*) includes ~25 second delay between printing of answer and program completion

從科學PROG的代碼的輸出不產生與其他兩個一致的答案因爲它產生了更大的指數和多個指數，所以我沒有包括它的時間長度。

來源

2016-09-22 01:31:12 cdlane

你是對的，我的程序是不正確的：當一些字母被重複時（例如「BOOKKEEPER」有兩個「O」，兩個「K」和三個「E」），我的程序不會刪除多次出現的結果字因此較大的指數）。但是，結論是一樣的：*不要生成所有的排列*。 –

優化一種查找字符串的所有排列的方法

回答

相關問題