python3如何找到另一個子串的字符串的最大前綴

我需要查找另一個字節對象s2的子串的字節對象s1的最大前綴（從頭開始的字節串）並以s2和長度返回起始位置。在這種情況下，s2也恰好與s1重疊。python3如何找到另一個子串的字符串的最大前綴

最佳結果是最接近s2結尾的最長前綴。

我試圖用下面的bytes.rfind方法來實現這個。

注：這是試圖找到最大的前綴在字節的索引index開始反對src是存在先前在SRC index最多的前maxOffset字節內。因此，s1是src[index:]，s2是src[index-maxOffset:index+maxLength-1]。 maxLength是我感興趣搜索的前綴的最大長度。

def crl(index, src, maxOffset, maxLength): 
    """ 
    Returns starting position in source before index from where the max runlength is detected. 
    """ 
    src_size = len(src) 
    if index > src_size: 
     return (-1, 0) 
    if (index+maxLength) > src_size: 
     maxLength = src_size - index 
    startPos = max(0, index-maxOffset) 
    endPos = index+maxLength-1 
    l = maxLength 

    while l>1: 
     if src[index:index+l] in src[startPos:index+l-1]: 
      p = src.rfind(src[index:index+l], startPos, index+l-1) 
      return (p,l) 
     l -= 1 
    return (-1, 0)

我也試圖手工編寫這下面，因爲以前的實現是非常緩慢

def ocrl(index, src, maxOffset, maxLength): 
    """ 
    Returns starting position in source before index from where the max runlength is detected. 
    """ 
    size = len(src) 
    if index>=size: 
     return (-1, 0) 
    startPos = index - 1 # max(0, index-maxOffset) 
    stopPos = max(0, index-maxOffset) 
    runs = {} 
    while(startPos >= stopPos): 
     currRun = 0 
     pos = startPos 
     while src[pos] == src[index+currRun]: 
      currRun += 1 
      pos += 1 
      if currRun == maxLength: 
       return (startPos, maxLength) #found best possible run 
      if (pos >= size) or ((index+currRun) >= size): 
       break 
     if (currRun > 0) and (currRun not in runs.keys()): 
      runs[currRun] = startPos 
     startPos -= 1 

    if not runs: 
     return (-1, 0) 
    else: 
     # Return the index from where the longest run was found 
     return (runs[max(runs.keys())], max(runs.keys()))

雖然第二個實施速度更快，它仍然是非常慢的，我相信效率低下。我怎樣才能讓這個效率更高，運行更快？

來源

2017-06-05 Madhuri koushik è Koushik

確實的最後一個索引['os.path.commonprefix'（HTTPS：//docs.python .org/3.6/library/os.path.html）讓你的函數更快？ –

如果字符串的字節值超出可打印的ascii範圍，是否對'os.path.commonprefix'有影響？它會將字符視爲'。'嗎？或'/'以特殊的方式？ –

我沒有看到任何[源代碼]（https://hg.python.org/cpython/file/e3474ed80a5e/Lib/posixpath.py），這應該會導致你在這些問題上遇到任何麻煩。 –

在我看來，您可以使用修改後的Knuth-Morris-Pratt字符串搜索算法來匹配子字符串，只要它能夠提醒找到的最長匹配即可。

我不確定是否有向前轉移的好處，因爲當您找到匹配時，您需要繼續搜索更長的匹配（除非匹配了整個字符串）。第二串

來源

2017-06-05 13:36:09

向後的原因是probem的更大背景 - 目標是檢測最接近s1開始的最長前綴。如果相同的前綴在s2中出現兩次，那麼第二個出現的就是正確的答案。 –

@MadhurikoushikèKoushik：我知道。我不認爲這會產生真正的影響，因爲在大多數情況下，您必須繼續搜索。 –

構建suffix array並搜索數組中的第一個字符串，選擇最長的公共前綴

來源

2017-06-05 13:57:21 MBo

謝謝@MBo。很快就會嘗試。我的代表顯然太低，無法爲您的答案+1。 –

python3如何找到另一個子串的字符串的最大前綴

回答

相關問題