列表中的優雅發現子列表

給定一個包含噪音包圍的已知圖案的列表，是否有一種優雅的方式來獲得與該模式相同的所有項目。請參閱下面的我的粗略代碼。列表中的優雅發現子列表

list_with_noise = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5] 
known_pattern = [1,2,3,4] 
res = [] 


for i in list_with_noise: 
    for j in known_pattern: 
     if i == j: 
      res.append(i) 
      continue 

print res

我們會得到2, 1, 2, 3, 4, 2, 1, 2, 3, 4, 1, 2, 3, 4, 4, 3

獎金：避免附加我如果全模式不存在（即，允許1,2,3,4而不是1,2,3）

例子：

find_sublists_in_list([7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5],[1,2,3,4]) 

[1,2,3,4],[1,2,3,4],[1,2,3,4] 


find_sublists_in_list([7,2,1,2,3,2,1,2,3,6,9,9,1,2,3,4,7,4,3,1,2,6],[1,2,3,4]) 

[1,2,3],[1,2,3],[1,2,3]

名單包含一個名爲元組。

來源

2012-04-11 rikAtee

盛宴你的眼睛http://en.wikipedia.org/wiki/String_searching_algorithm – YXD 2012-04-11 13:30:09

你可能想給一些輸入的例子和相應的預期輸出。目前形式的問題尚不清楚。 – NPE 2012-04-11 13:33:09

這是一個問題或編程練習？ – 2012-04-11 13:33:46

我知道這個問題在5個月大，已經「接受」，但是使用類似的問題搜索了一個非常類似的問題，使我想到了這個問題，所有的答案似乎都有一些相當重要的問題，再加上我很無聊，想試試我的答案，所以我只是想喋喋不休我發現了什麼。

問題的第一部分，據我所知，非常簡單：只需返回原始列表，並將所有不在「模式」中的元素濾除。下面這種想法，我第一個想到的代碼中使用的過濾器（）函數：

def subfinder(mylist, pattern): 
    return list(filter(lambda x: x in pattern, mylist))

我要說的是，這種解決方案肯定是比原來的解決方案更簡潔，但它不是任何更快，或者至少沒有明顯，如果沒有很好的理由使用它們，我會盡量避免使用lambda表達式。其實，最好的解決辦法我能想出涉及到一個簡單列表理解：

def subfinder(mylist, pattern): 
    pattern = set(pattern) 
    return [x for x in mylist if x in pattern]

該解決方案既更加優雅顯著比原來快：修真比原來快了約120％，而鑄造在我的測試中，模式變成了第一次顛簸，速度高達320％。

現在的獎金：我就直接進入它，我的解決方案如下：

def subfinder(mylist, pattern): 
    matches = [] 
    for i in range(len(mylist)): 
     if mylist[i] == pattern[0] and mylist[i:i+len(pattern)] == pattern: 
      matches.append(pattern) 
    return matches

這是史蒂芬Rumbalski的「低效的班輪」的變化，即，增加的「mylist [i] == pattern [0]」檢查並且得益於python的短路評估，比原始語句和itertools版本（以及我所知道的所有其他提供的解決方案）要快得多，以及它甚至支持重疊模式。所以你去了。

來源

2012-09-25 05:37:35 mintchkin

不錯！（len（mylist） - len（patern）+1）雖然爲了避免不必要的比較：例如，如果len（mylist）是1000，len（pattern）是50，那麼當i = 951那麼len（mylist [i + len（pattern）]）將是49，並且len（pattern）將會是50，所以沒有辦法可以匹配 – rikAtee 2012-09-28 23:49:41

也是，列表理解它 – rikAtee 2012-09-29 00:05:34

我發現第二個例子也發現了無序的子列表。 – 2017-02-25 20:33:15

這將讓你的問題的「紅利」的一部分：

pattern = [1, 2, 3, 4] 
search_list = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5] 
cursor = 0 
found = [] 
for i in search_list: 
    if i == pattern[cursor]: 
     cursor += 1 
     if cursor == len(pattern): 
      found.append(pattern) 
      cursor = 0 
    else: 
     cursor = 0

對於非獎金：

pattern = [1, 2, 3, 4] 
search_list = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5] 
cursor = 0 
found = [] 
for i in search_list: 
    if i != pattern[cursor]: 
     if cursor > 0: 
      found.append(pattern[:cursor]) 
     cursor = 0 
    else: 
     cursor += 1

最後，一個處理重疊：

def find_matches(pattern_list, search_list): 
    cursor_list = [] 
    found = [] 
    for element in search_list: 
     cursors_to_kill = [] 
     for cursor_index in range(len(cursor_list)): 
      if element == pattern_list[cursor_list[cursor_index]]: 
       cursor_list[cursor_index] += 1 
       if cursor_list[cursor_index] == len(pattern_list): 
        found.append(pattern_list) 
        cursors_to_kill.append(cursor_index) 
      else: 
       cursors_to_kill.append(cursor_index) 
     cursors_to_kill.reverse() 
     for cursor_index in cursors_to_kill: 
      cursor_list.pop(cursor_index) 
     if element == pattern_list[0]: 
      cursor_list.append(1) 
    return found

來源

2012-04-11 13:36:17

@rikAtee請注意，此解決方案將在問題的註釋中由EOL提到的重疊條件中返回2而不是3個結果。 – 2012-04-11 13:59:06

@rikAtee，不，我們沒有實時追加，只有在找到完整匹配後才追加，因此我們需要追加整個模式。 – 2012-04-11 14:00:25

謝謝你，我看到 – rikAtee 2012-04-11 14:04:28

list_with_noise = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5] 
string_withNoise = "".join(str(i) for i in list_with_noise) 
known_pattern = [1,2,3,4] 
string_pattern = "".join(str(i) for i in known_pattern) 
string_withNoise.count(string_pattern)

來源

2012-04-11 13:37:16 user850498

與Roman的解決方案相同，但強度稍差（僅接受一位數值）。 – EOL 2012-04-12 05:47:33

鑑於：

a_list = [7,2,1,2,3,4,2,1,2,3,4,9,9,1,2,3,4,7,4,3,1,2,3,5] 
pat = [1,2,3,4]

這裏是一個低效的一個班輪：

res = [pat for i in range(len(a_list)) if a_list[i:i+len(pat)] == pat]

這裏是一個更有效的itertools版本：

from itertools import izip_longest, islice 

res = [] 
i = 0 

while True: 
    try: 
     i = a_list.index(pat[0], i) 
    except ValueError: 
     break 
    if all(a==b for (a,b) in izip_longest(pat, islice(a_list, i, i+len(pat)))): 
     res.append(pat) 
     i += len(pat) 
    i += 1

來源

2012-04-11 14:25:03

爲什麼不用'pat == a_list [i：i + len（pat）]'替換'all（a == b ...）'？這也會使你的代碼更加健壯，因爲列表'a_list'可能包含'None'元素，這會破壞'izip_longest（）'測試（因爲缺少元素會產生「None」）。 – EOL 2012-04-12 05:46:10

@EOL使用'all'背後的想法是避免創建子列表。「無」的唯一風險是如果'pat'包含'None'，在這種情況下可以使用'izip_longest'的'fillvalue'參數。沒有使用'izip'，因爲避免匹配列表末尾的部分模式。 – 2012-04-12 14:30:28

該問題的一種習慣性的，可組合的解決方案。

首先，我們需要借an itertools recipe，consume（其消耗和從一個迭代丟棄元件的給定數目。然後我們採取itertools配方pairwise，並將其擴展到一個nwise功能使用consume：

import itertools 

def nwise(iterable, size=2): 
    its = itertools.tee(iterable, size) 
    for i, it in enumerate(its): 
     consume(it, i) # Discards i elements from it 
    return zip(*its)

現在，我們有，解決問題的獎勵是很容易的：

def find_sublists_in_list(biglist, searchlist): 
    searchtup = tuple(searchlist) 
    return [list(subtup) for subtup in nwise(biglist, len(searchlist)) if subtup == searchtup] 

    # Or for more obscure but faster one-liner: 
    return map(list, filter(tuple(searchlist).__eq__, nwise(biglist, len(searchlist))))

同樣，一個更簡潔更快的（如果不那麼漂亮的）解決方案的主要問題取代：

def subfinder(mylist, pattern): 
    pattern = set(pattern) 
    return [x for x in mylist if x in pattern]

有：

def subfinder(mylist, pattern): 
    # Wrap filter call in list() if on Python 3 and you need a list, not a generator 
    return filter(set(pattern).__contains__, mylist)

此相同的行爲方式，但避免了需要到暫時set存儲的名稱，然後推所有的過濾工作都是C.

來源

2018-02-03 00:26:56 ShadowRanger

列表中的優雅發現子列表

回答

相關問題