Sliding window over the words of a sentence string in Python

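I am looking for a sliding-window splitter for a string made up of words, with window size N.

Input: "I love food and I like drink", window size 3

Output: ["I love food", "love food and", "food and I", "and I like", ...]

All the sliding-window suggestions I have found work on the characters of a string rather than on its words. Is there something out of the box?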

2017-03-16 user1025852

Here is what I finally did: def find_ngrams(input_list, n): return zip(*[input_list[i:] for i in range(n)]) – user1025852

You can use iterators with different offsets and zip them all together.

>>> arr = "I love food. blah blah".split()
>>> its = [iter(arr), iter(arr[1:]), iter(arr[2:])]  # Extend the pattern for longer windows
>>> list(zip(*its))  # list() is needed on Python 3, where zip is lazy
[('I', 'love', 'food.'), ('love', 'food.', 'blah'), ('food.', 'blah', 'blah')]

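You may want to use izip if you have long sentences (on Python 3 the built-in zip is already lazy), or a plain old loop (like the other answers).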
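
For long inputs, one lazy alternative that avoids building a sliced copy per offset is a bounded deque. This is only a minimal sketch for Python 3, not code from the answer above; the function name `sliding_windows` is made up for illustration.

```python
from collections import deque
from itertools import islice


def sliding_windows(words, size):
    """Lazily yield each run of `size` consecutive items as a tuple."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(words)
    # Fill the first window; deque(maxlen=size) drops the oldest item on append.
    window = deque(islice(it, size), maxlen=size)
    if len(window) < size:
        return  # fewer items than the window size: no windows at all
    yield tuple(window)
    for word in it:
        window.append(word)
        yield tuple(window)


windows = list(sliding_windows("I love food. blah blah".split(), 3))
print(windows)
```

Because it consumes a single iterator, this version also works on generators and other inputs that cannot be sliced.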

2017-03-16 19:18:14 SuperSaiyan

def token_sliding_window(text, size):
    # Split on whitespace, then yield each run of `size` consecutive tokens.
    tokens = text.split()
    for i in range(len(tokens) - size + 1):
        yield tokens[i:i + size]
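
The question asks for each window as a single string rather than a list of tokens. A small sketch of that variation, assuming Python 3 and a hypothetical helper name `sentence_windows`, joins each slice back together with spaces:

```python
def sentence_windows(sentence, size):
    """Yield each window of `size` consecutive words as one space-joined string."""
    tokens = sentence.split()
    for i in range(len(tokens) - size + 1):
        yield " ".join(tokens[i:i + size])


result = list(sentence_windows("I love food and I like drink", 3))
print(result)
# ['I love food', 'love food and', 'food and I', 'and I like', 'I like drink']
```

If the window is larger than the number of words, the range is empty and nothing is yielded.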

2017-03-16 19:17:51

An approach based on subscripting (slicing) the sequence:

def split_on_window(sequence="I love food and I like drink", limit=4):
    results = []
    split_sequence = sequence.split()
    # Number of full windows that fit in the sequence.
    iteration_length = len(split_sequence) - (limit - 1)
    max_window_indices = range(iteration_length)
    for index in max_window_indices:
        results.append(split_sequence[index:index + limit])
    return results

Sample output:

>>> for window in split_on_window("I love food and I like drink", 3):
...     print(window)
...
['I', 'love', 'food']
['love', 'food', 'and']
['food', 'and', 'I']
['and', 'I', 'like']
['I', 'like', 'drink']

Here is an alternative answer inspired by @SuperSaiyan:

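try:
    from itertools import izip  # Python 2
except ImportError:
    izip = zip  # Python 3: the built-in zip is already lazy

def split_on_window(sequence, limit):
    split_sequence = sequence.split()
    # One iterator per offset; zipping them lines up consecutive words.
    iterators = [iter(split_sequence[index:]) for index in range(limit)]
    return izip(*iterators)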

Sample output:

>>> s = "I love food and I like drink"
>>> list(split_on_window(s, 4))
[('I', 'love', 'food', 'and'), ('love', 'food', 'and', 'I'),
('food', 'and', 'I', 'like'), ('and', 'I', 'like', 'drink')]

Benchmark:

Sequence = I love food and I like drink, limit = 3 
Repetitions = 1000000 
Using subscripting -> 3.8326420784 
Using izip -> 5.41380286217 # Modified to return a list for the benchmark.
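
A comparison along these lines could be rerun with `timeit`. The sketch below is not the script that produced the figures above: it assumes Python 3, uses the built-in `zip` in place of `izip`, and the function names `by_slicing` and `by_zip` are made up for illustration.

```python
import timeit


def by_slicing(sequence, limit):
    words = sequence.split()
    return [words[i:i + limit] for i in range(len(words) - limit + 1)]


def by_zip(sequence, limit):
    words = sequence.split()
    # list() forces the lazy zip so both functions do comparable work.
    return list(zip(*[iter(words[i:]) for i in range(limit)]))


sentence = "I love food and I like drink"
for func in (by_slicing, by_zip):
    # number is kept small here; the figures above used 1000000 repetitions.
    seconds = timeit.timeit(lambda: func(sentence, 3), number=10000)
    print(func.__name__, seconds)
```

Absolute timings depend on the machine and the Python version, so only the relative order of the two results is meaningful.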

2017-03-16 19:24:35 ospahiu
