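Dividing a string into chunks in Python

I wrote code that extracts words from a corpus, tokenizes them, and compares them against sentences. The output is a bag of words (1 if the word is in the sentence, 0 if it is not).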

import nltk
from nltk import FreqDist
from nltk.corpus import brown

# Requires the Brown corpus: nltk.download('brown')
news = brown.words(categories='news')
news_sents = brown.sents(categories='news')

# Vocabulary: the 100 most frequent lowercased words in the news category
fdist = FreqDist(w.lower() for w in news)
vocabulary = [word for word, _ in fdist.most_common(100)]

# Open the output file once and append one 0/1 string per sentence
with open("D:\\test\\Vector.txt", "a") as f:
    for i, sent in enumerate(news_sents):
        # Lowercase the tokens so they match the lowercased vocabulary
        tokens = {w.lower() for w in sent}
        # 1 if the vocabulary word occurs in the sentence, otherwise 0
        bow = "".join(str(int(word in tokens)) for word in vocabulary)
        print(bow, file=f)

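In this case, the output string is 100 characters long. I want to split it into chunks of arbitrary length and assign a chunk number to each one. For example: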

print(i+1, chunk_id, bow, sep="\t", end="\n", file=f)

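where i + 1 is the sentence number. To show what I mean, take two strings of length 12, "110010101111" and "011011000011". Each should be split into chunks, with every chunk written on its own line together with its sentence number and chunk number.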

Source

2016-02-29 Masyaf

The duplicate talks about lists, but the solutions work for strings as well. – timgeb
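To illustrate the comment above, here is a minimal sketch of the slicing approach commonly used for lists, applied to a string. The helper name `chunk_string` and the chunk size of 4 are assumptions made for the example; they do not come from the question.

```python
def chunk_string(text, size):
    """Split text into consecutive pieces of at most `size` characters."""
    # Slicing behaves the same on str as on list; the last piece may be shorter.
    return [text[start:start + size] for start in range(0, len(text), size)]


chunks = chunk_string("110010101111", 4)
print(chunks)  # ['1100', '1010', '1111']
```

If the string length is not a multiple of the chunk size, the final chunk is simply shorter rather than padded.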

The grouper function from the itertools documentation seems to be what you are looking for:

from itertools import zip_longest  # named izip_longest in Python 2

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Source

2016-02-29 11:01:55 dav1d
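As a rough sketch of how the grouper recipe could be applied to the strings in the question: grouper yields tuples of characters, so each group has to be joined back into a string. The helper `chunk_rows`, the chunk size of 4, and the empty-string fill value are assumptions chosen for illustration, and the rows are printed to standard output rather than to the file used in the question.

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    """Collect data into fixed-length chunks (itertools recipe, Python 3 form)."""
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def chunk_rows(sentence_number, bow, chunk_size):
    """Return (sentence_number, chunk_id, chunk) rows for one 0/1 string."""
    rows = []
    # fillvalue="" leaves a short final chunk unpadded once the tuple is joined.
    groups = grouper(bow, chunk_size, fillvalue="")
    for chunk_id, group in enumerate(groups, start=1):
        rows.append((sentence_number, chunk_id, "".join(group)))
    return rows


# The two example strings from the question, split into chunks of 4 (assumed size).
for i, bow in enumerate(["110010101111", "011011000011"]):
    for row in chunk_rows(i + 1, bow, 4):
        print(*row, sep="\t")
```

Each printed line holds the sentence number, the chunk number, and the chunk itself, separated by tabs, which matches the shape of the print call in the question.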
