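Is there a simpler way to create a dictionary from strings and then vectorize the strings? (Python)
My question is somewhat more multilingual/NLP-oriented than "Creating a dictionary from a string".
Given a list of sentence strings, is there a simpler way to build a dictionary of unique words and then vectorize the sentences? I know external libraries such as gensim can do this, but I would like to avoid them. This is how I have been doing it: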
from itertools import chain

def getKey(dic, value):
    # Return every key (position) whose stored word equals `value`.
    return [k for k, v in sorted(dic.items()) if v == value]

# vectorize returns a list of tuples; each tuple is made up of
# (<position of word in dictionary>, <number of times it occurs in sentence>)
def vectorize(sentence, dictionary):  # is there a simpler way to do this?
    vector = []
    words = sentence.split()
    for word in words:
        # Count on the same tokens used for the lookup, so the two always agree.
        word_count = words.count(word)
        dic_pos = getKey(dictionary, word)[0]
        vector.append((dic_pos, word_count))
    return vector
s1 = "this is is a foo"
s2 = "this is a a bar"
s3 = "that 's a foobar"

uniq = list(set(chain(" ".join([s1, s2, s3]).split())))  # is there a simpler way to do this?

dictionary = {}
for i in range(len(uniq)):  # can this be done with dict(list_comprehension)?
    dictionary[i] = uniq[i]

v1 = vectorize(s1, dictionary)
v2 = vectorize(s2, dictionary)
v3 = vectorize(s3, dictionary)

print(v1)
print(v2)
print(v3)
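I don't know what your end goal is, but I can point out a few problems: you build a **set**, turn it into a **list**, then turn that into a **dictionary**, and then look things up in the **dictionary** by **value** rather than by **key**. The results are all just positions from the **list**, which you effectively rebuild for every lookup! – 2013-03-14 00:10:05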
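To illustrate the point in the comment above, here is a minimal sketch of one way to avoid the reverse lookup: store the mapping as word -> position, so each lookup is a plain key access, and count words with `collections.Counter`. The names `build_dictionary`, `sentences`, and `vectors` are hypothetical. Sorting the vocabulary and lowercasing the input are assumptions made here only to get reproducible positions. Unlike the code in the question, this returns one tuple per distinct word rather than one per token.

```python
from collections import Counter

def build_dictionary(sentences):
    """Map each unique word to an integer position (word -> index).

    Sorting is an assumption made for this sketch so that positions
    are the same on every run; a plain set has no stable order.
    """
    words = sorted(set(w for s in sentences for w in s.lower().split()))
    return {word: i for i, word in enumerate(words)}

def vectorize(sentence, dictionary):
    """Return sorted (position, count) tuples, one per distinct word."""
    counts = Counter(sentence.lower().split())
    return sorted((dictionary[word], n) for word, n in counts.items())

sentences = ["this is is a foo", "this is a a bar", "that 's a foobar"]
dictionary = build_dictionary(sentences)
vectors = [vectorize(s, dictionary) for s in sentences]

for v in vectors:
    print(v)
```

Because the dictionary is keyed by word, `dictionary[word]` replaces the scan over `sorted(dic.items())`. A word that is missing from the dictionary raises `KeyError` in this sketch, so unseen words would need separate handling.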