List/array of strings to numpy float array

-2

I am new to scikit-learn and numpy. How can I represent my dataset, which consists of lists/arrays of strings, for example

[["aa bb","a","bbb","à"], ["bb cc","c","ddd","à"], ["kkk","a","","a"]]

as a numpy array of dtype float?

Source

2016-12-13 Ignatius Ezeani

What??? Convert strings to floats? By the way, this has nothing to do with sklearn. – MMF

Well, maybe I did not use the right terminology, but @datawrestler understood my question and gave a very useful suggestion. Thanks anyway. –

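I think what you are looking for is a numerical representation of your words. You can use gensim to map each word to a token id and then create your numpy arrays from those ids, as follows: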

import numpy as np
from gensim import corpora

toconvert = [["aa bb", "a", "bbb", "à"], ["bb", "cc", "c", "ddd", "à"], ["kkk", "a", "", "a"]]

# Convert the list of lists into token ids. For example, 'aa bb' could be represented as 2, 'a' as 1, and so on.
tdict = corpora.Dictionary(toconvert)

# The rows have different lengths, so build one numpy array per row.
newlist = []
for row in toconvert:
    # Look up the id of every word in the current row.
    ids = [tdict.token2id[word] for word in row]
    # Convert to a float numpy array and append to the main list.
    newlist.append(np.array(ids, dtype=float))

print(newlist)
# Output reported in the original answer (the ids you get may differ between gensim versions):
# [array([ 2., 0., 1., 0.]), array([ 5., 3., 4., 6., 0.]), array([ 7., 0., 8., 0.])]

# To see which string a given id represents:
print(tdict[0])  # 'a' in the original answer's run
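
If gensim is not available, the same word-to-id idea can be sketched with a plain Python dict and numpy. This is not part of the original answer; the names `token2id`, `id2token`, and `encoded` are hypothetical, and ids are assigned in order of first appearance, which is an assumption of this sketch rather than gensim's behaviour.

```python
import numpy as np

toconvert = [["aa bb", "a", "bbb", "à"],
             ["bb", "cc", "c", "ddd", "à"],
             ["kkk", "a", "", "a"]]

# Hypothetical vocabulary: each distinct string gets the next free integer id,
# in order of first appearance.
token2id = {}
encoded = []
for row in toconvert:
    # setdefault returns the existing id, or stores and returns a new one.
    ids = [token2id.setdefault(word, len(token2id)) for word in row]
    # Rows have different lengths, so keep one float array per row.
    encoded.append(np.array(ids, dtype=float))

# Reverse mapping, to recover the string behind an id.
id2token = {idx: word for word, idx in token2id.items()}

print(encoded)
print(id2token[1])
```

Note that these ids are arbitrary labels, not measurements: the numeric distance between two ids says nothing about how similar the underlying strings are.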

Source

2016-12-14 02:33:12 datawrestler

Thank you @datawrestler for your answer. It was very useful. –
