2017-10-07

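I am building an LSTM model with Keras. I use TfidfVectorizer() to convert my data frame into word tokens. The transform method of TfidfVectorizer() returns a csr_matrix, and when I feed it to the LSTM layer I always get the error "expected ndim=3, found ndim=2". How can I feed a sparse matrix to an LSTM layer in Keras?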

"ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=2"

Below is my Python code:

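import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation
from sklearn.feature_extraction.text import TfidfVectorizer

# on_bad_lines='warn' (pandas >= 1.3) replaces error_bad_lines=False:
# malformed lines are skipped with a warning.
dfTest = pd.read_csv("C:\\ML\\test.csv",
                     dtype={'url': str, 'name': str, 'verdict': np.int32},
                     on_bad_lines='warn', sep=',', header=0,
                     names=['url', 'name', 'verdict'])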

dataFrame = dfTest['url'] + " "+ dfTest['name'] 
target = dfTest['verdict'] 
lstData = [] 
for row in dataFrame: 
    row = row.replace('http://www.', ' ') 
    row = row.replace('.', ' ') 
    row = row.replace('/', ' ') 
    row = row.replace('com', ' ') 
    lstData.append(row) 

print(lstData) 


tk1 = TfidfVectorizer(max_features=1000)

tk1.fit(lstData) 
matrix = tk1.transform(lstData) 

print(matrix.shape) 
print(matrix) 


#data = np.reshape(data, data.shape + (1,)) 
target = np.reshape(target, target.shape + (1,)) 
print(target.shape) 
print(target) 

model1 = Sequential()
# dropout / recurrent_dropout are the Keras 2 names for dropout_W / dropout_U.
model1.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2, input_shape=(5,)))
model1.add(Dense(1))
model1.add(Activation('sigmoid'))

model1.compile(loss='binary_crossentropy', optimizer='rmsprop')
# epochs is the Keras 2 name for nb_epoch.
model1.fit(matrix, y=target, batch_size=200, epochs=5, verbose=1,
           validation_split=0.2, shuffle=True)

I am new to the world of ML; please help me find out what I am doing wrong here. Thanks in advance.

Answer

I converted the sparse matrix to a dense array with toarray() and added a third dimension to it; after that it worked perfectly.
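
The shape change can be checked in isolation, without Keras or the CSV file. The following is a minimal sketch rather than part of the original answer: a small random NumPy array (the name `dense` and the sizes 4 and 6 are assumptions chosen for illustration) stands in for the result of matrix.toarray(). Keras recurrent layers expect input of shape (samples, timesteps, features), which is why a 2-D array triggers the ndim error.

```python
import numpy as np

# Hypothetical stand-in for matrix.toarray(): 4 documents, 6 TF-IDF features.
rng = np.random.default_rng(0)
dense = rng.random((4, 6))
print(dense.ndim, dense.shape)   # 2-D array: (samples, features)

# Append a trailing axis: each feature becomes one timestep
# that carries a single value.
data = np.reshape(dense, dense.shape + (1,))
print(data.ndim, data.shape)     # 3-D array: (samples, timesteps, features)

# The per-sample shape is what is passed as input_shape to the LSTM layer.
input_shape = data.shape[1:]
print(input_shape)
```

The values are unchanged by the reshape; only the number of axes differs, so data[:, :, 0] is identical to the original 2-D array.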

Here is the complete code.

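import numpy as np
import pandas as pd
from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation
from sklearn.feature_extraction.text import TfidfVectorizer

# on_bad_lines='warn' (pandas >= 1.3) replaces error_bad_lines=False:
# malformed lines are skipped with a warning.
dfTest = pd.read_csv("C:\\ML\\test.csv",
                     dtype={'url': str, 'name': str, 'verdict': np.int32},
                     on_bad_lines='warn', sep=',', header=0,
                     names=['url', 'name', 'verdict'])

# One text document per row: URL and name joined by a space.
dataFrame = dfTest['url'] + " " + dfTest['name']
target = dfTest['verdict']

tk1 = TfidfVectorizer(max_features=1000)

# transform() returns a scipy csr_matrix of shape (samples, features).
tk1.fit(dataFrame)
matrix = tk1.transform(dataFrame)

# Densify, then add a trailing axis:
# (samples, features) -> (samples, features, 1).
matrix = matrix.toarray()
data = np.reshape(matrix, matrix.shape + (1,))
target = target.to_numpy().reshape(-1, 1)
print(target)
print(data.shape)

model1 = Sequential()
# dropout / recurrent_dropout are the Keras 2 names for dropout_W / dropout_U.
model1.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2,
                input_shape=data.shape[1:]))
model1.add(Dense(1))
model1.add(Activation('sigmoid'))

model1.compile(loss='binary_crossentropy', optimizer='rmsprop')
# epochs is the Keras 2 name for nb_epoch.
model1.fit(data, y=target, batch_size=200, epochs=5, verbose=1,
           validation_split=0.2, shuffle=True)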
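
Calling toarray() on the whole matrix stores every zero explicitly, which may not fit in memory for a large corpus or vocabulary. As a minimal sketch of one possible alternative, and not something the original answer does, the rows can be densified one batch at a time. The helper name `dense_batches` and the toy data below are hypothetical and exist only for illustration.

```python
import numpy as np
from scipy import sparse


def dense_batches(matrix, target, batch_size):
    """Yield (x, y) batches, densifying only one slice of rows at a time."""
    n_samples = matrix.shape[0]
    for start in range(0, n_samples, batch_size):
        stop = min(start + batch_size, n_samples)
        # Row slicing is cheap on a CSR matrix; only this slice becomes dense.
        x = matrix[start:stop].toarray()
        # Same trailing axis as above: (batch, features) -> (batch, features, 1).
        x = x.reshape(x.shape + (1,))
        yield x, target[start:stop]


# Hypothetical toy data: 10 samples, 8 features, stored as a CSR matrix.
matrix = sparse.csr_matrix(np.eye(10, 8))
target = (np.arange(10) % 2).reshape(-1, 1)

batches = list(dense_batches(matrix, target, batch_size=4))
for x, y in batches:
    print(x.shape, y.shape)
```

A generator like this would still have to be adapted to whatever batch-feeding interface the installed Keras version offers; that wiring is left out here because it differs between versions.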