0
我想實現一個簡單的RNN來預測整數序列中的下一個整數。所以,我有一個數據集是如下:越來越ValueError:「只能使用多元索引的元組索引」
Id Sequence
1 1,0,0,2,24,552,21280,103760,70299264,5792853248,587159944704
2 1,1,5,11,35,93,269,747,2115,5933,16717,47003,132291,372157,1047181,2946251,8289731,23323853,65624397,184640891,519507267,1461688413,4112616845,11571284395,32557042499,91602704493,257733967693
4 0,1,101,2,15,102,73,3,40,16,47,103,51,74,116,4,57,41,125,17,12,48,9,104,30,52,141,75,107,117,69,5,148,58,88,42,33,126,152,18,160,13,38,49,55,10,28,105,146,31,158
5 1,4,14,23,42,33,35,34,63,66,87,116,84,101,126,164,128,102,135,143,149,155,203,224,186,204,210,237,261,218,219,286,257,266,361,355,336,302,374,339,371,398,340,409,348,388,494,436,407,406
6 1,1,2,5,4,2,6,13,11,4,10,10,12,6,8,29,16,11,18,20,12,10,22,26,29,12,38,30,28,8,30,61,20,16,24,55,36,18,24,52,40,12,42,50,44,22,46,58,55,29,32,60,52,38,40,78,36,28,58,40,60,30,66,125,48,20,66,80,44,24
9 0,31,59,90,120,151,181,212,243,273,304,334,365,396,424,455,485,516,546,577,608,638,669,699,730,761,789,820,850,881,911,942,973,1003,1034,1064,1095,1126,1155,1186,1216,1247,1277,1308,1339,1369,1400,1430
10 1,1,2,5,13,36,111,347,1134,3832,13126,46281,165283,598401,2202404,8168642,30653724,116082962,442503542,1701654889,6580937039,25603715395,100223117080,394001755683,1556876401398,6178202068457,24608353860698,98421159688268,394901524823138,1589722790850089
12 0,0,0,0,112,40286,5485032,534844548,45066853496,3538771308282,267882021563464,19861835713621616,1453175611052688600,105278656040052332838,7564280930105061931496
到目前爲止我的代碼是:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import math
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import SimpleRNN
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from keras.preprocessing.sequence import pad_sequences
def stoarray(data = [], sep = ','):
return data.map(lambda x: np.array(x.split(sep), dtype=float))
def create_dataset(dataset, window_size=1):
dataX, dataY = [], []
for i in range(len(dataset)-window_size-1):
a = dataset[i:(i+window_size), 0]
dataX.append(a)
dataY.append(dataset[i + window_size, 0]) #gives the ValueError : Can only tuple index with multi index
return np.array(dataX), np.array(dataY)
# fix random seed for reproducibility
np.random.seed(7)
# loading data
colna = ['id', 'seq']
train_data = pd.read_csv('G:/Python/integer_sequencing/testfile.csv', header=1)
train_data.columns = colna
dataset = train_data['seq']
#print(dataset)
window_size = 1
X_train, Y_train = create_dataset(dataset, window_size)
print(X_train.head(5))
print(Y_train.head(5))
我試圖用X_train每個序列分割作爲輸入,它由一套完整的除上一學期和Y_train被視爲輸出將只包含最後一個數字。 但是當我運行代碼時,我得到了ValueError:只能使用帶有MultiIndex的元組索引。 任何人都可以解釋它對我的代碼意味着什麼,我必須做些什麼來解決它。
PS - 我新的堆棧溢出和深度學習,所以我會很感激,如果你可以建議和幫助我的問題的格式。
始終把完整的錯誤消息(回溯)的問題 - 還有其他有用inforamtion,即。它顯示哪條線出現問題。 – furas
在'dataset [i:(i + window_size),0]'你使用元組'i:(i + window_size),0'來獲取數據,但是它不允許這樣做。您可以只使用單個整數,如'dataset [0]'或切片'dataset [i:(i + window_size)]'。在你的代碼中'dataset'意思是'train_data ['seq']',它是DataFrame的單列 - 它是'Series',而不是'DataFrame'只有一列。使用'print(type(dataset))'來查看它。 – furas
我在問題中添加了錯誤消息。希望它有幫助。 –