2017-11-25 199 views
0

我想實現一個簡單的RNN來預測整數序列中的下一個整數。所以,我有一個數據集是如下:越來越ValueError:「只能使用多元索引的元組索引」

Id Sequence 
1 1,0,0,2,24,552,21280,103760,70299264,5792853248,587159944704 
2 1,1,5,11,35,93,269,747,2115,5933,16717,47003,132291,372157,1047181,2946251,8289731,23323853,65624397,184640891,519507267,1461688413,4112616845,11571284395,32557042499,91602704493,257733967693 
4 0,1,101,2,15,102,73,3,40,16,47,103,51,74,116,4,57,41,125,17,12,48,9,104,30,52,141,75,107,117,69,5,148,58,88,42,33,126,152,18,160,13,38,49,55,10,28,105,146,31,158 
5 1,4,14,23,42,33,35,34,63,66,87,116,84,101,126,164,128,102,135,143,149,155,203,224,186,204,210,237,261,218,219,286,257,266,361,355,336,302,374,339,371,398,340,409,348,388,494,436,407,406 
6 1,1,2,5,4,2,6,13,11,4,10,10,12,6,8,29,16,11,18,20,12,10,22,26,29,12,38,30,28,8,30,61,20,16,24,55,36,18,24,52,40,12,42,50,44,22,46,58,55,29,32,60,52,38,40,78,36,28,58,40,60,30,66,125,48,20,66,80,44,24 
9 0,31,59,90,120,151,181,212,243,273,304,334,365,396,424,455,485,516,546,577,608,638,669,699,730,761,789,820,850,881,911,942,973,1003,1034,1064,1095,1126,1155,1186,1216,1247,1277,1308,1339,1369,1400,1430 
10 1,1,2,5,13,36,111,347,1134,3832,13126,46281,165283,598401,2202404,8168642,30653724,116082962,442503542,1701654889,6580937039,25603715395,100223117080,394001755683,1556876401398,6178202068457,24608353860698,98421159688268,394901524823138,1589722790850089 
12 0,0,0,0,112,40286,5485032,534844548,45066853496,3538771308282,267882021563464,19861835713621616,1453175611052688600,105278656040052332838,7564280930105061931496 

到目前爲止我的代碼是:

import numpy as np 
import matplotlib.pyplot as plt 
import pandas as pd 
import math 
from keras.models import Sequential 
from keras.layers import Dense 
from keras.layers import SimpleRNN 
from sklearn.preprocessing import MinMaxScaler 
from sklearn.metrics import mean_squared_error 
from keras.preprocessing.sequence import pad_sequences 

def stoarray(data = [], sep = ','): 
    return data.map(lambda x: np.array(x.split(sep), dtype=float)) 

def create_dataset(dataset, window_size=1): 
    dataX, dataY = [], [] 
    for i in range(len(dataset)-window_size-1): 
     a = dataset[i:(i+window_size), 0] 
     dataX.append(a) 
     dataY.append(dataset[i + window_size, 0]) #gives the ValueError : Can only tuple index with multi index 
    return np.array(dataX), np.array(dataY) 

# fix random seed for reproducibility 
np.random.seed(7) 

# loading data 
colna = ['id', 'seq'] 
train_data = pd.read_csv('G:/Python/integer_sequencing/testfile.csv', header=1) 
train_data.columns = colna 
dataset = train_data['seq'] 
#print(dataset) 
window_size = 1 
X_train, Y_train = create_dataset(dataset, window_size) 

print(X_train.head(5)) 
print(Y_train.head(5)) 

我試圖用X_train每個序列分割作爲輸入,它由一套完整的除上一學期和Y_train被視爲輸出將只包含最後一個數字。 但是當我運行代碼時,我得到了ValueError:只能使用帶有MultiIndex的元組索引。 任何人都可以解釋它對我的代碼意味着什麼,我必須做些什麼來解決它。

回溯電話: enter image description here

PS - 我新的堆棧溢出和深度學習,所以我會很感激,如果你可以建議和幫助我的問題的格式。

+0

始終把完整的錯誤消息(回溯)的問題 - 還有其他有用inforamtion,即。它顯示哪條線出現問題。 – furas

+0

在'dataset [i:(i + window_size),0]'你使用元組'i:(i + window_size),0'來獲取數據,但是它不允許這樣做。您可以只使用單個整數,如'dataset [0]'或切片'dataset [i:(i + window_size)]'。在你的代碼中'dataset'意思是'train_data ['seq']',它是DataFrame的單列 - 它是'Series',而不是'DataFrame'只有一列。使用'print(type(dataset))'來查看它。 – furas

+0

我在問題中添加了錯誤消息。希望它有幫助。 –

回答

1

您有i:(i+window_size), 0的問題dataset[i:(i+window_size), 0]

在您的代碼dataset意味着train_data['seq']這是單柱 - 單維Series - 但是你用i:(i+window_size), 0像二維DataFrame

只能使用單一的整數像dataset[0]或切片dataset[i:(i+window_size)]