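LSTM neural network abnormally outputs the same value

I am currently working on, and trying to improve, a very simple Python script whose goal is to predict stock market movements.
The problem: every prediction comes out at the same value, and I really don't understand why. The output is supposed to look more like this graph (the red line is the prediction I should get, the blue line is the real data): https://i.stack.imgur.com/dvQvY.png
Here is the code: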
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from pandas import datetime
import math, time
import itertools
from sklearn import preprocessing
import datetime
from operator import itemgetter
from sklearn.metrics import mean_squared_error
from math import sqrt
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.layers.recurrent import LSTM
def get_stock_data(stock_name, normalized=0):
    url = 'http://chart.finance.yahoo.com/table.csv?s=%s&a=11&b=15&c=2011&d=29&e=10&f=2016&g=d&ignore=.csv' % stock_name
    col_names = ['Date','Open','High','Low','Close','Volume','Adj Close']
    stocks = pd.read_csv(url, header=0, names=col_names)
    df = pd.DataFrame(stocks)
    date_split = df['Date'].str.split('-').str
    df['Year'], df['Month'], df['Day'] = date_split
    df["Volume"] = df["Volume"]/10000
    df.drop(df.columns[[0,3,5,6, 7,8,9]], axis=1, inplace=True)
    return df
stock_name = 'GOOGL'
df = get_stock_data(stock_name,0)
df.head()
today = datetime.date.today()
file_name = stock_name+'_stock_%s.csv' % today
df.to_csv(file_name)
df['High'] = df['High']/100
df['Open'] = df['Open']/100
df['Close'] = df['Close']/100
df.head(5)
def load_data(stock, seq_len):
    amount_of_features = len(stock.columns)
    data = stock.as_matrix() #pd.DataFrame(stock)
    sequence_length = seq_len + 1
    result = []
    for index in range(len(data) - sequence_length):
        result.append(data[index: index + sequence_length])
    result = np.array(result)
    row = round(0.9 * result.shape[0])
    train = result[:int(row), :]
    x_train = train[:, :-1]
    y_train = train[:, -1][:,-1]
    x_test = result[int(row):, :-1]
    y_test = result[int(row):, -1][:,-1]
    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], amount_of_features))
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], amount_of_features))
    return [x_train, y_train, x_test, y_test]
def build_model2(layers):
    d = 0.2
    model = Sequential()
    model.add(LSTM(128, input_shape=(layers[1], layers[0]), return_sequences=True))
    model.add(Dropout(d))
    model.add(LSTM(64, input_shape=(layers[1], layers[0]), return_sequences=False))
    model.add(Dropout(d))
    model.add(Dense(16,init='uniform',activation='relu'))
    model.add(Dense(1,init='uniform',activation='linear'))
    model.compile(loss='mse',optimizer='adam',metrics=['accuracy'])
    return model
window = 22
X_train, y_train, X_test, y_test = load_data(df[::-1], window)
print("X_train", X_train.shape)
print("y_train", y_train.shape)
print("X_test", X_test.shape)
print("y_test", y_test.shape)
model = build_model2([3,window,1])
model.fit(
    X_train,
    y_train,
    batch_size=512,
    nb_epoch=500,
    validation_split=0.1,
    verbose=1)
trainScore = model.evaluate(X_train, y_train, verbose=0)
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore[0], math.sqrt(trainScore[0])))
testScore = model.evaluate(X_test, y_test, verbose=0)
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore[0],math.sqrt(testScore[0])))
# print(X_test[-1])
diff=[]
ratio=[]
p = model.predict(X_test)
for u in range(len(y_test)):
    pr = p[u][0]
    ratio.append((y_test[u]/pr)-1)
    diff.append(abs(y_test[u]- pr))
    #print(u, y_test[u], pr, (y_test[u]/pr)-1, abs(y_test[u]- pr))
import matplotlib.pyplot as plt2
plt2.plot(p,color='red', label='prediction')
plt2.plot(y_test,color='blue', label='y_test')
plt2.legend(loc='upper left')
plt2.show()
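The output I get: https://i.stack.imgur.com/6TVRb.png
(I have already tried changing the batch size and the number of epochs.)
I am currently using (on macOS Sierra):
Python 3.6.0 (default, Jan 2 2017, 18:14:29) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin Type "copyright", "credits" or "license()" for more information. WARNING: The version of Tcl/Tk (8.5.9) in use may be unstable.
Every module used in the code above is up to date (as of April 24, 2017).
I may have forgotten some relevant information, so don't hesitate to ask.
Thanks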
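What makes you believe it should output the original image you linked? – Grr
Because it did so once. At the very least, I would expect the script to output something other than a simple straight line (for example, something that follows the mean or similar). What makes me think there is a problem is that now, no matter what data I feed in (Apple stock, Google, ...), the output is always a straight red line. I am new to neural networks and, to be honest, I don't know what the expected output should look like. If this is normal behaviour, how can I find out what triggers the correct behaviour? – Frank
Have you tried scaling the data so that its mean is 0? – lz96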
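
The scaling suggested in the last comment could be sketched roughly as follows. This is only a minimal illustration, not the code from the question: `standardize` is a hypothetical helper, and the arrays are synthetic stand-ins for the `(samples, window, features)` arrays that `load_data` returns. It assumes the statistics should be computed from the training windows only, so that nothing from the test set leaks into the scaling.

```python
import numpy as np

def standardize(train, test):
    """Scale each feature to zero mean and unit variance.

    Hypothetical helper: statistics come from the training data only
    and are then reused for the test data.
    """
    # Arrays have shape (samples, timesteps, features); reduce over the
    # first two axes to get one mean and one std per feature.
    mean = train.mean(axis=(0, 1), keepdims=True)
    std = train.std(axis=(0, 1), keepdims=True)
    std[std == 0] = 1.0  # guard against division by zero for constant features
    return (train - mean) / std, (test - mean) / std, mean, std

# Synthetic stand-ins for X_train / X_test (3 features, window of 22)
rng = np.random.default_rng(0)
x_train = rng.normal(loc=7.5, scale=0.5, size=(100, 22, 3))
x_test = rng.normal(loc=8.0, scale=0.5, size=(10, 22, 3))

x_train_s, x_test_s, mean, std = standardize(x_train, x_test)

print(x_train_s.mean(axis=(0, 1)))  # per-feature mean of the scaled training data
print(x_train_s.std(axis=(0, 1)))   # per-feature std of the scaled training data
```

If the inputs are scaled this way, the targets would presumably need the same treatment, using the mean and standard deviation of the last feature (the one `load_data` takes `y_train` from), and the predictions would have to be mapped back with the inverse transform before plotting them against the real prices.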