2017-04-24 448 views

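LSTM neural network abnormally outputs the same values

I am currently studying and trying to improve a (very simple) Python script whose purpose is to predict stock market movements.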

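The problem: I always get the same output value, and I really don't understand why, because the result is supposed to look more like this graph (the red line is the prediction I should get and the blue line is the real data): https://i.stack.imgur.com/dvQvY.png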

Here is the code:

import numpy as np 
import matplotlib.pyplot as plt 
import pandas as pd 
from pandas import datetime 
import math, time 
import itertools 
from sklearn import preprocessing 
import datetime 
from operator import itemgetter 
from sklearn.metrics import mean_squared_error 
from math import sqrt 
from keras.models import Sequential 
from keras.layers.core import Dense, Dropout, Activation 
from keras.layers.recurrent import LSTM 

def get_stock_data(stock_name, normalized=0): 
    url = 'http://chart.finance.yahoo.com/table.csv?s=%s&a=11&b=15&c=2011&d=29&e=10&f=2016&g=d&ignore=.csv' % stock_name 

    col_names = ['Date','Open','High','Low','Close','Volume','Adj Close'] 
    stocks = pd.read_csv(url, header=0, names=col_names) 
    df = pd.DataFrame(stocks) 
    date_split = df['Date'].str.split('-').str 
    df['Year'], df['Month'], df['Day'] = date_split 
    df["Volume"] = df["Volume"]/10000 
    df.drop(df.columns[[0,3,5,6, 7,8,9]], axis=1, inplace=True) 
    return df 

stock_name = 'GOOGL' 
df = get_stock_data(stock_name,0) 
df.head() 

today = datetime.date.today() 
file_name = stock_name+'_stock_%s.csv' % today 
df.to_csv(file_name) 

df['High'] = df['High']/100 
df['Open'] = df['Open']/100 
df['Close'] = df['Close']/100 
df.head(5) 

def load_data(stock, seq_len): 
    amount_of_features = len(stock.columns) 
    data = stock.as_matrix() #pd.DataFrame(stock) 
    sequence_length = seq_len + 1 
    result = [] 
    for index in range(len(data) - sequence_length): 
        result.append(data[index: index + sequence_length])

    result = np.array(result) 
    row = round(0.9 * result.shape[0]) 
    train = result[:int(row), :] 
    x_train = train[:, :-1] 
    y_train = train[:, -1][:,-1] 
    x_test = result[int(row):, :-1] 
    y_test = result[int(row):, -1][:,-1] 

    x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], amount_of_features)) 
    x_test = np.reshape(x_test, (x_test.shape[0], x_test.shape[1], amount_of_features)) 

    return [x_train, y_train, x_test, y_test] 

def build_model2(layers): 
    d = 0.2
    model = Sequential()
    model.add(LSTM(128, input_shape=(layers[1], layers[0]), return_sequences=True))
    model.add(Dropout(d))
    model.add(LSTM(64, input_shape=(layers[1], layers[0]), return_sequences=False))
    model.add(Dropout(d))
    model.add(Dense(16,init='uniform',activation='relu'))
    model.add(Dense(1,init='uniform',activation='linear'))
    model.compile(loss='mse',optimizer='adam',metrics=['accuracy'])
    return model

window = 22 
X_train, y_train, X_test, y_test = load_data(df[::-1], window) 
print("X_train", X_train.shape) 
print("y_train", y_train.shape) 
print("X_test", X_test.shape) 
print("y_test", y_test.shape) 

model = build_model2([3,window,1]) 

model.fit(
    X_train, 
    y_train, 
    batch_size=512, 
    nb_epoch=500, 
    validation_split=0.1, 
    verbose=1) 

trainScore = model.evaluate(X_train, y_train, verbose=0) 
print('Train Score: %.2f MSE (%.2f RMSE)' % (trainScore[0], math.sqrt(trainScore[0]))) 

testScore = model.evaluate(X_test, y_test, verbose=0) 
print('Test Score: %.2f MSE (%.2f RMSE)' % (testScore[0],math.sqrt(testScore[0]))) 

# print(X_test[-1]) 
diff=[] 
ratio=[] 
p = model.predict(X_test) 
for u in range(len(y_test)): 
    pr = p[u][0] 
    ratio.append((y_test[u]/pr)-1) 
    diff.append(abs(y_test[u]- pr)) 
    #print(u, y_test[u], pr, (y_test[u]/pr)-1, abs(y_test[u]- pr)) 

import matplotlib.pyplot as plt2 

plt2.plot(p,color='red', label='prediction') 
plt2.plot(y_test,color='blue', label='y_test') 
plt2.legend(loc='upper left') 
plt2.show() 

The output I get: https://i.stack.imgur.com/6TVRb.png

(I have already tried changing the batch size and the number of epochs.)

I am currently using (on macOS Sierra):

Python 3.6.0 (default, Jan 2 2017, 18:14:29) [GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "copyright", "credits" or "license()" for more information.
WARNING: The version of Tcl/Tk (8.5.9) in use may be unstable.

Every module used in the code above is up to date (as of 2017-04-24).

I may have forgotten some relevant information; if so, don't hesitate to ask.

Thanks


What makes you believe it should output the original image you linked? – Grr


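Because it did so once, and at the very least I would expect the script to output something other than a simple straight line (for example, something that follows the mean or similar). What makes me think there is a problem is that now, no matter what data I feed in (Apple stock, Google, ...), the output is always a straight red line. I am new to neural networks and, to be honest, I don't know what the expected output should look like. If this is normal behavior, how can I find out what triggers the correct behavior? – Frank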


Have you tried scaling the data so that its mean is 0? – lz96
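For readers unsure what this scaling suggestion looks like in practice, here is a minimal NumPy sketch. It is not part of the original script: the helper name `standardize` and the synthetic arrays are assumptions made for illustration, standing in for the `(samples, window, features)` arrays that `load_data` returns. The statistics are computed on the training windows only and then reused for the test windows.

```python
import numpy as np

def standardize(train, test):
    """Scale each feature to zero mean and unit variance.

    Hypothetical helper (not in the original script). Both inputs are
    expected to have shape (samples, timesteps, features). The mean and
    standard deviation come from the training data only.
    """
    mean = train.mean(axis=(0, 1), keepdims=True)
    std = train.std(axis=(0, 1), keepdims=True)
    std[std == 0] = 1.0  # guard against constant features
    return (train - mean) / std, (test - mean) / std, mean, std

# Synthetic stand-in data: 3 features (think Open, High, Close), window of 22.
rng = np.random.default_rng(0)
x_train = rng.normal(loc=[7.0, 7.5, 7.2], scale=[1.0, 1.1, 0.9], size=(200, 22, 3))
x_test = rng.normal(loc=[7.0, 7.5, 7.2], scale=[1.0, 1.1, 0.9], size=(20, 22, 3))

x_train_s, x_test_s, mean, std = standardize(x_train, x_test)
print(x_train_s.mean(axis=(0, 1)))  # per-feature mean of the scaled training data
print(x_train_s.std(axis=(0, 1)))   # per-feature standard deviation
```

If the inputs are scaled this way, the target would presumably need the same treatment (for example with the mean and standard deviation of the last feature), and predictions would have to be mapped back before plotting them against the raw prices.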

Answers


Change nb_epoch=500 to nb_epoch=2, and you will see that the LSTM works properly.
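When comparing runs with different `nb_epoch` values, it may help to measure whether the prediction really is a flat line instead of judging from the plot alone. The sketch below is an illustration only: `flat_line_report` is a hypothetical helper, not part of the original script or of Keras, and the arrays are synthetic stand-ins for `y_test` and `model.predict(X_test)`. It reports the spread of the predictions and compares the model's MSE with that of a baseline that always predicts the mean of the true values.

```python
import numpy as np

def flat_line_report(y_true, y_pred):
    """Summarize how flat a prediction is (hypothetical helper).

    A pred_std close to zero means the model outputs nearly the same value
    for every sample; an mse_model close to mse_mean_baseline means it does
    no better than always predicting the mean of y_true.
    """
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    mse_model = float(np.mean((y_true - y_pred) ** 2))
    mse_baseline = float(np.mean((y_true - y_true.mean()) ** 2))
    return {
        "pred_std": float(y_pred.std()),
        "true_std": float(y_true.std()),
        "mse_model": mse_model,
        "mse_mean_baseline": mse_baseline,
    }

# Synthetic example: a rising series and a constant prediction at its mean.
y_true = np.linspace(7.0, 8.0, 50)
y_flat = np.full((50, 1), 7.5)  # same (n, 1) shape that model.predict returns
report = flat_line_report(y_true, y_flat)
print(report)
```

Running the same report on the output of the 500-epoch and 2-epoch models would show whether the shorter training actually produces predictions with non-trivial spread.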