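Time series classification

The dataset can be accessed at this link: https://drive.google.com/file/d/0B9Hd-26lI95ZeVU5cDY0ZU5MTWs/view?usp=sharing

My task is to predict the price movement of a sector fund. How much it rises or falls does not matter; I only want to know whether it goes up or down. So I have framed it as a classification problem.

Because this dataset is time series data, I have run into a lot of problems. I have read articles about these problems; for example, I cannot use k-fold cross-validation because this is time series data, and the order of the data cannot be ignored.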
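For reference, scikit-learn ships an order-preserving alternative to k-fold, `TimeSeriesSplit`. The snippet below is only a minimal sketch on a made-up array (`X` is a hypothetical stand-in for the real rows, assumed to be sorted by date already); it shows that each fold trains on earlier rows and tests on later ones.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical stand-in for the real rows, assumed already sorted by date.
X = np.arange(20).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=4)
folds = []
for train_idx, test_idx in tscv.split(X):
    # Every training index precedes every test index, so the model
    # never sees observations from the future of its test fold.
    folds.append((train_idx, test_idx))
    print("train:", train_idx[0], "-", train_idx[-1],
          "| test:", test_idx[0], "-", test_idx[-1])
```

The training window grows from fold to fold, so later folds resemble the situation of predicting with more history available.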

My code is as follows:

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import datetime 
from sklearn.linear_model import LinearRegression 
from math import sqrt 
from sklearn.svm import LinearSVC 
from sklearn.svm import SVC

# Replace the placeholder with your local file path
lag1 = pd.read_csv('YOURLOCALFILEPATH', parse_dates=['Date'])

# Trend: True if the price went up from the previous row, otherwise False
lag1['Trend'] = lag1.XLF > lag1.XLF.shift() 
train_size = round(len(lag1)*0.50) 
train = lag1[0:train_size] 
test = lag1[train_size:] 

variable_to_use= ['rGDP','interest_rate','private_auto_insurance','M2_money_supply','VXX'] 
y_train = train['Trend'] 
X_train = train[variable_to_use] 
y_test = test['Trend'] 
X_test = test[variable_to_use] 

#SVM Lag1 

this_C = 1.0 
clf = SVC(kernel = 'linear', C=this_C).fit(X_train, y_train) 
print('XLF Lag1 dataset') 
print('Accuracy of Linear SVC classifier on training set: {:.2f}' 
.format(clf.score(X_train, y_train))) 
print('Accuracy of Linear SVC classifier on test set: {:.2f}' 
.format(clf.score(X_test, y_test))) 

#Check prediction results 
clf.predict(X_test)

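First, is my approach right here, generating a True/False column up front? I am afraid the machine cannot understand this column if I simply feed it in. Should I first perform a regression and then compare the numeric results to generate the list of ups and downs?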

The accuracy on the training set is very low: 0.58. clf.predict(X_test) gives me an array that is all True, and I do not know why every prediction is True.
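One quick check that may help here is to compare the score against a majority-class baseline. The sketch below uses made-up labels (the 58% share of True is an assumption chosen only to echo the figure above, not a property of the real data) and scikit-learn's `DummyClassifier`, which always predicts the most frequent class.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical labels: 58% True, an assumed class balance for illustration.
y = np.array([True] * 58 + [False] * 42)
X = np.zeros((len(y), 1))  # features are ignored by the dummy model

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
baseline_acc = baseline.score(X, y)
print("share of True labels:", y.mean())
print("majority-class accuracy:", baseline_acc)
```

If a real classifier scores about the same as this baseline and outputs a single class, it has probably learned little beyond the class balance.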

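I do not know how the reported accuracy is calculated. For example, I think my current accuracy only counts the number of Trues and Falses but ignores their order. Since this is time series data, ignoring the order is incorrect and gives no information about predicting the price movement. Suppose I have 40 examples in the test set and I get 20 Trues; I would get 50% accuracy. But I suspect those Trues are not in the right positions, that is, where they appear in the ground truth set. (Tell me if I am wrong.)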
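For classifiers such as `SVC`, `score` returns the mean accuracy, which compares predictions with the true labels position by position rather than counting Trues. The following sketch uses made-up label arrays (all values are hypothetical) to show that having the right number of Trues in the wrong positions does not earn any credit.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical ground truth: 20 True followed by 20 False.
y_true = np.array([True] * 20 + [False] * 20)

# Same number of True values, but every one in the wrong position.
y_misplaced = ~y_true
# All-True predictions, as a constant classifier would output.
y_all_true = np.ones(40, dtype=bool)

acc_misplaced = accuracy_score(y_true, y_misplaced)
acc_all_true = accuracy_score(y_true, y_all_true)
manual = np.mean(y_true == y_all_true)  # fraction of matching positions
print(acc_misplaced, acc_all_true, manual)
```

Here the misplaced predictions score 0.0 even though they contain 20 Trues, while the all-True predictions score 0.5 because they match the 20 positions that really are True.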

I am also considering using gradient boosted trees for the classification. Would that be better?

Source

2017-08-08 Dylan

Some preprocessing of this data would probably help. A first step might go something like this:

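import numpy as np
import pandas as pd

df = pd.read_csv('YOURLOCALFILEPATH', header=0)
# More code than your method, but it labels rows as 0 or 1 and is easy to write out to a new file for later reference
df['Date'] = pd.to_datetime(df['Date'])  # column name 'Date' assumed from the question's parse_dates=['Date']
df = df.set_index('Date')
df['compare'] = df['XLF'].shift(-1)  # next row's XLF value
# 1 where the current XLF is above the next row's value, otherwise 0
df['Label'] = np.where(df['XLF'] > df['compare'], 1, 0)
df.drop('compare', axis=1, inplace=True)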

A second step could be to preprocess the data before feeding it to your model, by scaling your feature inputs with one of sklearn's built-in scalers, such as the MinMax scaler.
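A minimal sketch of that scaling step, using `MinMaxScaler` on a made-up feature matrix (the values and the 50/50 chronological split are assumptions for illustration). Fitting the scaler on the training rows only is a design choice added here, not something stated above; it keeps information from later rows out of the preprocessing.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix in time order; columns on very different scales.
X = np.array([[1.0, 1000.0], [2.0, 1500.0], [3.0, 3000.0],
              [4.0, 2000.0], [5.0, 5000.0], [6.0, 4000.0]])
split = len(X) // 2
X_train, X_test = X[:split], X[split:]

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn min/max from training rows only
X_test_scaled = scaler.transform(X_test)        # reuse those bounds on later rows
print(X_train_scaled)
print(X_test_scaled)
```

Because the bounds come from the training rows, scaled test values can fall outside the [0, 1] range when later data exceeds the earlier minimum or maximum.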

Source

2017-11-01 22:58:20

Could you add some example code?
