TensorFlow neural network: multilayer perceptron regression example

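I'm trying to write an MLP with TensorFlow (I've only just started learning it, so apologies for the code!) for multivariate regression (no MNIST, please). Here is my MWE; I chose to use the linnerud dataset from sklearn. (In reality I'm using a much bigger dataset, and here I'm only using one hidden layer because I want to keep the MWE small, but I can add more if needed.) By the way, I use shuffle = False in train_test_split because I'm actually working with a time-series dataset.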
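Since ordering matters for time series, here is a small NumPy-only sketch of what a split with shuffle = False amounts to. It assumes, based on my understanding of scikit-learn's behavior, that the test size is rounded up and that the last rows become the test set. chronological_split is a hypothetical helper for illustration, not a scikit-learn function, and the 20-row toy arrays only stand in for Linnerud's 20 samples.

```python
import math

import numpy as np


def chronological_split(X, y, test_size):
    """Split arrays without shuffling: the last rows become the test set.

    Assumption: this mirrors train_test_split(..., shuffle=False), where the
    number of test rows is ceil(test_size * n_samples) and the train set
    gets the remaining (earlier) rows.
    """
    n_samples = len(X)
    n_test = math.ceil(test_size * n_samples)
    split = n_samples - n_test
    return X[:split], X[split:], y[:split], y[split:]


# Toy stand-in for the 20-row Linnerud data; rows are numbered
# consecutively so that the original ordering is easy to check.
X = np.arange(20 * 3).reshape(20, 3)
y = np.arange(20 * 3).reshape(20, 3)

X_train, X_test, y_train, y_test = chronological_split(X, y, test_size=0.33)
print(len(X_train), len(X_test))  # 13 7
print(X_test[0])                  # [39 40 41], i.e. row 13 of X
```

With 20 rows and test_size = 0.33 this gives 13 training rows and 7 test rows, and every test row comes after every training row.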

MWE

######################### import stuff ########################## 
import numpy as np 
import pandas as pd 
import tensorflow as tf 
from sklearn.datasets import load_linnerud 
from sklearn.model_selection import train_test_split 


######################## prepare the data ######################## 
X, y = load_linnerud(return_X_y = True) 
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle = False, test_size = 0.33) 


######################## set learning variables ################## 
learning_rate = 0.0001 
epochs = 100 
batch_size = 3 


######################## set some variables ####################### 
x = tf.placeholder(tf.float32, [None, 3], name = 'x') # 3 features 
y = tf.placeholder(tf.float32, [None, 3], name = 'y') # 3 outputs 

# input-to-hidden layer1 
W1 = tf.Variable(tf.truncated_normal([3,300], stddev = 0.03), name = 'W1') 
b1 = tf.Variable(tf.truncated_normal([300]), name = 'b1') 

# hidden layer1-to-output 
W2 = tf.Variable(tf.truncated_normal([300,3], stddev = 0.03), name= 'W2')  
b2 = tf.Variable(tf.truncated_normal([3]), name = 'b2') 


######################## Activations, outputs ###################### 
# output hidden layer 1 
hidden_out = tf.nn.relu(tf.add(tf.matmul(x, W1), b1)) 

# total output 
y_ = tf.nn.relu(tf.add(tf.matmul(hidden_out, W2), b2)) 


####################### Loss Function ######################### 
mse = tf.losses.mean_squared_error(y, y_) 


####################### Optimizer  ######################### 
optimizer = tf.train.GradientDescentOptimizer(learning_rate = learning_rate).minimize(mse) 


###################### Initialize, Accuracy and Run ################# 
# initialize variables 
init_op = tf.global_variables_initializer() 

# test-set error (named "accuracy", but for regression this is the mean squared error)
accuracy = tf.reduce_mean(tf.square(tf.subtract(y, y_))) # or could use tf.losses.mean_squared_error 

#run 
with tf.Session() as sess:
    sess.run(init_op)
    total_batch = int(len(y_train) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            # consecutive, non-shuffled mini-batches
            batch_x = X_train[i * batch_size:(i + 1) * batch_size, :]
            batch_y = y_train[i * batch_size:(i + 1) * batch_size, :]
            _, c = sess.run([optimizer, mse], feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        print('Epoch:', (epoch + 1), 'cost =', '{:.3f}'.format(avg_cost))
    # MSE on the held-out test set after training
    print(sess.run(mse, feed_dict={x: X_test, y: y_test}))

This prints something like:

... 
Epoch: 98 cost = 10992.617 
Epoch: 99 cost = 10992.592 
Epoch: 100 cost = 10992.566 
11815.1

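So clearly something is wrong. I suspect the problem is either in the cost function/accuracy or in the way I'm using batches, but I can't quite figure it out.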
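One concrete issue with the batching: with 20 samples and test_size = 0.33, X_train should have 13 rows, so total_batch = int(13 / 3) = 4 and only 12 rows are fed per epoch; the last training row is never used. This is unlikely to explain the huge cost on its own, but it is easy to fix. Below is a minimal NumPy sketch of a batch iterator that also yields the final partial batch; iterate_minibatches is a hypothetical helper name, and the 13-row arrays are toy stand-ins for X_train and y_train.

```python
import numpy as np


def iterate_minibatches(X, y, batch_size):
    """Yield consecutive (X, y) mini-batches in their original order.

    Unlike looping over int(len(y) / batch_size) batches, this also yields
    the final, smaller batch, so every training row is seen once per epoch.
    """
    for start in range(0, len(X), batch_size):
        # Slicing past the end of a NumPy array is safe, so no min() is needed.
        yield X[start:start + batch_size], y[start:start + batch_size]


# Toy stand-ins with 13 rows, the size of X_train in the question.
X_train = np.arange(13 * 3, dtype=np.float32).reshape(13, 3)
y_train = np.arange(13 * 3, dtype=np.float32).reshape(13, 3)
batch_size = 3

rows_used_by_original_loop = int(len(y_train) / batch_size) * batch_size
print(rows_used_by_original_loop)  # 12

sizes = [len(bx) for bx, _ in iterate_minibatches(X_train, y_train, batch_size)]
print(sizes)  # [3, 3, 3, 3, 1]
```

In the session this would replace the inner loop (for batch_x, batch_y in iterate_minibatches(X_train, y_train, batch_size): ...); the cost average should then be divided by the actual number of batches (5 here) rather than total_batch.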


2017-10-19 Euler_Salter

Maybe one of the problems is that I'm not using regularization?

I tried doing 'regularizer1 = tf.nn.l2_loss(W1)' and 'regularizer2 = tf.nn.l2_loss(W2)', then adding them to the loss function, 'mse = tf.losses.mean_squared_error(y, y_) + 0.001 * regularizer1 + 0.001 * regularizer2', but it only gets worse...
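For reference, tf.nn.l2_loss(t) computes sum(t ** 2) / 2, so the loss in the comment above is the MSE plus 0.001 times half the sum of squared weights. The NumPy sketch below reproduces that computation (l2_loss and regularized_mse are illustrative names, not TensorFlow functions; the weight shapes follow the answer's 10-unit hidden layer). It also suggests why the penalty cannot help much here: with weights initialized at stddev = 0.03, the penalty term starts out tiny compared with an MSE around 10,000.

```python
import numpy as np


def l2_loss(t):
    # Same definition as tf.nn.l2_loss: half the sum of squared entries.
    return np.sum(t ** 2) / 2


def regularized_mse(y_true, y_pred, weights, beta=0.001):
    """Mean squared error plus beta * l2_loss(W) summed over the given weights."""
    mse = np.mean((y_true - y_pred) ** 2)
    return mse + beta * sum(l2_loss(W) for W in weights)


rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.03, size=(3, 10))   # input-to-hidden weights
W2 = rng.normal(0.0, 0.03, size=(10, 3))   # hidden-to-output weights

y_true = np.array([[1.0, 2.0, 3.0]])
y_pred = np.array([[1.0, 2.0, 5.0]])

print(l2_loss(np.array([1.0, 2.0])))            # 2.5
print(regularized_mse(y_true, y_pred, []))      # 4/3, no penalty
print(regularized_mse(y_true, y_pred, [W1, W2]))  # slightly above 4/3
```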

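As far as I can tell, the model is learning. I tried tuning some hyperparameters (most notably the learning rate and the hidden layer size) and got better results. Here is the full code: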

######################### import stuff ########################## 
import numpy as np 
import pandas as pd 
import tensorflow as tf 
from sklearn.datasets import load_linnerud 
from sklearn.model_selection import train_test_split 

######################## prepare the data ######################## 
X, y = load_linnerud(return_X_y=True) 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False) 

######################## set learning variables ################## 
learning_rate = 0.0005 
epochs = 2000 
batch_size = 3 

######################## set some variables ####################### 
x = tf.placeholder(tf.float32, [None, 3], name='x') # 3 features 
y = tf.placeholder(tf.float32, [None, 3], name='y') # 3 outputs 

# hidden layer 1 
W1 = tf.Variable(tf.truncated_normal([3, 10], stddev=0.03), name='W1') 
b1 = tf.Variable(tf.truncated_normal([10]), name='b1') 

# hidden layer 1-to-output
W2 = tf.Variable(tf.truncated_normal([10, 3], stddev=0.03), name='W2') 
b2 = tf.Variable(tf.truncated_normal([3]), name='b2') 

######################## Activations, outputs ###################### 
# output hidden layer 1 
hidden_out = tf.nn.relu(tf.add(tf.matmul(x, W1), b1)) 

# total output 
y_ = tf.nn.relu(tf.add(tf.matmul(hidden_out, W2), b2)) 

####################### Loss Function ######################### 
mse = tf.losses.mean_squared_error(y, y_) 

####################### Optimizer  ######################### 
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(mse) 

###################### Initialize, Accuracy and Run ################# 
# initialize variables 
init_op = tf.global_variables_initializer() 

# test-set error (named "accuracy", but for regression this is the mean squared error)
accuracy = tf.reduce_mean(tf.square(tf.subtract(y, y_))) # or could use tf.losses.mean_squared_error 

# run 
with tf.Session() as sess:
    sess.run(init_op)
    total_batch = int(len(y_train) / batch_size)
    for epoch in range(epochs):
        avg_cost = 0
        for i in range(total_batch):
            # consecutive, non-shuffled mini-batches
            batch_x = X_train[i * batch_size:(i + 1) * batch_size, :]
            batch_y = y_train[i * batch_size:(i + 1) * batch_size, :]
            _, c = sess.run([optimizer, mse], feed_dict={x: batch_x, y: batch_y})
            avg_cost += c / total_batch
        if epoch % 10 == 0:
            print('Epoch:', (epoch + 1), 'cost =', '{:.3f}'.format(avg_cost))
    # MSE on the held-out test set after training
    print(sess.run(mse, feed_dict={x: X_test, y: y_test}))

Output:

Epoch: 1901 cost = 173.914 
Epoch: 1911 cost = 171.928 
Epoch: 1921 cost = 169.993 
Epoch: 1931 cost = 168.110 
Epoch: 1941 cost = 166.277 
Epoch: 1951 cost = 164.492 
Epoch: 1961 cost = 162.753 
Epoch: 1971 cost = 161.061 
Epoch: 1981 cost = 159.413 
Epoch: 1991 cost = 157.808 
482.433

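I guess you could tune it further, but that doesn't make much sense because the data is so small. I haven't tried regularization, but I believe you will need L2 regularization or dropout to avoid overfitting.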
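To make the dropout suggestion concrete, here is a NumPy sketch of "inverted" dropout, the variant that scales the surviving units during training so that nothing changes at test time. It is an illustration only: in TF 1.x the built-in is tf.nn.dropout (which takes keep_prob in the releases contemporary with this question), usually fed through a placeholder so that keep_prob can be set to 1.0 when evaluating on X_test. The dropout function and the all-ones hidden_out array below are assumptions for the example.

```python
import numpy as np


def dropout(activations, keep_prob, rng, training=True):
    """Inverted dropout.

    During training, each unit is zeroed with probability 1 - keep_prob and
    the survivors are scaled by 1 / keep_prob, so the expected activation is
    unchanged. At test time the activations pass through untouched.
    """
    if not training:
        return activations
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob


rng = np.random.default_rng(42)
hidden_out = np.ones((1000, 10))  # stand-in for the hidden-layer output

dropped = dropout(hidden_out, keep_prob=0.8, rng=rng)
print(np.unique(dropped))  # only 0.0 and 1.25 appear
print(dropped.mean())      # close to 1.0
print(np.array_equal(dropout(hidden_out, 0.8, rng, training=False), hidden_out))  # True
```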


2017-10-19 15:00:29 Maxim

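Thanks! Do you have any other suggestions? Also, do you think it's worth using cosine similarity instead of the dot product? For example (I'm not even sure this is the right way to do it in TensorFlow): 'hidden_out = tf.nn.relu(tf.add(tf.divide(tf.matmul(x, W1), tf.multiply(tf.norm(x), tf.norm(W1))), b1))' and 'y_ = tf.nn.relu(tf.add(tf.divide(tf.matmul(hidden_out, W2), tf.multiply(tf.norm(hidden_out), tf.norm(W2))), b2))'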
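One thing to check in the snippet above: tf.norm(x) with its default axis=None returns a single norm over the whole batch tensor (and tf.norm(W1) one norm over the whole matrix), so every output is divided by the same scalar rather than becoming a per-sample, per-unit cosine. A per-pair cosine normalizes each input row and each weight column separately, as in this NumPy sketch (cosine_dense is a hypothetical name, and eps is an assumed guard against division by zero).

```python
import numpy as np


def cosine_dense(x, W, eps=1e-8):
    """Cosine similarity between every row of x and every column of W.

    x: (batch, n_in), W: (n_in, n_out) -> (batch, n_out), entries in [-1, 1].
    Each input row and each weight column is normalized separately.
    """
    x_unit = x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)
    W_unit = W / (np.linalg.norm(W, axis=0, keepdims=True) + eps)
    return x_unit @ W_unit


x = np.array([[3.0, 0.0],
              [1.0, 1.0]])
W = np.array([[1.0, 0.0],    # column 0 points along the first axis
              [0.0, 2.0]])   # column 1 points along the second axis

out = cosine_dense(x, W)
print(np.round(out, 4))  # row 0 is about [1, 0]; row 1 is about [0.7071, 0.7071]
```

Note that using this in the output layer as well bounds each prediction to roughly [-1, 1] plus the bias, so targets on the scale of the Linnerud outputs would likely need rescaling first.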

As for the implementation, see https://stackoverflow.com/q/43357732/712995 – Maxim

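@Euler_Salter It's hard to give specific tips without the actual time-series data (I believe your real target isn't the Linnerud dataset). In general, I would consider adding batch-norm layers and/or dropout, which help prevent overfitting and often make training faster. Also consider other activation functions: ELU, SELU. When you hit a hard limit with one hidden layer, it may be time to move to a deeper network. But every model needs careful inspection before you make a decision: how the gradients flow, what the distribution of the activations is, etc. – Maxim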
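For the activation-function suggestion, here is a NumPy sketch of ELU and SELU, shown only to illustrate how they differ from ReLU for negative inputs. The alpha and scale constants are the standard SELU values; TensorFlow provides the same functions as tf.nn.elu and tf.nn.selu (the latter only in newer 1.x releases), which would be used in place of tf.nn.relu in the hidden layer.

```python
import numpy as np

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    # Clip to <= 0 before expm1 so large positive inputs cannot overflow
    # in the branch that np.where discards anyway.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def selu(x):
    """SELU: ELU with fixed alpha, multiplied by a fixed scale."""
    return SELU_SCALE * elu(x, SELU_ALPHA)


z = np.array([-2.0, 0.0, 1.5])
print(elu(z))   # about [-0.8647, 0, 1.5]; ReLU would give [0, 0, 1.5]
print(selu(z))  # negative side saturates at about -SELU_SCALE * SELU_ALPHA
```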
