Neural network MNIST: backpropagation is correct, but training/test accuracy is very low

I am building a neural network to learn to recognize handwritten digits from MNIST. I have confirmed that backpropagation computes the gradients perfectly (gradient checking gives an error < 10^-10).

It seems that no matter how I train the weights, the cost function always settles around 3.24-3.25 (it never goes below that value, approaching it from above), and the training/test accuracy is very low (roughly 11% on the test set). The final h values all end up very close to 0.1 and to one another (a quick numerical check of that plateau is shown below).
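
For reference, that plateau matches what the unregularized cross-entropy cost gives when every output sits near 0.1, which a one-line check confirms:

import numpy as np

# Cross-entropy cost for one example when all 10 outputs are ~0.1 and exactly
# one label is 1: -(log 0.1 + 9 * log 0.9) ≈ 3.25, matching the observed plateau.
print(-(np.log(0.1) + 9 * np.log(0.9)))   # ≈ 3.2508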

I cannot figure out why my program fails to produce better results. I was wondering if someone could look at my code and tell me what might be causing this. Thank you very much for your help, I really appreciate it!

Here is my Python code:

import numpy as np 
import math 
from tensorflow.examples.tutorials.mnist import input_data 

# Neural network has four layers 
# The input layer has 784 nodes 
# The two hidden layers each have 5 nodes 
# The output layer has 10 nodes 
num_layer = 4 
num_node = [784,5,5,10] 
num_output_node = 10 

# 30000 training sets are used 
# 10000 test sets are used 
# Can be adjusted 
Ntrain = 30000 
Ntest = 10000 

# Sigmoid Function 
def g(X): 
    return 1/(1 + np.exp(-X)) 

# Forwardpropagation 
def h(W,X): 
    a = X 
    for l in range(num_layer - 1): 
     a = np.insert(a,0,1) 
     z = np.dot(a,W[l]) 
     a = g(z) 
    return a  

# Cost Function 
def J(y, W, X, Lambda): 
    cost = 0 
    for i in range(Ntrain): 
     H = h(W,X[i]) 
     for k in range(num_output_node):    
      cost = cost + y[i][k] * math.log(H[k]) + (1-y[i][k]) * math.log(1-H[k]) 
    regularization = 0 
    for l in range(num_layer - 1): 
     for i in range(num_node[l]): 
      for j in range(num_node[l+1]): 
       regularization = regularization + W[l][i+1][j] ** 2 
    return (-1/Ntrain * cost + Lambda/(2*Ntrain) * regularization) 

# Backpropagation - confirmed to be correct 
# Algorithm based on https://www.coursera.org/learn/machine-learning/lecture/1z9WW/backpropagation-algorithm 
# Returns D, the value of the gradient 
def BackPropagation(y, W, X, Lambda): 
    delta = np.empty(num_layer-1, dtype = object) 
    for l in range(num_layer - 1): 
     delta[l] = np.zeros((num_node[l]+1,num_node[l+1])) 
    for i in range(Ntrain): 
     A = np.empty(num_layer-1, dtype = object) 
     a = X[i] 
     for l in range(num_layer - 1): 
      A[l] = a 
      a = np.insert(a,0,1) 
      z = np.dot(a,W[l]) 
      a = g(z) 
     diff = a - y[i] 
     delta[num_layer-2] = delta[num_layer-2] + np.outer(np.insert(A[num_layer-2],0,1),diff) 
     for l in range(num_layer-2): 
      index = num_layer-2-l 
      diff = np.multiply(np.dot(np.array([W[index][k+1] for k in range(num_node[index])]), diff), np.multiply(A[index], 1-A[index])) 
      delta[index-1] = delta[index-1] + np.outer(np.insert(A[index-1],0,1),diff) 
    D = np.empty(num_layer-1, dtype = object) 
    for l in range(num_layer - 1): 
     D[l] = np.zeros((num_node[l]+1,num_node[l+1])) 
    for l in range(num_layer-1): 
     for i in range(num_node[l]+1): 
      if i == 0: 
       for j in range(num_node[l+1]): 
        D[l][i][j] = 1/Ntrain * delta[l][i][j] 
      else: 
       for j in range(num_node[l+1]): 
        D[l][i][j] = 1/Ntrain * (delta[l][i][j] + Lambda * W[l][i][j]) 
    return D 

# Neural network - this is where the learning/adjusting of weights occur 
# W is the weights 
# learn is the learning rate 
# iterations is the number of iterations we pass over the training set 
# Lambda is the regularization parameter 
def NeuralNetwork(y, X, learn, iterations, Lambda): 

    W = np.empty(num_layer-1, dtype = object) 
    for l in range(num_layer - 1): 
     W[l] = np.random.rand(num_node[l]+1,num_node[l+1])/100 
    for k in range(iterations): 
     print(J(y, W, X, Lambda)) 
     D = BackPropagation(y, W, X, Lambda) 
     for l in range(num_layer-1): 
      W[l] = W[l] - learn * D[l] 
    print(J(y, W, X, Lambda)) 
    return W 

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True) 

# Training data, read from MNIST 
inputpix = [] 
output = [] 

for i in range(Ntrain): 
    inputpix.append(2 * np.array(mnist.train.images[i]) - 1) 
    output.append(np.array(mnist.train.labels[i])) 

np.savetxt('input.txt', inputpix, delimiter=' ') 
np.savetxt('output.txt', output, delimiter=' ') 

# Train the weights 
finalweights = NeuralNetwork(output, inputpix, 2, 5, 1) 

# Test data 
inputtestpix = [] 
outputtest = [] 

for i in range(Ntest): 
    inputtestpix.append(2 * np.array(mnist.test.images[i]) - 1) 
    outputtest.append(np.array(mnist.test.labels[i])) 

np.savetxt('inputtest.txt', inputtestpix, delimiter=' ') 
np.savetxt('outputtest.txt', outputtest, delimiter=' ') 

# Determine the accuracy of the training data 
count = 0 
for i in range(Ntrain): 
    H = h(finalweights,inputpix[i]) 
    print(H) 
    for j in range(num_output_node): 
     if H[j] == np.amax(H) and output[i][j] == 1: 
      count = count + 1 
print(count/Ntrain) 

# Determine the accuracy of the test data 
count = 0 
for i in range(Ntest): 
    H = h(finalweights,inputtestpix[i]) 
    print(H) 
    for j in range(num_output_node): 
     if H[j] == np.amax(H) and outputtest[i][j] == 1: 
      count = count + 1 
print(count/Ntest) 
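
For reference, the gradient check mentioned at the top was a finite-difference comparison of this general form (a minimal sketch, not the exact code used; the helper name GradientCheck is an assumption, and in practice it would be run with a much smaller Ntrain and only a subset of the weights):

# Sketch of a finite-difference gradient check: perturb one weight at a time and
# compare (J(W+eps) - J(W-eps)) / (2*eps) against the analytic gradient from
# BackPropagation. Returns the largest absolute difference found.
def GradientCheck(y, W, X, Lambda, eps=1e-4):
    D = BackPropagation(y, W, X, Lambda)
    max_diff = 0
    for l in range(num_layer - 1):
        for i in range(num_node[l] + 1):
            for j in range(num_node[l+1]):
                W[l][i][j] += eps
                cost_plus = J(y, W, X, Lambda)
                W[l][i][j] -= 2 * eps
                cost_minus = J(y, W, X, Lambda)
                W[l][i][j] += eps                      # restore the original weight
                numeric = (cost_plus - cost_minus) / (2 * eps)
                max_diff = max(max_diff, abs(numeric - D[l][i][j]))
    return max_diff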

Could you add the [python](https://stackoverflow.com/questions/tagged/python) tag? Then the code would be highlighted properly. – ahmedus

Answers


Your network is tiny - with only 5 neurons per hidden layer it is essentially a linear model. Increase it to 256 neurons per layer.

Note that a trivial linear model has 784 * 10 + 10 (biases) parameters, which adds up to 7850 floats. Your neural network, on the other hand, has 784 * 5 + 5 + 5 * 5 + 5 + 5 * 10 + 10 = 3925 + 30 + 60 = 4015 parameters. In other words, despite being a nonlinear neural network, it is actually a simpler model than a plain logistic regression applied to this problem. Logistic regression on its own gets around 11% error, so you cannot really expect to beat it here. This is not a rigorous argument, of course, but it should give you some intuition for why it does not work.
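
Those counts can be verified against the layer sizes in the question with a quick snippet (just a sanity check, not part of either code base):

# num_node is the layer-size list from the question.
num_node = [784, 5, 5, 10]

# Plain logistic regression on raw pixels: one weight per pixel per class, plus 10 biases.
logreg_params = 784 * 10 + 10                                        # 7850

# The question's network: each W[l] has shape (num_node[l] + 1, num_node[l+1]),
# i.e. (fan_in + bias) * fan_out parameters per layer.
nn_params = sum((num_node[l] + 1) * num_node[l+1] for l in range(len(num_node) - 1))

print(logreg_params, nn_params)                                      # 7850 4015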

The second issue concerns the other hyperparameters you seem to be using (see the sketch after this list):

  • a huge learning rate (is it 2?) - it should be more on the order of 0.0001
  • very few training iterations (are you only running 5 epochs?)
  • a huge regularization parameter (it is set to 1), which heavily penalizes your network for learning anything - again, change it to something orders of magnitude smaller
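
Putting those suggestions together, a re-run might look roughly like this (the specific values are only illustrative, not tuned):

# Wider hidden layers (as argued above); num_node is the module-level list
# that the question's functions read.
num_node = [784, 256, 256, 10]

# NeuralNetwork(y, X, learn, iterations, Lambda) from the question, with a much
# smaller learning rate, more passes over the data, and weaker regularization.
finalweights = NeuralNetwork(output, inputpix, 0.0001, 100, 0.01)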

It turns out the main problem was that I was using the wrong activation function, but adding more neurons per layer also helped the accuracy a lot. I am just wondering: how do you choose all the hyperparameters and the number of neurons per layer? Thank you very much for your help! – user8384788


Sigmoid activation is fine - not the best, but it should be okay for MNIST. Setting hyperparameters is a bit of a black art - there are some rules of thumb (like the ones that can tell you a model is clearly insufficient), but mostly it comes down to experience with models and/or a lot of searching and checking different things. For example, if the training error gets stuck, your model probably lacks capacity (so: too few neurons), and so on, but there are no "hard" rules here. – lejlot
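
The "searching and checking different things" part might look something like the sketch below (the grid values are arbitrary examples; ideally you would score on a separate validation split rather than reusing the test data, which is done here only for brevity):

# Minimal grid-search sketch using only the question's own functions and data.
best_acc, best_W = -1, None
for learn in (0.01, 0.1, 0.5):
    for Lambda in (0.0, 0.01, 0.1):
        W = NeuralNetwork(output, inputpix, learn, 20, Lambda)
        correct = sum(1 for i in range(Ntest)
                      if outputtest[i][np.argmax(h(W, inputtestpix[i]))] == 1)
        acc = correct / Ntest
        if acc > best_acc:
            best_acc, best_W = acc, W
print(best_acc)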


Interesting - I will keep these in mind when I set the hyperparameters. Thank you so much for everything, I really appreciate it! – user8384788


The NN architecture is most likely not a good fit. The learning rate may also be too high or too low, or the regularization parameter may be where most of the problem lies.