CAPTCHA recognition with a convnet: how to define the loss function

I have a small research project in which I try to decode some CAPTCHA images. I use a convnet implemented in TensorFlow 0.9, based on the MNIST example (https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py).

My code is available on GitHub: https://github.com/ksopyla/decapcha/blob/master/decaptcha_convnet.py

I am trying to reproduce the ideas described in:

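"Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks", Goodfellow et al. (https://arxiv.org/pdf/1312.6082.pdf)
"CAPTCHA Recognition with Active Deep Learning", Stark et al. (https://vision.in.tum.de/_media/spezial/bib/stark-gcpr15.pdf)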

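where a particular character sequence is encoded as a single binary vector. In my case the CAPTCHAs contain at most 20 Latin characters. Each character is encoded as a 63-dimensional binary vector in which a single bit is set to 1, at a position chosen as follows:

digits '0-9' - 1 at positions 0-9
uppercase letters 'A-Z' - 1 at positions 10-35
lowercase letters 'a-z' - 1 at positions 36-61
position 62 is reserved for the blank character '' (words shorter than 20 characters are padded with '' up to 20)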

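So finally, when I concatenate all 20 characters, I get a 20*63-dimensional vector that my network should learn. My main question is how to define a proper loss function for the optimizer.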
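The encoding described above can be sketched in plain Python as follows. The names `encode_captcha`, `decode_captcha`, `ALPHABET` and the constants are hypothetical and are not taken from the linked repository; the sketch only assumes the position table given in the question.

```python
import string

MAX_LEN = 20
# Alphabet order follows the position table: digits, uppercase, lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BLANK_INDEX = len(ALPHABET)   # 62, reserved for padding
CHAR_DIM = len(ALPHABET) + 1  # 63

def encode_captcha(text):
    """Return a flat list of MAX_LEN * CHAR_DIM zeros and ones."""
    if len(text) > MAX_LEN:
        raise ValueError("text longer than %d characters" % MAX_LEN)
    vector = [0] * (MAX_LEN * CHAR_DIM)
    for slot in range(MAX_LEN):
        if slot < len(text):
            # str.index raises ValueError for characters outside the alphabet.
            index = ALPHABET.index(text[slot])
        else:
            index = BLANK_INDEX
        vector[slot * CHAR_DIM + index] = 1
    return vector

def decode_captcha(vector):
    """Take the argmax of each 63-wide block and drop blank positions."""
    chars = []
    for slot in range(MAX_LEN):
        block = vector[slot * CHAR_DIM:(slot + 1) * CHAR_DIM]
        index = block.index(max(block))
        if index != BLANK_INDEX:
            chars.append(ALPHABET[index])
    return "".join(chars)
```

Because `decode_captcha` takes an argmax per block, it should work on raw network outputs as well as on the 0/1 labels.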

The architecture of my network:

conv 3x3x32 -> relu -> pool(k=2) -> dropout
conv 3x3x64 -> relu -> pool(k=2) -> dropout
conv 3x3x64 -> relu -> pool(k=2) -> dropout
FC 1024 -> relu -> dropout
output 20*63

So my main problem is how to define the loss for the optimizer and how to evaluate the model. I have tried something like this:

# Construct model 
pred = conv_net(x, weights, biases, keep_prob) 

# Define loss and optimizer 

# split the prediction per char: each char takes 63 contiguous positions, we have 20 chars
split_pred = tf.split(1,20,pred) 
split_y = tf.split(1,20,y) 


#compute partial softmax cost, for each char 
costs = list() 
for i in range(20): 
    costs.append(tf.nn.softmax_cross_entropy_with_logits(split_pred[i],split_y[i])) 

#reduce cost for each char 
rcosts = list() 
for i in range(20): 
    rcosts.append(tf.reduce_mean(costs[i])) 

# global reduce  
loss = tf.reduce_sum(rcosts) 
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(loss) 


# Evaluate model 

# pred are in format batch_size,20*63, reshape it in order to have each  character prediction 
# in row, then take argmax of each row (across columns) then check if it is  equal 
# original label max indexes 
# then sum all good results and compute mean (accuracy) 

#batch, rows, cols 
p = tf.reshape(pred,[batch_size,20,63]) 
# max idx across the rows
#max_idx_p=tf.argmax(p,2).eval() 
max_idx_p=tf.argmax(p,2) 

l = tf.reshape(y,[batch_size,20,63]) 
# max idx across the rows
#max_idx_l=tf.argmax(l,2).eval() 
max_idx_l=tf.argmax(l,2) 

correct_pred = tf.equal(max_idx_p,max_idx_l) 
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

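I try to split each character out of the output and compute softmax and cross-entropy for each character separately, then sum all the costs. But I have mixed TensorFlow functions with ordinary Python lists. Can I do this? Will the TensorFlow engine understand it? Which TensorFlow functions can I use instead of Python lists?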
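A NumPy reference for the per-character loss described above may help when checking the TensorFlow graph on small inputs. This is a sketch rather than the code from the repository: `per_char_softmax_loss` is a hypothetical name, and it assumes that reshaping the output to [batch, 20, 63] is an acceptable replacement for splitting it into 20 list entries.

```python
import numpy as np

NUM_CHARS, CHAR_DIM = 20, 63

def per_char_softmax_loss(logits, labels):
    """logits, labels: arrays of shape [batch, NUM_CHARS * CHAR_DIM]."""
    batch = logits.shape[0]
    z = logits.reshape(batch, NUM_CHARS, CHAR_DIM)
    t = labels.reshape(batch, NUM_CHARS, CHAR_DIM)
    # Numerically stable log-softmax over the 63 classes of each character.
    z = z - z.max(axis=2, keepdims=True)
    log_softmax = z - np.log(np.exp(z).sum(axis=2, keepdims=True))
    cross_entropy = -(t * log_softmax).sum(axis=2)  # shape [batch, NUM_CHARS]
    # Mean over the batch for each character, then sum over the 20 characters.
    return cross_entropy.mean(axis=0).sum()
```

The final line mirrors the reduce_mean per character followed by reduce_sum in the question. Since every character contributes the same number of batch entries, the result equals 20 times the mean over all batch*20 rows, so a single reshape to [batch*20, 63] followed by one softmax cross-entropy call should express the same quantity without a Python loop.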

Accuracy is computed in a similar way: the output is reshaped to 20x63, I take the argmax of each row, and then compare it with the true encoded character.
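The accuracy above is measured per character, so the padded blank positions count as hits whenever the network predicts blank. A stricter measure counts a CAPTCHA as correct only when all 20 positions match. The NumPy sketch below computes both; `captcha_accuracies` is a hypothetical name and is not part of the original code.

```python
import numpy as np

NUM_CHARS, CHAR_DIM = 20, 63

def captcha_accuracies(pred, labels):
    """Return (per-character accuracy, whole-CAPTCHA accuracy)."""
    batch = pred.shape[0]
    # Index of the strongest class for every character position.
    p = pred.reshape(batch, NUM_CHARS, CHAR_DIM).argmax(axis=2)
    l = labels.reshape(batch, NUM_CHARS, CHAR_DIM).argmax(axis=2)
    correct = (p == l)                       # shape [batch, NUM_CHARS]
    per_char = correct.mean()                # what the question computes
    whole = correct.all(axis=1).mean()       # every position must match
    return per_char, whole
```

On short CAPTCHAs the two numbers can differ a lot, which may be worth keeping in mind when reading the accuracy plot.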

When I run this, the loss decreases, but the accuracy goes up and then drops. This plot shows what it looks like: https://plon.io/files/57a0a7fb4bb1210001ca0476

I would appreciate any further comments on mistakes I have made, or ideas for the implementation.

2016-08-02 ksopyla

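In newer TF versions you can use a Python list as input to 'reduce_sum'. This is equivalent to first calling 'tf.pack' on the Python list to convert it to a TensorFlow tensor. The accuracy plot looks strange, but note that a decrease in cross-entropy does not necessarily improve accuracy when the cross-entropy loss is very large, e.g. if the cross-entropy is a million. I would add an L2 penalty regularizer and try waiting until the cross-entropy is close to zero. Also, it helps to start with a simpler problem (e.g. digits only) to get a sense of how long to wait –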

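I wonder whether 'loss = tf.nn.sigmoid_cross_entropy_with_logits(pred, y)' would not be more appropriate for this problem. The previous approach uses 'softmax_cross_entropy_with_logits', but there the classes should be mutually exclusive, so I split out each character, compute softmax_cross_entropy for it, and sum over all 20 characters in turn. – ksopyla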

The real problem was data normalization. My Xdata is a matrix [N, D]; once I normalized the images, the network started to learn patterns: 'x_mean = Xdata.mean(axis=0); x_std = Xdata.std(axis=0); X = (Xdata - x_mean) / (x_std + 0.00001)' – ksopyla

The real problem was that my network got stuck: its output was constant for any input.

When I changed the loss function to loss = tf.nn.sigmoid_cross_entropy_with_logits(pred,y) and normalized the input, the network started to learn patterns.
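For reference, the element-wise sigmoid cross-entropy can be written in NumPy in its usual numerically stable form, max(x, 0) - x*z + log(1 + exp(-|x|)). The function name below is hypothetical, and the final reduction to a scalar is an assumption about how one might aggregate the values, not something stated above.

```python
import numpy as np

def sigmoid_cross_entropy(logits, targets):
    """Element-wise -t*log(sigmoid(x)) - (1-t)*log(1-sigmoid(x)), stable form."""
    return (np.maximum(logits, 0)
            - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))

def scalar_loss(logits, targets):
    # Assumed reduction: sum over the 20*63 outputs, mean over the batch.
    return sigmoid_cross_entropy(logits, targets).sum(axis=1).mean()
```

Unlike the per-character softmax, this treats each of the 20*63 outputs as an independent binary decision, so nothing forces exactly one active bit per character. Decoding by argmax over each block of 63 still picks a single character.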

Standardization (subtracting the mean and dividing by the std) helped a lot.

Xdata is a matrix [N, D]:

x_mean = Xdata.mean(axis=0) 
x_std = Xdata.std(axis=0) 
X = (Xdata-x_mean)/(x_std+0.00001)
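A common companion to the snippet above is to compute the statistics on the training set only and reuse them for validation and test images. The sketch below shows that split; `fit_standardizer` and `apply_standardizer` are hypothetical helper names, and the epsilon matches the 0.00001 used above.

```python
import numpy as np

def fit_standardizer(train, eps=1e-5):
    """Per-feature mean and scale computed from the training matrix [N, D]."""
    mean = train.mean(axis=0)
    scale = train.std(axis=0) + eps  # eps avoids division by zero for constant pixels
    return mean, scale

def apply_standardizer(data, mean, scale):
    """Standardize any matrix [M, D] with statistics fitted elsewhere."""
    return (data - mean) / scale

# Usage: fit on training data, then apply the same statistics to test data.
# mean, scale = fit_standardizer(X_train)
# X_train_n = apply_standardizer(X_train, mean, scale)
# X_test_n = apply_standardizer(X_test, mean, scale)
```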

Data preprocessing is key; this is worth reading: http://cs231n.github.io/neural-networks-2/#data-preprocessing

2016-08-23 10:04:48 ksopyla
