
Exception in TensorFlow function used as Keras custom loss

I'm trying to write a Keras 2 LSTM with a custom loss function implemented via TensorFlow:

model.compile(loss=in_top_k_loss, optimizer='rmsprop', metrics=[bin_crossent_true_only, 'binary_crossentropy', 'mean_squared_error', 'accuracy']) 

My training set contains examples whose time dimension varies in size, so I use train_on_batch, where each batch holds only instances with the same time dimension; the batch size is 256. The batching looks roughly like the sketch below; the custom loss code after it throws a very nasty exception in the first epoch (the first time train_on_batch is called):
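(The sketch is purely illustrative; iterate_length_batches, train_x and train_y are hypothetical names, not from my actual code.)

import numpy as np
from collections import defaultdict

# Bucket example indices by time dimension, then yield uniformly shaped batches.
def iterate_length_batches(sequences, targets, batch_size=256):
    by_length = defaultdict(list)
    for i, seq in enumerate(sequences):
        by_length[len(seq)].append(i)
    for length, idxs in by_length.items():
        for start in range(0, len(idxs), batch_size):
            chunk = idxs[start:start + batch_size]
            yield (np.stack([sequences[i] for i in chunk]),
                   np.stack([targets[i] for i in chunk]))

# for x_batch, y_batch in iterate_length_batches(train_x, train_y):
#     model.train_on_batch(x_batch, y_batch)

The custom loss, with the imports it needs: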

import tensorflow as tf
import keras
from keras import backend as K

# takes 2 1D arrays of equal length, returns a single value (the negative of my own "precision" measure)
def in_top_k_loss_single(y_true, y_pred): 
    y_true_labels = tf.cast(tf.transpose(tf.where(y_true > 0))[0], tf.int32) 
    y_pred = tf.reshape(y_pred, [1, tf.shape(y_pred)[0]]) 
    y_topk_tensor = tf.nn.top_k(y_pred, k=7) 
    y_topk_ixs = y_topk_tensor[0][0][:7]   # top-k values (tf.nn.top_k returns (values, indices))
    y_topk = y_topk_tensor[1][0][:7]       # top-k indices
    y_topk_len = tf.cast(tf.count_nonzero(y_topk_ixs), tf.int32) 
    y_topk = y_topk[:y_topk_len] 
    y_topk0 = tf.expand_dims(y_topk, 1) 
    y_true_labels0 = tf.expand_dims(y_true_labels, 0) 
    re = tf.cast(tf.reduce_any(tf.equal(y_topk0, y_true_labels0), 1), tf.int32)/tf.range(1,y_topk_len+1) 
    return (-1) * tf.where(tf.equal(tf.reduce_sum(y_pred), tf.constant(0.0)), tf.constant(0.0), tf.cast(tf.reduce_mean(re),tf.float32)) 

# takes 2 matrices of equal sizes, 
# applies the upper function for y_true[i] & y_pred[i] for each row i, 
# returns a single value (mean of all row-wise values) 
def in_top_k_loss(y_true, y_pred): 
    # if I change `in_top_k_loss_single` to `keras.metrics.binary_crossentropy` (for instance) it runs 
    return K.mean(tf.map_fn(lambda x: in_top_k_loss_single(x[0], x[1]), (y_true, y_pred), dtype=tf.float32)) 

where in_top_k_loss is the custom loss function in my Keras model. These functions seem to work when I test them separately with various inputs (even tricky ones), so it appears that only Keras has a problem with them - perhaps it expects different data types/shapes/etc.
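For example, this is the kind of standalone check I mean (the values are made up; k=7 in the loss requires at least 7 columns):

# Evaluate the loss outside of Keras on small constant tensors.
y_true_np = np.array([[0., 1., 0., 1., 0., 0., 0., 1.],
                      [1., 0., 0., 0., 0., 0., 0., 0.]], dtype=np.float32)
y_pred_np = np.array([[.1, .8, .05, .6, .2, .3, .15, .7],
                      [.9, .1, .2, .3, .4, .05, .6, .25]], dtype=np.float32)

print(K.eval(in_top_k_loss(tf.constant(y_true_np), tf.constant(y_pred_np))))
# prints a single float, no exception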

Some smart ideas from around the internet: I tried changing the batch size, changing the optimizer and clipping the gradients - no success. I also tried calling evaluate before train_on_batch - no success.
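For reference, gradient clipping in Keras is configured on the optimizer itself; a minimal sketch with arbitrary clipping values:

from keras.optimizers import RMSprop

# clip by global L2 norm, or element-wise with clipvalue
clipped = RMSprop(lr=0.001, clipnorm=1.0)    # or RMSprop(lr=0.001, clipvalue=0.5)
model.compile(loss=in_top_k_loss, optimizer=clipped,
              metrics=['binary_crossentropy', 'accuracy'])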

The rest of the code uses losses that come from Keras, as well as losses like this one:

def bin_crossent_true_only(y_true, y_pred): 
    return (1 + keras.backend.sum(y_pred)) * keras.metrics.binary_crossentropy(y_true, y_true * y_pred) 

The function in_top_k_loss works and returns meaningful results if it is used in the metrics array. None of the inputs (y_true, y_pred) are NaN. y_true may contain 0s and 1s (zero or more 1s per row, i.e. per instance of the training set).

The exception itself:

Traceback (most recent call last): 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 491, in apply_op 
    preferred_dtype=default_dtype) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 702, in internal_convert_to_tensor 
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 110, in _constant_tensor_conversion_function 
    return constant(v, dtype=dtype, name=name) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 99, in constant 
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape)) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 360, in make_tensor_proto 
    raise ValueError("None values not supported.") 
ValueError: None values not supported. 

During handling of the above exception, another exception occurred: 

Traceback (most recent call last): 
    File "<stdin>", line 9, in <module> 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\models.py", line 941, in train_on_batch 
    class_weight=class_weight) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 1620, in train_on_batch 
    self._make_train_function() 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\engine\training.py", line 1002, in _make_train_function 
    self.total_loss) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\optimizers.py", line 210, in get_updates 
    new_a = self.rho * a + (1. - self.rho) * K.square(g) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\backend\tensorflow_backend.py", line 1225, in square 
    return tf.square(x) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\math_ops.py", line 384, in square 
    return gen_math_ops.square(x, name=name) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 2733, in square 
    result = _op_def_lib.apply_op("Square", x=x, name=name) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 504, in apply_op 
    values, as_ref=input_arg.is_ref).dtype.name 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 702, in internal_convert_to_tensor 
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 110, in _constant_tensor_conversion_function 
    return constant(v, dtype=dtype, name=name) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\constant_op.py", line 99, in constant 
    tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape)) 
    File "C:\Users\myname\AppData\Local\Programs\Python\Python36\lib\site-packages\tensorflow\python\framework\tensor_util.py", line 360, in make_tensor_proto 
    raise ValueError("None values not supported.") 
ValueError: None values not supported. 

Answer


The optimizers in TensorFlow require the loss function to be differentiable, which is determined by all of the operations between the loss result and the variables in the TensorFlow graph having defined gradients. The tf.where() operation does not have a defined gradient, which means that the overall loss function is not differentiable. The result of attempting to compute the gradient of a non-differentiable function in TensorFlow is None, which leads to the error you see when Keras attempts to update the variables.
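A minimal way to see this (assuming TF 1.x graph mode): the one-argument form of tf.where(), which the loss uses to extract indices, propagates a None gradient:

import tensorflow as tf

x = tf.placeholder(tf.float32, [4])
idx = tf.where(x > 0)                               # indices only: no gradient registered
toy_loss = tf.cast(tf.reduce_sum(idx), tf.float32)
print(tf.gradients(toy_loss, [x]))
# prints [None]; that None is what later triggers "None values not supported."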


This answers my question. Unfortunately, I can't think of a way to remove the 2 uses of 'where' in the function body. :( Could I somehow use MAP@k - the mean average precision over the predicted top-K classes (an instance may have multiple labels) - as the loss (negated, since we minimize the loss)? Or my alternative - use it in 'metrics' and monitor it separately (and then use a callback to save the model when the metric value is maximal)? – altier2856
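On that second alternative: train_on_batch does not fire Keras callbacks, but the metric values it returns can be monitored by hand. A rough sketch, assuming the custom metric appears in model.metrics_names under its function name and reusing the illustrative iterate_length_batches helper sketched earlier:

import numpy as np

metric_ix = model.metrics_names.index('in_top_k_loss')   # position of the custom metric
best = -np.inf
for x_batch, y_batch in iterate_length_batches(train_x, train_y):
    results = model.train_on_batch(x_batch, y_batch)      # [loss, metric_1, metric_2, ...]
    if results[metric_ix] > best:                          # flip to < if monitoring the negated precision
        best = results[metric_ix]
        model.save_weights('best_by_in_top_k.h5')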


Does this mean there is no easy way to penalize sign changes in the loss? Something like this: https://codeburst.io/neural-networks-for-algorithmic-trading-volatility-forecasting-and-custom-loss-functions-c030e316ea7e which uses K.switch (that one is Theano)? – wordsforthewise
