How to index into a tensor using another array in TensorFlow

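I am trying to write a deep Q-learning network for a problem in AI. I have a function predict() that takes an input of shape (None, 5) and produces a tensor of shape (None, 3). The 3 in (None, 3) corresponds to the Q-values of the actions that can be taken in each state. In the training step I have to call predict() several times and use the results to compute the cost and train the model. For this I also have another data array available, called current_actions, which is a list containing the indices of the actions taken for particular states in previous iterations.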

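What I need is for current_states_outputs to be a tensor built from the output of predict(), where each row contains only one Q-value (as opposed to the 3 in the predict() output), and which Q-value is selected depends on the corresponding index in current_actions.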

For example, if current_states_output = [[1,2,3],[4,5,6],[7,8,9]] and current_actions = [0,2,1], the result after the operation should be [1,6,8] (updated).
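To pin down the selection described in the example, here is a minimal NumPy sketch of the expected result. It only illustrates the indexing pattern (row i paired with current_actions[i]); it runs outside the TensorFlow graph, so it is a reference for the values and not something to train through. The name rows is introduced here for illustration.

```python
import numpy as np

# Values taken from the example in the question
current_states_output = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
current_actions = np.array([0, 2, 1])

# Pair each row index with the action index chosen for that row
rows = np.arange(len(current_actions))
selected = current_states_output[rows, current_actions]

print(selected)  # [1 6 8]
```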

How can I do this?

I have tried the following:

current_states_outputs = self.sess.run(self.prediction, feed_dict={self.X: current_states})
current_states_outputs = np.array([current_states_outputs[a][current_actions[a]] for a in range(len(current_actions))])

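I basically ran predict() in a session and did the required indexing with ordinary Python methods. But because this disconnects the cost from the earlier layers of the graph, no training can happen. So I need to perform this operation inside TensorFlow and keep everything as TensorFlow tensors. How can I manage this?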
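One way to see the difference between selecting inside and outside the graph is to check that gradients actually reach the Q-values. The sketch below assumes TensorFlow 2.x with eager execution (the question uses the 1.x session API) and uses a one-hot mask, which is a different technique from indexing with Python lists. The variable q_values stands in for the network output and is hypothetical.

```python
import tensorflow as tf

# Stand-in for the network output; values taken from the example in the question
q_values = tf.Variable([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
actions = tf.constant([0, 2, 1])

with tf.GradientTape() as tape:
    # One-hot mask with a 1 at the chosen action in each row
    mask = tf.one_hot(actions, depth=3, dtype=q_values.dtype)
    # Zero out the other actions and sum each row, leaving one Q-value per row
    selected = tf.reduce_sum(q_values * mask, axis=1)
    loss = tf.reduce_sum(selected)

# The gradient is nonzero exactly at the selected entries,
# so the selection does not cut the graph
grad = tape.gradient(loss, q_values)
```

Because the mask and the reduction are ordinary TensorFlow ops, the same two lines should also apply to a graph-mode tensor such as self.prediction, with the number of actions passed as depth.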

Source

2017-08-17 ANANDA PADHMANABHAN S

You can try:

tf.squeeze(tf.gather_nd(a,tf.stack([tf.range(b.shape[0])[...,tf.newaxis], b[...,tf.newaxis]], axis=2)))

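Example code:

import tensorflow as tf  # TensorFlow 1.x API (sessions), as used in this answer

# Values from the example in the question
current_states_outputs = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
current_actions = [0, 2, 1]

a = tf.Variable(current_states_outputs)
b = tf.Variable(current_actions)
# Pair each row index with its action index, then gather one value per row
out = tf.squeeze(tf.gather_nd(a, tf.stack([tf.range(b.shape[0])[..., tf.newaxis], b[..., tf.newaxis]], axis=2)))
sess = tf.InteractiveSession()
tf.global_variables_initializer().run()
sess.run(out)

# output
[1, 6, 8]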

Source

2017-08-17 16:59:05

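It produces a ValueError saying 'ValueError: Shapes must be equal rank, but are 2 and 3 From merging shape 0 with other shapes. for 'stack_1' (op: 'Pack') with input shapes: [100,1], [100,1,1].' I tried it with 'current_states_outputs = np.random.rand(100, 3)' and 'current_actions = np.random.randint(0, 3, (100, 1))'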

The code above works for the example you provided. In your case, it looks like b[..., tf.newaxis] should be replaced with b.

Thanks. Replacing 'b[..., tf.newaxis]' with b did it.
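The shape issue from these comments can be reproduced without TensorFlow. The NumPy sketch below assumes that actions may arrive either as a flat array of shape (N,) or as a column of shape (N, 1); build_indices and gather_pairs are hypothetical helpers written for illustration, not TensorFlow functions. The extra axis is added only in the flat case, so both inputs produce the same (N, 1, 2) index array.

```python
import numpy as np

def build_indices(actions):
    """Return (row, action) index pairs with shape (N, 1, 2)."""
    actions = np.asarray(actions)
    if actions.ndim == 1:
        # (N,) -> (N, 1); a column input already has this shape
        actions = actions[:, np.newaxis]
    rows = np.arange(actions.shape[0])[:, np.newaxis]  # shape (N, 1)
    # Both inputs are (N, 1), so stacking on a new last axis is valid
    return np.stack([rows, actions], axis=2)

def gather_pairs(params, indices):
    """Pick params[row, col] for each (row, col) pair in the last axis."""
    return params[indices[..., 0], indices[..., 1]]

q = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
flat = build_indices([0, 2, 1])            # actions of shape (3,)
column = build_indices([[0], [2], [1]])    # actions of shape (3, 1)

print(np.squeeze(gather_pairs(q, flat)))    # [1 6 8]
print(np.squeeze(gather_pairs(q, column)))  # [1 6 8]
```

Stacking a (100, 1) array with a (100, 1, 1) array is what triggered the rank error quoted above, which is why dropping the extra axis for column-shaped actions fixes it.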
