Embedding feature vectors in TensorFlow

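In text processing, embeddings represent (if I understand correctly) the words of a vocabulary as vectors (after dimensionality reduction). Is there a similar way to display the features extracted by a CNN?

For example, suppose we have a CNN, a training set and a test set. We want to train the CNN on the training set and, at the same time, see the features extracted by the CNN (from a dense layer) together with their class labels in the Embeddings tab of TensorBoard.

The goal is to look at the features of the input data in each batch and see how close together or far apart they are. Finally, with the trained model, we can determine the accuracy of the classifier (softmax, etc.).

Thank you very much for your help.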

Source

2017-09-06 Hajbabaei_M_R

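I got help from the TensorFlow documentation.

For how to run TensorBoard and make sure you are logging all the necessary information, see: TensorBoard: Visualizing Learning.

To visualize your embeddings, there are three things you need to do:

1) Set up a 2D tensor that holds your embedding(s).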

embedding_var = tf.get_variable(....)

2) Periodically save your model variables in a checkpoint in LOG_DIR.

saver = tf.train.Saver() 
saver.save(session, os.path.join(LOG_DIR, "model.ckpt"), step)

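3) (Optional) Associate metadata with your embedding.

If you have any metadata (labels, images) associated with your embedding, you can tell TensorBoard about it either by storing a projector_config.pbtxt directly in LOG_DIR, or by using the Python API.

For instance, the following projector_config.pbtxt associates the word_embedding tensor with metadata stored in $LOG_DIR/metadata.tsv: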

embeddings { 
    tensor_name: 'word_embedding' 
    metadata_path: '$LOG_DIR/metadata.tsv' 
}
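The metadata file referenced by metadata_path is a plain tab-separated file whose i-th data row describes the i-th row of the embedding tensor. As a rough sketch using only the standard library (the helper name `write_metadata`, the temporary directory standing in for LOG_DIR, and the example labels are all assumptions made for illustration), it could be written like this. A single-column file has no header row; a file with several columns needs a header row naming the columns.

```python
import csv
import os
import tempfile


def write_metadata(path, rows, header=None):
    """Write a projector metadata file as TSV (hypothetical helper).

    rows   -- one list of column values per embedding row, in tensor order
    header -- column names; required by TensorBoard only for multi-column files
    """
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter='\t', lineterminator='\n')
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)


# Stand-in for LOG_DIR (assumption): a fresh temporary directory.
log_dir = tempfile.mkdtemp()
metadata_path = os.path.join(log_dir, 'metadata.tsv')

# Example class labels (made up), one per embedding row, in the same order.
labels = ['cat', 'dog', 'cat']
write_metadata(metadata_path, [[label] for label in labels])
```

For the CNN case in the question, the labels would be the class labels of the inputs whose features are stored in the embedding tensor, written in the same order as the rows of that tensor.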

The same config can be produced programmatically using the following code snippet:

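import os

import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector

# Placeholder log directory (an assumption); use the directory your checkpoints are saved in.
LOG_DIR = 'logs'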

# Create randomly initialized embedding weights which will be trained. 
vocabulary_size = 10000 
embedding_size = 200 
embedding_var = tf.get_variable('word_embedding', [vocabulary_size, 
embedding_size]) 

# Format: tensorflow/tensorboard/plugins/projector/projector_config.proto 
config = projector.ProjectorConfig() 

# You can add multiple embeddings. Here we add only one. 
embedding = config.embeddings.add() 
embedding.tensor_name = embedding_var.name 
# Link this tensor to its metadata file (e.g. labels). 
embedding.metadata_path = os.path.join(LOG_DIR, 'metadata.tsv') 

# Use the same LOG_DIR where you stored your checkpoint.
summary_writer = tf.summary.FileWriter(LOG_DIR) 

# The next line writes a projector_config.pbtxt in the LOG_DIR. TensorBoard will 
# read this file during startup. 
projector.visualize_embeddings(summary_writer, config)
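To apply these steps to the CNN case in the question, one possible approach is to collect the dense-layer activations batch by batch into a single 2D array of shape [N, D], keeping the labels in the same order. The sketch below uses NumPy only; `dense_features` is a hypothetical stand-in for actually running the network up to its dense layer, and the toy batches are made up. The resulting array could then be used as the initial value of a `tf.Variable`, which is saved with the Saver as in step 2.

```python
import numpy as np


def dense_features(batch_images):
    """Hypothetical stand-in for evaluating the CNN's dense layer on a batch.

    A real model would run the network here; this toy version just flattens
    each image and keeps the first 4 values as its "features".
    """
    return batch_images.reshape(len(batch_images), -1)[:, :4]


def collect_features(batches):
    """Stack per-batch features and labels into one [N, D] and one [N] array."""
    all_features, all_labels = [], []
    for images, labels in batches:
        all_features.append(dense_features(images))
        all_labels.append(labels)
    return (np.concatenate(all_features, axis=0),
            np.concatenate(all_labels, axis=0))


# Two toy batches of three 2x2x1 "images" with integer class labels.
rng = np.random.RandomState(0)
batches = [
    (rng.rand(3, 2, 2, 1), np.array([0, 1, 2])),
    (rng.rand(3, 2, 2, 1), np.array([2, 1, 0])),
]
features, labels = collect_features(batches)
print(features.shape, labels.shape)
```

Keeping `features` and `labels` aligned row by row matters, because the projector matches the i-th metadata row to the i-th row of the embedding tensor.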

Source

2017-09-06 07:36:12
