tf.SequenceExample with multidimensional arrays

In TensorFlow, I want to save a multidimensional array to a TFRecord. For example:

[[1, 2, 3], [1, 2], [3, 2, 1]]

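Since the task I'm trying to solve is sequential, I want to use TensorFlow's tf.train.SequenceExample(). Writing the data works: I successfully wrote it to a TFRecord file. However, when I try to load the data from the TFRecord file using tf.parse_single_sequence_example, I'm greeted with a large number of cryptic errors: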

W tensorflow/core/framework/op_kernel.cc:936] Invalid argument: Name: , Key: input_characters, Index: 1. Number of int64 values != expected. values size: 6 but output shape: [] 
E tensorflow/core/client/tensor_c_api.cc:485] Name: , Key: input_characters, Index: 1. Number of int64 values != expected. values size: 6 but output shape: []

The function I use to try to load my data is the following:

def read_and_decode_single_example(filename): 

    filename_queue = tf.train.string_input_producer([filename], 
               num_epochs=None) 

    reader = tf.TFRecordReader() 
    _, serialized_example = reader.read(filename_queue) 

    context_features = { 
     "length": tf.FixedLenFeature([], dtype=tf.int64) 
    } 

    sequence_features = {
        "input_characters": tf.FixedLenSequenceFeature([], dtype=tf.int64),
        "output_characters": tf.FixedLenSequenceFeature([], dtype=tf.int64)
    }

    context_parsed, sequence_parsed = tf.parse_single_sequence_example(
        serialized=serialized_example,
        context_features=context_features,
        sequence_features=sequence_features
    )

    context = tf.contrib.learn.run_n(context_parsed, n=1, feed_dict=None)
    print(context)

And the function I use to save the data is here:

# http://www.wildml.com/2016/08/rnns-in-tensorflow-a-practical-guide-and-undocumented-features/
import tensorflow as tf
from itertools import izip_longest  # Python 2; on Python 3 use itertools.zip_longest

def make_example(input_sequence, output_sequence):
    """
    Makes a single example from Python lists that follows the
    format of tf.train.SequenceExample.
    """

    example_sequence = tf.train.SequenceExample()

    # 3D length
    sequence_length = sum([len(word) for word in input_sequence])
    example_sequence.context.feature["length"].int64_list.value.append(sequence_length)

    input_characters = example_sequence.feature_lists.feature_list["input_characters"]
    output_characters = example_sequence.feature_lists.feature_list["output_characters"]

    for input_character, output_character in izip_longest(input_sequence,
                                                          output_sequence):

        # Extend seems to work, therefore it replaces append.
        # izip_longest pads the shorter list with None, so check each item.
        if input_character is not None:
            input_characters.feature.add().int64_list.value.extend(input_character)

        if output_character is not None:
            output_characters.feature.add().int64_list.value.extend(output_character)

    return example_sequence

Any help would be welcome.


2016-09-16 Torkoal

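Hi, could you provide more context? Ideally, give a minimal example that can actually be run and tested, including the steps for saving the data to the file. – jlarsch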

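Your example is very hard to follow; you will get more help if you edit it to include the relevant context. For example, looking at the link you put in a comment in your code, it is clear that the snippet that generates the sequence example does not include the code that actually writes the data. –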

With the code provided I couldn't reproduce your error, but some educated guesses gave the following working code.

import tensorflow as tf 
import numpy as np

tmp_filename = 'tf.tmp' 

sequences = [[1, 2, 3], [1, 2], [3, 2, 1]] 
label_sequences = [[0, 1, 0], [1, 0], [1, 1, 1]] 

def make_example(input_sequence, output_sequence): 
    """ 
    Makes a single example from Python lists that follows the 
    format of tf.train.SequenceExample. 
    """ 

    example_sequence = tf.train.SequenceExample() 

    # Number of time steps (one int per step in this example)
    sequence_length = len(input_sequence)

    example_sequence.context.feature["length"].int64_list.value.append(sequence_length) 

    input_characters = example_sequence.feature_lists.feature_list["input_characters"] 
    output_characters = example_sequence.feature_lists.feature_list["output_characters"] 

    for input_character, output_character in zip(input_sequence,
                                                  output_sequence):
        # One Feature per time step, each holding a single int64 value
        input_characters.feature.add().int64_list.value.append(input_character)
        output_characters.feature.add().int64_list.value.append(output_character)

    return example_sequence 

# Write all examples into a TFRecords file
def save_tf(filename):
    # TFRecordWriter opens the file itself; no need to open() it first
    writer = tf.python_io.TFRecordWriter(filename)
    for sequence, label_sequence in zip(sequences, label_sequences):
        ex = make_example(sequence, label_sequence)
        writer.write(ex.SerializeToString())
    writer.close()

def read_and_decode_single_example(filename): 

    filename_queue = tf.train.string_input_producer([filename], 
               num_epochs=None) 

    reader = tf.TFRecordReader() 
    _, serialized_example = reader.read(filename_queue) 

    context_features = { 
     "length": tf.FixedLenFeature([], dtype=tf.int64) 
    } 

    sequence_features = { 
     "input_characters": tf.FixedLenSequenceFeature([], dtype=tf.int64), 
     "output_characters": tf.FixedLenSequenceFeature([], dtype=tf.int64) 
    } 


    return serialized_example, context_features, sequence_features 

save_tf(tmp_filename) 
ex,context_features,sequence_features = read_and_decode_single_example(tmp_filename) 
context_parsed, sequence_parsed = tf.parse_single_sequence_example(
    serialized=ex, 
    context_features=context_features, 
    sequence_features=sequence_features 
) 

sequence = tf.contrib.learn.run_n(sequence_parsed, n=1, feed_dict=None) 
# Check that the first parsed sequence matches the first input sequence
print(np.array_equal(sequence[0]['input_characters'], sequences[0]))

The required changes were:

Change sequence_length = sum([len(word) for word in input_sequence]) to sequence_length = len(input_sequence); otherwise it does not work with your example data.

Change extend to append.
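To see why the first change matters, here is a small pure-Python comparison of the two length formulas (no TensorFlow needed). It assumes, as the answer's code does, that each example passed to make_example is a flat list of ints such as [1, 2, 3]; the helper names are hypothetical.

```python
def length_by_sum(input_sequence):
    # Original formula: total number of values across all inner lists.
    # Only works if every element of input_sequence is itself a list.
    return sum([len(word) for word in input_sequence])


def length_by_steps(input_sequence):
    # Answer's formula: number of time steps in one sequence.
    return len(input_sequence)


flat_example = [1, 2, 3]                         # one sequence, as in the answer
nested_example = [[1, 2, 3], [1, 2], [3, 2, 1]]  # the question's data as one example

print(length_by_steps(flat_example))    # 3
print(length_by_sum(nested_example))    # 8
print(length_by_steps(nested_example))  # 3

try:
    length_by_sum(flat_example)
except TypeError as err:
    # len() of a plain int is not defined
    print("TypeError:", err)
```

So the sum-based formula only makes sense when each time step is itself a list, which is exactly the nested layout the second answer discusses.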


2016-09-24 22:59:29

When I try these changes, I get the error: 'TypeError: [37] has type , but expected one of: (,)'. – Torkoal

I think I see the problem: '[[1,2,3], [1,2], [3,2,1]]' is meant to be one sequence, not many. – Torkoal

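Did you try the snippet in the answer as-is? It runs without any errors for me (Ubuntu, Python 3.4, TF without GPU). Does your input data look exactly like the data in the question? –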

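I had the same problem. I think it is entirely solvable, but you have to decide on the output format and then figure out how you are going to use it.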

First: what is your error?

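The error message is telling you that what you are trying to read doesn't match the feature size you specified. So where did you specify it? Right here: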

sequence_features = { 
    "input_characters": tf.FixedLenSequenceFeature([], dtype=tf.int64), 
    "output_characters": tf.FixedLenSequenceFeature([], dtype=tf.int64) 
}

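This says "my input_characters is a sequence of single values", but that isn't true; what you have is a sequence of sequences of single values, hence the error.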

Second: what can you do?

If you instead use:

a = [[1,2,3], [2,3,1], [3,2,1]] 
sequence_features = { 
    "input_characters": tf.FixedLenSequenceFeature([3], dtype=tf.int64), 
    "output_characters": tf.FixedLenSequenceFeature([3], dtype=tf.int64) 
}

then you won't get an error in your code, because you have specified that each element of the top-level sequence is 3 elements long.
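The size check behind the error can be mimicked in plain Python. This is only a sketch of the idea, not TensorFlow's implementation: the hypothetical helper parse_fixed_len_steps treats shape like the first argument of FixedLenSequenceFeature, requires every time step to hold exactly the product of that shape (1 value for [], 3 values for [3]), and stacks the steps into rows.

```python
from functools import reduce


def parse_fixed_len_steps(steps, shape):
    """Hypothetical stand-in for FixedLenSequenceFeature(shape) parsing.

    steps: list of per-time-step value lists (one list per Feature).
    shape: per-step shape, e.g. [] for scalars or [3] for 3-vectors.
    Returns the dense rows, or raises ValueError on a size mismatch.
    For multi-dimensional shapes this sketch keeps each step flat.
    """
    # Number of values each step must contain: 1 for [], 3 for [3], ...
    expected = reduce(lambda x, y: x * y, shape, 1)
    rows = []
    for index, values in enumerate(steps):
        if len(values) != expected:
            raise ValueError(
                "Index: %d. values size: %d but expected %d per step"
                % (index, len(values), expected))
        # A scalar step ([]) yields a bare value, otherwise a row.
        rows.append(values[0] if shape == [] else list(values))
    return rows


a = [[1, 2, 3], [2, 3, 1], [3, 2, 1]]
print(parse_fixed_len_steps(a, [3]))  # [[1, 2, 3], [2, 3, 1], [3, 2, 1]]

try:
    parse_fixed_len_steps(a, [])  # each step has 3 values, not 1
except ValueError as err:
    print(err)

try:
    parse_fixed_len_steps([[1, 2, 3], [2, 3], [3, 2, 1]], [3])  # ragged steps
except ValueError as err:
    print(err)
```

The last call fails for the same reason the fixed-length spec cannot describe ragged data, which is where VarLenFeature or padding comes in.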

Alternatively, if your sequences are not of fixed length, you will have to use a different type of feature:

sequence_features = { 
    "input_characters": tf.VarLenFeature(tf.int64), 
    "output_characters": tf.VarLenFeature(tf.int64) 
}

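VarLenFeature tells the parser that the length is unknown before reading. Unfortunately, this means your input_characters can no longer be read as a dense vector; instead, it will be a SparseTensor by default. You can turn it into a dense tensor with, for example, tf.sparse_tensor_to_dense: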
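Conceptually, a SparseTensor stores only the positions and values that are actually present, plus a dense shape, and densifying fills every missing position with a default value (0 unless you pass another default_value to tf.sparse_tensor_to_dense). The pure-Python sketch below mimics that for ragged sequence data; to_sparse and sparse_to_dense are hypothetical helpers, not TensorFlow functions.

```python
def to_sparse(ragged):
    """Build (indices, values, dense_shape) for a list of variable-length lists."""
    indices, values = [], []
    for row, seq in enumerate(ragged):
        for col, value in enumerate(seq):
            indices.append((row, col))  # (time step, position within the step)
            values.append(value)
    width = max((len(seq) for seq in ragged), default=0)
    return indices, values, (len(ragged), width)


def sparse_to_dense(indices, values, dense_shape, default_value=0):
    """Scatter the stored values into a dense grid, filling gaps with default_value."""
    rows, cols = dense_shape
    dense = [[default_value] * cols for _ in range(rows)]
    for (row, col), value in zip(indices, values):
        dense[row][col] = value
    return dense


indices, values, shape = to_sparse([[1, 2, 3], [2, 3], [3, 2, 1]])
print(shape)                                    # (3, 3)
print(sparse_to_dense(indices, values, shape))  # [[1, 2, 3], [2, 3, 0], [3, 2, 1]]
```

Note that the filled-in 0 is indistinguishable from a real index 0, which is why the reserved "not_really_a_word" index discussed next matters.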

input_densified = tf.sparse_tensor_to_dense(sequence_parsed['input_characters'])

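As the article you were looking at mentions, if your data doesn't always have the same length, you need a "not_really_a_word" word in your vocabulary that you use as the default index. For example, say index 0 maps to the "not_really_a_word" word; then your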

a = [[1,2,3], [2,3], [3,2,1]]

Python list would end up as an

array((1,2,3), (2,3,0), (3,2,1))

tensor.
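The padding can also be done before the data is written, so that every step has the same width and FixedLenSequenceFeature([width], dtype=tf.int64) applies. A minimal pure-Python sketch, assuming index 0 is reserved for the "not_really_a_word" token as described above; pad_steps and PAD_ID are hypothetical names.

```python
PAD_ID = 0  # assumed index of the "not_really_a_word" token


def pad_steps(steps, width=None, pad_id=PAD_ID):
    """Right-pad each inner list to `width`; return (padded, true_lengths)."""
    if width is None:
        width = max(len(step) for step in steps)
    padded, lengths = [], []
    for step in steps:
        if len(step) > width:
            raise ValueError("step longer than width: %r" % (step,))
        padded.append(list(step) + [pad_id] * (width - len(step)))
        lengths.append(len(step))  # keep the real length, e.g. for masking later
    return padded, lengths


padded, lengths = pad_steps([[1, 2, 3], [2, 3], [3, 2, 1]])
print(padded)   # [[1, 2, 3], [2, 3, 0], [3, 2, 1]]
print(lengths)  # [3, 2, 3]
```

Each padded row could then be written as one Feature per step (with extend) and read back with a fixed per-step shape of [3].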

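Be warned: I'm not sure that backpropagation "just works" for SparseTensors the way it does for dense tensors. The wildml article talks about padding each sequence with 0s and masking the loss for the "not_actually_a_word" word (see the side note in the article about being careful with 0s in your vocabulary/classes). That suggests the first approach will be easier to implement.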
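Masking the padding out of the loss, in the spirit of the wildml article, amounts to weighting each position by whether it holds a real token. A pure-Python sketch, assuming per-position losses have already been computed (the loss numbers below are made up for illustration) and that 0 is the padding index:

```python
def masked_mean_loss(token_ids, losses, pad_id=0):
    """Average per-position losses, ignoring positions whose token is pad_id."""
    total, count = 0.0, 0
    for row_ids, row_losses in zip(token_ids, losses):
        for token, loss in zip(row_ids, row_losses):
            mask = 0.0 if token == pad_id else 1.0
            total += mask * loss
            count += int(mask)
    return total / count if count else 0.0


token_ids = [[1, 2, 3], [2, 3, 0], [3, 2, 1]]
# Illustrative per-position losses; the padded slot gets a large value
# to show that it does not affect the result.
losses = [[0.5, 0.5, 0.5], [0.5, 0.5, 9.0], [0.5, 0.5, 0.5]]
print(masked_mean_loss(token_ids, losses))  # 0.5
```

Dividing by the number of real tokens rather than by the padded size keeps the average from being diluted by padding.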

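Note that this differs from the case described here, where each example is a sequence of sequences. As far as I understand, this approach is poorly supported because it stretches a feature intended for a different case: loading fixed-size embeddings directly.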

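I would assume the next thing you want to do is turn these numbers into word embeddings. You can turn a list of indices into a list of embeddings with tf.nn.embedding_lookup.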
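tf.nn.embedding_lookup with a single embedding matrix essentially gathers rows of that matrix by index, so the result has the shape of the ids followed by the embedding size. A pure-Python sketch of that gather, with a tiny made-up embedding table (hypothetical values, 2 dimensions per word):

```python
def embedding_lookup(table, ids):
    """Return the row of `table` for every index in `ids` (nested lists allowed)."""
    if isinstance(ids, int):
        return table[ids]
    return [embedding_lookup(table, i) for i in ids]


# Row 0 is the padding / "not_really_a_word" entry.
table = [
    [0.0, 0.0],
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]
print(embedding_lookup(table, [[1, 2, 3], [2, 3, 0]]))
# [[[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [[0.3, 0.4], [0.5, 0.6], [0.0, 0.0]]]
```

A [2, 3] batch of indices becomes a [2, 3, 2] nested list, mirroring how a padded index tensor turns into a batch of embedding sequences.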


2017-04-28 05:55:33 Multihunter
