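Understanding AudioSet and TensorFlow

With the release of AudioSet, which opens up a whole new area of research for those who do sound analysis, I have been trying to dig deep over the last few days into how to analyze and decode this data.

The data is given in .tfrecord files; here is a small snippet.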

�^E^@^@^@^@^@^@C�bd 
u 
^[ 
^Hvideo_id^R^O 

^KZZcwENgmOL0 
^^ 
^Rstart_time_seconds^R^H^R^F 
^D^@^@�C 
^X 
^Flabels^R^N^Z^L 

�^B�^B�^B�^B�^B 
^\ 
^Pend_time_seconds^R^H^R^F 
^D^@^@�C^R� 

� 

^Oaudio_embedding^R� 

�^A 
�^A 
�^A3�^] q^@�Z�r�����w���Q����.���^@�b�{m�^@P^@^S����,^]�x�����:^@����^@^@^Z0��^@]^Gr?v(^@^U^@��^EZ6�$ 
�^A
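
For readers wondering why the dump above looks like noise: a .tfrecord file is a sequence of framed binary records, not text. The sketch below walks that framing with only the standard library, assuming the usual TFRecord layout (8-byte little-endian length, 4-byte CRC of the length, payload, 4-byte CRC of the payload). The function name `iter_tfrecord_payloads` is made up for illustration, and the CRCs are skipped rather than checked.

```python
import struct


def iter_tfrecord_payloads(path):
    """Yield the raw payload bytes of each record in a TFRecord file.

    Assumed framing per record: uint64 length, uint32 CRC of the length,
    `length` payload bytes, uint32 CRC of the payload (all little-endian).
    The CRCs are skipped here, not verified.
    """
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte length CRC
            if not header:
                return  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header")
            (length,) = struct.unpack("<Q", header[:8])
            payload = f.read(length)
            footer = f.read(4)  # payload CRC, ignored in this sketch
            if len(payload) < length or len(footer) < 4:
                raise ValueError("truncated record body")
            yield payload
```

Each yielded payload is one serialized protobuf message, which is the part that still needs to be parsed to become human-readable.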

The example proto given is:

context: {
  feature: {
    key  : "video_id"
    value: {
      bytes_list: {
        value: [YouTube video id string]
      }
    }
  }
  feature: {
    key  : "start_time_seconds"
    value: {
      float_list: {
        value: 6.0
      }
    }
  }
  feature: {
    key  : "end_time_seconds"
    value: {
      float_list: {
        value: 16.0
      }
    }
  }
  feature: {
    key  : "labels"
    value: {
      int64_list: {
        value: [1, 522, 11, 172] # The meaning of the labels can be found here.
      }
    }
  }
}
feature_lists: {
  feature_list: {
    key  : "audio_embedding"
    value: {
      feature: {
        bytes_list: {
          value: [128 8bit quantized features]
        }
      }
      feature: {
        bytes_list: {
          value: [128 8bit quantized features]
        }
      }
    }
    ... # Repeated for every second of the segment
  }
}
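
The `labels` values in the proto above are integer class indices, so part of making a record human-readable is mapping them to names. Here is a minimal sketch, assuming a label file in CSV form with the columns `index`, `mid` and `display_name` (which is how I recall AudioSet's class label file being laid out; check the real file). The helper `load_label_names` and the two sample rows are illustrative only.

```python
import csv
import io


def load_label_names(csv_file):
    """Map integer class index -> display name from an open CSV file."""
    reader = csv.DictReader(csv_file)
    return {int(row["index"]): row["display_name"] for row in reader}


# Illustrative two-row excerpt standing in for the real label file.
sample = io.StringIO(
    'index,mid,display_name\n'
    '0,/m/09x0r,"Speech"\n'
    '1,/m/05zppz,"Male speech, man speaking"\n'
)
names = load_label_names(sample)

# Indices missing from the (tiny) sample fall back to "?".
print([names.get(i, "?") for i in [0, 1, 522]])
```

With the full label file, the same lookup turns a list such as [1, 522, 11, 172] into readable class names.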

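My very direct question here, and one I can't seem to find good information on, is: how do I convert cleanly between the two?

If I have a machine-readable file, how do I make it human-readable, and the other way around?

I found this, which takes a tfrecord of a picture and converts it into a readable format... but I can't seem to get it to work with AudioSet.

Source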

2017-03-09 Zach

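The AudioSet data is not in the form of a tensorflow.Example protobuf, like the image example you linked. It is a SequenceExample.

I haven't tested it, but you should be able to use the code you linked if you replace tf.parse_single_example with tf.parse_single_sequence_example (and replace the field names).

Source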

2017-03-09 22:48:17

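Thank you, this got me pretty far, but now 'features' returns a pointer, and trying to print it gives: TypeError: Expected binary or unicode string, got {'labels': , 'start_time_seconds': , 'video_id': , 'end_time_seconds': } – Zach

It will return a dictionary of tensors. You can now use them in your computational graph, e.g. 'one_video_id = sess.run(features['video_id'])'. Or start batching them with 'tf.train.shuffle_batch'. More details on graph execution can be found here: https://www.tensorflow.org/programmers_guide/faq#building_a_tensorflow_graph

The YouTube-8M starter code should work with the AudioSet tfrecord files.

Source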

2017-03-10 14:22:27

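Sure, I already have that running... the problem is that I need to verify things independently and run through the data myself. That includes visualizing and looking at the actual data. – Zach

This is what I have done so far. prepare_serialized_examples is from the youtube-8m starter code. I hope it helps :)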

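import tensorflow as tf  # TensorFlow 1.x API (queues and sessions)


def prepare_serialized_examples(serialized_example,
                                max_quantized_value=2, min_quantized_value=-2):
    # The two quantization arguments are kept from the original signature
    # but are not used in this snippet.
    # AudioSet records are SequenceExamples: clip-level fields live in the
    # context, per-second embeddings in the feature lists.
    contexts, features = tf.parse_single_sequence_example(
        serialized_example,
        context_features={"video_id": tf.FixedLenFeature([], tf.string),
                          "labels": tf.VarLenFeature(tf.int64)},
        sequence_features={
            # One byte string of 128 quantized values per second of audio.
            "audio_embedding": tf.FixedLenSequenceFeature([], dtype=tf.string)
        })

    # Decode each byte string into 128 uint8 values and cast them to float.
    decoded_features = tf.reshape(
        tf.cast(tf.decode_raw(features["audio_embedding"], tf.uint8), tf.float32),
        [-1, 128])

    return contexts, decoded_features


filename = '/audioset_v1_embeddings/bal_train/2a.tfrecord'
filename_queue = tf.train.string_input_producer([filename], num_epochs=1)

reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
context, features = prepare_serialized_examples(serialized_example)

with tf.Session() as sess:
    # num_epochs creates a local variable, so local variables must be
    # initialized as well as global ones.
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    # Running the tensors yields plain NumPy values that can be printed.
    print(sess.run([context["video_id"], features]))

    coord.request_stop()
    coord.join(threads)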
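
The `max_quantized_value` and `min_quantized_value` arguments suggest that the decoded bytes are meant to be mapped back to a float range. Below is a plain-Python sketch of such a dequantization step. The exact formula (a scale of range/255 plus an offset of range/512) is how I recall the YouTube-8M starter code doing it, so treat it as an assumption rather than the official method; the function name `dequantize` and the sample bytes are made up.

```python
def dequantize(quantized, max_quantized_value=2.0, min_quantized_value=-2.0):
    """Map 8-bit values (0..255) to floats in roughly [min, max]."""
    quantized_range = max_quantized_value - min_quantized_value
    scalar = quantized_range / 255.0
    # Offset of range/512 added on top of the minimum (assumed, see above).
    bias = quantized_range / 512.0 + min_quantized_value
    return [q * scalar + bias for q in quantized]


frame = bytes([0, 128, 255])  # made-up bytes, not real AudioSet data
print(dequantize(frame))
```

Iterating over a `bytes` object yields integers in 0..255, so one 128-byte embedding frame can be passed in directly.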

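Source

2017-03-13 09:24:36 BitWhyz

OK. So we have context and features... what are the types of these objects? How do I print them in a human-readable way? – Zach

@Zach It took longer than expected. Sorry about that. I have updated my answer. The decoding inside 'prepare_serialized_examples' is used to read floats instead of binary values. As you may already know, your graph has to be run in a session to be computed. – BitWhyz

What library are you using for reader.read()? – jerpint

This worked for me; it stores the features in feat_audio. To plot them, convert them to an ndarray and reshape them accordingly.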

import tensorflow as tf  # TensorFlow 1.x API

audio_record = '/audioset_v1_embeddings/eval/_1.tfrecord'
vid_ids = []
labels = []
start_time_seconds = []  # in seconds
end_time_seconds = []
feat_audio = []
count = 0

sess = tf.InteractiveSession()  # lets .eval() run without passing a session
for example in tf.python_io.tf_record_iterator(audio_record):
    # Each record is a SequenceExample: clip-level fields are in .context,
    # per-second embeddings in .feature_lists.
    tf_seq_example = tf.train.SequenceExample.FromString(example)
    context = tf_seq_example.context.feature
    vid_ids.append(context['video_id'].bytes_list.value[0].decode(encoding='UTF-8'))
    labels.append(list(context['labels'].int64_list.value))
    start_time_seconds.append(list(context['start_time_seconds'].float_list.value))
    end_time_seconds.append(list(context['end_time_seconds'].float_list.value))

    frames = tf_seq_example.feature_lists.feature_list['audio_embedding'].feature
    audio_frame = []
    # Iterate through the frames (one per second) and decode each byte
    # string into 128 float values.
    for frame in frames:
        audio_frame.append(tf.cast(
            tf.decode_raw(frame.bytes_list.value[0], tf.uint8),
            tf.float32).eval())

    feat_audio.append([audio_frame])
    count += 1
sess.close()
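
For the "convert them to an ndarray and reshape" step, the byte strings can also be decoded with NumPy alone, which avoids evaluating a TensorFlow op per frame. This is a sketch of that idea rather than part of the answer above; `frames_to_array` is a hypothetical helper, and the two frames are made-up data standing in for real embeddings.

```python
import numpy as np


def frames_to_array(frame_bytes):
    """Stack per-second byte strings into a float32 array of shape (n, 128)."""
    rows = [np.frombuffer(b, dtype=np.uint8) for b in frame_bytes]
    return np.stack(rows).astype(np.float32)


# Two made-up 128-byte frames standing in for real embeddings.
fake = [bytes(range(128)), bytes([255] * 128)]
arr = frames_to_array(fake)
print(arr.shape, arr.dtype)
```

In the loop above, `frame_bytes` would be the list of `frame.bytes_list.value[0]` values for one record, and the resulting 2-D array can be handed straight to a plotting function.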

Source

2017-05-19 19:48:46 jerpint
