Proper usage of the TensorFlow STFT function

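I am trying to build a plot spectrum of an audio sample, similar to the one that Audacity creates. According to Audacity's wiki page, Plot Spectrum (example attached) does the following:

Plot Spectrum takes the audio in blocks of 'Size' samples, does the FFT, and averages all the blocks together.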
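The block-and-average procedure described above can be sketched in plain NumPy. This is only an illustration of the idea, not Audacity's implementation: the helper name `averaged_spectrum`, the non-overlapping blocks, the symmetric Hann window from `np.hanning`, and the use of power (squared magnitude) without any dB scaling are all assumptions made here.

```python
import numpy as np

def averaged_spectrum(signal, size=512):
    """Average the power spectra of consecutive, non-overlapping blocks.

    Hypothetical helper: the windowing and scaling choices are assumptions.
    """
    signal = np.asarray(signal, dtype=np.float64)
    n_blocks = len(signal) // size           # drop the incomplete trailing block
    if n_blocks == 0:
        raise ValueError("signal is shorter than one block")
    blocks = signal[: n_blocks * size].reshape(n_blocks, size)
    window = np.hanning(size)                # symmetric Hann window
    spectra = np.fft.rfft(blocks * window, axis=-1)
    power = np.abs(spectra) ** 2             # power per block and frequency bin
    return power.mean(axis=0)                # shape: (size // 2 + 1,)

# One second of a 1 kHz sine sampled at 4000 Hz.
rate = 4000
t = np.arange(rate) / rate
spectrum = averaged_spectrum(np.sin(2 * np.pi * 1000 * t), size=512)
peak_hz = np.argmax(spectrum) * rate / 512   # frequency of the strongest bin
```

With a block size of 512 and a 4000 Hz sample rate, each bin is 4000 / 512 Hz wide, so the 1 kHz tone falls exactly on bin 128.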

I thought I would use the STFT functionality that was recently added to TensorFlow.

I am using audio blocks of size 512, and my code is as follows:

import functools
import tensorflow as tf  # TensorFlow 1.x API (tf.contrib)

# audio_file holds the path to a .wav file
audio_binary = tf.read_file(audio_file)
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary,
    file_format="wav",
    samples_per_second=4000,
    channel_count=1
)

stft = tf.contrib.signal.stft(
    waveform,
    512,  # frame_length
    512,  # frame_step
    fft_length=512,
    window_fn=functools.partial(tf.contrib.signal.hann_window, periodic=True),  # matches audacity
    pad_end=True,
    name="STFT"
)

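But the result of stft is simply an empty array, whereas I expected the FFT result for each frame (of 512 samples).

What is wrong with the way I am making this call?

I have verified that the waveform audio data is being read in properly by using just the plain tf.fft function.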

2017-08-27 TheBottleSeller

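I figured it out... 'tf.contrib.signal.stft' takes signals, each with its own signal data, so its input has the form (signal, signal_data). 'tf.contrib.ffmpeg.decode_audio' returns the data of a single signal in the form (signal_data, 1). So I needed to transpose 'waveform'. – TheBottleSeller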
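The shape issue described in this comment can be illustrated without TensorFlow. The sketch below is a plain NumPy stand-in, not the library's code: `frame_last_axis` is a hypothetical helper that splits the last axis into frames without padding, which is enough to show why a (samples, 1) array yields no usable frames while its transpose does.

```python
import numpy as np

def frame_last_axis(x, frame_length, frame_step):
    """Split the last axis of x into frames of frame_length (no padding).

    Hypothetical helper used only to illustrate which axis gets framed.
    """
    n = x.shape[-1]
    if n < frame_length:
        n_frames = 0
    else:
        n_frames = 1 + (n - frame_length) // frame_step
    starts = np.arange(n_frames) * frame_step
    # idx[i, j] is the sample index of element j in frame i.
    idx = starts[:, None] + np.arange(frame_length)[None, :]
    return x[..., idx]

decoded = np.zeros((4000, 1), dtype=np.float32)    # (samples, channels)
as_decoded = frame_last_axis(decoded, 512, 512)     # last axis has length 1
transposed = frame_last_axis(decoded.T, 512, 512)   # last axis has length 4000

print(as_decoded.shape)   # (4000, 0, 512): zero frames per "signal"
print(transposed.shape)   # (1, 7, 512): seven frames for the single channel
```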

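import functools
import tensorflow as tf  # TensorFlow 1.x API (tf.contrib)

# Illustrative values taken from the question; adjust them to your file.
sample_rate = 4000
frame_length = 512
frame_step = 512
fft_length = 512

audio_file = tf.placeholder(tf.string)

audio_binary = tf.read_file(audio_file)
waveform = tf.contrib.ffmpeg.decode_audio(
    audio_binary,
    file_format="wav",
    samples_per_second=sample_rate,  # take the sample rate from the .wav file
    channel_count=1                  # take the channel count from the .wav file
)

# decode_audio returns shape (samples, channels); stft frames the last axis,
# so transpose to (channels, samples) first.
stft = tf.contrib.signal.stft(
    tf.transpose(waveform),
    frame_length,
    frame_step,
    fft_length=fft_length,
    window_fn=functools.partial(tf.contrib.signal.hann_window,
                                periodic=False),  # matches audacity
    pad_end=False,
    name="STFT"
)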
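As a sanity check on the result, the expected shape of the STFT output for one channel can be estimated ahead of time. The helper below, `stft_output_shape`, is a hypothetical pure-Python sketch based on the documented framing behaviour (frames taken along the last axis, and a real FFT that keeps `fft_length // 2 + 1` unique bins); the 4000-sample clip length is an assumption chosen for illustration.

```python
def stft_output_shape(n_samples, frame_length, frame_step, fft_length,
                      pad_end=False):
    """Return (n_frames, n_bins) expected for a single-channel signal.

    Hypothetical helper; it only mirrors the shape arithmetic, not the STFT.
    """
    if pad_end:
        # Every frame_step starts a frame; the tail is zero-padded.
        n_frames = -(-n_samples // frame_step)   # ceiling division
    elif n_samples < frame_length:
        n_frames = 0
    else:
        n_frames = 1 + (n_samples - frame_length) // frame_step
    n_bins = fft_length // 2 + 1                 # unique bins of a real FFT
    return (n_frames, n_bins)

shape_no_pad = stft_output_shape(4000, 512, 512, 512)
shape_padded = stft_output_shape(4000, 512, 512, 512, pad_end=True)
print(shape_no_pad)   # (7, 257)
print(shape_padded)   # (8, 257)
```

Each frame therefore carries 257 complex values rather than 512, since the negative-frequency half of a real signal's spectrum is redundant.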

2017-10-31 13:07:03
