Interpreting WAV data

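I'm trying to write a program to display PCM data. I've been very frustrated trying to find a library with the right level of abstraction, but I found the Python wave library and have been using that. However, I'm not sure how to interpret the data.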

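The wave.getparams function returns (2 channels, 2 bytes, 44100 Hz, 96333 frames, No compression, No compression). This all seems cheery, but then I tried printing a single frame: '\xc0\xff\xd0\xff', which is 4 bytes. I suppose a frame might be 2 samples, but the ambiguities don't end there.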

96333 frames * 2 samples/frame * (1/44.1k sec/sample) = 4.3688 seconds

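However, iTunes reports a duration closer to 2 seconds, and a calculation based on the file size and bitrate gives something in the ballpark of 2.7 seconds. What's going on here?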

Also, how do I know whether the bytes are signed or unsigned?

Many thanks!

2010-02-09 SapphireSun

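"2 channels" means stereo, so it makes no sense to add up the durations of the individual channels; that's why you are off by a factor of two (2.18 seconds, not 4.37). As for signedness, as explained for example here, I quote: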

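8-bit samples are stored as unsigned bytes, ranging from 0 to 255. 16-bit samples are stored as 2's-complement signed integers, ranging from -32768 to 32767.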

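This is part of the specification of the WAV format (actually of its superset, RIFF), and thus doesn't depend on which library you use to handle WAV files.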
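The quoted rule translates directly into code. Below is a minimal sketch with a hypothetical helper name, `decode_sample`, that decodes one raw sample according to its width. Subtracting 128 from 8-bit samples is an extra step not required by the quote; it is a common way to put them on the same zero-centred scale as 16-bit samples.

```python
import struct

def decode_sample(raw: bytes) -> int:
    """Hypothetical helper: decode one raw WAV sample by its width.

    8-bit samples are unsigned (0..255); here they are shifted by -128 so that
    silence is 0 (result range -128..127). 16-bit samples are 2's-complement
    signed little-endian integers (-32768..32767).
    """
    if len(raw) == 1:
        return raw[0] - 128  # unsigned byte, re-centred on 0
    if len(raw) == 2:
        return struct.unpack("<h", raw)[0]  # "<h" = little-endian signed short
    raise ValueError("only 8- and 16-bit samples are handled in this sketch")

print(decode_sample(b"\x80"))      # 0 (8-bit silence)
print(decode_sample(b"\xc0\xff"))  # -64
```

The same byte values mean different things depending on the width: `0x80` alone is 8-bit silence, while `b"\x00\x80"` as a 16-bit sample is the most negative value, -32768.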

2010-02-09 05:15:36

Thanks! I can only hope it was sleep deprivation that made me overlook the stereo part ;-) – SapphireSun 2010-02-09 05:20:00

Each sample is 16 bits and there are 2 channels, so a frame takes 4 bytes.
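Applied to the frame printed in the question, the 4 bytes split into two 16-bit samples, one per channel. A short sketch, assuming the usual stereo ordering (left channel first, then right):

```python
import struct

frame = b"\xc0\xff\xd0\xff"  # the 4-byte frame printed in the question
# "<2h": two little-endian signed 16-bit integers, one per channel
left, right = struct.unpack("<2h", frame)
print(left, right)  # -64 -48
```

Both values are small negative numbers, which is what you would expect near silence.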

2010-02-09 05:17:05

The duration is simply the number of frames divided by the number of frames per second. With your data, that is: 96333 / 44100 = 2.18 seconds.
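A minimal sketch of that calculation with the standard `wave` module. The helper name `wav_duration` is hypothetical, and the test file is generated in memory with the question's parameters (silent frames) purely for illustration:

```python
import io
import wave

def wav_duration(wav_file) -> float:
    """Duration in seconds = number of frames / frames per second.

    The channel count does not enter: one frame already holds one sample
    for every channel.
    """
    with wave.open(wav_file, "rb") as w:
        return w.getnframes() / w.getframerate()

# Build a silent stereo, 16-bit, 44100 Hz file with 96333 frames in memory
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)
    w.setframerate(44100)
    w.writeframes(b"\x00\x00\x00\x00" * 96333)  # 4 bytes per frame
buf.seek(0)

print(round(wav_duration(buf), 2))  # 2.18
```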

2010-02-09 05:21:32 mhawke

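I know an answer has already been accepted, but I did some audio-related work a while ago, and you have to unpack the wave data with something like this: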

import struct
pcmdata = struct.unpack("<%dh" % wavedatalength, wavedata)  # "<": WAV samples are little-endian; wavedatalength must be len(wavedata) // 2

Also, one package I used is called PyAudio, but I still had to use the wave package along with it.
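As an alternative to building a `"%dh"` format string for `struct.unpack`, the standard `array` module can interpret the whole buffer at once. This is a swapped-in technique, not the one from the answer; the helper name `unpack_int16` is hypothetical, and the sketch assumes a C `short` is 2 bytes (true on mainstream platforms).

```python
import array
import sys

def unpack_int16(raw: bytes) -> array.array:
    """Hypothetical helper: read raw 16-bit PCM bytes as signed integers."""
    samples = array.array("h")  # "h" = signed C short, assumed 2 bytes
    samples.frombytes(raw)      # raises ValueError if len(raw) is odd
    if sys.byteorder == "big":
        samples.byteswap()      # WAV data is little-endian; fix up on big-endian hosts
    return samples

print(list(unpack_int16(b"\xc0\xff\xd0\xff")))  # [-64, -48]
```

The `array` result is compact (2 bytes per sample) compared with a tuple of Python ints, which matters for long files.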

2010-02-09 05:52:49

Thanks for your help! I got it working, and I'll post the solution here in case some other poor soul needs it:

import wave
import struct

def pcm_channels(wave_file):
    """Given a file-like object or file path representing a wave file,
    decompose it into its constituent PCM data streams.

    Input: A file-like object or file path
    Output: A list of lists of integers representing the PCM coded data stream channels
     and the sample rate of the channels (mixed rate channels not supported)
    """
    stream = wave.open(wave_file, "rb")

    num_channels = stream.getnchannels()
    sample_rate = stream.getframerate()
    sample_width = stream.getsampwidth()
    num_frames = stream.getnframes()

    raw_data = stream.readframes(num_frames)  # Returns byte data
    stream.close()

    # Count samples from the bytes actually read (a truncated file may hold fewer than num_frames)
    total_samples = len(raw_data) // sample_width

    # WAV sample data is little-endian, hence the "<" prefix
    if sample_width == 1:
        fmt = "<%iB" % total_samples  # read unsigned chars
    elif sample_width == 2:
        fmt = "<%ih" % total_samples  # read signed 2-byte shorts
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    integer_data = struct.unpack(fmt, raw_data)
    del raw_data  # Keep memory tidy (who knows how big it might be)

    channels = [[] for _ in range(num_channels)]

    # Samples are interleaved: frame 0 of every channel, then frame 1 of every channel, ...
    for index, value in enumerate(integer_data):
        bucket = index % num_channels
        channels[bucket].append(value)

    return channels, sample_rate
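The per-sample loop at the end of `pcm_channels` can also be written with extended slicing: channel `ch` is every `num_channels`-th sample starting at index `ch`. A small sketch with a hypothetical helper name, `deinterleave`, returning the same list-of-lists shape:

```python
import struct

def deinterleave(samples, num_channels):
    """Split interleaved samples [c0, c1, c0, c1, ...] into one list per channel."""
    return [list(samples[ch::num_channels]) for ch in range(num_channels)]

raw = struct.pack("<6h", 1, -1, 2, -2, 3, -3)  # 3 stereo frames
samples = struct.unpack("<6h", raw)
print(deinterleave(samples, 2))  # [[1, 2, 3], [-1, -2, -3]]
```

Slicing runs in C rather than calling `append` once per sample from Python, so it is typically faster on long files.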

2010-02-09 06:18:05 SapphireSun

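Building on this answer, you can get a good performance boost by using numpy.fromstring or numpy.fromfile (in current NumPy, use numpy.frombuffer instead of numpy.fromstring, which is deprecated for binary data). Also see this answer.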

Here's what I did:

import numpy as np

def interpret_wav(raw_bytes, n_frames, n_channels, sample_width, interleaved=True):

    if sample_width == 1:
        dtype = np.uint8  # unsigned char
    elif sample_width == 2:
        dtype = np.dtype("<i2")  # signed 2-byte short, little-endian as stored in WAV
    else:
        raise ValueError("Only supports 8 and 16 bit audio formats.")

    # np.frombuffer wraps the bytes without copying (read-only, since bytes are immutable)
    channels = np.frombuffer(raw_bytes, dtype=dtype)

    if interleaved:
        # channels are interleaved, i.e. sample N of channel M follows sample N of channel M-1 in raw data
        channels.shape = (n_frames, n_channels)
        channels = channels.T
    else:
        # channels are not interleaved: all samples from channel M-1 occur before all samples from channel M
        channels.shape = (n_channels, n_frames)

    return channels

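Assigning a new value to shape will raise an error if it would require copying the data in memory. That's a good thing, because you want to use the data in place (using less time and memory). The ndarray.T attribute also doesn't copy: it returns a transposed view of the same data.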
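The view behaviour can be checked directly with `np.shares_memory`. A small sketch on 3 made-up interleaved stereo frames; it also shows that assigning to `.shape` refuses rather than silently copying when the array is not contiguous:

```python
import numpy as np

raw = np.array([1, -1, 2, -2, 3, -3], dtype="<i2").tobytes()  # 3 interleaved stereo frames
data = np.frombuffer(raw, dtype="<i2")

frames = data.reshape(3, 2)  # rows = frames, columns = channels (a view here)
per_channel = frames.T       # rows = channels, still a view
print(per_channel.tolist())                 # [[1, 2, 3], [-1, -2, -3]]
print(np.shares_memory(per_channel, data))  # True: no copy was made

# per_channel is not C-contiguous, so flattening it in place would need a copy;
# assigning to .shape raises instead (AttributeError in NumPy releases I know of)
refused = False
try:
    per_channel.shape = (6,)
except (AttributeError, ValueError):
    refused = True
print(refused)  # True
```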

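Reading directly from the file with np.fromfile would be even better, but you would have to skip the header using a custom dtype. I haven't tried that yet.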
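Instead of the custom dtype mentioned above, a different technique is to locate the `data` chunk by walking the RIFF chunk headers and pass its position to `np.fromfile` through its `offset` parameter (available since NumPy 1.17). This is a sketch under stated assumptions: 16-bit PCM, a hypothetical helper name `find_data_chunk`, and a temporary file written only for the demonstration.

```python
import os
import struct
import tempfile
import wave

import numpy as np

def find_data_chunk(path):
    """Hypothetical helper: return (byte offset, byte size) of the 'data' chunk."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'data' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"data":
                return f.tell(), chunk_size
            # skip this chunk; RIFF chunks are padded to an even size
            f.seek(chunk_size + (chunk_size & 1), os.SEEK_CUR)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.wav")
    with wave.open(path, "wb") as w:  # 3 stereo 16-bit frames
        w.setnchannels(2)
        w.setsampwidth(2)
        w.setframerate(44100)
        w.writeframes(struct.pack("<6h", 1, -1, 2, -2, 3, -3))

    offset, size = find_data_chunk(path)
    # offset skips the header; count limits the read to the data chunk
    samples = np.fromfile(path, dtype="<i2", count=size // 2, offset=offset)

print(samples.reshape(-1, 2).T.tolist())  # [[1, 2, 3], [-1, -2, -3]]
```

Walking the chunks rather than assuming a fixed 44-byte header matters because some writers add extra chunks (for example `LIST`) before `data`.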

2015-07-25 11:16:16 rudolfbyker