Saving the audio input of the Android stock speech recognition engine

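I am trying to save to a file the audio data that Android's speech recognition service listens to.

I implement RecognitionListener as explained here: Speech to Text on Android

I save the data into a buffer, as shown here: Capturing audio sent to Google's speech recognition server

and I write the buffer to a WAV file, as described here: Android Record raw bytes into WAVE file for Http Streaming

My problem is how to get the proper audio settings to put in the WAV file's header. When I play the WAV file I only hear strange noise, with these parameters: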

short nChannels=2;// audio channels 
int sRate=44100; // Sample rate 
short bSamples = 16;// bits per sample

or with these:

short nChannels=1;// audio channels 
int sRate=8000; // Sample rate 
short bSamples = 16;// bits per sample

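What confuses me is that, looking at the parameters of the speech recognition task in logcat, I first see the playback sample rate being set to 44100 Hz: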

12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Set PLAYBACK PCM format to S16_LE (Signed 16 bit Little Endian) 
12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Using 2 channels for PLAYBACK. 
12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Set PLAYBACK sample rate to 44100 HZ 
12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Buffer size: 2048 
12-20 14:41:34.007: DEBUG/AudioHardwareALSA(2364): Latency: 46439

and then aInfo.SampleRate = 8000 when it plays the file to send to Google's server:

12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::InitWavParser 
12-20 14:41:36.152: DEBUG/(2364): File open Succes 
12-20 14:41:36.152: DEBUG/(2364): File SEEK End Succes 
... 
12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::ReadData 
12-20 14:41:36.152: DEBUG/(2364): Data Read buff = RIFF? 
12-20 14:41:36.152: DEBUG/(2364): Data Read = RIFF? 
12-20 14:41:36.152: DEBUG/(2364): PV_Wav_Parser::ReadData 
12-20 14:41:36.152: DEBUG/(2364): Data Read buff = fmt 
... 
12-20 14:41:36.152: DEBUG/(2364): PVWAVPARSER_OK 
12-20 14:41:36.156: DEBUG/(2364): aInfo.AudioFormat = 1 
12-20 14:41:36.156: DEBUG/(2364): aInfo.NumChannels = 1 
12-20 14:41:36.156: DEBUG/(2364): aInfo.SampleRate = 8000 
12-20 14:41:36.156: DEBUG/(2364): aInfo.ByteRate = 16000 
12-20 14:41:36.156: DEBUG/(2364): aInfo.BlockAlign = 2 
12-20 14:41:36.156: DEBUG/(2364): aInfo.BitsPerSample = 16 
12-20 14:41:36.156: DEBUG/(2364): aInfo.BytesPerSample = 2 
12-20 14:41:36.156: DEBUG/(2364): aInfo.NumSamples = 2258

So, how can I find the right parameters to save the audio buffer as a good WAV audio file?


2011-12-20 mmmx

Did you ever find a solution? – Doug 2012-03-11 23:45:14

It seems you have gotten the furthest with this. mmmx, were you able to solve it? – ComputerEngineer88 2012-03-14 07:25:18

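You haven't included the code that actually writes out the PCM data, so it's hard to diagnose, but if you are hearing strange noise then most likely you have the wrong endianness when writing the data, or the wrong number of channels. Getting the sample rate wrong only makes the audio sound slower or faster; if it sounds completely garbled, the mistake is probably in the number of channels or in the endianness of your byte stream.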
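If you want to test the endianness guess from code rather than by ear alone, a rough sketch like the one below swaps the two bytes of every 16-bit sample so you can write both variants and compare. The class and method names (EndianSwapSketch, swap16) are made up for illustration, and it assumes the buffer holds 16-bit samples.

```java
final class EndianSwapSketch {
    // Returns a copy of the input with the two bytes of every 16-bit
    // sample swapped. A trailing odd byte, if any, is left unchanged.
    static byte[] swap16(byte[] pcm) {
        byte[] out = pcm.clone();
        for (int i = 0; i + 1 < out.length; i += 2) {
            byte tmp = out[i];
            out[i] = out[i + 1];
            out[i + 1] = tmp;
        }
        return out;
    }
}
```

If the swapped version sounds right and the original does not, the data was being written in the wrong byte order.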

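To know for sure, just stream your bytes directly to a file without any header (raw PCM data). That way you can rule out any errors in writing the file header. Then import the raw data into Audacity and experiment with the different options (bit depth, endianness, channels) until the audio sounds correct (only one combination will be right). You do this via File -> Import -> Raw Data...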
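Dumping the raw bytes could look roughly like this. It is only a sketch: RawPcmDump and its methods are hypothetical names, and it assumes you call append() from your RecognitionListener.onBufferReceived(byte[] buffer) callback.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

final class RawPcmDump {
    // Accumulates every buffer handed over by the recognizer.
    private final ByteArrayOutputStream captured = new ByteArrayOutputStream();

    // Assumed to be called from onBufferReceived(byte[] buffer).
    void append(byte[] buffer) {
        captured.write(buffer, 0, buffer.length);
    }

    // Number of bytes collected so far.
    int size() {
        return captured.size();
    }

    // Writes the bytes exactly as received, with no WAV header,
    // so the file can be opened with Audacity's raw data import.
    void writeTo(String path) throws IOException {
        try (FileOutputStream out = new FileOutputStream(path)) {
            captured.writeTo(out);
        }
    }
}
```

Because nothing is added or reordered, whatever Audacity settings make this file sound right are the settings the header needs to describe.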

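Once you have identified the byte format this way, you only have to worry about setting the header correctly. You might want to refer to http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html for the file format. Or have a look at existing Java solutions for writing audio files: Java - reading, manipulating and writing WAV files, or FMJ. I'm guessing these may not be usable on Android, though.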
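For plain PCM the header is the usual 44-byte RIFF/WAVE layout, so a minimal sketch could be something like the following. WavHeaderSketch and build() are hypothetical names, and it assumes uncompressed PCM (format tag 1) with a single "fmt " chunk followed directly by the "data" chunk.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class WavHeaderSketch {
    // Builds a 44-byte canonical PCM WAV header.
    // dataLength is the number of PCM bytes that will follow the header.
    static byte[] build(int sampleRate, int channels, int bitsPerSample, int dataLength) {
        int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        int byteRate = sampleRate * blockAlign;        // bytes per second

        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + dataLength);                     // size of everything after this field
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                                  // fmt chunk size for PCM
        b.putShort((short) 1);                         // audio format 1 = PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort((short) blockAlign);
        b.putShort((short) bitsPerSample);
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(dataLength);
        return b.array();
    }
}
```

With the values from the second log in the question (8000 Hz, 1 channel, 16 bits) this gives a byte rate of 16000 and a block align of 2, which matches the aInfo.ByteRate and aInfo.BlockAlign lines there.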

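If you have to roll your own WAV/RIFF writer, remember that Java's data types are big-endian, so any multi-byte primitives you write to the file must be written in reverse byte order to match RIFF's little-endianness.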
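As an illustration of the byte-order point, here is a small sketch that assumes you are writing through a DataOutputStream, which always emits big-endian. The helper names (LittleEndianSketch, writeIntLE, writeShortLE, encode) are made up; Integer.reverseBytes and Short.reverseBytes are standard JDK methods.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class LittleEndianSketch {
    // DataOutputStream.writeInt emits big-endian, so reverse the bytes first.
    static void writeIntLE(DataOutputStream out, int value) throws IOException {
        out.writeInt(Integer.reverseBytes(value));
    }

    // Same idea for 16-bit fields such as the channel count.
    static void writeShortLE(DataOutputStream out, short value) throws IOException {
        out.writeShort(Short.reverseBytes(value));
    }

    // Encodes a sample rate and a channel count the way RIFF expects them.
    static byte[] encode(int sampleRate, short channels) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeIntLE(out, sampleRate);
            writeShortLE(out, channels);
        }
        return bytes.toByteArray();
    }
}
```

Only the numeric fields need this treatment. The four-character tags such as "RIFF" and "fmt " are written byte by byte in reading order.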


2012-05-29 21:50:30

Little-endian, 16-bit PCM, mono did the trick.


2012-07-11 19:35:52 chandru

FWIW, the information above works for audio on a Samsung GS2 – 2013-08-29 17:34:26

In the latest versions onBufferReceived does not work; you can use record/save audio from voice recognition intent instead.


2016-02-03 15:30:53
