經過很多工作,我終於得到了Microsoft.SpeechRecognitionEngine
接受WAVE音頻流。過程如下:
在Pi上,我運行了ffmpeg。我流使用此命令
ffmpeg -ac 1 -f alsa -i hw:1,0 -ar 16000 -acodec pcm_s16le -f rtp rtp://XXX.XXX.XXX.XXX:1234
在服務器端的聲音,我創建了一個UDPClient
和偵聽端口1234我收到一個單獨的線程包。首先,我剝離RTP頭(header format explained here)並將有效載荷寫入特殊流。爲了使SpeechRecognitionEngine正常工作,我必須使用SpeechStreamer
類described in Sean's response。它不符合標準Memory Stream
。
我必須在語音識別方面做的唯一事情是將輸入設置爲音頻流而不是默認音頻設備。
recognizer.SetInputToAudioStream(rtpClient.AudioStream,
new SpeechAudioFormatInfo(WAVFile.SAMPLE_RATE, AudioBitsPerSample.Sixteen, AudioChannel.Mono));
我沒有做廣泛的測試就可以了(即讓它流了幾天,看它是否仍然有效),但我能救關閉音頻樣本中的SpeechRecognized
而且聲音很大。我正在使用16 KHz的採樣率。我可能會將其降至8 KHz以減少數據傳輸量,但一旦出現問題,我會擔心這一點。
我還應該提到,反應速度非常快。我可以說一整句話,並在不到一秒鐘內得到答覆。 RTP連接似乎爲這個過程增加了很少的開銷。我將不得不嘗試一個基準,並將其與僅使用MIC輸入進行比較。
編輯:這是我的RTPClient類。
/// <summary>
/// Connects to an RTP stream and listens for data
/// </summary>
public class RTPClient
{
private const int AUDIO_BUFFER_SIZE = 65536;
private UdpClient client;
private IPEndPoint endPoint;
private SpeechStreamer audioStream;
private bool writeHeaderToConsole = false;
private bool listening = false;
private int port;
private Thread listenerThread;
/// <summary>
/// Returns a reference to the audio stream
/// </summary>
public SpeechStreamer AudioStream
{
get { return audioStream; }
}
/// <summary>
/// Gets whether the client is listening for packets
/// </summary>
public bool Listening
{
get { return listening; }
}
/// <summary>
/// Gets the port the RTP client is listening on
/// </summary>
public int Port
{
get { return port; }
}
/// <summary>
/// RTP Client for receiving an RTP stream containing a WAVE audio stream
/// </summary>
/// <param name="port">The port to listen on</param>
public RTPClient(int port)
{
Console.WriteLine(" [RTPClient] Loading...");
this.port = port;
// Initialize the audio stream that will hold the data
audioStream = new SpeechStreamer(AUDIO_BUFFER_SIZE);
Console.WriteLine(" Done");
}
/// <summary>
/// Creates a connection to the RTP stream
/// </summary>
public void StartClient()
{
// Create new UDP client. The IP end point tells us which IP is sending the data
client = new UdpClient(port);
endPoint = new IPEndPoint(IPAddress.Any, port);
listening = true;
listenerThread = new Thread(ReceiveCallback);
listenerThread.Start();
Console.WriteLine(" [RTPClient] Listening for packets on port " + port + "...");
}
/// <summary>
/// Tells the UDP client to stop listening for packets.
/// </summary>
public void StopClient()
{
// Set the boolean to false to stop the asynchronous packet receiving
listening = false;
Console.WriteLine(" [RTPClient] Stopped listening on port " + port);
}
/// <summary>
/// Handles the receiving of UDP packets from the RTP stream
/// </summary>
/// <param name="ar">Contains packet data</param>
private void ReceiveCallback()
{
// Begin looking for the next packet
while (listening)
{
// Receive packet
byte[] packet = client.Receive(ref endPoint);
// Decode the header of the packet
int version = GetRTPHeaderValue(packet, 0, 1);
int padding = GetRTPHeaderValue(packet, 2, 2);
int extension = GetRTPHeaderValue(packet, 3, 3);
int csrcCount = GetRTPHeaderValue(packet, 4, 7);
int marker = GetRTPHeaderValue(packet, 8, 8);
int payloadType = GetRTPHeaderValue(packet, 9, 15);
int sequenceNum = GetRTPHeaderValue(packet, 16, 31);
int timestamp = GetRTPHeaderValue(packet, 32, 63);
int ssrcId = GetRTPHeaderValue(packet, 64, 95);
if (writeHeaderToConsole)
{
Console.WriteLine("{0} {1} {2} {3} {4} {5} {6} {7} {8}",
version,
padding,
extension,
csrcCount,
marker,
payloadType,
sequenceNum,
timestamp,
ssrcId);
}
// Write the packet to the audio stream
audioStream.Write(packet, 12, packet.Length - 12);
}
}
/// <summary>
/// Grabs a value from the RTP header in Big-Endian format
/// </summary>
/// <param name="packet">The RTP packet</param>
/// <param name="startBit">Start bit of the data value</param>
/// <param name="endBit">End bit of the data value</param>
/// <returns>The value</returns>
private int GetRTPHeaderValue(byte[] packet, int startBit, int endBit)
{
int result = 0;
// Number of bits in value
int length = endBit - startBit + 1;
// Values in RTP header are big endian, so need to do these conversions
for (int i = startBit; i <= endBit; i++)
{
int byteIndex = i/8;
int bitShift = 7 - (i % 8);
result += ((packet[byteIndex] >> bitShift) & 1) * (int)Math.Pow(2, length - i + startBit - 1);
}
return result;
}
}
沒有必要通過互聯網發送音頻。您可以使用脫機語音識別引擎[CMUSphinx](http://cmusphinx.sourceforge.net)存檔即時響應。 CMUSphinx引擎的精確度與微軟引擎並無太大區別,它完全可以在Raspberry Pi本身上運行。 – 2013-04-08 19:54:40
我使用的是高質量的語音合成引擎,只能在Windows上運行,所以不幸的是我無法在Pi上做所有的事情。人工智能也將執行比Pi可以處理的計算密集型任務。 – dgreenheck 2013-04-08 20:15:34