Unity Voice Wake-up and Recognition
Implementing voice wake-up and speech recognition in Unity
First, an overview of how this module works: we poll the microphone for the current volume and set a volume threshold. Once the volume exceeds that threshold, we start recording the audio data captured from the microphone, prepend a PCM (WAV) file header to the recorded data, upload it directly to Alibaba Cloud's speech recognition service, and read back the recognition result.
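The flow described above (poll volume → cross threshold → record → prepend header → upload) can be sketched as a tiny state machine. This is a language-agnostic Python sketch, not the Unity code; `run_pipeline` and its `upload` callback are illustrative names, not part of any real API:

```python
# Minimal sketch of the wake-up flow: poll frames, start recording once the
# peak volume crosses a threshold, then hand the captured samples to an
# uploader. All names here are illustrative stand-ins for the Unity code.
def run_pipeline(frames, threshold, upload):
    recording = False
    captured = []
    for frame in frames:
        peak = max(abs(s) for s in frame)
        if not recording and peak >= threshold:
            recording = True          # wake up: volume crossed the threshold
        if recording:
            captured.extend(frame)    # accumulate samples while recording
    if captured:
        upload(captured)              # prepend header + upload in the real code
    return captured
```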
- Initialize the microphone
```csharp
// Assumed fields on the containing MonoBehaviour:
// string curDevice; AudioClip micRecords;
private void InitMicphone()
{
    if (Microphone.devices.Length > 0)
    {
        curDevice = Microphone.devices[0];
        Debug.Log("Current mic device: " + curDevice);
        // Loop recording into a 60-second buffer at 16 kHz
        micRecords = Microphone.Start(curDevice, true, 60, 16000);
        Debug.Log("Mic start!");
    }
    else
    {
        Debug.Log("Mic missing!");
        return;
    }
}
```

- Get the real-time volume

The length of the volumeData array determines how often you sample the volume: the smaller it is, the more frequently you can check, and the higher the performance cost. Treat the data in this array as one "frame"; the maximum volume within that frame is returned as the real-time volume.
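As a rough sanity check on that frame-size trade-off: at a 16 kHz sample rate, a 128-sample frame spans 8 ms, so the volume could in principle be evaluated up to 125 times per second. A small illustrative calculation (not Unity code):

```python
def frame_stats(sample_rate_hz, frame_len):
    """Duration of one volume 'frame' and the max polling rate it allows."""
    duration_ms = frame_len / sample_rate_hz * 1000.0
    polls_per_second = sample_rate_hz / frame_len
    return duration_ms, polls_per_second
```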
```csharp
private float GetMaxVolume()
{
    float maxVolume = 0f;
    float[] volumeData = new float[128];
    // Read the most recent 128 samples from the looping mic clip
    int offset = Microphone.GetPosition(null) - 128 + 1;
    if (offset < 0)
    {
        return 0f;
    }
    micRecords.GetData(volumeData, offset);
    for (int i = 0; i < 128; i++)
    {
        // Samples are in [-1, 1]; use the absolute value as the peak
        float tempMax = Mathf.Abs(volumeData[i]);
        if (maxVolume < tempMax)
        {
            maxVolume = tempMax;
        }
    }
    return maxVolume;
}
```

- Write the PCM file header

Everything written here is the PCM (WAV) file header information covered in the previous article; see the previous post on parsing the PCM file header for the details of each field.
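The 44-byte header layout can be cross-checked with a few lines of Python using `struct` (all multi-byte fields are little-endian, per the RIFF/WAVE convention). This is a verification sketch of the standard header, not the Unity implementation:

```python
import struct

def make_wav_header(sample_rate, channels, num_samples, bits=16):
    """Build the standard 44-byte PCM WAV header (little-endian)."""
    byte_rate = sample_rate * channels * bits // 8
    block_align = channels * bits // 8
    data_size = num_samples * channels * bits // 8
    return (
        b"RIFF"
        + struct.pack("<I", 36 + data_size)      # ChunkSize = file size - 8
        + b"WAVE"
        + b"fmt "
        + struct.pack("<IHH", 16, 1, channels)   # Subchunk1Size, PCM=1, channels
        + struct.pack("<IIHH", sample_rate, byte_rate, block_align, bits)
        + b"data"
        + struct.pack("<I", data_size)           # Subchunk2Size
    )
```

Note that AudioFormat, NumChannels, BlockAlign, and BitsPerSample are 2-byte fields (`H`), while the sizes and rates are 4-byte fields (`I`); mixing these widths up shifts every later field and corrupts the header.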
```csharp
private void WriteHeader(AudioClip clip)
{
    int hz = clip.frequency;
    int channels = clip.channels;
    int samples = clip.samples;
    bytesList.Clear();
    Byte[] riff = System.Text.Encoding.UTF8.GetBytes("RIFF");
    bytesList.AddRange(riff);
    // ChunkSize = 36 + Subchunk2Size (total file size minus the 8 bytes already written)
    Byte[] chunkSize = BitConverter.GetBytes(36 + samples * channels * 2);
    bytesList.AddRange(chunkSize);
    Byte[] wave = System.Text.Encoding.UTF8.GetBytes("WAVE");
    bytesList.AddRange(wave);
    Byte[] fmt = System.Text.Encoding.UTF8.GetBytes("fmt ");
    bytesList.AddRange(fmt);
    Byte[] subChunk = BitConverter.GetBytes(16); // Subchunk1Size for PCM
    bytesList.AddRange(subChunk);
    // The next fields are 2 bytes wide, so cast to UInt16 —
    // writing a 4-byte int here would shift the rest of the header
    Byte[] audioFormat = BitConverter.GetBytes((UInt16)1); // 1 = PCM
    bytesList.AddRange(audioFormat);
    Byte[] channel = BitConverter.GetBytes((UInt16)channels);
    bytesList.AddRange(channel);
    Byte[] sampleRate = BitConverter.GetBytes(hz);
    bytesList.AddRange(sampleRate);
    Byte[] byteRate = BitConverter.GetBytes(hz * channels * 2);
    bytesList.AddRange(byteRate);
    Byte[] blockAlign = BitConverter.GetBytes((UInt16)(channels * 2));
    bytesList.AddRange(blockAlign);
    Byte[] bitsPerSample = BitConverter.GetBytes((UInt16)16);
    bytesList.AddRange(bitsPerSample);
    Byte[] data = System.Text.Encoding.UTF8.GetBytes("data");
    bytesList.AddRange(data);
    Byte[] subChunk2 = BitConverter.GetBytes(samples * channels * 2);
    bytesList.AddRange(subChunk2);
}
```

- Write the audio data body

The AudioClip passed in holds the audio recorded after the volume threshold was reached. Converting it to binary data and appending it to the file header yields a complete audio file.
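The float-to-16-bit conversion the body writer performs can be checked independently: multiply each float sample in [-1, 1] by 32767 and pack it as a little-endian signed 16-bit integer. A Python sketch of the same scaling:

```python
import struct

def floats_to_pcm16(samples):
    """Convert floats in [-1, 1] to little-endian 16-bit PCM bytes."""
    return b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
```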
```csharp
private void WriteBody(AudioClip clip)
{
    // One float per sample per channel
    float[] samples = new float[clip.samples * clip.channels];
    clip.GetData(samples, 0);
    Byte[] bytesData = new Byte[samples.Length * 2];
    const int rescaleFactor = 32767; // to convert float [-1, 1] to Int16
    for (int i = 0; i < samples.Length; i++)
    {
        Int16 intData = (short)(samples[i] * rescaleFactor);
        Byte[] byteArr = BitConverter.GetBytes(intData);
        byteArr.CopyTo(bytesData, i * 2);
    }
    bytesList.AddRange(bytesData);
}
```

- Get the audio data

Fetch the data, wrap it into a complete audio file, and upload it to the recognition service.
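Concatenating the header and the body produces a file that standard tools accept. Here is a Python check on a short synthetic mono clip using the standard library's `wave` module; the helpers are inlined for self-containment, so this is a verification sketch rather than the Unity code:

```python
import io
import struct
import wave

def build_wav(sample_rate, samples):
    """Assemble a complete mono 16-bit WAV: 44-byte header + PCM body."""
    body = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
    header = (
        b"RIFF" + struct.pack("<I", 36 + len(body)) + b"WAVE"
        + b"fmt " + struct.pack("<IHHIIHH", 16, 1, 1, sample_rate,
                                sample_rate * 2, 2, 16)
        + b"data" + struct.pack("<I", len(body))
    )
    return header + body
```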
```csharp
private void GetSpeechData()
{
    // Wait until at least responseTime samples have accumulated
    if (Microphone.GetPosition(null) - curPosition < responseTime)
    {
        return;
    }
    WriteHeader(micRecords);
    WriteBody(micRecords);
    // Upload the header + body as 16 kHz PCM
    SpeechRecognizer.GetSpeechResult("pcm", 16000, bytesList.ToArray());
    isRecording = false;
}
```
The SpeechRecognizer.GetSpeechResult function is where the Alibaba Cloud speech recognition logic lives. Since that interface changes over time, refer to Alibaba's documentation for the specifics.
https://baifabaiquan.cn/2023/02/21/Unity语音唤醒与识别/