Research on Sound Event Recognition Methods Based on the Support Vector Machine Model
2022-12-03 11:10:47
Total thesis length: 23,761 characters
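Abstract (translated from Chinese)
The goal of sound recognition research is to enable machines to understand human voices, so that people and machines can interact through sound. It is widely recognized as work with great market value and considerable difficulty.
This thesis studies sound recognition technology and designs and implements a small-vocabulary, real-time sound recognition system. The system first computes time-domain features of the sound signal, then computes LPC cepstral coefficients, and then applies a Mel transformation to obtain Mel cepstral coefficients, which serve as the feature parameters for recognition. Next, a support vector machine (SVM) model is built for the sound signals; the system uses a four-state, left-to-right, continuous-density SVM model without state skipping as its basic model. The sound database used for training and testing contains 120 recordings. Tests of the recognition system show an average recognition rate of 85.66%. Finally, the experimental results are discussed, and the system's shortcomings and possible improvements are identified.
Keywords: sound recognition; support vector machine model; Mel cepstral coefficients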
ABSTRACT
Speech recognition is a technology that enables machines to understand human language, so that humans can communicate with machines through speech. It is widely recognized as challenging work with great market value.
This dissertation first studies the basic principles of speech recognition, and then designs and implements a small, real-time, isolated-word speech recognition system with a 20-word vocabulary. In the system, the time-domain characteristics of the sound signal are obtained first, then the LPC cepstral coefficients of the speech signal are obtained, and the Mel cepstral coefficients are calculated by the Mel transformation as the representation of the speech signal. Next, a hidden Markov model of the speech signal is established via the Baum-Welch training procedure. The system uses a four-state, left-to-right, continuous-density SVM without skips as the basic model of speech recognition. The speech database for training and testing contains 120 speech samples. The average recognition rate is 85.66%. Finally, the dissertation discusses the experimental results and identifies the system's shortcomings and possible improvements.
Keywords: speech recognition, Support Vector Machine, Mel cepstral coefficients
Contents
Chapter 1 Introduction
1.1 Basic concepts of sound recognition
- Pattern matching
1.2 Classification of sound recognition systems
1.3 Research progress in sound recognition
Chapter 2 Principles of sound event detection
2.1 Classification methods for sound event recognition
2.2 Principles of sound recognition
Chapter 3 Sound event feature extraction
3.1 Time-domain features
3.2 Cepstral features
Chapter 4 Support vector machine model
Chapter 5 System implementation and experimental results
5.1 Building the speech database
5.2 Setting thresholds and parameters
5.3 Experimental results and discussion
Chapter 6 Summary and outlook
References
Acknowledgements
Appendix
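Chapter 1 Introduction
1.1 Basic Concepts of Sound Recognition
Sound recognition technology mainly refers to the use of a speaker's voice, which reflects both physiological and behavioral characteristics, together with linguistic models of that voice. It differs from speech recognition in that it does not identify the individual words being spoken. Instead, it analyzes unique attributes of the voice, such as its frequency, to determine and identify the individual who is speaking. Sound recognition thus makes it possible to use a person's voice to control whether that person may enter a restricted area. [1]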
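1.1.1 Preprocessing
Preprocessing digitizes the speech signal and then applies pre-emphasis, framing, windowing, and endpoint detection. In pre-emphasis, the sampled signal is passed through a first-order high-pass filter, so that mainly the vocal-tract component of the signal remains, which makes the vocal-tract parameters and the frequency content of the speech easier to analyze. Endpoint detection locates the start point and end point of each utterance within the signal.
1.1.2 Feature Extraction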
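The preprocessing chain described above can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the pre-emphasis form y[n] = x[n] - alpha * x[n-1] with alpha = 0.97, the Hamming window, the frame length and hop, and the energy-threshold endpoint detector are all assumptions chosen for the example, and every function name is hypothetical.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - alpha * signal[n - 1])
    return out


def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def hamming(frame_len):
    """Hamming window coefficients (the window type is an assumption)."""
    if frame_len == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]


def apply_window(frame, window):
    """Multiply a frame by the window, sample by sample."""
    return [x * w for x, w in zip(frame, window)]


def detect_endpoints(frames, threshold):
    """Return (first, last) indices of frames whose short-time energy
    exceeds the threshold, or None if no frame does."""
    energies = [sum(x * x for x in frame) for frame in frames]
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]


# Toy input: silence, a 200 Hz tone, silence (8 kHz sampling is assumed).
fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
signal = [0.0] * 400 + tone + [0.0] * 400

emphasized = pre_emphasis(signal)
frames = frame_signal(emphasized, frame_len=200, hop=100)
window = hamming(200)
windowed = [apply_window(frame, window) for frame in frames]
endpoints = detect_endpoints(windowed, threshold=1e-3)
print(len(frames), endpoints)
```

With these toy settings the detector reports the first and last frames that overlap the tone. A practical system would more likely pick the threshold from a noise estimate and might combine energy with the zero-crossing rate, but those choices are outside what this section specifies.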