Research and Simulation of Speech Endpoint Detection Algorithms
2022-11-22 10:07:43
Total thesis length: 17,408 characters
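Abstract (Chinese)
Voice activity detection (VAD), also called speech endpoint detection, originated in the 1950s and 1960s. It was proposed at Bell Labs and has developed rapidly since. VAD is the subject of this thesis. It commonly serves as an important basis for speech recognition systems and is fundamental to them. The double-threshold method used here is a common approach to speech endpoint detection. It has several advantages: it keeps errors small at high signal-to-noise ratios, it retains a degree of accuracy even at low signal-to-noise ratios, and it offers good adaptability, robustness to interference, and ease of implementation. The method uses short-time energy and short-time zero-crossing rate to derive thresholds, then compares each frame of speech against the thresholds computed for that recording. Short-time energy and short-time zero-crossing rate are the basis of endpoint detection algorithms that use time-domain features. This thesis introduces the concepts of short-time energy, short-time zero-crossing rate, and the double-threshold method, and applies them in the subsequent experiments. In the experiment, the double-threshold method is used to detect the speech endpoints in a recording in three steps: 1) compute the higher threshold for the recording from a formula and locate the voiced portion; 2) compute the lower threshold for the recording; 3) compute the short-time zero-crossing rate, locate the unvoiced portions, and determine the final speech endpoints. Using this algorithm, we located and marked the speech endpoints in an audio clip and analyzed the method's accuracy, advantages, and disadvantages.
Keywords: speech endpoint detection; double-threshold method; short-time energy; short-time zero-crossing rate; feature endpoint; time domain; feature point; Python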
Speech endpoint detection based on double threshold method
Abstract
Voice activity detection (VAD) originated in the 1950s and 1960s, was proposed at Bell Labs, and has developed rapidly since. It is the central topic of this paper. It is often used as an important basis for speech recognition systems and is fundamental to them. The double-threshold method used in this paper is a common speech endpoint detection method with several advantages: it minimizes errors in high-SNR environments, keeps a certain accuracy even in low-SNR environments, and offers strong adaptability, resistance to interference, and ease of operation. The principle of the double-threshold method is to derive thresholds from the short-time energy and the short-time zero-crossing rate and to compare each speech frame with the thresholds computed for that recording. Short-time energy and short-time zero-crossing rate are the basis of endpoint detection algorithms built on time-domain features. This paper introduces the concepts of short-time energy, short-time zero-crossing rate, and the double-threshold method, and uses them to prepare the subsequent experiments. In the experiment, we used the double-threshold method to detect the speech endpoints in a recording in three steps: 1) use a formula to obtain the higher threshold of the recording and find its voiced part; 2) calculate the lower threshold of the recording; 3) calculate the short-time zero-crossing rate, find the unvoiced part, and determine the final speech endpoints. With this algorithm, we found the speech endpoints in a piece of audio, marked them, and analyzed the accuracy, advantages, and disadvantages of the method.
Key words: double-threshold method, short-time energy, short-time zero-crossing rate, characteristics of the endpoint, time domain, Python
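The three detection steps described in the abstract can be sketched in Python, the language named in the keywords. This is a minimal illustration, not the thesis's implementation: the abstract does not give the threshold formulas, so the ones below (thresholds placed between the leading-noise energy and the peak energy, and a zero-crossing threshold set as a multiple of the leading-noise rate) are assumptions, as are all function names, parameter names, and default values.

```python
import math


def tone(freq, amp, n, fs):
    """Hypothetical helper: n samples of a sine tone, for synthetic tests."""
    return [amp * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]


def frame_signal(x, frame_len, hop):
    """Split samples into overlapping frames; a short tail is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / max(len(frame) - 1, 1)


def double_threshold_vad(x, frame_len=256, hop=128, noise_frames=5,
                         high_ratio=0.25, low_ratio=0.05,
                         zcr_factor=3.0, max_zcr_ext=20):
    """Return a list of (first_frame, last_frame) speech segments, inclusive."""
    frames = frame_signal(x, frame_len, hop)
    if not frames:
        return []
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]

    # ASSUMPTION: the first few frames contain background only.
    lead = min(noise_frames, len(frames))
    e_noise = sum(energy[:lead]) / lead
    z_noise = sum(zcr[:lead]) / lead

    # ASSUMPTION: illustrative threshold formulas, not the thesis's own.
    e_span = max(energy) - e_noise
    high = e_noise + high_ratio * e_span
    low = e_noise + low_ratio * e_span
    zcr_thr = zcr_factor * z_noise

    segments = []
    n = len(frames)
    k = 0
    while k < n:
        if energy[k] <= high:
            k += 1
            continue
        # Step 1: a run of frames above the high threshold (voiced core).
        start = end = k
        while end + 1 < n and energy[end + 1] > high:
            end += 1
        # Step 2: widen the run while energy stays above the low threshold.
        while start > 0 and energy[start - 1] > low:
            start -= 1
        while end + 1 < n and energy[end + 1] > low:
            end += 1
        # Step 3: widen further while the zero-crossing rate is high
        # (unvoiced sounds), by at most max_zcr_ext frames per side.
        steps = 0
        while start > 0 and steps < max_zcr_ext and zcr[start - 1] > zcr_thr:
            start -= 1
            steps += 1
        steps = 0
        while end + 1 < n and steps < max_zcr_ext and zcr[end + 1] > zcr_thr:
            end += 1
            steps += 1
        # Merge with the previous segment if the widened spans overlap.
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
        k = end + 1
    return segments
```

A frame index `k` maps back to samples as `k * hop` through `k * hop + frame_len`. Setting `max_zcr_ext=0` disables the third step, which makes it easy to see how much of a weak, high-frequency onset the energy thresholds alone would miss. With broadband background noise the noise zero-crossing rate is already high, so a multiplicative zero-crossing threshold like the one assumed here may never trigger; a real system would need a threshold tuned to its recordings.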
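Contents
Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Research background
1.2 Current state and development of the research
1.3 Significance of the research and structure of the thesis
Chapter 2 Overall system design
2.1 Components of sound
2.1.1 Unvoiced sound
2.1.2 Voiced sound
2.1.3 Noise
2.1.4 Silence
2.2 Microphone-based speech acquisition
2.1 Common types of microphone
2.2 Microphone arrays
2.3 Overall framework of speech endpoint detection
2.6 Algorithm for computing Mel-frequency cepstral coefficients
Chapter 4 Conclusions
4.1 Experimental results
4.1.1 Speech energy spectrum after processing
4.1.2 Preliminary endpoint detection results
4.1.3 Refined endpoint detection results
4.1.4 Final result figures
Chapter 5 Analysis of results
5.1 Errors, mistakes, and improvements in the experiment
5.1.1 Measurement error in the experiment
5.1.2 Analysis of mistakes in the experiment
5.1.3 Improvements to the original experiment
5.2 Advantages and disadvantages of the method used
5.2.1 Advantages of the double-threshold method
5.2.2 Disadvantages of the double-threshold method
5.2.3 Algorithms based on time-domain feature parameters
Chapter 6 Closing remarks
Acknowledgements
References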
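Chapter 1 Introduction
1.1 Research background
Language is a tool for conveying information and exchanging ideas. It is a system of symbolic instructions unique to humans, one that arose from the need to exchange ideas and to learn from historical experience, and it is creative, structured, referential, and social. It consists of vocabulary and grammar that follow certain norms, which lets people express their thoughts and feelings, and learn from historical experience, in richer and more concrete ways. Speech, gestures, and facial expressions are the bodily manifestations of language, while written characters are its visible symbols. According to Vygotsky's social interaction theory, language is central and fundamental to human cognitive development.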
Language emerged over tens of thousands of years of human evolution as a tool for mutual communication that belongs to humans alone.