Research on a Multiple Classifier Ensemble System Based on Rough Set Attribute Reduction (Graduation Thesis)
Abstract
A single classifier is limited by its own classification domain and learning ability: it may perform well on the training set yet give unsatisfactory results on unseen data, which shows its instability. Moreover, once its classification performance reaches a certain level, further improvement becomes very difficult. Ensemble classification is therefore widely used: the training set is used to build multiple diverse base classifiers, and their outputs are combined by voting to obtain the final classification, which usually gives better results. If, in addition, only well-performing individual classifiers are selected for the ensemble, both accuracy and stability can be ensured.
The performance of a multiple classifier ensemble is not directly determined by the number of individual classifiers; in other words, more classifiers is not necessarily better. This thesis uses a genetic algorithm to select a subset of the classifiers for the ensemble in order to improve the final classification result. The work consists of four parts: reducing the attributes of the sample set to remove unnecessary attributes; constructing base classifiers, where KNN (k-Nearest Neighbor) classifiers are trained on the training set according to the obtained reducts, yielding several distinct base classifiers; applying a genetic algorithm to select the base classifiers with high fitness, which together form the multiple classifier ensemble system; and producing the final classification by majority voting. To verify the effectiveness of the method, six UCI data sets that differ in size and in attributes are used for testing and comparison of classification accuracy.
Key Words: Rough Set; Attribute Reduction; Genetic Algorithm; Selective Ensemble; Multiple Classifier Ensemble
Contents
Abstract
Chapter 1  Introduction
1.1 Research Background and Significance
1.2 Research Status at Home and Abroad
1.3 Main Work and Organization of This Thesis
Chapter 2  Fundamentals of Ensemble Learning
2.1 Basic Idea of Ensemble Learning
2.2 Generation of Base Classifiers
2.3 Ensemble Strategies for Base Classifiers
2.4 Combination of Base Classifier Outputs
Chapter 3  The Rough Set Model and Attribute Reduction
3.1 Basic Concepts of Rough Sets
3.1.1 Equivalence Classes
3.1.2 Lower and Upper Approximations
3.1.3 Positive Region
3.2 Common Attribute Reduction Algorithms
3.2.1 Discernibility-Matrix-Based Method
3.2.2 Information-Entropy-Based Attribute Reduction
3.2.3 General Reduction Method
3.2.4 Improved Reduction Method Based on Attribute Dependency
3.2.5 Greedy Algorithm
Chapter 4  A Multiple Classifier Ensemble System Based on Rough Set Attribute Reduction
4.1 Design Workflow
4.2 Discretization
4.3 Attribute Reduction
4.4 Generation of Base Classifiers
4.5 Selection of Individual Classifiers
4.6 Multiple Classifier Ensemble
4.7 Simulation Experiments and Result Analysis
Chapter 5  Conclusion and Outlook
5.1 Summary of This Thesis
5.2 Future Work
References
Acknowledgements
Chapter 1  Introduction
1.1 Research Background and Significance
Research has shown that a single classifier has limited learning ability, and improving the performance of a single classifier to solve a given problem is difficult. When multiple single classifiers are combined according to some rule, however, the resulting performance can be far higher than that of the best individual classifier. Compared with improving a single classifier, improving the performance of an ensemble of classifiers costs much less.
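To make this idea concrete, the following minimal Python sketch (not part of the original thesis) trains several KNN base classifiers on different attribute subsets of one training set and combines them by majority voting. It assumes scikit-learn is available and uses the Iris data set and hand-picked attribute subsets purely as stand-ins for the UCI data sets and rough set reducts discussed later.

# Minimal sketch (assumptions: scikit-learn is installed; the Iris data set and the
# hand-picked attribute subsets below are illustrative only, not the thesis's
# rough set reducts or UCI data sets).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Train one KNN base classifier per attribute subset.
feature_subsets = [[0, 1], [2, 3], [0, 2], [1, 3], [0, 3]]
base_classifiers = []
for subset in feature_subsets:
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_train[:, subset], y_train)
    base_classifiers.append((subset, clf))
    print("base classifier on attributes", subset,
          "accuracy = %.3f" % clf.score(X_test[:, subset], y_test))

# Combine the base classifiers by majority voting over their predictions.
all_preds = np.array([clf.predict(X_test[:, subset]) for subset, clf in base_classifiers])
ensemble_pred = np.array([np.bincount(column).argmax() for column in all_preds.T])
print("majority-vote ensemble accuracy = %.3f" % np.mean(ensemble_pred == y_test))

In the thesis itself, the attribute subsets are obtained by rough set attribute reduction rather than chosen by hand, and a genetic algorithm, instead of using all base classifiers, decides which of them take part in the vote.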