
Data Mining Algorithm Design Based on Big Data: The K-Means Algorithm (Graduation Thesis)

 2021-10-06 13:56:12  

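Abstract (Chinese)

Abstract (English)

Chapter 1 Introduction

1.1 Research Background and Significance

1.2 Research Developments in China and Abroad

1.3 Main Content and Organization of the Thesis

Chapter 2 Data Mining

2.1 Overview

2.2 The Significance of Data Mining

Chapter 3 Clustering Algorithms

3.1 Introduction to Cluster Analysis

3.1.1 The Concept of Clustering

3.1.2 Clustering Algorithms

Chapter 4 Analysis and Improvement of the K-Means Algorithm

4.1 The K-Means Algorithm

4.2 Advantages and Disadvantages of the K-Means Algorithm

4.3 The Improved K-Means Algorithm

4.3.1 Basic Idea

4.3.2 Outlier Detection

4.3.3 Determining the Initial Centers

4.3.4 Description of the Improved K-Means Algorithm

4.3.5 Analysis of Experimental Results

Chapter 5 Summary and Outlook

5.1 Summary of This Thesis

5.2 Future Outlook

Acknowledgments

References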

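Abstract (Chinese)

Data mining is an effective approach to data analysis: the process of extracting and mining valuable information from large volumes of data. Among the many data mining methods, K-Means cluster analysis is one of the most common and most classic. It is widely applied across many fields, and the research community continues to propose optimized models and improved algorithms for it.

This thesis studies the k-means clustering algorithm. It first briefly introduces the relevant data processing techniques and algorithms and explains the basic concepts. It then focuses on studying and improving the k-means algorithm: it analyzes the strengths and weaknesses of K-Means cluster analysis, proposes an improved algorithm based on that analysis, and validates it experimentally.

Concretely, drawing on data mining theory, the thesis analyzes data mining techniques and their application in China and abroad. Building on the theory and algorithms of cluster analysis, it then examines the k-means algorithm and its strengths and weaknesses in detail, and proposes improvements targeting two known problems: sensitivity to outliers (isolated points) and random selection of the initial cluster centers. The improved algorithm uses outlier analysis, which has a rigorous mathematical and statistical foundation and spares the user from having to set a threshold. It also introduces an initial-clustering step that separates out the relatively concentrated data first, so that the objects within each cluster are highly similar. Outlier detection reduces the interference of outliers with the clustering result, while the initial-center strategy lowers the risk of converging to a local optimum and can reduce the number of iterations. Finally, experiments on the iris data set validate the improved algorithm: compared with the original algorithm, it achieves better accuracy and faster convergence.

Keywords: data mining, clustering algorithm, k-means algorithm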
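
The baseline that the thesis sets out to improve can be illustrated with a minimal sketch of the standard K-Means iteration: assign each point to its nearest center, then recompute each center as the mean of its members. This is plain textbook K-Means in Python, not the improved algorithm of Chapter 4. The function name `kmeans`, its parameters, and the sample points are assumptions made for illustration. The initial centers are passed in explicitly, so the effect of the starting choice on the number of iterations is easy to observe.

```python
import math


def kmeans(points, centers, max_iter=100):
    """Textbook K-Means; returns (centers, labels, iterations used).

    `centers` are the caller-supplied initial centers (an assumption of
    this sketch; the thesis's own initial-center strategy is not shown).
    """
    centers = [tuple(c) for c in centers]
    labels = []
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest center for every point.
        labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, labels, iteration


# Hypothetical toy data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.0), (8.0, 9.0)]
centers, labels, n_iter = kmeans(points, [(1.0, 1.0), (8.0, 8.0)])
print(labels, n_iter)  # → [0, 0, 0, 1, 1, 1] 2
```

With one starting center in each group, the loop settles in two passes on this toy data. A poor starting choice can take more passes or end in a worse local optimum, which is the weakness that the initial-center strategy described in the abstract targets.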

Abstract

Data mining is the non-trivial process of extracting valid, novel, potentially useful, credible, and ultimately understandable patterns from large amounts of data. Cluster analysis is one analytical method of data mining, and K-Means is one of the most classic and most widely used clustering methods; many improved models based on it have been proposed.

This thesis studies the k-means clustering algorithm. It first introduces the concepts behind clustering algorithms. It then focuses on analyzing the k-means algorithm, systematically introducing the basic theory of clustering and cluster mining, and proposes an improved method that addresses the limitations of the K-Means algorithm.

First, the thesis describes the current state of research on clustering algorithms at home and abroad. It also briefly describes data mining theory, including the concept of data mining and its steps.

Then, after introducing the concepts of clustering and clustering algorithms and the theoretical background of cluster analysis, it explains the K-Means algorithm in detail and analyzes its advantages and disadvantages. To address the original algorithm's sensitivity to isolated points and its random selection of initial cluster centers, it proposes an improved K-Means clustering algorithm based on outlier analysis. The outlier analysis follows the statistical idea that "data whose Z score (standard score) has an absolute value greater than 2 are isolated points"; this approach has a strict mathematical basis and avoids requiring the user to set a threshold in advance. The strategy for determining the initial cluster centers is to split off the relatively concentrated data first each time, which ensures that the data objects in each resulting cluster are highly similar. Outlier detection reduces the effect of isolated points on the clustering result, and the initial-center strategy reduces the possibility of a local optimum and, to some extent, the number of iterations. Finally, experiments on the iris data set verify the effectiveness of the improved algorithm; its performance is greatly improved compared with the original K-Means algorithm.

Key Words: Data Mining, Clustering Algorithm, k-means clustering
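
The Z-score rule quoted in the abstract can be sketched as follows. This minimal Python example flags values whose standard score exceeds 2 in absolute value. It handles a single feature only; how the thesis applies the rule to multi-dimensional data such as iris is not shown in this excerpt, so the function name `zscore_outliers`, the use of the population standard deviation, and the sample values are all assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the indices of values with |z-score| > threshold.

    The threshold of 2 follows the rule quoted in the abstract; using the
    population standard deviation is an assumption of this sketch.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing can be an outlier
    return [
        i for i, v in enumerate(values)
        if abs((v - mu) / sigma) > threshold
    ]


# Hypothetical one-dimensional sample with one obviously isolated value.
data = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0, 9.7]
print(zscore_outliers(data))  # → [7]
```

Because the cutoff is fixed at two standard deviations, the caller does not have to tune a distance threshold for each data set, which is the advantage the abstract attributes to this approach.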

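Chapter 1 Introduction

1.1 Research Background and Significance

In recent years, with the development of computer and information technology, database technology has become widespread. Its use has steadily improved how efficiently data storage capacity is utilized, which greatly benefits enterprise management, scientific research, data storage, and related areas. Applying databases can markedly raise the level of enterprise operations and management and helps managers make sound, evidence-based decisions. Against this background, data mining technology has been increasingly adopted and applied.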
