Design and Implementation of an Association Rule Mining System Based on the Apriori Algorithm (Graduation Thesis)

2021-03-12 00:10:31

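Abstract

Data mining is the process of extracting implicit, useful information and knowledge from large, incomplete, noisy, fuzzy, and random data sets. It is a data analysis method with very broad application prospects, particularly for the large-scale data analysis problems brought about by the development of information technology. Among the many data mining algorithms, association rule mining is an important research direction. Because association rule mining algorithms can capture important hidden relationships in data fairly effectively, and because the mined rules are concise and easy to understand, they have been applied successfully in more and more fields in recent years. The main work of this thesis is to implement an association rule visualization system based on the Apriori algorithm that shows clearly how association rules are discovered in a data set. In addition, real data sets are used to verify the effectiveness of the Apriori algorithm and to examine how each parameter setting affects its running behavior and accuracy.

On the implementation side, we built an association rule visualization platform based on Web technology. Through a web page, the front end takes the minimum support and minimum confidence thresholds and uploads the training data set, then sends them to the server over HTTP (HyperText Transfer Protocol). The server mines association rules from the data it receives and returns the resulting rule set to the browser, where it is visualized. With this system we can mine the regularities hidden in real data and check the plausibility and accuracy of the rules obtained.

To test the algorithm, we ran cross-validation on several real data sets. We adjusted the minimum support and minimum confidence thresholds to change how the algorithm runs, applied the resulting rules to the data in the test set, and measured two quantities: the probability that a rule matches, and the probability that a matching rule predicts correctly. Analysis of the experimental results shows that the support and confidence thresholds affect the algorithm's performance and running time in different ways. We also examined the rules that were found and observed that some of them are backed by corresponding theory, which to some extent reflects the soundness and effectiveness of the Apriori algorithm.

Keywords: data mining; association rule mining; Apriori algorithm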

Design and Implementation of Association Rule Mining System Based on Apriori Algorithm

Abstract

Data mining is the process of extracting implicit and useful information and knowledge from large, incomplete, noisy, fuzzy, and random data sets. It is a data analysis method with very broad application prospects, in particular for the large-scale data analysis problems that come with the development of information technology. Among the many data mining algorithms, association rule mining is one of the most important research directions. Because association rule mining algorithms can effectively capture important hidden relationships in data, and because the mined rules are simple and easy to understand, they have been applied in more and more fields in recent years. The main content of this paper is to implement an association rule visualization system based on the Apriori algorithm, to show clearly how association rules are found in a data set. In addition, we use real data sets to verify the effect of the Apriori algorithm and the influence of each parameter setting on the algorithm's running behavior and accuracy.

In terms of system implementation, we built a platform for visualizing association rules based on Web technology. The front end takes the minimum support and minimum confidence thresholds through a web page and uploads the training data set, then transmits these data to the server over HTTP (HyperText Transfer Protocol). The server performs association rule mining on the data it receives and sends the resulting rule set back to the browser, where it is displayed visually. In this way, we can use the system to mine the regularities implicit in real data and verify the plausibility and accuracy of the resulting association rules.

To test the effect of the algorithm, we used several sets of real data for cross-validation. We adjusted the minimum support and minimum confidence thresholds to change how the algorithm runs, applied the resulting association rules to the data in the test set, and measured the probability of a successful match and the probability that a matched rule predicts correctly. After analyzing the experimental results, we found that the support and confidence thresholds have different effects on the performance and computation time of the algorithm. We also analyzed the association rules that were found and observed that some of them have a corresponding theoretical basis, which to a certain extent reflects that the Apriori algorithm is reasonable and effective.

Keywords: data mining; association rule mining; Apriori algorithm

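Chapter 1 Introduction

1.1 Introduction to the Research

1.2 Technologies Used in the System Implementation

1.3 Research Status at Home and Abroad

1.3.1 Research Status Abroad

1.3.2 Research Status in China

1.4 Organization of the Thesis

Chapter 2 Concepts of Association Rules

2.1 Introduction

2.2 Basic Concepts of Association Rules

2.2.1 Definitions Related to Association Rules

2.2.2 Properties of Association Rules

2.3 Classification of Association Rules

2.4 Steps of Association Rule Mining

Chapter 3 Implementation of Apriori Algorithm Visualization

3.1 Overview of the Apriori Algorithm

3.2 The Idea Behind the Apriori Algorithm

3.3 Implementation Process of the Apriori Algorithm

3.4 Visualization of the Apriori Algorithm

3.5 Introduction to the Apriori Visualization System

3.6 Rule Matching and Prediction

Chapter 4 Data Testing and Result Analysis

4.1 Description of the Test Data Sets

4.2 Result Analysis

4.2.1 Running Time of the Apriori Algorithm

4.2.2 Validating the Association Rules

4.2.3 Practical Significance of the Association Rules

Chapter 5 Summary and Outlook

5.1 Summary

5.2 Outlook

References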

Chapter 1 Introduction

1.1 Introduction to the Research

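With the rapid development of database technology and the widespread use of database management systems, people accumulate and store more and more data. Behind this surge of data undoubtedly lies a great deal of important information, and people want to analyze it effectively and efficiently in order to make better use of it. The problem we face is not a lack of data but the inability to find the data that is actually useful and needed. Traditional data retrieval methods and analysis tools alone are far from able to meet practical needs. Against this background a new research field, knowledge discovery, came into being. Because the data that holds this knowledge mostly resides in databases, data mining is also called Knowledge Discovery in Databases (KDD). It is a technology that has grown up in recent years alongside database technology and artificial intelligence, and it extracts valuable knowledge or information by processing large volumes of everyday business data.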

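At present, the main research directions of data mining include data summarization, data classification, data clustering, and association rules. An association rule expresses some relationship among a group of objects in a database, for example: "95% of the people who buy products A and B also buy C and D." From such rules we can discover customers' purchasing patterns, which can be used for shelf layout, production planning, targeted marketing, and so on. A classic example of the association model is "beer and diapers." In the United States, many young fathers often went to the supermarket after work to buy baby diapers. By mining its customers' purchasing patterns, the supermarket concluded that 30% to 40% of the young fathers who bought diapers also bought beer. The supermarket then rearranged its shelves to place diapers and beer together, and as a result the sales of both rose noticeably. The retailer's original aim was to find products in its sales data that tend to be bought together. When the number of times certain products appear together exceeds a given threshold, the products are considered likely to be associated. The process of discovering such associations is called association mining.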
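The threshold idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's implementation: the transactions, the function names `support`, `confidence`, and `frequent_pairs`, and the 0.5 threshold are all assumptions made for this example. It uses the standard definitions of support (the fraction of transactions containing an itemset) and confidence (the fraction of transactions containing the antecedent that also contain the consequent), and it covers only pair counting, not the full level-wise Apriori search.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy transactions, invented purely for illustration.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"diapers", "beer", "bread"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing `antecedent`, the fraction that also
    contain `consequent`. Returns 0.0 if the antecedent never occurs."""
    n_antecedent = count(antecedent, transactions)
    if n_antecedent == 0:
        return 0.0
    n_both = count(set(antecedent) | set(consequent), transactions)
    return n_both / n_antecedent


def frequent_pairs(transactions, min_support):
    """Item pairs whose support reaches `min_support` (pair counting only)."""
    counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


print(support({"diapers", "beer"}, transactions))       # 0.6
print(confidence({"diapers"}, {"beer"}, transactions))  # 0.75
print(frequent_pairs(transactions, 0.5))                # {('beer', 'diapers'): 0.6}
```

In this toy data, diapers and beer appear together in 3 of 5 transactions, so the pair passes a 0.5 support threshold, and 3 of the 4 diaper purchases also include beer, giving the rule "diapers, therefore beer" a confidence of 0.75.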
