Design and Implementation of a Turing Test Algorithm Based on Machine Learning and User Behavior (Graduation Thesis)
2022-01-27 15:34:14
Total thesis length: 31,540 characters
Abstract
In recent years, algorithms related to the Turing test have attracted sustained attention. Many researchers have tried to design algorithms that pass the Turing test; some pass it at a preliminary level, which is progress, but this is far from sufficient, and more advanced anti-bot algorithms are needed. CAPTCHA is also a kind of Turing test: by distinguishing whether a user is a human or a machine, it prevents malicious vote manipulation, forum spam, brute-force password cracking, and similar abuse, and it is widely used in the registration and login flows of websites and other applications. Many verification methods now exist. Common ones ask the user to drag a slider to complete a picture or to click on specified Chinese characters or letters, and all of these leave a mouse trajectory. With computers now commonplace and new applications and games appearing constantly, most users and players follow game rules and network security rules, but some always act unfairly, whether for thrills or for improper purposes. Simulated mouse trajectories have therefore become a means of saving effort, taking shortcuts in games, bypassing detection, and mounting batch attacks. Advances in machine learning make it feasible to tell different classes of data apart, yet accurately distinguishing machine-generated trajectories from human ones remains a difficult task.
The algorithm is designed and tested on the data set provided by the 2017 National University Big Data Challenge. It takes a machine learning approach: features are extracted from user behavior, and several classification algorithms are used to train models that distinguish machine trajectories from human trajectories as accurately as possible, with parameters and statistical methods tuned to maximize accuracy.
The algorithm is implemented in Python 3.6 and developed in PyCharm, with the sklearn library used to speed up the design and implementation of the machine learning models.
Keywords: machine learning; feature extraction; mouse trajectory; Turing test
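Contents
Abstract (Chinese)
Abstract (English)
Chapter 1 Introduction
1.1 Background
1.2 Purpose and Significance
1.3 Development Tools and Technologies
1.3.1 Development Tools
1.3.2 PyCharm
1.3.3 Introduction to Sklearn
Chapter 2 Requirements Analysis
2.1 Requirements Analysis
2.2 Feasibility Analysis
2.2.1 Technical Feasibility
2.2.2 Economic Feasibility
2.3 Data Analysis
2.4 Design Approach
2.5 Design Flowchart
2.6 Data Integrity
2.7 Experimental Setup
Chapter 3 User Behavior Analysis
3.1 Mouse Trajectory Feature Analysis
3.1.1 Design of the Feature Extraction Algorithm
3.1.2 Design of the Feature Selection Algorithm
3.1.3 Feature Dimensionality Reduction
Chapter 4 Algorithm Accuracy Testing
4.1 Purpose and Significance of Accuracy Testing
4.2 Testing Process
4.2.1 Testing the LogisticRegression Model
4.2.2 Testing the Decision Tree Model
4.2.3 Testing the SVM Model
4.2.4 Testing the Model Fusion Algorithm
Conclusion
Reflections and Outlook
References
Acknowledgements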
Chapter 1 Introduction
1.1 Background
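Artificial intelligence is currently booming, and machine learning (ML) is its most popular branch. Machine learning is not a single technique; it draws on several disciplines, such as probability theory and statistics. By training models, it enables machines to recognize things on their own, as in image recognition and text recognition. The Turing test is a typical application scenario for machine learning: it distinguishes, for example, whether the party answering a question is a human or a machine, and we can likewise build on the Turing test to tell humans and machines apart.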
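A CAPTCHA is also a kind of Turing test. It likewise determines whether a user is a human or a machine, and it effectively prevents abuse such as fraudulent vote manipulation, forum spam and thread bumping, and malicious password cracking. Some attackers obtain other people's passwords by brute force, and a CAPTCHA makes such cracking much harder; in fact, using a CAPTCHA for login or other verification is now standard practice on many websites. The idea is simple: the computer generates a question that only a human can answer and grades the response, and a user who passes is considered human. Verification takes many forms. The most common today asks the user to type into a text box the letters and digits shown in an adjacent image. Newer forms, typified by dragging a slider to complete a picture or clicking on specified letters or Chinese characters, all leave a mouse movement trajectory.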
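The mouse trajectory left behind by such a verification can be turned into numeric features for a classifier. The following is a minimal sketch, not the feature set used in this thesis: it assumes a trajectory is a list of (x, y, t) samples with t in milliseconds, and the function name `trajectory_features` and the chosen statistics (duration, path length, straightness, mean and standard deviation of speed) are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def trajectory_features(points):
    """Summarize a mouse trajectory given as [(x, y, t_ms), ...].

    Hypothetical helper for illustration; the feature names are assumptions.
    """
    if len(points) < 2:
        raise ValueError("need at least two samples")

    step_lengths = []
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dist = math.hypot(x1 - x0, y1 - y0)
        step_lengths.append(dist)
        dt = t1 - t0
        if dt > 0:  # skip steps with duplicate or out-of-order timestamps
            speeds.append(dist / dt)

    path_length = sum(step_lengths)
    (xs, ys, ts), (xe, ye, te) = points[0], points[-1]
    displacement = math.hypot(xe - xs, ye - ys)

    return {
        "duration_ms": te - ts,
        "path_length": path_length,
        # Ratio of straight-line distance to distance travelled (1.0 = straight).
        "straightness": displacement / path_length if path_length > 0 else 0.0,
        "speed_mean": mean(speeds) if speeds else 0.0,
        "speed_std": pstdev(speeds) if speeds else 0.0,
    }


# A perfectly straight, constant-speed drag, as a naive script might produce.
scripted = [(x, 100, 10 * i) for i, x in enumerate(range(0, 101, 10))]
features = trajectory_features(scripted)
print(features)
```

A straight, constant-speed trajectory like this one yields a straightness of 1 and no variation in speed. Whether such values actually separate machine trajectories from human ones in the competition data set is something the trained models have to establish.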