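Research and Implementation of a Ship Trajectory Clustering Algorithm Based on Real-Time Data Streams under Spark (Graduation Thesis)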
2021-03-13 22:26:51
Abstract
China's shipping industry is growing and prospering. Ship trajectory data are an important data source for waterborne traffic surveys and a key basis for discovering the overall movement patterns of ships. Studying ship trajectories through clustering makes it possible to:
- mine overall ship traffic patterns;
- build typical trajectories for each type of ship;
- lay the foundation for detecting abnormal ship trajectories.

The water environment also changes with season and weather, and ship trajectories change with it. Clustering and updating ship trajectories from a real-time AIS (Automatic Identification System) data stream therefore reflects ship navigation characteristics more effectively and dynamically.

With the development of cloud computing, Spark has emerged as the new-generation distributed big data processing framework after Hadoop, and it provides a dedicated interface for stream processing. Real-time clustering must cope with growing volumes of ship data and strict demands on computation speed. To meet these demands, this thesis studies real-time clustering of ship trajectories, optimizes the clustering algorithm, and parallelizes the clustering process on Spark. The main work of this thesis is as follows:
- A Spark-based parallel algorithm for partitioning ship trajectories into sub-trajectories was designed. It partitions ship trajectories in parallel under Spark, improves partitioning efficiency, and lays the foundation for ship trajectory clustering;
- The classic DBSCAN clustering algorithm was improved into a clustering algorithm for ship sub-trajectories. The algorithm works as follows:
  - A ball tree structure narrows the search range when looking for cluster centers, which reduces computation and improves running efficiency.
  - The algorithm is parallelized under Spark and combined with channel partitioning.
  - The clustering parameters (the neighborhood radius and the density threshold) were determined experimentally.
  - Typical ship trajectory models were built from the clustering results.

  Experiments verified the feasibility and effectiveness of the algorithm;
- A real-time ship trajectory clustering algorithm based on Spark Streaming was designed. It borrows the two-stage online/offline framework of CluStream:
  - In the offline stage, the improved DBSCAN algorithm under Spark clusters historical AIS data to build typical ship trajectory models.
  - In the online stage, real-time AIS data are received and processed, and the ball tree structure is updated and saved periodically.
  - The ship trajectory data in the updated ball tree are then re-clustered, and the typical trajectory models are updated.

  This achieves real-time clustering of ship trajectories and lets the typical trajectory models adapt to dynamic changes in normal trajectories. Experiments verified the effectiveness of the dynamic update method.

Keywords: AIS data; ship trajectory; DBSCAN; Spark; real-time clustering
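The second contribution above uses a ball tree to shrink the DBSCAN neighborhood search. The sketch below shows one common way to do this on a single machine. It is not the thesis's BTFC algorithm and makes these assumptions:
- 2-D points with Euclidean distance stand in for sub-trajectories and the thesis's composite sub-trajectory distance.
- Spark parallelization and channel partitioning are omitted.
- The names `build_ball_tree`, `query_radius`, `dbscan`, and `leaf_size` are hypothetical.

The ball tree prunes whole subtrees through the triangle inequality. If the distance from the query to a ball's center, minus the ball's radius, exceeds `eps`, no point inside that ball can be a neighbor.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]  # assumption: a point stands in for a sub-trajectory


def dist(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


@dataclass
class BallNode:
    center: Point
    radius: float
    idx: List[int]  # indices of the points covered by this ball
    left: Optional["BallNode"] = None
    right: Optional["BallNode"] = None


def build_ball_tree(pts: Sequence[Point], idx: List[int], leaf_size: int = 4) -> BallNode:
    """Recursively split points at the median of the axis with the largest spread."""
    cx = sum(pts[i][0] for i in idx) / len(idx)
    cy = sum(pts[i][1] for i in idx) / len(idx)
    center = (cx, cy)
    node = BallNode(center, max(dist(center, pts[i]) for i in idx), idx)
    if len(idx) > leaf_size:
        spread = [max(pts[i][d] for i in idx) - min(pts[i][d] for i in idx) for d in (0, 1)]
        axis = 0 if spread[0] >= spread[1] else 1
        order = sorted(idx, key=lambda i: pts[i][axis])
        mid = len(order) // 2
        node.left = build_ball_tree(pts, order[:mid], leaf_size)
        node.right = build_ball_tree(pts, order[mid:], leaf_size)
    return node


def query_radius(node: BallNode, pts: Sequence[Point], q: Point, eps: float, out: List[int]) -> None:
    """Append to `out` the indices of all points within eps of q."""
    if dist(q, node.center) - node.radius > eps:
        return  # triangle inequality: every point in this ball is farther than eps
    if node.left is None or node.right is None:  # leaf: check points directly
        out.extend(i for i in node.idx if dist(q, pts[i]) <= eps)
    else:
        query_radius(node.left, pts, q, eps, out)
        query_radius(node.right, pts, q, eps, out)


def dbscan(pts: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Standard DBSCAN with ball-tree region queries; label -1 marks noise.
    min_pts counts the point itself."""
    if not pts:
        return []
    tree = build_ball_tree(pts, list(range(len(pts))))
    labels: List[Optional[int]] = [None] * len(pts)  # None = unvisited
    cluster = -1
    for i in range(len(pts)):
        if labels[i] is not None:
            continue
        nbrs: List[int] = []
        query_radius(tree, pts, pts[i], eps, nbrs)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in nbrs if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn: List[int] = []
            query_radius(tree, pts, pts[j], eps, jn)
            if len(jn) >= min_pts:  # j is a core point: keep expanding
                seeds.extend(jn)
    return [int(lab) for lab in labels]  # every point is labelled by now
```

For example, take two tight groups of four points and one isolated point with `eps=1.5` and `min_pts=3`. The call `dbscan([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)], 1.5, 3)` returns `[0, 0, 0, 0, 1, 1, 1, 1, -1]`. The clustering result is the same as brute-force DBSCAN. Only the number of distance computations per region query is reduced.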
Contents
Chapter 1 Introduction
1.1 Research Background
1.2 Research Purpose and Significance
1.3 Research Status in China and Abroad
1.3.1 Research Status Abroad
1.3.2 Research Status in China
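1.3.3 Analysis of Existing Problems
1.4 Research Content
1.5 Organization of This Thesis
Chapter 2 Ship Trajectory Clustering Algorithm Based on an Improved DBSCAN
2.1 AIS Data
2.2 Ship Sub-trajectory Partitioning Method
2.3 Similarity Measure for Ship Sub-trajectories
2.4 Improvements to DBSCAN-based Clustering
2.4.1 Analysis of the DBSCAN Clustering Algorithm
2.4.2 Fast Clustering Algorithm for Ship Sub-trajectories (BTFC)
2.5 Chapter Summary
Chapter 3 Research and Implementation of a Spark-based Real-time AIS Data Stream Clustering Algorithm
3.1 Overview of Spark
3.2 Idea of the Spark-based Real-time AIS Data Stream Clustering Algorithm
3.3 Spark-based Offline Clustering of Ship Trajectories
3.4 Spark Streaming-based Online Clustering of Ship Trajectories
3.4.1 Idea of Online Ship Trajectory Clustering
3.4.2 Design of the Spark Streaming-based Online Ship Trajectory Clustering Algorithm
3.5 Chapter Summary
Chapter 4 Experimental Verification and Analysis
4.1 Experimental Environment and Data
4.1.1 Experimental Environment
4.1.2 Selection of the Study Waters
4.1.3 Selection of the Study Objects
4.1.4 AIS Data Sources and Preprocessing
4.2 Testing the Spark-based Offline Ship Trajectory Clustering Method
4.2.1 Determining the Composite Distance Weights for Each Channel Segment Partition
4.2.2 Neighborhood and Density Threshold minStr
4.2.3 Clustering Results
4.2.4 Efficiency Analysis of the BTFC Algorithm on the Spark Platform
4.3 Testing the Spark Streaming-based Online Ship Trajectory Clustering Method
4.4 Chapter Summary
Chapter 5 Conclusions and Outlook
5.1 Work and Conclusions
5.2 Outlook
References
Acknowledgements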
Chapter 1 Introduction
1.1 Research Background
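Maritime traffic is growing rapidly and the maritime traffic environment is increasingly complex, so ships generate ever more spatio-temporal trajectories in the course of their operations. A key technology for intelligent maritime traffic is to use clustering to:
- discover the navigation regularities of different types of ships;
- analyze their movement patterns;
- build corresponding typical trajectory models.

Traditional maritime traffic survey methods are time-consuming, labor-intensive, and inefficient. By contrast, the massive volume of shipborne AIS data contains rich maritime traffic characteristics. Extracting effective, latent information that reflects ship behavior patterns from these data supports the collection of data for maritime traffic surveys.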
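Water conditions also change with season and weather, and ship trajectories change accordingly. Clustering ship trajectories from a real-time AIS data stream therefore reflects ship trajectory characteristics more effectively and dynamically.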