Extraction and Disambiguation of Name of Place from Tourism Blogs: foreign-language translation material
2021-12-11 21:54:17
English original: 6 pages in total
Extraction and Disambiguation of Name of Place from Tourism Blogs
Abstract: With the growth of the Internet in recent years, the number of tourism portal sites and tourism-related blog articles on the WWW has increased, and acquiring various kinds of tourism information has become easy. When gathering and classifying information automatically from blog articles, however, it is not easy to determine automatically which place names should serve as keys. In this paper, we propose a method for automatically extracting place names from blog articles. We also attempt to disambiguate the extracted place names.
I. INTRODUCTION
The amount of information on the Internet increases every day. Thanks to the development of search technology, anyone can now obtain large quantities of varied information. Tourism information is no exception. Most people use the Internet to obtain information about their destination and accommodation before they actually go sightseeing. This tourism information is available on (a) tourism portal sites, (b) Web pages of service providers, and (c) users' blog pages.
For example, official information about accommodation and hotels is available at travel agents' reservation service sites. Many people write comments and opinions on so-called “word-of-mouth” sites, where others' evaluations can be used when choosing a hotel. However, writing a critical comment is not easy. In fact, the ratio of negative opinions is very small compared with that of positive ones. In any case, perfect neutrality is not guaranteed.
There are a large number of blog articles in which general users write about individual experiences, travel records, and opinions. These articles are not official, but they can be considered reliable in the sense that they are not influenced by organizations. Blog articles cover a wide range of topics. Users often write about where they visited and what food they enjoyed, together with comments on the hotel where they stayed. Such information cannot be found on a hotel site.
However, compared with an official site, in many cases neither the detailed regional name nor the institution name is clear. To use these personal opinions, we first need to extract the exact place names and confirm the real addresses of the spots. It is relatively easy for a human to understand a place name, since we have a lot of background knowledge and can grasp the context. Automating this processing, however, is not easy. The authors have been working on a search engine for tourism information. The present paper reports a trial of place name extraction from tourism blogs and of disambiguation of location names.
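To make the difficulty concrete, the following is a minimal sketch of a naive gazetteer lookup. It is not the method proposed in this paper. The gazetteer contents and the function names `extract_place_candidates` and `ambiguous_names` are assumptions introduced only for illustration. The sketch shows that simple string matching can find a place name in a blog sentence but cannot decide which real location is meant when the name has several candidates.

```python
import re

# Hypothetical toy gazetteer (an assumption for illustration only):
# surface name -> candidate real-world locations sharing that name.
GAZETTEER = {
    "Paris": ["Paris, France", "Paris, Texas, USA"],
    "Springfield": [
        "Springfield, Illinois, USA",
        "Springfield, Massachusetts, USA",
    ],
    "Kyoto": ["Kyoto, Japan"],
}


def extract_place_candidates(text, gazetteer=GAZETTEER):
    """Return (name, candidates) pairs for gazetteer names found in text,
    ordered by the position of their first occurrence."""
    found = []
    for name, candidates in gazetteer.items():
        # Whole-word match so that "Parisian" does not count as "Paris".
        match = re.search(r"\b" + re.escape(name) + r"\b", text)
        if match:
            found.append((match.start(), name, candidates))
    found.sort()
    return [(name, candidates) for _, name, candidates in found]


def ambiguous_names(pairs):
    """Names with more than one candidate location still need disambiguation."""
    return [name for name, candidates in pairs if len(candidates) > 1]


if __name__ == "__main__":
    blog_sentence = "We flew to Kyoto, then spent a week in Paris."
    pairs = extract_place_candidates(blog_sentence)
    for name, candidates in pairs:
        print(name, "->", candidates)
    print("Needs disambiguation:", ambiguous_names(pairs))
```

In this sketch, "Kyoto" resolves to a single entry, while "Paris" remains ambiguous. A human reader would resolve it from context, which is exactly the background knowledge that string matching lacks.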
II. RELATED WORK
There is much research on collecting and extracting valuable information from Web pages or newspaper articles. For example, [12] tried to define tourism information by extracting keywords from WWW documents. They proposed a key map that visualizes co-occurrence relations, and showed concrete key maps obtained from search results for tourism keywords in Hokkaido or Okinawa. [11] proposed a natural language interface for a sightseeing tour search engine. In order to offer travel plans according to an individual's liking, [2] considered collecting and updating sightseeing information to be crucial. They realized an information extraction agent (IE) and an information clustering agent (IC) as add-on functions to a tourism recommendation system. They demonstrated the effectiveness by providing the names of services and locations, the price, the time, and the period, which they extracted from Web pages using patterns.
The targets of these studies are Web pages offered by travel agents or related organizations. The objects of the present paper, on the other hand, are various blog pages written by general users, and there is no common pattern in those pages. Many people write about their experiences and opinions on their blogs.
Attention is being paid to the analysis and practical use of these blogs. [4] proposed a method to extract keywords that characterize an area from tourism blogs. [9] succeeded in choosing feature sentences using sentiment information and clustered the sentences depending on semantic attributes of part of speech and the characteristic nouns related to tourism. [13] analyzed the difference between the information provided by travel agents and that provided by blogs concerning tourism. [14], [15] proposed methods to extract regional events.
Place names are a typical example of named entities, on which there has been much research. [7] analyzed characteristic occurrences of tag pa
Document ID: [5755]