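Thesis Proposal: A Multidimensional Analysis of Differences in Linguistic Features between Human and Machine Translation Based on a Self-Built Parallel Corpus, with the English Translation of "Legends of the Condor Heroes" as a Case Study (A Parallel-Corpora Comparative Study of Machine Translation and Human Translation: Based upon "Legends of the Condor Heroes")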
2020-02-18 18:10:31
1. Research Purpose and Significance (Literature Review)
Machine translation (MT) is the use of computers to convert text in one language into another. As machine learning (ML) algorithms have become more complex and powerful, natural language processing (NLP) techniques have been widely applied. Technology giants such as Google and Baidu have developed advanced MT systems that perform well. Although whether MT will ultimately surpass human translation remains controversial, it has become an irreversible trend that the two should cooperate to better serve the translation industry.
The assessment of MT systems is a cross-disciplinary field spanning computer science and linguistics. By exploring how MT algorithms process natural language, it becomes clearer how they can be improved and how human language works (Doddington, 2002). At present, most studies concentrate on how machine translation algorithms iterate and help improve translation quality. Wolk (2015) states that modern neural network-based machine translation (NMT) is fundamentally different from previous statistical or rule-based systems and that its performance is constantly improving. However, how to apply linguistic theories and the functions of corpora has often been overlooked. Thus, in addition to algorithm optimization, we should pay attention to linguistics and the assistance it can offer to translation work.
2. Research Content and Approach
This paper compares the linguistic and literary features of human translation with those of machine translation through multidimensional (MD) analysis and quantitative linguistics. The study is based on a parallel corpus of translations of "Legends of the Condor Heroes" built and optimized by the author. The machine translation corpus is produced by Google Translate, which applies neural network technology. The comparison aims to identify discrepancies between the linguistic features of the two kinds of translation.
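As a rough illustration of the quantitative side of such a comparison, the sketch below computes three common surface measures (type-token ratio, mean sentence length, and mean word length) for a human and a machine translation of the same passage. It is a minimal sketch under stated assumptions, not the author's toolchain: the function names, the regex-based tokenizer and sentence splitter, and the two sample strings are all invented for illustration and are not taken from the corpus.

```python
import re


def tokenize(text):
    """Return lowercase word tokens (crude regex tokenizer, illustration only)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (rough heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def basic_features(text):
    """Compute simple quantitative measures for one text."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    if not tokens or not sentences:
        raise ValueError("text contains no tokens")
    types = set(tokens)
    return {
        "tokens": len(tokens),
        "types": len(types),
        # Lexical variety: distinct word forms divided by running words.
        "type_token_ratio": len(types) / len(tokens),
        # Average number of word tokens per sentence.
        "mean_sentence_length": len(tokens) / len(sentences),
        # Average number of characters per word token.
        "mean_word_length": sum(len(t) for t in tokens) / len(tokens),
    }


def compare(human_text, machine_text):
    """Return (feature, human value, machine value) rows for the two texts."""
    human = basic_features(human_text)
    machine = basic_features(machine_text)
    return [(name, human[name], machine[name]) for name in human]


if __name__ == "__main__":
    # Hypothetical placeholder sentences, NOT drawn from the actual corpus.
    human_sample = "The boy drew his bow. The arrow flew high and struck the eagle."
    machine_sample = "The boy pulled the bow. The arrow flew very high and it hit the eagle."
    for name, h, m in compare(human_sample, machine_sample):
        print(f"{name:22s} human={h:<8.3f} machine={m:<8.3f}")
```

Type-token ratio is sensitive to text length, so in practice the two texts would need to be of comparable size, or a length-normalized variant would be needed, before the values are compared.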
The MD analysis proposed by Biber (1988) provides a more comprehensive linguistic description of a particular text type, as it starts by examining multiple linguistic features to see how they form linguistic regularities across a large number of texts. These linguistic co-occurrence patterns are then interpreted as "dimensions" according to the "situational, social, and cognitive functions most widely shared by the linguistic features" (Biber, 2011, p. 6). Biber's pioneering study identified six underlying linguistic dimensions in the London–Oslo–Bergen corpus consisting of 23 written and spoken text types. The following dimensions were generated through multivariate statistical analysis (Biber, 1988, pp. 102-114):
3. Research Schedule
Before 1 January: finalization of the title
Before 1 March: submission of the outline
Before 15 April: submission of the first draft
4. References (at least 12)
[1] Holmwood, A. A Hero Born: Legends of the Condor Heroes[M]. London: MacLehose Press, 2018.
[2] Baker, M. Towards a methodology for investigating the style of a literary translator[J]. Target. International Journal of Translation Studies, 2000, 12(2): 241-266.
[3] Biber, D. Variation across Speech and Writing[M]. Cambridge: Cambridge University Press, 1988.