The Relationship between N-gram Measures and L2 Writing Proficiency: Thesis Proposal

2020-04-17 20:29:39

1. Research Purpose and Significance (Literature Review, including References)

1. Introduction

1.1 Research Background

Acquiring a rich repertoire of English phraseology is an increasingly indispensable part of developing proficiency in second language (L2) writing. Corpus research has shown that many utterances in English are composed of fixed or semi-fixed multi-word sequences (MWSs), including collocations, idioms, n-grams, and lexical bundles (Römer, 2009; Sinclair, 1991). Considerable attention has been paid to the accurate use of highly frequent n-grams, specifically bigrams (2-word sequences) and trigrams (3-word sequences). Many studies have found that command of a large stock of English phraseology helps L2 writers compose more effective texts. Understanding how patterns or sequences of words combine to form larger units of discourse can help learners produce texts that read as more target-like (Pawley & Syder, 1983). Moreover, knowledge of multi-word sequences gives writers a processing advantage in comprehending and producing written text (Ellis, 2012; Siyanova-Chanturia & Martinez, 2015). Therefore, by increasing their knowledge of N-grams, L2 writers can achieve a higher degree of native-like fluency and be better able to complete more cognitively demanding writing tasks (Ellis, 2002, 2012). Many of these studies stress the importance of N-gram use in L2 writing and provide substantial evidence of a relationship between N-gram use and L2 writing proficiency. Nevertheless, recent research suggests that N-gram use may be multifaceted and therefore warrants the concurrent investigation of multiple MWS indices (e.g., Gablasova et al., 2017; Kyle & Crossley, 2015; Kyle, Crossley, & Berger, 2018). Gablasova et al. (2017), for instance, discuss four such indices: frequency, dispersion, exclusivity (i.e., association strength), and directionality.
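The four indices named above (frequency, dispersion, exclusivity, and directionality) can be made concrete with a small Python sketch for a single bigram. This is an illustration under stated assumptions, not how TAALES or Gablasova et al. compute their scores: the function name `bigram_measures` and the toy sentences are hypothetical, texts are tokenized naively on whitespace, dispersion is measured as range (the number of texts containing the bigram), exclusivity is measured with MI and the t-score, and directionality is measured with ΔP, one directional association measure from the corpus linguistics literature. All association scores are derived from a single 2×2 contingency table of adjacent word pairs.

```python
import math
from collections import Counter


def bigram_measures(texts, w1, w2):
    """Toy frequency, dispersion, and association measures for the bigram (w1, w2).

    All counts come from one 2x2 contingency table over adjacent word pairs:
      a  = pairs equal to (w1, w2)
      r1 = pairs whose first word is w1
      c1 = pairs whose second word is w2
      n  = all adjacent pairs in the corpus
    """
    w1, w2 = w1.lower(), w2.lower()
    pairs = Counter()
    text_range = 0  # dispersion: number of texts in which the bigram occurs
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        text_pairs = list(zip(tokens, tokens[1:]))
        pairs.update(text_pairs)
        if (w1, w2) in text_pairs:
            text_range += 1

    a = pairs[(w1, w2)]
    if a == 0:
        raise ValueError(f"bigram {(w1, w2)} does not occur in the corpus")
    n = sum(pairs.values())
    r1 = sum(count for (first, _), count in pairs.items() if first == w1)
    c1 = sum(count for (_, second), count in pairs.items() if second == w2)

    # Frequency expected if w1 and w2 combined independently of each other.
    expected = r1 * c1 / n
    return {
        "freq": a,
        "range": text_range,
        "mi": math.log2(a / expected),             # exclusivity, symmetric
        "t_score": (a - expected) / math.sqrt(a),  # also sensitive to raw frequency
        # Directionality: Delta P = P(outcome | cue) - P(outcome | no cue)
        "delta_p_w2_given_w1": a / r1 - (c1 - a) / (n - r1),
        "delta_p_w1_given_w2": a / c1 - (r1 - a) / (n - c1),
    }


if __name__ == "__main__":
    corpus = [
        "in addition we think that it is important",
        "in addition it is clear that we think",
        "we think it is true in fact",
    ]
    for name, value in bigram_measures(corpus, "in", "addition").items():
        print(f"{name}: {value:.3f}")
```

On these three toy sentences, ("in", "addition") has frequency 2, range 2, MI of about 2.74, and a t-score of about 1.20, while ΔP is about 0.67 from "in" to "addition" but about 0.94 from "addition" back to "in". MI and the t-score are symmetric, whereas ΔP separates the two directions, which is why directionality is treated as a distinct index.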
Previous learner corpus research has used frequency and exclusivity far more often than dispersion and directional association strength to measure L2 N-gram use. Moreover, learner corpus research has rarely operationalized N-gram use as a multidimensional phenomenon. Given these limitations, the present study examines the relationship between L2 N-gram use and holistic writing proficiency scores from a multifaceted perspective. Specifically, N-gram (mainly bigram and trigram) use in a large corpus of compositions written by L1 Chinese learners of English is analyzed in terms of reference corpus frequency, dispersion, and multiple measures of association strength. These features are then used to predict holistic writing proficiency scores in a multiple regression analysis.

1.2 Need of the Study

The study has both practical and academic significance. First, if N-gram use is positively related to writing proficiency scores, the findings would provide additional support for including N-gram instruction in the L2 writing classroom, which has practical value for second language teaching. It is therefore necessary to demonstrate how N-gram use relates to human judgments of writing proficiency. Second, studying the relationship between multiple aspects of N-gram production and human proficiency ratings will in turn deepen our understanding of the nature of L2 productive phraseological knowledge and its development across L2 writing proficiency levels. Finally, understanding how various measures of N-gram production predict human ratings could help in building more accurate automatic essay scoring systems.
Such systems could give writing teachers and students more accurate assessments of writing quality and more detailed feedback.

1.3 Research Purposes

The present study investigates how the production of N-grams predicts human judgments of L2 writing proficiency. Its central question is which indices of bigram and trigram use are predictive of human judgments of writing proficiency. A large corpus of student compositions, classified into seven proficiency levels by academic English teachers, is analyzed with an automatic text analysis program that computes a series of features relevant to phraseological knowledge, including bigram and trigram frequency, range, and association strength. To determine which measures of bigram and trigram use best predict human ratings, correlation and regression analyses are conducted between these indices and the proficiency grades. In addition, the study seeks to better understand the nature of L2 productive phraseological knowledge and its relationship to writing proficiency, which could inform multi-word sequence instruction in the L2 writing classroom. If N-gram use is found to be positively associated with writing proficiency scores, this would support the teaching of N-grams in L2 writing classes.

2. Literature Review

2.1 Definitions of N-grams

There are several ways to identify N-grams. The traditional, phraseological approach identifies important sequences on the basis of semantic transparency and the substitutability of their constituents (Barfield & Gyllstad, 2009).
For example, Nesselhauf (2003, 2005) analyzed verb-noun collocations produced by L1 German learners and distinguished restricted collocations from free combinations and idioms on the basis of an arbitrary restriction on the verb (i.e., the verb can combine only with certain nouns when used in a particular sense). A second approach, the frequency-based approach, builds on the work of John Sinclair (Sinclair, 1991) and identifies N-grams on the basis of either frequency (Biber, Conrad, & Cortes, 2004) or the association strength between the words in a sequence (Evert, 2005; Gablasova, Brezina, & McEnery, 2017). Biber et al. (2004), for instance, investigated lexical bundles, which they defined as four-word sequences occurring at least 40 times per million words. Association strength, by contrast, measures the degree to which the words in a sequence occur exclusively or predominantly together (Gablasova et al., 2017). It can be calculated by comparing the observed frequency of a sequence in a corpus with the frequency expected from the frequencies of its component words (Evert, 2009), using formulas such as mutual information (MI) and the t-score, among others (Gablasova et al., 2017; Gries & Ellis, 2015).

2.2 Acquisition of N-grams

Acquiring productive knowledge of English N-grams is a difficult yet important task for L2 writers (Pawley & Syder, 1983). Knowledge of multi-word sequences gives writers a processing advantage in understanding and producing written text (Ellis, 2012; Siyanova-Chanturia & Martinez, 2015). It also frees cognitive resources for other tasks, such as recalling propositional information (Nekrasova, 2009). When writers are fully engaged in expressing their views, ready knowledge of multi-word sequences lets them concentrate on expressing their ideas fluently rather than on assembling the language word by word.
As a result, L2 learners with a solid understanding and command of N-grams can produce more native-like expressions than those without it.

2.3 Aspects of L2 N-gram Production

Learner corpus research has concentrated mainly on two aspects of L2 N-gram production. The first is the extent to which L2 writers use prefabricated sequences. Studies in this strand have found that higher-proficiency L2 writers tend to use a wider range of N-grams and to produce N-grams more frequently than lower-proficiency writers. Using a corpus of essays written by L1 Chinese learners, Hsu (2007) found positive correlations between collocation type and token frequencies and holistic essay scores assigned by an automatic essay scoring system. Correlations were strongest for type frequencies and for verb-noun and adjective-noun collocations, indicating that higher-scoring writers tended to produce more collocations than lower-scoring writers. Vidakovic and Barker (2010) examined four-word lexical bundles in written responses to the Cambridge Life Skills Examination across five proficiency levels (A1-C1 on the Common European Framework of Reference for Languages) and found that learners at the higher levels used a wider range of lexical bundles. In short, more proficient writers were likely to use a wider range of functional lexical bundle types. The second focus of learner corpus N-gram research is the extent to which L2 writers incorporate target-like N-grams into their texts. In these studies, whether an N-gram counts as target-like is determined from its frequency or association strength in a reference corpus. Overall, these studies have found that more proficient L2 writers tend to use N-grams that are more consistent with the target language domain.

References

Ackermann, K., & Chen, Y. H. (2013). Developing the Academic Collocation List (ACL): A corpus-driven and expert-judged approach. Journal of English for Academic Purposes, 12, 235-247.
Ädel, A., & Erman, B. (2012). Recurrent word combinations in academic writing by native and non-native speakers of English: A lexical bundles approach. English for Specific Purposes, 31(2), 81-92.
Barfield, A., & Gyllstad, H. (2009). Introduction: Researching L2 collocation knowledge and development. In A. Barfield & H. Gyllstad (Eds.), Researching collocations in another language: Multiple interpretations (pp. 1-16). London: Palgrave Macmillan.
Biber, D., Conrad, S., & Cortes, V. (2004). If you look at...: Lexical bundles in university teaching and textbooks. Applied Linguistics, 25(3), 371-405.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. Beijing: Foreign Language Teaching and Research Press.
Cao, Y., Wen, Q., & Zhang, F. (2016). The representation of English formulaic language by Chinese learners of English: Evidence from auditory judgments of collocation constituents [in Chinese]. Foreign Language and Foreign Language Teaching, (4), 21-27.
Chen, Y., & Baker, P. (2010). Lexical bundles in L1 and L2 academic writing. Language Learning & Technology, 14(2), 30-49.
Conklin, K., & Schmitt, N. (2008). Formulaic sequences: Are they processed more quickly than nonformulaic language by native and nonnative speakers? Applied Linguistics, 29(1), 72-89.
Cortes, V. (2004). Lexical bundles in published and student disciplinary writing: Examples from history and biology. English for Specific Purposes, 23(4), 397-423.
Durrant, P., & Schmitt, N. (2009). To what extent do native and non-native writers make use of collocations? International Review of Applied Linguistics, 47, 157-177.
Ellis, N. C. (2002). Frequency effects in language processing: A review with implications for theories of implicit and explicit language acquisition. Studies in Second Language Acquisition, 24, 143-188.
Gablasova, D., Brezina, V., & McEnery, T. (2017). Collocations in corpus-based language learning research: Identifying, comparing, and interpreting the evidence. Language Learning, 67(S1), 155-179.
Grabowski, L. (2015). Keywords and lexical bundles within English pharmaceutical discourse: A corpus-driven description. English for Specific Purposes, 38(2), 23-33.
Hu, Y. (2011). Lexical chunks in spoken production: Review and prospects [in Chinese]. Foreign Language Learning Theory and Practice, (2), 55-63.
Hu, Y. (2015). A corpus-based study of the structural features of lexical bundles in the spoken English of senior English majors [in Chinese]. Foreign Languages Research, (5), 26-30.
Hu, Y., & Lou, X. (2011). A multi-perspective, multidimensional study of formulaic language: A review of Formulaic Language [in Chinese]. Foreign Language Teaching and Research, (4), 626-632.
Hyland, K. (2008a). Academic clusters: Text patterning in published and postgraduate writing. International Journal of Applied Linguistics, 18(1), 41-62.
Hyland, K. (2008b). As can be seen: Lexical bundles and disciplinary variation. English for Specific Purposes, 27(1), 4-21.
Jones, M., & Haywood, S. (2004). Facilitating the acquisition of formulaic sequences: An exploratory study in an EAP context. In N. Schmitt (Ed.), Formulaic sequences: Acquisition, processing, and use (pp. 269-300). Amsterdam: John Benjamins.
Kormos, J. (2006). Speech production and second language acquisition. New Jersey: Lawrence Erlbaum Associates.
Li, D. (2014). A review of research on phrases and their automatic identification [in Chinese]. Foreign Languages Research, (6), 8-13.
Ma, G. (2009). Lexical chunks in English majors' timed L2 writing [in Chinese]. Foreign Language Teaching and Research, (1), 54-60.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics, 24(2), 223-242.
Pan, F. (2016). A corpus-driven comparison of the structures and functions of lexical bundles in journal articles by native English and Chinese writers [in Chinese]. Foreign Language and Foreign Language Teaching, (4), 115-123.
Pan, F., Reppen, R., & Biber, D. (2016). Comparing patterns of L1 versus L2 English academic professionals: Lexical bundles in telecommunications research journals. Journal of English for Academic Purposes, 21(1), 60-71.
Pérez-Llantada, C. (2014). Formulaic language in L1 and L2 expert academic writing: Convergent and divergent usage. Journal of English for Academic Purposes, 14(2), 84-94.
Römer, U. (2009). English in academia: Does nativeness matter? Anglistik: International Journal of English Studies, 20(2), 89-100.
Staples, S., Egbert, J., Biber, D., & McClair, A. (2013). Formulaic sequences and EAP writing development: Lexical bundles in the TOEFL iBT writing section. Journal of English for Academic Purposes, 12(3), 214-225.
Wen, Q., Liang, M., & Yan, X. (2008). Spoken and written English corpus of Chinese learners [in Chinese]. Beijing: Foreign Language Teaching and Research Press.
Wray, A. (2002). Formulaic language and the lexicon. Cambridge: Cambridge University Press.

2. Basic Research Content, Problem-Solving Measures, and Research Plan

The present study investigates how the production of N-grams predicts human judgments of L2 writing proficiency. The specific research question is: What indices of bigram and trigram use are predictive of human judgments of writing proficiency? Using a self-compiled corpus, the study aims to better understand the nature of L2 productive phraseological knowledge and its relationship to writing proficiency. The corpus comprises 3,251 compositions written by 1,580 Chinese first-year students from various majors at Nanjing Tech University and totals about one hundred thousand words. Students are given a topic and write an argumentative essay of at least 200 words within 40 minutes, without access to any reference materials. Each composition is then graded by English teachers of the College of Foreign Languages according to unified criteria. To examine the students' bigram and trigram use, the study uses the Tool for the Automatic Analysis of Lexical Sophistication 2.0 (TAALES; Kyle & Crossley, 2015; Kyle, Crossley, & Berger, 2018). TAALES is an automatic text analysis tool that computes indices of lexical sophistication, such as word frequency, range, academic language, and n-gram indices. TAALES 2.0 draws many of its indices from two reference corpora, the Corpus of Contemporary American English (COCA; Davies, 2009) and the British National Corpus (BNC; BNC Consortium, 2007). Notably, some indices are computed from the five subsections of COCA (academic, fiction, magazine, news, and spoken) rather than from the corpus as a whole. The present study uses the indices based on the academic subsection of COCA, including n-gram frequency and proportion indices, range indices, and association scores.
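Once an index table has been produced for each essay, the correlation and multiple regression analyses planned for this study could be run along the lines of the following standard-library Python sketch. It is illustrative only and is not TAALES output or the study's actual analysis script: the function names are hypothetical, the regression is ordinary least squares solved through the normal equations, and the grades are treated as a numeric outcome, which is the assumption a linear multiple regression makes. A statistics package (for example R or Python's statsmodels) would additionally report significance tests and collinearity diagnostics, which this sketch omits. The numbers in the `__main__` block are synthetic, generated from a known formula, and are not data from the corpus.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between one n-gram index and the essay grades."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    size = len(matrix)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(size):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(size):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, size + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[i][size] / aug[i][i] for i in range(size)]


def multiple_regression(index_rows, grades):
    """OLS fit of grade = b0 + b1*index_1 + ... + bk*index_k.

    index_rows holds one list of index values per essay.
    Returns the coefficients [b0, b1, ..., bk] and R squared.
    """
    X = [[1.0] + list(row) for row in index_rows]  # intercept column first
    p = len(X[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * y for x, y in zip(X, grades)) for i in range(p)]
    coefs = solve_linear_system(xtx, xty)

    fitted = [sum(b * v for b, v in zip(coefs, x)) for x in X]
    mean_grade = sum(grades) / len(grades)
    ss_res = sum((y - f) ** 2 for y, f in zip(grades, fitted))
    ss_tot = sum((y - mean_grade) ** 2 for y in grades)
    return coefs, 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Synthetic numbers for illustration only (not data from this study):
    # grades are generated exactly as 1 + 2*index_a + 0.5*index_b.
    index_a = [1, 2, 3, 4, 5, 6]
    index_b = [2, 1, 4, 3, 6, 5]
    grades = [1 + 2 * a + 0.5 * b for a, b in zip(index_a, index_b)]
    print("r(index_a, grade) =", round(pearson_r(index_a, grades), 3))
    coefs, r2 = multiple_regression(list(zip(index_a, index_b)), grades)
    print("coefficients:", [round(c, 3) for c in coefs], "R^2:", round(r2, 3))
```

Because the synthetic grades are an exact linear function of the two indices, the fit recovers the generating coefficients (1, 2, 0.5) with R squared equal to 1. With real essays, each column of `index_rows` would hold one index (for example, a COCA academic bigram frequency or association score) and the coefficients and R squared would have to be interpreted alongside proper significance testing.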
