A Contrastive Study Between Intelligent Automated Essay Scoring and Teacher Scoring in English Writing of University Students Based on iWrite (Graduation Thesis)
2021-03-13 23:23:11
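Abstract (translated from the Chinese version)
This paper takes as its object of study the writing errors found in 40 English essays by Chinese university students, which were detected and classified separately by human raters and by an automated scoring system. The essays were entries in the semi-final of the 2013 “FLTRP Cup” National English Writing Competition, whose participants were all Chinese university students. The study used iWrite, a new intelligent writing system developed by FLTRP, to identify errors. The errors were divided into four major categories (morphological, syntactic, collocational and mechanical), which together contain 34 sub-types. By identifying and classifying these errors, the paper examines the frequency of errors in Chinese English learners' writing and the automated scoring system's recognition rate for each type of error. Through quantitative research, and with the aid of tables and charts, the data were classified, analyzed and interpreted. The statistical analysis shows that mechanical errors are the type learners are most likely to make in writing, followed by morphological and syntactic errors; students make relatively few collocation errors. As for iWrite's recognition rates, the paper finds that iWrite recognizes mechanical errors at the highest rate and collocation errors at the lowest.
Key words: iWrite; error analysis; error frequency; error recognition rate; Chinese English learners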
Abstract
This paper investigated the errors recognized by human raters and by an Automated Essay Evaluation (AEE) system in 40 essays composed by Chinese learners who study English as a second language (ESL), and analyzed the frequencies of errors and the recognition rates of errors detected by the AEE system iWrite. The forty essays were written by Chinese university or college students who participated in the semi-final of the “FLTRP Cup” National English Writing Competition in 2013. The study employed the iWrite system, which was developed by Foreign Language Teaching and Research Press (FLTRP). The errors were divided into four categories, namely morphology, syntax, collocation and mechanics, comprising 34 sub-types of errors in total. The data were classified, analyzed and interpreted through quantitative research, and presented in tables and charts. The findings on error frequencies suggested that mechanical errors were the errors most frequently made by ESL learners among the four categories, followed by morphological errors and syntactic errors. Students made relatively fewer errors in collocation. As for the error recognition rates of iWrite, it was found that, within the four categories, errors in mechanics had the highest recognition rate, while collocation errors had the lowest.
Key words: iWrite; error analysis; frequency of errors; recognition rate of errors; ESL learners
Contents
1 Introduction
2 Literature Review
2.1 Theoretical framework
2.2 Review of related studies
3 Research Methodology
3.1 Research questions
3.2 Participants
3.3 Procedures
3.4 Data analysis
4 Results and Discussion
4.1 Frequency analysis of errors made by ESL learners
4.2 Recognition rates of errors recognized by iWrite
4.2.1 Overall recognition rates of iWrite
4.2.2 Detailed analysis of recognition rates of iWrite
5 Conclusion
5.1 Major findings
5.2 Implications
5.3 Limitations
5.4 Suggestion for further research
References
Appendix
Acknowledgements
Error Analysis of English Writings of Chinese ESL Learners Based on iWrite System
1 Introduction
Learners who study English as a second language (ESL) tend to make errors in their English writing, which impairs their writing ability and troubles their teachers. In the late 1980s, researchers started to take an interest in ESL learners’ errors. According to Corder (1967), the errors made by ESL/EFL learners are significant because “they provide to the researcher evidence of how language is learned or acquired, what strategies or procedures the learner is employing in the discovery of the language”. James (1998) defined errors as “a register of their current perspective on the target language”. Many causes of learners’ errors have been identified, including overgeneralization, ignorance of rule restrictions, incomplete application of rules and false concepts hypothesized (Corder, 1971). James (1990) explored two potential causes of errors: interlingual interference and intralingual interference. Interlingual errors are caused by the interference of the mother tongue. Intralingual errors are caused by the target language itself. Errors caused by learning strategies include analogy, grammatical errors, redundancy, over-correction and over-generalization (Alamin & Ahmed, 2012).
Human essay evaluation is a rather time-consuming and expensive activity, and the objectivity of the grading process cannot be guaranteed. With AEE systems, however, students can get instant feedback and revise repeatedly, which reduces the pressure on teachers and improves the effectiveness of their essay evaluation (Grimes & Warschauer, 2010; Jiang, 2015). Grimes and Warschauer (2010) also found that conscious use of AEE could increase students' motivation in writing and revision, allowing teachers to focus on higher-level concerns instead of writing mechanics. AEE systems are applied not only in high-stakes commercial settings, namely by testing companies, but are also widely used to assist teachers in low-stakes classroom assessment, especially in universities. AEE systems developed abroad include the Intelligent Essay Assessor (IEA), Project Essay Grade (PEG), the electronic essay rater (e-rater®), MY Access!®, the Bayesian Essay Test Scoring System™ (BETSY) and so on (Dikli, 2006). Those AEE systems mainly aim to assess essays written by native English-speaking students. Systems designed for ESL learners are limited in both number and technology. In China, the BingGuo English system, the PiGai system and the iWrite system are available for Chinese ESL learners to evaluate their English essays.
Despite the merits of AEE systems, the validity and reliability of the popular yet limited AEE systems currently available in China remain doubtful. The reliability of AEE is determined by the topics and scores, while its validity depends on the degree to which AEE systems can accurately reflect and score the evaluated essays (He, 2013). To understand the validity and reliability of AEE systems, the process of evaluation must be understood first. In this process, text features are extracted and aggregated to predict a score that approximates that of human raters (Chen, Zhang & Bejar, 2017). Those text features include macrofeatures and microfeatures. Chen et al. (2017) defined the former as a set of combined high-level features calculated for each essay, from which the score is generated, and the latter as the sets of lower-level features from which the macrofeature values are produced. Only if the extracted feature values are effective and accurate can the scores of AEE be valid and useful.
iWrite was developed by FLTRP and provides automated evaluation along the dimensions of language, content and structure. The current version, iWrite 2.0, was launched by Professor Liang Maocheng of Beijing Foreign Studies University at the Tenth International Teaching and Researching Conference on Chinese English Writing. At the same conference, Professor Liu Lei of Yanshan University introduced the concept of Grammar Error Correction (GEC), achieved by combining rule-driven and data-driven methods. GEC comprises two levels, error recognition and error correction, and forms the essential structure of the iWrite system (U Classroom, 2016).
Considering the importance of the above-mentioned features in evaluating every essay, this study investigates the frequencies of errors made by ESL learners and the error recognition rates of iWrite by analyzing the macrofeatures and microfeatures of forty essays written by ESL learners. The paper is structured as follows: chapter one gives a basic introduction to AEE systems, especially iWrite; chapter two analyzes the evaluation process of AEE systems and reviews pertinent studies on them; chapter three examines the frequencies of errors in the forty essays and iWrite's recognition rates for various types of errors; and the last chapter discusses the results and summarizes the study.
2 Literature Review
This chapter introduces how AEE systems operate from the perspective of error classification, and summarizes the features and results of existing studies on error analysis of ESL learners’ writing.
2.1 Theoretical framework
The current set of eight macrofeatures comprises grammar, usage, mechanics, style, organization, development, lexical complexity, and content (Attali & Burstein, 2006). The relationship between macrofeatures and microfeatures is straightforward. Each macrofeature is composed of various microfeatures, each of which detects a particular type of error in writing. The value of a microfeature is the count of the respective errors. For instance, one microfeature detects punctuation errors, and its value is the number of punctuation errors found in an essay. In other words, macrofeatures are the categories recognized by AEE systems, and microfeatures are sub-feature-level measurements of errors. Different AEE systems may rely on different feature models. For example, e-rater adopts the grammar, usage, mechanics, and style (GUMS) feature model, and the National Writing Project (NWP) uses a modified version of the 6+1 Trait scoring model to rate student writing samples (Bellamy, 2005; Quinlan, Higgins & Wolff, 2009).
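The relationship between the two feature levels can be illustrated with a minimal Python sketch. The sub-type names, the mapping `MICRO_TO_MACRO`, and the error labels are hypothetical and serve only as an illustration. Summing microfeature counts into a macrofeature value is also an assumption; real AEE systems may aggregate differently.

```python
from collections import Counter

# Hypothetical mapping from microfeature (error sub-type) to macrofeature
# (category). These names are illustrative, not iWrite's or e-rater's scheme.
MICRO_TO_MACRO = {
    "punctuation": "mechanics",
    "spelling": "mechanics",
    "subject_verb_agreement": "syntax",
    "verb_tense": "morphology",
}


def microfeature_values(detected_errors):
    """Return the value of each microfeature: the count of its errors."""
    return Counter(detected_errors)


def macrofeature_values(micro_counts):
    """Aggregate microfeature counts by macrofeature (assumed: simple sum)."""
    macro = Counter()
    for micro, count in micro_counts.items():
        macro[MICRO_TO_MACRO[micro]] += count
    return macro


# Invented error labels for a single essay, for illustration only.
essay_errors = ["punctuation", "spelling", "punctuation", "verb_tense"]
micro = microfeature_values(essay_errors)
macro = macrofeature_values(micro)
print(dict(micro))
print(dict(macro))
```

Here the punctuation microfeature takes the value 2 because two punctuation errors were recorded, and the mechanics macrofeature collects the punctuation and spelling counts together.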
Richards (1971) argued that learners’ errors could be divided into two types: interlingual errors, which result from interference from the mother tongue, and intralingual errors, which result from the process of learning the second language itself and do not exhibit any influence from the first language. In the former case, syntactic, phonetic, morphological or semantic features of the learner’s mother tongue are reflected in his or her second-language speech. Linguistic category taxonomies classify errors according to the language component or the particular linguistic constituent involved, such as preposition errors and verb phrase errors, whereas the surface strategy taxonomy highlights the ways surface structures are altered, such as the omission or deformation of necessary items or the addition of unnecessary ones (Jeptarus & Ngene, 2017). As for iWrite, it adopts morphology, syntax, collocation and mechanics as macrofeatures, and includes 34 types of errors in total as microfeatures, based on the linguistic features of English.
The process of error recognition in AEE systems is based on the Error Analysis Approach proposed by Corder. According to Corder (1971), the first step is recognition, that is, the identification of errors; it is followed by description, which states the correct form and how the error deviates from it; the last step is explanation of the causes of the errors. Similarly, AEE systems recognize errors first, then categorize them according to their feature models, and finally give the correct forms. This process therefore mainly covers recognition and description, and excludes explanation of the causes, whether the error is caused by intrusion of the mother tongue or by “ignorance of target language” (Zhan, 2015).
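The recognition and description steps can be sketched in Python as a toy rule-driven checker. The two rules, the `ErrorRecord` fields, and the category labels are hypothetical and are not taken from iWrite; a real system would combine many rules with data-driven models.

```python
import re
from dataclasses import dataclass


@dataclass
class ErrorRecord:
    category: str    # macrofeature, e.g. "mechanics" (hypothetical label)
    subtype: str     # microfeature, e.g. "repeated_word" (hypothetical label)
    text: str        # the erroneous span (recognition)
    suggestion: str  # the correct form (description)


def recognize(sentence):
    """Recognize errors and describe the correct form; causes are not explained."""
    records = []
    # Toy rule 1: the same word typed twice in a row.
    for m in re.finditer(r"\b(\w+)\s+\1\b", sentence, flags=re.IGNORECASE):
        records.append(
            ErrorRecord("mechanics", "repeated_word", m.group(0), m.group(1))
        )
    # Toy rule 2: the sentence does not start with a capital letter.
    if sentence and sentence[0].islower():
        records.append(
            ErrorRecord("mechanics", "capitalization", sentence[0], sentence[0].upper())
        )
    return records


records = recognize("he likes the the book.")
for record in records:
    print(record)
```

Each record stores what was recognized and what the correct form would be, but nothing about why the learner made the error, which mirrors the omission of the explanation step.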
2.2 Review of related studies
There have been studies of particular types of errors made by ESL learners. He (2009) divided lexical errors into formal errors and semantic errors; after analyzing 290 compositions written by Chinese ESL learners, he found that lexical errors accounted for 59.33% of all language errors and that the most frequent lexical errors were spelling errors. Zhang (2015) studied 64 writing samples and categorized 10 types of common errors. According to Zhang (2015), verb tense errors were the most frequent, followed by article errors and spelling errors. Wang (2007) classified errors into three levels (substance, text and discourse), which can be sub-divided into 61 types, and concluded that verb errors and syntactic errors are the most frequent. More researchers, however, have chosen the case study method when analyzing the errors made by Chinese ESL learners. Liu (2014) analyzed punctuation errors made by ESL learners, while Liu (2008) studied interlingual and intralingual errors in a Chinese ESL learner’s writing. The above studies conducted error analysis by manually recognizing and categorizing errors, whereas the present paper not only analyzes error frequency but also computes the recognition rates of an AEE system for various types of errors.
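The two quantities named here can be sketched in Python under an assumed definition: error frequency is the share of each category among all human-identified errors, and recognition rate is the fraction of human-identified errors in a category that the system also flagged. Both definitions and all counts below are assumptions for illustration; they are not the study's data.

```python
def error_frequency(human_counts):
    """Share of each category among all human-identified errors."""
    total = sum(human_counts.values())
    return {cat: n / total for cat, n in human_counts.items()}


def recognition_rate(human_counts, system_hits):
    """Fraction of human-identified errors that the system also flagged."""
    return {
        cat: system_hits.get(cat, 0) / n
        for cat, n in human_counts.items()
        if n > 0  # skip categories with no human-identified errors
    }


# Invented counts for illustration only, not results from the study.
human = {"mechanics": 50, "morphology": 30, "syntax": 15, "collocation": 5}
system = {"mechanics": 40, "morphology": 15, "syntax": 6, "collocation": 1}

freq = error_frequency(human)
rate = recognition_rate(human, system)
for cat in human:
    print(f"{cat}: frequency={freq[cat]:.2f}, recognition rate={rate[cat]:.2f}")
```

Treating the human raters' counts as the denominator reflects the assumption that manual annotation is the reference against which the system is compared.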