Transcript: Detecting Grammatical Errors with Treebank-Induced, Probabilistic Parsers (03/10/2011)
Detecting Grammatical Errors with Treebank-Induced, Probabilistic Parsers
Joachim Wagner, 2011-10-03
Supervisors: Jennifer Foster and Josef van Genabith
National Centre for Language Technology, School of Computing, Dublin City University

Why Detect Grammatical Errors?
• Grammar checker
– Useful tool for a growing number of people writing at work or at home
• Computer-assisted language learning (CALL)
– Grammar checking for L2 learners
– Error diagnosis and feedback
– Learner modelling (in tutoring systems)
– Automatic essay grading

Further Applications
• Sentence ranking in such areas as
– machine translation
– natural language generation
– optical character recognition
– automatic speech recognition
• Automatic post-editing and evaluation
• Selecting “quality” training material
• Augmentative and alternative communication

Hand-Crafted Grammars
• Labour-intensive, difficult to scale
• Demo systems raised high expectations
– Various CALL research prototypes
• Coverage too low for unrestricted text
– No analysis for 1/3 of sentences (ParGram English LFG Core Grammar, 2007; setup: BNC without spoken material, poems and headings; no verb form errors; 5 × 600 K test data)
– In theory, no analysis for ungrammatical input
• Unfulfilled expectations caused scepticism in CALL about NLP in general

Treebank-Induced Grammars
• Successful in other fields
• Highly robust to unexpected input
– Parse almost any input
– Wide coverage of unrestricted text
• Probabilistic disambiguation model
• Variants and extensions to the basic PCFG
– We use the first-stage parser of the Brown parser
• Example PCFG read off a toy treebank:
S1 -> X (1.00)
X -> X NP (0.50)
X -> SYM NP (0.50)
NP -> DT NN (1.00)
SYM -> 'A' (1.00)
DT -> 'a' (1.00)
NN -> 'a' (1.00)

Research Question
• Can the output of existing probabilistic, data-driven parsers be exploited to judge the grammaticality of sentences?

Focus on Basic Task
• Error correction / feedback
• Error type classification
• Locating errors
• Sentence classification
• Sentence-level grammaticality judgements -> Is the input sentence grammatical?

Parse Probability
• Probability of expanding the start symbol to the given tree
– Not the probability of the tree given the sentence
[Figure: best, 2nd and 3rd parse trees with P(tree+yield | grammar) around 10^-60 to 10^-62 for a grammatical and an ungrammatical sentence]

Important Factors Influencing Parse Probability
• Sentence length
• Number of nodes
• Part of speech
• Lexical choice
Implication
• Cannot use a (constant) probability threshold

Grammaticality and Probability
• How does grammaticality influence the probability of the most likely analysis?

Observations (Foster Corpus)
• Parallel error corpus (Foster 2005)
– 923 ungrammatical sentences
– 1 or 2 corrections each
– 2,048 sentences in total
• Effect of correcting ungrammatical sentences: 1,132 sentence pairs in total

Observations (Gonzaga Corpus)
• Effect of correcting ungrammatical sentences: 500 sentence pairs in total

Effect of Errors on the (Logarithmic) Probability of the Best Parse
• “Anyway, the/they left us alone.” (−76/−65)
• “Yeah that’s an ideas/idea” (−66/−52)

Observations and Conclusions
• Manually correcting an ungrammatical sentence often increases its probability
• Big variance among different sentences
• Grammaticality affects parse probability
• Limits of the candidate correction approach

Summary: Effect of Errors on Parse Probability
• Real-word spelling errors are more likely to lower the probability than agreement and verb form errors
• Agreement errors involving an article are the most likely to have a negative effect
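The parse probability P(tree+yield | grammar) discussed above is simply the product of the probabilities of the rules used in the derivation, which is why sentence length and node count dominate it and why no constant threshold can separate grammatical from ungrammatical input. A minimal sketch using the toy PCFG from the slides (rule probabilities as reconstructed there; all code names are illustrative):

```python
import math

# Toy PCFG from the slide, mapping (lhs, rhs) -> probability. Each extra
# "a a" NP in the yield multiplies in another 0.5, so longer sentences
# always receive lower parse probabilities regardless of grammaticality.
pcfg = {
    ("S1", ("X",)): 1.00,
    ("X", ("X", "NP")): 0.50,
    ("X", ("SYM", "NP")): 0.50,
    ("NP", ("DT", "NN")): 1.00,
    ("SYM", ("A",)): 1.00,
    ("DT", ("a",)): 1.00,
    ("NN", ("a",)): 1.00,
}

def derivation_logprob(rules):
    """Log probability of a derivation = sum of the log rule probabilities."""
    return sum(math.log(pcfg[r]) for r in rules)

# Derivation of "A a a": S1 -> X, X -> SYM NP, NP -> DT NN, plus terminals.
short = [("S1", ("X",)), ("X", ("SYM", "NP")), ("SYM", ("A",)),
         ("NP", ("DT", "NN")), ("DT", ("a",)), ("NN", ("a",))]
# Derivation of "A a a a a": one extra X -> X NP expansion and one extra NP.
longer = short[:1] + [("X", ("X", "NP"))] + short[1:] + \
         [("NP", ("DT", "NN")), ("DT", ("a",)), ("NN", ("a",))]

print(derivation_logprob(short))   # log(0.5)
print(derivation_logprob(longer))  # log(0.25): longer sentence, lower probability
```

This is the effect the slides call "cannot use a (constant) probability threshold": the two yields are equally grammatical under the toy grammar, yet their parse probabilities differ purely because of length.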
Observations (BNC)
• Effect of distorting grammatical sentences: 199,600 sentence pairs in total
• Candidate correction approaches: van Zaanen (1999), Lee and Seneff (2006)
• Missing word errors often increase the probability, e.g. “Doreen Ɛ/sounded incredulous.” (−64/−71)
[Figure: histogram over 250 pairs of the same sentence length; number of pairs by rise of logarithmic parse probability, interval ±1]

Method Overview
• Using a probabilistic model of ungrammatical language
• Distorted Treebank Method: vanilla treebank → error creation + tree adjustment → distorted treebank; parse the input sentence with grammar 1 (treebank grammar, best-parse probability P1) and grammar 2 (distorted treebank grammar, best-parse probability P2); classify as ungrammatical if P1/P2 < C
• Discriminative Rule Method: CFG rules found in the tree, PCFG pruning
• Hand-Crafted Grammar Method
• POS n-gram Method
• APP/EPP Method
• Each method in a basic decision-rule variant and a machine-learning variant
• All combined methods: X+N+D, X+N

Data, Training and Testing
• Authentic error corpus (Foster): error analysis → chosen error types
• Automatic error creation modules → artificial error corpus
• Cross-validation; authentic error corpora (ICLE, etc.)
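The core decision of the Distorted Treebank Method above can be sketched as a ratio test on the two best-parse probabilities. Working in log space avoids underflow for probabilities on the order of 10^-60; the function name and the example numbers below are illustrative, not taken from the thesis:

```python
import math

def flag_ungrammatical(logp_vanilla, logp_distorted, c=1.0):
    """Distorted Treebank decision rule (sketch): flag the input sentence
    as ungrammatical when P1/P2 < C, i.e. when the grammar induced from the
    distorted treebank likes the sentence almost as much as (or more than)
    the grammar induced from the vanilla treebank.
    logp_vanilla   = log P1 (best parse, vanilla treebank grammar)
    logp_distorted = log P2 (best parse, distorted treebank grammar)"""
    return (logp_vanilla - logp_distorted) < math.log(c)

# Illustrative natural-log probabilities:
print(flag_ungrammatical(-140.2, -139.5))        # True: P1 < P2
print(flag_ungrammatical(-139.0, -141.7))        # False: P1 clearly above P2
print(flag_ungrammatical(-139.0, -141.7, c=20))  # True under a stricter C
```

Varying C trades accuracy on grammatical data against accuracy on ungrammatical data, which is exactly the threshold parameter the evaluation section uses to trace a curve in the accuracy plane.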
• Artificial error corpus: common grammatical errors applied to the BNC (big)
• Ten folds (1–9, X): 1st test (×10), training, final test

Evaluation Measures
• Precision, F-score and overall accuracy
– Depend on error density
• Misclassification costs unknown
• Proposal: measure accuracy on grammatical and ungrammatical data separately
– Point in the plane for a single classifier
– Curve for varying a threshold parameter

Instance of ROC Analysis
• Receiver operating characteristics
• Signal detection -> medical diagnostics -> machine learning
• Rotates the accuracy graph 90° counter-clockwise
[Figure: accuracy on grammatical vs. accuracy on ungrammatical data, alongside the ROC axes true positive rate (recall) vs. false positive rate (fallout), both 0–100%]

Selecting Optimal Classifiers (1/3)
• Elimination of inferior classifiers
– Accuracy lower on both scales
– Region of degradation

Selecting Optimal Classifiers (2/3)
• Stochastic classifier interpolation
– Linear combination in the accuracy plane
– Random choice between classifiers 1 and 2

Selecting Optimal Classifiers (3/3)
• Convex hull including trivial classifiers
– ROCCH method in machine learning

Tuning the Accuracy Tradeoff (1/2)
• Some basic methods: threshold parameter
• Varying the error density of the training data
– Difficult to control
• Decision tree classifier: voting
– Trees trained on subsets of the training data
– Apply a threshold to the number of votes for “ungrammatical”
– Majority vote = threshold N/2

Tuning the Accuracy Tradeoff (2/2)
[Figure: voting with 12 decision trees (Distorted Treebank Method); accuracy on grammatical vs. ungrammatical data for vote thresholds 1, 2, 3, …, 12]

Results (1/3)
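The first classifier-selection step above, eliminating classifiers that are less accurate on both scales, can be sketched as a Pareto filter over points in the accuracy plane (all classifier names and accuracies below are made up for illustration). The full ROCCH method would go further and also discard points lying under the convex hull spanned by the survivors and the two trivial classifiers:

```python
def pareto_front(classifiers):
    """Drop every classifier dominated on both accuracy scales.
    Each classifier is (name, acc_on_grammatical, acc_on_ungrammatical);
    a point is inferior if some other point is at least as good on both
    axes and strictly better on one (the 'region of degradation')."""
    front = []
    for name, ag, au in classifiers:
        dominated = any(
            ag2 >= ag and au2 >= au and (ag2 > ag or au2 > au)
            for _, ag2, au2 in classifiers
        )
        if not dominated:
            front.append(name)
    return front

points = [
    ("flag-nothing", 1.00, 0.00),    # trivial: accept every sentence
    ("flag-everything", 0.00, 1.00), # trivial: reject every sentence
    ("A", 0.80, 0.55),
    ("B", 0.70, 0.50),               # dominated by A on both axes
    ("C", 0.60, 0.70),
]
print(pareto_front(points))  # ['flag-nothing', 'flag-everything', 'A', 'C']
```

Keeping the trivial classifiers in the candidate set matters: with stochastic interpolation, a random choice between them realises any accuracy tradeoff on the line connecting their two corner points.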
[Figure: accuracy on grammatical vs. ungrammatical data for the basic methods (basic decision rules) and the machine-learning-enhanced methods, comparing the Distorted Treebank Method, the APP/EPP Method and the POS n-gram Method]

Summary
• Methods
– Three methods using probabilistic parsing
– Implementation of baseline methods
– Combination of methods using classifiers
– Training and evaluation independent of error density
• Lessons learned
– Grammaticality depends on context
– The hand-crafted grammar discriminates less well than expected
– ROC convex hull for selecting classifiers

Ideas for Future Work
• Use class probability estimates for tuning the accuracy tradeoff
• Test on the 55,000-word ICLE sub-corpus annotated by Rozovskaya and Roth (2010)
• Include more methods in the evaluation
– Skipgrams (Sun et al., 2007)
– Candidate correction approaches
• Work on locating errors
• More ideas at the end of each chapter

Thank You!
Jennifer Foster, Josef van Genabith, Monica Ward, Djamé Seddah
National Centre for Language Technology, School of Computing, Dublin City University

Publications 2011
• Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner, Joseph Le Roux, Joakim Nivre, Deirdre Hogan and Josef van Genabith (to appear Nov 2011): From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand.
• Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner and Josef van Genabith (to appear Oct 2011): Comparing the Use of Edited and Unedited Text in Parser Self-Training. In Proceedings of the 12th International Conference on Parsing Technologies (IWPT 2011), Dublin, Ireland.
• Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner, Joseph Le Roux and Stephen Hogan (2011): #hardtoparse: POS Tagging and Parsing the Twitterverse.
In Proceedings of the Workshop on Analyzing Microtext at the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11), San Francisco, 8 August 2011.

Publications 2009
• Joachim Wagner and Jennifer Foster (2009): The Effect of Correcting Grammatical Errors on Parse Probabilities. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT'09), Paris, France, 7-9 October 2009.
• Joachim Wagner, Jennifer Foster and Josef van Genabith (2009): Judging Grammaticality: Experiments in Sentence Classification. CALICO Journal, volume 26, number 3, pages 474-490.

Publications 2008
• Jennifer Foster, Joachim Wagner and Josef van Genabith (2008): Adapting a WSJ-Trained Parser to Grammatically Noisy Text. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers, pages 221-224, Columbus, OH, June 15-20, 2008.
• Deirdre Hogan, Jennifer Foster, Joachim Wagner and Josef van Genabith (2008): Parser-Based Retraining for Domain Adaptation of Probabilistic Generators (title of early draft: Investigating the Effect of Domain Variation on Generation Performance). In Proceedings of the 5th International Natural Language Generation Conference (INLG08), Salt Fork Park, Ohio, June 12-14, 2008.
• Jennifer Foster, Joachim Wagner and Josef van Genabith (2008): Using Decision Trees to Detect and Classify Grammatical Errors. Talk presented jointly by Jennifer and me at the CALICO '08 Workshop on Automatic Analysis of Learner Language: Bridging Foreign Language Teaching Needs and NLP Possibilities, University of San Francisco, March 18-19, 2008.

Publications 2007
• Joachim Wagner, Djamé Seddah, Jennifer Foster and Josef van Genabith (2007): C-Structures and F-Structures for the British National Corpus. In Proceedings of the Twelfth International Lexical Functional Grammar Conference (LFG07), pages 418-438, CSLI Publications, Stanford University, July 28-30, 2007.
• Joachim Wagner, Jennifer Foster and Josef van Genabith (2007): A Comparative Evaluation of Deep and Shallow Approaches to the Automatic Detection of Common Grammatical Errors. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, June 28-30, 2007. (Extended version presented at the Summer 2007 ParGram meeting in Palo Alto.)
• Jennifer Foster, Joachim Wagner, Djamé Seddah and Josef van Genabith (2007): Adapting WSJ-Trained Parsers to the British National Corpus using In-Domain Self-Training. In Proceedings of the 10th International Conference on Parsing Technologies (IWPT 2007), Prague, June 23-24, 2007.

Publications 2005 and 2006
• Joachim Wagner (2008): Review of Nadja Nesselhauf, Collocations in a Learner Corpus. Machine Translation, volume 20, number 4, March 2006 [sic], pages 301-303, DOI: 10.1007/s10590-007-9028-8.
• Joachim Wagner, Jennifer Foster and Josef van Genabith (2006): Detecting Grammatical Errors Using Probabilistic Parsing. Talk presented by Jennifer at the Workshop on Interfaces of Intelligent Computer-Assisted Language Learning, Ohio State University, December 17, 2006.
• Gareth J. F. Jones, Michael Burke, John Judge, Anna Khasin, Adenike Lam-Adesina and Joachim Wagner (2005): Dublin City University at CLEF 2004: Experiments in Monolingual, Bilingual and Multilingual Retrieval. In Multilingual Information Access for Text, Speech and Images: 5th Workshop of the Cross-Language Evaluation Forum, Carol Peters, Paul Clough, Julio Gonzalo, G. Jones, M. Kluck and B. Magnini (eds.), volume 3491 of Lecture Notes in Computer Science, pages 207-220, Springer, Heidelberg, Germany, 2005.

Publications 2004
• Petra Ludewig and Joachim Wagner (2004): Collocations: Mediating between Lexical Abstractions and Textual Concretions. In Proceedings of the Sixth TALC Conference, pages 32-33, Granada, Spain.
• Cara Greene, Katrina Keogh, Thomas Koller, Joachim Wagner, Monica Ward and Josef van Genabith (2004): Using NLP Technology in CALL. In NLP and Speech Technologies in Advanced Language Learning Systems: Proceedings of the InSTIL/ICALL2004 Symposium on Computer Assisted Language Learning, ed. Rodolfo Delmonte, Philippe Delcloque and Sara Tonelli, pages 55-58, Venice, Italy.
• Joachim Wagner (2004): A False Friend Exercise with Authentic Material Retrieved from a Corpus. In NLP and Speech Technologies in Advanced Language Learning Systems: Proceedings of the InSTIL/ICALL2004 Symposium on Computer Assisted Language Learning, pages 115-118, Venice, Italy.

Pre-PhD Publications
• Joachim Wagner (2003): Datengesteuerte maschinelle Übersetzung mit flachen Analysestrukturen (data-driven machine translation with shallow analysis structures). Master's thesis, Universität Osnabrück, Germany.
• Jahn-Takeshi Saito, Joachim Wagner, Graham Katz, Philip Reuter, Michael Burke and Sabine Reinhard (2002): Evaluation of GermaNet: Problems Using GermaNet for Automatic Word Sense Disambiguation. In Proceedings of the LREC Workshop on WordNet Structure and Standardization and How These Affect WordNet Applications and Evaluation, pages 14-29, Las Palmas de Gran Canaria.
• Norman Kummer and Joachim Wagner (2002): Phrase Processing for Detecting Collocations with KoKS. In online Proceedings of the Colloc02 Workshop on Computational Approaches to Collocations, http://www.ai.univie.ac.at/colloc02/, Vienna, Austria.
• Arno Erpenbeck, Britta Koch, Norman Kummer, Philip Reuter, Patrick Tschorn and Joachim Wagner (2002): KOKS: Korpusbasierte Kollokationssuche (corpus-based collocation search). Technical report (Abschlussbericht), Universität Osnabrück, Germany.