Detecting Grammatical Errors with Treebank-Induced, Probabilistic Parsers
Joachim Wagner
2011-10-03
Supervisors: Jennifer Foster and Josef van Genabith
National Centre for Language Technology
School of Computing, Dublin City University

Why Detect Grammatical Errors?
• Grammar checker
– Useful tool for growing number of people writing at work or home
• Computer-assisted language learning (CALL)
– Grammar checking for L2
– Error diagnosis and feedback
– Learner modelling (in tutoring systems)
– Automatic essay grading
[Example text shown on slide: The All Ireland Linguistics Olympiad (AILO) is a contest in which secondary school develop their own strategies for …]

Further Applications
• Sentence ranking in such areas as
– machine translation
– natural language generation
– optical character recognition
– automatic speech recognition
• Automatic post-editing and evaluation
• Selecting “quality” training material
• Augmentative and alternative communication

Hand-Crafted Grammars
• Labour-intensive, difficult to scale
• Demo systems raised high expectations
• Coverage too low for unrestricted text
– Various CALL research prototypes
– No analysis for 1/3 of sentences
– In theory, no analysis for ungrammatical input
• Unfulfilled expectations caused scepticism in CALL about NLP in general
[Figure note: ParGram English LFG Core Grammar. 2007 setup: BNC without spoken material, poems and headings; no verb form errors. Test data: 5 x 600 K]

Treebank-Induced Grammars
• Successful in other fields
• Highly robust to unexpected input
– Parse almost any input
– Wide coverage of unrestricted text
• Probabilistic Disambiguation Model
• Variants and extensions to basic PCFG
– We use the first stage parser of the Brown parser
Example PCFG (rule probabilities in parentheses):
S1 -> X (1.00)
X -> X NP (0.50)
X -> SYM (0.50)
NP -> DT DT NN (1.00)
SYM -> ‘A’ (1.00)
DT -> ‘a’ (1.00)
NN -> ‘a’ (1.00)
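
As a minimal sketch of how a PCFG scores a tree, the following Python multiplies the rule probabilities of the toy grammar listed on the Treebank-Induced Grammars slide. The tuple-based tree encoding and the helper names `tree_probability` and `make_tree` are assumptions made for illustration; they are not part of the Brown parser.

```python
# Rule probabilities copied from the toy grammar on the slide.
RULE_PROB = {
    ("S1", ("X",)): 1.00,
    ("X", ("X", "NP")): 0.50,
    ("X", ("SYM",)): 0.50,
    ("NP", ("DT", "DT", "NN")): 1.00,
    ("SYM", ("A",)): 1.00,
    ("DT", ("a",)): 1.00,
    ("NN", ("a",)): 1.00,
}


def tree_probability(tree):
    """Multiply the probabilities of all rules used in a tree.

    A tree is a tuple (label, child, child, ...); a leaf child is a string.
    """
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            p *= tree_probability(child)
    return p


def make_tree(num_nps):
    """Hypothetical helper: tree for 'A' followed by num_nps groups of 'a a a'."""
    x = ("X", ("SYM", "A"))
    noun_phrase = ("NP", ("DT", "a"), ("DT", "a"), ("NN", "a"))
    for _ in range(num_nps):
        x = ("X", x, noun_phrase)
    return ("S1", x)


for n in range(4):
    print(n, tree_probability(make_tree(n)))
```

Each additional noun phrase halves the probability in this toy grammar, which illustrates why longer sentences tend to receive lower parse probabilities.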

Research Question
• Can the output of existing probabilistic, data-driven parsers be exploited to judge the grammaticality of sentences?

Focus on Basic Task
[Diagram levels: Error correction / feedback; Error type classification; Locating errors; Sentence classification]
• Sentence-level grammaticality judgements
-> Is the input sentence grammatical?

Parse Probability
• Probability of expanding start symbol to given tree
– Not the probability of the tree given the sentence
[Figure: best tree, 2nd tree and 3rd tree on a P(tree+yield | grammar) scale from 10^-60 to 10^-62]

Important Factors Influencing Parse Probability
Factors
• Sentence length
• Number of nodes
• Part of speech
• Lexical choice
Implication
• Cannot use (constant) probability threshold
[Figure: grammatical and ungrammatical examples on a P(tree+yield | grammar) scale from 10^-60 to 10^-62]
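
The distinction drawn on the Parse Probability slide can be written out as follows. This is the standard PCFG formulation rather than something taken from the slides; the symbols T (tree), s (sentence), G (grammar) and R(T) (the rule applications in T) are notation introduced here for illustration.

```latex
P(T, s \mid G) = \prod_{(A \to \beta) \in R(T)} P(A \to \beta \mid A)
\qquad
P(T \mid s, G) = \frac{P(T, s \mid G)}{\sum_{T' :\, \mathrm{yield}(T') = s} P(T', s \mid G)}
```

The first quantity shrinks with every additional rule application, which is consistent with sentence length and number of nodes appearing among the factors; the second is normalised per sentence and is not the value discussed here.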

Grammaticality and Probability
• How does grammaticality influence the probability of the most likely analysis?
• Parallel error corpus (Foster 2005)
– 923 ungrammatical sentences
– 1 or 2 corrections each
– 2048 sentences in total

Observations (Foster Corpus)
Effect of correcting ungrammatical sentences
1132 sentence pairs in total

Observations (Gonzaga Corpus)
Effect of correcting ungrammatical sentences
500 sentence pairs in total

Observations (BNC)
Effect of distorting grammatical sentences
199,600 sentence pairs in total

Effect of Errors on the (Logarithmic) Probability of the Best Parse
• Agreement errors involving an article most likely to have negative effect
Yeah that’s an ideas/idea (-66/-52)

Effect of Errors on Parse Probability
• Real-word spelling errors more likely to lower probability than agreement and verb form errors
Anyway, the/they left us alone. (-76/-65)
• Missing word errors often increase the probability
Doreen ε/sounded incredulous. (-64/-71)

Summary
Observations
• Manually correcting an ungrammatical sentence often increases its probability
• Big variance among different sentences
Conclusions
• Grammaticality affects parse probability
• Limits of candidate correction approach
– van Zaanen (1999)
– Lee and Seneff (2006)

[Figure: histogram "Same Sentence Length (250 pairs)"; x-axis: rise of logarithmic parse prob, interval +/- 1; y-axis: number of pairs]
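
The comparison behind these observations can be sketched in a few lines of Python. The three score pairs are the ones quoted on the slides; reading each pair as (erroneous sentence, corrected sentence), attaching -76/-65 to the "the/they" example, and the binning helper `bin_centre` are assumptions made for this sketch.

```python
from collections import Counter

# (log prob of erroneous version, log prob of corrected version).
# The order within each pair is an assumption about the slide notation.
pairs = {
    "Yeah that's an ideas/idea": (-66, -52),
    "Anyway, the/they left us alone.": (-76, -65),
    "Doreen [missing]/sounded incredulous.": (-64, -71),
}


def rise(erroneous, corrected):
    """Rise of the logarithmic parse probability caused by the correction."""
    return corrected - erroneous


def bin_centre(value):
    """Hypothetical binning: odd-numbered bin centres, each covering +/- 1."""
    return 2 * (value // 2) + 1


rises = {text: rise(*scores) for text, scores in pairs.items()}
histogram = Counter(bin_centre(r) for r in rises.values())

for text, r in rises.items():
    print(f"{r:+d}  {text}")
print(sorted(histogram.items()))
```

A negative rise, as for the missing-word example, means the erroneous version received the higher probability.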

Using a Probabilistic Model of Ungrammatical Language
[Diagram: Vanilla Treebank -> Error Creation + Tree Adjustment -> Distorted Treebank; Treebank Grammar (Grammar 1) and Distorted Grammar (Grammar 2); Input Sentence -> Parsing with each grammar -> P1, P2 -> P1/P2 < C ?]

Method Overview
[Diagram labels. Resources: Treebank, Distorted Treebank, Hand-Crafted Grammar, POS n-grams. Stages: Basic Decision Rules, Machine Learning. Methods: APP/EPP Method, Distorted Treebank Method, Hand-Crafted Grammar Method, POS n-gram Method, Discriminative Rule Method (CFG Rules Found in Tree, PCFG Pruning). Combined Methods: X+N, X+N+D, D+R, All]
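
The decision "P1/P2 < C ?" from the diagram can be sketched in log space, since parse probabilities of the order of 10^-60 are normally handled as logarithms. That P1 comes from the vanilla treebank grammar and P2 from the distorted grammar is an assumption about the diagram, and the function name and default threshold are hypothetical.

```python
def is_ungrammatical(logp_vanilla, logp_distorted, log_c=0.0):
    """Apply the test P1/P2 < C to log probabilities.

    P1/P2 < C is equivalent to log P1 - log P2 < log C.
    Assumption: P1 is the best-parse probability under the vanilla
    grammar, P2 the one under the distorted grammar.
    """
    return (logp_vanilla - logp_distorted) < log_c


# The distorted grammar explains this input better, so it is flagged.
print(is_ungrammatical(-66.0, -60.0))
# The vanilla grammar explains this input better, so it is accepted.
print(is_ungrammatical(-52.0, -55.0))
```

Varying `log_c` trades accuracy on grammatical data against accuracy on ungrammatical data.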

Artificial Error Corpus
[Diagram: Authentic Error Corpus (Foster) -> Error Analysis -> Common Grammatical Error -> Chosen Error Types -> Automatic Error Creation Modules -> Applied to BNC (Big) -> Artificial Error Corpus]

Data, Training and Testing
[Diagram: Artificial Error Corpus -> Cross-Validation over ten folds (1 2 3 4 5 6 7 8 9 X): 1st Test (x10) and Training. Authentic Error Corpora (ICLE, etc.) -> Final Test]
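
To give a rough idea of what an automatic error creation module might do, here is a minimal Python sketch for one error type mentioned in the talk, the missing word error. It is not the author's implementation: the function names are hypothetical, tokenisation is simple whitespace splitting, and the tree adjustment step of the distorted treebank is not covered.

```python
import random


def delete_word(tokens, index):
    """Return a copy of the token list with the token at `index` removed."""
    return tokens[:index] + tokens[index + 1:]


def create_missing_word_error(tokens, rng):
    """Delete one randomly chosen token; `rng` is a random.Random instance."""
    return delete_word(tokens, rng.randrange(len(tokens)))


sentence = "Doreen sounded incredulous .".split()

# Reproduces the slide example "Doreen sounded incredulous." without "sounded".
print(delete_word(sentence, 1))

# Random variant, seeded so that runs are repeatable.
print(create_missing_word_error(sentence, random.Random(0)))
```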

Evaluation Measures
• Precision, F-Score and overall accuracy
– Depend on error density
• Misclassification costs unknown
• Proposal: measure accuracy on grammatical and ungrammatical data separately
– Point in plane for single classifier
– Curve for varying a threshold parameter
[Figure: accuracy plane; x-axis: Accuracy on ungrammatical data (0% to 100%); y-axis: Accuracy on grammatical data (0% to 100%)]

Instance of ROC Analysis
• Receiver operating characteristics
• Signal detection -> medical diagnostics -> machine learning
• Rotates accuracy graph 90° counter clockwise
[Figure: ROC plane; x-axis: False positive rate (fallout); y-axis: True positive rate (recall)]

Selecting Optimal Classifiers (1/3)
• Elimination of inferior classifiers
– Accuracy lower on both scales
[Figure: Classifier 1 and Classifier 2 in the accuracy plane, with region of degradation]

Selecting Optimal Classifiers (2/3)
• Stochastic classifier interpolation
– Linear combination in accuracy plane
[Figure: Classifier 1, Classifier 2, random choice between classifiers 1 and 2, and region of degradation]
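
The following Python sketch illustrates the three ideas on these slides: a classifier as a point in the accuracy plane, elimination of a classifier that is lower on both scales, and stochastic interpolation between two classifiers. The function names and the toy labels are assumptions made for illustration; True stands for "ungrammatical".

```python
def accuracy_point(gold, predicted):
    """Return (accuracy on ungrammatical data, accuracy on grammatical data)."""
    ungr = [p for g, p in zip(gold, predicted) if g]
    gram = [p for g, p in zip(gold, predicted) if not g]
    acc_ungr = sum(1 for p in ungr if p) / len(ungr)
    acc_gram = sum(1 for p in gram if not p) / len(gram)
    return (acc_ungr, acc_gram)


def is_inferior(a, b):
    """True if point a is no better than b on both scales and not equal to b."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b


def interpolate(a, b, p):
    """Expected point when classifier a is used with probability p, else b."""
    return (p * a[0] + (1 - p) * b[0], p * a[1] + (1 - p) * b[1])


gold = [True, True, True, True, False, False, False, False]
pred_1 = [True, True, True, False, False, False, True, True]
pred_2 = [True, False, False, False, False, False, False, True]

point_1 = accuracy_point(gold, pred_1)  # (0.75, 0.5)
point_2 = accuracy_point(gold, pred_2)  # (0.25, 0.75)
print(point_1, point_2)
print(is_inferior(point_1, point_2))
print(interpolate(point_1, point_2, 0.5))  # (0.5, 0.625)
```

Because both accuracies are measured within one class, the resulting points do not change when the proportion of ungrammatical sentences in the test data changes.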

Selecting Optimal Classifiers (3/3)
• Convex hull including trivial classifiers
– ROCCH method in machine learning
[Figure: convex hull with Classifier 1 and Classifier 2]

Tuning the Accuracy Tradeoff (1/2)
• Some basic methods: threshold parameter
• Decision tree classifier:
– Varying error density of training data
  • Difficult to control
– Voting
  • Trees trained on subsets of training data
  • Apply threshold to number of votes for “ungrammatical”
  • Majority vote = threshold N/2
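
The convex hull selection named under "Selecting Optimal Classifiers (3/3)" can be sketched as an upper-hull computation over points in the accuracy plane. This is a generic monotone-chain hull, not code from the talk; the point coordinates are invented, and the two trivial classifiers are taken to be "always grammatical" at (0, 1) and "always ungrammatical" at (1, 0).

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def accuracy_convex_hull(points):
    """Return the upper convex hull from (0, 1) to (1, 0).

    Points are (accuracy on ungrammatical, accuracy on grammatical).
    Classifiers that are not on the hull can be matched or beaten by
    interpolating between hull classifiers.
    """
    candidates = set(points) | {(0.0, 1.0), (1.0, 0.0)}
    ordered = sorted(candidates, key=lambda p: (p[0], -p[1]))
    hull = []
    for p in ordered:
        # Keep only right turns while moving from left to right.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


classifiers = [(0.5, 0.8), (0.4, 0.5), (0.8, 0.5), (0.6, 0.6)]
print(accuracy_convex_hull(classifiers))
```

In this invented example, (0.4, 0.5) and (0.6, 0.6) fall below the hull and would be discarded.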

Tuning the Accuracy Tradeoff (2/2)
[Figure: Voting with 12 Decision Trees (Distorted Treebank Method); x-axis: Accuracy on ungrammatical data (0 to 1); y-axis: Accuracy on grammatical data (0 to 1); points on the curve labelled 1, 2, 3 and 12]
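
Voting with a threshold on the number of "ungrammatical" votes can be sketched as follows. The function name is hypothetical, and whether a sentence with exactly `threshold` votes counts as ungrammatical is an assumption of this sketch; the slides only state that majority vote corresponds to a threshold of N/2.

```python
def classify_by_votes(votes, threshold):
    """Flag a sentence as ungrammatical if enough trees vote for it.

    `votes` holds one boolean per decision tree (True = "ungrammatical").
    Assumption: reaching the threshold exactly is sufficient.
    """
    return sum(votes) >= threshold


# Twelve trees, seven of which vote "ungrammatical" for this sentence.
votes_for_sentence = [True] * 7 + [False] * 5

# Sweeping the threshold from 1 to 12 moves from a strict to a lenient checker.
decisions = {t: classify_by_votes(votes_for_sentence, t) for t in range(1, 13)}
print(decisions)
```

Low thresholds flag more sentences and so favour accuracy on ungrammatical data; high thresholds favour accuracy on grammatical data.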

Results (1/3)
[Figure: Basic Methods (Basic Decision Rules); curves: Distorted Treebank Method, APP/EPP Method, POS n-gram Method; x-axis: Accuracy on ungrammatical data]
[Figure: Machine Learning-Enhanced Methods; curves: Distorted Treebank Method, POS n-gram Method]

Summary
• Methods
– Three methods using probabilistic parsing
– Implementation of baseline methods
– Combination of methods using classifiers
– Training and evaluation independent of error density
• Lessons Learned
– Grammaticality depends on context
– Hand-crafted grammar discriminates less well than expected
– ROC convex hull for selecting classifiers

Ideas for Future Work
• Use class probability estimates for tuning accuracy tradeoff
• Test on 55,000 word ICLE sub-corpus annotated by Rozovskaya and Roth (2010)
• Include more methods in evaluation
– Skipgrams (Sun et al., 2007)
– Candidate correction approaches
• Work on locating errors
• More ideas at the end of each chapter

Thank You!
Jennifer Foster
Josef van Genabith
Monica Ward
Djamé Seddah
National Centre for Language Technology
School of Computing, Dublin City University

Publications 2011
• Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner, Joseph Le Roux, Joakim Nivre, Deirdre Hogan and Josef van Genabith (to appear Nov 2011): From News to Comment: Resources and Benchmarks for Parsing the Language of Web 2.0. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand
• Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner and Josef van Genabith (to appear Oct 2011): Comparing the use of edited and unedited text in parser self-training. In Proceedings of the 12th International Conference on Parsing Technologies (IWPT 2011), Dublin, Ireland
• Jennifer Foster, Ozlem Cetinoglu, Joachim Wagner, Joseph Le Roux and Stephen Hogan (2011): #hardtoparse: POS Tagging and Parsing the Twitterverse. In Proceedings of the Workshop on Analyzing Microtext at the Twenty-Fifth Conference on Artificial Intelligence (AAAI-11), 8 August 2011, Hyatt Regency Hotel, San Francisco

Publications 2009
• Joachim Wagner and Jennifer Foster (2009): The effect of correcting grammatical errors on parse probabilities. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT'09), Paris, France, 7th-9th October, 2009
• Joachim Wagner, Jennifer Foster and Josef van Genabith (2009): Judging Grammaticality: Experiments in Sentence Classification. In CALICO Journal, pages 474-490, volume 26, number 3

Publications 2008
• Jennifer Foster, Joachim Wagner, and Josef van Genabith (2008): Adapting a WSJ-Trained Parser to Grammatically Noisy Text. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Short Papers, pages 221-224, Columbus, OH, June 15-20, 2008
• Deirdre Hogan, Jennifer Foster, Joachim Wagner and Josef van Genabith (2008): Parser-Based Retraining for Domain Adaptation of Probabilistic Generators (Title of early draft: Investigating the Effect of Domain Variation on Generation Performance). In Proceedings of the 5th International Natural Language Generation Conference (INLG08), Salt Fork Park, Ohio, June 12-14, 2008
• Jennifer Foster, Joachim Wagner, and Josef van Genabith (2008): Using Decision Trees to Detect and Classify Grammatical Errors. Talk presented jointly by Jennifer and me at the Calico '08 Workshop on Automatic Analysis of Learner Language: Bridging Foreign Language Teaching Needs and NLP Possibilities, University of San Francisco, March 18 and 19, 2008

Publications 2007
• Joachim Wagner, Djamé Seddah, Jennifer Foster and Josef van Genabith (2007): C-Structures and F-Structures for the British National Corpus. In Proceedings of the Twelfth International Lexical Functional Grammar Conference (LFG07), pages 418-438, CSLI Publications, Stanford University, July 28-30, 2007
• Joachim Wagner, Jennifer Foster and Josef van Genabith (2007): A Comparative Evaluation of Deep and Shallow Approaches to the Automatic Detection of Common Grammatical Errors. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, June 28-30, 2007 (Extended version presented at the Summer 2007 ParGram meeting in Palo Alto.)
• Jennifer Foster, Joachim Wagner, Djamé Seddah and Josef van Genabith (2007): Adapting WSJ-Trained Parsers to the British National Corpus using In-Domain Self-Training. In Proceedings of the 10th International Conference on Parsing Technologies (IWPT 2007), Prague, June 23-24, 2007

Publications 2005 and 2006
• Joachim Wagner (2008): Nadja Nesselhauf, Collocations in a Learner Corpus. Book review in Machine Translation Vol 20, No 4, March 2006 [sic], pages 301-303, DOI: 10.1007/s10590-007-9028-8
• Joachim Wagner, Jennifer Foster and Josef van Genabith (2006): Detecting Grammatical Errors Using Probabilistic Parsing. Talk presented by Jennifer at the Workshop on Interfaces of Intelligent Computer-Assisted Language Learning, Ohio State University, December 17, 2006
• Gareth J. F. Jones, Michael Burke, John Judge, Anna Khasin, Adenike Lam-Adesina and Joachim Wagner (2005): Dublin City University at CLEF 2004: Experiments in Monolingual, Bilingual and Multilingual Retrieval. In Multilingual Information Access for Text, Speech and Images: 5th Workshop of the Cross-Language Evaluation Forum, Carol Peters, Paul Clough, Julio Gonzalo, G.Jones, M.Kluck and B.Magnini (Eds.), Volume 3491 of Lecture Notes in Computer Science, pages 207-220, Springer, Heidelberg, Germany (in print), 2005

Publications 2004
• Petra Ludewig and Joachim Wagner (2004): Collocations - mediating between lexical abstractions and textual concretions. In Proc. of the sixth TALC conference, pages 32-33, Granada, Spain
• Cara Greene, Katrina Keogh, Thomas Koller, Joachim Wagner, Monica Ward and Josef van Genabith (2004): Using NLP Technology in CALL. In NLP and Speech Technologies in Advanced Language Learning Systems - Proc. of InSTIL/ICALL2004 Symposium on Computer Assisted Language Learning, ed. Rodolfo Delmonte, Philippe Delcloque and Sara Tonelli, pages 55-58, Venice, Italy
• Joachim Wagner (2004): A false friend exercise with authentic material retrieved from a corpus. In NLP and Speech Technologies in Advanced Language Learning Systems - Proc. of InSTIL/ICALL2004 Symposium on Computer Assisted Language Learning, pages 115-118, Venice, Italy

Pre-PhD Publications
• Joachim Wagner (2003): Datengesteuerte maschinelle Übersetzung mit flachen Analysestrukturen, Master's thesis, Universität Osnabrück, Germany
• Jahn-Takeshi Saito, Joachim Wagner, Graham Katz, Philip Reuter, Michael Burke, and Sabine Reinhard (2002): Evaluation of GermaNet: Problems Using GermaNet for Automatic Word Sense Disambiguation. In Proc. of the LREC Workshop on WordNet Structure and Standardization and how These Affect WordNet Applications and Evaluation, pages 14-29, Las Palmas de Gran Canaria
• Norman Kummer and Joachim Wagner (2002): Phrase processing for detecting collocations with KoKS. In online Proc. of Colloc02 Workshop on Computational Approaches to Collocations, http://www.ai.univie.ac.at/colloc02/, Vienna, Austria
• Arno Erpenbeck, Britta Koch, Norman Kummer, Philip Reuter, Patrick Tschorn and Joachim Wagner (2002): KOKS - Korpusbasierte Kollokationssuche, technical report (Abschlussbericht), Universität Osnabrück, Germany