DCU: Aspect-based Polarity Classification for SemEval Task 4

Joachim Wagner, Piyush Arora, Santiago Cortes, Utsab Barman, Dasha Bogdanova, Jennifer Foster and Lamia Tounsi
CNGL Centre for Global Intelligent Content, National Centre for Language Technology, School of Computing, Dublin City University, Dublin, Ireland

Sub-Task B: Aspect Term Polarity Prediction
Input: a sentence, e.g. "I have had so many problems with the computer.", and an aspect term within it, e.g. "the computer" at character positions 34-45.
Classifier: choose one of the 4 classes positive, negative, neutral and conflict.
Output: negative

System Architecture
The pipeline runs pre-processing, constituent and discourse parsing, feature extraction over sentiment lexicons and parses, SVM parameter optimisation, final model training and, finally, prediction on the test data.

Pre-processing
Normaliser:
• signs, e.g. dashes
• quotes
• soft hyphen
• non-breakable space (U+00A0)
• emoticons
• text message jargon, e.g. U, gr8
Tokenisation:
• PTB-style
• escape round brackets
• conflate other brackets [{<>}]
Aspect Term Position Corrector:
• adjusts aspect term offsets for changes due to normalisation

Constituent Parser
• review grammars (SANCL 2012)
• varying: amount of training data, number of seeds, number of cycles
• Lorg product grammars G1, …, G12; G13, …, G24; …; G97, …, G108
• Combine (Vote); back-off to a PTB grammar, then to a flat tree
• Stanford Dependency Converter

Discourse Parser
• SLSeg

Feature Extraction
Lexicon Combination:
• MPQA
• SentiWordNet
• General Inquirer
• Bing Liu's Opinion Lexicon
Rule-based Feature Extractor:
• token distance
• discourse chunk distance
• dependency path distance
N-gram Extractor:
• for each transformation, build XML (POS, sentiment words, aspect term) and extract n-grams
• transformations: A = aspect, L = lowercase, S = score, R = restrict to certain POS, P = annotate POS
POS Extractor
Feature Selector / Filter
Table 1: N-Gram Feature Example

Parameter Optimisation
• For each grid point (C, γ):
  • For each cross-validation fold: train an SVM on the training data excluding the fold and evaluate accuracy on the fold
  • Calculate the average accuracy across folds
• Pick the best C and γ

Final Model
• Train an SVM on the full training data with the chosen C and γ

Making Predictions
• Collect the SVM predictions for the test data

Grid Search Example Results
Figure: accuracy (colour-coded) over the (C, γ) grid, with C from 1 to 10, γ from 0.1 to 1 and accuracies between 68% and 73% (laptop data, feature frequency threshold 9).
Table 2: Accuracy on training (5-fold cross-validation) and test sets

Error Analysis
• Sentiment not expressed explicitly:
  "The sushi is cut on blocks bigger than my cell phone."
  "I charge it at night and skip taking the cord with me because of the good battery life."
• Non-obvious expression of negation:
  "The Management was less than accomodating [sic]."
• Conflict cases: the training data contains too few examples of conflict sentences for the system to learn to detect them.
• SVM beats rule-based:
  "Try the rose roll (not on menu)."
  "The gnocchi literally melts in your mouth!"
  "Only 2 usb ports ... seems kind of ... limited ..."
• Rule-based beats SVM:
  "The chocolate raspberry cake is heavenly - not too sweet, but full of flavor."

This research is supported by the Science Foundation Ireland (Grant 12/CE/I2267) as part of CNGL (www.cngl.ie).
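A few illustrative sketches of the pipeline steps follow; none of them are the authors' code. The Normaliser and Aspect Term Position Corrector boxes only name the normalisation steps, so the sketch below shows one way a character-level normaliser could keep an offset map so that aspect term spans can be corrected afterwards. The substitution table, function names and example sentence are assumptions; emoticons and text-message jargon such as "U" or "gr8" would need token-level rules and are omitted.

```python
# Sketch of a character-level normaliser with an offset map, so that aspect
# term spans given on the original text can be corrected after normalisation.
# Substitution table, names and example sentence are illustrative assumptions.

SUBSTITUTIONS = {
    "\u2013": "-",   # en dash
    "\u2014": "-",   # em dash
    "\u2018": "'",   # curly single quotes
    "\u2019": "'",
    "\u201c": '"',   # curly double quotes
    "\u201d": '"',
    "\u00a0": " ",   # non-breakable space
    "\u00ad": "",    # soft hyphen: delete
}

def normalise_with_offsets(text):
    """Return (normalised_text, offset_map); offset_map[i] gives the position
    in the normalised text where original character i now starts."""
    out_chars = []
    offset_map = []
    for ch in text:
        offset_map.append(len(out_chars))
        out_chars.extend(SUBSTITUTIONS.get(ch, ch))  # adds 0 or 1 characters
    offset_map.append(len(out_chars))                # one-past-the-end position
    return "".join(out_chars), offset_map

def correct_aspect_span(start, end, offset_map):
    """Map an aspect term span (end exclusive) to normalised offsets."""
    return offset_map[start], offset_map[end]

sentence = "Soft\u00adware aside, the\u00a0battery\u00a0life is great \u2013 really."
start = sentence.find("battery\u00a0life")
end = start + len("battery\u00a0life")
norm, omap = normalise_with_offsets(sentence)
new_start, new_end = correct_aspect_span(start, end, omap)
print(norm[new_start:new_end])   # -> "battery life"
```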
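The Constituent Parser column names "Combine (Vote)" with back-offs to a PTB-trained grammar and to a flat tree, but not how the vote works. The sketch below assumes a simple majority vote over identical parse strings returned by the product-grammar parsers, which may differ from the combination actually used.

```python
# Minimal sketch of combining several parser outputs by majority vote, with
# back-off to a single PTB-grammar parse and, as a last resort, a flat tree.
# The voting scheme is an assumption; the poster only names "Combine (Vote)".
from collections import Counter

def flat_tree(tokens):
    """Last-resort parse: all tokens directly under a single root."""
    return "(ROOT " + " ".join("(X %s)" % t for t in tokens) + ")"

def combine_parses(product_parses, ptb_parse, tokens):
    votes = Counter(p for p in product_parses if p is not None)
    if votes:
        return votes.most_common(1)[0][0]   # most frequent parse wins
    if ptb_parse is not None:               # back-off: PTB-trained grammar
        return ptb_parse
    return flat_tree(tokens)                # back-off: flat tree

parses = ["(S (NP the computer) (VP works))",
          "(S (NP the computer) (VP works))",
          None]
print(combine_parses(parses, None, ["the", "computer", "works"]))
```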
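The N-gram Extractor applies transformations to the tokens (A = aspect, L = lowercase, S = score, R = restrict to certain POS, P = annotate POS) before collecting n-gram features. The sketch below shows how such a transformation chain could be composed; the token representation and the concrete behaviour of each transformation are guesses for illustration, not the definitions from the paper.

```python
# Rough illustration of n-gram features built after a chain of token
# transformations (A = aspect, L = lowercase, S = score, R = restrict to POS,
# P = annotate POS).  Token structure and transformation details are assumed.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Token:
    form: str        # surface form
    pos: str         # part-of-speech tag
    score: float     # sentiment lexicon score, 0.0 if unknown
    is_aspect: bool  # True if the token is part of the aspect term

def t_aspect(tokens):        # A: replace aspect-term tokens by a placeholder
    return [replace(t, form="<ASPECT>") if t.is_aspect else t for t in tokens]

def t_lower(tokens):         # L: lowercase the surface form
    return [replace(t, form=t.form.lower()) for t in tokens]

def t_score(tokens):         # S: replace sentiment words by their score sign
    return [replace(t, form="POS" if t.score > 0 else "NEG") if t.score else t
            for t in tokens]

def t_restrict(tokens, keep=("JJ", "RB", "NN", "VB")):   # R: keep certain POS
    return [t for t in tokens if any(t.pos.startswith(p) for p in keep)]

def t_pos(tokens):           # P: annotate the form with its POS tag
    return [replace(t, form=f"{t.form}/{t.pos}") for t in tokens]

def ngrams(tokens, n=2):
    forms = [t.form for t in tokens]
    return ["_".join(forms[i:i + n]) for i in range(len(forms) - n + 1)]

# Example: apply the chain L -> S -> A -> P and collect bigram features.
sent = [Token("The", "DT", 0.0, False), Token("battery", "NN", 0.0, True),
        Token("life", "NN", 0.0, True), Token("is", "VBZ", 0.0, False),
        Token("great", "JJ", 0.9, False)]
for transform in (t_lower, t_score, t_aspect, t_pos):
    sent = transform(sent)
print(ngrams(sent, 2))
```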
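The Parameter Optimisation and Final Model boxes describe a standard RBF-SVM grid search over (C, γ) with cross-validation, followed by retraining on the full training data. The sketch below implements that loop with scikit-learn; the grid values, fold count and toy data are placeholders, and the poster does not say which SVM implementation the authors used.

```python
# Sketch of the (C, gamma) grid search with k-fold cross-validation described
# under "Parameter Optimisation", using scikit-learn.  Grid values, fold count
# and the toy data below are placeholders, not the authors' settings.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def grid_search_svm(X, y, c_grid=(1, 10, 100), gamma_grid=(0.01, 0.1, 1.0), folds=5):
    best = (None, None, -1.0)
    cv = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    for C in c_grid:                       # for each grid point (C, gamma) ...
        for gamma in gamma_grid:
            scores = cross_val_score(SVC(C=C, gamma=gamma, kernel="rbf"),
                                     X, y, cv=cv, scoring="accuracy")
            avg = scores.mean()            # ... calculate average accuracy
            if avg > best[2]:              # ... pick the best C and gamma
                best = (C, gamma, avg)
    C, gamma, _ = best
    # Final model: train on the full training data with the chosen C and gamma.
    final_model = SVC(C=C, gamma=gamma, kernel="rbf").fit(X, y)
    return final_model, best

# Toy usage with random features and the four polarity classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = rng.choice(["positive", "negative", "neutral", "conflict"], size=200)
model, (C, gamma, acc) = grid_search_svm(X, y)
print(C, gamma, round(acc, 3))
```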