DCU: Aspect-based Polarity Classification for SemEval Task 4
Joachim Wagner, Piyush Arora, Santiago Cortes, Utsab Barman, Dasha Bogdanova,
Jennifer Foster and Lamia Tounsi
CNGL Centre for Global Intelligent Content, National Centre for Language Technology,
School of Computing, Dublin City University, Dublin, Ireland
Sub-Task B: Aspect Term Polarity Prediction
Input: "I have had so many problems with the computer." + “the computer”, position 34-45
Classifier: Choose one of the 4 classes positive, negative, neutral and conflict.
Output: negative
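Put differently, each instance is a sentence plus one marked aspect term, and the system must return one of the four labels. A minimal sketch of that data shape (class and field names are illustrative, not the poster's or the SemEval file format):

    from dataclasses import dataclass

    # One Sub-Task B instance: a sentence, an aspect term and its character span.
    @dataclass
    class AspectInstance:
        sentence: str
        aspect_term: str
        start: int   # 0-based character offset here; the poster quotes 34-45 (1-based)
        end: int

    LABELS = ["positive", "negative", "neutral", "conflict"]

    example = AspectInstance(
        sentence="I have had so many problems with the computer.",
        aspect_term="the computer",
        start=33,
        end=45,
    )
    # A Sub-Task B classifier maps each such instance to one of LABELS,
    # e.g. classify(example) -> "negative" for this sentence.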
System Architecture

Pre-processing
• Normaliser (see the sketch after this list):
  • signs, e.g. dashes
  • quotes
  • soft-hyphen
  • non-breakable space (U+00a0)
  • emoticons
  • text message jargon, e.g. U, gr8
  • tokenisation: PTB-style, escape round brackets, conflate other brackets [{<>}]
• Aspect Term Position Corrector (* changes due to normalisation)
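A minimal sketch of the kind of character normalisation and bracket handling listed above; the replacement table, jargon list and bracket conflation rule are illustrative guesses, not the system's actual rules:

    import re

    # Illustrative replacements for the phenomena the Normaliser targets:
    # dashes, curly quotes, soft hyphens and non-breakable spaces.
    CHAR_MAP = {
        "\u2013": "-", "\u2014": "-",   # en dash, em dash
        "\u2018": "'", "\u2019": "'",   # curly single quotes
        "\u201c": '"', "\u201d": '"',   # curly double quotes
        "\u00ad": "",                    # soft hyphen: drop
        "\u00a0": " ",                   # non-breakable space
    }
    JARGON = {"U": "you", "gr8": "great"}   # text message jargon examples from the poster

    def normalise(text: str) -> str:
        for src, tgt in CHAR_MAP.items():
            text = text.replace(src, tgt)
        return " ".join(JARGON.get(tok, tok) for tok in text.split())

    def escape_brackets(token: str) -> str:
        # PTB-style: escape round brackets; one plausible reading of "conflate
        # other brackets" is to map [, {, < and ], }, > to the same two symbols.
        token = token.replace("(", "-LRB-").replace(")", "-RRB-")
        token = re.sub(r"[\[{<]", "-LRB-", token)
        return re.sub(r"[\]}>]", "-RRB-", token)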
Feature Extraction
• Lexicon Combination: MPQA, SentiWordNet, General Inquirer, Bing Liu's Opinion Lexicon (see the sketch below)
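The poster does not say how the four lexicons are merged; a minimal sketch of one simple combination scheme (majority vote over per-word polarities), assuming each lexicon has already been loaded into a word-to-polarity dict:

    from collections import Counter

    def combine_lexicons(*lexicons: dict) -> dict:
        """Merge word -> polarity dicts by majority vote (ties broken arbitrarily)."""
        combined = {}
        for word in set().union(*lexicons):
            votes = Counter(lex[word] for lex in lexicons if word in lex)
            combined[word] = votes.most_common(1)[0][0]
        return combined

    # Usage, assuming mpqa, sentiwordnet, general_inquirer and bing_liu are
    # word -> polarity dicts built from the four resources named above:
    #   sentiment_words = combine_lexicons(mpqa, sentiwordnet, general_inquirer, bing_liu)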
• Constituent Parser
  • review grammars (SANCL 2012): varying amount of training data, number of seeds, number of cycles
  • Lorg product grammars G1, …, G12; G13, …, G24; …; G97, …, G108
  • Combine (Vote), Back-off (PTB grammar)
  • Combine, Back-off (flat tree)
  • Stanford Dependency Converter
• SLSeg Discourse Parser
• POS Extractor
• Rule-based Feature Extractor: token distance, discourse chunk distance, dependency path distance (see the sketch below)
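A minimal sketch of the simplest of the three distances, token distance, here read as the number of tokens between a sentiment word and the aspect term (an assumption; the poster does not define the endpoints). Discourse chunk and dependency path distances would substitute chunk indices from the SLSeg segmentation or path lengths over the converted Stanford dependencies:

    def token_distance_features(tokens, aspect_span, sentiment_words):
        """Token distance from each sentiment word to the aspect term.

        tokens          -- tokenised sentence
        aspect_span     -- (start, end) token indices of the aspect term, end exclusive
        sentiment_words -- words from the combined lexicon
        """
        a_start, a_end = aspect_span
        features = {}
        for i, tok in enumerate(tokens):
            if tok.lower() in sentiment_words and not (a_start <= i < a_end):
                dist = a_start - i - 1 if i < a_start else i - a_end
                features["dist_" + tok.lower()] = dist
        return features

    # Example: "problems" is 1 token away from "the computer" in the running example.
    toks = "I have had so many problems with the computer .".split()
    print(token_distance_features(toks, (7, 9), {"problems"}))   # {'dist_problems': 1}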
• Build XML (POS, sentiment words, aspect term)
• For each Transformation: N-gram Extractor, then Feature Selector / Filter (sketched below)
  Transformations: A = aspect, L = lowercase, S = score, R = restrict to certain POS, P = annotate POS

Table 1: N-Gram Feature Example
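A minimal sketch of the transformation-then-n-gram idea for two of the transformations, A (replace the aspect term with a placeholder) and L (lowercase); the placeholder string and feature names are illustrative:

    def transform_AL(tokens, aspect_span):
        """Apply transformations A and L: mask the aspect term, lowercase the rest."""
        a_start, a_end = aspect_span
        out = []
        for i, tok in enumerate(tokens):
            if a_start <= i < a_end:
                if i == a_start:
                    out.append("<ASPECT>")       # illustrative placeholder
            else:
                out.append(tok.lower())
        return out

    def ngrams(tokens, n):
        return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    # Binary unigram and bigram features over the transformed running example;
    # a frequency threshold (as in the grid search figure) would then filter them.
    toks = "I have had so many problems with the computer .".split()
    transformed = transform_AL(toks, (7, 9))     # "the computer" is tokens 7-8
    features = {"ng:" + g: 1 for n in (1, 2) for g in ngrams(transformed, n)}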
Parameter Optimisation
• For each grid point (C, γ):
  • For each cross-validation fold: train SVM on data \ fold, evaluate accuracy on fold
  • Calculate average accuracy
• Pick best C and γ

Final Model
• Train SVM on full training data with given C and γ

Making Predictions
• Collect SVM predictions for test data (the three steps are sketched below)
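A minimal sketch of the parameter optimisation, final model and prediction steps using scikit-learn as a stand-in (the poster does not name the SVM implementation); the feature matrices and grid points here are placeholders:

    import numpy as np
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Placeholder feature matrices and labels standing in for the real extracted features.
    rng = np.random.default_rng(0)
    X_train = rng.random((40, 5))
    y_train = np.array(["positive", "negative", "neutral", "conflict"] * 10)
    X_test = rng.random((10, 5))

    # Grid over the RBF-kernel hyperparameters C and gamma; the grid points are illustrative.
    search = GridSearchCV(
        SVC(kernel="rbf"),
        {"C": [1, 10], "gamma": [0.1, 1]},
        scoring="accuracy",
        cv=5,            # 5-fold cross-validation: average fold accuracy per grid point
    )
    search.fit(X_train, y_train)             # picks the best (C, gamma)

    # Final model: refit on the full training data with the best parameters
    # (GridSearchCV does this by default), then collect predictions for the test data.
    predictions = search.best_estimator_.predict(X_test)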
Grid Search Example
[Figure: colour-coded accuracy over the (C, γ) grid; visible axis ticks C = 1, C = 10 and γ = 0.1, γ = 1; accuracy colour scale roughly 68% to 73%. Laptop data, feature frequency threshold 9.]

Results
Table 2: Accuracy on training (5-fold cross-validation) and test sets

Error Analysis
• Sentiment not expressed explicitly:
  "The sushi is cut on blocks bigger than my cell phone."
  "Try the rose roll (not on menu)."
• Non-obvious expression of negation:
  "The Management was less than accomodating [sic]."
  "Only 2 usb ports ... seems kind of ... limited"
• Conflict cases: the training data contains too few examples of conflict sentences for the system to learn to detect them.
• SVM beats rule-based:
  "I charge it at night and skip taking the cord with me because of the good battery life."
  "The gnocchi literally melts in your mouth!"
• Rule-based beats SVM:
  "The chocolate raspberry cake is heavenly - not too sweet, but full of flavor."
This research is supported by the Science Foundation Ireland (Grant 12/CE/I2267) as part of CNGL (www.cngl.ie)