Using WordNet to Improve Reordering in Hierarchical
Statistical Machine Translation
Arefeh Kazemi*, Antonio Toral†, Andy Way†
Department of Computer Engineering*
University of Isfahan
Isfahan
Iran
ADAPT Centre†
School of Computing
Dublin City University
Dublin 9, Ireland
GWC 2016
Bucharest, 27-30 January 2016
Statistical Machine Translation (SMT)
• Parallel corpora are available for several language pairs
• Statistical Machine Translation: a data-driven approach to machine translation
  • Basic idea: use a parallel corpus as a training set of translation examples
  • Try to learn how to translate from past translation examples
English sentences
A computer is a general purpose device
Their parents were watching the news when it was raining.
We will be sitting in the class and studying tomorrow morning.
Some of you aren’t used to standing in front of the class but
don’t worry, gradually you will get used to it.
Farsi sentences
‫کامپیوتر یک وسیلهی همه منظوره است‬
.‫والدينشان وقتی باران میآمد داشتند اخبار تماشا ميكردند‬
.‫فردا صبح در کالس نشستهايم و داريم درس مي خوانيم‬
‫بعض ي از شماعادت نداريد جلوي كالس بايستيد اما نگران نباشيد كم كم عادت‬
.‫خواهيد كرد‬
2/20
Statistical Machine Translation
• Generative story:
  • Segment the source sentence into phrases
  • Find the translation of each phrase
  • Order the translated phrases to make the target sentence
3/20
Statistical Machine Translation
• Many translation hypotheses:
  • Many sentence segmentations
  • Many candidate phrase translations
  • Many orders: n! possible permutations to order n phrases (tiny example below)
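As a purely illustrative Python sketch of that factorial growth (the phrases are made up): with only three translated phrases whose order is still open, there are already 3! = 6 candidate target sentences.

```python
from itertools import permutations

# Three already-translated phrases whose target-side order is still undecided (hypothetical).
phrases = ["the brown fox", "jumped", "over the lazy dog"]

# n! possible orderings; 3! = 6 here, and the count grows factorially with n.
for order in permutations(phrases):
    print(" ".join(order))
```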
[Figure: translation lattice for the Farsi sentence "Rubah ghahve az ruie sage tanbal paryd.", with candidate English translations for each source phrase, e.g. "The brown fox" / "A brown fox" / "Foxes" / "The sepia fox"; "from" / "on" / "over" / "in"; "the lazy dog" / "the lazy dogs"; "jumps" / "jumped" / "has jumped"]
• We should choose the best hypothesis
  • Define "best"
4/20
Statistical Machine Translation
• What is the "best" translation?
• Goal: translate the foreign sentence F into an English sentence E
• What makes a good translation hypothesis?
  • Faithfulness to the source sentence F: it transfers the meaning of the source sentence
  • Fluency: it is natural as an utterance in the target language E
  • A compromise between faithfulness and fluency
• Build probabilistic models of faithfulness and fluency:

  E_best = argmax_{E ∈ English sentences} faithfulness(F, E) · fluency(E)
         = argmax_{E ∈ English sentences} P(F | E) · P(E)

  where P(F | E) is the translation model and P(E) is the language model.
5/20
Statistical Machine Translation
E_best = argmax_{E ∈ English sentences} faithfulness(F, E) · fluency(E)
       = argmax_{E ∈ English sentences} P(F | E) · P(E)

• P(E): language model
  • Frequency of the English sentence E in English texts
  • P("small step") > P("little step")
• P(F | E): translation model
  • The probability of generating the foreign sentence F from the English sentence E
  • Cannot be computed directly from the parallel corpus
  • Decompose P(F | E) based on our generative story (toy scoring sketch below)
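As a concrete illustration of this scoring (a minimal sketch with invented probabilities, not the actual models), the snippet below combines a toy translation model and a toy language model in log space and picks the hypothesis that maximizes P(F | E) · P(E).

```python
import math

# Toy noisy-channel scoring; all probabilities are invented for illustration.
translation_model = {           # P(F | E): faithfulness
    "a small step": 0.020,
    "a little step": 0.018,
}
language_model = {              # P(E): fluency
    "a small step": 0.0009,
    "a little step": 0.0004,
}

def score(hypothesis):
    """log P(F | E) + log P(E): combine faithfulness and fluency in log space."""
    return math.log(translation_model[hypothesis]) + math.log(language_model[hypothesis])

# E_best = argmax_E P(F | E) * P(E)
print(max(translation_model, key=score))   # "a small step": faithful and more fluent
```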
6/20
Translation Model
• Decompose P(F | E) based on our generative story
  • Phrasal translation model:
      P_TM = ∏_{i=1}^{n} P_TM(f_i | e_i)
  • Reordering model:
    • Assigns a probability to each possible order
    • Orientation between pairs of source elements: Monotone and Swap
      P_R = ∏_{pair_j ∈ Pairs} P_R(Ori | pair_j)
    [Figure: source phrases f1-f4 aligned to target phrases e1-e4, illustrating the Monotone and Swap orientations]
• The reordering model is combined with the other probabilities to find the best translation (toy combination sketch below):
      E_best = argmax_{E ∈ English sentences} P(E) · P_R · P_TM
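The following minimal sketch shows how the decomposed score of one hypothesis could be assembled from these pieces; the phrase probabilities, orientation probabilities and language model score are all invented for illustration.

```python
import math

phrase_probs = [0.4, 0.3, 0.5, 0.2]    # P_TM(f_i | e_i) for each phrase pair (invented)
orientation_probs = [0.7, 0.6, 0.9]    # P_R(Ori | pair_j) for each constituent pair (invented)
p_lm = 0.001                           # P(E), language model score (invented)

p_tm = math.prod(phrase_probs)         # P_TM = prod_i P_TM(f_i | e_i)
p_r = math.prod(orientation_probs)     # P_R  = prod_j P_R(Ori | pair_j)

# Score of this hypothesis in E_best = argmax_E P(E) * P_R * P_TM
print(p_lm * p_r * p_tm)
```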
7/20
State of the art Reordering models
P_R = ∏_{pair_j ∈ Pairs(E)} P_R(Ori | pair_j)

• Use machine learning algorithms to find the probability
• Structure: assign probability to different types of source elements
• Features: use different features to learn the probability
8/20
State of the art - Structures
• Adjacent phrase pairs [Huck et al., 2013]
• Pairs of words [Huang et al., 2013]
• Predicate-argument structure [Xiong et al., 2012; Li et al., 2013]
• Head and dependant [Quirk, 2005; Gao et al., 2011]
• Dependant words [this work]
9/20
State of the art - Structures
• Motivation for using the dependency parse tree:
  • Different source sentences with the same dependency structure have the same order in the target

English pattern: "subj" put "obj" on "prep-on".
Farsi pattern: "subj" "obj" ra ruie "prep-on" gozasht.

he puts the book on the table    →  ou ketab ra ruie miz gozasht
they put the desk on the ground  →  anha miz ra ruie zamin gozashtand
she put her hand on my shoulder  →  ou dast-ash ra ruie shane-am gozasht
10/20
State of the art - Features
Reordering model | Feature types | Features
Zens and Ney (2006) | Lexical | Surface forms of the source and target words; unsupervised class of the source and target words
Cherry (2013) | Lexical | Surface forms of frequent source and target words; unsupervised class of rare source and target words
Green et al. (2013) | Lexical, Syntactic | Surface forms of the source words; POS tags of the source words; relative position of the source words; sentence length
Bisazza and Federico (2013); Goto et al. (2013) | Lexical, Syntactic | Surface forms and POS tags of the source words; surface forms and POS tags of the source context words
Gao et al. (2011); Kazemi et al. (2015) | Lexical, Syntactic | Surface forms of the source words; dependency relations
This work | Lexical, Syntactic, Semantic | Surface forms of the source words; dependency relations; WordNet synsets of the source words
11/20
State of the Art – Our work
• Motivations to use semantic features
  • Machine learning point of view: generalization from words seen in the training data to any of their synonyms
  • Machine translation point of view: adding syntax is fine, but eventually we need to add semantics to improve MT
• This paper: "Using WordNet to Improve Reordering in Hierarchical Phrase-based Statistical Machine Translation"
• Special characteristic of the work:
  • While "semantic structures" such as PAS and SRL have previously been used for reordering in MT, this is the first work that uses "semantic features"
12/20
Proposed method: Dependency-based Reordering Model with
Semantic features
• Source dependency parse tree (head / dependant relations)
  [Figure: dependency parse of "The brown fox jumped over the lazy dog" (det, amod, subj, prep-over) aligned to "Rubah ghahve az ruie sage tanbal paryd.", with head-dependant pairs such as (the, fox), (brown, fox), (fox, jumped), (lazy, dog), (dog, jumped) and dependant-dependant pairs such as (fox, dog)]
• Try to predict the orientation of head-dependant and dependant-dependant pairs (extraction sketch below)
• Find the orientation from the word alignment between the source and target words
• Use a maximum entropy classifier to estimate the ordering probability:
    P_R = ∏_{pair_j ∈ Pairs(E)} P_R(Ori | pair_j)
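The sketch below illustrates one way the two kinds of constituent pairs could be enumerated from a dependency parse, using a hand-built parse of the example sentence; the indices and the helper function are illustrative, not the authors' implementation.

```python
from itertools import combinations

# Hand-built dependency parse of "The brown fox jumped over the lazy dog",
# given as dependant index -> head index (illustrative, roughly Stanford-style).
words = ["The", "brown", "fox", "jumped", "over", "the", "lazy", "dog"]
heads = {0: 2, 1: 2, 2: 3, 4: 3, 5: 7, 6: 7, 7: 4}

def extract_pairs(heads):
    """Return head-dependant pairs and dependant-dependant (sibling) pairs."""
    head_dep = [(h, d) for d, h in heads.items()]
    children = {}
    for d, h in heads.items():
        children.setdefault(h, []).append(d)
    dep_dep = [p for sibs in children.values() for p in combinations(sorted(sibs), 2)]
    return head_dep, dep_dep

hd, dd = extract_pairs(heads)
print([(words[h], words[d]) for h, d in hd])  # e.g. ('fox', 'The'), ('jumped', 'fox'), ...
print([(words[a], words[b]) for a, b in dd])  # e.g. ('The', 'brown'), ('fox', 'over'), ...
```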
13/20
Proposed method: Training phase
[Training pipeline, illustrated with the sentence pair "this is a sentence" / "in yek jomle ast":
  Parallel corpus (English and Farsi sentences)
    → word aligner (word alignment) and dependency parser (English dependency tree)
    → constituent pair extractor: (this, sentence), (is, sentence), (a, sentence), (this, is), (this, a), (is, a)
    → orientation extractor: ((this, sentence), M), ((is, sentence), S), ((a, sentence), M), ((this, is), M), ((this, a), M), ((is, a), S)
    → feature extractor
    → maximum entropy classifier]
(An orientation-extraction sketch for this example follows.)
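A sketch of the orientation-extraction step on the toy sentence pair above: a constituent pair is labelled Monotone (M) if its two source words keep their relative order on the target side, and Swap (S) otherwise. The word alignment below is an assumption chosen to match the example labels on the slide.

```python
# Source: "this is a sentence"  ->  Target: "in yek jomle ast"
# Assumed 1-to-1 word alignment (source index -> target index):
# this->in, is->ast, a->yek, sentence->jomle
alignment = {0: 0, 1: 3, 2: 1, 3: 2}

def orientation(pair, alignment):
    """Monotone if the pair keeps its source-side order on the target side, else Swap."""
    i, j = pair  # source word indices with i < j
    return "M" if alignment[i] < alignment[j] else "S"

pairs = [(0, 3), (1, 3), (2, 3), (0, 1), (0, 2), (1, 2)]
for p in pairs:
    print(p, orientation(p, alignment))
# Reproduces the example labels: (this,sentence) M, (is,sentence) S, (a,sentence) M,
# (this,is) M, (this,a) M, (is,a) S
```

The (pair, orientation) instances produced this way, together with the lexical, syntactic and synset features, are what the maximum entropy classifier is trained on.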
14/20
Experiments: Setup
• Parallel English-Farsi corpus: Mizan
  • Train: 1,016,758 sentence pairs
  • Tune: 3,000
  • Test: 1,000
• Source-side dependency parser: Stanford dependency parser
• Word alignment: GIZA++
• Pairs extracted from the training data:
  • Head-dependant: 6,391,255
  • Dependant-dependant: 5,247,133
• Baseline MT system: Moses implementation of the hierarchical phrase-based model with standard settings
15/20
MT Results
•
The impact of using two constituent pairs and also different features on BLEU and
TER scores
•
6 MT systems according to one constituent type with and without synset as
features
head-dep with surface(ℎ𝑑 − 𝑠𝑢𝑟𝑓𝑎𝑐𝑒) [Gao,2011]
• head-dep with synset(ℎ𝑑 − 𝑠𝑦𝑛𝑠𝑒𝑡)
• head-dep with surface and synset(ℎ𝑑 − 𝑏𝑜𝑡ℎ)
• dep-dep with surface(𝑑𝑑 − 𝑠𝑢𝑟𝑓𝑎𝑐𝑒) [Gao,2011]
• dep-dep with synset(𝑑𝑑 − 𝑠𝑦𝑛𝑠𝑒𝑡)
• dep-dep with surface and synset(𝑑𝑑 − 𝑏𝑜𝑡ℎ)
•
•
compare our systems to the standard HPB-SMT system
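As a rough illustration of the synset features, the sketch below backs a word off to its first WordNet synset using NLTK; the actual feature templates are not detailed on the slides, so this is only an assumption about their general shape.

```python
from nltk.corpus import wordnet as wn  # requires the NLTK WordNet data to be installed

def synset_feature(word, pos=wn.NOUN):
    """Use the first WordNet synset of a word as a feature, falling back to the
    surface form for out-of-WordNet words. Synonyms that share a synset get the
    same feature value, which is the generalization the synset features aim for."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0].name() if synsets else word

print(synset_feature("fox"))  # e.g. 'fox.n.01'
print(synset_feature("dog"))  # e.g. 'dog.n.01'
```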
16/20
MT Results
• All systems were evaluated with two automatic metrics: BLEU and TER
• We report average BLEU scores across three different tuning runs

System     | BLEU Avg | diff   | p-value | TER Avg | diff   | p-value
baseline   | 10.9     | -      | -       | 80.3    | -      | -
dd-surface | 11.4     | 4.58%  | 0.00    | 79.7    | -0.74% | 0.01
dd-syn     | 11.3     | 3.66%  | 0.01    | 79.8    | -0.62% | 0.05
dd-both    | 11.5     | 5.50%  | 0.00    | 79.8    | -0.62% | 0.02
hd-surface | 11.1     | 2.18%  | 0.08    | 80.9    | 0.74%  | 0.01
hd-syn     | 11.3     | 3.66%  | 0.00    | 80.5    | 0.24%  | 0.4
hd-both    | 11.1     | 2.18%  | 0.06    | 81.1    | 0.99%  | 0.00

• The scores obtained by our reordering model over pairs of dependants (dd-) are better than those of the baseline and of the hd- systems, on both evaluation metrics
• The use of semantic features based on WordNet synsets leads to better scores for both head-dep and dep-dep constituent pairs according to both evaluation metrics
  • except for the dd- system according to TER, with a slight but insignificant increase (79.8 vs. 79.7)
17/20
Conclusion
• A dependency-based reordering model for HPB-SMT that predicts the translation order of head-dep and dep-dep constituent pairs
• Uses semantic features based on WordNet synsets
• First paper on dependency-based reordering for a language pair other than Chinese-to-English
• The inclusion of WordNet synsets led to the best BLEU score in our experiments, outperforming the baseline by 0.6 points absolute
18/20
Future Work
• Investigate the extent to which the WordNet-informed approach presented here outperforms an unsupervised method based on clustering
• In-depth human analysis of the translations produced by our models, to gain further insight into the exact contribution of WordNet to the translation output
19/20
Thank you for your attention!
20/20