Using WordNet to Improve Reordering in Hierarchical Statistical Machine Translation
Using WordNet to Improve Reordering in Hierarchical Statistical Machine Translation
Arefeh Kazemi*, Antonio Toral†, Andy Way†
* Department of Computer Engineering, University of Isfahan, Isfahan, Iran
† ADAPT Centre, School of Computing, Dublin City University, Dublin 9, Ireland
GWC 2016, Bucharest, 27-30 January 2016

Statistical Machine Translation (SMT)
• Parallel corpora are available in several language pairs
• Statistical Machine Translation: a data-driven approach to machine translation
• Basic idea: use a parallel corpus as a training set of translation examples
• Try to learn how to translate from past translation examples
• Example English sentences with their Farsi counterparts:
  - A computer is a general purpose device / کامپیوتر یک وسیله‌ی همه منظوره است
  - Their parents were watching the news when it was raining. / والدينشان وقتی باران می‌آمد داشتند اخبار تماشا مي‌كردند.
  - We will be sitting in the class and studying tomorrow morning. / فردا صبح در کلاس نشسته‌ايم و داريم درس مي‌خوانيم.
  - Some of you aren't used to standing in front of the class but don't worry, gradually you will get used to it. / بعضی از شما عادت نداريد جلوی كلاس بايستيد اما نگران نباشيد كم كم عادت خواهيد كرد.

Statistical Machine Translation
• Generative story:
  1. Segment the source sentence into phrases
  2. Find the translation of each phrase
  3. Order the translated phrases to make the target sentence

Statistical Machine Translation
• Many translation hypotheses:
  - many sentence segmentations
  - many candidate phrase translations
  - many orders: n! possible permutations to order n phrases
[Figure: translation options for the Farsi source "Rubah ghahve az ruie sage tanbal paryd.", showing candidate phrase translations such as "the brown fox" / "a brown fox" / "the sepia fox" / "foxes"; "over" / "on" / "from" / "in"; "the lazy dog" / "lazy dogs"; "jumps" / "jumped" / "jump" / "has jumped"]
• We should choose the best hypothesis
• Define "best" (the sketch below illustrates how quickly the order search space grows)
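As a concrete illustration of the factorial growth of the reordering search space, here is a minimal Python sketch; the three phrases are invented for the example and are not from the paper's data.

```python
from itertools import permutations

# Toy example: 3 translated phrases can be ordered in 3! = 6 ways.
# With n phrases the decoder faces n! candidate orders, which is why
# a probabilistic reordering model is needed to rank them.
phrases = ["the brown fox", "jumped", "over the lazy dog"]
for order in permutations(phrases):
    print(" ".join(order))
```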
Statistical Machine Translation
• What is the "best" translation?
• Goal: translate a foreign sentence F into an English sentence E
• What makes a good translation hypothesis?
  - Faithfulness to the source sentence F: transfer the meaning of the source sentence
  - Fluency: natural as an utterance in the target language
  - The best translation is a compromise between faithfulness and fluency
• Build probabilistic models of faithfulness and fluency:
  E_best = argmax_{E ∈ English sentences} faithfulness(F, E) × fluency(E)
         = argmax_{E ∈ English sentences} P(F | E) × P(E)
  where P(F | E) is the translation model and P(E) is the language model

Statistical Machine Translation
• P(E), the language model
  - Estimated from how frequent the English word sequences of E are in English texts
  - e.g. P(small step) > P(little step)
• P(F | E), the translation model
  - The probability of generating the foreign sentence F from the English sentence E
  - Cannot be calculated directly from the parallel corpus
  - Decompose P(F | E) based on our generative story
(A toy language-model sketch follows.)
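To make the language-model intuition concrete, here is a toy bigram sketch. The miniature corpus and the unsmoothed relative-frequency estimate are assumptions made for illustration; real language models are trained on large corpora with smoothing.

```python
from collections import Counter

# A minimal bigram language-model sketch (illustrative, not the
# paper's LM): P(E) is approximated by a product of bigram
# probabilities estimated from monolingual English text.
corpus = "a small step for man , a small step forward".split()
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    # Relative-frequency estimate of P(w2 | w1); real LMs add smoothing.
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

# "small step" occurs in the toy corpus while "little step" does not,
# so P(small step) > P(little step), as on the slide.
print(bigram_prob("small", "step"), bigram_prob("little", "step"))
```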
Translation Model
• Decompose P(F | E) based on our generative story
• Phrasal translation model:
  P_TM = ∏_{i=1..n} P_TM(f_i | e_i)
[Figure: alignment of source phrases f1...f4 to target phrases e1...e4, showing monotone and swap orientations]
• Reordering model:
  - Assigns a probability to each possible order
  - Models the orientation (Monotone or Swap) between pairs of source elements:
  P_R = ∏_{pair_j ∈ Pairs} P_R(Ori | pair_j)
• The reordering model is combined with the other probabilities to find the best translation:
  E_best = argmax_{E ∈ English sentences} P(E) × P_R × P_TM

State of the art: Reordering models
  P_R = ∏_{pair_j ∈ Pairs(E)} P_R(Ori | pair_j)
• Use machine learning algorithms to estimate this probability
• Structure: which types of source elements the probability is assigned to
• Features: which features are used to learn the probability

State of the art: Structures
• Adjacent phrase pairs [Huck et al., 2013]
• Pairs of words [Huang et al., 2013]
• Predicate-argument structure [Xiong et al., 2012; Li et al., 2013]
• Head and dependent [Quirk, 2005; Gao et al., 2011]
• Dependent words [this work]

State of the art: Structures
• Motivation for using the dependency parse tree: different source sentences with the same dependency structure have the same order in the target language
• Pattern: "subj" put "obj" on "prep-on" → "subj" "obj" ra ruie "prep-on" gozasht
  - he put the book on the table → ou ketab ra ruie miz gozasht
  - they put the desk on the ground → anha miz ra ruie zamin gozashtand
  - she put her hand on my shoulder → ou dast-ash ra ruie shane-am gozasht

State of the art: Features

| Reordering model | Feature types | Features |
|---|---|---|
| Zens and Ney, 2006 | Lexical | Surface forms of the source and target words; unsupervised classes of the source and target words |
| Cherry, 2013 | Lexical | Surface forms of frequent source and target words; unsupervised classes of rare source and target words |
| Green et al., 2013 | Lexical, syntactic | Surface forms of the source words; POS tags of the source words; relative positions of the source words; sentence length |
| Bisazza and Federico, 2013; Goto et al., 2013 | Lexical, syntactic | Surface forms and POS tags of the source words; surface forms and POS tags of the source context words |
| Gao et al., 2011; Kazemi et al., 2015 | Lexical, syntactic | Surface forms of the source words; dependency relations |
| This work | Lexical, syntactic, semantic | Surface forms of the source words; dependency relations; WordNet synsets of the source words |

State of the Art: Our work
• Motivations for using semantic features
  - Machine learning point of view: generalization from words seen in the training data to any of their synonyms
  - Machine translation point of view: adding syntax helps, but eventually we will need to add semantics to improve MT
• This paper: "Using WordNet to Improve Reordering in Hierarchical Phrase-based Statistical Machine Translation"
• Special characteristic of the work: while semantic structures such as PAS and SRL have been used for reordering in MT before, this is the first work that uses semantic features

Proposed method: Dependency-based Reordering Model with Semantic Features
• Source dependency parse tree
[Figure: dependency parse of "The brown fox jumped over the lazy dog" (relations det, amod, subj, prep-over) aligned to the Farsi sentence "Rubah ghahve az ruie sage tanbal paryd."; head-dependent pairs such as (brown, fox), (fox, jumped), (lazy, dog), (dog, jumped), and the dependent-dependent pair (fox, dog)]
• Try to predict the orientation of head-dependent and dependent-dependent pairs
• Find the orientation from the word alignment between source and target words
• Use a maximum entropy classifier to estimate the ordering probability:
  P_R = ∏_{pair_j ∈ Pairs(E)} P_R(Ori | pair_j)

Proposed method: Training phase
• Pipeline over the parallel corpus, illustrated with the sentence pair "this is a sentence" / "in yek jomle ast":
  1. Parse the English side with a dependency parser and word-align the sentence pair with a word aligner
  2. Constituent pair extractor: (this, sentence), (is, sentence), (a, sentence), (this, is), (this, a), (is, a)
  3. Orientation extractor: ((this, sentence), M), ((is, sentence), S), ((a, sentence), M), ((this, is), M), ((this, a), M), ((is, a), S)
  4. Feature extractor
  5. Train a maximum entropy classifier
(Simplified sketches of these steps follow.)
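The sketch below reproduces the slide's worked example of constituent pair extraction and orientation labelling. The data structures (index pairs, an alignment dictionary) are assumptions made for the sketch rather than the paper's actual code, but the M/S labels it prints match those on the slide.

```python
# Simplified sketch of steps 2-3 of the training pipeline.
# Sentence pair: "this is a sentence" -> "in yek jomle ast"

# Dependency tree as (dependent_index, head_index) pairs; the head word
# is "sentence" (index 3).
head_dep_pairs = [(0, 3), (1, 3), (2, 3)]   # (this,sentence),(is,sentence),(a,sentence)
dep_dep_pairs = [(0, 1), (0, 2), (1, 2)]    # (this,is),(this,a),(is,a)

# Word alignment: source index -> target index
alignment = {0: 0, 1: 3, 2: 1, 3: 2}        # this->in, is->ast, a->yek, sentence->jomle

def orientation(i, j, align):
    """Label a source pair Monotone (M) if the aligned target words keep
    the source order, Swap (S) if they invert it."""
    return "M" if (align[i] < align[j]) == (i < j) else "S"

for i, j in head_dep_pairs + dep_dep_pairs:
    print((i, j), orientation(i, j, alignment))
```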
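For the feature extraction step, the paper adds WordNet synsets of the source words. How a synset is encoded as a feature is not specified on the slides; a plausible sketch using NLTK's WordNet interface follows, where the first-sense heuristic and the surface-form back-off are assumptions.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

def synset_feature(word, pos=wn.NOUN):
    """First-sense WordNet synset name as a feature (an assumed encoding;
    the slides only say synsets of the source words are used)."""
    synsets = wn.synsets(word, pos=pos)
    return synsets[0].name() if synsets else word  # back off to surface form

# Words unseen with a given head at training time can still share a
# synset with seen words, letting the classifier generalize across
# synonyms instead of memorizing surface forms.
print(synset_feature("book"))   # e.g. 'book.n.01'
print(synset_feature("desk"))   # e.g. 'desk.n.01'
```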
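Finally, the orientation classifier. The paper trains a maximum entropy classifier; multinomial logistic regression is the same model, so scikit-learn's LogisticRegression can stand in for it here. The feature dictionaries are simplified stand-ins for the paper's surface-form, dependency-relation, and synset features.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

# Tiny stand-in training set: one feature dict per constituent pair,
# labelled with its orientation (M or S).
X_dicts = [
    {"head": "sentence", "dep": "this", "rel": "nsubj", "syn": "sentence.n.01"},
    {"head": "sentence", "dep": "is",   "rel": "cop",   "syn": "sentence.n.01"},
    {"head": "sentence", "dep": "a",    "rel": "det",   "syn": "sentence.n.01"},
]
y = ["M", "S", "M"]

vec = DictVectorizer()
X = vec.fit_transform(X_dicts)

# Logistic regression is equivalent to a maximum entropy classifier.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# P_R(Ori | pair): the per-pair orientation distribution that the decoder
# multiplies over all constituent pairs of a hypothesis.
pair = vec.transform([{"head": "sentence", "dep": "this", "rel": "nsubj",
                       "syn": "sentence.n.01"}])
print(dict(zip(clf.classes_, clf.predict_proba(pair)[0])))
```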
Experiments: Setup
• Parallel English-Farsi corpus: Mizan
  - Train: 1,016,758 sentence pairs
  - Tune: 3,000
  - Test: 1,000
• Source-side dependency parser: Stanford dependency parser
• Word alignment: GIZA++
• Pairs extracted from the training set:
  - Head-dependent: 6,391,255
  - Dependent-dependent: 5,247,133
• Baseline MT system: Moses implementation of the hierarchical phrase-based model with standard settings

MT Results
• We measure the impact of the two constituent pair types and of the different features on BLEU and TER scores
• Six MT systems, one per constituent type, with and without synsets as features:
  - head-dep with surface forms (hd-surface) [Gao et al., 2011]
  - head-dep with synsets (hd-syn)
  - head-dep with surface forms and synsets (hd-both)
  - dep-dep with surface forms (dd-surface) [Gao et al., 2011]
  - dep-dep with synsets (dd-syn)
  - dep-dep with surface forms and synsets (dd-both)
• We compare our systems to the standard HPB-SMT system

MT Results
• All systems were evaluated with two automatic metrics: BLEU and TER
• We report average scores across three different tuning runs

| System | BLEU avg | BLEU diff | p-value | TER avg | TER diff | p-value |
|---|---|---|---|---|---|---|
| baseline | 10.9 | - | - | 80.3 | - | - |
| dd-surface | 11.4 | 4.58% | 0.00 | 79.7 | -0.74% | 0.01 |
| dd-syn | 11.3 | 3.66% | 0.01 | 79.8 | -0.62% | 0.05 |
| dd-both | 11.5 | 5.50% | 0.00 | 79.8 | -0.62% | 0.02 |
| hd-surface | 11.1 | 2.18% | 0.08 | 80.9 | 0.74% | 0.01 |
| hd-syn | 11.3 | 3.66% | 0.00 | 80.5 | 0.24% | 0.4 |
| hd-both | 11.1 | 2.18% | 0.06 | 81.1 | 0.99% | 0.00 |

• The scores obtained by our reordering model between pairs of dependents (dd-) are better than those of the baseline and of the hd- systems on both evaluation metrics
• The use of semantic features based on WordNet synsets leads to better scores for both head-dep and dep-dep constituent pairs on both evaluation metrics, except for the dd- system on TER, where there is a slight but insignificant increase (79.8 vs 79.7)

Conclusion
• A dependency-based reordering model for HPB-SMT that predicts the translation order of head-dep and dep-dep constituent pairs
• Uses semantic features based on WordNet synsets
• First paper on dependency-based reordering for a language pair other than Chinese-to-English
• The inclusion of WordNet synsets led to the best BLEU score in our experiments, outperforming the baseline by 0.6 points absolute

Future Work
• Investigate the extent to which the WordNet-informed approach presented here outperforms an unsupervised method based on clustering
• In-depth human analysis of the translations produced by our models, to gain further insight into the exact contribution of WordNet to the translation output

Thank you for your attention!