milano - ACORN Aston Corpus Network
Strategies, norms or universals? Investigating variation in translation
Silvia Bernardini, University of Bologna, Italy
[email protected]
Aston Corpus Symposium 2009

Resuming…
• Last year’s talk:

Theoretical background
• Target-oriented approach to the study of translation (Toury 1995)
• Focus on the TT within its context of fruition
• Identification of norms and laws of translation, e.g.
  – Law of growing standardisation
    » More frequent target language options are preferred
  – Law of interference
    » Source text linguistic features are transferred onto the target text
• Descriptive rather than prescriptive/pedagogic focus
• Corpus-based approach to the study of translation (Baker 1993, Olohan 2004)

Theoretical background
“the most important task that awaits the application of corpus techniques in translation studies […] is the elucidation of the nature of translated text as a mediated communicative event. In order to do this, it will be necessary to develop tools that will enable us to identify universal features of translation, that is features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems”. (Baker 1993: 243)

Theoretical background
• Tools
  – Monolingual comparable corpora
    • Originals in language A and translations into the same language from 1 or more other languages
• Universal features (hypothesised)
  – e.g.: explicitness, simplification, disambiguation, preference for conventional grammar, avoidance of repetition, normalisation…
• Types of observations
  – Lower % of content vs. grammatical words (Laviosa 1998)
  – Fewer contractions (Olohan 2003)
  – Fewer TL-specific “unique items” (Tirkkonen-Condit 2004)
  – …

Summary of old study: corpora
• 2 small monolingual comparable corpora of fiction text samples
  – One in English (original and translated from It)
  – One in Italian (original and translated from En)
• 2 small parallel corpora
  – The translations from the corpora above, aligned to their source texts
+ Reference corpora of English and Italian

Summary of old study: method
1. Collect token frequencies from reference corpora for all candidate collocation types observed in monolingual comparable corpora
2. Rank (MI/Fq) and compare rankings (Mann-Whitney rank test)
3. For significantly different rankings, analyse translation shifts at parallel level

Summary of old study: findings
• MCC analysis:
  – Translated fiction texts (Italian and English) tend to be (overall) richer in collocations than original texts in the same language
• Parallel analysis:
  – Confirms that differences are due to translation shifts rather than unrelated variables
• The data provide support for the law of growing standardisation

Moving on: technical translation
• Are results re: translation norms and strategies observed in fiction corpora confirmed by analyses of technical translation corpora?
• i.e., is there (more) evidence of
  – Growing standardisation or
  – Interference
• In translations compared to (comparable) originals?

Choosing an LSP: Perl documentation
• Practical Extraction and Report Language
• Popular programming language
• Most communication happens in English
• Efforts to produce documentation (original and translated) in Italian
  – Winning more people to the cause

Why Perl?
• Initial stimulus: technical translation course at SSLMIT (1 year of MA)
  – pod2it project
• Very favourable authentic conditions, near-experimental
  – Neatly delimited topic/discourse community
  – Both originals and translations drafted by area experts (not linguists)

Originals (En) and translations (It) (e.g.): perl pods

NAME
perlboot - Beginner's Object-Oriented Tutorial

NOME
perlboot - Introduzione alla tecnologia Orientata agli Oggetti (titolo originale: Beginner's Object-Oriented Tutorial)

DESCRIPTION
If you're not familiar with objects from other languages, some of the other Perl object documentation may be a little daunting, such as perlobj, a basic reference in using objects, and perltoot, which introduces readers to the peculiarities of Perl's object system in a tutorial way.

DESCRIZIONE
Se non avete già una certa familiarità con la tecnologia ad oggetti degli altri linguaggi di programmazione, parte della documentazione sulla OOP in Perl potrebbe essere un po' intimidatoria: perlobj, una guida di riferimento sull'utilizzo degli oggetti e perltoot che introduce il lettore alle particolarità della tecnologia ad oggetti del Perl con un taglio introduttivo.

Italian originals (e.g.)

Method
• Corpus design
  – Monolingual component
    1. Translated Italian texts (PERLTRIT)
    2. Original Italian texts (PERLORIT)
  – Parallel component
    • English source texts of the translated component (PERLOREN)
    • Translated Italian texts (PERLTRIT)

The perl corpus
              PERLOREN             PERLORIT            PERLTRIT
              Original English     Original Italian    Translated Italian
              (STs of PERLTRIT)    (comparable)        (TTs of PERLOREN)
tokens        298,346              305,537             321,405
types         18,639               22,495              22,768
texts         43                   89                  43
authors       16                   30                  ---
translators   ---                  ---                 11

Corpus preparation
• Download texts (plain txt)
• Record relevant meta-data (readme file)
  – url, author, author’s cv, notes
• Tag and lemmatise (TreeTagger)
• Align parallel component (EasyAlign)
• Index with the CWB

Assembling evidence
• Research question
  – Translated fiction texts (Italian and English) show evidence of growing standardisation (at the collocational level)
  – Universal or norm/law-governed?
  – What happens in technical translation?
Evidence of standardisation => support for the “universality” hypothesis
Evidence of interference => support for the “norm/law” hypothesis

Assembling evidence
• Look for differences between originals and translations in Italian that:
  – could be interpreted as a consequence of either interference or standardisation
  – are not (likely to be) the result of unrelated variables
  – are sufficiently frequent in this technical field to allow confident judgement

Case study: borrowings and calques
• English words
• New Italian words based on English terms or new senses derived from English “false friends”
• English morphosyntactic marks (plural)
• More frequent in
  1. originals or
  2. translations?

Case study: borrowings and calques
If 1, then translators could be seen as conforming to TL “normal” use more than original authors of comparable texts => standardisation
If 2, then translators could be hypothesised to be more subject to interference from the SL than original authors of comparable texts => interference

Identifying foreign/calqued words in corpora
1. Keywords – each corpus is used in turn as a reference corpus
   a. All words (to identify borrowings)
   b. Verbs only (to identify calques)
2. Words ending in –s
   • To compare use of non-Italian morphological marks (unadapted borrowings)

1a Keyword analysis: all words
• Use one corpus as a reference corpus to highlight words that are significantly more frequent in the other
• Define what counts as a keyword
  – Cut-off point: 5
  – Log-likelihood ordering
  – Top 100 types
• Browse lists, select potential keyborrowings, check concordances

Problems
• Most “keywords” identify topics – that’s what keywords are meant to do after all
• Some signal differences between English/Italian writing strategies or possibly slight genre differences
• For instance…

PERLTRIT (LL, keyword)
178.4 package
148.2 match*
 94.6 char
 87.7 filehandle
 83.7 locale
 83.3 require
 72.3 unpack
 66.9 socket
 66.9 shift
PERLTRIT (cont’d)
 65.6 local
 63.7 buffer
 54.9 point
 54.4 record
 53.4 long
 51.7 pack
 50.5 thread
 48.6 Encode
 46.5 pipe

PERLORIT (LL, keyword)
131.0 script
130.7 expression
123.2 regular
118.7 array
 75.0 overloading
 54.1 print
 50.7 reference
 37.5 matching*
 34.1 Hello

More borrowings in translated Italian than in original Italian…?

Looking closer: PERLTRIT
• Unrelated variables
  – Larger amount of code text
    • char, filehandle, shift, require, (un)pack
  – Different topics
    • locale, encode, (code) point, long
  – Morphological differences
    • match/matching
  – Dubious cases
    • socket, buffer, record, thread, pipe

Alternatives?
1. Socket
   – “…anche chiamato zoccolo, è una tipologia di connettore utilizzata in elettronica”
   – Zoccolo: 0 occurrences in corpus
2. Buffer
   – “…letteralmente tampone: in italiano, memoria tampone o anche intermediaria, di transito”
   – Tampone, intermediaria, di transito: 0 occurrences in corpus
3. Record
   – “In informatica il record è un oggetto di un database strutturato in dati che contiene un insieme di campi o elementi, ciascuno dei quali possiede nome e tipo propri.”
4. Thread
   – “Un thread o thread di esecuzione è una suddivisione di un programma in due o più task che vengono eseguiti in modo concorrente.”
5. Pipe
   – “Nei sistemi operativi una pipe è uno degli strumenti disponibili per far comunicare tra loro dei processi.”
(Definitions from Wikipedia)

One candidate left: package
            package   %      pacchetto   %      package + pacchetto   %
PERLTRIT    357       78.8   96          21.1   453                   100
PERLORIT    81        84.3   15          15.6   96                    100
In fact, if anything, translations would seem to show a slight preference for “pacchetto” compared to original texts

Looking closer: originals…
PERLORIT (LL, keyword)
131.0 script
130.7 expression
123.2 regular
118.7 array
 75.0 overloading
 54.1 print
 50.7 reference
 37.5 matching*
 34.1 Hello

PERLORIT: regular expression
            regular expression   %      espressione regolare   %      reg. expr. + espr. reg.   %
PERLORIT    109                  50.9   105                    49.0   214                       100
PERLTRIT    10                   5.9    157                    94.0   167                       100
Searches: [word="regular" %cd] [word="expressions?" %cd]; [lem="espressione" %cd] [lem="regolare" %cd];

PERLORIT: reference
            reference   %      riferimento   %      reference + riferimento   %
PERLORIT    88          38.2   142           61.7   230                       100
PERLTRIT    19          3.9    464           96.0   483                       100
Searches: [word="references?" %cd]; [lem="riferimento" %cd];

PERLORIT: Hello world vs Ciao mondo
            hello   %      ciao   %      hello + ciao   %
PERLORIT    31      31     69     69     100            100
PERLTRIT    1       2.2    43     97.7   44             100
Searches: [word="hello?" %cd]; [word="ciao" %cd];

Summing up: 1a borrowings (all)
• The translated corpus contains more keyborrowings than the original corpus
• However, in most cases this is due to topic differences
• In no cases could we identify English words found in the translated corpus with alternative Italian renderings favoured in the original corpus
• On the other hand, at least 4 out of 8 keyborrowings found in the original corpus have alternative Italian renderings favoured in the translated corpus

1b Calqued verbs
• Verbs that are significantly more frequent in PERLORIT than in PERLTRIT and vice versa
  – Cut-off point: 2
  – Log-likelihood ordering
  – Top 100 types
• Separate searches for:
  – Lemmas that are “unknown” to the tagger
    • To search for real calques
  – Lemmas that are “not unknown” to the tagger
    • To search for existing Italian verbs with calqued meanings

Results
PERLORIT
  known lemma:   ritornare (fq: 90, LL: 35.9); processare (fq: 26, LL: 15.6)
  unknown lemma: cicliamo (fq: 2, LL: 3.3); cicla (fq: 2, LL: 3.3); splittare (fq: 3, LL: 4.9)
PERLTRIT
  known lemma:   uccidere (fq: 6, LL: 8.3)
  unknown lemma: 0

PERLTRIT: uccidere (un processo) (kill (a process))
PERLTRIT> [lem="uccidere"];
1. <perlfaq8>: il segnale che ha <ucciso> il processo
   -->perloren: the signal the process died from
2. <perlfork>: <Uccidere> il processo genitore
   -->perloren: Killing the parent process
3. <perlfork>: genitore viene <ucciso>(usando la funzione kill( )
   -->perloren: process is killed (either using Perl's kill( ) builtin
4. <perlipc>: {HUP} ad 'IGNORE' per evitare di <uccidere> sé stesso)
   -->perloren: $SIG{HUP} to IGNORE so it doesn't kill itself)
5. <perlipc>: "fork( )" e "exec( )", ed <uccidere> i processi figli
   -->perloren: fork( ) and exec( ), and kill the errant child process.
6. <perlthrtut>: probabilmente si bloccherà finché non lo <uccidete>.
   -->perloren: This program will probably hang until you kill it.

kill + inanimate object in ukWaC-01: game (14), process (2), security (2), NHS (2), soul (2), flu (2), time (2), …
uccidere + inanimate object in itWaC3-01: musica (5, music), speranza (5, hope), amore (4, love), concorrenza (3, competition), innocenza (3, innocence), percezione (3, perception), realtà (3, reality), …

PERLORIT: ritornare (selected) (return)
1. <corso>: testuale mentre exit <ritorna> solo un codice nume
2. <Dalla_shell_al_web>: ript; <ritornando> poi la struttura re
3. <frameperl>: Tale funzione <ritorna> 0 sei il comando è
4. <frameperl>: exec che però non <ritorna> alcun valore. La
5. <javaperl>: metodo / accept( )/ <ritorna> una istanza della
6. <javaperl>: la funzione <ritornerebbe> un valore vero per
7. <mb_corso_perl_5_print>: slash ( \ ) <ritorna> una reference
8. <mostraLezione.php_puglisi>: iavi e le <ritorna> assemblate
9. <Perl_Tutorial>: ) ; viene <ritornato> vero A dire il vero
10. <Perl_Tutorial>: L' espressione $cibo[ 2 ] <ritorna> uva.
NB: [lem="ritornare"] [pos="N.*"]: Fq PERLORIT 16, Fq PERLTRIT 0
Overall: Fq PERLORIT 90, Fq PERLTRIT 28
Alternatives: restituire, produrre, …

PERLTRIT: ritornare (selected) (return)
1. <scopo_dello_scope>: il seme, e <ritorna> il risultato
2. <perlboot>: classe per <ritornare> a questo package.
3. <perlembed>: esaminare i valori <ritornati>, avrete
4. <perlfaq>: mai exec( ) non <ritorna>? Si possono fare
5. <perlfaq6>: di matching <ritorna> le coppie che ha tr
6. <perlfaq9>: he gli errori fatali <ritornino> al browser
7. <perlfork>: processo; il figlio <ritorna> dalla fork( )
8. <perlfunc>: di sistema e non <ritorna>, usate "system"
9. <perlfunc>: ESPR return <Ritorna> da una subroutine ,
10. <perlipc>: ostra FIFO. chdir; <ritorna> a casa $FIFO =

PERLORIT: processare (selected) (process)
1. <coisson_puntata72>: nga adatta ad essere <processata> dalla shell dei
2. <eb_irc_check>: specificato , verrà <processato> dalla funzione on_l
3. <e_solo_fortuna_printable>: codice viene <processato> con un foglio
4. <introduzione_al_printable>: il software deve <processare> il testo
5. <mb_corso_perl_10_print>: truzioni, essa <processa> tutti gli elementi
6. <mb_corso_perl_10_print>: e di <processarlo> con il seguente cod
7. <mod_perl1tutorial_print>: infatti <processerà> tutte le direttive
8. <Perl_Tutorial>: che crei o comunque <processi> pagine html, sorge
9. <sostituire_ma_c_printable>: il nostro script <processa>, invece di
10. <tegels_usare_il_perl>: file di log viene <processata>. La variabile
Fq PERLORIT 26, Fq PERLTRIT 5
Alternatives: elaborare, manipolare…

PERLTRIT: processare (process)
PERLTRIT> [lem="processare"];
1. <perlfaq8>: poiché la shell <processa> le redirezioni
2. <perlfunc>: output vengono <processati> (consultate
3. <perlfunc>: a finire in $var <processa> la lista degl
4. <perlthrtut>: riato affinché venga <processato> . Una
5. <perlvar>: routine per <processare> gli avvertimenti

PERLORIT: ciclare (cycle)
PERLORIT> [word="cicl.*" & pos="V.*"];
1. <coisson_puntata71>: inviati); ora <cicliamo> sull' array
2. <garau_guida_perl>: consente di <ciclare> un determinato blo
3. <perl_tutorial_sciabarra>: il foreach <cicla> su un array e
4. <sostituire_ma_c_printable>: Perl <cicla> linea per linea e
5. <tegels_usare_il_perl>: aperto, <cicliamo> attraverso le sue
Fq PERLORIT 5, Fq PERLTRIT 0
Alternatives: iterare

PERLORIT: splittare (split)
PERLORIT> [word="splitt.*"];
1. <perl_valsesia>: in cui <splittare> il pattern.
2. <perl_valsesia>: si può voler <splittare> una linea
3. <soltanto_un_alt_printable>: <splittato> e passato
4. <Split_in_perl>: "<splittare>" cioè dividere una str
Fq PERLORIT 4, Fq PERLTRIT 0
Alternatives: dividere, separare

Summing up: calques
• The comparative analysis of key verbs in the original and in the translated subcorpora suggests that authors are more at ease with the use of English (technical) calques than translators.

2. -s words
1. Search for words ending in –s in original Italian and translated Italian (fq >1)
2. Select from output only plurals (unadapted borrowings) used (rather than quoted) in Italian discourse in the two subcorpora
3. Which corpus displays greater use of unadapted borrowings ending in –s?

Words ending in –s
Search: [word="[a-zA-Z][a-zA-Z]+-?[a-zA-Z]?s"];

PERLORIT: 95 types, 711 tokens
 1. warnings     69
 2. Windows      56
 3. unless       54
 4. Mongers      38
 5. Associates   31
 6. keys         20
 7. SomeClass    19
 8. files        16
 9. alias        15
10. Class        14
11. …

PERLTRIT: 144 types, 1000 tokens
 1. unless       85
 2. this         60
 3. bless        46
 4. alias        39
 5. exists       38
 6. threads      37
 7. Windows      36
 8. warnings     34
 9. Class        24
10. vars         24
11. …

Results from the PERLIT corpus
PERLORIT
Word          fq
files         16
subroutines   10
backquotes     6
scripts        4
forms          4
links          4
expressions    3
cookies        3
references     2

PERLTRIT
Word          fq
backticks      2
closures       1

PERLORIT: “files” (concordance displays for perlorit and perltrit; not preserved in the transcript)
PERLORIT: “forms” (concordance displays for perlorit and perltrit; not preserved in the transcript)

PERLTRIT: closures and backticks
1. <perlmod>: riguardo alle chiusure [<closures>, N.d.T.].
2. <perlref>: come le <closures> [ letteralmente " chiusure
1. <perlfaq8>: system( ) con quello dei <backticks> (`).
2. <perlfaq8>: uscita). I <backticks> (``) lanciano il coma
3. <perlfaq8>: shell, con i <backticks> ciò non è possibile.
Summing up: unadapted borrowings
• Despite superficial quantitative evidence (higher numbers of types and tokens for words ending in –s in the translated than in the original corpus), translators appear to disfavour unadapted borrowings ending in –s with respect to original authors

General conclusion
• Results of study 2 lend support to conclusions of study 1:
  – In both fiction translation and technical translation,
  – Despite differences in translator profile, translation “commission”, topic, genre, readership etc.,
  – And regardless of differences in methodological design/object of corpus study…

General conclusions
• The law of growing standardisation seems to predominate over the law of interference (in present-day translation practice between English and Italian etc. etc.)
• Two small steps toward the bottom-up identification of universal trends…

General conclusions
• The lessons to be learnt
  – Relying on superficial quantitative data in the search for translation universals can be very misleading
  – Insights and hypotheses should emerge from
    • the accumulation of results of (painstaking) analyses
    • conducted on closely comparable corpora,
    • checked against their parallel text component(s) and/or taking into account alternatives offered by the target language

Thank you

References
Baker, M. 1993. “Corpus linguistics and translation studies. Implications and applications”. In Baker, M., G. Francis and E. Tognini-Bonelli (eds.). Text and Technology. Benjamins. 233-250.
Laviosa, S. 1998. “Core patterns of lexical use in a comparable corpus of English narrative prose”. Meta 43(4). 557-570.
Olohan, M. 2003. “How frequent are the contractions? A study of contracted forms in the translational English corpus”. Target 15(1). 59-89.
Olohan, M. 2004. Introducing Corpora in Translation Studies. Routledge.
Pym, A. 2008. “On Toury's laws of how translators translate”. In Pym, A., M. Shlesinger and D. Simeoni (eds.). Beyond Descriptive Translation Studies. Benjamins. 311-328.
Tirkkonen-Condit, S. 2004. “Unique items – over- or under-represented in translated language?”. In Mauranen, A. and P. Kujamäki (eds.). Translation Universals. Benjamins. 177-184.
Toury, G. 1995. Descriptive Translation Studies and Beyond. Amsterdam: Benjamins.

Recent critiques
“Baker (1995: 235), re-affirmed by Olohan (2004: 43), argues that translations can be studied by comparing them with non-translations in the same language, without focusing on source texts or source languages. This means we can describe translational English in opposition to non-translational English, doing all the research on English. The result is perhaps the major methodological advance associated with corpus studies. It has many economic advantages: it cuts out all the bother of learning foreign languages and cultures; it controls numerous tricky variables associated with suspicions of linguistic and cultural relativism. In the English-only research on optional that, there is thus strictly no way of knowing about any kind of foreign interference causing the frequencies of the linguistic variable, since in principle the source texts are not in the corpus”. [Pym 2008, p. 14 of pre-print version]