milano - ACORN Aston Corpus Network

by user

on 06 July 2016

Strategies, norms or universals?
Investigating variation in translation
Silvia Bernardini
University of Bologna, Italy
Aston Corpus Symposium 2009
Resuming…
• Last year’s talk:
Theoretical background
• Target-oriented approach to the study of
translation (Toury 1995)
• Focus on the TT within its context of fruition
• Identification of norms and laws of translation, e.g.
– Law of growing standardisation
» More frequent target language options are preferred
– Law of interference
» Source text linguistic features are transferred onto the
target text
• Descriptive rather than prescriptive/pedagogic focus
• Corpus-based approach to the study of
translation (Baker 1993, Olohan 2004)
Theoretical background
“the most important task that awaits the
application of corpus techniques in translation
studies […] is the elucidation of the nature of
translated text as a mediated communicative
event. In order to do this, it will be necessary to
develop tools that will enable us to identify
universal features of translation, that is features
which typically occur in translated text rather than
original utterances and which are not the result of
interference from specific linguistic systems”.
(Baker 1993: 243)
Theoretical background
• Tools
– Monolingual comparable corpora
• Originals in language A and translations into the same language
from 1 or more other languages
• Universal features (hypothesised)
– e.g.: explicitness, simplification, disambiguation, preference for
conventional grammar, avoidance of repetition, normalisation…
• Types of observations
– Lower % of content vs. grammatical words (Laviosa 1998)
– Fewer contractions (Olohan 2003)
– Fewer TL-specific “unique items” (Tirkkonen-Condit 2004)
– …
Summary of old study: corpora
• 2 small monolingual comparable corpora
of fiction text samples
– One in English (original and translated from It)
– One in Italian (original and translated from En)
• 2 small parallel corpora
– The translations from the corpora above,
aligned to their source texts
+ Reference corpora of English and Italian
Summary of old study: method
1. Collect token frequencies from reference
corpora for all candidate collocation types
observed in monolingual comparable
corpora
2. Rank (MI/Fq) and compare rankings
(Mann-Whitney ranks test)
3. For significantly different rankings,
analyse translation shifts at parallel level
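The ranking-and-comparison step can be illustrated with a minimal Python sketch. It assumes that "MI" means pointwise mutual information computed from reference-corpus frequencies, and it implements only the Mann-Whitney U statistic (no p-value). The function names and the toy scores are illustrative and are not taken from the study.

```python
import math


def mutual_information(pair_fq, fq_w1, fq_w2, corpus_size):
    """Pointwise MI: log2 of observed over expected co-occurrence frequency."""
    expected = fq_w1 * fq_w2 / corpus_size
    return math.log2(pair_fq / expected)


def average_ranks(values):
    """Rank values from 1 upwards; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return the U statistic of each sample, computed from pooled ranks."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b


# Invented MI scores for collocations found in originals vs. translations.
mi_originals = [2.1, 3.4, 1.8, 4.0]
mi_translations = [3.9, 4.2, 5.1, 3.6]
print(mann_whitney_u(mi_originals, mi_translations))
```

A U value close to zero for one sample means that almost all of its scores rank below those of the other sample; significance would still have to be read off a table or computed with a statistics package.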
Summary of old study: findings
• MCC analysis:
– Translated fiction texts (Italian and English)
tend to be (overall) richer in collocations than
original texts in the same language
• Parallel analysis:
– Confirms that differences are due to translation
shifts rather than to unrelated variables
• The data provide support for the law of
growing standardisation
Moving on: technical translation
• Are results re: translation norms and
strategies observed in fiction corpora
confirmed by analyses of technical
translation corpora?
• i.e., is there (more) evidence of
– Growing standardisation or
– Interference
• In translations compared to (comparable)
originals?
Choosing an LSP
Perl documentation
• Practical Extraction and Report Language
• Popular programming language
• Most communication happens in English
• Efforts to produce documentation (original
and translated) in Italian
– Winning more people to the cause
Why perl?
• Initial stimulus: technical translation course
at SSLMIT (1 year of MA)
– pod2it project
• Very favourable authentic conditions, near-experimental
– Neatly delimited topic/discourse community
– Both originals and translations drafted by area
experts (not linguists)
Originals (En) and translations (It) (e.g.)
perl pods
English original:
NAME
perlboot - Beginner's Object-Oriented Tutorial
DESCRIPTION
If you're not familiar with objects from other languages, some of the other Perl object documentation may be a little daunting, such as perlobj, a basic reference in using objects, and perltoot, which introduces readers to the peculiarities of Perl's object system in a tutorial way.
Italian translation:
NOME
perlboot - Introduzione alla tecnologia Orientata agli Oggetti (titolo originale: Beginner's Object-Oriented Tutorial)
DESCRIZIONE
Se non avete già una certa familiarità con la tecnologia ad oggetti degli altri linguaggi di programmazione, parte della documentazione sulla OOP in Perl potrebbe essere un po' intimidatoria: perlobj, una guida di riferimento sull'utilizzo degli oggetti e perltoot che introduce il lettore alle particolarità della tecnologia ad oggetti del Perl con un taglio introduttivo.
Italian originals (e.g.)
Method
• Corpus design
– Monolingual component
1. Translated Italian texts (PERLTRIT)
2. Original Italian texts (PERLORIT)
– Parallel component
• English source texts of translated component (PERLOREN)
• Translated Italian texts (PERLTRIT)
The perl corpus
              PERLOREN             PERLORIT            PERLTRIT
              Original English     Original Italian    Translated Italian
              (STs of PERLTRIT)    (comparable)        (TTs of PERLOREN)
tokens        298,346              305,537             321,405
types         18,639               22,495              22,768
texts         43                   89                  43
authors       16                   30                  ---
translators   ---                  ---                 11
Corpus preparation
• Download texts (plain txt)
• Record relevant meta-data (readme file)
– url, author, author’s cv, notes
• Tag and lemmatise (TreeTagger)
• Align parallel component (EasyAlign)
• Index with the CWB
Assembling evidence
• Research question
– Translated fiction texts (Italian and English)
show evidence of growing standardisation (at
the collocational level)
Universal or norm/law-governed?
What happens in technical translation?
Evidence of standardisation
=> support for the “universality” hypothesis
Evidence of interference
=> support for the “norm/law” hypothesis
Assembling evidence
• Look for differences btwn originals and
translations in Italian that:
– could be interpreted as a consequence of
either interference or standardisation
– are not (likely to be) the result of unrelated
variables
– are sufficiently frequent in this technical field
to allow confident judgement
?
Case study:
borrowings and calques
• English words
• New Italian words based on English
terms or new senses derived from
English “false friends”
• English morphosyntactic marks (plural)
• More frequent in
1. originals or
2. translations?
Case study:
borrowings and calques
If 1, then translators could be seen as
conforming to TL “normal” use more than
original authors of comparable texts
=> standardisation
If 2, then translators could be
hypothesised to be more subject to
interference from the SL than original
authors of comparable texts
=> interference
Identifying foreign/calqued words in
corpora
1. Keywords
– each corpus is used in turn as a reference
corpus
a. All words (to identify borrowings)
b. Verbs only (to identify calques)
2. Words ending in –s
– To compare use of non-Italian morphological
marks (unadapted borrowings)
1a Keyword analysis: all words
• Use one corpus as a reference corpus to
highlight words that are significantly more
frequent in the other
• Define what counts as a keyword
• Cut-off point: 5
• Log-likelihood ordering
• Top 100 types
• Browse lists, select potential key-borrowings, check concordances
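The keyword procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool used in the study: it uses the common two-term log-likelihood formula for keyness, and it reads the "cut-off point" as a minimum frequency in the study corpus (the slides do not say what the cut-off applies to). The function names are invented for the example, and the scores it produces need not match those reported on the slides, which depend on the software and tokenisation used.

```python
import math


def log_likelihood(fq_study, fq_ref, size_study, size_ref):
    """Two-term log-likelihood keyness score for one word."""
    total_fq = fq_study + fq_ref
    expected_study = size_study * total_fq / (size_study + size_ref)
    expected_ref = size_ref * total_fq / (size_study + size_ref)
    ll = 0.0
    for observed, expected in ((fq_study, expected_study), (fq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def keywords(counts_study, counts_ref, size_study, size_ref, min_fq=5, top_n=100):
    """Words relatively more frequent in the study corpus, ordered by LL."""
    scored = []
    for word, fq in counts_study.items():
        fq_ref = counts_ref.get(word, 0)
        # min_fq is an assumed reading of the slide's "cut-off point"
        if fq >= min_fq and fq / size_study > fq_ref / size_ref:
            scored.append((log_likelihood(fq, fq_ref, size_study, size_ref), word))
    scored.sort(reverse=True)
    return scored[:top_n]


# Corpus sizes as given for PERLTRIT and PERLORIT; the word counts are toy data.
toy_perltrit = {"package": 357, "di": 9000, "buffer": 3}
toy_perlorit = {"package": 81, "di": 8600}
for score, word in keywords(toy_perltrit, toy_perlorit, 321405, 305537):
    print(f"{score:.1f} {word}")
```

Swapping the two corpora in the call gives the keywords of the other corpus, which corresponds to using each corpus in turn as the reference.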
Problems
• Most “keywords” identify topics
– that’s what keywords are meant to do after all
• Some signal differences btwn
English/Italian writing strategies or
possibly slight genre differences
• For instance…
PERLTRIT
178.4 package
148.2 match*
94.6 char
87.7 filehandle
83.7 locale
83.3 require
72.3 unpack
66.9 socket
66.9 shift
65.6 local
63.7 buffer
54.9 point
54.4 record
53.4 long
51.7 pack
50.5 thread
48.6 Encode
46.5 pipe
PERLORIT
131.0 script
130.7 expression
123.2 regular
118.7 array
75.0 overloading
54.1 print
50.7 reference
37.5 matching*
34.1 Hello
More borrowings in translated Italian than
in original Italian…?
Looking closer: PERLTRIT
• Unrelated variables
– Larger amount of code text
• char, filehandle, shift, require, (un)pack
– Different topics
• locale, encode, (code) point, long
– Morphological differences
• match/matching
– Dubious cases
• socket, buffer, record, thread, pipe
Alternatives?
1. Socket
– “…anche chiamato zoccolo, è una tipologia di connettore utilizzata in elettronica”
– Zoccolo: 0 occurrences in corpus
2. Buffer
– “…letteralmente tampone: in italiano, memoria tampone o anche intermediaria, di transito”
– Tampone, intermediaria, di transito: 0 occ’s in corpus
3. Record
– “In informatica il record è un oggetto di un database strutturato in dati che contiene un insieme di campi o elementi, ciascuno dei quali possiede nome e tipo propri.”
4. Thread
– “Un thread o thread di esecuzione è una suddivisione di un programma in due o più task che vengono eseguiti in modo concorrente.”
5. Pipe
– “Nei sistemi operativi una pipe è uno degli strumenti disponibili per far comunicare tra loro dei processi.”
(Definitions from Wikipedia)
One candidate left: package
            package   %      pacchetto   %      package + pacchetto   %
PERLTRIT    357       78.8   96          21.1   453                   100
PERLORIT    81        84.3   15          15.6   96                    100
In fact, if anything, translations would
seem to show a slight preference for
“pacchetto” compared to original texts
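The comparison rests on the share of each variant among all occurrences of the pair in each corpus. A minimal Python sketch of that arithmetic, using the frequencies reported on the slide (the function name is illustrative):

```python
def shares(counts):
    """Return each variant's share (in %) of all occurrences of the pair."""
    total = sum(counts.values())
    return {variant: 100 * fq / total for variant, fq in counts.items()}


# Frequencies as reported on the slide.
perltrit = {"package": 357, "pacchetto": 96}
perlorit = {"package": 81, "pacchetto": 15}

for name, counts in (("PERLTRIT", perltrit), ("PERLORIT", perlorit)):
    for variant, pct in shares(counts).items():
        print(f"{name} {variant}: {pct:.2f}%")
```

The slide's percentages appear to be truncated rather than rounded to one decimal: 96/453 is roughly 21.19%, which the slide gives as 21.1. The same calculation applies to the other English/Italian variant pairs compared in this section.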
Looking closer: originals…
PERLORIT
131.0 script
130.7 expression
123.2 regular
118.7 array
75.0 overloading
54.1 print
50.7 reference
37.5 matching*
34.1 Hello
PERLORIT: regular expression
            regular expression   %      espressione regolare   %      reg. expr. + espr. reg.   %
PERLORIT    109                  50.9   105                    49.0   214                       100
PERLTRIT    10                   5.9    157                    94.0   167                       100
Searches:
[word="regular" %cd] [word="expressions?" %cd];
[lem="espressione" %cd] [lem="regolare" %cd];
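For readers without access to the CWB, the first of these two-token searches can be approximated in plain Python. The sketch below is an approximation only: it assumes the corpus is available as a list of tokens, it matches each pattern against a whole token (as CQP does), and it reproduces only the case-insensitivity of the `%cd` flags, not the diacritic folding. The function name is hypothetical.

```python
import re


def count_bigram(tokens, first_pattern, second_pattern):
    """Count adjacent token pairs whose members fully match the two patterns."""
    first = re.compile(first_pattern, re.IGNORECASE)
    second = re.compile(second_pattern, re.IGNORECASE)
    return sum(
        1
        for a, b in zip(tokens, tokens[1:])
        if first.fullmatch(a) and second.fullmatch(b)
    )


# Invented token sequence for illustration.
tokens = "Le Regular Expressions e la regular expression".split()
print(count_bigram(tokens, "regular", "expressions?"))
```

The lemma-based search would work the same way on a list of lemmas instead of word forms.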
PERLORIT: reference
            reference   %      riferimento   %      reference + riferimento   %
PERLORIT    88          38.2   142           61.7   230                       100
PERLTRIT    19          3.9    464           96.0   483                       100
Searches:
[word="references?" %cd];
[lem="riferimento" %cd];
PERLORIT: hello
Hello world vs Ciao mondo
            hello   %      ciao   %      hello+ciao   %
PERLORIT    31      31     69     69     100          100
PERLTRIT    1       2.2    43     97.7   44           100
Searches:
[word="hello?" %cd];
[word="ciao" %cd];
Summing up: 1a borrowings (all)
• The translated corpus contains more key-borrowings than the original corpus
• However, in most cases this is due to topic
differences
• In no cases could we identify English words
found in the translated corpus with alternative
Italian renderings favoured in the original corpus
• On the other hand, at least 4 out of 8 key-borrowings found in the original corpus have
alternative Italian renderings favoured in the
translated corpus
1b Calqued verbs
• Verbs that are significantly more frequent in
PERLORIT than in PERLTRIT and vice versa
• Cut-off point: 2
• Log-likelihood ordering
• Top 100 types
• Separate searches for:
– Lemmas that are “unknown” to the tagger
• To search for real calques
– Lemmas that are “not unknown” to the tagger
• To search for existing Italian verbs with calqued meanings
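The split between "unknown" and "not unknown" lemmas can be sketched as below. The sketch assumes tagger output in the usual TreeTagger layout of one token per line with tab-separated word, tag and lemma, where the lemma field reads `<unknown>` for words missing from the tagger lexicon. It also assumes that verb tags start with "V", in line with the `pos="V.*"` restriction used in the searches. The function name and the sample lines (including their tag labels) are invented for illustration.

```python
def split_verbs_by_lemma_status(tagged_lines):
    """Separate verb tokens into known lemmas and unknown word forms."""
    known, unknown = [], []
    for line in tagged_lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip markup or malformed lines
        word, pos, lemma = parts
        if not pos.startswith("V"):
            continue  # keep verbs only
        if lemma == "<unknown>":
            unknown.append(word.lower())  # candidate "real" calque
        else:
            known.append(lemma)  # existing verb, possibly with a calqued sense
    return known, unknown


# Invented sample lines in the assumed word<TAB>tag<TAB>lemma layout.
sample = [
    "ora\tADV\tora",
    "cicliamo\tVER:fin\t<unknown>",
    "ritorna\tVER:fin\tritornare",
    "array\tNOM\t<unknown>",
]
known, unknown = split_verbs_by_lemma_status(sample)
print(known, unknown)
```

Each of the two lists can then be counted per corpus and compared with a keyness measure, as in the keyword analysis of all words.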
Results
PERLORIT
– known lemma
ritornare fq: 90 LL: 35.9
processare fq: 26 LL: 15.6
– unknown lemma
cicliamo fq: 2 LL: 3.3
cicla fq: 2 LL: 3.3
splittare fq: 3 LL: 4.9
PERLTRIT
– known lemma
uccidere fq: 6 LL: 8.3
– unknown lemma
0
PERLTRIT: uccidere (un processo)
(kill (a process))
PERLTRIT> [lem="uccidere"];
1. <perlfaq8>: il segnale che ha <ucciso> il processo
-->perloren: the signal the process died from
2. <perlfork>: <Uccidere> il processo genitore
-->perloren: Killing the parent process
3. <perlfork>: genitore viene <ucciso>(usando la funzione kill( )
-->perloren: process is killed (either using Perl's kill( ) builtin
4. <perlipc>: {HUP} ad 'IGNORE' per evitare di <uccidere> sé stesso)
-->perloren: $ SIG{HUP} to IGNORE so it doesn't kill itself)
5. <perlipc>: "fork( )" e "exec( )", ed <uccidere> i processi figli
-->perloren: fork( ) and exec( ), and kill the errant child process.
6. <perlthrtut>: probabilmente si bloccherà finché non lo <uccidete>.
-->perloren: This program will probably hang until you kill it .
kill + inanimate object in ukWaC-01: game (14), process (2), security (2), NHS (2), soul (2), flu (2), time (2), …
uccidere + inanimate object in itWaC3-01: musica (5, music), speranza (5, hope), amore (4, love), concorrenza (3, competition), innocenza (3, innocence), percezione (3, perception), realtà (3, reality), …
PERLORIT: ritornare (selected)
(return)
1. <corso>: testuale mentre exit <ritorna> solo un codice nume
2. <Dalla_shell_al_web>: ript; <ritornando> poi la struttura re
3. <frameperl>: Tale funzione <ritorna> 0 sei il comando è
4. <frameperl>: exec che però non <ritorna> alcun valore. La
5. <javaperl>: metodo / accept( )/ <ritorna> una istanza della
6. <javaperl>: la funzione <ritornerebbe> un valore vero per
7. <mb_corso_perl_5_print>: slash ( \ ) <ritorna> una reference
8. <mostraLezione.php_puglisi>: iavi e le <ritorna> assemblate
9. <Perl_Tutorial>: ) ; viene <ritornato> vero A dire il vero
10. <Perl_Tutorial>: L' espressione $cibo[ 2 ] <ritorna> uva.
Fq PERLORIT 90
Fq PERLTRIT 28
NB: [lem="ritornare"] [pos="N.*"]
Fq PERLORIT 16
Fq PERLTRIT 0
Alternatives: restituire, produrre, …
PERLTRIT: ritornare (selected)
(return)
1. <scopo_dello_scope>: il seme, e <ritorna> il risultato
2. <perlboot>: classe per <ritornare> a questo package.
3. <perlembed>: esaminare i valori <ritornati>, avrete
4. <perlfaq>: mai exec( ) non <ritorna>? Si possono fare
5. <perlfaq6>: di matching <ritorna> le coppie che ha tr
6. <perlfaq9>: he gli errori fatali <ritornino> al browser
7. <perlfork>: processo; il figlio <ritorna> dalla fork( )
8. <perlfunc>: di sistema e non <ritorna>, usate "system"
9. <perlfunc>: ESPR return <Ritorna> da una subroutine ,
10. <perlipc>: ostra FIFO. chdir; <ritorna> a casa $FIFO =
Fq PERLORIT 90
Fq PERLTRIT 28
NB: [lem="ritornare"] [pos="N.*"]
Fq PERLORIT 16
Fq PERLTRIT 0
Alternatives: restituire, produrre, …
PERLORIT: processare (selected)
(process)
1. <coisson_puntata72>: nga adatta ad essere <processata> dalla shell dei
2. <eb_irc_check>: specificato , verrà <processato> dalla funzione on_l
3. <e_solo_fortuna_printable>: codice viene <processato> con un foglio
4. <introduzione_al_printable>: il software deve <processare> il testo
5. <mb_corso_perl_10_print>: truzioni, essa <processa> tutti gli elementi
6. <mb_corso_perl_10_print>: e di <processarlo> con il seguente cod
7. <mod_perl1tutorial_print>: infatti <processerà> tutte le direttive
8. <Perl_Tutorial>: che crei o comunque <processi> pagine html, sorge
9. <sostituire_ma_c_printable>: il nostro script <processa>, invece di
10.<tegels_usare_il_perl>: file di log viene <processata>. La variabile
Fq PERLORIT 26
Fq PERLTRIT 5
Alternatives: elaborare, manipolare…
PERLTRIT: processare
(process)
PERLTRIT> [lem="processare"];
1. <perlfaq8>: poiché la shell <processa> le redirezioni
2. <perlfunc>: output vengono <processati> (consultate
3. <perlfunc>: a finire in $var <processa> la lista degl
4. <perlthrtut>: riato affinché venga <processato> . Una
5. <perlvar>: routine per <processare> gli avvertimenti
Fq PERLORIT 26
Fq PERLTRIT 5
Alternatives: elaborare, manipolare…
PERLORIT: ciclare
(cycle)
PERLORIT> [word="cicl.*" & pos="V.*"];
1. <coisson_puntata71>: inviati); ora <cicliamo> sull' array
2. <garau_guida_perl>: consente di <ciclare> un determinato blo
3. <perl_tutorial_sciabarra>: il foreach <cicla> su un array e
4. <sostituire_ma_c_printable>: Perl <cicla> linea per linea e
5. <tegels_usare_il_perl>: aperto, <cicliamo> attraverso le sue
Fq PERLORIT 5
Fq PERLTRIT 0
Alternatives: iterare
PERLORIT: splittare
(split)
PERLORIT> [word="splitt.*"];
1. <perl_valsesia>: in cui <splittare> il pattern.
2. <perl_valsesia>: si può voler <splittare> una linea
3. <soltanto_un_alt_printable>: <splittato> e passato
4. <Split_in_perl>: "<splittare>" cioè dividere una str
Fq PERLORIT 4
Fq PERLTRIT 0
Alternatives: dividere, separare
Summing up: calques
• The comparative analysis of key verbs in
the original and in the translated
subcorpora suggests that authors are
more at ease with the use of English
(technical) calques than translators.
2. -s words
1. Search for words ending in –s in original Italian
and translated Italian (fq >1)
2. Select from output only plurals (unadapted
borrowings) used (rather than quoted) in Italian
discourse in the two sub corpora
3. Which corpus displays greater use of
unadapted borrowings ending in –s?
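Step 1 can be approximated outside the CWB with a short Python sketch. It reuses the regular expression from the CQP search reported with the results (`[word="[a-zA-Z][a-zA-Z]+-?[a-zA-Z]?s"]`), matched against whole tokens, and keeps forms with a frequency above 1. The function name and the toy token list are illustrative; step 2, the manual selection of genuine plurals used in Italian discourse, cannot be automated this way.

```python
import re
from collections import Counter

# Same pattern as the CQP query; fullmatch mimics CQP's whole-token matching.
S_WORD = re.compile(r"[a-zA-Z][a-zA-Z]+-?[a-zA-Z]?s")


def s_word_counts(tokens, min_fq=2):
    """Count tokens ending in -s and keep those occurring at least min_fq times."""
    counts = Counter(t for t in tokens if S_WORD.fullmatch(t))
    return {word: fq for word, fq in counts.items() if fq >= min_fq}


# Invented tokens for illustration.
tokens = ["files", "files", "i", "file", "unless", "unless", "is", "scripts"]
print(s_word_counts(tokens))
```

As the frequency lists show, such a search also returns code keywords and proper names (unless, Windows, Class), which is why the manual selection step is needed before the two corpora can be compared.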
Words ending in –s
PERLORIT
• 95 types
• 711 tokens
1. warnings 69
2. Windows 56
3. unless 54
4. Mongers 38
5. Associates 31
6. keys 20
7. SomeClass 19
8. files 16
9. alias 15
10. Class 14
11. …
PERLTRIT
• 144 types
• 1000 tokens
1. unless 85
2. this 60
3. bless 46
4. alias 39
5. exists 38
6. threads 37
7. Windows 36
8. warnings 34
9. Class 24
10. vars 24
11. …
Search:
[word="[a-zA-Z][a-zA-Z]+-?[a-zA-Z]?s"];
Results from the PERLIT corpus
PERLORIT
Word          fq
files         16
subroutines   10
backquotes    6
scripts       4
forms         4
links         4
expressions   3
cookies       3
references    2
PERLTRIT
Word          fq
backticks     2
closures      1
PERLORIT: “files” (perlorit vs. perltrit)
PERLORIT: “forms” (perlorit vs. perltrit)
PERLTRIT: closures and backticks
1. <perlmod>: riguardo alle chiusure [<closures>, N.d.T.].
2. <perlref>: come le <closures> [ letteralmente " chiusure
1. <perlfaq8>: system( ) con quello dei <backticks> (`).
2. <perlfaq8>: uscita). I <backticks> (``) lanciano il coma
3. <perlfaq8>: shell, con i <backticks> ciò non è possibile.
Summing up:
unadapted borrowings
• Despite superficial quantitative evidence
(higher numbers of types and tokens for
words ending in –s in translated than in
original corpora), translators appear to
disfavour unadapted borrowings ending in
–s with respect to original authors
General conclusion
• Results of study 2 lend support to
conclusions of study 1:
– In both fiction translation and technical
translation,
– Despite differences in translator profile,
translation “commission”, topic, genre,
readership etc.,
– And regardless of differences in
methodological design/object of corpus
study…
General conclusions
• The law of growing standardisation seems
to predominate over the law of
interference (in present-day translation
practice between English and Italian etc.
etc.)
• Two small steps toward the bottom-up
identification of universal trends…
General conclusions
• The lessons to be learnt
– Relying on superficial quantitative data in the
search for translation universals can be very
misleading
– Insights and hypotheses should emerge from
• the accumulation of results of (painstaking)
analyses
• conducted on closely comparable corpora,
• checked against their parallel text component(s)
and/or taking into account alternatives offered by
the target language
Thank you
References
Pym, A. 2008. “On Toury's laws of how translators translate”. In Pym, A., M.
Shlesinger and D. Simeoni (eds.). Beyond Descriptive Translation
Studies. Benjamins. 311-328.
Toury, G. 1995. Descriptive Translation Studies and Beyond. Amsterdam:
Benjamins.
Tirkkonen-Condit, S. 2004. “Unique items: over- or under-represented in
translated language?”. In Mauranen, A. and P. Kujamäki (eds.).
Translation Universals. Benjamins. 177-184.
Baker, M. 1993. “Corpus linguistics and translation studies. Implications
and applications”. In Baker, M., G. Francis and E. Tognini-Bonelli (eds.).
Text and Technology. Benjamins. 233-250.
Laviosa, S. 1998. “Core patterns of lexical use in a comparable corpus of
English narrative prose”. Meta 43(4). 557-570.
Olohan, M. 2003. “How frequent are the contractions? A study of
contracted forms in the translational English corpus”. Target 15(1). 59-89.
Olohan, M. 2004. Introducing Corpora in Translation Studies. Routledge.
Recent critiques
“Baker (1995: 235), re-affirmed by Olohan (2004: 43),
argues that translations can be studied by comparing
them with non-translations in the same language,
without focusing on source texts or source languages.
This means we can describe translational English in
opposition to non-translational English, doing all the
research on English. The result is perhaps the major
methodological advance associated with corpus
studies. It has many economic advantages: it cuts out all
the bother of learning foreign languages and cultures; it
controls numerous tricky variables associated with
suspicions of linguistic and cultural relativism. In the
English-only research on optional that, there is thus strictly
no way of knowing about any kind of foreign
interference causing the frequencies of the linguistic
variable, since in principle the source texts are not in the
corpus”. [Pym 2008, p. 14 of pre-print version]