Difference between revisions of "List translation"
List translation is the process used when no relevant wordlist exists in a target language (L2): take a relevant existing list in a larger language spoken by that community (L1) and translate it efficiently into the target language (L2). The process is highly efficient, but also biased and technical. This page guides a willing lexicographer through properly translating a list for LinguaLibre. As a case study, it uses a Portuguese list translated into Surui, a language of Amazonia.
Revision as of 06:33, 24 May 2022
Community
In this case study, the community we work with is the Surui people, located in North-Western Amazonia.
It is a small population of 30 villages in the Amazon forest, with 1,600 native speakers.
They speak Surui and, at various levels, Portuguese, the national language of Brazil.
Westernization is radically changing daily practices, culture and language.
Almir Surui, their elected chief, has been contacted and is willing to lead this effort.
Recent efforts to preserve and revitalize their language are underway: the writing of the Surui language was recently standardized, and elementary school is taught in Surui, with new books to support it.
There is no existing wordlist to reuse for recording in Lingualibre.
Identify your source list
Given that their second language is Portuguese…
Given that academic research recommends word lists sorted by frequency…
And given that we have such a list in Portuguese…
We will work with Surui natives to translate this Portuguese list into Surui:
- List:Por/words-by-frequency-00001-to-02000 - the 2,000 most used Portuguese words, to be translated into Surui.
Identifying the translators
Almir has, as requested, identified a translator.
This translator will need to translate either:
- online via the "EDIT" button (you must create a Wikipedia account)
- offline, and forward the results.
Linguistic biases
Since the Surui language is very different from Portuguese, we can advance in phases:
- 1st phase (quantitative): translating directly from Portuguese to Surui suits a first massive effort, in order to produce 90% of the common vocabulary.
- 2nd phase (qualitative): these Portuguese-centered lists will then need to be completed with lexicons and concepts specific to Surui and its environment. This supplement is work of a different nature.
Qualitatively, the whole will constitute a solid bilingual Surui <=> Portuguese vocabulary and, with Lingualibre, a multimedia dictionary.
Translation rules for lists
The Surui list will ultimately be a single Surui wordlist ready to save. A few rules must be taken into account to properly carry out this effort.
1) The source language is called L1 for « Language 1 » (L1 = Portuguese). To it we add the target language, called L2 for « Language 2 » (L2 = Surui).
2) The format of our vocabulary list is therefore as follows:
- L1 → L2
- i.e.: Portuguese → Surui
3) Keep the → sign; it is an important separator.
4) Next to the Portuguese L1 word on the left, add its most common equivalent on the right as the L2 translation. Example for L1 Portuguese → L2 English:
* que → that
* a → a
* o → the
* de → from
* não → no
* é → is
* e → and
* um → one
* para → to
* eu → me
* se → if
* me → me
* no →
* uma → one
* está → is
* por → by
* com → with
* os → the
* do → from
* te → you
* em → in
* ele → he
* como → like
5) If an L1 word has no L2 translation, leave its right side empty and move on to the next word, because we want to translate as many words as possible. In this example, "Arranha-céu" is skipped:
* uma → one
* Arranha-céu →
* como → like
6) If several very common L2 translations exist for an L1 word, duplicate the line and translate each one:
* como → like
* como → as
* como → similar to
* Monkey → Macaca
* Monkey → Macaco
* Monkeys → Macacas
* Monkeys → Macacos
7) If an L1 word has too many L2 translations because of variations, add only the base form. Example of verb variations, with English as L1 and Portuguese as L2:
* be → ser
* make → fazer
* move → mover
* build → construir
We only translate the base form; we do not translate its variations. In the list that follows, O marks the line to keep and X marks variations to leave out.
* build → construir O
* build → construo X
* build → constrói X
* build → construiu X
* build → construímos X
* build → constroem X
* build → constroem X
* …
8) If there is no L1 word but there is a specific L2 word, add a line with the L2 word after the arrow. For example, if Portuguese has no word for "Penguin", then:
* Papagaio → Parrot
* → Penguin
* Monkey → Macaco
9) Tip: the "empty sides" produced in 5) and 8) can be completed later.
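For anyone working offline, the rules above can be checked mechanically before forwarding the results. The following is a minimal sketch, not a Lingualibre tool: the function names (parse_line, parse_list, summarize) are hypothetical, and it assumes one "* L1 → L2" entry per line. It keeps the → separator mandatory (rule 3), groups duplicated lines (rule 6), and collects the "empty sides" from rules 5 and 8 so they can be completed later (rule 9).

```python
from collections import defaultdict

SEPARATOR = "→"  # rule 3: the arrow must be kept on every line


def parse_line(line):
    """Split one '* L1 → L2' line into (l1, l2); an empty side becomes None."""
    text = line.strip()
    if text.startswith("*"):
        text = text[1:]  # drop the leading list bullet
    if SEPARATOR not in text:
        raise ValueError(f"missing separator in line: {line!r}")
    left, right = text.split(SEPARATOR, 1)
    return (left.strip() or None, right.strip() or None)


def parse_list(raw):
    """Parse a whole list, ignoring blank lines."""
    return [parse_line(line) for line in raw.splitlines() if line.strip()]


def summarize(pairs):
    """Group translations per L1 word and collect the empty sides."""
    translations = defaultdict(list)  # rule 6: several L2 words per L1 word
    missing_l2 = []                   # rule 5: L1 words without translation
    missing_l1 = []                   # rule 8: L2 words without L1 source
    for l1, l2 in pairs:
        if l1 and l2:
            translations[l1].append(l2)
        elif l1:
            missing_l2.append(l1)
        elif l2:
            missing_l1.append(l2)
    return dict(translations), missing_l2, missing_l1


if __name__ == "__main__":
    sample = """
    * uma → one
    * Arranha-céu →
    * como → like
    * como → as
    * → Penguin
    """
    translations, missing_l2, missing_l1 = summarize(parse_list(sample))
    print(translations)
    print("To complete later (no L2):", missing_l2)
    print("To complete later (no L1):", missing_l1)
```

A line without the arrow raises an error instead of being silently dropped, which makes a lost separator visible early.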
Going further
The source Portuguese list covers 20,000+ common words, which leaves room to go further. The following lists are available for translation into Surui, in order to quickly build a basic Surui vocabulary of 9,000 words (assuming 10% of the words will be skipped):
- List:Sru/words-by-frequency-02001-to-04000
- List:Sru/words-by-frequency-04001-to-06000
- List:Sru/words-by-frequency-06001-to-08000
- List:Sru/words-by-frequency-08001-to-10000
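The 9,000-word figure can be checked with a short calculation. This sketch assumes five lists of 2,000 words each (the first 00001-to-02000 list plus the four listed here) and the 10% skip rate stated above; the function name is hypothetical.

```python
def estimated_vocabulary(list_count, words_per_list, skip_percent):
    """Estimate how many words end up translated after skipping some."""
    source_words = list_count * words_per_list
    # Integer arithmetic avoids floating-point rounding surprises.
    return source_words * (100 - skip_percent) // 100


if __name__ == "__main__":
    # Assumption: 5 lists x 2,000 words, 10% of words skipped.
    print(estimated_vocabulary(list_count=5, words_per_list=2000, skip_percent=10))
```

Under these assumptions, 10,000 source words yield 9,000 translated words; a different skip rate changes the result proportionally.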