Help

Difference between revisions of "List translation"

List translation is the process of building a wordlist for a target language (L2) that has no relevant wordlist yet: pick a relevant existing list from a larger language spoken by that community (L1), then translate that L1 list into the target L2 language. This process is highly efficient, yet biased and technical. This page guides a willing lexicographer through translating a list properly for LinguaLibre. It takes as its case study a Portuguese list translated into Amazonia's Surui language. The work leans toward field linguistics and lexicography.

(Created page with "Hello everyone, This email reviews the technical side of translating the Portuguese vocabulary list into Surui. ;List to translate Here is the list of vo...")
 
 
(18 intermediate revisions by the same user not shown)
Line 1: Line 1:
Hello everyone,
+
{{#SUBTITLE:'''List translation''' is the process of building a wordlist for a target language (L2) that has no relevant wordlist yet: pick a relevant existing list from a larger language spoken by that community (L1), then translate that L1 list into the target L2 language. This process is highly efficient, yet biased and technical. This page guides a willing lexicographer through translating a list properly for LinguaLibre. It takes as its case study a Portuguese list translated into Amazonia's Surui language. The work leans toward field linguistics and lexicography.}}
 +
{{draft|}}
  
This email reviews the technical aspects of translating the Portuguese vocabulary list into Surui.
 
  
;List to translate
+
== Community ==
Here is the vocabulary list of the most used Portuguese words, starting with the most frequent (i.e. the most useful):
+
[[File:Ordem_do_Mérito_Cultural_(8162244806)_(cropped).jpg|300px|thumb|[[:fr:Almir Surui|Almir Surui]], 2012, receiving a cultural prize from the Brazilian government. Almir is the native speaker willing to audio-document his Surui language using LinguaLibre.]]
 +
In this case study, the community we work with is the [[:en:Surui people|Surui people]], located in north-western Amazonia.
 +
<br>It is a small population spread over 30 villages in the Amazon forest, with '''1,600 native speakers'''.
 +
<br>The Surui speak both their own Surui language and Portuguese, the national language of Brazil. Levels of bilingual mastery vary.
 +
<br>Westernization is radically changing daily practices, culture and language.
  
* [[List:Sru/words-by-frequency-00001-to-02000|List:Sru/words-by-frequency-00001-to-02000]] - the 2,000 most used Portuguese words, to be translated into Surui.
+
Almir Surui, their elected chief, is our contact and has shown willingness to lead this effort.<br>
 +
Recent efforts to preserve and revitalize their language are underway: they recently standardized the writing of the Surui language, and elementary school is now taught in Surui, with associated new books.
  
Your team of translators will need to choose an editing mode for this list:
+
There is no existing Surui wordlist to reuse for recording in Lingualibre.
* online via the "EDIT" or "EDITER" or "EDITAR" button (a Wikipedia account must be created)
 
* offline.
 
  
;Linguistic specificities and creation phases
+
== Identify your source list ==
Surui being very different from Portuguese, we can proceed in phases:
+
Given their second language is Portuguese…
* 1st phase: this translation from Portuguese to Surui suits a first massive effort, aiming to produce 90% of the everyday vocabulary.
+
<br>Given academic research recommending word lists sorted by frequency…
* 2nd phase: these Portuguese-centred lists will naturally need to be completed with vocabulary and concepts specific to Surui and its environment. This complement is work of a different nature.
+
<br>And given we have such a list in Portuguese…
 +
<br>We will work with Surui natives to translate this Portuguese list into Surui:
  
Qualitatively, the whole will form a solid bilingual Surui <=> Portuguese lexicon and, with Lingualibre, a multimedia dictionary.
+
The L1 source:
 +
* [[List:Por/words-by-frequency-00001-to-02000]] - the 2,000 most used Portuguese words
  
;Translation rules for the vocabulary list
+
Our <code>L1 → L2</code> working page:
The Surui list will ultimately be a single list of Surui words ready to record. A few rules need to be followed to carry out this effort properly.
 
  
1) We call L1 the language to translate from (e.g. L1 Portuguese) and L2 the added translations (e.g. L2 Surui).
+
* [https://lingualibre.org/index.php?title=List:Sru/words-by-frequency-00001-to-02000&oldid=759502 List:Sru/words-by-frequency-00001-to-02000] - the 2,000 most used Portuguese words '''to be translated into Surui.'''
2) The format of our vocabulary list is as follows:
 
  
* L1 (Portuguese) → L2 (Surui)
+
== Identifying the translators ==
 +
Almir has, as requested, identified a translator.
  
3) Keep the "→" sign, it is very important for what follows.
+
This translator will need to translate either:
4) To the L1 Portuguese word on the left, we add on the right, as the L2 translation, its most common equivalent. Example for L1 Portuguese → L2 English:
+
* online via the "EDIT" button (you must create a Wikipedia account)
 +
* offline, and forward the results by email.
  
 +
== Linguistic biases ==
 +
Because Surui is very different from Portuguese, we can proceed in phases:
 +
* 1st phase (quantitative): bluntly translating from Portuguese to Surui suits a first massive effort, aiming to produce 90% of the common vocabulary.
 +
* 2nd phase (qualitative): these foreign-inspired lists will naturally need to be completed with vocabulary and concepts specific to your language and its environment. This supplement is work of a different nature.
 +
 +
Qualitatively, the whole will constitute a solid bilingual Surui <=> Portuguese vocabulary and, with Lingualibre, a multimedia dictionary.
 +
 +
== Translation rules for lists ==
 +
The Surui list will ultimately be a single Surui wordlist ready to record. A few rules must be followed to carry out this effort properly.
 +
 +
1) The source language is called ''L1'' for ''« Language 1 »'' (L1 = Portuguese); to it we add the target language, called ''L2'' for « Language 2 » (L2 = Surui).
 +
 +
2) The format of our vocabulary list is therefore as follows:
 +
 +
* <code>L1 → L2</code>
 +
* i.e.: <code>Portuguese → Surui</code>
 +
 +
3) Keep the <big><code>→</code></big> sign: it is an important ''separator''.
 +
 +
4) To the Portuguese L1 word on the left, we add on the right, as the L2 translation, its most common equivalent. Example for ''L1 Portuguese → L2 English'':
 +
<pre>
 
* que → that
 
* que → that
 
* a → a
 
* a → a
Line 53: Line 79:
 
* ele → he
 
* ele → he
 
* como → like
 
* como → like
 +
</pre>
  
5) If an L1 word has no L2 translation, move on to the next word, since we want to translate as many words as possible. For example, "Arranha-céu" is skipped:
+
5) If an L1 word has no L2 translation, move on to the next word, since we want to translate as many words as possible. For example, "Arranha-céu" (skyscraper) is skipped:
 
+
<pre>
 
* uma → one
 
* uma → one
 
* Arranha-céu →
 
* Arranha-céu →
 
* como → like
 
* como → like
 +
</pre>
  
6) If several very common L2 translations exist for an L1 word, duplicate the line and translate:
+
6) If several very common L2 translations exist for an L1 word, duplicate the line and translate each one:
 
+
<pre>
 
* como → like
 
* como → like
 
* como → as
 
* como → as
Line 69: Line 97:
 
* Monkeys → Macacas
 
* Monkeys → Macacas
 
* Monkeys → Macacos
 
* Monkeys → Macacos
 +
</pre>
  
7) If too many L2 translations exist for an L1 word because of its variations, add only the base form. Example of verb variations for L1 English and L2 Portuguese:
+
7) If too many L2 translations exist for an L1 word because of its variations, add only the base form. Example of verb variations for L1 English and L2 Portuguese:
 
+
<pre>
 
* be → ser
 
* be → ser
 
* make → fazer
 
* make → fazer
 
* move → mover
 
* move → mover
 
* build → construir
 
* build → construir
 
+
</pre>
We translate only the fundamental base form; we do not list the endless variations:
+
We only translate the base form (marked O below), not its variations (marked X).
 
+
<pre>
 
* build → construir O
 
* build → construir O
 
* build → construo X
 
* build → construo X
Line 86: Line 115:
 
* build → constroem X
 
* build → constroem X
 
* build → constroem X
 
* build → constroem X
* ...
+
*
 
+
</pre>
 
 
8) If there is no L1 word but a specific L2 word exists, add a line with that word after the arrow. For example, if Portuguese had no word for "Penguin":
 
  
 +
8) If there is no L1 word but a specific L2 word exists, add a line with that word after the arrow. For example, if Portuguese had no word for "Penguin":
 +
<pre>
 
* Papagaio → Parrot
 
* Papagaio → Parrot
 
* → Penguin
 
* → Penguin
 
* Monkey → Macaco
 
* Monkey → Macaco
 +
</pre>
  
9) Tip: the "holes" produced by rules 5) and 8) can be filled in later.
+
9) Tip: the "empty sides" produced by rules 5) and 8) can be filled in later.
 
 
;To go further...

The following lists are available for translation into Surui, in order to quickly build a basic Surui list of 6,000 to 8,000 Surui words:
 
 
 
* [[List:Sru/words-by-frequency-02001-to-04000|List:Sru/words-by-frequency-02001-to-04000]]
 
* [[List:Sru/words-by-frequency-04001-to-06000|List:Sru/words-by-frequency-04001-to-06000]]
 
* [[List:Sru/words-by-frequency-06001-to-08000|List:Sru/words-by-frequency-06001-to-08000]]
 
* [[List:Sru/words-by-frequency-08001-to-10000|List:Sru/words-by-frequency-08001-to-10000]]
 
  
 +
== List cleaning ==
 +
[[File:LinguaLibre 2022 Paris Surui training-02.jpg|thumb|300px|A four-hour training session was organized for Almir, covering:<br>1) Strategic considerations for long-term collaboration and a presentation of existing learning apps.<br>2) Discussion of Surui vocabulary and its variability, clarifying the focus on word stems only.<br>3) Presentation of LinguaLibre and a first trial with 100 words (work not uploaded to Commons).<br>4) Break and cleaning of the provided list.<br>5) Productive recording session, with the audio files sent to Commons.]]
 +
A local bilingual teacher started translating the first list of 2,000 common Portuguese words. Naturally, this first pass contained some formatting errors to fix:
  
;Coordination
+
{| class="wikitable"
A second email is being written about the coordination aspects for 20 May 2022 in Paris.
+
! Initial work by translator || Initial structure || Error description || Correct formatting || Correct structure
 +
|-
 +
| # Monkey → Macaca/Macaco || # L1 → L2/L2 || Multiple L2 inlined: must be on several lines || # Monkey → Macaca<br># Monkey → Macaco || # L1 → L2<br># L1 → L2
 +
|-
 +
| # Parrot → Papagaio || # L2 → L1 || Inverted position || # Papagaio → Parrot  || # L1 → L2
 +
|-
 +
| # Monkey →Macaca || # L1 →L2 || Broken formatting: separator space removed || # Monkey → Macaca  || # L1 → L2
 +
|}
  
Best regards,
+
== To push further ==
 +
The source Portuguese list covers 20,000+ common words, so the work can go further. The following lists are available for translation into Surui, in order to quickly build a basic vocabulary of about 9,000 Surui words (assuming 10% of the words will be skipped):
  
 +
* [[List:Sru/words-by-frequency-02001-to-04000]]
 +
* [[List:Sru/words-by-frequency-04001-to-06000]]
 +
* [[List:Sru/words-by-frequency-06001-to-08000]]
 +
* [[List:Sru/words-by-frequency-08001-to-10000]]
  
 
== See also ==
 
== See also ==
 
* [[Help:Homographs]]
 
* [[Help:Homographs]]
* [[Lists]]
+
* [[Help:Lists]]
[[Category:Lingua Libre:Help]]
+
 
 +
{{Helps}}

Latest revision as of 18:15, 14 September 2022

Draft


Community

Almir Surui, 2012, receiving a cultural prize from the Brazilian government. Almir is the native speaker willing to audio-document his Surui language using LinguaLibre.

In this case study, the community we work with is the Surui people, located in north-western Amazonia.
It is a small population spread over 30 villages in the Amazon forest, with 1,600 native speakers.
The Surui speak both their own Surui language and Portuguese, the national language of Brazil. Levels of bilingual mastery vary.
Westernization is radically changing daily practices, culture and language.

Almir Surui, their elected chief, is our contact and has shown willingness to lead this effort.
Recent efforts to preserve and revitalize their language are underway: they recently standardized the writing of the Surui language, and elementary school is now taught in Surui, with associated new books.

There is no existing Surui wordlist to reuse for recording in Lingualibre.

Identify your source list

Given their second language is Portuguese…
Given academic research recommending word lists sorted by frequency…
And given we have such a list in Portuguese…
We will work with Surui natives to translate this Portuguese list into Surui:

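The L1 source:

  • List:Por/words-by-frequency-00001-to-02000 - the 2,000 most used Portuguese words

Our L1 → L2 working page:

  • List:Sru/words-by-frequency-00001-to-02000 - the 2,000 most used Portuguese words to be translated into Surui.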

Identifying the translators

Almir has, as requested, identified a translator.

This translator will need to translate either:

  • online via the "EDIT" button (you must create a Wikipedia account)
  • offline, and forward the results by email.

Linguistic biases

Because Surui is very different from Portuguese, we can proceed in phases:

  • 1st phase (quantitative): bluntly translating from Portuguese to Surui suits a first massive effort, aiming to produce 90% of the common vocabulary.
  • 2nd phase (qualitative): these foreign-inspired lists will naturally need to be completed with vocabulary and concepts specific to your language and its environment. This supplement is work of a different nature.

Qualitatively, the whole will constitute a solid bilingual Surui <=> Portuguese vocabulary and, with Lingualibre, a multimedia dictionary.

Translation rules for lists

The Surui list will ultimately be a single Surui wordlist ready to record. A few rules must be followed to carry out this effort properly.

1) The source language is called L1 for « Language 1 » (L1 = Portuguese); to it we add the target language, called L2 for « Language 2 » (L2 = Surui).

2) The format of our vocabulary list is therefore as follows:

  • L1 → L2
  • i.e.: Portuguese → Surui
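
The format above lends itself to a very small parser. The following is a minimal sketch in Python, assuming each entry sits on its own line with an optional leading `*` or `#` bullet. The names `Entry` and `parse_line` are hypothetical helpers for illustration, not part of LinguaLibre.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    l1: str  # source word (e.g. Portuguese); may be empty
    l2: str  # target word (e.g. Surui); may be empty


def parse_line(line: str) -> Optional[Entry]:
    """Parse one 'L1 → L2' line; return None if the arrow separator is missing."""
    # Drop surrounding whitespace and a leading list bullet, if any.
    text = line.strip().lstrip("*#").strip()
    if "→" not in text:
        return None
    # Split on the first arrow only, then trim each side.
    l1, l2 = text.split("→", 1)
    return Entry(l1.strip(), l2.strip())


print(parse_line("* que → that"))
```

Because everything hinges on the arrow, a line that loses it cannot be parsed at all, which is one reason the separator matters.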

3) Keep the → sign: it is an important separator.

4) To the Portuguese L1 word on the left, we add on the right, as the L2 translation, its most common equivalent. Example for L1 Portuguese → L2 English:

* que → that
* a → a
* o → the
* de → from
* não → no
* é → is
* e → and
* um → one
* para → to
* eu → me
* se → if
* me → me
* no →
* uma → one
* está → is
* por → by
* com → with
* os → the
* do → from
* te → you
* em → in
* ele → he
* como → like

5) If an L1 word has no L2 translation, move on to the next word, since we want to translate as many words as possible. For example, "Arranha-céu" (skyscraper) is skipped:

* uma → one
* Arranha-céu →
* como → like

6) If several very common L2 translations exist for an L1 word, duplicate the line and translate each one:

* como → like
* como → as
* como → similar to
* Monkey → Macaca
* Monkey → Macaco
* Monkeys → Macacas
* Monkeys → Macacos

7) If too many L2 translations exist for an L1 word because of its variations, add only the base form. Example of verb variations for L1 English and L2 Portuguese:

* be → ser
* make → fazer
* move → mover
* build → construir

We only translate the base form (marked O below), not its variations (marked X).

* build → construir O
* build → construo X
* build → constrói X
* build → construiu X
* build → construímos X
* build → constroem X
* build → constroem X
* …

8) If there is no L1 word but a specific L2 word exists, add a line with that word after the arrow. For example, if Portuguese had no word for "Penguin":

* Papagaio → Parrot
* → Penguin
* Monkey → Macaco

9) Tip: the "empty sides" produced by rules 5) and 8) can be filled in later.
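
Taken together, the rules describe how a translated list becomes a single L2 wordlist, with gaps set aside for later. The sketch below illustrates this in Python under stated assumptions: one entry per line, `→` as separator, and duplicates of the same L2 word kept only once. The function name `build_wordlist` is hypothetical and not part of LinguaLibre.

```python
def build_wordlist(lines):
    """Return (words, missing_l2, missing_l1) from 'L1 → L2' lines.

    words      -- unique L2 words, in order of first appearance
    missing_l2 -- L1 words left without a translation (rule 5)
    missing_l1 -- L2 words added without an L1 counterpart (rule 8)
    """
    words, missing_l2, missing_l1 = [], [], []
    seen = set()
    for line in lines:
        text = line.strip().lstrip("*#").strip()
        if "→" not in text:
            continue  # not a list entry
        l1, l2 = (part.strip() for part in text.split("→", 1))
        if l1 and not l2:
            missing_l2.append(l1)  # skipped word, to be completed later
            continue
        if l2 and not l1:
            missing_l1.append(l2)  # L2-specific word, L1 side still empty
        if l2 and l2 not in seen:
            seen.add(l2)
            words.append(l2)
    return words, missing_l2, missing_l1
```

The two "missing" lists are what rule 9 refers to: they can be revisited once the bulk of the translation is done.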

List cleaning

A four-hour training session was organized for Almir, covering:
1) Strategic considerations for long-term collaboration and a presentation of existing learning apps.
2) Discussion of Surui vocabulary and its variability, clarifying the focus on word stems only.
3) Presentation of LinguaLibre and a first trial with 100 words (work not uploaded to Commons).
4) Break and cleaning of the provided list.
5) Productive recording session, with the audio files sent to Commons.

A local bilingual teacher started translating the first list of 2,000 common Portuguese words. Naturally, this first pass contained some formatting errors to fix:

{| class="wikitable"
! Initial work by translator || Initial structure || Error description || Correct formatting || Correct structure
|-
| # Monkey → Macaca/Macaco || # L1 → L2/L2 || Multiple L2 inlined: must be on several lines || # Monkey → Macaca<br># Monkey → Macaco || # L1 → L2<br># L1 → L2
|-
| # Parrot → Papagaio || # L2 → L1 || Inverted position || # Papagaio → Parrot || # L1 → L2
|-
| # Monkey →Macaca || # L1 →L2 || Broken formatting: separator space removed || # Monkey → Macaca || # L1 → L2
|}
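
Two of these three errors are mechanical and could be fixed by a script. The sketch below is one possible approach in Python, assuming `/` is only ever used to inline alternative translations. `clean_line` is a hypothetical helper, not an existing LinguaLibre tool. The inverted-position error is left out on purpose, because detecting it requires knowing which language each word belongs to.

```python
from typing import List


def clean_line(line: str) -> List[str]:
    """Normalize one translator line into zero or more '# L1 → L2' lines."""
    text = line.strip().lstrip("*#").strip()
    if "→" not in text:
        return []  # nothing to normalize without the separator
    l1, l2 = (part.strip() for part in text.split("→", 1))
    # Split inlined alternatives such as 'Macaca/Macaco' into separate entries.
    alternatives = [alt.strip() for alt in l2.split("/") if alt.strip()] or [""]
    # Rebuild each entry with exactly one space on each side of the arrow.
    return ["# " + f"{l1} → {alt}".strip() for alt in alternatives]


for fixed in clean_line("# Monkey → Macaca/Macaco"):
    print(fixed)
```

Lines with an empty side are passed through unchanged apart from spacing, so gaps left for later completion are preserved.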

To push further

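The source Portuguese list covers 20,000+ common words, so the work can go further. The following lists are available for translation into Surui, in order to quickly build a basic vocabulary of about 9,000 Surui words (assuming 10% of the words will be skipped):

  • List:Sru/words-by-frequency-02001-to-04000
  • List:Sru/words-by-frequency-04001-to-06000
  • List:Sru/words-by-frequency-06001-to-08000
  • List:Sru/words-by-frequency-08001-to-10000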
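
The 9,000 figure can be checked with simple arithmetic. The sketch below assumes five lists of 2,000 entries each (frequency ranks 00001 to 10000) and a flat 10% skip rate. It ignores lines duplicated for multiple translations and L2-specific additions, so it is only a rough estimate.

```python
# Assumption: five lists of 2,000 entries each (ranks 00001 to 10000).
LIST_SIZE = 2_000
LIST_COUNT = 5
SKIP_PERCENT = 10  # share of L1 words expected to have no L2 equivalent


def expected_l2_words(list_count=LIST_COUNT, list_size=LIST_SIZE,
                      skip_percent=SKIP_PERCENT):
    """Estimate how many L1 entries end up with an L2 translation."""
    total_l1 = list_count * list_size
    return total_l1 * (100 - skip_percent) // 100


print(expected_l2_words())  # prints 9000
```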

See also

  • Help:Homographs
  • Help:Lists
