LinguaLibre
Interested communities
This page gathers pointers to linguistic communities and actors who have expressed interest in recording with LinguaLibre. They may be individuals or organisations from Wikimedia, academia, civil-society associations, or cultural activism. The pointers are shared below so that these valuable contacts are not lost through a low 'bus factor' on our side. Do not display email addresses; organisation names, individual names, and links to web pages and discussions are welcome.
Latest revision as of 08:35, 8 January 2024
Region | Language(s) | Organisation | Contact | Project and Comments |
---|---|---|---|---|
Northern America | (17) Native Americans | SIL | _ | _ |
South America | (1) Native American > Surui : srn | Aquaverde | Lili: Yug ; Association AquaVerde: Almir Surui, Thomas Pizer. | Ongoing, see meta:LinguaLibre/Atelier_de_formation_à_LinguaLibre_pour_le_Surui/en |
Europe | (1) Catalan | ? | Lili: Yug ; Perpignan: Susanna Peidro I Sutil | Catalan-language activist in France, highly motivated. |
World | (0) Wiki Journalist | Wikimedia | User:Uprising Man | African Wikimedian with journalism skills, willing to co-author article on Sign language with Yug. |
Europe | (1) French Sign Language | ? | Lili: Yug ; Toulouse: User:Seejayer | |
? | (?) Sign Languages | DeepMind, Berkeley AI | Kayo Yin (en/fr/ja/zh) | * https://kayoyin.github.io |
World | (8) Sign Languages | Wikisigns | ? | http://wikisigns.org / tw: @wikisigns |
West | (3) English/American Sign Languages | en:WP:WikiProject Deaf | ? | 2022.09: Minimal contact meta:Talk:Deaf Wikimedians |
Central Asia | (1) Kazakh Sign Language | Nazarbayev University | ? | |
Africa | (1) Ghana Sign Language | Special Education Department at the University of Education, Winneba; Wikimedia Ghana | Sign Language and Wikipedia: Ghanaian hearing-impaired students undergo training on Wiki projects | |
Europe | (1) Ladino | CollectivaT, Barcelona | Lili: Yug ; CollectivaT: Alp on CollectivaT-dev.cat | |
Cameroon | (200) | Wikimedia Cameroun | ArnoBOUJIKA | microfi, lingualibre |
Rwanda | (3) languages | Wikimedia Rwanda | Cnyirahabihirwe123 - WMRD cofounder | Any help is welcome. |
Europe | (1) Breton | Bretagne numerique | David Lesvenan, Laurence Le Goff | Need: support for running a workshop. |
Europe | (1) Breton | Research center IRISA.fr | Lili: VIGNERON; IRISA: Annie Foret | |
Europe | (1) Tatar | ? | ? | |
Sub-Sahara Africa | (1) Kenya > en:Kikuyu language/Gikuyu (Q329) | Wikimedia | Lili: Yug ↔ Ngangaesther en:User_talk:Ngangaesther#Follow_up_! | |
N. Africa and Middle East | (7) Arabic languages | Palestinian Arabic research | Mustafa Jarrar | |
Southern Asia | (17) Indian languages | Universal Knowledge Core (UKC) Trento University India Europe | UKC: Nandu Chandra Nair PhD | |
Southern Asia | (1+) Tibetans | Cambridge, SOAS, Dublin. | | |
Eastern Asia | (17) Taiwan Aboriginal Languages | Center for Aboriginal Studies, NCCU | vickylin771015 (2016) or Ûi-iū Kán <iyumu> (2018-present) | |
Eastern Asia | (1) Ancient Korean | ? | Park Chanjun | |
Global actors | (1000s) multiple, native Americans | SIL | Aaron_Hemphill | See also Lingualibre:Apps |
Global actors | (1000s) multiple | Panlex.org | ? | Seems to be web scraping or low-quality data. |
Global actors | (1000s) multiple | WMFR: User:Adélaïde Calais WMFr ; Google: Daan van Esch | ||
Global actors | (100s?) multiple | WMFR/FB: User:Exilexi | Wikimedian and Facebook employee in Paris. Knows the i18n team. | |
Global actors | (10s) multiple | Endangered and Lesser-resourced Languages in Eurasia (EURALI) | ? | |
Global actors | (10s?) multiple | International Standard Language Resource Number (ISLRN, www.islrn.org) | ? | |
Global actors | (10s?) multiple | Global Alliance for Lexicography (Globalex) | ? | |
Global actors | (10s?) multiple | ELEXIS: European Lexicographic Infrastructure | ? | |
Global actors | (10s?) multiple | NexusLinguarum – European network for Web-centred linguistic data science | ? | |
Global actors | (10s?) multiple | EURALEX Conference: | ? | |
Global actors | (375) multiple | Corpora by University of Leipzig | ? | |
Global actors | (1001) multiple | UNILEX, Google/UNICODE's freelance | Lili: Yug ; Unilex: ? | |
Global actors | (130+) multiple | [1] | No contact | |
Global actors | (?) multiple | meta:Oral Culture Transcription Toolkit | Amrit Sufi | Has documentation for consensual recording with local speakers. |
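France | (1) French, other? | Massalia VoX (https://didac-ressources.eu/2021/03/24/massalia-vox-tiers-lieu-inclusif-notre-nouvelle-adresse/) | @FiloSophie | French association focused on diversity and enthusiasm for languages; can provide rooms for rent for recording sessions. See Location de salle (http://didac-ressources.eu/wp-content/uploads/2021/03/espaces-et-grille-tarifaire-massaliavoX.pdf). Address: Massalia VoX, 15 boulevard de la liberté, Marseille https://goo.gl/maps/1PhRX4b6EJK3xoWb8 |
France | (?) multiple | Sorosoro Team (https://www.sorosoro.org/le-programme-sorosoro/le-conseil-scientifique/) | Rozenn Milin | CC-BY-NC-ND (https://www.sorosoro.org/mentions-legales/) |
Global actors | (1000+) multiple | Facebook: "Introducing speech-to-text, text-to-speech, and more for 1,100+ languages" (https://ai.meta.com/blog/multilingual-model-speech-recognition/); "MMS: Scaling Speech Technology to 1000+ languages" demo (https://huggingface.co/spaces/mms-meta/MMS) | | ? |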
Researchers
Below are researchers who have not expressed interest but who might be interested. See also LinguaLibre talk:Citations.
- Karën Fort: associate professor (maîtresse de conférences, HDR) in computer science, specialising in natural language processing, resource creation, and NLP ethics.
Word lists from Google / Unilex research
- https://research.google/pubs/pub47206/ mining wordlists (Unilex-style) for 2,000+ languages
  - Prasad, Manasa; Breiner, Theresa; Esch, Daan van (2018). "Mining Training Data for Language Modeling across the World's Languages" (PDF). Proceedings of the 6th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2018).
- https://research.google/pubs/pub46952/ cleaning up the wordlists
  - Chua, Mason; Esch, Daan van; Coccaro, Noah; Cho, Eunjoon; Bhandari, Sujeet; Jia, Libin (2018). "Text Normalization Infrastructure that Scales to Hundreds of Language Varieties". Proceedings of the 11th edition of the Language Resources and Evaluation Conference.
- https://arxiv.org/abs/2103.15845 open-sourced text normalization
  - Zupon, Andrew; Crew, Evan; Ritchie, Sandy (2021-03-29). "Text Normalization for Low-Resource Languages of Africa". arXiv:2103.15845 [cs].
- https://research.google/pubs/pub49814/ using these wordlists to find sentences with Google's web crawler
  - Caswell, Isaac; Breiner, Theresa; Esch, Daan van; Bapna, Ankur (2020). "Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus".
- https://research.google/pubs/pub50211/ cleaning up web-crawled text
  - Kreutzer, Julia; Caswell, Isaac; Wang, Lisa; Wahab, Ahsan; Esch, Daan van; Ulzii-Orshikh, Nasanbayar; Tapo, Allahsera Auguste; Subramani, Nishant; Sokolov, Artem; Sikasote, Claytone; Setyawan, Monang (2022). "Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets". TACL.
- https://arxiv.org/abs/2205.03983 building machine translation systems from the cleaned text
  - Bapna, Ankur; Caswell, Isaac; Kreutzer, Julia; Firat, Orhan; van Esch, Daan; Siddhant, Aditya; Niu, Mengmeng; Baljekar, Pallavi; Garcia, Xavier; Macherey, Wolfgang; Breiner, Theresa (2022-05-16). "Building Machine Translation Systems for the Next Thousand Languages". arXiv:2205.03983 [cs].
- https://ai.googleblog.com/2022/05/24-new-languages-google-translate.html blog post
  - "Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate". Google AI Blog. Retrieved 2022-06-30.
To process
- OpenSpeaks on Wikiversity.
See also
- LinguaLibre:Workshops, see "Outreach"
- LinguaLibre:Mailing
- LinguaLibre:Hackathon
- Help:List translation
- https://wikitongues.org/cohort/