Interested communities

Region	Language(s)	Organisation	Contact	Project and Comments
Northern America	(17) Native Americans	SIL	_	_
South America	(1) Native American > Surui : srn	Aquaverde	Lili: Yug ; Association AquaVerde: Almir Surui, Thomas Pizer.	Ongoing, see meta:LinguaLibre/Atelier_de_formation_à_LinguaLibre_pour_le_Surui/en
Europe	(1) Catalan	?	Lili: Yug ; Perpignan: Susanna Peidro I Sutil	Catalan-language activist in France, very motivated.
World	(0) Wiki Journalist	Wikimedia	User:Uprising Man	African Wikimedian with journalism skills, willing to co-author an article on sign languages with Yug.
Europe	(1) French Sign Language	?	Lili: Yug ; Toulouse: User:Seejayer	Created Sign2Sign PeerTube (about, Charter). Met at Forom.^[1]
?	(?) Sign Languages	DeepMind, Berkeley AI	Kayo Yin (en/fr/ja/zh)	https://kayoyin.github.io
World	(8) Sign Languages	Wikisigns	?	http://wikisigns.org / tw: @wikisigns
West	(3) English/American Sign Languages	en:WP:WikiProject Deaf	?	2022-09: minimal contact, see meta:Talk:Deaf Wikimedians
Central Asia	(1) Kazakh Sign Language	Nazarbayev University	?	Met at LREC^[1]
Africa	(1) Ghana Sign Language	Special Education Department, University of Education, Winneba; Wikimedia Ghana		Sign Language and Wikipedia: Ghanaian hearing-impaired students undergo training on Wiki projects
Europe	(1) Ladino	CollectivaT, Barcelona	Lili: Yug ; CollectivaT: Alp on CollectivaT-dev.cat	Met at LREC.^[1] Small team that created a Ladino translation tool and text-to-speech system: a vivid example of what we could do with machine-learning-based text-to-speech. Possible partner for the "Alliance fund" (see Grants table). Summary: highly advanced project on an endangered Jewish diaspora language with 2,000 speakers, funded by Europe and technically as good as the Gascon project or better. They use Tacotron2, a Google machine-learning model, for text-to-speech. Website (CC data): https://data.sefarad.com.tr ; translation and text-to-speech: https://translate.sefarad.com.tr ; GitHub: https://github.com/CollectivaT-dev/ /judeo-espanyol-resources/blob/main/resources/dictionaries/diksionaryo_ladino_espanyol.txt ; team size: a few people.
Cameroon	(200) multiple	Wikimedia Cameroun	ArnoBOUJIKA	microfi, Lingualibre
Rwanda	(3) Languages	Wikimedia Rwanda	Cnyirahabihirwe123 - WMRD cofounder	Any help welcome.
Europe	(1) Breton	Bretagne numérique	David Lesvenan, Laurence Le Goff	Need: workshop support.
Europe	(1) Breton	Research center IRISA.fr	Lili: VIGNERON; IRISA: Annie Foret	Annie Foret (IRISA): works on a syntactic tree annotator for Breton. Could organise audio documentation of Breton in her lab in Rennes. Phase 1: write an email and ask for 1000+30 words for Lingualibre. Dastum: collection of Breton songs in audio (under copyright). Mélanie Jouitteau, CNRS linguist: state of the art of Breton resources, arbres.iker.cnrs.fr (Wikigrammaire du breton).
Europe	(1) Tatar	?	?	Corpus: https://www.corpus.tatar/stat_en.htm
Sub-Sahara Africa	(1) Kenya > en:Kikuyu language/Q329	Wikimedia	Lili: Yug ↔ Ngangaesther en:User_talk:Ngangaesther#Follow_up_!	Wordlist: none exists, so translating an existing list was suggested.
N. Africa and Middle East	(7) Arabic languages	Palestinian Arabic research	Mustafa Jarrar	Libyan, Sudanese, Yemeni, Palestinian, Levantine, Iraqi, Egyptian (?). Hopefully to give them a presentation tomorrow. Met at LREC.^[1]
Southern Asia	(17) Indian languages	Universal Knowledge Core (UKC), University of Trento; India, Europe	UKC: Nandu Chandra Nair PhD	Site: http://ukc.disi.unitn.it ; next step: send an email to start recordings and highlight diversity. Video: IndoUKC: a Concept-Centered Indian Multilingual Lexical Resource. Met at LREC.^[1]
Southern Asia	(1+) Tibetan	Cambridge, SOAS, Dublin		Repository: https://github.com/lothelanor/actib
Eastern Asia	(17) Taiwan Aboriginal Languages	Center for Aboriginal Studies, NCCU	vickylin771015 (2016) or Ûi-iū Kán <iyumu> (2018-present)	Site: https://web.alcd.center Center for Aboriginal Studies, National Chengchi University (NCCU), is an academic team dedicated to Taiwanese aboriginal studies, maintaining and animating 16 Wikipedias in native languages. Founded in 1999, initially “Center for Aboriginal Languages, Cultures and eDucation” (ALCD). 2019
Eastern Asia	(1) Ancient Korean	?	Park Chanjun	Neural Machine Translation Repository: https://parkchanjun.github.io Site: kunmt.org
Global actors	(1000s) multiple, native Americans	SIL	Aaron_Hemphill	See also Lingualibre:Apps
Global actors	(1000s) multiple	Panlex.org	?	Seems to rely on web scraping or low-quality data.
Global actors	(1000s) multiple	Google	WMFR: User:Adélaïde Calais WMFr ; Google: Daan van Esch
Global actors	(100s?) multiple	Facebook	WMFR/FB: User:Exilexi	Wikimedian and Facebook employee in Paris. Knows the i18n team.
Global actors	(10s) multiple	Endangered and Lesser-resourced Languages in Eurasia (EURALI)	?	Met at LREC^[1]
Global actors	(10s?) multiple	International Standard Language Resource Number (ISLRN, www.islrn.org)	?	Met at LREC^[1]
Global actors	(10s?) multiple	Global Alliance for Lexicography (Globalex)	?	Met at LREC^[1]
Global actors	(10s?) multiple	ELEXIS: European Lexicographic Infrastructure	?	Met at LREC^[1]
Global actors	(10s?) multiple	NexusLinguarum – European network for Web-centred linguistic data science	?	Met at LREC^[1]
Global actors	(10s?) multiple	EURALEX Conference	?	Met at LREC^[1]
Global actors	(375) multiple	Corpora by University of Leipzig	?	Contains 375 languages and far more corpora, extracted from online resources including Wikipedias. Re-run periodically. Partly copyrighted, partly CC BY. The download page offers CC-BY sentence corpora from which frequency lists can be created.
Global actors	(1001) multiple	UNILEX, Google/Unicode's freelance	Lili: Yug ; Unilex: ?	Contains 1001 languages and their frequency lists; MIT-like license. One-shot release, barely maintained.
Global actors	(130+) multiple	[1]	No contact	Collects researchers' treebanks. Has 130 languages; could include a few rare languages.
Global actors	(?) multiple	meta:Oral Culture Transcription Toolkit	Amrit Sufi	Has documentation for consensual recording with local speakers.
France	(1) French, other?	Massalia VoX	@FiloSophie	Comment: French association focused on diversity and languages; can provide rentable rooms for recording sessions. See Location de salle (room rental). Address: Massalia VoX, 15 boulevard de la liberté, Marseille https://goo.gl/maps/1PhRX4b6EJK3xoWb8
France	(?) multiple	Sorosoro Team	Rozenn Milin	CC-BY-NC-ND
Global actors	(1000+) multiple	Facebook	?	Introducing speech-to-text, text-to-speech, and more for 1,100+ languages; MMS: Scaling Speech Technology to 1000+ languages (demo)
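The Leipzig row above notes that frequency lists can be created from the CC-BY sentence corpora. A minimal Python sketch, assuming each downloaded sentence file holds one `ID<TAB>sentence` entry per line (check the actual download before relying on this) and that splitting on whitespace is an acceptable tokenizer for the target language (it is not for scripts written without spaces, such as Chinese or Tibetan). The function name `frequency_list` and the `min_count` cutoff are illustrative, not part of any Leipzig or Lingua Libre tool.

```python
import string
from collections import Counter

# Punctuation removed from token edges (assumption; extend for other scripts).
EDGE_PUNCT = string.punctuation + "«»“”‘’„…¿¡"


def frequency_list(lines, min_count=1):
    """Return (word, count) pairs, most frequent first, from Leipzig-style lines.

    Each line is assumed to be "<sentence id>", a tab, then the sentence text;
    lines without a tab are treated as bare sentences.
    """
    counts = Counter()
    for line in lines:
        ident, sep, sentence = line.rstrip("\n").partition("\t")
        if not sep:  # no ID column: the whole line is the sentence
            sentence = ident
        for token in sentence.split():
            word = token.strip(EDGE_PUNCT).casefold()
            if word and not word.isdigit():  # skip empty tokens and bare numbers
                counts[word] += 1
    return [(w, c) for w, c in counts.most_common() if c >= min_count]


if __name__ == "__main__":
    sample = ["1\tThe cat sat.", "2\tThe dog, the cat!"]
    print(frequency_list(sample))
    # [('the', 3), ('cat', 2), ('sat', 1), ('dog', 1)]
```

To process a downloaded file, pass an open handle, e.g. `with open(path, encoding="utf-8") as f: words = frequency_list(f, min_count=2)`, where `path` is whatever file name the download contains. Keeping the most frequent words first matches how recording lists are usually ordered.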

Researchers

Below are researchers who have not expressed interest but who could be interested.

Karën Fort: associate professor (maîtresse de conférences, HDR) in computer science, specialising in natural language processing, language resource creation, and NLP ethics.

Word lists by Google / Unilex research

https://research.google/pubs/pub47206/ for mining wordlists (Unilex-style) from 2,000+ languages
- Prasad, Manasa; Breiner, Theresa; Esch, Daan van (2018). "Mining Training Data for Language Modeling across the World's Languages" (PDF). Proceedings of the 6th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2018).
https://research.google/pubs/pub46952/ cleaning them up;
- Chua, Mason; Esch, Daan van; Coccaro, Noah; Cho, Eunjoon; Bhandari, Sujeet; Jia, Libin (2018). "Text Normalization Infrastructure that Scales to Hundreds of Language Varieties". Proceedings of the 11th edition of the Language Resources and Evaluation Conference.
https://arxiv.org/abs/2103.15845 open-sourced;
- Zupon, Andrew; Crew, Evan; Ritchie, Sandy (2021-03-29). "Text Normalization for Low-Resource Languages of Africa". arXiv:2103.15845 [cs].
https://research.google/pubs/pub49814/ using these wordlists to find sentences using our web crawler
- Caswell, Isaac; Breiner, Theresa; Esch, Daan van; Bapna, Ankur (2020). "Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus".
https://research.google/pubs/pub50211/ cleaning up web-crawled text
- Kreutzer, Julia; Caswell, Isaac; Wang, Lisa; Wahab, Ahsan; Esch, Daan van; Ulzii-Orshikh, Nasanbayar; Tapo, Allahsera Auguste; Subramani, Nishant; Sokolov, Artem; Sikasote, Claytone; Setyawan, Monang (2022). "Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets". TACL.
https://arxiv.org/abs/2205.03983 building machine translation systems from them
- Bapna, Ankur; Caswell, Isaac; Kreutzer, Julia; Firat, Orhan; van Esch, Daan; Siddhant, Aditya; Niu, Mengmeng; Baljekar, Pallavi; Garcia, Xavier; Macherey, Wolfgang; Breiner, Theresa (2022-05-16). "Building Machine Translation Systems for the Next Thousand Languages". arXiv:2205.03983 [cs].
https://ai.googleblog.com/2022/05/24-new-languages-google-translate.html blog post
- "Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate". Google AI Blog. Retrieved 2022-06-30.
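The papers above describe using wordlists to find sentences in a given language within web-crawled text. The sketch below is a much simplified illustration of that idea, not Google's pipeline: it keeps a sentence when the share of its tokens found in a wordlist reaches a threshold. The names `tokens`, `wordlist_coverage` and `filter_sentences`, the whitespace tokenizer, and the default threshold of 0.6 are assumptions for illustration only.

```python
import string

# Punctuation removed from token edges (assumption; extend for other scripts).
EDGE_PUNCT = string.punctuation + "«»“”‘’„…¿¡"


def tokens(sentence):
    """Whitespace tokenization with edge punctuation stripped and case folded."""
    cleaned = (tok.strip(EDGE_PUNCT).casefold() for tok in sentence.split())
    return [t for t in cleaned if t]


def wordlist_coverage(sentence, wordlist):
    """Share of the sentence's tokens found in `wordlist` (a set of casefolded words)."""
    toks = tokens(sentence)
    if not toks:
        return 0.0
    return sum(t in wordlist for t in toks) / len(toks)


def filter_sentences(sentences, wordlist, threshold=0.6):
    """Keep sentences whose wordlist coverage reaches the (hypothetical) threshold."""
    return [s for s in sentences if wordlist_coverage(s, wordlist) >= threshold]


if __name__ == "__main__":
    words = {"the", "cat", "sat", "on", "mat"}
    crawled = ["The cat sat on the mat.", "Le chat est sur le tapis."]
    print(filter_sentences(crawled, words))  # ['The cat sat on the mat.']
```

Coverage alone confuses languages that share much of their vocabulary, so at best this works as a first filter before proper language identification.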

References

[1] Pad LREC 2022


