LinguaLibre
Citations
This page gathers all citations of LinguaLibre by external actors.
Press
France
World
Wikimedia Newsrooms
Academic
LinguaLibre
- https://www.researchgate.net/publication/361565674_Crowd-sourcing_for_Less-resourced_Languages_Lingua_Libre_for_Polish
- Hutin, Mathilde; Allassonnière-Tang, Marc (2022). "Crowd-sourcing for Less-resourced Languages: Lingua Libre for Polish".
- https://elex.link/elex2021/wp-content/uploads/2021/08/eLex_2021_38_pp588-597.pdf
- Marjou, Xavier (2021). "GIPFA: Generating IPA Pronunciation from Audio". Proceedings of eLex 2021, pp. 588-597.
Peripheral
Word lists by Google / Unilex research
- https://research.google/pubs/pub47206/ mining word lists (Unilex-style) from 2,000+ languages
- Prasad, Manasa; Breiner, Theresa; Esch, Daan van (2018). "Mining Training Data for Language Modeling across the World's Languages" (PDF). Proceedings of the 6th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2018).
- https://research.google/pubs/pub46952/ cleaning them up
- Chua, Mason; Esch, Daan van; Coccaro, Noah; Cho, Eunjoon; Bhandari, Sujeet; Jia, Libin (2018). "Text Normalization Infrastructure that Scales to Hundreds of Language Varieties". Proceedings of the 11th edition of the Language Resources and Evaluation Conference.
- https://arxiv.org/abs/2103.15845 open-sourced
- Zupon, Andrew; Crew, Evan; Ritchie, Sandy (2021-03-29). "Text Normalization for Low-Resource Languages of Africa". arXiv:2103.15845 [cs].
- https://research.google/pubs/pub49814/ using these word lists to find sentences with Google's web crawler
- Caswell, Isaac; Breiner, Theresa; Esch, Daan van; Bapna, Ankur (2020). "Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus".
- https://research.google/pubs/pub50211/ cleaning up web-crawled text
- Kreutzer, Julia; Caswell, Isaac; Wang, Lisa; Wahab, Ahsan; Esch, Daan van; Ulzii-Orshikh, Nasanbayar; Tapo, Allahsera Auguste; Subramani, Nishant; Sokolov, Artem; Sikasote, Claytone; Setyawan, Monang (2022). "Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets". Transactions of the Association for Computational Linguistics (TACL).
- https://arxiv.org/abs/2205.03983 building machine translation systems from them
- Bapna, Ankur; Caswell, Isaac; Kreutzer, Julia; Firat, Orhan; van Esch, Daan; Siddhant, Aditya; Niu, Mengmeng; Baljekar, Pallavi; Garcia, Xavier; Macherey, Wolfgang; Breiner, Theresa (2022-05-16). "Building Machine Translation Systems for the Next Thousand Languages". arXiv:2205.03983 [cs].
- https://ai.googleblog.com/2022/05/24-new-languages-google-translate.html blog post announcing 24 new languages in Google Translate
- "Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate". Google AI Blog. Retrieved 2022-06-30.