LinguaLibre
Difference between revisions of "Citations"
Citations gathers all citations of LinguaLibre by external actors.
(Add GIPFA study that used 80K LL samples)
Line 10:
  * https://elex.link/elex2021/wp-content/uploads/2021/08/eLex_2021_38_pp588-597.pdf
−
+ * https://research.google/pubs/pub47206/ for mining wordlists (Unilex-style) from 2,000+ languages
+ ** Prasad, Manasa; Breiner, Theresa; Esch, Daan van (2018). "Mining Training Data for Language Modeling across the World's Languages" (PDF). Proceedings of the 6th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2018).
+ * https://research.google/pubs/pub46952/ cleaning them up;
+ ** Chua, Mason; Esch, Daan van; Coccaro, Noah; Cho, Eunjoon; Bhandari, Sujeet; Jia, Libin (2018). "Text Normalization Infrastructure that Scales to Hundreds of Language Varieties". Proceedings of the 11th edition of the Language Resources and Evaluation Conference.
+ * https://arxiv.org/abs/2103.15845 open-sourced;
+ ** Zupon, Andrew; Crew, Evan; Ritchie, Sandy (2021-03-29). "Text Normalization for Low-Resource Languages of Africa". arXiv:2103.15845 [cs].
+ * https://research.google/pubs/pub49814/ using these wordlists to find sentences using our web crawler
+ ** Caswell, Isaac; Breiner, Theresa; Esch, Daan van; Bapna, Ankur (2020). "Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus".
+ * https://research.google/pubs/pub50211/ cleaning up web-crawled text
+ ** Kreutzer, Julia; Caswell, Isaac; Wang, Lisa; Wahab, Ahsan; Esch, Daan van; Ulzii-Orshikh, Nasanbayar; Tapo, Allahsera Auguste; Subramani, Nishant; Sokolov, Artem; Sikasote, Claytone; Setyawan, Monang (2022). "Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets". TACL.
+ * https://arxiv.org/abs/2205.03983 building machine translation systems from them
+ ** Bapna, Ankur; Caswell, Isaac; Kreutzer, Julia; Firat, Orhan; van Esch, Daan; Siddhant, Aditya; Niu, Mengmeng; Baljekar, Pallavi; Garcia, Xavier; Macherey, Wolfgang; Breiner, Theresa (2022-05-16). "Building Machine Translation Systems for the Next Thousand Languages". arXiv:2205.03983 [cs].
+ * https://ai.googleblog.com/2022/05/24-new-languages-google-translate.html blog post
+ ** "Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate". Google AI Blog. Retrieved 2022-06-30.
  == See also ==
Revision as of 21:09, 30 June 2022
Press
France
World
Wikimedia Newsrooms
Academic
Lingualibre
- https://www.researchgate.net/publication/361565674_Crowd-sourcing_for_Less-resourced_Languages_Lingua_Libre_for_Polish
- https://elex.link/elex2021/wp-content/uploads/2021/08/eLex_2021_38_pp588-597.pdf
- https://research.google/pubs/pub47206/ for mining wordlists (Unilex-style) from 2,000+ languages
  - Prasad, Manasa; Breiner, Theresa; Esch, Daan van (2018). "Mining Training Data for Language Modeling across the World's Languages" (PDF). Proceedings of the 6th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2018).
- https://research.google/pubs/pub46952/ cleaning them up;
  - Chua, Mason; Esch, Daan van; Coccaro, Noah; Cho, Eunjoon; Bhandari, Sujeet; Jia, Libin (2018). "Text Normalization Infrastructure that Scales to Hundreds of Language Varieties". Proceedings of the 11th edition of the Language Resources and Evaluation Conference.
- https://arxiv.org/abs/2103.15845 open-sourced;
  - Zupon, Andrew; Crew, Evan; Ritchie, Sandy (2021-03-29). "Text Normalization for Low-Resource Languages of Africa". arXiv:2103.15845 [cs].
- https://research.google/pubs/pub49814/ using these wordlists to find sentences using our web crawler
  - Caswell, Isaac; Breiner, Theresa; Esch, Daan van; Bapna, Ankur (2020). "Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus".
- https://research.google/pubs/pub50211/ cleaning up web-crawled text
  - Kreutzer, Julia; Caswell, Isaac; Wang, Lisa; Wahab, Ahsan; Esch, Daan van; Ulzii-Orshikh, Nasanbayar; Tapo, Allahsera Auguste; Subramani, Nishant; Sokolov, Artem; Sikasote, Claytone; Setyawan, Monang (2022). "Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets". TACL.
- https://arxiv.org/abs/2205.03983 building machine translation systems from them
  - Bapna, Ankur; Caswell, Isaac; Kreutzer, Julia; Firat, Orhan; van Esch, Daan; Siddhant, Aditya; Niu, Mengmeng; Baljekar, Pallavi; Garcia, Xavier; Macherey, Wolfgang; Breiner, Theresa (2022-05-16). "Building Machine Translation Systems for the Next Thousand Languages". arXiv:2205.03983 [cs].
- https://ai.googleblog.com/2022/05/24-new-languages-google-translate.html blog post
  - "Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate". Google AI Blog. Retrieved 2022-06-30.