LinguaLibre

Difference between revisions of "Citations"

Citations gathers all citations of LinguaLibre by external actors.

(Add GIPFA study that used 80K LL samples)
Line 10:
 
* https://elex.link/elex2021/wp-content/uploads/2021/08/eLex_2021_38_pp588-597.pdf
 
  
=== Impacting Lingualibre ===
+* https://research.google/pubs/pub47206/ for mining wordlists (Unilex-style) from 2,000+ languages
+** Prasad, Manasa; Breiner, Theresa; Esch, Daan van (2018). "Mining Training Data for Language Modeling across the World's Languages" (PDF). Proceedings of the 6th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2018).
+* https://research.google/pubs/pub46952/ cleaning them up;
+** Chua, Mason; Esch, Daan van; Coccaro, Noah; Cho, Eunjoon; Bhandari, Sujeet; Jia, Libin (2018). "Text Normalization Infrastructure that Scales to Hundreds of Language Varieties". Proceedings of the 11th edition of the Language Resources and Evaluation Conference.
+* https://arxiv.org/abs/2103.15845 open-sourced;
+** Zupon, Andrew; Crew, Evan; Ritchie, Sandy (2021-03-29). "Text Normalization for Low-Resource Languages of Africa". arXiv:2103.15845 [cs].
+* https://research.google/pubs/pub49814/ using these wordlists to find sentences using our web crawler
+** Caswell, Isaac; Breiner, Theresa; Esch, Daan van; Bapna, Ankur (2020). "Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus".
+* https://research.google/pubs/pub50211/ cleaning up web-crawled text
+** Kreutzer, Julia; Caswell, Isaac; Wang, Lisa; Wahab, Ahsan; Esch, Daan van; Ulzii-Orshikh, Nasanbayar; Tapo, Allahsera Auguste; Subramani, Nishant; Sokolov, Artem; Sikasote, Claytone; Setyawan, Monang (2022). "Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets". TACL.
+* https://arxiv.org/abs/2205.03983 building machine translation systems from them
+** Bapna, Ankur; Caswell, Isaac; Kreutzer, Julia; Firat, Orhan; van Esch, Daan; Siddhant, Aditya; Niu, Mengmeng; Baljekar, Pallavi; Garcia, Xavier; Macherey, Wolfgang; Breiner, Theresa (2022-05-16). "Building Machine Translation Systems for the Next Thousand Languages". arXiv:2205.03983 [cs].
+* https://ai.googleblog.com/2022/05/24-new-languages-google-translate.html blog post
+** "Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate". Google AI Blog. Retrieved 2022-06-30.
  
 
== See also ==
 

Revision as of 21:09, 30 June 2022

Draft

This page is a work in progress.

Press

France

World

Wikimedia Newsrooms

Academic

Lingualibre

  • https://research.google/pubs/pub47206/ for mining wordlists (Unilex-style) from 2,000+ languages
    • Prasad, Manasa; Breiner, Theresa; Esch, Daan van (2018). "Mining Training Data for Language Modeling across the World's Languages" (PDF). Proceedings of the 6th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2018).
  • https://research.google/pubs/pub46952/ cleaning them up;
    • Chua, Mason; Esch, Daan van; Coccaro, Noah; Cho, Eunjoon; Bhandari, Sujeet; Jia, Libin (2018). "Text Normalization Infrastructure that Scales to Hundreds of Language Varieties". Proceedings of the 11th edition of the Language Resources and Evaluation Conference.
  • https://arxiv.org/abs/2103.15845 open-sourced;
    • Zupon, Andrew; Crew, Evan; Ritchie, Sandy (2021-03-29). "Text Normalization for Low-Resource Languages of Africa". arXiv:2103.15845 [cs].
  • https://research.google/pubs/pub49814/ using these wordlists to find sentences using our web crawler
    • Caswell, Isaac; Breiner, Theresa; Esch, Daan van; Bapna, Ankur (2020). "Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus".
  • https://research.google/pubs/pub50211/ cleaning up web-crawled text
    • Kreutzer, Julia; Caswell, Isaac; Wang, Lisa; Wahab, Ahsan; Esch, Daan van; Ulzii-Orshikh, Nasanbayar; Tapo, Allahsera Auguste; Subramani, Nishant; Sokolov, Artem; Sikasote, Claytone; Setyawan, Monang (2022). "Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets". TACL.
  • https://arxiv.org/abs/2205.03983 building machine translation systems from them
    • Bapna, Ankur; Caswell, Isaac; Kreutzer, Julia; Firat, Orhan; van Esch, Daan; Siddhant, Aditya; Niu, Mengmeng; Baljekar, Pallavi; Garcia, Xavier; Macherey, Wolfgang; Breiner, Theresa (2022-05-16). "Building Machine Translation Systems for the Next Thousand Languages". arXiv:2205.03983 [cs].
  • https://ai.googleblog.com/2022/05/24-new-languages-google-translate.html blog post
    • "Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate". Google AI Blog. Retrieved 2022-06-30.

See also

Lingualibre:Help