LinguaLibre talk:Citations
Latest revision as of 15:53, 11 July 2024
Massively multilingual projects
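Network
- https://www.connectedpapers.com/main/767dcc48c7ad2c943f3c1a25c46b873e7b8b3bc8/Glot500%3A-Scaling-Multilingual-Corpora-and-Language-Models-to-500-Languages/graph Glot500 (Connected Papers graph of related work)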
Alphabet
- https://research.google/pubs/pub47206/ for mining wordlists (Unilex-style) from 2,000+ languages
- https://research.google/pubs/pub46952/ cleaning them up; open-sourced in https://arxiv.org/abs/2103.15845
- https://research.google/pubs/pub49814/ using these wordlists to find sentences with Google's web crawler
- https://research.google/pubs/pub50211/ cleaning up web-crawled text
- https://arxiv.org/abs/2205.03983 building machine translation systems from them; blog post https://ai.googleblog.com/2022/05/24-new-languages-google-translate.html
- https://aclanthology.org/2024.lrec-main.331/ review of available language resources
- https://aclanthology.org/2022.lrec-1.538/ writing systems and speaker demographics for 2,800+ languages
- https://aclanthology.org/2024.lrec-main.921/ consolidated metadata for 7,000 languages
  - LinguaMeta: https://github.com/google-research/url-nlp/tree/main/linguameta
Facebook
- https://ai.meta.com/blog/multilingual-model-speech-recognition/ Introducing speech-to-text, text-to-speech, and more for 1,100+ languages
- https://arxiv.org/abs/2305.13516 Scaling Speech Technology to 1,000+ Languages; demo: https://huggingface.co/spaces/mms-meta/MMS
Others
- https://arxiv.org/abs/2305.12182 Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages
- https://www.semanticscholar.org/paper/e4aa101556fc5b238a88d99c07c1055fe3bc4764 Taxi1500: A Multilingual Dataset for Text Classification in 1500 Languages