LinguaLibre

Interested communities

Revision as of 17:51, 18 July 2022 by Yug

This page gathers and shares pointers to linguistic communities that have expressed interest in Lingualibre recording. These may be individuals or organisations from Wikimedia, academia, civil associations, or cultural activism. The pointers are shared below so that these valuable contacts are not lost through a lapse on our side. Email addresses are not to be displayed, but organisation names, individual names, and links to webpages and discussions are welcome.


Region Language(s) Organisation Contact Project and Comments
Northern America (17) Native Americans SIL _ _
South America (1) Native American > Surui : srn Aquaverde Lili: Yug ; Association AquaVerde: Almir Surui, Thomas Pizer. Ongoing, see meta:LinguaLibre/Atelier_de_formation_à_LinguaLibre_pour_le_Surui/en
Europe (1) Catalan ? Lili: Yug ; Toulouse:
Europe (1) French Sign Language ? Lili: Yug ; Toulouse: User:Seejayer
Europe (1) Ladino CollectivaT, Barcelona Lili:Yug, CollectivaT: Alp on CollectivaT-dev.cat
  • Met at LREC[1]
  • Small team that created a Ladino translation tool and a text-to-speech system
  • Vivid example of what we could do: machine learning-based text-to-speech.
  • Possible partner for {{Grants table}} "Alliance fund".
  • Summary: Highly advanced project on an endangered Jewish diaspora language with 2000 speakers. Funded by Europe and technically as good as Gascons or better. They also use Tacotron2, a Google machine-learning model for speech synthesis, in their translation and text-to-speech system.
  • Website: https://data.sefarad.com.tr (CC-licensed data)
  • Github: https://github.com/CollectivaT-dev/
    • /judeo-espanyol-resources/blob/main/resources/dictionaries/diksionaryo_ladino_espanyol.txt
  • Team (size): a few people.
Europe (1) Breton Research center IRISA.fr Lili: VIGNERON; IRISA: Annie Foret
  1. Annie Foret at IRISA: works on a syntactic-tree annotator for Breton.
    Could organise audio documentation of Breton in her lab in Rennes.
    Phase 1: write an email and ask for 1000+30 words for Lingualibre.
  2. Dastum: audio collection of Breton songs (under copyright).
  3. Mélanie Jouitteau, CNRS linguist: state of the art of Breton resources
    • arbres.iker.cnrs.fr
    • Wikigrammar of Breton
Sub-Saharan Africa (1) Kenya > en:Kikuyu language/Q329 Wikimedia Lili: Yug ↔ Ngangaesther en:User_talk:Ngangaesther#Follow_up_!
  • Wordlist: no wordlist → translation suggested.
N. Africa and Middle East Palestinian Arabic research (7) ? ?
Southern Asia (17) Indian languages ? ?
Central Asia (1) Kazakh Sign Language ? ?
Eastern Asia (17) Taiwan Aboriginal Languages Center for Aboriginal Studies, NCCU vickylin771015 (2016) or Ûi-iū Kán <iyumu> (2018-present)
  • Site: https://web.alcd.center
  • The Center for Aboriginal Studies, National Chengchi University (NCCU), is an academic team dedicated to Taiwanese aboriginal studies that maintains and runs 16 Wikipedias in native languages. Founded in 1999, initially as the “Center for Aboriginal Languages, Cultures and eDucation” (ALCD).
Global actors (1000s) multiple, native Americans SIL Aaron_Hemphill See also Lingualibre:Apps
Global actors (1000s) multiple Google WMFR: User:Adélaïde Calais WMFr ; Google: Daan van Esch
Global actors (100s?) multiple Facebook WMFR/FB: User:Exilexi Wikimedian and Facebook employee in Paris. Knows the i18n team.
Global actors (10s) multiple Endangered and Lesser-resourced Languages in Eurasia (EURALI) ?
Global actors (10s?) multiple International Standard Language Resource Number (ISLRN, www.islrn.org) ?
Global actors (10s?) multiple Global Alliance for Lexicography (Globalex) ?
Global actors (10s?) multiple ELEXIS: European Lexicographic Infrastructure ?
Global actors (10s?) multiple NexusLinguarum – European network for Web-centred linguistic data science ?
Global actors (10s?) multiple EURALEX Conference: ?
Global actors (375) multiple Corpora by University of Leipzig ?
  • Covers 375 languages and many more corpora, extracted from online resources, including Wikipedias.
  • Re-run periodically (!)
  • Partly copyrighted, partly CC-BY.
  • The download page offers CC-BY sentence corpora, from which frequency lists can be created.
Global actors (1001) multiple UNILEX, Google/UNICODE's freelance Lili: Yug ; Unilex: ?
  • Contains 1001 languages and their frequency lists
  • MIT-like license.
  • One-shot, barely maintained.
Global actors (130+) multiple [1] No contact
  • Collects researchers' treebanks.
  • Has 130 languages
  • Could include a few rare languages
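
The University of Leipzig entry notes that frequency lists can be created from the CC-BY sentence corpora. A minimal Python sketch of that step follows. It assumes each corpus line is either a bare sentence or an `id<TAB>sentence` pair, and it uses a naive regex tokenizer; the function name `frequency_list` and the sample lines are hypothetical, and real languages may need a proper tokenizer.

```python
import re
from collections import Counter

# Naive tokenizer (an assumption): word characters, optionally joined
# by an apostrophe or hyphen, e.g. "don't" or "text-to-speech".
TOKEN_RE = re.compile(r"\w+(?:['’-]\w+)*")


def frequency_list(lines, top_n=None):
    """Return (word, count) pairs, most frequent first.

    Assumed input layout: one sentence per line, optionally prefixed
    with an identifier and a tab ("id<TAB>sentence").
    """
    counts = Counter()
    for line in lines:
        # Drop the optional "id<TAB>" prefix; lines without a tab are kept whole.
        sentence = line.rstrip("\n").split("\t", 1)[-1]
        counts.update(TOKEN_RE.findall(sentence.lower()))
    return counts.most_common(top_n)


# Hypothetical sample lines standing in for a downloaded sentence file.
sample = [
    "1\tThe cat sat.",
    "2\tThe dog sat down.",
]
top_words = frequency_list(sample, top_n=2)
```

The resulting list could serve as a starting wordlist for recording sessions; whether lowercasing and this tokenization suit a given language is a judgment call left to the contributor.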

See also

References