LinguaLibre

Difference between revisions of "Technical board"


Revision as of 00:12, 13 February 2021

Draft

This page is a work in progress.
Welcome to the Lingua Libre Technical board!
Where to start?
  • Local developments are easy. You can customize your CSS and JS, including creating a local WikiJS script, even with limited edit rights.
  • LinguaLibre Bot (Python, GitHub) is a high-impact project. Help is needed to authorize it on more wikis.
  • Join us on Phabricator and GitHub.
Skills we look for…
  • Developers: we especially look for Bot Masters (Python, NodeJS), SPARQL experts, VueJS developers, and issue coordinators, but everyone is welcome.
  • Project coordinators: we also look for organizers of recording/hacking meet-ups who are able to build a network with language learning, language conservation and NLP actors.
Happy Coding!
  • Please announce your hacking project here to raise awareness and gather feedback.
  • Most of our actions remain small in scope and volunteer-based. If your project is large enough, you can look into some of the funding options.
Development & Technical reports
Flash Technical News
  • January 25th, 2023: the latest GitHub revision has been pushed to the production server. Kurdish Wiktionary is now supported. Oriya Wiktionary will be supported very soon. Support for more Wiktionary versions should follow.

Please visit LinguaLibre:About to learn more about the project.

Migration of technical contents

Hello all, please help migrate technical content from the main LinguaLibre:Chat room to this page. Yug (talk) 18:49, 12 February 2021 (UTC)

2021 GitHub refresh: call for volunteers and discussion

See also Github.com/lingua-libre

Hello all,
Since November 2020 there has been an ongoing effort to clean up, document, and fix the 11 GitHub repositories on which LinguaLibre.org stands. A summary is available on the main forum and will be migrated here shortly. This section focuses on gathering users with development skills and discussing possible fields of action (repositories). We are especially looking for Bot Masters (Python, NodeJS), SPARQL experts, VueJS developers, and issue coordinators. Yug (talk) 15:57, 12 February 2021 (UTC)

Early 2021 coding: WikiValley & volunteers communication board

WikiValley has been selected to make a notable technical push on the LinguaLibre suite in areas where volunteer developers are not enough. They will coordinate with volunteer developers to smooth everyone's work and to avoid duplicate work and git conflicts. The Start and End columns below are especially important: please keep them up to date, respect them, or change them. If you need to work on a repository that someone is already working on, contact the developer listed there and organize as needed. Our objective here is to maintain clarity and to progress smoothly. Please communicate here, within subsections, so we can all stay aware of how things are going. Yug (talk) 15:56, 12 February 2021 (UTC)

Note: Volunteers started working around December, WikiValley around February 11th. Yug (talk) 15:56, 12 February 2021 (UTC)
Past developments
Start | End | Contacts/dev | Team | Repository | Advancement & result so far
2021/02/01 | 2021/02/10 | Yug | Volunteers | SignIt | Get back control (access rights); fix video query; test locally; publish new version on Mozilla store → Fixed Firefox extension
2021/02/01 | 2021/02/16? | Yug, Michael | Volunteers, WM-France | /operations, /CommonDownloadTool | Explore possible breakpoints; identify likely cause; fix; deploy; run → Fixed https://lingualibre.org/datasets/
2021/02/? | 2021/02/11 | Vigneron, WikiValley | Wikivalley | QueryViz, Other? | Explore possible breakpoints; identify cause; fix; deploy; inquire on numbers differences → Fixed LinguaLibre:Stats

Current developments
Start | End | Contacts/dev | Team | Repository | Advancement & result so far
2021/02/01 | 2022/01/01 | Poslovitch | Volunteers | Lingua-Libre-Bot | Maintain, update and operate the bot. 2021 Q1 [WIP]: Refactor the bot to ease implementation of additional Wiktionaries.
2021/02/? | 2021/02/? | WikiLucas00, Yug | Volunteers | CustomSubtitle, BlueLL | Explore Subtitle's ribbon's bug; identify cause.

Planned developments
Start | End | Contacts/dev | Team | Repository | Advancement & result so far
2021/02/01 | 2021/03/?? | Poslovitch | Volunteers | /operations, /CommonDownloadTool | Project: Explore datasets scripts and queries. May require SPARQL assistance.

User box ?

Babel user information
mar
mr-N या सदस्याला मराठी चे स्थानिक स्तराचे ज्ञान आहे.
cmn-1 This user has basic knowledge of Mandarin Chinese.
Users by language

It might be nice to create a "dev" userbox {{Userbox-dev}}, modeled on {{Userbox-records}}, with Python, JavaScript, PHP, VueJS, and Wikimedia Bot as specific sub-categories? Yug (talk) 15:59, 12 February 2021 (UTC)

Datasets have become super slow?

I am trying to understand how the /datasets are generated.

  • In April 2020, the French dataset of about 100,000 audio files was processed in 51 minutes.
  • In February 2021, the Bengali dataset of about 50,000 audio files was processed in 18 hours.

What am I missing? Yug (talk) 00:09, 13 February 2021 (UTC)
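
To put the two figures above on the same scale, here is a minimal Python sketch that converts them into files processed per minute. It only uses the numbers quoted in this post; the `runs` dictionary and the `files_per_minute` helper are illustrative names, not part of any LinguaLibre script.

```python
# Figures quoted in the post above. The names below are illustrative only.
runs = {
    "French (April 2020)": {"files": 100_000, "minutes": 51},
    "Bengali (February 2021)": {"files": 50_000, "minutes": 18 * 60},
}


def files_per_minute(files: int, minutes: float) -> float:
    """Average number of audio files processed per minute."""
    return files / minutes


rates = {name: files_per_minute(**run) for name, run in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0f} files/min")

# How many times slower the Bengali run was, per file.
slowdown = rates["French (April 2020)"] / rates["Bengali (February 2021)"]
print(f"Slowdown factor: {slowdown:.1f}x")
```

With these inputs the French run averages roughly 1,960 files per minute and the Bengali run roughly 46, so the per-file slowdown is about 42 times. This assumes both counts are accurate and that each run processed only that one language.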


Zip file Date Size (bytes)
lingualibre_full.zip 2019-May-17:01:18 1989664440
Q101-srr-Serer.zip 2019-Nov-05:03:09 14967
Q113-cmn-Mandarin_Chinese.zip 2019-Nov-05:03:09 112613
Q115107-bcl-Central_Bikol.zip 2019-Nov-05:03:09 166323
Q127-tam-Tamil.zip 2019-Nov-05:03:09 154352
Q130-zho-Chinese.zip 2019-Nov-05:03:10 2724328
Q131-hye-Armenian.zip 2019-Nov-05:03:10 824117
Q141-cym-Welsh.zip 2019-Nov-05:03:10 12905993
Q154-amh-Amharic.zip 2019-Nov-05:03:11 2653977
Q165-hat-Haitian_Creole.zip 2019-Nov-05:03:11 233588
Q169-tgl-Tagalog.zip 2019-Nov-05:03:11 77198
Q170137-mos-Mossi.zip 2019-Nov-05:03:11 1158142
Q205-gre-Greek.zip 2019-Nov-05:03:11 239390
Q231-myv-Erzya.zip 2019-Nov-05:03:21 205878
Q242-fon-Fon.zip 2019-Nov-05:03:21 1538614
Q258-nso-Northern_Sotho.zip 2019-Nov-05:03:24 774299
Q311-oci-Occitan.zip 2019-Nov-05:03:33 511332485
Q318-bam-Bambara.zip 2019-Nov-05:03:33 277786
Q321-gaa-Ga.zip 2019-Nov-05:03:33 3247380
Q336-ori-Odia.zip 2019-Nov-05:03:34 38697693
Q339-sat-Santali.zip 2019-Nov-05:03:34 128941
Q34-mar-Marathi.zip 2019-Nov-05:03:34 2274397
Q35-nld-Dutch.zip 2019-Nov-05:03:34 36279372
Q385-ita-Italian.zip 2019-Nov-05:03:34 3440247
Q388-que-Quechua.zip 2019-Nov-05:03:35 397476
Q39-tel-Telugu.zip 2019-Nov-05:03:35 85571
Q397-heb-Hebrew.zip 2019-Nov-05:03:35 1657223
Q405-bas-Basaa_language.zip 2019-Nov-05:03:35 1515700
Q437-mal-Malayalam.zip 2019-Nov-05:03:35 138601
Q446-pan-Punjabi.zip 2019-Nov-05:03:35 11004
Q4465-mis-Teochew_dialect.zip 2019-Nov-05:03:35 69734
Q45-nor-Norwegian.zip 2019-Nov-05:03:35 431566
Q46-ltz-Luxembourgish.zip 2019-Nov-05:03:35 1679618
Q51299-hav-Havu.zip 2019-Nov-05:03:37 56823
Q51302-tay-Atayal.zip 2019-Nov-05:03:37 65533
Q52067-bbj-Ghomala'_language.zip 2019-Nov-05:03:37 1765823
Q52068-bum-Bulu_language.zip 2019-Nov-05:03:37 1382789
Q52071-dua-Duala.zip 2019-Nov-05:03:37 1206427
Q52073-bdu-Oroko.zip 2019-Nov-05:03:37 1723960
Q52074-bzm-Londo.zip 2019-Nov-05:03:37 1750380
Q52295-atj-Atikamekw.zip 2019-Nov-05:03:37 7315215
Q74905-mis-Sursilvan.zip 2019-Nov-05:03:37 14618
Q83641-gcf-Guadeloupean_Creole_French.zip 2019-Nov-05:03:38 7412512
Q930-mis-Gascon_dialect.zip 2019-Nov-05:03:39 179656450
Q931-mis-Languedocien_dialect.zip 2019-Nov-05:03:40 191575650
Q123-hin-Hindi.zip 2020-Apr-25:03:30 1704401
Q126-por-Portuguese.zip 2020-Apr-25:03:31 43732966
Q129-rus-Russian.zip 2020-Apr-25:03:32 60844464
Q150-afr-Afrikaans.zip 2020-Apr-25:04:18 42363003
Q159-dyu-Dioula_language.zip 2020-Apr-25:04:18 784432
Q19858-bci-Baoulé.zip 2020-Apr-25:04:18 1268304
Q203-cat-Catalan.zip 2020-Apr-25:04:18 9738365
Q204940-ken-Nyang_language.zip 2020-Apr-25:04:18 483396
Q208-vie-Vietnamese.zip 2020-Apr-25:04:18 8822067
Q219-ara-Arabic.zip 2020-Apr-25:04:19 85373129
Q21-fra-French.zip 2020-Apr-25:05:10 2112950650
Q221062-mis-Cantonese.zip 2020-Apr-25:05:10 3895600
Q22-eng-English.zip 2020-Apr-25:05:12 131688602
Q25-epo-Esperanto.zip 2020-Apr-25:05:19 445662713
Q264201-ary-Moroccan_Arabic.zip 2020-Apr-25:05:19 1371064
Q273-kab-Kabyle.zip 2020-Apr-25:05:19 370876
Q298-pol-Polish.zip 2020-Apr-25:05:21 145009958
Q299-eus-Basque.zip 2020-Apr-25:05:21 46035866
Q33-fin-Finnish.zip 2020-Apr-25:05:46 19473062
Q386-spa-Spanish.zip 2020-Apr-25:05:46 28434220
Q389-jpn-Japanese.zip 2020-Apr-25:05:46 145688
Q392-ces-Czech.zip 2020-Apr-25:05:46 96844
Q44-swe-Swedish.zip 2020-Apr-25:05:46 166237
Q4901-shy-Shawiya_language.zip 2020-Apr-25:05:47 15804835
Q6714-arq-Algerian_Arabic.zip 2020-Apr-25:05:47 3420182
Q80-kan-Kannada.zip 2020-Apr-25:05:47 3662223
Q24-deu-German.zip 2021-Feb-11:15:32 258363332
Q307-ben-Bengali.zip 2021-Feb-12:07:28 1079637723
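
One way to estimate per-dataset processing time from this listing is to take the gap between a file's timestamp and that of the file just before it. The Python sketch below does this for a few rows copied from the table. It assumes that each timestamp marks when a zip was finished and that zips are produced one after another within a run, neither of which is confirmed here. `parse_listing` and `gaps_in_minutes` are hypothetical helper names.

```python
from datetime import datetime

# A few rows copied from the listing above: name, timestamp, size.
LISTING = """\
Q219-ara-Arabic.zip 2020-Apr-25:04:19 85373129
Q21-fra-French.zip 2020-Apr-25:05:10 2112950650
Q221062-mis-Cantonese.zip 2020-Apr-25:05:10 3895600
Q24-deu-German.zip 2021-Feb-11:15:32 258363332
Q307-ben-Bengali.zip 2021-Feb-12:07:28 1079637723
"""


def parse_listing(text):
    """Return (file name, timestamp) pairs, in listing order."""
    rows = []
    for line in text.splitlines():
        name, stamp = line.split()[:2]
        # %b expects English month abbreviations (default C/English locale).
        rows.append((name, datetime.strptime(stamp, "%Y-%b-%d:%H:%M")))
    return rows


def gaps_in_minutes(rows):
    """Minutes elapsed between each file and the previous one in the listing."""
    return {
        name: int((when - prev_when).total_seconds() // 60)
        for (_, prev_when), (name, when) in zip(rows, rows[1:])
    }


gaps = gaps_in_minutes(parse_listing(LISTING))
for name, minutes in gaps.items():
    print(f"{name}: {minutes} min after the previous file")
```

For the French zip this gives 51 minutes after the Arabic one. Gaps that span separate runs, such as April 2020 to February 2021, are not meaningful, and even within a run the gap is only a rough proxy for processing time.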