LinguaLibre
Technical board
Revision as of 21:41, 13 February 2021 by Poslovitch (talk | contribs) (→Datasets has become super slow ?)
- Local development is easy. You can customize your CSS and JS, including creating a local WikiJS script, even with limited edit rights.
- LinguaLibre Bot (Python, GitHub) is a high-impact project. Help is needed to get it authorized on more wikis.
- Join us on Phabricator and GitHub.
- Developers: we are especially looking for Bot Masters (Python, NodeJS), SPARQL experts, VueJS developers, and issue coordinators, but everyone is welcome.
- Project coordinators: we are also looking for organizers of recording/hacking meet-ups who can build a network with language learning, language conservation and NLP actors.
- Please announce your hacking project here to raise awareness and gather feedback.
- Most of our actions remain small in scope and volunteer-based. If your project is large enough, you can look into some of the funding options.
- March 3rd, 2021: Wikidata Lexemes & Lingua Libre coordination assessment
- February 19th, 2021: First progress report with WikiValley and VIGNERON
- January 25th, 2023: the latest GitHub revision has been pushed to the production server. Kurdish Wiktionary is now supported. Oriya Wiktionary will be supported very soon. Support for more Wiktionary versions should follow.
Please visit LinguaLibre:About to learn more about the project.
Migration of technical contents
Hello all, please help migrate technical content from the main LinguaLibre:Chat room to here. Yug (talk) 18:49, 12 February 2021 (UTC)
2021 GitHub refresh: call for volunteers and discussion
- See also Github.com/lingua-libre
Hello all,
Since November 2020 there has been an ongoing effort to clean up, document, and fix the 11 GitHub repositories upon which LinguaLibre.org stands. A summary is available on the main forum and will be migrated here shortly. This section focuses on gathering users with development skills and discussing possible fields of action (repositories). We are especially looking for Bot Masters (Python, NodeJS), SPARQL experts, VueJS developers, and issue coordinators. Yug (talk) 15:57, 12 February 2021 (UTC)
Early 2021 coding: WikiValley & volunteers communication board
WikiValley has been selected to make a notable technical push on the LinguaLibre Suite where volunteer developers are not enough. They will coordinate with volunteer developers in order to smooth everyone's work and avoid duplicate efforts and git conflicts. The Start, End, and Repository columns below are especially important: please keep them up to date, respect them, or change them whenever required. If you need to work on a repository that someone is already working on, contact the developer listed there and organize as needed. Our objective here is to keep things clear and to progress smoothly. Please avoid emails and prefer communicating here within subsections, so we can all stay aware of how things are going. Yug (talk) 15:56, 12 February 2021 (UTC)
- Note: Volunteers started working around December; WikiValley started around Feb. 11th. Yug (talk) 15:56, 12 February 2021 (UTC)
Past developments | |||||
---|---|---|---|---|---|
Start | End | Contacts/dev | Team | Repository | Advancement & result so far. |
2021/02/01 | 2021/02/10 | Yug | Volunteers | SignIt | Get back control (access right) ; fix video query ; test locally ; publish new version on Mozilla store → Fixed Firefox extension |
2021/02/01 | 2021/02/16? | Yug, Michael | Volunteers, WM-France | /operations, /CommonDownloadTool | Explore possible breakpoints ; identify likely cause ; fix ; deploy ; run → Fixed https://lingualibre.org/datasets/ |
2021/02/? | 2021/02/11 | Vigneron, WikiValley | WikiValley | QueryViz, Other? | Explore possible breakpoints ; identify cause ; fix ; deploy ; inquire on numbers differences → Fixed LinguaLibre:Stats |
Current developments | |||||
2021/02/01 | 2022/01/01 | Poslovitch | Volunteers | Lingua-Libre-Bot | Maintain, update and operate the bot. 2021 Q1 [WIP]: Refactor the bot to ease implementations of additional Wiktionaries. |
2021/02/? | 2021/02/? | WikiLucas00, Yug | Volunteers | CustomSubtitle, BlueLL | Explore Subtitle's ribbon's bug ; identify cause. |
Planned developments | |||||
2021/02/01 | 2021/03/?? | Poslovitch | Volunteers | /operations, /CommonDownloadTool | Project: Explore datasets scripts and queries. May require SPARQL assistance. |
User box ?
It might be nice to create a "dev" userbox {{Userbox-dev}}, modeled on {{Userbox-records}}, with Python, JavaScript, PHP, VueJS, and Wikimedia Bot as specific sub-categories? Yug (talk) 15:59, 12 February 2021 (UTC)
Datasets has become super slow ?
I am trying to understand how the /datasets are generated.
- In April 2020, the French dataset of about 100,000 audio files was processed in 51 minutes.
- In February 2021, the Bengali dataset of about 50,000 audio files was processed in 18 hours.
What am I missing? Yug (talk) 00:09, 13 February 2021 (UTC)
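To put the two figures above on the same scale, here is a quick back-of-the-envelope calculation. It is only a sketch: it uses the approximate counts and durations quoted above, and the function name `audios_per_minute` is made up for illustration.

```python
def audios_per_minute(audio_count: int, minutes: float) -> float:
    """Average number of audio files processed per minute."""
    return audio_count / minutes


# Approximate figures quoted in this thread.
french_rate = audios_per_minute(100_000, 51)       # April 2020, 51 minutes
bengali_rate = audios_per_minute(50_000, 18 * 60)  # February 2021, 18 hours

# How many times slower the Bengali run was, per audio file.
slowdown = french_rate / bengali_rate

print(f"French:  {french_rate:.0f} audios/min")
print(f"Bengali: {bengali_rate:.0f} audios/min")
print(f"Slowdown factor: {slowdown:.0f}x")
```

With these numbers, throughput drops from roughly 1,960 to roughly 46 audio files per minute, i.e. about 42 times slower per file.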
Zip file | Date | Size (bytes) |
---|---|---|
lingualibre_full.zip | 2019-May-17:01:18 | 1989664440 |
Q101-srr-Serer.zip | 2019-Nov-05:03:09 | 14967 |
Q113-cmn-Mandarin_Chinese.zip | 2019-Nov-05:03:09 | 112613 |
Q115107-bcl-Central_Bikol.zip | 2019-Nov-05:03:09 | 166323 |
Q127-tam-Tamil.zip | 2019-Nov-05:03:09 | 154352 |
Q130-zho-Chinese.zip | 2019-Nov-05:03:10 | 2724328 |
Q131-hye-Armenian.zip | 2019-Nov-05:03:10 | 824117 |
Q141-cym-Welsh.zip | 2019-Nov-05:03:10 | 12905993 |
Q154-amh-Amharic.zip | 2019-Nov-05:03:11 | 2653977 |
Q165-hat-Haitian_Creole.zip | 2019-Nov-05:03:11 | 233588 |
Q169-tgl-Tagalog.zip | 2019-Nov-05:03:11 | 77198 |
Q170137-mos-Mossi.zip | 2019-Nov-05:03:11 | 1158142 |
Q205-gre-Greek.zip | 2019-Nov-05:03:11 | 239390 |
Q231-myv-Erzya.zip | 2019-Nov-05:03:21 | 205878 |
Q242-fon-Fon.zip | 2019-Nov-05:03:21 | 1538614 |
Q258-nso-Northern_Sotho.zip | 2019-Nov-05:03:24 | 774299 |
Q311-oci-Occitan.zip | 2019-Nov-05:03:33 | 511332485 |
Q318-bam-Bambara.zip | 2019-Nov-05:03:33 | 277786 |
Q321-gaa-Ga.zip | 2019-Nov-05:03:33 | 3247380 |
Q336-ori-Odia.zip | 2019-Nov-05:03:34 | 38697693 |
Q339-sat-Santali.zip | 2019-Nov-05:03:34 | 128941 |
Q34-mar-Marathi.zip | 2019-Nov-05:03:34 | 2274397 |
Q35-nld-Dutch.zip | 2019-Nov-05:03:34 | 36279372 |
Q385-ita-Italian.zip | 2019-Nov-05:03:34 | 3440247 |
Q388-que-Quechua.zip | 2019-Nov-05:03:35 | 397476 |
Q39-tel-Telugu.zip | 2019-Nov-05:03:35 | 85571 |
Q397-heb-Hebrew.zip | 2019-Nov-05:03:35 | 1657223 |
Q405-bas-Basaa_language.zip | 2019-Nov-05:03:35 | 1515700 |
Q437-mal-Malayalam.zip | 2019-Nov-05:03:35 | 138601 |
Q446-pan-Punjabi.zip | 2019-Nov-05:03:35 | 11004 |
Q4465-mis-Teochew_dialect.zip | 2019-Nov-05:03:35 | 69734 |
Q45-nor-Norwegian.zip | 2019-Nov-05:03:35 | 431566 |
Q46-ltz-Luxembourgish.zip | 2019-Nov-05:03:35 | 1679618 |
Q51299-hav-Havu.zip | 2019-Nov-05:03:37 | 56823 |
Q51302-tay-Atayal.zip | 2019-Nov-05:03:37 | 65533 |
Q52067-bbj-Ghomala'_language.zip | 2019-Nov-05:03:37 | 1765823 |
Q52068-bum-Bulu_language.zip | 2019-Nov-05:03:37 | 1382789 |
Q52071-dua-Duala.zip | 2019-Nov-05:03:37 | 1206427 |
Q52073-bdu-Oroko.zip | 2019-Nov-05:03:37 | 1723960 |
Q52074-bzm-Londo.zip | 2019-Nov-05:03:37 | 1750380 |
Q52295-atj-Atikamekw.zip | 2019-Nov-05:03:37 | 7315215 |
Q74905-mis-Sursilvan.zip | 2019-Nov-05:03:37 | 14618 |
Q83641-gcf-Guadeloupean_Creole_French.zip | 2019-Nov-05:03:38 | 7412512 |
Q930-mis-Gascon_dialect.zip | 2019-Nov-05:03:39 | 179656450 |
Q931-mis-Languedocien_dialect.zip | 2019-Nov-05:03:40 | 191575650 |
Q123-hin-Hindi.zip | 2020-Apr-25:03:30 | 1704401 |
Q126-por-Portuguese.zip | 2020-Apr-25:03:31 | 43732966 |
Q129-rus-Russian.zip | 2020-Apr-25:03:32 | 60844464 |
Q150-afr-Afrikaans.zip | 2020-Apr-25:04:18 | 42363003 |
Q159-dyu-Dioula_language.zip | 2020-Apr-25:04:18 | 784432 |
Q19858-bci-Baoulé.zip | 2020-Apr-25:04:18 | 1268304 |
Q203-cat-Catalan.zip | 2020-Apr-25:04:18 | 9738365 |
Q204940-ken-Nyang_language.zip | 2020-Apr-25:04:18 | 483396 |
Q208-vie-Vietnamese.zip | 2020-Apr-25:04:18 | 8822067 |
Q219-ara-Arabic.zip | 2020-Apr-25:04:19 | 85373129 |
Q21-fra-French.zip | 2020-Apr-25:05:10 | 2112950650 |
Q221062-mis-Cantonese.zip | 2020-Apr-25:05:10 | 3895600 |
Q22-eng-English.zip | 2020-Apr-25:05:12 | 131688602 |
Q25-epo-Esperanto.zip | 2020-Apr-25:05:19 | 445662713 |
Q264201-ary-Moroccan_Arabic.zip | 2020-Apr-25:05:19 | 1371064 |
Q273-kab-Kabyle.zip | 2020-Apr-25:05:19 | 370876 |
Q298-pol-Polish.zip | 2020-Apr-25:05:21 | 145009958 |
Q299-eus-Basque.zip | 2020-Apr-25:05:21 | 46035866 |
Q33-fin-Finnish.zip | 2020-Apr-25:05:46 | 19473062 |
Q386-spa-Spanish.zip | 2020-Apr-25:05:46 | 28434220 |
Q389-jpn-Japanese.zip | 2020-Apr-25:05:46 | 145688 |
Q392-ces-Czech.zip | 2020-Apr-25:05:46 | 96844 |
Q44-swe-Swedish.zip | 2020-Apr-25:05:46 | 166237 |
Q4901-shy-Shawiya_language.zip | 2020-Apr-25:05:47 | 15804835 |
Q6714-arq-Algerian_Arabic.zip | 2020-Apr-25:05:47 | 3420182 |
Q80-kan-Kannada.zip | 2020-Apr-25:05:47 | 3662223 |
Q24-deu-German.zip | 2021-Feb-11:15:32 | 258363332 |
Q307-ben-Bengali.zip | 2021-Feb-12:07:28 | 1079637723 |
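One rough way to read the table: if the archives are generated one after another, the gap between a file's timestamp and the previous file's timestamp approximates how long that archive took to build. That sequential-run assumption is a guess, not something confirmed from the scripts, and the helper name `gap_minutes` is hypothetical. A minimal sketch using four timestamps copied from the table:

```python
from datetime import datetime

# Timestamp layout used in the table, e.g. "2020-Apr-25:05:10".
# %b expects English month abbreviations (true in the default C locale).
TIMESTAMP_FORMAT = "%Y-%b-%d:%H:%M"


def gap_minutes(previous: str, current: str) -> int:
    """Whole minutes elapsed between two table timestamps."""
    start = datetime.strptime(previous, TIMESTAMP_FORMAT)
    end = datetime.strptime(current, TIMESTAMP_FORMAT)
    return int((end - start).total_seconds() // 60)


# Arabic (previous row) -> French
french_gap = gap_minutes("2020-Apr-25:04:19", "2020-Apr-25:05:10")
# German (previous row) -> Bengali
bengali_gap = gap_minutes("2021-Feb-11:15:32", "2021-Feb-12:07:28")

print(french_gap, bengali_gap)
```

Under that assumption the French gap is 51 minutes, matching the figure quoted above, while the German-to-Bengali gap is 956 minutes (just under 16 hours), the same order of magnitude as the reported 18 hours. Exact durations would still have to come from the server logs.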
- IMO, this can only be investigated through the logs. Maybe the requests to Commons are taking a longer time than they used to? Maybe the datasets server is under higher load (thus slowing it)? We need you, Michaël! --Poslovitch (talk) 21:41, 13 February 2021 (UTC)