LinguaLibre

Chat room


Welcome to the Chat room! This is the place to discuss any and all aspects of Lingua Libre: the project itself, its operations, policies and proposals, technical issues, etc. Other forums also exist, including one for code-oriented issues. Feel free to participate in any language you want.

Chat rooms in various languages:
English · 🌐

Chatroom FAQ

  • How do I add a missing language?
    • Administrators can add new languages, usually within a few days. Users requesting one should provide the language's ISO 639-3 code and a link to its en.wikipedia.org article. Optional information: the common English name and the Wikidata QID. For more, see Help:Add a new language.
  • What IRL events are coming up? When? Where?
  • How do I archive sections that have been answered?
    • After reviewing the section, add '{{done}} -- can be closed ~~~~' to the top of the section. After a few days to 2 weeks, move the section's code to [[LinguaLibre:Chat_room/Archives/year]].

Archives

Datasets out of date

Hello. It seems that the datasets page, although it claims to run every 2 days, is completely out of date: all the available zips are from April 2020 or November 2019 (and the full zip from May 2019). Is this a known problem? Is there a plan to address it? Julien Baley (talk) 23:17, 27 August 2020 (UTC)

Indeed, there seems to be an issue with the dataset updates. I opened a Phabricator ticket about this issue. Pamputt (talk) 18:24, 28 August 2020 (UTC)

About the exclusion of already recorded words

Hi, I think the option to exclude words that I have already recorded is broken. This morning, I started a recording session and LL offered me words that I recorded two days ago. For example, I recorded Belorusino two days ago, but it does not disappear when I click "exclude words already recorded". Note also the two versions of the file: I have already re-recorded it. Can someone fix this? Lepticed7 (talk) 10:07, 15 November 2020 (UTC)

I have opened a Phabricator ticket. It may be fixed in the coming months, but that is not certain. Pamputt (talk) 20:05, 15 November 2020 (UTC)

Reminder: Grants

Hello all, I'm monitoring grants these days; a summary table is available at LinguaLibre:Grants.

I think both rapid grant mechanisms could help us now to reach out to local communities via small-scale events, training, hardware, food, transportation costs, flyer designs, etc. For example, this WM-France micro-fi request organizes 4 evenings of contribution, receiving €100 for each evening. The same user has been welcome to make several grant requests.
The heavier R&D Grant could surely be used for something too. I have an idea on this, but we can also trust Indian contributors to come up with relevant technical ideas and teams. @Titodutta Yug (talk) 01:20, 8 February 2021 (UTC)

LinguaLibre Bot and Wikidata

This section should be moved to LinguaLibre:Technical board.

I have not checked the bot's contributions on Wikidata for quite some time. Yesterday I uploaded ~100 Bengali film names from Bangla Wikipedia. It looks like the bot is not active, unless I am missing something. --টিটো দত্ত (Titodutta) (কথা) 18:10, 13 February 2021 (UTC)

Update and technical improvements

Hi all,

Full disclosure: I'm now working with WikiValley and Wikimédia France in a paid capacity to help improve Lingua Libre's technical structure (see this - in French - for the scope of our intervention).

One of our first actions last Thursday was to restart the Blazegraph updater. A lot of tools depend on this "fundamental brick", including but not limited to the SPARQL endpoint (and the pages using it) and the bots. You can now see that pages like Special:MyLanguage/LinguaLibre:Stats are up to date again, and the bots should also restart soon (see LinguaLibre:Technical board for more technical info).

The next big step will be to update this MediaWiki from 1.31 to 1.35 and move it to a new server.

If you see anything wrong or strange, don't hesitate to let me know. I'm also available for any questions.

Cheers, VIGNERON (talk) 08:56, 15 February 2021 (UTC)

Nice! Happy to see you folks jumping in. Thank you for the Stats! We will soon see ourselves pass 400,000 recordings. Yug (talk) 16:27, 15 February 2021 (UTC)

400,000

The total amount of recordings on Lingua Libre reached 400,000 a few hours ago. February is already the second most fruitful month since the beginning of the project, even though we are only halfway through. LiLi is growing faster and faster, and this is only the beginning!
Congratulations and thanks to everyone who gives some time to record voices and to spread the project around the world.
All the best — WikiLucas (🖋️) 18:10, 16 February 2021 (UTC)

And another milestone passed! Big thanks to Titodutta and to the Marathi efforts, too! Yug (talk) 21:24, 16 February 2021 (UTC)
Yug, WikiLucas and Titodutta: thanks for the support! The Marathi community had decided to gift at least 5,000 recordings on the occasion of Marathi Language Day, celebrated on 27 February. We have crossed 6,000 recordings as of now. All credit goes to the community members. सुबोध कुलकर्णी (talk) 05:22, 26 February 2021 (UTC)
See also Commons:Category:Lingua_Libre_pronunciation-mar
Congratulations to the Marathi community! It's nice to see you contribute this way :) Yug (talk)

Chat room in your language

Hi all. I've created Template:Lang-CR in order to list all the chat rooms. I think it would be interesting for people to discuss in their native language. The main discussion should remain on this chat room in English in order to be understood by most of the contributors. So feel free to create a village pump/chat room in your mother tongue. Pamputt (talk) 20:21, 16 February 2021 (UTC)

It is a welcome move. We need to discuss many local issues, policies, approaches, ideas, etc. in our own language. I have created the Marathi page संवाद-चर्चा दालन. Let me know whether the process is right. I will start engaging speakers here. सुबोध कुलकर्णी (talk) 05:36, 26 February 2021 (UTC)
@सुबोध कुलकर्णी that's perfect. Pamputt (talk) 06:40, 26 February 2021 (UTC)

New batch of lists available! (1,000 languages)

Please remember to tag each list's talk page with {{UNILEX license}}.

Greetings!
Thanks to Tshrinivasan, with whom we discussed recent Indic (Marathi!) activity and the lack of lists, I bumped again into UNILEX (GNU-like license), a Google-led Unicode Consortium project listing vocabulary for 999 languages. The data seems clean as far as I can tell. The two main maintainers are Google folks, so I suspect UNILEX uses Google's best scrapers and NLP cleaners. This data includes tab-separated frequency lists in the form {item} {number_of_occurrences}. I forked their GitHub repository and wrote a script to convert their format into Lili's List:* format, i.e. # {item}. See:

You can check whether your own language is among the 999 available. For Marathi, replace ig with mr. I therefore created 2 local lists to test this approach:

Right now, 1,000 lists are already formatted in Lili's syntax in the /data/frequency-sorted-hash directory. If any community lacks word lists on Lili, there you have them: copy, paste, done, situation unlocked! Yug (talk) 16:40, 24 February 2021 (UTC)
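The conversion from UNILEX's tab-separated format to Lili's list syntax can be sketched in a few lines of shell. This is a minimal illustration, not the script from the fork mentioned above: it assumes that UNILEX metadata lines start with "#" (so they can be skipped), and `unilex_to_lili_list` is a hypothetical function name.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper, not the fork's script): convert a UNILEX-style
# frequency file, one "word<TAB>count" pair per line, into LinguaLibre's
# List:* wikitext, one "# word" line per entry.
# Assumption: metadata lines start with "#"; they are skipped, as are blank lines.
unilex_to_lili_list() {
  local input=$1
  grep -v -e '^#' -e '^[[:space:]]*$' "$input" |
    sort -t $'\t' -k 2,2 -n -r |  # most frequent words first
    cut -d $'\t' -f 1 |           # keep only the word column
    sed 's/^/# /'                 # wiki numbered-list syntax
}
```

Usage: `unilex_to_lili_list mr.txt > mr-list.txt`, then paste the result into a List: page. Sorting by the count column makes the sketch work even if the input file is not already sorted by frequency.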

@Titodutta hi! This may interest your community: there are dozens of Indic languages :) It could also help you. You have already recorded most of those words for your language (ben), but together with the "ignore already recorded words" function, these lists can fill some gaps :) Yug (talk) 16:48, 24 February 2021 (UTC)
  • I love this. I'll inform the Marathi folks. --টিটো দত্ত (Titodutta) (কথা) 17:16, 24 February 2021 (UTC)
  • This is just amazing. You can't imagine how delighted I am at this moment. I checked the Bengali list: a very few random words have typos, but that should not be more than 1%, I guess. Overall this will be an extremely helpful resource for the communities. --টিটো দত্ত (Titodutta) (কথা) 17:24, 24 February 2021 (UTC)
  • I share your enthusiasm! It's bot-created, I'm pretty sure; the cleanup is likely just statistical. Now that those lists are technically available, the ideal next step would be human review by local communities. Maybe groups of 2 to 3 users for copyedit sprints? :D But this is optional IMHO. Also, since the corpora come from online documents, everyday words like `chair`, `car`, `walk` may be further down these lists. But they should be within the first 20,000 items. The best part is the linguistic diversity of this set. Amazing. Yug (talk) 18:10, 24 February 2021 (UTC)
  • It's a good resource indeed. Thanks! The Marathi words in the list are also grammatically correct, with almost no typos. We have started discussing this in our community. Currently, we are working on Lexemes first; the lists created this way will be recorded at the same time. The community thinks this approach is more useful in the long run. A separate group of speakers may adopt these lists, but then we will have to devise a way to avoid repetitions. We will definitely discuss how to use this resource and let you know.सुबोध कुलकर्णी (talk) 05:14, 26 February 2021 (UTC)

Tshrinivasan, Yug - the Marathi community plans to work on these lists. But [1] gives a 404 error. Please help. सुबोध कुलकर्णी (talk) 05:54, 5 March 2021 (UTC)

Tshrinivasan, सुबोध कुलकर्णी: it's under active development these days, so I made a few changes.
  • Currently at: /hugolpz/unilex-extended/frequency-sorted-hash which uses UNILEX as a git submodule to respect each project's scope.
  • I just ran the script for Marathi, so the lists are now local. When picking a list, type List:Mar/M:
See also the section below. My apologies for the changes; I hope they didn't affect you too much. Yug (talk) 07:47, 5 March 2021 (UTC)

Pause before running

Long-tail curves likely apply to languages ranked by number of speakers. Since macro-languages such as Mandarin, English, Spanish, Hindi, etc. are certain to be audio-documented soon by the sheer force of demography, our effort should progressively shift toward the right of the curve, toward increasingly rare languages. The rarer the languages and speakers, the more we should listen, and the more custom assistance we will have to provide.

Dragons Bot has been created, coded, tested, and is ready to import UNILEX's lists into LinguaLibre's List:{iso}/{title} namespace. Since 1,000 pages and their associated talk pages will be created, I would like to pause for a few days to consider this large list import/creation and its rationale.

  • Lili > Languages > existing breadth: we have reached 110 languages on LinguaLibre so far.
  • Lili > Lists > not sorted by usefulness: SPARQL queries provide lists for all languages, but without prioritization by word usefulness.
  • Lili > Lists > sorted by usefulness:
    • Hand-picked frequency lists exist for about 7 languages: eng, mar, por, pol, tam, ron, kur, with optimal relevance for teaching/learning.
    • Olafbot's List:*/Lemmas-without-audio-sorted-by-number-of-wiktionaries for 72 languages, updated daily, with optimal relevance for Wiktionaries.
    • UNILEX can provide frequency lists for 1,000 languages, about 10 times our current language coverage. UNILEX builds on Github.com/Google/Corpuscrawler, an open-source project which plans to support more languages. I dug into this chain and it's an 'easy' NLP pipeline to contribute to. The Wikimedia community can use it and expand it.

Core issue: as users arrive online, the challenge is to increase retention for minority and semi-rare languages by smoothing their speakers' work. For example, a Wayuu speaker arrived today. No local (frequency) list was available. But UNILEX + Dragons Bot can provide a local Wayuu frequency list of 8,000 items, ready to record.
Since we don't know which semi-rare languages will come next, having 1,000 languages ready is a safe yet not so excessive bet. Assuming a en:Zipf's law/en:Long tail curve for languages and their speakers, we can still predict that at least one in 10 to 20 speakers of a new language will lack a local word list. But together with OlafBot's lists, we move from 6% toward 90% of our languages having a solid, usefulness-based roadmap to walk forward. Yug (talk) 14:21, 3 March 2021 (UTC)

Well, I believe the idea to import Unilex lists is very good. One of the things a new user needs most is an idea of what to record. The Unilex lists suit this function, especially in the case of new languages, where there is no other list available, and no words have been already recorded. The only question I see is how to import the Unilex lists. Perhaps the best idea is to import 1000 most frequent words from each list. It would be even better if the recorded words were automatically removed from the lists and replaced by new ones (like in the case of Olafbot-managed lists), but even a static list is good as bait if the goal is just to attract more speakers of rare languages.
One remark: you should translate the file names from Unilex to match LiLi's language codes (or perhaps you did it, I don't know, I didn't examine the code). It's not always the same, for example, Polish is "pl" in Unilex, and "Pol" in Lili. If you leave the old codes, the list won't be automatically found when a new user presses the "Local List" button. Anyway, the newbies are likely not to notice the lists at all regardless of all our efforts. Olaf (talk) 00:55, 4 March 2021 (UTC)
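Olaf's remark about language codes can be handled with a small lookup table. A sketch, assuming UNILEX file names use ISO 639-1 codes where one exists; `to_lili_code` and `lili_list_title` are hypothetical helpers, and only a handful of mappings (languages mentioned on this page) are included for illustration, so a real import would need the full ISO 639-1 to ISO 639-3 table.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a UNILEX file code to the ISO 639-3 code used in
# LinguaLibre page titles, with the first letter capitalised ("pl" -> "Pol").
# Illustrative subset only; extend with the full ISO 639-1 -> 639-3 table.
to_lili_code() {
  local iso3
  case "$1" in
    bn) iso3=ben ;;
    ca) iso3=cat ;;
    mr) iso3=mar ;;
    pl) iso3=pol ;;
    ro) iso3=ron ;;
    ta) iso3=tam ;;
    *)  iso3=$1 ;;  # assume the code is already ISO 639-3
  esac
  # Capitalise the first letter in a portable way (no bash 4 ${var^} needed)
  printf '%s%s\n' "$(printf '%s' "${iso3:0:1}" | tr '[:lower:]' '[:upper:]')" "${iso3:1}"
}

# Build a page title of the form List:{Iso}/{title}
lili_list_title() {
  printf 'List:%s/%s\n' "$(to_lili_code "$1")" "$2"
}
```

For example, `lili_list_title pl Most_used_words` gives `List:Pol/Most_used_words`, so the list would be found under the same code the Record Wizard uses.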

jQuery.Deferred exception: this.pastRecords is undefined

This discussion may be moved to LinguaLibre:Technical board.

Hello, there.

When I try to load a list of words to record from the FR wiktionary, the modal does not disappear when I click "Done" and seems blocked trying to load the words. During this time, the JS console complains that "jQuery.Deferred exception: this.pastRecords is undefined", and the last resource loaded is, in cURL format: curl 'https://fr.wiktionary.org/w/api.php?action=query&format=json&origin=*&formatversion=2&prop=pageterms&wbptterms=label&generator=categorymembers&gcmnamespace=0&gcmtitle=%3ACat%C3%A9gorie%3ALocutions%20verbales%20en%20fran%C3%A7ais&gcmtype=page&gcmlimit=max' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:85.0) Gecko/20100101 Firefox/85.0' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' --compressed -H 'Origin: https://lingualibre.org' -H 'DNT: 1' -H 'Connection: keep-alive' -H 'Referer: https://lingualibre.org/' -H 'TE: Trailers'

Looks like there is a bug…

Regards. LoquaxFR (talk) 17:21, 24 February 2021 (UTC)

Hi LoquaxFR, can you describe precisely what you do when you write "when I try to load a list of words to record from the FR wiktionary"? How do you load the word list: do you use the « Wikimedia Category » option on the right, or do you build the word list yourself, one word at a time? If you use « Wikimedia Category », can you tell us which category you want to use? Can you reproduce the problem whatever category you work with? Thanks in advance for this information, which I hope will help pin down the problem as precisely as possible. Pamputt (talk) 17:58, 24 February 2021 (UTC)
Indeed, it will be simpler in French. The problem reproduces systematically when I try to use a Wikimedia category (the French Wiktionary's, in this case); that is the only option I use to load words, and the problem appears for every category I try, whether I have already recorded almost all of its words or only a small part of its thousands of terms. The problem also happens in private browsing, so it doesn't seem to be the cache or the cookies. If you need more information, don't hesitate. LoquaxFR (talk) 18:08, 24 February 2021 (UTC)
Thanks for the additional information. I just tested with Firefox 78.7 and I do not run into this problem. Can you try with another browser (Chromium or another) to see whether the problem is specific to your Firefox (including in private browsing)? It could, for example, come from a gadget you installed. Pamputt (talk) 18:40, 24 February 2021 (UTC)
A Firefox add-on breaking the JS? Yug (talk) 18:57, 24 February 2021 (UTC)
Chrome and Safari give me the same result; I also tried from another machine and another OS, with no improvement: the JS error still shows up and nothing happens when the modal is confirmed. Could it be that I have recorded too many words, making the JS fail when it tries to remove those already recorded? Since only a few of us have recorded that many, it's possible. I had already noticed that loading lists from the Wiktionary was taking longer and longer for me (relatively speaking: a few seconds of waiting at most). Is this another problem linked to my account? LoquaxFR (talk) 06:30, 25 February 2021 (UTC)
Thanks for the extra information. I opened T275734. We should check with Lepticed7 and WikiLucas00, who have roughly the same number of recordings as you, to see whether they run into the same problem. Pamputt (talk) 06:54, 25 February 2021 (UTC)
Hi, personally, I don't know if it's related, but there are some recordings that the Record Wizard does not remove when I ask it to remove the words already recorded. See this file, which I recorded three times. Lepticed7 (talk) 10:45, 28 February 2021 (UTC)

50,000

February 2021. This month, we have seen 50,000 pronunciations in a month (see LinguaLibre:Statistics). This is the first time we have seen 50,000 entries in a month. This is great. --টিটো দত্ত (Titodutta) (কথা) 08:51, 28 February 2021 (UTC)

That's really amazing. The same month we passed 400k recordings! AND the shortest month of the year! I'm going to prepare a small monthly news digest (inspired by what you did in September, if I remember correctly); I think February is a very good month to start with! I'll publish it on your talk page if you'd like 🙂 All the best ! — WikiLucas (🖋️) 16:11, 28 February 2021 (UTC)
I would say why not, but I cannot lead such a project, so if you are motivated to write and lead such a newsletter, go ahead. Pamputt (talk) 18:39, 28 February 2021 (UTC)
On the LinguaLibre:Technical board/intro, Poslovitch has started a /News section which keeps a log of important milestones. It's an interesting idea because it's minimalist, and therefore low maintenance.
I'm also interested in a newsletter for both external and internal purposes. I would help out, yes. The editorial line would benefit from clarification: who are the expected readers, writing style, overall length, major sections, section lengths, etc. But this can "emerge" with the first few issues :) Please keep a balance so the writing workload stays modest. Yug (talk) 18:57, 28 February 2021 (UTC)
The /News of the technical board is mostly about technical news. I fully agree with the idea of a newsletter, but quarterly. We could borrow some ideas from the French Wiktionary's Actualités. --Poslovitch (talk) 20:33, 28 February 2021 (UTC)
  • Hi, let's start with the March newsletter. I'll add the stories I know, such as 400,000 recordings, 50,000 this month, the Wikimedia Wikimeet India, the upcoming France-India call, the French Wiktionary missing-recordings work, etc. I'll start the draft tomorrow and ping you here.
    In the future we will need mw:Extension:MassMessage to send the newsletter to subscribers' talk pages. This needs a system admin with access to the server, LocalSettings.php, etc. I understand this will take time, so it can wait. Kind regards. --টিটো দত্ত (Titodutta) (কথা) 21:24, 28 February 2021 (UTC)
@Titodutta hi, we are having another discussion on the mailing list about networking, cooperation and outward communication. I think the LinguaLibre:Newsletter page can be modeled on the Technical board and LinguaLibre:Bot: a kind of hub for a subgroup of active users dedicated to a common goal, in this case communication. The bimonthly newsletter could be a core, founding element, but other discussions about outreach could take place there too. We have so much to push in this direction: academic outreach, rare languages and under-represented countries, partner institutions, calling for new Wikimedians, reminding distant Wikimedia chapters about Lingua Libre, etc. Having a hub dedicated to writing elegant co-edited texts, defining targets and leading communication campaigns would be a strong plus. I'm still focused on code, but I could help in a few weeks. You seem to love it as well. Do we have other users interested in joining such efforts? It would be good to have a few more folks. Yug (talk) 20:39, 2 March 2021 (UTC)

Newsletter: March 2021 review?

You can co-edit this text. PS Titodutta: a rough summary of past months and emerging directions based on a message to an ex-contributor.

In January and February, the « Lili » community took back control of the technical stack (access to servers, GitHub code, bots, etc.) and made a call for more diverse speakers. The Indian community started to show up, with Bengali (50,000) and Marathi (~10,000) as the key Indic languages. Romanian, Polish and Ukrainian are also on the rise, with around 20,000 recordings each. A dozen or so smaller languages keep showing up, but with no powerful push yet.

Right now, an external software company is upgrading our MediaWiki and its modules thanks to Wikimedia France's funding. The volunteer dev team is also strong, and internal organization is improving. We now have LinguaLibre:Technical board as a tech hub, LinguaLibre:Bot as a bot hub, and LinguaLibre:Events as an IRL/online event hub. Once the main software upgrade settles down in a month, we plan a (yet to be created) LinguaLibre:Newsletter/room as an inward and outward communication hub.

In that last dimension, we could reach out to « relay users » on other wikis, who can share our news about LinguaLibre with the communities of Wiktionaries, Wikisources, Wikipedias and Wikidata. We are also considering formally reaching out to non-Wikimedia groups such as Common Voice, Unicode, governmental and NGO agencies, and research centers, possibly in the form of group work and/or an online editathon where we gather to spread the news. This hub, summarizing the community's discussions, will therefore also clarify goals and strategies. We are looking for help with this matter.

The current momentum is thanks to the efforts of early autumn 2020. We weren't able to convert those into actions immediately, but they still injected energy and vision into LinguaLibre, which helped snowball into the current dynamic. Many thanks to all those who got involved in this journey! Yug (talk) 07:20, 3 March 2021 (UTC)

Also, I just found out that Commons grows by about 1 million files per month. So with 50,000 recordings last month, Lili accounts for about 5% of Commons' new files. Yug (talk) 14:57, 3 March 2021 (UTC)

Marathi women speakers celebrate 'Women's Day' & 'Women History Month' on Lingua Libre

Greetings for the coming World Women's Day!
Glad to share this news. The Marathi language community in Maharashtra State, India has taken the initiative to record their language over the last 2 months. Out of 26 speakers in total, 24 are women from 4 different places in the state. The group has decided to reach the 10,000-recording mark to celebrate Women's Day, and the 15,000 mark in March. As of now, 8,600+ recordings have been uploaded. A small group of women has also started working on lexicographical data, which will be recorded at the same time. The activity is being coordinated by the institutional partner Jnana Prabodhini, Pune and facilitated by CIS-A2K, a WMF affiliate in India. The community needs support from all of you. Thanks, सुबोध कुलकर्णी (talk) 06:28, 5 March 2021 (UTC)

Greetings सुबोध कुलकर्णी, nice to witness this enthusiasm.
I imported UNILEX lists for Marathi. In the RecordWizard's Step 3, when you pick a list, choose Local list, then mar/M, and you will see lists of the most used words. I propose a gentle ramp approach: the first list has just 200 words, see List:Mar/Most_used_words,_UNILEX_1:_words_00001_to_00200. In my experience, this allows better on-the-ground sessions with new users. 200 is gently ambitious: it lets users get past the uncanny valley of the first 20 words and into the joyful Lingualibre flow of rapid recording. Perfect for demos and onboarding. :)
The following lists are for motivated users who choose to return. To consolidate skills, list 2 has 800 words and list 3 has 1,000. At this stage the speaker has recorded a nice 2,000 words, which likely make up 90% of daily conversation.
It then moves on to committed users. List 4 has 3,000 words, the following ones 5,000 words each. These lists are not expected to be done in one go, but over several sessions of one hour or less, during a dedicated day or over a week or so.
I hope these may help your language community to better on-board interested contributors :)
We also encourage development of women speakers networks, so thanks a lot for your lead. Yug (talk) 08:57, 5 March 2021 (UTC)
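The gentle ramp above (200, 800, 1,000, 3,000, then 5,000 words per list) can be sketched as a small shell function. This is not the Dragons Bot code: `ramp_split` is a hypothetical name, the input is assumed to be one word per line with the most frequent first, and the output file names are merely modeled on the List:Mar title quoted above.

```shell
#!/usr/bin/env bash
# Sketch of the "gentle ramp": split a frequency-sorted word list into
# lists of 200, 800, 1000 and 3000 words, then 5000 words per list.
# Input: one word per line, most frequent first (assumption).
# Output file names imitate the wiki titles (illustrative naming only).
ramp_split() {
  local input=$1
  local -a sizes=(200 800 1000 3000)
  local total start=1 n=1 size end
  total=$(( $(wc -l < "$input") ))   # arithmetic strips any padding from wc
  while [ "$start" -le "$total" ]; do
    # Predefined sizes first, then 5000 words per list
    if [ "$n" -le "${#sizes[@]}" ]; then
      size=${sizes[$((n - 1))]}
    else
      size=5000
    fi
    end=$((start + size - 1))
    if [ "$end" -gt "$total" ]; then end=$total; fi
    sed -n "${start},${end}p" "$input" \
      > "$(printf 'Most_used_words,_UNILEX_%d:_words_%05d_to_%05d.txt' "$n" "$start" "$end")"
    start=$((end + 1))
    n=$((n + 1))
  done
}
```

With a 2,500-word input, this yields lists covering words 1-200, 201-1000, 1001-2000 and 2001-2500; the last list is simply shorter when the input runs out.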
Added Marathi lists:
Yug (talk) 09:01, 5 March 2021 (UTC)
Many thanks Yug for the detailed explanation. These are useful to start with. Our group has now taken a lexicographical approach to developing lists, so we need alphabetical lists to get the forms of words. For example, we create lists like this - शरीर, शरीरभर, शरीराकडून, शरीराकडे, शरीराचं, शरीराचा, शरीराची, शरीराचे, शरीराच्या, शरीरात...etc. The members divide the work by letter, so it would be good if we could get lists modified accordingly. - सुबोध कुलकर्णी (talk) 11:22, 5 March 2021 (UTC)
I see. सुबोध कुलकर्णी, you could use frequency-sorted-count/mr.txt, keep the 30,000 most frequent, then sort alphabetically and split by hand on each letter. See Help:How_to_create_a_frequency_list?#UNILEX.27s_lists. Yug (talk) 11:53, 5 March 2021 (UTC)
I tried to push it forward, but it's a bit more complex than I anticipated. Ideally, you would 1) add a prefix so औ.txt becomes /Marathi_words_starting_with_औ.txt, and 2) merge the rarest letters together. I must refocus on non-wiki projects; can you call for help from local wiki developers?
# Define the language (UNILEX file code)
iso=mr
# Get the file, skip the 5 metadata lines, sort by the 2nd tab-separated column (frequency), highest first,
# keep the 50,000 most frequent, keep only the word, sort alphabetically by the 1st column, save to ${iso}.txt
curl -s "https://raw.githubusercontent.com/unicode-org/unilex/master/data/frequency/${iso}.txt" | tail -n +6 | sort -t$'\t' -k 2,2 -n -r | head -n 50000 | cut -d$'\t' -f1 | sort -k 1,1 > "${iso}.txt"
# Read ${iso}.txt: lines starting with a letter or digit go to a file named after their first character (lowercased), other lines to symbol.txt
# (assumes a UTF-8 locale and a multibyte-aware awk such as gawk, so that substr() returns a whole Devanagari character)
awk '{file = (/^[[:alnum:]]/ ? tolower(substr($0,1,1)) : "symbol") ".txt"; print >> file; close(file)}' "${iso}.txt"
# Remove the Latin a to z files
find . -regex './[a-z]\.txt' -delete
# Convert to the wiki list format `# {item}`
sed -i 's/^/# /' $(find . -type f -name "?.txt")
# See line counts, sorted numerically, descending
wc -l * | sort -n -r
# Append every single-character file with fewer than 200 lines to merged.txt, then delete those files
wc -l ?.txt | awk '$1 < 200 {print $2}' | xargs -r cat >> merged.txt
wc -l ?.txt | awk '$1 < 200 {print $2}' | xargs -r rm
This already provides the lists by letters. It should put you solidly on the way. Yug (talk) 12:52, 5 March 2021 (UTC)
Line counts without merging (50 files):
  99860 total
  50000 mr.txt
   4976 स.txt
   4462 प.txt
   3745 म.txt
   3545 क.txt
   3195 व.txt
   2201 न.txt
   2183 ब.txt
   2134 अ.txt
   1789 र.txt
   1666 द.txt
   1623 आ.txt
   1568 ग.txt
   1524 ज.txt
   1507 त.txt
   1376 श.txt
   1132 ल.txt
   1102 ह.txt
   1089 च.txt
   1076 उ.txt
   1025 भ.txt
    809 य.txt
    791 फ.txt
    766 ख.txt
    652 ट.txt
    645 घ.txt
    480 ए.txt
    456 इ.txt
    446 ध.txt
    420 ड.txt
    318 ठ.txt
    273 झ.txt
    182 थ.txt
    163 ओ.txt
    118 छ.txt
    115 ऑ.txt
     64 ऐ.txt
     55 ढ.txt
     44 औ.txt
     29 २.txt
     26 ई.txt
     20 ष.txt
     20 ऊ.txt
     20 १.txt
     14 ऋ.txt
      6 ऱ.txt
      4 ३.txt
      2 ९.txt
      2 ८.txt
      1 ॐ.txt
      1 ४.txt
Line counts with merging (32 files):
  4976 स.txt
   4462 प.txt
   3745 म.txt
   3545 क.txt
   3195 व.txt
   2201 न.txt
   2183 ब.txt
   2134 अ.txt
   1789 र.txt
   1666 द.txt
   1623 आ.txt
   1568 ग.txt
   1524 ज.txt
   1507 त.txt
   1376 श.txt
   1132 ल.txt
   1102 ह.txt
   1089 च.txt
   1076 उ.txt
   1025 भ.txt
    886 merged.txt
    809 य.txt
    791 फ.txt
    766 ख.txt
    652 ट.txt
    645 घ.txt
    480 ए.txt
    456 इ.txt
    446 ध.txt
    420 ड.txt
    318 ठ.txt
    273 झ.txt
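Step 1 of the suggestion above (a descriptive prefix, so औ.txt becomes Marathi_words_starting_with_औ.txt) is not done by the script. A minimal sketch, assuming bash in a UTF-8 locale (so that `?` matches a single Devanagari character) and run in the directory holding the per-letter files; `add_prefix` is a hypothetical helper, and the leading slash of the wiki subpage name is left out because it cannot appear in a file name.

```shell
#!/usr/bin/env bash
# Sketch: rename every one-character file (e.g. औ.txt) to <prefix><char>.txt.
# "?.txt" matches single-character names only, so mr.txt and merged.txt
# are left untouched. Requires a UTF-8 locale for Devanagari names.
add_prefix() {
  local prefix=$1 f
  for f in ?.txt; do
    [ -e "$f" ] || continue   # no match: the glob stays literal, skip it
    mv -- "$f" "${prefix}${f}"
  done
}
```

Usage: `add_prefix Marathi_words_starting_with_`. Running it after the merge step keeps merged.txt under its own name.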
There is also a list List:Mar/Lemmas-without-audio-sorted-by-number-of-wiktionaries which is updated every day by a bot, so it should be always fresh. The list consists of words that are present in one or more Wiktionaries, but have no recording in Commons. At the top of the list, there are words with the largest number of Wiktionaries. You could probably give it a try too, सुबोध कुलकर्णी. Olaf (talk) 16:34, 5 March 2021 (UTC)

Automatically updated lists of unrecorded audio

Probably not everybody here is aware that lists of unrecorded words are available for 72 languages. The lists are sorted by the number of language versions of Wiktionary in which the corresponding word is described, with the most popular words at the top, so in a way the lists should maximize the usefulness of each recording. Words with audio recordings present in Commons are removed from the lists automatically every night, so the lists should always be fresh. The lists always have a title of the form <language code>/Lemmas-without-audio-sorted-by-number-of-wiktionaries: afr, ang, ara, ast, aze, bel, ben, bul, cat, ceb, ces, cmn, csb, cym, dan, deu, ekk, eng, epo, est, eus, fao, fas, fin, fra, gla, gle, glg, grc, gre, guj, hau, heb, hin, hrv, hun, hye, ido, ina, ind, isl, ita, jav, jpn, kan, kat, kaz, khm, kor, kur, lat, lit, ltz, lvs, mal, mar, mkd, mlg, mlt, mon, msa, nld, nor, oci, pan, pnb, pol, por, ron, rus, san, slk, slv, spa, sqi, swa, swe, tam, tel, tgl, tha, tur, ukr, urd, vie, wuu, yid, yue. Olaf (talk) 16:51, 5 March 2021 (UTC)
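The nightly clean-up described above can be approximated in shell. This is not OlafBot's code: `remove_recorded` is a hypothetical helper, and it assumes the words that already have a recording on Commons have been exported to a plain file, one word per line; fetching them from Commons is out of scope here.

```shell
#!/usr/bin/env bash
# Sketch: drop from a wiki list ("# word" lines) every word that already
# has a recording. The list keeps its original order, so the most useful
# words (here: found in the most Wiktionaries) stay at the top.
remove_recorded() {
  local list=$1 recorded=$2
  # Prefix each recorded word with "# " so it matches list lines exactly;
  # grep -v (invert), -x (whole line), -F (fixed strings), -f (patterns from file).
  grep -vxF -f <(sed 's/^/# /' "$recorded") "$list"
}
```

Whole-line fixed-string matching avoids removing a word just because it contains a recorded word (for example `chat` versus `chaton`).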

This is a game changer. Welcoming new contributors in 72 languages will no longer be a tricky question of providing relevant lists. More lists are coming. We can refocus on outreach and on calling for new contributors to audio-document their voices, their languages, their cultures. Yug (talk) 18:15, 5 March 2021 (UTC)

Outreach

Dialects of Catalan.

I took the opportunity of bumping into a currently inactive user to visit his Wikipedia (Catalan), asked him where I could announce that we now have a Catalan list, and made a gentle announcement there. I don't expect it to pay off soon, but after several such pings we should have some folks landing back here on Lingualibre. I didn't contact the ca:wikt community, but you see the idea: leave many small announcements here and there so people know our name. Smaller pings are OK: "Sorry all, I've been busy with the LinguaLibre project these days" would be helpful too. I tried to emphasize what service Lili provides to them (not sure I did it well, but it's just a ping :) ). Please, when you have the opportunity, reach out to local communities, especially those not currently active. We have nice lists in 72+ languages now. Let the wiki folks know, and record more. Yug (talk) 08:24, 7 March 2021 (UTC)

@Pamputt hi, they started a light conversation describing Catalan pronunciations: Valencian, Central, Balearic and Western Catalan (? not sure whether there were 3 or 4). Do you have any understanding of this Catalan issue? Is it like Marseille French vs. Paris French accents, or something else? Yug (talk) 18:25, 7 March 2021 (UTC)
I do not know precisely how different these Catalan varieties are, but they differ more than Paris French and Marseille French do, because these varieties are considered different dialects. So it is something like Gascon (Q930) and Occitan auvernhat (Q1186) for the Occitan language. So we could start importing these dialects into Lingua Libre to be able to record in them. At least, we should import the main dialects, namely Northwestern Catalan, Valencian, Central Catalan, Balearic, Rossellonese and Alguerese. Pamputt (talk) 18:58, 7 March 2021 (UTC)
That seems to be what User:Vriullop wished for too, and it came up in another discussion I had. Yug (talk) 19:22, 7 March 2021 (UTC)
Northwestern Catalan (Q518078), Valencian (Q518079), Central Catalan (Q518087), Balearic (Q518106), Northern Catalan (Q518118), Algherese (Q518128) are now available, so we can record right now words in these dialects. Pamputt (talk) 20:09, 7 March 2021 (UTC)

License?

Done
I bumped into the CC BY-SA license for contributions again. Aren't we supposed to contribute everything under CC0 so it's Wikidata-compatible? Yug (talk) 21:39, 8 March 2021 (UTC)

The licence is up to the user's choice. --Poslovitch (talk) 21:54, 8 March 2021 (UTC)
Then what do we do on Wikidata? Ooohhh... It's just a link to Commons, not a copy of the audio file... Yug (talk) 22:53, 8 March 2021 (UTC)

Metrics > Accounts creations

Hi everyone !
We got about 5 times more account creations in January 2021 (~60) than in January 2020 (~12).
Welcoming is largely done by hand these days. Having a bot for that may help.
And, given that we are all overloaded, it may be wise to reach out for help. Yug (talk) 23:19, 8 March 2021 (UTC)

Help - to delete word

Hi, please tell me how I can delete a recorded word from LiLi. It was already uploaded to Wikimedia Commons by mistake. The recorded Marathi word is 'कालका', which I want to delete. Thanks in advance.

Hi Aparna Gondhalekar, there are two options depending on whether "कालका" exists. If "कालका" exists but you recorded it badly, you just need to record it again and the new recording will replace the previous one. If "कालका" does not exist, we need to delete the file directly on Wikimedia Commons. Pamputt (talk) 21:18, 9 March 2021 (UTC)

Wikimania 2021

It's not a big surprise, but it has been confirmed: Wikimania 2021 will be online only. This will limit our outreach. We used to go there and record 10~20 languages, give 5-minute demos to 30 people, and run workshops for 40+ others. We also had plenty of small chats (100+) raising awareness about LiLi and connecting with devs for fast discussions. We will need to find other ways this year too. Yug (talk) 21:34, 9 March 2021 (UTC)

Return with Return

So, we are back. After almost 50 days, we are back to work. Thanks to User:VIGNERON, User:Yug, User:Pamputt, etc., who were around. Let's make some noise.

Idea: can you record the word "Return" or "Come back" (or something similar) in your language and add it to the gallery below? Please mention the language name and the meaning in the caption. --টিটো দত্ত (Titodutta) (কথা) 02:09, 23 April 2021 (UTC)

"Return/Come back" as in "LinguaLibre is back", en:The Lord of the Rings: The Return of the King (70 languages) or en:Return of the Jedi (63), right? Titodutta, please provide some examples / context. Yug (talk) 04:58, 23 April 2021 (UTC)

Return Gallery

Translate doesn't seem to work

I can't seem to translate pages; is this an error on my part, or is something wrong with the servers? --Sabelöga (talk) 17:01, 23 April 2021 (UTC)

Indeed, something is broken. There is a Phabricator ticket to track this issue. Pamputt (talk) 18:30, 23 April 2021 (UTC)
Okay, thank you. --Sabelöga (talk) 22:01, 23 April 2021 (UTC)
Hello Pamputt, as a test I tried to translate several pages directly from the wiki, taking inspiration from the T:xx translation markers (example: https://lingualibre.org/wiki/Translations:Help:Main/14/fr). An error occurs, always the same one. I added a comment on your task, notifying Tgr, who may be interested. He may add the "OAuthAuthentication" project tag. Cordially. —Eihel (talk) 14:31, 25 April 2021 (UTC)

Translation error

Translations are back. Thanks. Pamputt (talk) 18:54, 27 April 2021 (UTC)
I still can't seem to translate :( @Pamputt & Eihel --Sabelöga (talk) 22:12, 28 April 2021 (UTC)
Sabelöga can you describe precisely (or post a screenshot) what happens when you try to translate the main page? Pamputt (talk) 08:29, 29 April 2021 (UTC)
Pamputt When I click translate it looks like this, and nothing else happens. https://imgur.com/a/fgY1sSl --Sabelöga (talk) 15:42, 29 April 2021 (UTC)
Sabelöga Indeed, it is the same behaviour as before. Could it be a cache problem? Could you try clearing it (see Wikipedia:Bypass_your_cache if needed)? Seb35 and VIGNERON, do you have any idea? Pamputt (talk) 17:26, 29 April 2021 (UTC)
Pamputt I've tried clearing the cache, logging in on different devices, editing on computer and mobile, and translating while logged out in incognito mode, and when I tried to manually create Translations:Help:Configure_your_microphone/1/sv this error appeared:
Internal error
[1738fa8dc0b56f3d0f41bed6] /index.php?title=Translations:Help:Configure_your_microphone/1/sv&action=submit Error from line 294 of /opt/mediawiki/1.35/extensions/OAuthAuthentication/auth/OAuthPrimaryAuthenticationProvider.php: Class 'MediaWiki\Extensions\OAuthAuthentication\AuthBlacklist' not found

Backtrace:

#0 /opt/mediawiki/1.35/includes/auth/AuthManager.php(2470): MediaWiki\Extensions\OAuthAuthentication\OAuthPrimaryAuthenticationProvider->providerRevokeAccessForUser()
#1 /opt/mediawiki/1.35/includes/auth/AuthManager.php(864): MediaWiki\Auth\AuthManager->callMethodOnProviders()
#2 /opt/mediawiki/1.35/includes/user/User.php(848): MediaWiki\Auth\AuthManager->revokeAccessForUser()
#3 /opt/mediawiki/1.35/extensions/Translate/src/SystemUsers/FuzzyBot.php(17): User::newSystemUser()
#4 /opt/mediawiki/1.35/extensions/Translate/TranslateHooks.php(1095): MediaWiki\Extensions\Translate\SystemUsers\FuzzyBot::getUser()
#5 /opt/mediawiki/1.35/includes/HookContainer/HookContainer.php(321): TranslateHooks::validateMessage()
#6 /opt/mediawiki/1.35/includes/HookContainer/HookContainer.php(132): MediaWiki\HookContainer\HookContainer->callLegacyHook()
#7 /opt/mediawiki/1.35/includes/HookContainer/HookRunner.php(1529): MediaWiki\HookContainer\HookContainer->run()
#8 /opt/mediawiki/1.35/includes/EditPage.php(1904): MediaWiki\HookContainer\HookRunner->onEditFilterMergedContent()
#9 /opt/mediawiki/1.35/includes/EditPage.php(2232): EditPage->runPostMergeFilters()
#10 /opt/mediawiki/1.35/includes/EditPage.php(1724): EditPage->internalAttemptSave()
#11 /opt/mediawiki/1.35/includes/EditPage.php(680): EditPage->attemptSave()
#12 /opt/mediawiki/1.35/includes/actions/EditAction.php(71): EditPage->edit()
#13 /opt/mediawiki/1.35/includes/actions/SubmitAction.php(38): EditAction->show()
#14 /opt/mediawiki/1.35/includes/MediaWiki.php(527): SubmitAction->show()
#15 /opt/mediawiki/1.35/includes/MediaWiki.php(313): MediaWiki->performAction()
#16 /opt/mediawiki/1.35/includes/MediaWiki.php(940): MediaWiki->performRequest()
#17 /opt/mediawiki/1.35/includes/MediaWiki.php(543): MediaWiki->main()
#18 /opt/mediawiki/1.35/index.php(53): MediaWiki->run()
#19 /opt/mediawiki/1.35/index.php(46): wfIndexMain()
#20 {main}

--Sabelöga (talk) 21:46, 29 April 2021 (UTC)

Hello Pamputt and Sabelöga, I admit that I didn't search deeply, but I don't understand why the status of T280972 (Translating does not work anymore) was changed to resolved. I still cannot access the Translate pages. Also, the translation wiki pages (page/xxx/code_language) are accessible via Translate, so I am willing to believe that the problem is unrelated, but I am confused. A translation page on the wiki is created and read for translation from Translate; is there no causal link? If these pages are blocked, can FuzzyBot update them? Clearing the caches does not solve anything. See also phab:T281289. Why add an old extension version that does not work on MW 1.35 with a patch, instead of adding what is recommended? Cordially. —Eihel (talk) 11:10, 30 April 2021 (UTC)
Resolved —Eihel (talk) 17:31, 30 April 2021 (UTC)
It works now, thanks! --Sabelöga (talk) 20:06, 30 April 2021 (UTC)

HIGH PRIORITY: Audio recordings have dust and clicks

Under investigation: some users experience parasitic saturation ("Pock!") or dust while others don't. This irregular occurrence is reminiscent of the earlier, unsolved "speed-up bug".

I've had friends record German and Romanian lists. They're using separate hardware, and have recorded thousands of words before, so I know their hardware is fine. The recordings they've done today suffer from loud clicks on half the recordings, so there seems to be a problem with the recording studio. I clearly have no idea what the problem is or how to fix it, but I hope someone else will!

Here are examples:

  • LL-Q188_(deu)-Natschoba-der_Wunsch.wav
  • LL-Q7913_(ron)-Andreea_Teodoraa-muscă.wav
  • LL-Q150 (fra)-Hélène (Hsarrazin)-corné.wav

Julien Baley (talk) 16:24, 24 April 2021 (UTC)

I have the same problem. DSwissK (talk) 17:49, 24 April 2021 (UTC)
Hmm, very annoying. I've opened a Phabricator ticket. I hope the issue will be fixed soon. Pamputt (talk) 18:38, 24 April 2021 (UTC)
HIGH priority. No idea who can fix it. Can someone refine the diagnosis? Can more people test with their configuration and report here? Yug (talk) 15:33, 25 April 2021 (UTC)
I notified Mr. Vion, the original developer of the JS recorder. He may have some insights. I suspect it's a bug in either:
  • RecordWizard (studio), the MediaWiki extension that sits between the user speaking and the audio processing layers. It had recent changes due to the migration to MW 1.35.
  • LinguaRecorder JS, the core JS library processing the audio signal. No changes in the past week.
Recent changes may have affected how the audio cuts are made. Either the MW extension or the JS library may need a fix.
This is a core bug that prevents LinguaLibre from fulfilling its core mission. Any insight is welcome. Yug (talk) 15:43, 25 April 2021 (UTC)
So der Wunsch (Q522922) (deu:der_Wunsch), muscă (Q522753) (ron:muscă) and corné (Q523386) (fra:corné). —Eihel (talk) 17:26, 25 April 2021 (UTC)
@Eihel the 1st and 3rd ones sound good to me. Yug (talk) 20:38, 25 April 2021 (UTC)
@Yug the 1st and 3rd ones do not sound good to me; there's a clear click on the "der" and the "cor". If you have populated the table below, perhaps your numbers are too optimistic (if we judge these three differently). Julien Baley (talk) 12:56, 26 April 2021 (UTC)
@Julien Baley, DSwissK, & Eihel
I reviewed recent recordings of 4 users.
  • Two contributors have perfect audios (100% good on 8 audios checked for each user).
  • Two new users have the bug (30% of audios with saturation).
I first thought it could be new users not using their hardware properly: the microphone must not be overly sensitive, we should not let it vibrate, etc. It's know-how we pass on during IRL workshops and that tech-savvy people pick up quickly. Self-taught users have not been warned about this.
But it does not explain why experienced users such as DSwissK and Julien's friends get such noise. So I'm confused.
DSwissK, did you try alternative microphone settings, with a lower volume? Could it be that you have recently been speaking louder, or made a change you did not notice? Yug (talk) 22:02, 25 April 2021 (UTC)
Hello Yug, I concede that the difference may be minimal on some recordings. You have to listen carefully; it's like "a stylus on a vinyl record jumping over a speck of dust". Some files are more affected than others (depending on the vocal intonation), but all of the ones I have cited are problematic. To fully understand, you can try recording with Shtooka (the former LiLi), then immediately redo the same recording on LiLi. As I said to Hélène, you can also compare with an existing recording of corné (Q499309). Cordially. —Eihel (talk) 15:12, 26 April 2021 (UTC)
@Eihel & Julien Baley I'm officially deaf in one ear, so I'm not the best judge of audio. I pushed the review as far as I could, but could other users help review more audios so Mr. Vion can tackle this investigation with clean clues and ratios? Yug (talk) 16:15, 26 April 2021 (UTC)
@Yug I'm very happy to help review some recordings, if you want; could you suggest a list of users? (I don't know how to find users that have recently recorded). Julien Baley (talk) 17:41, 26 April 2021 (UTC)
@Julien Baley process added below. Thank you! Note: the users I reviewed (all those below) may actually have a higher noise ratio than I reported, since I don't have a musical ear. Yug (talk) 16:56, 26 April 2021 (UTC)
@Yug I've checked the entire table and added a few people (Hsarrazin has only 1 recent recording, so I've amended the "14" that was shown). Some people have 0% problems, some close to 100%... the problems are very characteristic. Julien Baley (talk) 19:25, 26 April 2021 (UTC)
@Pamputt & DSwissK & others, I really need help on this one. We need to review and report on 10+ recordings for each user uploading audios to Commons, and likely send a custom message to each affected user, on their talk page and on their Commons talk page (ex msg, ex ping). Yug (talk) 16:36, 26 April 2021 (UTC)
@Yug not fully helpful but I added a section on LinguaLibre:Stats#The most prolific speakers for the current month, it may help to narrow down to who did recent recordings. Cheers, VIGNERON (talk) 07:20, 27 April 2021 (UTC)

/!\ The dust bug is confirmed as core and relatively widespread. This morning I sent an email to Wikimedia France (Adelaide, Remy, Michael) with suggested solutions: immediately, restore a sitenotice banner to inform our users; in the short term, hire Vion for analysis and possibly a fix. We should not claim to be back online and on our feet when we aren't. Yug (talk) 14:09, 27 April 2021 (UTC)

Good. The CSS fixes have been deployed. → Sitenotice is back. → Indentation is back. Yug (talk) 14:11, 27 April 2021 (UTC)
@WikiLucas00 & DSwissK hi,
Given that you are the two active users having this issue, we need you most.
Could you record 15~30 more audios with another web browser, such as Firefox or another one? Then report the results with this.
If you have any other hypothesis to test, I'm interested (changing microphones, etc.). Yug (talk) 18:23, 27 April 2021 (UTC)
I had the impression (and DSwissK confirmed on Discord) that using Firefox slightly reduces the amount of problems encountered. — WikiLucas (🖋️) 19:53, 27 April 2021 (UTC)
Yup, I installed Firefox and could finally send some more audios (my daughter's and mine), with the internal microphone of my laptop. Please review. DSwissK (talk) 00:45, 28 April 2021 (UTC)
@Yug I checked with Andreea_Teodoraa and Natschoba which browsers they're using: Chrome and Safari. I asked Andreea_Teodoraa to try Firefox; she did 22 recordings (https://commons.wikimedia.org/wiki/Special:ListFiles?limit=20&user=Andreea+Teodoraa): 20 are clearly perfect, and on 2 (însene and "pe scurt") I feel I hear a problem, but cannot see anything in Audacity. Considering we were at a 75% bug rate on Chrome, this seems to be a move in the right direction. Julien Baley (talk) 02:33, 30 April 2021 (UTC)
@Yug I tried with another friend (https://commons.wikimedia.org/w/index.php?title=Special:ListFiles&limit=100&user=LangPao) and everything sounds bug-free, both on Chrome and Firefox (the 10 most recent ones are Firefox). Julien Baley (talk) 13:11, 30 April 2021 (UTC)
(Answered below on 15:16, 4 May 2021 Yug (talk) 15:48, 4 May 2021 (UTC))

I think this may interest you: same smartphone, same internal microphone, same list (1 word). The only difference is Chrome vs. Firefox. DSwissK (talk) 19:20, 1 May 2021 (UTC)

@Julien Baley & DSwissK thanks to you both. Recent A/B testing where only one parameter is changed is what we are looking for. Testing the same users with different browsers seems fruitful. Thanks also to Julien for the Audacity inspections; our dev will eventually have to dig into that.
@DSwissK, from your 2 examples I mainly see a difference in volume (dB). It may be nothing, but when reviewing audios I also noticed that many seemed to have a low level. Could it be that Chrome changed its default audio recording levels, which increases the presence of noise? In that case other projects like Forvo (fake open license) should also be affected.
Anyway, if a recent Chrome version is broken, maybe we could recommend using Firefox for a while. Yug (talk) 15:16, 4 May 2021 (UTC)
@Yug there is indeed a difference in volume but the problem is not the noise but the clicks. There is more noise in the Firefox version, but it isn't disturbing. At least, not as much as these clicks... DSwissK (talk) 18:29, 4 May 2021 (UTC)
Is there any chance it is related to the versions of Firefox or Chrome? I guess people upgraded their browsers in recent months; if I understand correctly there were a few issues before the OVH fire, and perhaps more people upgraded since. (Personally I hardly hear the issue except when there is a loud click; my ear is not as trained as others' here.) Seb35 (talk) 21:05, 4 May 2021 (UTC)

I reinstalled the LinguaRecorder demo on https://lingualibre.org/demo/sandbox.html with settings identical to the RecordWizard extension (the gear on the 'Studio' (4th) step, and here in the PHP+JS code). You can play with the settings; perhaps there is something to tweak around the saturation? (When you change one, you have to click "Apply new options" then "start", and the "ready" counter should be incremented.) Seb35 (talk) 20:54, 4 May 2021 (UTC)

Limiting the number of words to record

@Yug, DSwissK, VIGNERON, Seb35, Pamputt, & Titodutta I think that one important cause of the bugs is related to RAM: loading a long list into the Record Wizard produces the most bugs in the recordings (the critical length, or weight, of the list may vary depending on the user's hardware and software).

I think we should try limiting the number of words that can be loaded into the Record Wizard (to 100 or 200 at most), at least temporarily. There is no point in loading 1000-word lists into the RW; taking a little break while recording never hurts, and it could help reduce the number of bugs for now, while we try to find the source of the issue.
Best — WikiLucas (🖋️) 19:53, 27 April 2021 (UTC)

We have to test this hypothesis. Yug (talk) 21:35, 27 April 2021 (UTC)
Tested and reporting: I used very small lists (fewer than 10 words) and still have the same issue. I encounter that bug on my smartphone and on both my computers (desktop and laptop) under Chrome (latest version). Using an internal or external microphone doesn't change anything. DSwissK (talk) 00:42, 28 April 2021 (UTC)
@DSwissK thank you. This is helpful. It clearly seems to be a software issue. I contacted Wikimedia France and Vion, asking them to jump in.
We need people with audio software skills to inspect those audios and people with JS+audio skills to review the audio input chains. Mr. Vion has both skills. Yug (talk) 10:52, 28 April 2021 (UTC)
I do not think it's RAM-related.
Even with 1000 words we are dealing with 1000 words × 7 KB per file = 7 MB.
Let's assume the browser stores the words in a very, very detail-rich way, so the files are 1000 times heavier. We are still at 7 GB.
Most computers have 8~16 GB of RAM these days.
I also recorded a small list and apparently had the issue.
Most (all?) affected users had recorded relatively few words. Worst affected users: Natschoba → 149, Andreea Teodoraa → 247, WikiLucas00 → 64.
All but 3 users this month have recorded fewer than 300 words. Yug (talk) 11:02, 28 April 2021 (UTC)
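For reference, the arithmetic above can be reproduced and varied in a few lines. This Python sketch is illustrative only: the 7 KB-per-word figure and the "1000 times heavier" case come from the message above, while the alternative raw-audio estimate (2 seconds per word, 44.1 kHz, 32-bit float samples, mono) is an assumption, not a description of how LinguaRecorderJS actually buffers audio.

```python
# Rough memory estimate for a Record Wizard session (sketch).
# Assumption: recordings may be held as raw PCM samples; the defaults
# below are illustrative guesses, not values measured in LinguaRecorderJS.

def list_memory_bytes(n_words: int, bytes_per_word: int) -> int:
    """Total memory if every recording of the list is kept in memory."""
    return n_words * bytes_per_word

def pcm_bytes_per_word(seconds: float, sample_rate: int = 44_100,
                       bytes_per_sample: int = 4, channels: int = 1) -> int:
    """Size of one uncompressed clip; 4 bytes/sample mimics 32-bit float buffers."""
    return int(seconds * sample_rate * bytes_per_sample * channels)

# Figures from the discussion: 7 KB per file, then the "1000 times heavier" case.
small = list_memory_bytes(1000, 7_000)          # 7,000,000 bytes, about 7 MB
huge = list_memory_bytes(1000, 7_000 * 1000)    # 7,000,000,000 bytes, about 7 GB

# Alternative assumption: each word is a 2-second raw float32 mono clip.
raw_clip = pcm_bytes_per_word(2.0)              # 352,800 bytes, about 353 kB
raw_list = list_memory_bytes(1000, raw_clip)    # 352,800,000 bytes, about 353 MB

for label, size in [("7 KB/word", small), ("x1000 worst case", huge),
                    ("raw 2 s float32", raw_list)]:
    print(f"{label:>18}: {size / 1e6:,.1f} MB")
```

Under that extra assumption, a 1000-word list would sit in the hundreds of MB, between the two bounds discussed here; the real figure depends on the sample rate, clip length and how the browser keeps the buffers.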
Folks, I inspected our GitHub code:
I can't find any clear recent change that could have affected our audio recording stream.
@VIGNERON & Seb35 are you aware of any (environmental) change that could have affected the audio stream of RecordWizard recently? Yug (talk) 07:57, 29 April 2021 (UTC)
I am still in the process of properly publishing code from the server to GitHub and Gerrit for the various extensions, but there is indeed no change related to audio.
Specifically, LinguaRecorderJS is exactly what was installed in 1.31 and in 1.35; no change here (on the server there is only a tiny instruction to register LinguaRecorderJS in the MediaWiki environment).
For RecordWizard, the main changes are maintenance, a technical change to the serialization of Wikibase items, and interface-related ones (vue.js, which changed from 2.6.11 to 2.6.12, mainly a security release).
Seb35 (talk) 19:46, 4 May 2021 (UTC)

@VIGNERON, Seb35, Pamputt, Yug, & Poslovitch
Update: Another user (Le Commissaire) reported an audio bug (on the WMFr Discord server). This was not the "click"/"pop" bug but the speeding-up bug, and the user said it occurred when loading a list of 1000 words into the RW. I suggested he try loading a shorter list; he tried with 250 words and it worked fine, no issue. This is another clue that RAM matters, or that long lists are a problem in the RW for several users.
In addition to a potential limit of the RW to 350 words (for example), see this related ticket:

  • T276014, Feature request to be able to load parts of lists in RW (only possible for Categories at the moment)


Best — WikiLucas (🖋️) 15:09, 6 May 2021 (UTC)
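Until the RW itself enforces a limit or T276014 is implemented, a long list can be split by hand before loading. This Python sketch shows the idea; the cap of 200 (MAX_WORDS) is just one of the values floated in this thread, and nothing here is part of the Record Wizard.

```python
# Sketch: split a long word list into Record Wizard-sized batches.
# MAX_WORDS is hypothetical; 100, 200 and 350 were all suggested above.
from typing import Iterator, List

MAX_WORDS = 200

def batches(words: List[str], size: int = MAX_WORDS) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` words, preserving order."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(words), size):
        yield words[start:start + size]

words = [f"word{i}" for i in range(1, 1001)]   # stand-in for a 1000-word list
parts = list(batches(words))
print(len(parts), [len(p) for p in parts])    # 5 batches of 200 words each
```

Each batch could then be pasted into the RW as a separate session, which also gives the natural breaks suggested above.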

Worth investigating. I assumed 7 kB per word, but the audio stream could be completely different from my assumption. The natural path would be to call back Mr. Vion or User:0x010C to investigate (neither is currently active), or to dive into LinguaRecorderJS, the browser's memory, and RAM. Maybe more. Yug (talk) 18:41, 6 May 2021 (UTC)

Review process


To review recordings by another user:

  1. Go to Special:RecentChanges > Find recent recordings > Pick a user who is not already in the table below
  2. Open 10~20 of this user's recent recordings > Listen to each > Count how many have unusual audio artifacts
  3. Add this user to the table below with their results and your comment
  4. If you feel it is necessary, please notify the user on LiLi (ex msg) and ping the user on Commons (ex ping)
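Steps 2 and 3 boil down to a simple ratio. So that all reviewers fill in the "% affected" column the same way, it could be computed as below; this Python sketch's rounding to whole percent and "33% (5)" output format are assumptions modelled on the existing table rows.

```python
# Sketch: compute the "% affected" cell of the review table.
# Counts come from the reviewer; rounding to a whole percent is an assumption.

def percent_affected(affected: int, reviewed: int) -> int:
    """Share of reviewed recordings with audible artifacts, rounded to whole %."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    if not 0 <= affected <= reviewed:
        raise ValueError("affected must be between 0 and reviewed")
    return round(100 * affected / reviewed)

def table_cell(affected: int, reviewed: int) -> str:
    """Format like the existing rows, e.g. '33% (5)'."""
    return f"{percent_affected(affected, reviewed)}% ({affected})"

print(table_cell(5, 15))   # 33% (5)
print(table_cell(19, 20))  # 95% (19)
```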

To be reviewed:

  1. With your usual web browser, go to Record Wizard (studio) > Step 3, enter your web browser name then 15 words in your language > Record, publish.
  2. Go to LinguaLibre:Chat room#Reviews-ready > Post a message with your web browser, its version [optional], and your OS.

To be reviewed, recording with another browser or device:

  1. With the alternative web browser or device, go to Record Wizard (studio) > Step 3, enter the web browser name then 15 words in your language > Record, publish.
  2. Go to LinguaLibre:Chat room#Reviews-ready > Post a message with your web browser, its version [optional], and your OS.
  3. Add some information so we know which of your recording are associated with this alternative browser or device.

Review-ready

  • I recorded 10+ audios with Chrome 89.0.4389.114 (Official Build) (64-bit) : all good for me, no review needed. Yug (talk) 14:35, 27 April 2021 (UTC)
@Yug Could you try 20 more with an up-to-date version of Chrome? — WikiLucas (🖋️) 18:38, 27 April 2021 (UTC)
@WikiLucas00 Done. I'm not sure, but I may have the bug as well. Yug (talk) 19:42, 27 April 2021 (UTC)
@Yug The majority of your last recordings contain at least a click. — WikiLucas (🖋️) 19:56, 27 April 2021 (UTC)

Samples

Under investigation: some contributors experience parasitic saturation ("Pock!") or dust while others don't.
Please review your recent recordings and help expand the table below so we can identify a recurring pattern among affected vs. non-affected contributors.
Username | # reviewed | % affected | Example file | Web browser + version | Comment
User:DSwissK | 15 | 33% (5) | | | New echo bug?
User:Natschoba | 20 | 95% (19) | | | Several thousand recordings before. No hardware change.
User:Andreea Teodoraa | 11 | 75% (8) | | | Several thousand recordings before. Tried different mics and platforms, same behaviour.
User:GeoMechain | 15 | 0% (0) | | |
User:ClasseNoes | 15 | 0% (0) | | |
User:Hsarrazin | 14 | 30% (4) | | |
User:ᱥᱟᱹᱜᱩᱱ ᱗ | 2 | 100% (2) | | | Only 2 audios.
User:Zoyahssn | 2 | 100% (2) | File:LL-Q1860 (eng)-Md Anan Islam (Zoyahssn)-Md Anan Islam.wav | | Suspects: hardware & sound settings issue
User:Olaf | 15 | 0% (0) | | | All recent recordings OK. (I get these clicks in every recording session, but I remove all such occurrences during the review phase. That is the only reason it's 0%. Olaf (talk) 23:44, 1 May 2021 (UTC))
User:WikiLucas00 | 60 | 75% (45) | | Brave 1.23.73 (Chromium: 90.0.4430.85) | See my 2021-04-26 10pm CEST series
User:WikiLucas00 | 300 | 0% (0) | All files are OK | Firefox 88.0.1, external microphone | Perfectly fine. See my 2021-05-06 9am CEST series
User:Le Commissaire | ?? | ?% (?) | | Opera, desktop computer, external microphone | Speed-up bug occurred when loading a 1000-word list into RW. Tried loading only 250 words and recording again; it went fine.

Publish on Wikimedia Commons

Hello, I just tested, but my recordings are not published on Commons. My tests: on Firefox, then on Chrome, with 50 expressions, then with 1, under the CC BY-SA 3.0 and CC0 1.0 licenses. —Eihel (talk) 06:51, 2 May 2021 (UTC)

Publishing problem on Wikimedia Commons
phab:T281636 Eihel (talk) 07:10, 2 May 2021 (UTC)
I usually have the same problem with the first two recordings of a session. Then I can upload them again at the end. Try again with more recordings, and use the "retry failed upload" button. Poemat (talk) 08:07, 2 May 2021 (UTC)
Yup, I had this bug many times. (I say "had" because I don't remember having encountered it after the fire incident.) Just don't give up and it should be published eventually. DSwissK (talk) 11:56, 2 May 2021 (UTC)
(As of 3 May 2021, as far as I checked, I'm not aware of any code change (history) that may have affected this. Seb35 made some other code changes that same day.) Yug (talk) 09:47, 3 May 2021 (UTC)

I'm adding a user who has the same problem: Le Commissaire. —Eihel-LiLi (talk) 15:33, 6 May 2021 (UTC)

Translation admins

I updated this ticket, explaining our need for translation admins. I'm especially thinking of Sabelöga and Eihel, who have the skills and the need to get these rights (e.g. here).
If the community agrees, we can ask the developer team currently working on the project to implement this new status in Lingua Libre, and we will then be able to elect new translation admins on LiLi. You can vote using {{Support}} or {{Oppose}}.
All the best, — WikiLucas (🖋️) 12:21, 4 May 2021 (UTC)

Hello WikiLucas, especially since the tvar translation variables have just changed. —Eihel-LiLi (talk) 16:32, 5 May 2021 (UTC)

Vote

  • Support (proposer) — WikiLucas (🖋️)
  • Support We are at an early stage for the community; having 3 active referents for any given administrative task is required (see also en:Bus factor). It is also necessary to document processes as we see them appear, in a concise and therefore maintainable way. Yug (talk) 15:09, 4 May 2021 (UTC)
    In this project, the rights associated with translation administrators (example: pagetranslation) are already included in the administrator rights. In addition, an administrator can self-grant the right without going through a formal request (on any WM wiki). I therefore think we are far from depending on an indispensable (wo)man (especially after Strasbourg, IMHO). Also, if I want to continue on this project, and following the previous section… —Eihel-LiLi (talk) 16:29, 5 May 2021 (UTC)
    @Eihel-LiLi "Active" [and skilled] is the important word. I'm an admin but not active on translation pages. We have had about 4 truly active admins over the past 6 months; AFAIK only WikiLucas was an admin while truly active [and skilled] on pagetranslation. Adding 2+ more is required. It seems to be on the way. Yug (talk) 09:59, 6 May 2021 (UTC)
    And Pamputt too (already TA on WD for example). Cordially. —Eihel-LiLi (talk) 15:14, 6 May 2021 (UTC)
  • Support Agree to ask for this new status. Pamputt (talk) 15:46, 4 May 2021 (UTC)
  • Support Agreed. DSwissK (talk) 18:31, 4 May 2021 (UTC)
  • Weak support. Eihel-LiLi (talk) 15:49, 6 May 2021 (UTC)
  • Support I'm confident. Lyokoï (talk) 17:57, 10 May 2021 (UTC)
  • Support I'm up for it! --Sabelöga (talk) 18:53, 19 May 2021 (UTC)

Discussion

@Eihel-LiLi Titodutta is already an admin on LiLi, which means he has the pagetranslation right. Implementing this translation admin status would allow us to grant some users the pagetranslation right without granting them all admin rights (like the right to delete pages or block users for instance). — WikiLucas (🖋️) 07:31, 6 May 2021 (UTC)
Ah OK. I took the most prolific users, but I remembered that you and Pamputt are TAs… —Eihel-LiLi (talk) 15:04, 6 May 2021 (UTC)

Browsing the sound library

Nicolas NALLET is currently working on the page that will display the recordings of Lingua Libre, and would like to know the list of filters that we would like to use on this page (e.g. by language, by speaker, by date...)

Feel free to suggest other filters or give your opinion on suggested filters 🙂 — WikiLucas (🖋️) 12:58, 20 May 2021 (UTC)
(pinging @Yug, Pamputt, & Titodutta WikiLucas (🖋️) 15:48, 20 May 2021 (UTC))

Great news!
The most obvious ones are, I guess, the following:
  • by language
  • by speaker
  • by speaker's language proficiency (beginner, etc.)
  • by gender (male, female, etc.)
--Poslovitch (talk) 13:38, 20 May 2021 (UTC)
  • Hello WikiLucas00 and Poslovitch
    • by cat (deepcat, incategory)
    • by coord (nearcoord, boost-nearcoord)
    • by link (linksto)
The codes in parentheses are CirrusSearch keywords; CirrusSearch is an extension that could be added to LiLi. Poslovitch's proposals also correspond to filters available in WikibaseCirrusSearch (haswbstatement). Tell me what you think of this. Cordially. —Eihel (talk) 20:36, 20 May 2021 (UTC)
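To make the CirrusSearch idea more concrete, here is a Python sketch of how filter choices might be combined into one search string and sent to the standard MediaWiki search API (action=query, list=search). It rests on several assumptions: CirrusSearch and WikibaseCirrusSearch would first have to be installed on Lingua Libre, the API endpoint URL is assumed, and the property and item IDs (P4, Q21) are hypothetical placeholders to be checked against the real Lingua Libre data model (as would the namespace to search).

```python
# Sketch: turn filter choices into a CirrusSearch query and a MediaWiki
# search API URL. Assumptions: CirrusSearch/WikibaseCirrusSearch installed
# on Lingua Libre; the endpoint and the IDs below are placeholders.
from urllib.parse import urlencode

API = "https://lingualibre.org/api.php"  # assumed standard MediaWiki endpoint
LANGUAGE_PROP = "P4"                    # hypothetical property ID for "language"

def build_query(category=None, links_to=None, statements=()):
    """Combine CirrusSearch keywords into one search string (all must match)."""
    parts = []
    if category:
        parts.append(f'incategory:"{category}"')  # deepcat: would include subcategories
    if links_to:
        parts.append(f'linksto:"{links_to}"')
    for prop, value in statements:
        parts.append(f"haswbstatement:{prop}={value}")  # WikibaseCirrusSearch keyword
    return " ".join(parts)

def search_url(query, limit=50):
    """Build a list=search request for the MediaWiki Action API."""
    params = {"action": "query", "list": "search", "srsearch": query,
              "srlimit": limit, "format": "json"}
    return f"{API}?{urlencode(params)}"

# Hypothetical example: items whose language statement points to Q21.
q = build_query(statements=[(LANGUAGE_PROP, "Q21")])
print(q)
print(search_url(q))
```

The speaker, proficiency and gender filters suggested above would map to further haswbstatement clauses, if the corresponding properties exist on the recording items.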
@Eihel could you describe a bit how you imagine this would work? (Since the recordings on Lingua Libre don't have categories or coordinates at all, and could have links but I couldn't find any examples, I'm a bit confused and would like to know more.) Same question for CirrusSearch: we could look into whether it can be installed, but what use do you see for it? (The only use I know of is for WikibaseCirrusSearch.) Cheers, VIGNERON (talk) 14:42, 26 May 2021 (UTC)
Code on GitHub, please. You may check Forvo and CodePen to find elegant HTML5 audio elements and CSS. Yug (talk) 22:00, 26 May 2021 (UTC)

Plans for the next armageddon?

Are there any contingency plans implemented after the Big Fire? A regular backup for example? Poemat (talk) 22:49, 24 May 2021 (UTC)

@Poemat good question, thanks for asking. There are obviously some plans. I'll let @Seb35, Nicolas NALLET, & Michael Barbereau WMFr complete and/or correct me, but right now there are daily backups on a server in another datacenter. Cheers, VIGNERON (talk) 12:47, 26 May 2021 (UTC)

Request for Mon language (code: mnw)

Done
The Mon language is not available, so I added the Thai language instead. I would like to have this problem resolved, thanks. (message posted by User:咽頭べさ (talk))

Hello again @咽頭べさ thank you for pointing out that Mon language was missing on Lingua Libre! I added it, you should from now on be able to record words in this language 🙂 Please read the message I posted on your talk page before recording new words.
All the best, — WikiLucas (🖋️) 16:40, 27 May 2021 (UTC)

Celebrating the coming 500k milestone

Hello @DenisdeShawi, DSwissK, Eihel-LiLi, Julien Baley, KlaudiuMihaila, Lepticed7, Lyokoï, Olaf, Pamputt, Poemat, Poslovitch, Sabelöga, Theklan, Titodutta, Yug, & सुबोध कुलकर्णी

As you may have seen, we recorded 30,000 pronunciations during the current month (2nd most active month ever), the very first full calendar month since the rebirth of the website, after the datacenter fire that stalled the project for 6 weeks. If we keep a similar pace, we should reach in June the important milestone of 500,000 recordings made on Lingua Libre. That is incredible.

I wanted to ask you all, how do you want to celebrate this milestone? Feel free to suggest anything below, and let's try to celebrate it properly 🙂

All the best
WikiLucas (🖋️) 14:33, 27 May 2021 (UTC)

Hi there, I remember recording numbers up to 1399 in French (c:File:LL-Q150 (fra)-Poslovitch-1399.wav). I pledge to get that number up to 4242 once we reach that milestone! --Poslovitch (talk) 18:18, 27 May 2021 (UTC)