Chat room

Chat rooms in various languages:

Chatroom FAQ

How to download all audios of one language? By speaker?

Datasets are available here. A script updates the datasets every 2 days using CommonsDownloadTool. For more, see Help:Download datasets.

How to add missing languages?

Administrators can add new languages on request, usually within a few days. Please provide your language's ISO 639-3 code and/or its Wikidata ID. For more, see Help:Add a new language.

How to keep my wikimedia project up to date?

Contact Poslovitch, the operator of Lingua Libre Bot. For more info, check out Help:Bots and LinguaLibre:Bot.

What IRL events are coming? When? Where?

Please see LinguaLibre:Events.

How to translate LinguaLibre User Interface into a new language?

Go to translatewiki.net. For more, see Help:Translate.

How to archive sections which have been answered?

After reviewing the section, add {{done}} ~~~~ to the top of the section. After a few days to 2 weeks, move the section's code to [[LinguaLibre:Chat_room/Archives/year]].

Datasets out of date

Hello. It seems that the datasets page, although it claims to run every 2 days, is completely out of date: all the available zips are from April 2020 or November 2019 (and the full zip from May 2019). Is this a known problem? Is there a plan to address it? Julien Baley (talk) 23:17, 27 August 2020 (UTC)

Indeed, there seems to be an issue with the dataset updates. I opened a Phabricator ticket about it. Pamputt (talk) 18:24, 28 August 2020 (UTC)

About the exclusion of already recorded words

Hi, I think the option to exclude words that I have already recorded is broken. This morning I started a recording session and LL suggested words that I recorded two days ago. For example, I recorded Belorusino two days ago, but it does not disappear when I click "exclude words already recorded". Also notice the two versions of the file: I have already re-recorded it. Can someone fix this? Lepticed7 (talk) 10:07, 15 November 2020 (UTC)

I have opened a Phabricator ticket. It may be fixed in the coming months, but that is not certain. Pamputt (talk) 20:05, 15 November 2020 (UTC)

Reminder : Grants

Hello all, I'm monitoring grants these days; a summary table is available at LinguaLibre:Grants.

I think both rapid grant mechanisms could help us now to reach out to local communities via small-scale events, training, hardware, food, transportation costs, flyer designs, etc. For example, this WM-France micro-fi request organizes 4 evenings of contribution, getting €100 for each evening. The same user has been welcome to make several grant requests.
The heavier R&D Grant could surely be used for something too. I have an idea on this, but we can trust Indian contributors to come up with relevant technical ideas and teams as well. @Titodutta Yug (talk) 01:20, 8 February 2021 (UTC)

LinguaLibre Bot and Wikidata

This section should be moved to LinguaLibre:Technical board.

I have not checked the bot's contributions on Wikidata for quite some time. Yesterday I uploaded ~100 Bengali film names from Bangla Wikipedia. It looks like the bot is not active, unless I am missing something. --টিটো দত্ত (Titodutta) (কথা) 18:10, 13 February 2021 (UTC)

Update and technical improvements

Hi all,

For full information and full disclosure, I'm now working with WikiValley and Wikimédia France in a paid capacity to help improve Lingua Libre's technical structure (see this - in French - for the scope of our intervention).

One of our first actions last Thursday was to restart the Blazegraph updater. A lot of tools depend on this "fundamental brick", including but not limited to the SPARQL endpoint (and pages using it) and bots. You can now see that pages like Special:MyLanguage/LinguaLibre:Stats are up to date again, and the bots should also restart soon (you can find more technical info on this on LinguaLibre:Technical board).

The next big step will be to update this MediaWiki from 1.31 to 1.35 and to move it to a new server.

If you see something or anything wrong or strange, don't hesitate to let me know. I'm also available for any question.

Cheers, VIGNERON (talk) 08:56, 15 February 2021 (UTC)

Nice! Happy to see you folks jumping in. Thank you for the Stats! We should pass 400,000 audios shortly. Yug (talk) 16:27, 15 February 2021 (UTC)

400,000

The total amount of recordings on Lingua Libre reached 400,000 a few hours ago. February is already the second most fruitful month since the beginning of the project, even though we are only halfway through. LiLi is growing faster and faster, and this is only the beginning!
Congratulations and thanks to everyone who gives some time to record voices and to spread the project around the world.
All the best — WikiLucas (🖋️) 18:10, 16 February 2021 (UTC)

And another milestone broken! Big thanks to Titodutta and the Marathi efforts, too! Yug (talk) 21:24, 16 February 2021 (UTC)

Yug, WikiLucas and Titodutta, thanks for the support! The Marathi community had decided to gift a minimum of 5,000 recordings on the occasion of Marathi Language Day, celebrated on 27 February. We have crossed 6,000 recordings as of now. All credit goes to community members. सुबोध कुलकर्णी (talk) 05:22, 26 February 2021 (UTC)

Congratulations to the Marathi community! It's nice to see you contribute this way :) Yug (talk)

Chat room in your language

Hi all. I've created Template:Lang-CR in order to list all the chat rooms. I think it would be interesting for people to discuss in their native language. The main discussion should remain on this chat room in English in order to be understood by most of the contributors. So feel free to create a village pump/chat room in your mother tongue. Pamputt (talk) 20:21, 16 February 2021 (UTC)

It is a welcome move. We need to discuss many local issues, policies, approaches, ideas etc. in our own language. I have created the Marathi page संवाद-चर्चा दालन. Let me know whether the process is right. I will start engaging speakers here. सुबोध कुलकर्णी (talk) 05:36, 26 February 2021 (UTC)

@सुबोध कुलकर्णी that's perfect. Pamputt (talk) 06:40, 26 February 2021 (UTC)

New batch of lists available ! (1,000 languages)

Please remember to tag the list's talk page with {{UNILEX license}}.

Greetings!
Thanks to Tshrinivasan, with whom we discussed recent Indic (Marathi!) activity and the lack of lists, I bumped again into UNILEX (GNU-like license), a Google-led Unicode Consortium project listing vocabulary for 999 languages. The data seems clean as far as I can tell. The two main maintainers are Google folks, so I suspect UNILEX uses Google's best scrapers and NLP cleaners. This data includes tab-separated frequency lists of the form {item} {number_of_occurrences}. I forked their GitHub repository and made a script to convert their format into Lili's List:* format, such as # {item}. See:

github.com/lingua-libre/unilex/data/frequency/ig.txt – frequency
github.com/lingua-libre/unilex/data/frequency-sorted-count/ig.txt – sorted
github.com/lingua-libre/unilex/data/frequency-sorted-hash/ig.txt – Lili's List format
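The conversion described above boils down to sorting by the count column and prefixing each word with "# ". Below is a minimal shell sketch of that idea, not the actual script in the lingua-libre/unilex fork; the function name `unilex_to_lili` is hypothetical, and it assumes that any header lines in the input start with "#".

```sh
# Sketch only: convert a UNILEX-style frequency file ("word<TAB>count" per line)
# into Lingua Libre List:* wikitext ("# word" per line), most frequent first.
# ASSUMPTION: header/comment lines, if any, start with "#".
unilex_to_lili() {
  grep -v '^#' "$1" |   # drop header/comment lines
    sort -k2,2nr |      # numeric sort on the count column, highest first
    cut -f1 |           # keep only the word (cut splits on TAB by default)
    sed 's/^/# /'       # turn each word into a numbered wikitext list item
}

# Usage: unilex_to_lili ig.txt > ig-lili.txt
# The input lines "a<TAB>5", "b<TAB>12", "c<TAB>7" give "# b", "# c", "# a".
```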

You can check whether your own language is among the 999 available. For Marathi, replace ig with mr. I therefore created 2 local lists to test this approach:

List:Mar/words-by-frequency-00001-to-01000 – starts soft
List:Mar/words-by-frequency-01001-to-05000 – then it jumps to multiples of 5,000: 01001-05000, 05001-10000, 10001-15000, etc.
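The ranges above (a first list of 1,000 words, then 1,001 to 5,000, then blocks of 5,000) can be produced from a ranked list in a single pass. A minimal sketch, in shell to match the script used elsewhere on this page; `split_by_frequency` and the output file names are hypothetical.

```sh
# Sketch only: split a ranked "# word" list into the ranges used above,
# 1-1000, then 1001-5000, then blocks of 5,000 (5001-10000, 10001-15000, ...).
# $2 is a hypothetical output file stem, e.g. "mar-words-by-frequency".
split_by_frequency() {
  awk -v prefix="$2" '
    {
      if (NR <= 1000)      { lo = 1;    hi = 1000 }
      else if (NR <= 5000) { lo = 1001; hi = 5000 }
      else { hi = int((NR - 1) / 5000) * 5000 + 5000; lo = hi - 4999 }
      file = sprintf("%s-%05d-to-%05d.txt", prefix, lo, hi)
      # ranges only grow, so a file closed here is never reopened
      if (file != prev) { if (prev != "") close(prev); prev = file }
      print > file
    }' "$1"
}

# Usage: split_by_frequency mr-lili.txt mar-words-by-frequency
# writes mar-words-by-frequency-00001-to-01000.txt, ...-01001-to-05000.txt, etc.
```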

Right now, 1000 lists are already formatted in Lili's syntax within the /data/frequency-sorted-hash directory. If any community lacks word lists on Lili, there you have them: copy, paste, done, situation unlocked! Yug (talk) 16:40, 24 February 2021 (UTC)

@Titodutta hi! This may interest your community. There are dozens of Indic languages :) It could also help you: although you have already recorded most of those words for your language (ben), together with the "ignore already recorded words" function these lists can fill some gaps :) Yug (talk) 16:48, 24 February 2021 (UTC)

I love this. I'll inform the Marathi folks. --টিটো দত্ত (Titodutta) (কথা) 17:16, 24 February 2021 (UTC)
This is just amazing. You don't know how delighted I am feeling at this moment. I checked the Bengali list; a few random words have typos, but that should not be more than 1%, I guess. Overall this will be an extremely helpful resource for the communities. --টিটো দত্ত (Titodutta) (কথা) 17:24, 24 February 2021 (UTC)

I share your enthusiasm! It's bot-created, I'm pretty sure; the cleanup is likely purely statistical. Now that those lists are technically available, the ideal next step would be human review by local communities. Maybe groups of 2~3 users for copyedit sprints? :D But this is optional IMHO. Also, since the corpora come from online documents, words for IRL objects like `chair`, `car`, `walk` may be further down these lists. But they should be within the first 20,000 items. The best part is the linguistic diversity of this set. Amazing. Yug (talk) 18:10, 24 February 2021 (UTC)

It's a good resource indeed. Thanks! The Marathi words in the list are also grammatically correct, with nearly no typos. We have started a discussion about this in our community. Currently, we have started working on Lexemes first; the recordings of the lists thus created will be done simultaneously. The community thinks this approach is more useful in the long run. A separate group of speakers may adopt these lists, but then we have to devise a way to avoid repetitions. We will definitely discuss the use of this resource further and let you know. सुबोध कुलकर्णी (talk) 05:14, 26 February 2021 (UTC)

Tshrinivasan, Yug: the Marathi community plans to work on these lists, but [1] is giving a 404 error. Please help. सुबोध कुलकर्णी (talk) 05:54, 5 March 2021 (UTC)

Tshrinivasan, सुबोध कुलकर्णी: it's in active development these days, so I made a few changes.

Currently at: /hugolpz/unilex-extended/frequency-sorted-hash which uses UNILEX as a git submodule to respect each project's scope.
I just ran the script for Marathi, so the lists are now local. When picking a list, type List:Mar/M:

See also the section below. My apologies for the changes; I hope they didn't affect you too much. Yug (talk) 07:47, 5 March 2021 (UTC)

Pause before running

Long-tail curves likely apply to languages ranked by number of speakers. Since macro-languages such as Mandarin, English, Spanish, Hindi, etc. are certain to be audio-documented soon by the sheer force of demography, our effort-strategy should progressively shift toward the right of the curve, toward increasingly rare languages. The rarer the languages and speakers, the more attentive we should become and the more custom assistance we will have to provide.

Dragons Bot has been created, coded, tested, and is ready to import UNILEX's lists into LinguaLibre's List:{iso}/{title} namespaces. Since 1,000 pages and their associated talk pages will be created, I would like to pause a few days to consider this large list import/creation and its rationale.

Lili > Languages > existing breadth: we have reached 110 languages on LinguaLibre so far.
Lili > Lists > not sorted by usefulness: SPARQL queries provide lists for all languages, but without prioritizing words by usefulness.
Lili > Lists > sorted by usefulness :
- Hand-picked frequency lists exist for about 7 languages: eng, mar, por, pol, tam, ron, kur, with optimal relevance for teaching/learning.
- Olafbot's List:*/Lemmas-without-audio-sorted-by-number-of-wiktionaries for 72 languages, updated daily, with optimal relevance for wiktionaries.
- UNILEX can provide frequency lists for 1,000 languages, about 10 times our current language coverage. UNILEX builds upon Github.com/Google/Corpuscrawler, an open-source project which plans to support more languages. I dived into this chain and it's an 'easy' NLP pipeline to contribute to. The Wikimedia community can use it and expand it.

Core issue: the core issue with users arriving online is to increase retention of minority and semi-rare languages by making their speakers' work smoother. For example, a Wayuu speaker arrived today. No local (frequency) list was available today, but UNILEX + Dragons Bot can provide a local Wayuu frequency list of 8000 items, ready to record.
Since we don't know which semi-rare languages will come next, having 1,000 languages ready is a safe yet not excessive bet. Assuming an en:Zipf's law/en:Long tail curve for languages and their speakers, we can still predict that at least one out of 10~20 new languages' speakers will lack a local word list. But together with OlafBot's lists, we move from 6% toward 90% of our languages having a solid, usefulness-based roadmap forward. Yug (talk) 14:21, 3 March 2021 (UTC)

Well, I believe the idea to import Unilex lists is very good. One of the things a new user needs most is an idea of what to record. The Unilex lists suit this function, especially in the case of new languages, where there is no other list available and no words have been recorded yet. The only question I see is how to import the Unilex lists. Perhaps the best idea is to import the 1000 most frequent words from each list. It would be even better if the recorded words were automatically removed from the lists and replaced by new ones (like in the case of Olafbot-managed lists), but even a static list is good as bait if the goal is just to attract more speakers of rare languages.

One remark: you should translate the file names from Unilex to match LiLi's language codes (or perhaps you did it, I don't know, I didn't examine the code). It's not always the same, for example, Polish is "pl" in Unilex, and "Pol" in Lili. If you leave the old codes, the list won't be automatically found when a new user presses the "Local List" button. Anyway, the newbies are likely not to notice the lists at all regardless of all our efforts. Olaf (talk) 00:55, 4 March 2021 (UTC)
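Olaf's point about codes can be illustrated with a small lookup. This is a minimal sketch, assuming UNILEX file names are mostly ISO 639-1 codes and that LiLi titles use the ISO 639-3 code with a capital first letter (as in List:Mar/...); the function name `lili_code` and its hand-written table are hypothetical and cover only a handful of languages.

```sh
# Sketch only: map a UNILEX file code to the code used in LiLi List: titles.
# ASSUMPTIONS: UNILEX names files mostly by ISO 639-1 code ("pl"), LiLi uses the
# ISO 639-3 code with a capital first letter ("Pol"), and this tiny table is a
# hypothetical sample; a real script would read the full ISO 639-3 code table.
lili_code() {
  case "$1" in
    bn) code=ben ;; en) code=eng ;; fr) code=fra ;; ig) code=ibo ;;
    mr) code=mar ;; pl) code=pol ;; pt) code=por ;; ro) code=ron ;;
    ta) code=tam ;;
    *) code="$1" ;;  # assume anything else is already an ISO 639-3 code
  esac
  # capitalize the first letter only
  printf '%s\n' "$code" | awk '{ print toupper(substr($0, 1, 1)) substr($0, 2) }'
}

# Usage: lili_code pl   prints Pol
#        lili_code mr   prints Mar (as in List:Mar/...)
```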

jQuery.Deferred exception: this.pastRecords is undefined

This discussion may be moved to LinguaLibre:Technical board.

Hello, there.

When I try to load a list of words to record from the FR wiktionary, the modal does not disappear when I click "Done" and seems blocked trying to load the words. During this time, the JS console complains that "jQuery.Deferred exception: this.pastRecords is undefined", and the last resource loaded is, in cURL format: curl 'https://fr.wiktionary.org/w/api.php?action=query&format=json&origin=*&formatversion=2&prop=pageterms&wbptterms=label&generator=categorymembers&gcmnamespace=0&gcmtitle=%3ACat%C3%A9gorie%3ALocutions%20verbales%20en%20fran%C3%A7ais&gcmtype=page&gcmlimit=max' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:85.0) Gecko/20100101 Firefox/85.0' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' --compressed -H 'Origin: https://lingualibre.org' -H 'DNT: 1' -H 'Connection: keep-alive' -H 'Referer: https://lingualibre.org/' -H 'TE: Trailers'

Looks like there is a bug…

Regards. LoquaxFR (talk) 17:21, 24 February 2021 (UTC)

Hi LoquaxFR, can you describe precisely what you do when you write "when I try to load a list of words to record from the FR wiktionary"? How do you load the word list: do you use the "Wikimedia category" option on the right, or do you create the word list yourself one word at a time? If you use "Wikimedia category", can you give us the category you want to use? Can you reproduce the problem regardless of the category you work with? Thanks in advance for this information, which I hope will help pin down the problem as precisely as possible. Pamputt (talk) 17:58, 24 February 2021 (UTC)

Indeed, it will be simpler in French. The problem occurs systematically when I try to use a Wikimedia category (the French Wiktionary's, in this case); I only use this option to load words, and the problem appears for every category I try, whether I have already recorded almost all of its words or only a small part of its thousands of terms. The problem also occurs in private browsing, so it does not seem to be the cache or the cookies. If you need more info, don't hesitate. LoquaxFR (talk) 18:08, 24 February 2021 (UTC)

Thanks for the additional info. I just tested with Firefox 78.7 and I don't encounter this problem. Can you try another browser (Chromium or other) to see whether the problem is specific to your Firefox (including in private browsing)? For example, it could come from a gadget you installed. Pamputt (talk) 18:40, 24 February 2021 (UTC)

A Firefox add-on breaking the JS? Yug (talk) 18:57, 24 February 2021 (UTC)

Chrome and Safari give me the same result; I also tried from another machine and another OS, with no better luck: the JS error still shows up and nothing happens when the modal is validated. Could I have recorded too many words, making the JS fail when it tries to remove those already recorded? Since only a few of us have recorded that many, it's possible. I had already noticed that loading lists from the Wiktionary was taking longer and longer for me (relatively speaking: a few seconds of waiting at most). Is this another problem linked to my account? LoquaxFR (talk) 06:30, 25 February 2021 (UTC)

Thanks for the extra info. I opened T275734. We should check with Lepticed7 and WikiLucas00, who have roughly the same number of recordings as you, to see whether they hit the same problem. Pamputt (talk) 06:54, 25 February 2021 (UTC)

Hi, personally, I don't know if it's related, but there are some recordings that the Record Wizard doesn't remove when I ask it to remove words already recorded. Case in point: this file, which I recorded three times. Lepticed7 (talk) 10:45, 28 February 2021 (UTC)
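What LoquaxFR and Lepticed7 expect from "exclude words already recorded" amounts to a set difference between the candidate list and the speaker's past recordings. Below is a minimal shell sketch of that filter; it is not the Record Wizard's JavaScript, and `exclude_recorded` and its two input files are hypothetical. Because it compares whole lines exactly, any difference in spelling, case or Unicode normalization between the two lists keeps a word in the output, which is one plausible (unverified) way a real filter could let already-recorded words through.

```sh
# Sketch only (hypothetical helper, not the Record Wizard's code):
# $1 = candidate word list, $2 = words already recorded, one word per line.
# Prints the candidates that do not appear as an exact whole line in $2.
exclude_recorded() {
  # -v: invert match, -x: whole-line match, -F: fixed strings (no regex),
  # -f: read the patterns from the file of recorded words
  grep -vxFf "$2" "$1"
}

# Usage: exclude_recorded category-words.txt my-recordings.txt
```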

50,000

February 2021. This month. We have seen 50,000 pronunciations in a month (see LinguaLibre:Statistics). This is the first time we have seen 50,000 entries in a month. This is great. --টিটো দত্ত (Titodutta) (কথা) 08:51, 28 February 2021 (UTC)

That's really amazing. The same month we passed 400k recordings! AND the shortest month in the year! I'm going to prepare a small News to be published every month (inspired by what you did in September if I remember correctly), I think February is a very good month to start with! I'll publish it on your talk page if you'd like 🙂 All the best ! — WikiLucas (🖋️) 16:11, 28 February 2021 (UTC)

We can actually officially start a bi-monthly LinguaLibre:Newsletter to be published on 1 March, 1 May, 1 July and so on. What do you think? I am also asking User:Pamputt, User:Yug, User:Lyokoï, User:Lepticed7 to comment. --টিটো দত্ত (Titodutta) (কথা) 17:40, 28 February 2021 (UTC)

I would say, why not, but I cannot lead such a project, so if you are motivated to write and lead such a newsletter, go ahead. Pamputt (talk) 18:39, 28 February 2021 (UTC)

On the LinguaLibre:Technical board/intro Poslovitch has started a /News section which keeps a log of important milestones. It's an interesting idea because it's minimalist, and therefore low maintenance.

I'm also interested in a Newsletter for both external and internal purposes. I would help, yes. The editorial line would benefit from clarification: who the expected readers are, writing style, overall length, major sections, section lengths, etc. But this can emerge with the first few issues :) Please keep a balance so the writing workload stays modest. Yug (talk) 18:57, 28 February 2021 (UTC)

The /News of the technical board is mostly about technical news. I fully agree with the idea of a Newsletter, but quarterly. We could grab some ideas from the French Wiktionary's Actualités. --Poslovitch (talk) 20:33, 28 February 2021 (UTC)

Hi, let's start with the newsletter for March. I'll add the stories I know, such as 400,000 audios, 50,000 this month, the Wikimedia Wikimeet India, the upcoming France-India call, the French Wiktionary work on missing recordings, etc. I'll start the draft tomorrow and ping you here.
In future we will need mw:Extension:MassMessage to send the newsletter to subscribers' talk pages. This needs a system admin with access to the server and to LocalSettings.php etc. I understand this will take time, so it can wait. Kind regards. --টিটো দত্ত (Titodutta) (কথা) 21:24, 28 February 2021 (UTC)

@Titodutta hi, we are having another discussion on the mailing list about networking, cooperation and outward communications. I think the LinguaLibre:Newsletter page can be modeled on Technical board and LinguaLibre:Bot: a kind of hub for a subgroup of active users dedicated to a common goal, in this case communication. The bimonthly Newsletter could be a core, founding element, but other discussions about outreach could take place there too. We have so much to push in this direction: academic outreach, rare languages and under-represented countries, partner institutions, calling for new Wikimedians, reminding distant Wikimedia chapters about Lingua Libre, etc. Having a hub dedicated to writing elegant co-edited texts, defining targets and leading communication campaigns would be a strong plus. I'm still focused on code but I could help in a few weeks. You seem to love it as well. Do we have other users interested in joining such efforts? It would be good to have a few more folks. Yug (talk) 20:39, 2 March 2021 (UTC)

Newsletter : March 2021 review ?

You can co-edit this text. PS Titodutta: a rough summary of past months and emerging directions based on a message to an ex-contributor.

In January and February, the « Lili » community took back control of the technical stack (access to servers, GitHub code, bots, etc.) and made a call for more diverse speakers. The Indian community started to show up, with the key Indic languages being Bengali (50,000) and Marathi (~10,000). Romanian, Polish and Ukrainian are also on the rise, at around 20,000 audios each. We continue to have a dozen or so smaller languages showing up, but no powerful push yet.

Right now, an external software company is upgrading our MediaWiki and its modules thanks to Wikimedia France's funding. The volunteer dev team is also strong and internal organization is improving. We now have LinguaLibre:Technical board as a tech hub, LinguaLibre:Bot as a bot hub, and LinguaLibre:Events as an IRL/online event hub. When the main software upgrade settles down in about a month, we plan a [yet to be created] LinguaLibre:Newsletter/room as an inward and outward communication hub.

On that last front, we could reach out to « relay users » on other wikis who can share our news about LinguaLibre with the communities of Wiktionaries, Wikisources, Wikipedias and Wikidata. We are also considering formally reaching out to non-Wikimedia groups such as Common Voice, Unicode, governmental and NGO agencies, and research centers, possibly in the form of group work and/or an online editathon where we gather to spread the news. This hub, summarizing the community's discussions, will therefore also clarify goals and strategies. We are looking for help with this matter.

The current forward momentum is thanks to the efforts of early autumn 2020. We weren't able to immediately convert those into actions, but they still injected energy and vision into LinguaLibre, which helped snowball the current dynamic. Also, many thanks to all those who got involved in this journey! Yug (talk) 07:20, 3 March 2021 (UTC)

Also, I just found out that Commons grows by about 1 million files per month. So with 50,000 audios last month, Lili accounted for about 5% of Commons' new files. Yug (talk) 14:57, 3 March 2021 (UTC)

Made a minor change, I'll get back to this. Sorry for the delay, something kept me really busy for the last two days. Regards. --টিটো দত্ত (Titodutta) (কথা) 20:20, 3 March 2021 (UTC)

Marathi women speakers celebrate 'Women's Day' & 'Women History Month' on Lingua Libre

Greetings for the upcoming World Women's Day!
Glad to share this news. The Marathi language community in Maharashtra State of India has taken the initiative to record their language over the last 2 months. Out of 26 speakers in total, about 24 are women from 4 different places in the state. The group has decided to reach the 10,000-recording mark to celebrate 'Women's Day' and the 15,000 mark in March. As of now, 8600+ recordings have been uploaded. A small group of women has also started working on Lexicographical data, the recordings of which will be done simultaneously. The activity is being coordinated by institutional partner Jnana Prabodhini, Pune, and facilitated by CIS-A2K, an affiliate of WMF in India. The community needs support from all of you. Thanks, सुबोध कुलकर्णी (talk) 06:28, 5 March 2021 (UTC)

Greetings सुबोध कुलकर्णी, nice to witness this enthusiasm.

I imported UNILEX lists for Marathi. In RecordWizard's Step 3, when you pick a list, go for Local list, then mar/M, and you will see lists of the most used words. I proposed a gentle ramp approach: the first list has just 200 words, see List:Mar/Most_used_words,_UNILEX_1:_words_00001_to_00200. In my experience this allows better on-the-ground sessions with new users. 200 is gently ambitious: it lets users get past the uncanny valley of the first 20 words and move into the joyful Lingualibre flow of rapid recording. Perfect for demos and on-boarding. :)

The following lists are for motivated users who choose to return. To consolidate skills, list 2 has 800 words while list 3 has 1000. At this stage the speaker has recorded a nice 2,000 audios, and these words likely make up 90% of daily conversation.

It then moves on to committed users. List 4 has 3000 words, the following ones 5,000 words each. These lists are not expected to be done in one go but over several sessions of one hour or less, during a dedicated day or over a week or so.
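The ramp above (200, 800, 1,000, 3,000, then 5,000 words per list) can be computed rather than counted by hand. A minimal sketch, assuming later lists follow the same title pattern as List:Mar/Most_used_words,_UNILEX_1:_words_00001_to_00200; `ramp_titles` is a hypothetical helper and prints the titles without the List:Mar/ prefix.

```sh
# Sketch only: print the list titles of the "gentle ramp" for a ranked list
# of $1 words: 200, 800, 1000 and 3000 words, then 5000 words per list.
# ASSUMPTION: every list reuses the title pattern of the first Marathi list.
ramp_titles() {
  awk -v total="$1" 'BEGIN {
    n = split("200 800 1000 3000", sizes, " ")
    start = 1; i = 1
    while (start <= total) {
      size = (i <= n) ? sizes[i] : 5000
      end = start + size - 1
      if (end > total) end = total   # last list may be shorter
      printf "Most_used_words,_UNILEX_%d:_words_%05d_to_%05d\n", i, start, end
      start = end + 1; i++
    }
  }'
}

# Usage: ramp_titles 10000 prints 5 titles, from
# Most_used_words,_UNILEX_1:_words_00001_to_00200 to
# Most_used_words,_UNILEX_5:_words_05001_to_10000
```

Cumulatively this reaches 2,000 words after list 3 and 5,000 after list 4, matching the steps described above.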

I hope these may help your language community to better on-board interested contributors :)

We also encourage development of women speakers networks, so thanks a lot for your lead. Yug (talk) 08:57, 5 March 2021 (UTC)

Added Marathi lists :

Yug (talk) 09:01, 5 March 2021 (UTC)

Many thanks Yug for detailed explanation. These are useful to start with. Our group has taken lexicographical approach now to develop lists. So we need alphabetical lists to get forms of words. For example we create list like this - शरीर, शरीरभर, शरीराकडून, शरीराकडे, शरीराचं, शरीराचा, शरीराची, शरीराचे, शरीराच्या, शरीरात...etc. The members distribute work according to letters. Therefore it will be good if we can get modified lists. - सुबोध कुलकर्णी (talk) 11:22, 5 March 2021 (UTC)

I see. सुबोध कुलकर्णी, you could use frequency-sorted-count/mr.txt, keep the 30,000 most frequent, then sort alphabetically and split by hand on each letter. See Help:How_to_create_a_frequency_list?#UNILEX.27s_lists. Yug (talk) 11:53, 5 March 2021 (UTC)

I tried to push it forward but it's a bit more complex than I anticipated. Ideally, you would 1) add a prefix so that औ.txt becomes /Marathi_words_starting_with_औ.txt, 2) merge the rarest letters together. I must refocus on non-wiki projects; can you call for help from local wiki developers?

# Run with bash (the $'\t' syntax below is bash-specific)
# Define language
iso=mr
# Get file, cut out the metadata header (first 5 lines), sort by 2nd column (frequency) descending, keep 50000, keep only the word, sort by 1st column alphabetically, save to .txt file
curl https://raw.githubusercontent.com/unicode-org/unilex/master/data/frequency/${iso}.txt | tail -n +6 | sort -k 2,2 -n -r | head -n 50000 | cut -d$'\t' -f1 | sort -k 1,1 > ${iso}.txt
# Read ${iso}.txt: lines starting with an alphanumeric character go to "<first character, lowercased>.txt", all other lines to symbol.txt
awk '{file = (/^[[:alnum:]]/ ? tolower(substr($0,1,1)) : "symbol") ".txt"; print >> file; close(file)}' ${iso}.txt
# Remove the Latin a to z files
find . -regex '\./[a-z]\.txt' -delete
# Convert to wiki list format `# {item}` (single-character file names only, so ${iso}.txt and symbol.txt are untouched)
sed -i 's/^/# /' $(find . -type f -name '?.txt')
# See line counts, sorted numerically, descending
wc -l * | sort -n -r
# Merge the letter files with fewer than 200 lines into merged.txt, then delete them
wc -l ?.txt | awk '$1 < 200 && $2 != "total" {print $2}' > small.lst
xargs cat < small.lst >> merged.txt && xargs rm < small.lst && rm small.lst

This already provides the lists by letters. It should put you solidly on the way. Yug (talk) 12:52, 5 March 2021 (UTC)

Without merge (50 files):

  99860 total
  50000 mr.txt
   4976 स.txt
   4462 प.txt
   3745 म.txt
   3545 क.txt
   3195 व.txt
   2201 न.txt
   2183 ब.txt
   2134 अ.txt
   1789 र.txt
   1666 द.txt
   1623 आ.txt
   1568 ग.txt
   1524 ज.txt
   1507 त.txt
   1376 श.txt
   1132 ल.txt
   1102 ह.txt
   1089 च.txt
   1076 उ.txt
   1025 भ.txt
    809 य.txt
    791 फ.txt
    766 ख.txt
    652 ट.txt
    645 घ.txt
    480 ए.txt
    456 इ.txt
    446 ध.txt
    420 ड.txt
    318 ठ.txt
    273 झ.txt
    182 थ.txt
    163 ओ.txt
    118 छ.txt
    115 ऑ.txt
     64 ऐ.txt
     55 ढ.txt
     44 औ.txt
     29 २.txt
     26 ई.txt
     20 ष.txt
     20 ऊ.txt
     20 १.txt
     14 ऋ.txt
      6 ऱ.txt
      4 ३.txt
      2 ९.txt
      2 ८.txt
      1 ॐ.txt
      1 ४.txt

With merging (32 files):

  4976 स.txt
   4462 प.txt
   3745 म.txt
   3545 क.txt
   3195 व.txt
   2201 न.txt
   2183 ब.txt
   2134 अ.txt
   1789 र.txt
   1666 द.txt
   1623 आ.txt
   1568 ग.txt
   1524 ज.txt
   1507 त.txt
   1376 श.txt
   1132 ल.txt
   1102 ह.txt
   1089 च.txt
   1076 उ.txt
   1025 भ.txt
    886 merged.txt
    809 य.txt
    791 फ.txt
    766 ख.txt
    652 ट.txt
    645 घ.txt
    480 ए.txt
    456 इ.txt
    446 ध.txt
    420 ड.txt
    318 ठ.txt
    273 झ.txt
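Step 1) from Yug's message above (prefixing each letter file, so that औ.txt becomes Marathi_words_starting_with_औ.txt) can be sketched as a short loop, run in the directory the script writes to. The function name `add_prefix` is hypothetical, and the `?` glob matches a single Devanagari letter only in a UTF-8 locale (e.g. bash with a UTF-8 LANG); in the C locale it matches a single byte instead.

```sh
# Sketch only: rename every single-character file "X.txt" in the current
# directory to "<language>_words_starting_with_X.txt".
# $1 = language label, e.g. "Marathi" (hypothetical helper).
add_prefix() {
  for f in ?.txt; do
    [ -e "$f" ] || continue   # the glob matched nothing
    mv -- "$f" "${1}_words_starting_with_$f"
  done
}

# Usage: add_prefix Marathi   (merged.txt and mr.txt are left untouched)
```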

There is also a list List:Mar/Lemmas-without-audio-sorted-by-number-of-wiktionaries which is updated every day by a bot, so it should be always fresh. The list consists of words that are present in one or more Wiktionaries, but have no recording in Commons. At the top of the list, there are words with the largest number of Wiktionaries. You could probably give it a try too, सुबोध कुलकर्णी. Olaf (talk) 16:34, 5 March 2021 (UTC)

Automatically updated lists of unrecorded audio

Probably not everybody here is aware that lists of unrecorded words are available for 72 languages. The lists are sorted by the number of language versions of Wiktionary in which the corresponding word is described, with the most popular words at the top, so the lists should, in a way, maximize the usefulness of each recording. Words with audio recordings present in Commons are removed automatically from the lists every night, so the lists should always be fresh. The lists always have a title of the form <language code>/Lemmas-without-audio-sorted-by-number-of-wiktionaries: afr, ang, ara, ast, aze, bel, ben, bul, cat, ceb, ces, cmn, csb, cym, dan, deu, ekk, eng, epo, est, eus, fao, fas, fin, fra, gla, gle, glg, grc, gre, guj, hau, heb, hin, hrv, hun, hye, ido, ina, ind, isl, ita, jav, jpn, kan, kat, kaz, khm, kor, kur, lat, lit, ltz, lvs, mal, mar, mkd, mlg, mlt, mon, msa, nld, nor, oci, pan, pnb, pol, por, ron, rus, san, slk, slv, spa, sqi, swa, swe, tam, tel, tgl, tha, tur, ukr, urd, vie, wuu, yid, yue. Olaf (talk) 16:51, 5 March 2021 (UTC)
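The nightly procedure Olaf describes (sort lemmas by the number of Wiktionaries describing them, drop those that already have audio on Commons, publish the rest) can be sketched as follows. This is a hypothetical illustration, not OlafBot's code; it assumes both inputs have already been fetched as local files, one "lemma<TAB>count" per line and one recorded lemma per line, and `refresh_unrecorded` is an invented name.

```sh
# Sketch only (not OlafBot's code):
# $1 = "lemma<TAB>number_of_wiktionaries" for one language
# $2 = lemmas that already have an audio file on Commons, one per line
# Prints "# lemma" lines, most widely described first, recorded lemmas removed.
refresh_unrecorded() {
  # FILENAME == ARGV[1] (not NR == FNR) so an empty $2 is handled correctly
  awk -F '\t' '
    FILENAME == ARGV[1] { recorded[$0] = 1; next }   # build the set from $2
    !($1 in recorded)   { print $2 "\t" $1 }         # put the count first
  ' "$2" "$1" |
    sort -k1,1nr |      # highest number of Wiktionaries first
    cut -f2 |           # keep the lemma (may contain spaces)
    sed 's/^/# /'       # wikitext list item
}

# Usage: refresh_unrecorded mar-lemmas.tsv mar-recorded.txt > new-list.txt
```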

This is a game changer. Welcoming new contributors in 72 languages will no longer be a tricky question of providing relevant lists. More lists are coming. We can refocus on outreach and on calling for new contributors to audio-document their voices, their languages, their cultures. Yug (talk) 18:15, 5 March 2021 (UTC)

Outreach

Dialects of Catalan.

I used the opportunity of bumping into a currently inactive user to go to his Wikipedia (Catalan), ask him where I could announce that we now have a cat list, and make a gentle announcement there. I don't expect it to pay off soon, but after several pings we should have some folks landing back here on Lingualibre. I didn't contact the ca:wikt community, but you see the idea: leaving many small announcements here and there so people know our name. Smaller pings are OK. "Sorry all, I've been busy on the LinguaLibre project these days" would be helpful too. I tried to emphasize what service Lili provides to them (not sure I was good at that, but it's just a ping :) ). Please, when you have the opportunity, reach out to local communities, especially those not currently active. We have nice lists in 72+ languages now. Let the wiki folks know and record more. Yug (talk) 08:24, 7 March 2021 (UTC)

@Pamputt hi, they started a light conversation describing Catalan varieties: Valencian, Central, Balearic and Western Catalan pronunciations (not sure whether it was 3 or 4 different ones). Do you have any understanding of this Catalan issue? Is this like Marseille French vs. Paris French accents, or something else? Yug (talk) 18:25, 7 March 2021 (UTC)

I do not know precisely how different these Catalan varieties are, but they are more different than French from Paris and French from Marseille, because these varieties are considered different dialects. So it is something like Gascon (Q930) and Occitan auvernhat (Q1186) for the Occitan language. So we could start importing these dialects into Lingua Libre so that people can record in them. At least, we should import the main dialects, namely Northwestern Catalan, Valencian, Central Catalan, Balearic, Rossellonese and Alguerese. Pamputt (talk) 18:58, 7 March 2021 (UTC)

It seems to be what User:Vriullop wishes too, and it also came up in another discussion I had. Yug (talk) 19:22, 7 March 2021 (UTC)

Northwestern Catalan (Q518078), Valencian (Q518079), Central Catalan (Q518087), Balearic (Q518106), Northern Catalan (Q518118), Algherese (Q518128) are now available, so we can record right now words in these dialects. Pamputt (talk) 20:09, 7 March 2021 (UTC)

License?

Done
I bumped again into the CC BY-SA license for contributions. Aren't we supposed to contribute everything under CC0 so it's Wikidata-compatible? Yug (talk) 21:39, 8 March 2021 (UTC)

The licence is up to the user's choice. --Poslovitch (talk) 21:54, 8 March 2021 (UTC)

Then what do we do on Wikidata? Ooohhh... It's just a link to Commons, not a copy of the audio file.... Yug (talk) 22:53, 8 March 2021 (UTC)

Metrics > Accounts creations

Hi everyone !
We got about 5 times more account creations in January 2021 (~60) compared to January 2020 (~12).
Welcoming is largely done by hand these days. Having a bot for that may help.
And, given that we are all overloaded, it may be wise to reach out for help. Yug (talk) 23:19, 8 March 2021 (UTC)

Help - to delete word

Hi, please tell me how I can delete a word I recorded on Lili; it was already uploaded to Wikimedia Commons by mistake. The recorded Marathi word is 'कालका', which I want to delete. Thanks in advance.

Hi Aparna Gondhalekar, there are two options depending on whether "कालका" exists. If "कालका" exists but you recorded it badly, you just need to record it again and the new recording will replace the previous one. If "कालका" does not exist, the file needs to be deleted directly on Wikimedia Commons. Pamputt (talk) 21:18, 9 March 2021 (UTC)

Wikimania 2021

It's not a big surprise, but it has been confirmed: Wikimania_2021 will be online only. This will limit our outreach. We used to go there and record 10~20 languages, give 5-minute demos to 30 people, and run a workshop for 40+ others. We also had plenty of small chats (100+) raising awareness about Lili and connecting with devs for quick discussions. We will need to find other ways this year too. Yug (talk) 21:34, 9 March 2021 (UTC)

Return with Return

So, we are back. After almost 50 days, we are back to work. Thanks to User:VIGNERON, User:Yug, User:Pamputt, etc., who were around. Let's make some noise.

Idea: I have an idea, can you record the word "Return" or "Come back" (or something similar) in your language and put it in the gallery below? Please mention the language name, and meaning in the caption. --টিটো দত্ত (Titodutta) (কথা) 02:09, 23 April 2021 (UTC)

"Return/Come back" as in "LinguaLibre is back", like en:The Lord of the Rings: The Return of the King (70 languages) or en:Return of the Jedi (63), right? Titodutta, please provide some examples / context. Yug (talk) 04:58, 23 April 2021 (UTC)

Yes you are right. --টিটো দত্ত (Titodutta) (কথা) 19:30, 23 April 2021 (UTC)

Return Gallery

প্রত্যাবর্তন (Protyaborton in Bangla, means "Return")
Retour (French)

Translate doesn't seem to work

I can't seem to translate pages. Is this an error on my side, or is something wrong with the servers? --Sabelöga (talk) 17:01, 23 April 2021 (UTC)

Indeed, something is broken. There is a Phabricator ticket to track this issue. Pamputt (talk) 18:30, 23 April 2021 (UTC)

Okay, thank you. --Sabelöga (talk) 22:01, 23 April 2021 (UTC)

Hello Pamputt, I tried to translate several pages from the Wiki directly, to test, taking inspiration from the T:xx translation markers (example: https://lingualibre.org/wiki/Translations:Help:Main/14/fr). An error occurs, always the same. I added a line in your task, notifying Tgr who may be interested. He may add the tag of the "OAuthAuthentication" project. Cordially. —Eihel (talk) 14:31, 25 April 2021 (UTC)

Translations are back. Thanks. Pamputt (talk) 18:54, 27 April 2021 (UTC)

I still can't seem to be able to translate :( @Pamputt & Eihel --Sabelöga (talk) 22:12, 28 April 2021 (UTC)

Sabelöga, can you describe precisely (or post a screenshot of) what happens when you try to translate the main page? Pamputt (talk) 08:29, 29 April 2021 (UTC)

Pamputt When I click translate it looks like this, and nothing else happens. https://imgur.com/a/fgY1sSl --Sabelöga (talk) 15:42, 29 April 2021 (UTC)

Sabelöga Indeed, it is the same behaviour as before. Could it be a problem of cache? Could you try to clear it (see Wikipedia:Bypass_your_cache to know how to bypass it if needed). Seb35 and VIGNERON, do you have any idea? Pamputt (talk) 17:26, 29 April 2021 (UTC)

Pamputt I've tried clearing the cache, logging in on different devices, editing on computer and mobile, and translating while logged out in incognito mode, and when I tried to manually create Translations:Help:Configure_your_microphone/1/sv this error appeared:

Internal error
[1738fa8dc0b56f3d0f41bed6] /index.php?title=Translations:Help:Configure_your_microphone/1/sv&action=submit Error from line 294 of /opt/mediawiki/1.35/extensions/OAuthAuthentication/auth/OAuthPrimaryAuthenticationProvider.php: Class 'MediaWiki\Extensions\OAuthAuthentication\AuthBlacklist' not found

Backtrace:

#0 /opt/mediawiki/1.35/includes/auth/AuthManager.php(2470): MediaWiki\Extensions\OAuthAuthentication\OAuthPrimaryAuthenticationProvider->providerRevokeAccessForUser()
#1 /opt/mediawiki/1.35/includes/auth/AuthManager.php(864): MediaWiki\Auth\AuthManager->callMethodOnProviders()
#2 /opt/mediawiki/1.35/includes/user/User.php(848): MediaWiki\Auth\AuthManager->revokeAccessForUser()
#3 /opt/mediawiki/1.35/extensions/Translate/src/SystemUsers/FuzzyBot.php(17): User::newSystemUser()
#4 /opt/mediawiki/1.35/extensions/Translate/TranslateHooks.php(1095): MediaWiki\Extensions\Translate\SystemUsers\FuzzyBot::getUser()
#5 /opt/mediawiki/1.35/includes/HookContainer/HookContainer.php(321): TranslateHooks::validateMessage()
#6 /opt/mediawiki/1.35/includes/HookContainer/HookContainer.php(132): MediaWiki\HookContainer\HookContainer->callLegacyHook()
#7 /opt/mediawiki/1.35/includes/HookContainer/HookRunner.php(1529): MediaWiki\HookContainer\HookContainer->run()
#8 /opt/mediawiki/1.35/includes/EditPage.php(1904): MediaWiki\HookContainer\HookRunner->onEditFilterMergedContent()
#9 /opt/mediawiki/1.35/includes/EditPage.php(2232): EditPage->runPostMergeFilters()
#10 /opt/mediawiki/1.35/includes/EditPage.php(1724): EditPage->internalAttemptSave()
#11 /opt/mediawiki/1.35/includes/EditPage.php(680): EditPage->attemptSave()
#12 /opt/mediawiki/1.35/includes/actions/EditAction.php(71): EditPage->edit()
#13 /opt/mediawiki/1.35/includes/actions/SubmitAction.php(38): EditAction->show()
#14 /opt/mediawiki/1.35/includes/MediaWiki.php(527): SubmitAction->show()
#15 /opt/mediawiki/1.35/includes/MediaWiki.php(313): MediaWiki->performAction()
#16 /opt/mediawiki/1.35/includes/MediaWiki.php(940): MediaWiki->performRequest()
#17 /opt/mediawiki/1.35/includes/MediaWiki.php(543): MediaWiki->main()
#18 /opt/mediawiki/1.35/index.php(53): MediaWiki->run()
#19 /opt/mediawiki/1.35/index.php(46): wfIndexMain()
#20 {main}

--Sabelöga (talk) 21:46, 29 April 2021 (UTC)

Hello Pamputt and Sabelöga, I admit that I didn't search deeply, but I don't understand why T280972 (Translating does not work anymore) was changed to resolved. I still cannot access the Translate pages. Also, the translation wiki pages (page/xxx/code_language) are accessible via Translate, so I am willing to believe that the problem is unrelated, but I am confused. Since a translation page on the wiki is created and read for translation by Translate, is there really no causal link? If these pages are blocked, can FuzzyBot update them? Clearing the caches does not solve anything. See also phab:T281289. Why install an old extension version that does not work on MW 1.35 and patch it, instead of installing the recommended one? Cordially. —Eihel (talk) 11:10, 30 April 2021 (UTC)

Resolved —Eihel (talk) 17:31, 30 April 2021 (UTC)

It works now, thanks! --Sabelöga (talk) 20:06, 30 April 2021 (UTC)

HIGH PRIORITY: Audio recordings have dust and clicks

Under investigation: Some users experience parasitic saturation (“Pock!”) or dust while others don't. This irregular occurrence recalls the earlier, unsolved “speed up bug”.

I've had friends record German and Romanian lists. They're using separate hardware, and have recorded thousands of words before, so I know their hardware is fine. The recordings they've done today suffer from loud clicks on half the recordings, so there seems to be a problem with the recording studio. I clearly have no idea what the problem is or how to fix it, but I hope someone else will!

Here are examples:

— LL-Q188_(deu)-Natschoba-der_Wunsch.wav
— LL-Q7913_(ron)-Andreea_Teodoraa-muscă.wav
— LL-Q150 (fra)-Hélène (Hsarrazin)-corné.wav

Julien Baley (User talk:Julien Baleytalk) 16:24, 24 April 2021 (UTC)

I have the same problem. DSwissK (talk) 17:49, 24 April 2021 (UTC)

Hmm, very annoying. I've opened a Phabricator ticket. I hope the issue will be fixed soon. Pamputt (talk) 18:38, 24 April 2021 (UTC)

HIGH priority. No idea who can fix it. Can someone refine the diagnosis? Can more people test with their configuration and report here? Yug (talk) 15:33, 25 April 2021 (UTC)

I notified Mr. Vion, the original coder of the JS recorder. He may have some insights. I suspect it's a bug in either:

RecordWizard (studio), the mw extension interfacing the user speaking and the audio processing layers. It got recent changes due to migration to mw 1.35.
LinguaRecorder JS, the core JS library processing audio signal. No changes in past week.

Recent changes may have affected how the audio cuts are done. Either mw extension or the JS could need a fix.

This is a core bug preventing LinguaLibre core mission. Any insight is welcome. Yug (talk) 15:43, 25 April 2021 (UTC)

So der Wunsch (Q522922) (deu:der_Wunsch), muscă (Q522753) (ron:muscă) and corné (Q523386) (fra:corné). —Eihel (talk) 17:26, 25 April 2021 (UTC)

@Eihel the 1st and 3rd ones sound good to me. Yug (talk) 20:38, 25 April 2021 (UTC)

@Yug the 1st and 3rd ones do not sound good to me, there's a clear click on the "der" and "cor". If you have populated the table below, perhaps your numbers are too optimistic (if we have a different judgement on these three). Julien Baley (talk) 12:56, 26 April 2021 (UTC)

@Julien Baley, DSwissK, & Eihel

I reviewed recent recordings of 4 users.

Two contributors have perfect audios (100% good on 8 audios checked for each user).
Two new users have the bug (30% of audios with saturation).

I first thought it could be new users not using their hardware properly: the microphone must not be overly sensitive, it must not be allowed to vibrate, etc. This is know-how we pass on during IRL workshops and that tech-friendly people pick up quickly. Self-taught users have not been warned about this.

But it does not explain why experienced users such as DSwissK and Julien's friend get such noise. So I'm confused.

DSwissK, did you try alternative microphone settings, with a lower volume? Could it be that you have recently been speaking louder, or made a change you did not notice? Yug (talk) 22:02, 25 April 2021 (UTC)

Hello Yug, I concede that the difference may be minimal on some records. You have to listen carefully, it's like "a diamond on a vinyl which jumps on a dust". Some files are more affected than others (depending on the vocal intonation), but all of the ones I have cited are problematic. To fully understand, you can try recording with Schtooka (former LiLi), then immediately redo the same recording on LiLi. As I said to Hélène, you can also compare with an existing recording corné (Q499309). Cordially. —Eihel (talk) 15:12, 26 April 2021 (UTC)

@Eihel & Julien Baley I'm officially deaf in one ear, so I'm not the best judge of audio. I pushed the review as far as I can, but could other users help review more audios so Mr. Vion can tackle this investigation with clean clues and ratios? Yug (talk) 16:15, 26 April 2021 (UTC)

@Yug I'm very happy to help review some recordings, if you want; could you suggest a list of users? (I don't know how to find users that have recently recorded). Julien Baley (talk) 17:41, 26 April 2021 (UTC)

@Julien Baley process added below. Thank you! Note: the users I reviewed (all those below) may have a higher noise ratio than I reported, since I don't have a musical ear. Yug (talk) 16:56, 26 April 2021 (UTC)

@Yug I've checked the entire table and added a few people (Hsarrazin has only 1 recent recording, so I've amended the "14" that was shown). Some people have 0% problems, some close to 100%... the problems are very characteristic. Julien Baley (talk) 19:25, 26 April 2021 (UTC)

@Pamputt & DSwissK & others, I really need help on this one. We need to review and report 10+ recordings for each user uploading audios to Commons, and likely send a custom message to each affected user, on their talk page and on their Commons talk page (ex msg, ex ping). Yug (talk) 16:36, 26 April 2021 (UTC)

@Yug not fully helpful but I added a section on LinguaLibre:Stats#The most prolific speakers for the current month, it may help to narrow down to who did recent recordings. Cheers, VIGNERON (talk) 07:20, 27 April 2021 (UTC)

/!\ The dust bug issue is confirmed as core and relatively widespread. I sent an email this morning to Wikimedia France (Adelaide, Remy, Michael) with suggested solutions: immediately, restoring a sitenotice ribbon to inform our users; in the short term, hiring Vion for analysis and possibly a fix. We should not be claiming to be back online and on our feet when we aren't. Yug (talk) 14:09, 27 April 2021 (UTC)

Good. The CSS fixes have been deployed. → Sitenotice is back. → Indentation is back. Yug (talk) 14:11, 27 April 2021 (UTC)

@WikiLucas00 & DSwissK hi,

Since you are the two active users having this issue, we need you most.

Could you record 15~30 more audios with another web browser, such as Firefox, and then report the results with this?

If you have any other hypothesis to test, I'm interested (changing microphones, etc.). Yug (talk) 18:23, 27 April 2021 (UTC)

I had the impression (and DSwissK confirmed on Discord) that using Firefox slightly reduces the amount of problems encountered. — WikiLucas (🖋️) 19:53, 27 April 2021 (UTC)

Yup, I installed Firefox and could finally send some more audios (me and my daughter), with internal microphone on my laptop. Please review. DSwissK (talk) 00:45, 28 April 2021 (UTC)

@Yug I checked with Andreea_Teodoraa and Natschoba which browsers they're using: Chrome and Safari. I asked Andreea_Teodoraa to try Firefox; she did 22 recordings (https://commons.wikimedia.org/wiki/Special:ListFiles?limit=20&user=Andreea+Teodoraa) and 20 are clearly perfect, while for 2 (însene and "pe scurt") I feel I hear a problem but cannot see anything in Audacity. Considering we were at 75% affected on Chrome, this seems to be a move in the right direction. Julien Baley (talk) 02:33, 30 April 2021 (UTC)

@Yug I tried with another friend (https://commons.wikimedia.org/w/index.php?title=Special:ListFiles&limit=100&user=LangPao) and everything sounds bug-free, on both Chrome and Firefox (the 10 most recent recordings are from Firefox). Julien Baley (talk) 13:11, 30 April 2021 (UTC)

(Answered below on 15:16, 4 May 2021 Yug (talk) 15:48, 4 May 2021 (UTC))

This may interest you: same smartphone, same internal microphone, same list (1 word). The only difference is Chrome vs. Firefox. DSwissK (talk) 19:20, 1 May 2021 (UTC)

@Julien Baley & DSwissK thanks to you both. Recent A/B tests where only one parameter is changed are what we are looking for. Testing the same users with different browsers seems fruitful. Thanks also to Julien for your Audacity inspections; our dev will eventually have to dig into that.

@DSwissK, from your 2 examples I mainly see a difference in volume (dB). It may be nothing, but when reviewing audios I also noticed that many seemed to have a low level. Could it be that Chrome changed its default audio recording levels, which increases the presence of noise? In that case other projects like Forvo (fake open license) and others should also be affected.

Anyway, if a recent Chrome version was corrupted, maybe we could recommend to use Firefox for a while. Yug (talk) 15:16, 4 May 2021 (UTC)
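To check the volume hypothesis on two recordings of the same word (one from Chrome, one from Firefox), one could compare their peak and RMS levels. Below is a minimal Python sketch using only the standard library, assuming the files are 16-bit PCM WAV (the bit depth of Lingua Libre uploads is not stated in this thread); the function names are hypothetical.

```python
import math
import struct
import wave

FULL_SCALE = 32768.0  # magnitude of the most negative 16-bit sample


def _to_db(x: float) -> float:
    """Convert a linear ratio of full scale to dBFS."""
    return 20 * math.log10(x) if x > 0 else float("-inf")


def levels_from_samples(samples: list[int]) -> tuple[float, float]:
    """Return (peak, RMS) in dBFS for signed 16-bit sample values."""
    if not samples:
        return float("-inf"), float("-inf")
    peak = max(abs(s) for s in samples) / FULL_SCALE
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / FULL_SCALE
    return _to_db(peak), _to_db(rms)


def levels_dbfs(path: str) -> tuple[float, float]:
    """Read a 16-bit PCM WAV file and return its (peak, RMS) in dBFS."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    # Little-endian signed 16-bit; interleaved channels are pooled together.
    samples = list(struct.unpack(f"<{len(frames) // 2}h", frames))
    return levels_from_samples(samples)
```

Running `levels_dbfs()` on a local copy of each file and comparing the two pairs of numbers would show whether the Chrome recording really is quieter. It says nothing about clicks, which are a separate symptom.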

@Yug there is indeed a difference in volume but the problem is not the noise but the clicks. There is more noise in the Firefox version, but it isn't disturbing. At least, not as much as these clicks... DSwissK (talk) 18:29, 4 May 2021 (UTC)
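Since clicks and background noise are different symptoms, a crude way to locate clicks is to look for sample-to-sample jumps far larger than the file's typical jump. This is a heuristic of our own, not what LinguaRecorder or Audacity does; the parameters `ratio`, `min_jump` and `merge_within` are illustrative guesses that would need tuning against files already known to be affected. The hypothetical function takes a list of signed 16-bit samples; reading them from a WAV file is left out.

```python
def find_clicks(samples: list[int], ratio: float = 20.0,
                min_jump: int = 2000, merge_within: int = 10) -> list[int]:
    """Return sample indices where a click-like jump starts.

    A jump counts as a click when it exceeds both `ratio` times the mean
    absolute sample-to-sample difference and the absolute floor `min_jump`.
    """
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    if not diffs:
        return []
    typical = sum(diffs) / len(diffs)
    threshold = max(ratio * typical, min_jump)
    clicks: list[int] = []
    for i, d in enumerate(diffs):
        idx = i + 1  # the jump lands on sample i + 1
        # Skip flags within merge_within samples of the last reported click,
        # so a single spike (jump up, then back down) is reported once.
        if d > threshold and (not clicks or idx - clicks[-1] > merge_within):
            clicks.append(idx)
    return clicks
```

Counting the returned indices per file would give a rough, repeatable number to compare Chrome and Firefox recordings, as a complement to listening.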

Is there any chance it is related to the versions of Firefox or Chrome? I guess people upgraded their browser versions in recent months; if I understand correctly there were a few issues before the OVH fire, and perhaps more people upgraded since. (Personally I hardly hear the issue except when there is a loud click; I don't have an ear as developed as others here.) Seb35 (talk) 21:05, 4 May 2021 (UTC)

I reinstalled the LinguaRecorder demo on https://lingualibre.org/demo/sandbox.html with the settings identical to the RecordWizard extension (on the gear on the 'Studio' (4th) step and here in the PHP+JS code). You can play with the settings, perhaps there is something to move around the saturation? (You have to click on "Apply new options" then "start" when you change one, and the "ready" counter should be incremented.) Seb35 (talk) 20:54, 4 May 2021 (UTC)

Limiting the number of words to record

@Yug, DSwissK, VIGNERON, Seb35, Pamputt, & Titodutta I think that one important cause of the bugs is related to the RAM. Thus, loading a long list into the Record Wizard results in a maximum amount of bugs in the recordings (the length of this list -- its weight -- may vary, depending on the user's hardware and software).

I think we should try limiting (to 100 or 200 maximum) the possible number of words to be put into the Record Wizard, at least temporarily. There is no point in loading into the RW lists that are 1000-words long; taking a little break during the recording is never wrong, and it could help reducing the amount of bugs for the moment, while we try to find the source of the issue.
Best — WikiLucas (🖋️) 19:53, 27 April 2021 (UTC)
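This proposal amounts to splitting a long list into chunks before loading it into the Record Wizard. The RW does not currently load parts of lists (T276014 asks for this), so the Python sketch below only shows the splitting step, done offline on a plain word list. The batch size of 200 is the upper bound suggested above and is an assumption, not a measured limit; the function name is hypothetical.

```python
def split_into_batches(words: list[str], batch_size: int = 200) -> list[list[str]]:
    """Split a word list into consecutive batches of at most batch_size words."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    # Slicing past the end is safe in Python, so the last batch may be shorter.
    return [words[i:i + batch_size] for i in range(0, len(words), batch_size)]
```

A 1000-word list becomes five batches of 200 that could be saved as separate lists and recorded one after another. Whether this actually reduces the bugs is exactly the hypothesis that still needs testing.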

We have to test this hypothesis. Yug (talk) 21:35, 27 April 2021 (UTC)

Tested and reporting : I used very small lists (less than 10 words) and still have the same issue. I encounter that bug on my smartphone, both my computers (desktop and laptop) under Chrome (latest version). Using internal or external microphone doesn't change anything. DSwissK (talk) 00:42, 28 April 2021 (UTC)

@DSwissK thank you. This is helpful. Seems clearly software issue. I contacted Wikimedia France and Vion requesting them to jump in.

We need people with audio software skills to inspect those audios and people with JS+audio skills to review the audio input chains. Mr. Vion has both skills. Yug (talk) 10:52, 28 April 2021 (UTC)

I do not think it's RAM related.

Even with 1000 words we are dealing with 1000 words x 7KB per file = 7 MB.

Even if the browser stored the words in a very, very detail-rich way, making the files 1000 times heavier, we would still be at 7 GB.

Most computers have 8~16GB of RAM by now.

I also recorded a small list and apparently had the issue.

Most (all?) affected users had recorded only a few dozen to a few hundred words. Worst affected users: Natschoba → 149, Andreea Teodoraa → 247, WikiLucas00 → 64.

All but 3 users this month have recorded less than 300 words. Yug (talk) 11:02, 28 April 2021 (UTC)

Folks, I inspected our GitHub code:

RecordWizard MediaWiki extension (php) – some recent non-audio-stream changes.
LinguaRecorderJS – no changes this past year.

I can't find a clear recent change which could have affected our audios recording stream.

@VIGNERON & Seb35 are you aware of any (environmental) change that could have affected the audio stream of RecordWizard recently? Yug (talk) 07:57, 29 April 2021 (UTC)

I am still in the process of properly publishing code from the server to Github and Gerrit for the various extensions, but there is indeed no change related to audio.

Specifically the LinguaRecorderJS is very exactly what was installed in 1.31 and in 1.35, no change here (on the server there is only a micro-instruction to register the LinguaRecorderJS in MediaWiki environment)

For the RecordWizard, main changes are maintenance, a technical thing about serialization of Wikibase items, and related to interface (vue.js, which changed from 2.6.11 to 2.6.12, which is mainly a security release).

Seb35 (talk) 19:46, 4 May 2021 (UTC)

@VIGNERON, Seb35, Pamputt, Yug, & Poslovitch
Update: Another user (Le Commissaire) reported an audio bug (on the WMFr Discord server). This was not the "click"/"pop" bug but the speed-up bug, and the user said that it occurred when loading a list of 1000 words into the RW. I suggested he try loading a shorter list; he tried with 250 words and it worked fine, with no issue. This is another clue that RAM matters and that long lists are a problem in the RW for several users.
In addition to a potential limitation of the RW to 350 words (for example), see this related ticket:

T276014, Feature request to be able to load parts of lists in RW (only possible for Categories at the moment)

Best — WikiLucas (🖋️) 15:09, 6 May 2021 (UTC)

Worth investigating. I assumed 7 kB per word, but the audio stream could be completely different from my assumption. The natural path would be to call back Mr. Vion or User:0x010C to investigate (neither is currently active), or to dive into LinguaRecorderJS, the browser's memory, and RAM. Maybe more. Yug (talk) 18:41, 6 May 2021 (UTC)
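The 7 GB argument rests on the assumption of about 7 kB per word. A browser recorder, however, typically works on raw samples, so the Python sketch below redoes the arithmetic for raw audio under explicit assumptions: about 2 seconds of audio kept per word, a 48 kHz sample rate, mono, and 4-byte Float32 samples (the Web Audio API's sample format). Whether LinguaRecorderJS actually keeps every word's raw buffer in memory is not established in this thread, so treat the result as an order of magnitude only; the function name is hypothetical.

```python
def estimate_bytes(words: int, seconds_per_word: float = 2.0,
                   sample_rate: int = 48_000, bytes_per_sample: int = 4,
                   channels: int = 1) -> int:
    """Rough size in bytes of raw audio buffers kept for a whole session."""
    return int(words * seconds_per_word * sample_rate * bytes_per_sample * channels)
```

With these defaults, `estimate_bytes(1000)` gives 384,000,000 bytes (about 384 MB): far more than 7 MB but far less than 7 GB. Using 16-bit samples or shorter takes would lower the figure proportionally; the point is only that the per-word size assumption drives the conclusion.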

Review process


To review recordings by another user :

Go to Special:RecentChanges > Find recent recordings > Pick a user who is not already in the table below
Open 10~20 of this user's recent recordings > Listen to each > Count how many have unusual audio artifacts
Add this user to the table below with their results and your comment
If you feel it is necessary, please notify the user on Lili (ex msg) and ping the user on Commons (ex ping)
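For the second step, a reviewer can fetch a user's most recent Commons uploads through the MediaWiki Action API instead of browsing by hand. Below is a minimal Python sketch using `list=allimages`; note that `aiuser` only matches files whose latest version was uploaded by that user and must be combined with `aisort=timestamp`. The helper names and the User-Agent string are hypothetical, and the fetch function needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def recent_uploads_url(user: str, limit: int = 20) -> str:
    """Build an API URL listing the user's most recent uploads."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisort": "timestamp",   # required for the aiuser filter
        "aidir": "descending",   # newest first
        "aiuser": user,
        "ailimit": limit,
        "aiprop": "timestamp|url",
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def recent_upload_titles(user: str, limit: int = 20) -> list[str]:
    """Fetch the URL above and return the File: titles (needs network)."""
    request = Request(recent_uploads_url(user, limit),
                      headers={"User-Agent": "lili-review-sketch/0.1 (hypothetical)"})
    with urlopen(request) as response:
        data = json.load(response)
    return [image["title"] for image in data["query"]["allimages"]]
```

For example, `recent_upload_titles("Andreea Teodoraa")` would return up to 20 file titles, newest first, ready to be opened and listened to.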

To be reviewed :

With your usual web browser, go to Record Wizard (studio) > Step 3, enter your web browser name then 15 words in your language > Record, publish.
Come on LinguaLibre:Chat room#Reviews-ready > Post a message with your web browser, its version [optional], and your OS.

To be reviewed, recording with another browser or device :

With an alternative web browser or device, go to Record Wizard (studio) > Step 3, enter your web browser name then 15 words in your language > Record, publish.
Come on LinguaLibre:Chat room#Reviews-ready > Post a message with your web browser, its version [optional], and your OS.
Add some information so we know which of your recording are associated with this alternative browser or device.

Review-ready

I recorded 10+ audios with Chrome 89.0.4389.114 (Official Build) (64-bit) : ~~all good for me, no review needed~~. Yug (talk) 14:35, 27 April 2021 (UTC)

@Yug Could you try 20 more with an up-to-date version of Chrome? — WikiLucas (🖋️) 18:38, 27 April 2021 (UTC)

@WikiLucas00 Done. I'm not sure, but I may have the bug as well. Yug (talk) 19:42, 27 April 2021 (UTC)

@Yug The majority of your last recordings contain at least a click. — WikiLucas (🖋️) 19:56, 27 April 2021 (UTC)

Samples

Under investigation: Some contributors experience parasitic saturation (“Pock!”) or dust while others don't.

Please review your recent recordings and help expand table below so we can identify a recurring pattern among affected contributors vs non-affected ones.

Username	# reviewed	% affected	Example file	Web browser + version	Comment
User:DSwissK	15	33% (5)			New echo bug?
User:Natschoba	20	95% (19)			Several thousand recordings before. No hardware change.
User:Andreea Teodoraa	11	75% (8)			Several thousand recordings before. Tried different mics and platforms, same behaviour.
User:GeoMechain	15	0% (0)
User:ClasseNoes	15	0% (0)
User:Hsarrazin	14	30% (4)
User:ᱥᱟᱹᱜᱩᱱ ᱗	2	100% (2)			Only 2 audios.
User:Zoyahssn	2	100% (2)	File:LL-Q1860 (eng)-Md Anan Islam (Zoyahssn)-Md Anan Islam.wav		Suspects: hardware & sound setting issue
User:Olaf	15	0% (0)	—		All recent recordings OK. (I have these clicks in every recording session, but I remove all such occurrences during the review phase. Only because of this is it 0%. Olaf (talk) 23:44, 1 May 2021 (UTC))
User:WikiLucas00	60	75% (45)		Brave 1.23.73 (Chromium: 90.0.4430.85)	See my 2021-04-26 10pm CEST series
User:WikiLucas00	300	0% (0)	All files are OK	Firefox 88.0.1, external microphone	Perfectly fine. See my 2021-05-06 9am CEST series
User:Le Commissaire	??	?% (?)		Opera, desktop computer, external microphone	Speed-up bug occurred when loading a 1000-word list into the RW. Loading only 250 words and recording again went fine.
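The "% affected" column is the share of reviewed recordings that had artifacts, followed by the raw count. A tiny Python helper, offered as a hypothetical convenience for reviewers adding new rows, keeps the rounding consistent. It uses Python's `round()`, which sends exact .5 ties to the even number.

```python
def percent_affected(affected: int, reviewed: int) -> str:
    """Format a '% affected' cell in the table's style, e.g. '33% (5)'."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    if not 0 <= affected <= reviewed:
        raise ValueError("affected must be between 0 and reviewed")
    return f"{round(100 * affected / reviewed)}% ({affected})"
```

For example, 5 of 15 gives "33% (5)" and 19 of 20 gives "95% (19)"; 8 of 11 gives "73% (8)", slightly different from the 75% currently shown in that row.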

Publish on Wikimedia Commons

Hello, I just tested, but my recordings are not published on Commons. My tests: on Firefox, then on Chrome, with 50 expressions, then with 1, with the licenses CC3.0-BY-SA and CC1.0. —Eihel (talk) 06:51, 2 May 2021 (UTC)

Publishing problem on Wikimedia Commons

phab:T281636 —Eihel (talk) 07:10, 2 May 2021 (UTC)

Usually I have the same problem with the first two recordings in a session. Then I can upload them again at the end. Try again with more recordings, and use the "retry failed upload" button. Poemat (talk) 08:07, 2 May 2021 (UTC)

Yup, I had this bug many times. (I say "had" because I don't remember having encountered it after the fire incident.) Just don't give up and it should be published eventually. DSwissK (talk) 11:56, 2 May 2021 (UTC)

(As of 3 May 2021, as far as I checked, I'm not aware of any code changes (history) that may have affected this. Seb35 made some other code change that same day.) Yug (talk) 09:47, 3 May 2021 (UTC)

I add a user who has the same problem: Le Commissaire. —Eihel-LiLi (talk) 15:33, 6 May 2021 (UTC)

Hello @Seb35, we should check with Le Commissaire whether the problem also persists (before closing the Phab ticket). Best regards. —Eihel (talk) 10:01, 4 June 2021 (UTC)

I left a message for Le Commissaire on his talk page.

The problem you had was specific to your account; it may have happened to other people, but it seems quite rare. Also, once a user has managed to upload to Commons, the problem is a different one from yours (this one, which looks similar but where the error is intermittent). More generally, the error message should be explicit instead of having to be dug out of the browser console; I will open a Phabricator ticket for that. Seb35 (talk) 10:28, 4 June 2021 (UTC)

Translation admins

I updated this ticket, explaining our need for translation admins. I'm especially thinking of Sabelöga and Eihel, who have the skills and the need for these rights (e.g. here).
If the community agrees, we can ask the developer team currently working on the project to implement this new status in Lingua Libre, and we will then be able to elect new translation admins on LiLi. You can vote by using {{Support}} or {{Oppose}}.
All the best, — WikiLucas (🖋️) 12:21, 4 May 2021 (UTC)

Hello WikiLucas, especially since the tvar translation variables have just changed. —Eihel-LiLi (talk) 16:32, 5 May 2021 (UTC)

UPDATE: Translation admins should now "exist" on Lingua Libre. See [T262855] Implement new user rights. --Poslovitch (talk) 19:35, 3 July 2021 (UTC)

Vote

Support (proposer) — WikiLucas (🖋️)
Support We are at an early stage for the community; having 3 active referents for any given administrative task is required (see also en:Bus factor). It is also necessary to document processes as they appear, in a concise and therefore maintainable way. Yug (talk) 15:09, 4 May 2021 (UTC)
In this project, the rights associated with translation administrators (example: pagetranslation) are already included in the administrator rights. In addition, an administrator can self-grant the right without going through a formal request (on any WM wiki). I therefore think that we are far from having an indispensable (wo)man (especially after Strasbourg IMHO). Also, if I want to continue on this project and following the previous section… —Eihel-LiLi (talk) 16:29, 5 May 2021 (UTC)
@Eihel-LiLi "Active" [and skilled] is an important word. I'm an admin but not active on translation pages. We have had about 4 truly active admins over the past 6 months; AFAIK only WikiLucas was an admin while truly active [and skilled] on pagetranslation. Adding 2+ more is required. It seems to be on the way. Yug (talk) 09:59, 6 May 2021 (UTC)
And Pamputt too (already TA on WD for example). Cordially. —Eihel-LiLi (talk) 15:14, 6 May 2021 (UTC)
Support Agree to ask for this new status. Pamputt (talk) 15:46, 4 May 2021 (UTC)
Support Agreed. DSwissK (talk) 18:31, 4 May 2021 (UTC)
Weak support —Eihel-LiLi (talk) 15:49, 6 May 2021 (UTC)
Support I trust them. Lyokoï (talk) 17:57, 10 May 2021 (UTC)
Support I'm up for it! --Sabelöga (talk) 18:53, 19 May 2021 (UTC)

Discussion

I'd rather see Titodutta. —Eihel-LiLi (talk) 01:20, 6 May 2021 (UTC)

@Eihel-LiLi Titodutta is already an admin on LiLi, which means he has the pagetranslation right. Implementing this translation admin status would allow us to grant some users the pagetranslation right without granting them all admin rights (like the right to delete pages or block users for instance). — WikiLucas (🖋️) 07:31, 6 May 2021 (UTC)

Ah OK. I took the most prolific users, but I remembered that you and Pamputt are TAs… —Eihel-LiLi (talk) 15:04, 6 May 2021 (UTC)

Browsing the sound library

Nicolas NALLET is currently working on the page that will display the recordings of Lingua Libre, and would like to know the list of filters that we would like to use on this page (e.g. by language, by speaker, by date...)

Feel free to suggest other filters or give your opinion on suggested filters 🙂 — WikiLucas (🖋️) 12:58, 20 May 2021 (UTC)
(pinging @Yug, Pamputt, & Titodutta — WikiLucas (🖋️) 15:48, 20 May 2021 (UTC))

Great news!

The most obvious ones are, I guess, the following:

by language
by speaker
by speaker's language proficiency (beginner, etc.)
by gender (male, female, etc.)

--Poslovitch (talk) 13:38, 20 May 2021 (UTC)

Hello WikiLucas00 and Poslovitch
- by cat (deepcat, incategory)
- by coord (nearcoord, boost-nearcoord)
- by link (linksto)

The codes in parentheses are those of CirrusSearch, an extension that can be added to LiLi. Poslovitch's proposals also have filters contained in WikibaseCirrusSearch (haswbstatement). Tell me what you think of this. Cordially. —Eihel (talk) 20:36, 20 May 2021 (UTC)
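These filters map onto CirrusSearch search keywords, which can be combined in a single query string. The Python sketch below only assembles such a query and an Action API `list=search` URL; it assumes CirrusSearch (and WikibaseCirrusSearch for `haswbstatement`) were installed on Lingua Libre, which the proposal implies is not yet the case. The category name, the property ID `P4` and the endpoint URL used in the examples are placeholders, not real Lingua Libre identifiers, and `nearcoord` is left out of this sketch.

```python
from urllib.parse import urlencode

# Keywords from the proposal; haswbstatement additionally needs
# WikibaseCirrusSearch.
SUPPORTED = {"incategory", "deepcat", "linksto", "insource", "haswbstatement"}


def build_search(filters: dict[str, str]) -> str:
    """Join keyword filters into one CirrusSearch query string."""
    parts = []
    for keyword, value in filters.items():
        if keyword not in SUPPORTED:
            raise ValueError(f"unsupported keyword: {keyword}")
        # Quote values containing spaces so they stay a single argument.
        if " " in value:
            value = f'"{value}"'
        parts.append(f"{keyword}:{value}")
    return " ".join(parts)


def search_url(api_endpoint: str, filters: dict[str, str], limit: int = 50) -> str:
    """Action API URL that runs the combined query with list=search."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": build_search(filters),
        "srlimit": limit,
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"
```

For example, `build_search({"incategory": "Lists in French"})` returns `incategory:"Lists in French"`, and adding more keys narrows the search further, since CirrusSearch combines space-separated keywords.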

@Eihel could you describe a bit how do you imagine this would work? (since the recordings on Lingua Libre don't have cat or coord at all, and could have link but I couldn't find any examples, I'm a bit confused and would like to know more). Same question for CirrusSearch, we could look into it to see if it can be installed, but what use do you see for it? (the only use I know is for WikibaseCirrusSearch). Cheers, VIGNERON (talk) 14:42, 26 May 2021 (UTC)

Please put the code on GitHub. You could look at Forvo and CodePen for elegant HTML5 audio elements and CSS. Yug (talk) 22:00, 26 May 2021 (UTC)

Hello @VIGNERON, the WikibaseCirrusSearch extension requires the CirrusSearch extension, so installing it does not change much. It is true that my proposals are not very orthodox, but this project will evolve over time. To begin with, this page contains a category (not all LiLi TPs have one; this should be corrected). Since you asked for an example, here is one: the TPs where we both participated, found with insource. Best regards. —Eihel (talk) 09:54, 4 June 2021 (UTC)

For example, the lists, which are the proper way to make a large number of recordings, were already numerous before Strasbourg. Now only one language letter appears (a). You can search your own history for your lists if you know how they were recorded. But if, for example, I want to find the French lists with a search, "List:Fra" is not enough, because it returns only some of them. In the future, categories should be created for lists: by user, by language, by set (from the same recording session) and by subject (fruit, animals, etc.). Otherwise, at some point it will quickly become unmanageable. Cordially. —Eihel (talk) 14:04, 4 June 2021 (UTC)

Plans for the next armageddon?

Are there any contingency plans implemented after the Big Fire? A regular backup for example? Poemat (talk) 22:49, 24 May 2021 (UTC)

@Poemat good question, thanks for asking. There are obviously some plans. I'll let @Seb35, Nicolas NALLET, & Michael Barbereau WMFr complete and/or correct me, but right now there are daily backups on a server in another datacenter. Cheers, VIGNERON (talk) 12:47, 26 May 2021 (UTC)

Request for Mon language (code: mnw)

Done
Mon language is not available, so I added Thai instead. I would like this problem to be resolved, thanks. message posted by User:咽頭べさ (talk)

Hello again @咽頭べさ thank you for pointing out that Mon language was missing on Lingua Libre! I added it, you should from now on be able to record words in this language 🙂 Please read the message I posted on your talk page before recording new words.

All the best, — WikiLucas (🖋️) 16:40, 27 May 2021 (UTC)

Celebrating the coming 500k milestone

Hello @DenisdeShawi, DSwissK, Eihel-LiLi, Julien Baley, KlaudiuMihaila, Lepticed7, Lyokoï, Olaf, Pamputt, Poemat, Poslovitch, Sabelöga, Theklan, Titodutta, Yug, & सुबोध कुलकर्णी

As you may have seen, we recorded 30,000 pronunciations during the current month (the 2nd most active month ever), the very first full calendar month since the rebirth of the website after the datacenter fire that stalled the project for 6 weeks. If we keep a similar pace, we should reach the important milestone of 500,000 recordings made on Lingua Libre in June. That is incredible.

I wanted to ask you all, how do you want to celebrate this milestone? Feel free to suggest anything below, and let's try to celebrate it properly 🙂

All the best
— WikiLucas (🖋️) 14:33, 27 May 2021 (UTC)

Hi there, I remember recording numbers up to 1399 in French (c:File:LL-Q150 (fra)-Poslovitch-1399.wav). I commit to taking that number up to 4242 once we reach that milestone! --Poslovitch (talk) 18:18, 27 May 2021 (UTC)

Some kind of reward would be nice, like a star for the home-pedia user page. Or a physical sticker sent by post, similar to what Wikimedia does from time to time. Or an online event of sorts. KlaudiuMihaila (talk) 16:45, 29 May 2021 (UTC)

Let's get together for an apéro. Lepticed7 (talk) 16:54, 29 May 2021 (UTC)

Maybe an online event is the simplest thing to do, actually. What do you think about a live stream on Twitch with some guests, talking about Lingua Libre, its history, how people made some very big recording sessions, how it helps describe languages, etc.? Lyokoï (talk) 10:22, 1 June 2021 (UTC)

It's possible to get some budget for celebrating :) Xenophôn (talk) 08:54, 8 June 2021 (UTC)

Failed to upload on Wiki Commons

Hi, I am an editor from the Central Bikol Wiktionary. I tried to record words and the recording went through, but it failed to upload to Commons. I think this is the second time it has happened, and only since Lingua Libre came back. My internet connection is stable, so I guess there might be some internal problem. I hope not. Kunokuno (talk) 14:58, 28 May 2021 (UTC)

Hello @Kunokuno I'm truly sorry that this problem occurred, thanks for warning us about it.

Could you please tell us your current setup (device, browser, microphone)? How many words did you record? Could you try to reproduce the bug with 10 words, and then look at your browser's console (instructions here) to tell us the error message if there is one?

Thank you in advance.

All the best. — WikiLucas (🖋️) 16:21, 28 May 2021 (UTC)

Hello @Kunokuno ,

Did you try again, and do you still have the same problem? (There have been some fixes recently, so it shouldn't happen anymore, but I want to make sure everything is working now.)

Cheers, VIGNERON (talk) 08:41, 4 June 2021 (UTC)

Lingua libre error

Hello everyone, sorry for the late response. My recordings are still not getting through to Commons. The recording is successful, but it cannot be uploaded to Commons. My device is an Intel Core i5 laptop, my browser is Google Chrome, and I'm using a headset with a built-in microphone. I have also tried recording on my phone, but I get the same error. I took a screenshot of the error message, if there is one. Please check here. Sorry, I am not very knowledgeable about code and programming languages. Kunokuno (talk) 13:53, 18 June 2021 (UTC)

500000!

Lili reached 500 000 recordings. Congratulations to everybody! Olaf (talk) 12:56, 15 June 2021 (UTC)

Congratulations to all speakers! It's unbelievable! \o/ Lyokoï (talk) 23:30, 15 June 2021 (UTC)

Indeed, congratulations to all of you. On to the million o_O. Pamputt (talk) 16:57, 16 June 2021 (UTC)

Lingua Libre video tutorial

Hi everyone! I made a short video tutorial for Lingua Libre, in French. If you like it, I could create one in English and we could include it in the {{Welcome}} template, to help newcomers.
Here is the video; please tell me what you think! It is also available here on YouTube.

Lingua Libre tutorial in French

All the best — WikiLucas (🖋️) 10:04, 23 June 2021 (UTC)

I really like it. It is not too long, very clear, etc. So I think it would be a good idea to create one in English. A few remarks:

if you create a video in English, would it be possible to record it with the interface in English and add the text as subtitles (Wikimedia Commons supports subtitles)? That way it would be easy to translate the subtitles into several languages (the interface itself would still be in English, though).
on Wikimedia Commons, I think you should state which music is used in the video and where it comes from, to make sure it is freely licensed

Very nice job. Pamputt (talk) 18:56, 23 June 2021 (UTC)

Thank you @Pamputt! I think I will indeed make a video with the interface in English and no built-in subtitles, as you suggested; we will then be able to add TimedText subtitles on Commons. I think I'll also make a version with built-in subtitles (basically the same video as this one, but with everything in English), in order to have a cleaner English version to post and share on YouTube.

EDIT: I added English subtitles on the French video, to test the functionality, it seems to work well!

Thank you for your remark about the music, I added the information on the file's description.

See you! 🙂 — WikiLucas (🖋️) 10:08, 24 June 2021 (UTC)

Auto-inserting recorded words to Wiktionary

Hi, I am back after a long hiatus! :) I wanted to ask about auto-inserting recorded words to Wiktionary. Is it possible to automatically insert recorded files into the respective Wiktionary entries if I had imported those words from a specific Wiktionary category? For instance, I did a test batch today from "ଶ୍ରେଣୀ:ବାଲେଶ୍ୱରୀ_ଶବ୍ଦ" from the Odia Wiktionary. The uploaded words do appear on Commons but I need to manually add each recording. Is there a way to automate that?

My second question is something I asked a long time ago: is there a way to change (or choose between two options for) the filename? For instance, I would like to use the Commons convention of the "TWO_LETTER_ISO_CODE-WORDNAME.EXTENSION" format (e.g. "or-କଳା.wav"). If a file already exists, the new file could be "or-କଳା-01.wav". That way, browsing the words in the Commons category would be easier, since "or-କଳା.wav" and "or-କଳା-01.wav" would appear close to each other. One could even check which of the recordings is better to use on Wikimedia projects. In the backend, you could of course connect the files to your Wikibase by giving each recording a unique ID.

Hugs of solidarity for your grave loss because of the fire! With everything going on with COVID last and this year, this was horrible! <3 --Subhashish (talk) 14:45, 24 June 2021 (UTC)

Hi @Psubhashish I'm Lingua Libre Bot's operator. It cannot operate on Wiktionaries on which it has not received the bot flag. Feel free to file a request on LinguaLibre:Bot. I'm falling behind with the various currently pending requests since I've been the handyman of Lingua Libre on and off, but at some point I'll be able to tackle these ;) --Poslovitch (talk) 15:23, 24 June 2021 (UTC)

Hello all! @Poslovitch: done. Please let me know if there is anything else I can do.

File name

Hi Psubhashish, regarding your second question about the filename: it has been decided to have only one recording per word and per speaker. This means that if you record the same word again, the previous recording will be replaced by the new one, which makes it possible to correct a bad/wrong pronunciation. Why would you like to record the same word twice? Pamputt (talk) 17:50, 24 June 2021 (UTC)

@Psubhashish & Pamputt I think that Psubhashish is referring to the historical naming convention (NB: there is no actual naming convention on Commons, merely some advice for naming files) for pronunciation files on Commons (see here), which had been unchanged since 2005 and was clearly insufficient. This page suggested putting just a 2-letter language code and the pronounced word in the filename, which was problematic as soon as another speaker pronounced the same word (that's why it suggested adding a number in that case). I recently changed this page to advise users to include in the filename at least the language spoken (ISO 639-3 if possible), the word written in the language's writing system, an identifier for the user, and a place related to the speaker (the place where they learned the language and/or where they live). Lingua Libre's automatic naming already does that, except for the place of learning/residence (which for the moment is only available on the speaker's item on Lingua Libre). @Psubhashish I don't understand why you would want to change your filenames to more reductive ones. The more precise the filename, the more we know about the speaker! And it is still very easy to search for a specific word: just type the word + .wav in Commons, or the word itself directly in Lingua Libre's search bar.

All the best — WikiLucas (🖋️) 18:48, 24 June 2021 (UTC)

@Pamputt and @WikiLucas00 I didn't actually mean to create ambiguous filenames based on the older convention. I was worried about the multiple kinds of naming inside the category c:commons:Odia pronunciation, where files are organized as or-NAME.extension (e.g. File:Or-ଅନ୍ୟ.wav). What I am proposing is slightly different from how you want to capture the information in the file. I am all for metadata being captured inside the description. In fact, I'd support adding a field for the ISO 639-2/639-3 three-letter codes (e.g. Ori-nor-ଏଇଚି.wav). There is currently no link to the Lingua Libre QID and I'd propose adding that too.

What I was proposing was not to reduce the information collected but to simplify the filename. We're currently struggling to use a bot to find a file on Commons and insert it into a Wiktionary entry. I'd love to hear from you all what the issue would be if the file description template ({{Lingua Libre record}}) contained information such as the language name, the language's ISO code (including variation), the language's Glottocode (which linguists prefer because ISO is faulty; see the requirements of language archives such as Living Tongues, ELR and Language Archive Cologne), and information about the speaker's age range, gender and region (as dialects vary from region to region; an optional field, since this is personal data).

The filename, however, can be simpler: using a bot to search for duplicates is currently hard for the community because the QID and username are included in the filename. What if all that information, as I explained above, were included in the template, and the filename used the ISO 639-1 code (for standard spoken forms or macrolanguages) or the ISO 639-2 or 639-3 code (for dialects/variations)? As I explained in my previous comment, nor-NAME.wav and nor-NAME-01.wav would appear close to each other because of alphabetical sorting. An average user without knowledge of bots could even manually compare the quality of recordings when using files on different Wikimedia projects. Could this at least be piloted for one language? --Subhashish (talk) 02:12, 25 June 2021 (UTC)

I have created a sub-section just to make the discussion clearer. I am completely lost. Currently, the files created on Lingua Libre are all named like File:LL-Q33810_(ori)-Psubhashish-ଫସ୍କା.wav, i.e. File:LL-QID (LANGUAGE_CODE)-LOCUTOR_NAME-WORD.wav, where QID is the identifier of the language on Wikidata ~~identifier of the recording on Lingua Libre~~, LANGUAGE_CODE is either two letters or, if there is no two-letter code for the language, three letters (ISO 639-3), LOCUTOR_NAME is the username of the person who recorded on Lingua Libre, and WORD is the word that was recorded. So could you give us an example of a file whose name is not suitable from your point of view? I think it will help me get your point. Pamputt (talk) 06:13, 25 June 2021 (UTC)

@Pamputt Watch out. The QID is not the recording's, it's the language's Wikidata QID ;) --Poslovitch (talk) 08:13, 25 June 2021 (UTC)

Indeed :) Thank you, I have corrected my previous message. Pamputt (talk) 08:34, 25 June 2021 (UTC)
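To make the pattern concrete, here is a minimal sketch that composes a title from its four parts. The function name is hypothetical, and this is not Lingua Libre's own code. It only reproduces the `LL-QID (LANGUAGE_CODE)-LOCUTOR_NAME-WORD.wav` layout described above, with the QID being the language's Wikidata item as Poslovitch pointed out.

```python
import re

# Wikidata item IDs look like Q150 (French) or Q33810 (Odia).
QID_RE = re.compile(r"Q[1-9]\d*")


def lingua_libre_filename(language_qid: str, language_code: str,
                          speaker: str, word: str, ext: str = "wav") -> str:
    """Compose a Commons title following the LL-QID (CODE)-SPEAKER-WORD pattern.

    Hypothetical helper: language_qid is the Wikidata item of the language,
    not an identifier of the recording itself.
    """
    if not QID_RE.fullmatch(language_qid):
        raise ValueError(f"not a Wikidata QID: {language_qid!r}")
    # Commons treats spaces and underscores in titles as equivalent; spaces are used here.
    return f"LL-{language_qid} ({language_code})-{speaker}-{word}.{ext}"


if __name__ == "__main__":
    print(lingua_libre_filename("Q150", "fra", "Poslovitch", "1399"))
```

With those arguments the function returns `LL-Q150 (fra)-Poslovitch-1399.wav`, the same title as the French number recording mentioned earlier in this thread.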

Hello all, what I meant to say is that I understand you have a convention for LL. But I personally do not want my username or the language's QID in the filename, nor too many signs or even blank spaces. All of that is a problem when it comes to a few thousand recordings by multiple authors, where the same word recorded by different people does not even appear close together in a sorted list. As I wrote earlier, metadata can be better captured in a more structured way inside Commons, and you're capturing it even better inside LL's Wikibase. The question is whether the filename should contain all the metadata or only the most essential metadata. The username is irrelevant in a filename. If I take a picture of the Eiffel Tower or the Taj Mahal, my username appearing in the filename can only indicate copyright-owner pride. :D A QID is a Wikimedian's paradise; it makes no sense to a common user. Entries on Commons are not just for use by Wikimedians but for the larger public. An ISO code (or a Glottolog ID) does this job (though one can argue that not everyone understands ISO codes). The three-letter ISO code would address the language/dialect/variation in most cases. The word itself in the preferred script is self-explanatory. All the metadata can be included inside the page using the LL template. I do not understand the insistence on adding additional info (QID and username). Also, just curious: what really is the issue with ISO-FILENAME.EXTN (ori-କ.wav) for the first occurrence, ISO-FILENAME-1.EXTN (ori-କ-1.wav) for the second occurrence, and so on? --Subhashish (talk) 09:19, 25 June 2021 (UTC)

Not all the languages we allow to record on LinguaLibre have an ISO code. That's why the QID is useful. --Poslovitch (talk) 09:33, 25 June 2021 (UTC)

@Psubhashish, Poslovitch replied about the QID. As for the username, the goal is to ensure that there is only one recording per speaker. With such a name, if you record the same word twice, only the latest recording remains. This is very useful if you want to correct a wrong/bad pronunciation, because the previous recording is automatically replaced by the new one, so the user does not need to request deletion of the previous file on Wikimedia Commons.

That said, I do not see the benefit of shortening the filename. If you are looking for a given word, the search engine on Wikimedia Commons should find the recordings. If you are interested in mass import, Lingua Libre Bot is probably the tool you are looking for. If you want to do it yourself, there is already some Python code (other than LLBot) that does this job; see for example this code, which is used on the French Wiktionary. Pamputt (talk) 11:09, 25 June 2021 (UTC)
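Scripts that need to collect every recording of a word, for example to add them to Wiktionary entries, can still bring together titles that end up far apart in an alphabetical listing. The minimal sketch below parses titles in the LL pattern and groups them by language code and word. It is not the code used by Lingua Libre Bot or the French Wiktionary. It assumes that speaker usernames contain no hyphen (a username with a hyphen would be split incorrectly) and that spaces and underscores are interchangeable. Titles that do not follow the pattern are ignored. The helper names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

# Assumption: the speaker part contains no "-"; the word may contain "-" or "(...)".
LL_TITLE = re.compile(r"^LL-(Q\d+) \(([^)]+)\)-([^-]+)-(.+)\.([A-Za-z0-9]+)$")


class Recording(NamedTuple):
    qid: str
    lang_code: str
    speaker: str
    word: str
    ext: str


def parse_ll_title(title: str) -> Optional[Recording]:
    """Split a Lingua Libre file title into its parts, or return None."""
    name = title.removeprefix("File:").replace("_", " ")  # Python 3.9+
    match = LL_TITLE.match(name)
    return Recording(*match.groups()) if match else None


def group_by_word(titles: Iterable[str]) -> Dict[Tuple[str, str], List[str]]:
    """Group titles by (language code, word), whatever the speaker."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for title in titles:
        rec = parse_ll_title(title)
        if rec is not None:
            groups[(rec.lang_code, rec.word)].append(title)
    return dict(groups)
```

Because the bracketed precision stays part of the word, titles such as `LL-Q150 (fra)-0x010C-fils (enfant).wav` and `LL-Q150 (fra)-0x010C-fils (pluriel de fil).wav` land in separate groups.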

@Pamputt thanks for sharing this. I share your concern about ISO, which is why I mentioned the Glottolog ID. Glottolog IDs are used by field researchers such as Gregory Anderson (Living Tongues) and by organizations such as LAC and ELP. And beyond its use by field researchers, Glottolog's classification is really detailed. Does using the QID solve any particular issue? I have yet to explore LL Bot but have made a request. BTW, can LL Bot be used to insert files that are already in a Commons folder? You still haven't explained why the ISO-FILENAME.EXTN and ISO-FILENAME-01.EXTN option is a bad one and why "LL-QID (ISO)-USERNAME-NAME.wav" is preferred over it for languages with ISO codes. Also, have you considered the need for the same word to be recorded multiple times by someone who speaks with different accents, or the need for different intonations/moods? A word might be written the same way in a particular writing system, but these needs often arise. If a new recording overwrites an existing one, many might accidentally overwrite audio files that are needed. --Subhashish (talk) 14:34, 25 June 2021 (UTC)

@Psubhashish I let Poslovitch answer concerning LLBot.

Does using QID solve any particular issue?

Using the QID allows us to record any language/dialect, even those not yet available in Glottolog. In addition, we are sure that the QID is stable and will not change in the future.

You still didn't share why the ISO-FILENAME.EXTN and ISO-FILENAME-01.EXTN option is a bad one and why "LL-QID (ISO)-USERNAME-NAME.wav" is preferred over the former for the languages with ISO standards.

This is what I tried to explain in my previous message. It is used to manage duplicate recordings and to correct bad pronunciation files easily. If we use "ISO-FILENAME.EXTN", the name is not linked to a speaker, which means several files can be created by the same speaker and the "bad" files will be kept. A name such as "LL-QID (ISO)-USERNAME-NAME.wav" solves this problem (maybe "LL" is not needed, but it is only two letters). In addition, how would you record words from dialects or languages that do not have ISO codes if we used something like "ISO-FILENAME.EXTN"?
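Here is a toy model of the difference between the two schemes. A dictionary stands in for Commons, which is a simplifying assumption: real uploads go through the MediaWiki API. Under the LL pattern, the same speaker recording the same word twice produces the same title, so the new file replaces the old one. Under the proposed ISO-WORD / ISO-WORD-NN scheme, every upload gets a fresh suffix and earlier takes are kept. Both function names are hypothetical.

```python
from typing import Dict


def upload_lingua_libre(store: Dict[str, bytes], qid: str, code: str,
                        speaker: str, word: str, audio: bytes) -> str:
    """Same (language, speaker, word) gives the same title, so a re-recording overwrites."""
    title = f"LL-{qid} ({code})-{speaker}-{word}.wav"
    store[title] = audio
    return title


def upload_iso_numbered(store: Dict[str, bytes], iso: str, word: str, audio: bytes) -> str:
    """Proposed ISO-WORD.wav / ISO-WORD-NN.wav scheme: never overwrites."""
    base = f"{iso}-{word}"
    title = f"{base}.wav"
    n = 1
    while title in store:  # find the first free numbered title
        title = f"{base}-{n:02d}.wav"
        n += 1
    store[title] = audio
    return title


if __name__ == "__main__":
    ll: Dict[str, bytes] = {}
    upload_lingua_libre(ll, "Q150", "fra", "Poslovitch", "1399", b"bad take")
    upload_lingua_libre(ll, "Q150", "fra", "Poslovitch", "1399", b"good take")
    print(ll)  # a single file holding the second take

    iso: Dict[str, bytes] = {}
    upload_iso_numbered(iso, "fra", "1399", b"bad take")
    upload_iso_numbered(iso, "fra", "1399", b"good take")
    print(sorted(iso))  # ['fra-1399-01.wav', 'fra-1399.wav']
```

In the second scheme, the bad take stays on Commons until someone requests its deletion. That is the cost Pamputt describes.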

Also, have you considered the need for the same word being recorded multiple times by someone who speaks in different accents or there is a need for different intonations/moods? A word might be written the same way in a particular writing system but there are often aforementioned needs. If a new recording overwrites an existing one, many might accidentally overwrite audio files that are needed.

These are really rare cases. If a user wants to record himself/herself with several accents, most of the recordings will probably not be "natural", which means the audio files will be of poor quality for reuse. That said, there is a way to handle words that are spelled the same but have different pronunciations: you can add a precision in brackets about the word you want to record. For example, in French we have File:LL-Q150 (fra)-0x010C-fils (pluriel de fil).wav (fils (plural of fil)) and File:LL-Q150 (fra)-0x010C-fils (enfant).wav (fils (child)). Thanks to the brackets, we are sure of the user's intent. Pamputt (talk) 16:53, 25 June 2021 (UTC)

LinguaLibreBot for the Chaoui Wiktionary

Hello, I would like to restart the discussion about allowing the Lingua Libre bot to add audio files on shy.wiktionary.org. I am the only admin of this project. I can help you with the page algorithm, if you don't mind. Thanks in advance.--Reda Kerbouche (talk) 12:48, 2 July 2021 (UTC)

Hi Reda Kerbouche: we are thinking about how best to support the various Wiktionaries. I'll try to get myself motivated to propose something during the summer. Pamputt (talk) 12:54, 2 July 2021 (UTC)

Hello Reda! I'm the bot's handler. You need to fill in the form on LL:Bot so that we have the basic information. And by the end of the summer, we should have something functional. --Poslovitch (talk) 14:02, 2 July 2021 (UTC)