LinguaLibre

Difference between revisions of "Chat room"

Welcome to the Chat room! This is the place to discuss any and all aspects of Lingua Libre: the project itself, its operations, policy and proposals, technical issues, etc. For code-oriented issues, see LinguaLibre:Technical board. Feel free to participate in any language you want.

Revision as of 11:12, 28 April 2021

Chat rooms in various languages:
English · 🌐

Chatroom FAQ

  • How to add missing languages?
    • Administrators can add new languages, usually within a few days. Users, please provide your language's ISO 639-3 code and a link to the en.wikipedia.org article. Optional information: the common English name and the Wikidata ID. For more, see Help:Add a new language.
  • How to keep my Wikimedia project up to date?
  • What IRL events are coming? When? Where?
  • How to archive sections which have been answered?
    • After reviewing the section, add '{{done}} -- can be closed ~~~~' to the top of the section. After a few days to 2 weeks, move the section's code to [[LinguaLibre:Chat_room/Archives/year]].

Archives

Datasets out of date

Hello. It seems that the datasets page, although it claims to run every 2 days, is completely out of date: all the available zips are from April 2020 or November 2019 (and the full zip from May 2019). Is this a known problem? Is there a plan to address it? Julien Baley (talk) 23:17, 27 August 2020 (UTC)

Indeed, there seems to be an issue with the dataset update. I opened a Phabricator ticket about this issue. Pamputt (talk) 18:24, 28 August 2020 (UTC)

About the exclusion of already recorded words

Hi, I think the option to exclude words that I have already recorded is broken. This morning, I started a recording session and LL proposed words that I recorded two days ago. For example, I already recorded Belorusino two days ago, but it does not disappear when I click "exclude words already recorded". Also notice the two versions of the file, since I have already re-recorded it. Can someone fix this? Lepticed7 (talk) 10:07, 15 November 2020 (UTC)

I have opened a Phabricator ticket. It may be fixed in the coming months but not sure. Pamputt (talk) 20:05, 15 November 2020 (UTC)
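
For readers unfamiliar with the feature, here is a minimal Python sketch of what "exclude words already recorded" amounts to. It is not RecordWizard's actual code: the function names are hypothetical, and the Unicode normalization and whitespace trimming are assumptions about how two spellings of the same word might be matched.

```python
import unicodedata


def comparison_key(word: str) -> str:
    """Return a key for comparing words: NFC-normalized and trimmed.

    Assumption: two entries count as the same word when they differ only
    by Unicode composition or by surrounding whitespace.
    """
    return unicodedata.normalize("NFC", word).strip()


def exclude_recorded(candidates: list[str], recorded: list[str]) -> list[str]:
    """Keep only the candidates that do not match an already recorded word."""
    seen = {comparison_key(word) for word in recorded}
    return [word for word in candidates if comparison_key(word) not in seen]


# "cafe\u0301" is "café" written with a combining accent.
remaining = exclude_recorded(
    ["Belorusino", "caf\u00e9", "kato"],
    ["Belorusino ", "cafe\u0301"],
)
print(remaining)
```

A filter that compares raw strings would let both "Belorusino " and the decomposed "café" slip through, which is one way a word can be proposed again after being recorded. Whether that is the cause of the bug reported here is unknown.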

Issue with the Main page

This section should be moved to LinguaLibre:Technical board.

Done

Hi, the main page here uses a solution that requires MediaWiki:Lang and its subpages to be populated, which they aren't. As a result, the main page does not switch languages even if a translation is available in that language and the language has been set. Could someone look into whether it's possible to rework the structure, or maybe import the MediaWiki:Lang subpages? --Sabelöga (talk) 17:40, 16 January 2021 (UTC)

Hello @Sabelöga thank you very much for this remark! I just imported the MediaWiki:Lang subpages from Meta, and it seems to be working as of now 🙂 All the best. — WikiLucas (🖋️) 00:09, 17 January 2021 (UTC)
That's excellent, thank you so much for amending this, regards --Sabelöga (talk) 21:59, 18 January 2021 (UTC)
Thanks a lot! The main page looks fine --Higa4 (talk) 04:34, 19 January 2021 (UTC)

Images missing

Done

This section should be moved to LinguaLibre:Technical board.

Images on Help:Add_a_new_language show up as missing on my end. Have they moved or is this some error? --Sabelöga (talk) 21:08, 19 January 2021 (UTC)

Hello @Sabelöga they are not actually missing (for example, I can see them on the page you are talking about), but I also have experienced similar issues on the website. Some images seem to randomly disappear for no reason, and to come back after a while, without any modification on the page. We will talk about it during the next team meeting. — WikiLucas (🖋️) 21:19, 19 January 2021 (UTC)
Actually, some images are really missing in the section "I know what I'm doing". This comes from the fact that all images were lost when the website migrated to the new design. I had opened a ticket about that but I think we will never get them back. So we should create new screenshots whenever we discover such missing images. I will try to do so for the page you've mentioned. Pamputt (talk) 23:06, 19 January 2021 (UTC)
Yes, those were the ones I was talking about. --Sabelöga (talk) 16:56, 23 January 2021 (UTC)

Translation error?

The translation units on Help:Configure_your_microphone do not align properly with each other. What I mean is that the translation units include several sections when the software should just pick one for each unit. I removed the __TOC__ from the page since the TOC will appear anyway. Could a translation administrator mark the page for translation again, so we can see if that solves the issue? --Sabelöga (talk) 16:56, 23 January 2021 (UTC)

Done. WikiLucas (🖋️) 22:21, 23 January 2021 (UTC)

RecordWizard drops syllable

Done -- can be archived. Yug (talk) 16:03, 27 January 2021 (UTC)

Until several days ago I was recording word pronunciations without problems, but now I can't. Several days ago RecordWizard started removing certain sounds from my voice, most frequently "s" and "f". If I say a word, for example "syllable", it records it as "yllable", as if I never pronounced the "s". If I say "sophicated", it records "soicated".

This sort of behaviour is not present at the test sound stage (the first thing RecordWizard asks the user to do); it captures my speech perfectly. However, the very same issue is present on different devices with different operating systems, different browsers and different microphones.

My guess is that maybe some changes to noise recognition were deployed several days ago, and it now misinterprets those sounds as background noise. Anyway, I will be grateful for suggestions on how to fix this issue. --Tohaomg (talk) 07:43, 26 January 2021 (UTC)

Hi Tohaomg, it is not easy to say what is happening here. I am pretty sure nothing has changed in the backend for several months. With your examples "syllable" and "sophicated", does it happen every time you try to pronounce these words or does it happen randomly? In the first case, can other contributors try to record these words and see if the problem occurs for them as well? I just tried myself and did not see this problem. Pamputt (talk) 21:39, 26 January 2021 (UTC)
I am not trying to record exactly those words; they are just examples to show you what I mean. I am actually trying to record words in Ukrainian. When I try to record words, in about 3 cases out of 4 syllables are dropped, and in 1 out of 4 they are not, so I need on average 4 attempts to record a word. This problem appeared abruptly several days ago; everything worked fine before. --Tohaomg (talk) 08:09, 27 January 2021 (UTC)
Could it come from your microphone? Did you try to record with other hardware for a test? Pamputt (talk) 10:36, 27 January 2021 (UTC)
Yes, it happens on different devices. --Tohaomg (talk) 11:59, 27 January 2021 (UTC)

Solved it. Turns out, this effect is present only when loading lists longer than several hundred words. My next theory is that it was due to some sort of RAM shortage. Thank you for your time. --Tohaomg (talk) 13:06, 27 January 2021 (UTC)

This bug stays weird...
Anyway, thanks Tohaomg for your audios <3 Yug (talk) 16:03, 27 January 2021 (UTC)
Happy to see that you've found a workaround. Indeed, what you guess could be the reason of the problem because the server is currently not very robust. It should evolve in the coming weeks/months. Pamputt (talk) 20:04, 27 January 2021 (UTC)
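
The workaround reported above (keeping loaded lists under a few hundred words) can be prepared offline by splitting a long word list into smaller batches. The sketch below is only an illustration: the function name and the batch size of 200 are assumptions, since the thread only says "several hundred words".

```python
from typing import Iterator


def split_word_list(words: list[str], max_size: int = 200) -> Iterator[list[str]]:
    """Yield consecutive batches of at most max_size words, preserving order."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    for start in range(0, len(words), max_size):
        yield words[start:start + max_size]


# Hypothetical list of 450 entries standing in for a long recording list.
long_list = [f"word{i}" for i in range(450)]
batches = list(split_word_list(long_list))
print([len(batch) for batch in batches])  # [200, 200, 50]
```

Each batch can then be saved as its own list and recorded in a separate session.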

Technical > Github Winter 20-21 review

Following 0x010C's departure in October 2020, we reviewed the human needs for maintenance of the various technical subprojects. Thanks to 4 months of community effort, things are in a better position now:

  • Definitions: All repositories are now well defined via a clean, one-sentence descriptor. It maps sub-projects, so new volunteers quickly know which repository does what. See github.com/lingua-libre.
  • Mentors: 2/3 of repositories now have a volunteer referee-mentor with a "correct" understanding, able to discuss the repository and guide newcomers.
  • Documentation: Most repositories are "correctly" documented via an existing readme.md. Improvements are always welcome.
  • Web servers: Wikimedia France hired a new sysop, who guides and teams up with volunteers on server issues. Welcome to WMFR's MickeyBarber/Michael.
  • Maintenance: Wikimedia France is reviewing freelance candidates for deeper MediaWiki and RecordWizard coding support. Thanks to Adelaide & WMFR's team.
  • Globalization: Wikimedia France plans to expand volunteering toward India. Thanks to Adelaide & WMFR's team.

All pretty positive. Pamputt, Jitrixis, Poslovitch, Adelaide, Mikey and myself pushed forward on these fronts.

Still! The following repositories are currently leaderless and contributorless:

  • LinguaRecorder: Powerful JS library to manage audio recording: intelligent cutting with regular padding, saturation control, various export options, ...
  • RecordWizard: MediaWiki extension allowing mass recording of clean, well cut, well named pronunciation files.
  • QueryViz: MediaWiki extension adding a <query> tag to display sparql queries results inside wiki pages

LinguaLibre Bot is under review by Poslovitch but could use some more love. It is the most impactful yet underused of our sub-projects, since it needs to be authorized per target language (e.g. to add audios to Tamil Wikipedia articles) and is currently authorized for only a few languages and wikis:

  • Lingua-Libre-Bot: MediaWiki bot facilitating the reuse of Lingua Libre's audio records on many wikis, including Wikipedias and Wiktionaries.

Satellite linguistic project:

  • SignIt: LinguaLibre SignIt is a web-browser extension which translates a word in Sign Language, in order to learn sign language while reading online.

The end ! Thanks to all those who helped and are joining in :) Yug (talk) 13:11, 1 February 2021 (UTC)

LinguaLibre International call (France-India-others)

Done -- Please refer to LinguaLibre:Events#2021_International_call. Yug (talk) 11:46, 12 February 2021 (UTC)

Namaskara/Hello,
Earlier we noted that we started getting more participation from India (I am from India as well). In October last year when I had around 15,000 uploads, I then contacted Wikimedia France with the following idea, what I wrote in the email then:

I believe in India, as we have many languages and dialects, the tool is specially relevant. I wanted to have a discussion with the people working on the project, or can help with this idea.

This email was followed by a call with Adelaide and most possibly Lyokoi joined. Adelaide kindly invited me to attend another call, where I could briefly meet many of you.
Now, I know, there is more interest from India (different languages). You might have seen some work in Marathi very recently. Just two days ago I attended a brief India (Maharashtra) LinguaLibre meeting. I did not expect to see so many participants, but it looks like around 10 or so people are interested to record Marathi pronunciation.

So, is it possible to have a France-India call? It is absolutely OK for me to make it an "international call", so that everyone can join. Here we can have some of the people from India (any country) who can tell their plans, ask questions, get to know from you, or share experience. There might be ideas and questions related to setting up the project page etc. Around 5 or more people from India will be interested to join, I think.

PS: Most possibly there are LinguaLibre calls arranged. Adelaide kindly invited me to two such calls. Otherwise, I do not get to know about these calls. If these calls are open and anyone can join, possibly can we announce the call dates and time on this Project Chat, so that anyone interested/eligible can join?

Regards. --টিটো দত্ত (Titodutta) (কথা) 20:03, 1 February 2021 (UTC)

Hello Titodutta,
Nice to hear the news of a 10-person Lingua Libre workshop in India's Marathi community. This is wonderful.
Adelaide is definitely the person to contact for institutional relationships and workshops. She is coordinating, piloting and animating this project for Wikimedia France and knows who is who, where the human resources are, and what our wish-list and next moves are.
I'm interested in this call as well. Santosh, an Indian Wikimedian, also contacted us (via email) with a similar need.
I will send you an email to bring us all together. Yug (talk) 09:41, 2 February 2021 (UTC)
Yes, it would be good to have documentation or project page creation process on this site. Other than Marathi, briefly, a few Kannada students from a south Indian college started working on Lingua Libre (you may see an event page m:Alva's Wikipedia Student's Association/Events/Lingua Libre training session). Similarly, you might have seen some involvement from the Punjabi community where User:Nitesh Gill and a couple of other Punjabi community members are working.
Other than Indian languages, there is a good response from Japanese, Ukrainian, and a few other languages. From all these the thought of the "International call" came to my mind.
Other than small projects we can also think of "small events" in the future, such as a LinguaLibre-a-thon or Libre-a-thon (similar to an edit-a-thon; for example, on World Environment Day we can get together and record pronunciations related to the environment, and not on Commons). This can be a small event where we record our own respective languages. Regards. --টিটো দত্ত (Titodutta) (কথা) 23:51, 2 February 2021 (UTC)
@Titodutta & सुबोध कुलकर्णी From what I see now with France and India, it seems the best seeds are already very active Wikipedians with an interest in languages.
We also have a group of successful seedings thanks to already active Wikimedians who have some institutional roles (Lyokoi, WikiLucas, Titodutta, सुबोध कुलकर्णी_Subodh). Basically, according to the data, one out of 10 speakers who try Lingua Libre really sticks with it. So you need someone really active in outreach, training 20 or 30 people, to initiate a local community.
Note: I created LinguaLibre:Events#2021_International_call, please fill in information as needed. Yug (talk) 16:07, 8 February 2021 (UTC)

Marathi language stats

This section should be moved to LinguaLibre:Technical board.
Done. Yug (talk) 16:09, 8 February 2021 (UTC)

Marathi has about 2,600 records in Commons:Category:Lingua_Libre_pronunciation-mar on Wikimedia Commons, but this is not reflected in the LL stats (records per language), which show just 163. Could anyone please look into this and resolve it? सुबोध कुलकर्णी (talk) 05:08, 3 February 2021 (UTC)

Hi सुबोध कुलकर्णी. Lingua Libre has been suffering from a bug since the end of 2020. New developers are looking into this issue. Let us hope it will be fixed in the coming weeks. Pamputt (talk) 07:05, 3 February 2021 (UTC)
@सुबोध कुलकर्णी & सुबोध कुलकर्णी it's fixed! The data is back online thanks to the devs hired by Wikimedia France and Adelaide. You can also use the {{User records-mar}} template on userpages to tag speakers/uploaders. Yug (talk) 14:17, 11 February 2021 (UTC)

Stats : toward records and beyond...

Folks, given that the stats page is broken [paid devs will fix it in the coming weeks thanks to Wikimedia France!], I jumped in with some regex to do the maths:

We will likely reach 400,000 this very month. This feat is largely due to the recent rise of Indic languages. We must also note that most languages only have 3 to 50 words, from people trying things out. The best results are achieved when we get users to commit a bit; then things truly take off. Also, our 7 most active users provided 200,000 of our audios. 20 users contributed more than 3,000 audios, and 20 others between 1,000 and 3,000, so about 10% of speakers really hit it off. Quite interesting! In my opinion we still have bottlenecks in:

  • reaching out to diverse & minority languages;
  • getting contributors to contribute consistently;
  • and creating word lists for our users.

Inventing and exploring new methods for each of these bottlenecks is always welcome. The recent success with Marathi (Commons:Category:Lingua Libre pronunciation-mar: 15 C, 3,011 F) is a great example of reaching outside our usual pool; we can surely learn from this initiative.
I will hide the per-language stats in TSV format in the code below, in case you want to check them. Yug (talk) 21:34, 6 February 2021 (UTC)

10% of speakers committing to 1,000 recordings or more is very interesting. It suggests that if we give a workshop to 10 people, one committed speaker will emerge, thereby kick-starting this language. Yug (talk) 01:20, 8 February 2021 (UTC)
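
The "one in ten" rule of thumb can be made a little more precise with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that each workshop participant independently has a 10% chance of becoming a committed speaker; the function names are hypothetical and real retention is certainly less tidy than this model.

```python
def expected_committed(participants: int, rate: float = 0.10) -> float:
    """Expected number of committed speakers among the participants."""
    return participants * rate


def prob_at_least_one(participants: int, rate: float = 0.10) -> float:
    """Probability that at least one participant becomes a committed speaker,
    assuming participants commit independently with the same rate."""
    return 1.0 - (1.0 - rate) ** participants


for size in (10, 20, 30):
    print(size, expected_committed(size), round(prob_at_least_one(size), 3))
```

Under these assumptions, a workshop of 10 yields one committed speaker on average, but the chance of getting at least one is only about 65%. Training 20 to 30 people, as suggested earlier on this page, raises that chance to roughly 88% to 96%.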

Reminder : Grants

Hello all, I'm monitoring grants these days and there is a summary table available at LinguaLibre:Grants.

I think both rapid grant mechanisms could help us now to reach out to local communities via small-scale events, training, hardware, food, transportation costs, flyer designs, etc. For example, this WM-France micro-fi request organizes 4 evenings of contribution, getting 100€ for each evening. The same user is welcome to make several grant requests.
On a heavier scale, the R&D grant could surely be used for something. I have an idea on this, but we can trust Indian contributors to come up with relevant technical ideas and teams as well. @Titodutta Yug (talk) 01:20, 8 February 2021 (UTC)

LinguaLibre Bot and Wikidata

This section should be moved to LinguaLibre:Technical board.

I have not checked the bot's contributions on Wikidata for quite some time. Yesterday I uploaded ~100 Bangla film names from Bangla Wikipedia. It looks like the bot is not active, unless I am missing something. --টিটো দত্ত (Titodutta) (কথা) 18:10, 13 February 2021 (UTC)

Update and technical improvements

Hi all,

Full information and full disclosure: I'm now working with WikiValley and Wikimédia France in a paid capacity to help improve Lingua Libre's technical structure (see this, in French, for the scope of our intervention).

One of our first actions last Thursday was to restart the Blazegraph updater. A lot of tools depend on this "fundamental brick", including but not limited to the SPARQL endpoint (and pages using it) and the bots. You can now see that pages like Special:MyLanguage/LinguaLibre:Stats are up to date again, and the bots should also restart soon (you can find more technical info on LinguaLibre:Technical board).

The next big step will be to update this MediaWiki from 1.31 to 1.35 and move it to a new server.

If you see something or anything wrong or strange, don't hesitate to let me know. I'm also available for any question.

Cheers, VIGNERON (talk) 08:56, 15 February 2021 (UTC)

Nice ! Happy to see you folks jumping in. Thank you for the Stats ! We can witness our passage over 400,000 audios shortly. Yug (talk) 16:27, 15 February 2021 (UTC)

400,000

The total amount of recordings on Lingua Libre reached 400,000 a few hours ago. February is already the second most fruitful month since the beginning of the project, even though we are only halfway through. LiLi is growing faster and faster, and this is only the beginning!
Congratulations and thanks to everyone who gives some time to record voices and to spread the project around the world.
All the best — WikiLucas (🖋️) 18:10, 16 February 2021 (UTC)

And another milestone broken! Big thanks to the Titodutta and Marathi effects, too! Yug (talk) 21:24, 16 February 2021 (UTC)
Yug, WikiLucas and Titodutta, thanks for the support! The Marathi community had decided to gift a minimum of 5,000 records on the occasion of Marathi Language Day, celebrated on 27 February. We have crossed 6,000 records as of now. All credit goes to community members. सुबोध कुलकर्णी (talk) 05:22, 26 February 2021 (UTC)
See also Commons:Category:Lingua_Libre_pronunciation-mar
Congratulations to the Marathi community! It's nice to see you contribute this way :) Yug (talk)

Chat room in your language

Hi all. I've created Template:Lang-CR in order to list all the chat rooms. I think it would be interesting for people to discuss in their native language. The main discussion should remain on this chat room in English in order to be understood by most of the contributors. So feel free to create a village pump/chat room in your mother tongue. Pamputt (talk) 20:21, 16 February 2021 (UTC)

It is a welcome move. We need to discuss many local issues, policies, approaches, ideas, etc. in our own language. I have created the Marathi page संवाद-चर्चा दालन. Let me know whether the process is right. I will start engaging speakers here. सुबोध कुलकर्णी (talk) 05:36, 26 February 2021 (UTC)
@सुबोध कुलकर्णी that's perfect. Pamputt (talk) 06:40, 26 February 2021 (UTC)

New batch of lists available ! (1,000 languages)

Please, remember to tag the list_talk's page with {{UNILEX license}}.

Greetings!
Thanks to Tshrinivasan, with whom we discussed recent Indic (Marathi!) activity and the lack of lists, I bumped again into UNILEX (GNU-like license), a Google-led Unicode Consortium project listing vocabulary for 999 languages. The data seems clean as far as I can tell. The two main maintainers are Google folks, so I suspect UNILEX uses Google's best scrapers and NLP cleaners. Within this data are tab-separated frequency lists as {item} {number_of_occurences}. I forked their GitHub repository and made a script to convert their format into Lili's List:* format, i.e. # {item}. See:

You can check whether your own language is among the 999 available. For Marathi, replace ig by mr. I therefore created 2 local lists to test this approach:

Right now, 1000 lists are already formatted in Lili's syntax within the /data/frequency-sorted-hash directory. If any community lacks word lists on Lili, there you have them: copy, paste, done, situation unlocked! Yug (talk) 16:40, 24 February 2021 (UTC)
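
The conversion described above, from tab-separated `{item}` / count lines to Lili's `# {item}` list syntax, can be sketched in a few lines of Python. This is not the script from the fork mentioned in the post. The function name is hypothetical, and the handling of comment lines and of a header row is an assumption about the input files.

```python
from typing import Optional


def unilex_to_lili(tsv_text: str, limit: Optional[int] = None) -> str:
    """Convert tab-separated 'item<TAB>count' lines into '# item' lines,
    most frequent first."""
    entries = []
    for line in tsv_text.splitlines():
        # Assumption: blank lines and '#' comment lines carry no entries.
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        # Assumption: rows without a numeric second column (e.g. a header
        # row) are skipped rather than treated as errors.
        if len(parts) != 2 or not parts[1].strip().isdigit():
            continue
        entries.append((parts[0].strip(), int(parts[1])))
    # Stable sort: items with equal counts keep their input order.
    entries.sort(key=lambda entry: entry[1], reverse=True)
    if limit is not None:
        entries = entries[:limit]
    return "\n".join(f"# {item}" for item, _count in entries)


sample = "# sample data\nForm\tFrequency\nof\t300\nthe\t500\nand\t250\n"
print(unilex_to_lili(sample))
```

With the sample above this prints `# the`, `# of`, `# and`, one per line. The `limit` parameter allows keeping only the top of a very long list.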

@Titodutta hi! This may interest your community. There are dozens of Indic languages :) It could also help you: you already recorded most of those words for your language (ben), but together with the "ignore already recorded words" function, these lists can fill some gaps :) Yug (talk) 16:48, 24 February 2021 (UTC)
  • I love this. I'll inform the Marathi folks. --টিটো দত্ত (Titodutta) (কথা) 17:16, 24 February 2021 (UTC)
  • This is just amazing. You don't know how delighted I am feeling at this moment. I checked the Bengali list; a very few random words have typos, but that should not be more than 1%, I guess. Overall this will be an extremely helpful resource for the communities. --টিটো দত্ত (Titodutta) (কথা) 17:24, 24 February 2021 (UTC)
  • I share your enthusiasm! It's bot-created, I'm pretty sure, and the clean-up is likely just statistical. Now that those lists are technically available, the ideal next step would be human review by local communities. Maybe groups of 2~3 users for copyedit sprints? :D But this is optional IMHO. Also, since the corpora come from online documents, IRL items like `chair`, `car`, `walk` may be further down these lists, but they must be there within the first 20,000 items. The best part is the linguistic diversity of this set. Amazing. Yug (talk) 18:10, 24 February 2021 (UTC)
  • It's a good resource indeed. Thanks! The Marathi words in the list are also grammatically correct, with nearly no typos. We have started a discussion about this in our community. Currently, we have started working on Lexemes first; the recordings of the lists thus created will be done simultaneously. The community thinks this approach is more useful in the long run. A separate group of speakers may adopt these lists, but then we have to devise a way to avoid repetitions. We will definitely discuss this resource's utilisation further and let you know. सुबोध कुलकर्णी (talk) 05:14, 26 February 2021 (UTC)

Tshrinivasan, Yug - The Marathi community plans to work on these lists. But [1] is giving a 404 error. Please help. सुबोध कुलकर्णी (talk) 05:54, 5 March 2021 (UTC)

Tshrinivasan, सुबोध कुलकर्णी: It's in active development these days so I made a few changes.
  • Currently at: /hugolpz/unilex-extended/frequency-sorted-hash which uses UNILEX as a git submodule to respect each project's scope.
  • I just ran the script for Marathi, so the lists are now local. When picking a list, type List:Mar/M:
See also the section below. My apologies for the changes. I hope they didn't affect you too much. Yug (talk) 07:47, 5 March 2021 (UTC)

Pause before running

Long-tail curves likely apply to languages ranked by number of speakers. Since macro-languages such as Mandarin, English, Spanish, Hindi, etc. are certain to be audio-documented soon by the sheer force of demography, our effort and strategy should progressively shift toward the right of the curve, to increasingly rare languages. The rarer the languages and speakers, the more attentive we should become and the more custom assistance we will have to provide.

Dragons Bot has been created, coded, tested, and is ready to import UNILEX's lists to LinguaLibre's List:{iso}/{title} namespaces. Given that 1,000 pages and their associated talk pages will be created, I would like to pause for a few days to consider this large list import and its rationale.

  • Lili > Languages > existing breadth: We have reached 110 languages on LinguaLibre so far.
  • Lili > Lists > non-sorted by usefulness: SPARQL queries provide lists for all languages, but without prioritization by word usefulness.
  • Lili > Lists > sorted by usefulness:
    • Hand-picked frequency lists are present for about 7 languages: eng, mar, por, pol, tam, ron, kur, with optimal relevance for teaching/learning.
    • Olafbot's List:*/Lemmas-without-audio-sorted-by-number-of-wiktionaries for 72 languages, updated daily, with optimal relevance for Wiktionaries.
    • UNILEX can provide frequency lists for 1,000 languages, about 10 times our current language coverage. UNILEX builds upon Github.com/Google/Corpuscrawler, an open source project which plans to support more languages. I dived into this chain and it's an 'easy' NLP pipeline to contribute to. The Wikimedia community can use it and expand it.

Core issue: with users arriving online, the core issue is to increase retention of minority and semi-rare languages by smoothing their speakers' work. For example, a user of the Wayuu language arrived today. No local (frequency) list was available today. But UNILEX + Dragons Bot can provide a local Wayuu frequency list of 8000 items, ready to record.
Since we don't know which semi-rare languages will come next, having 1,000 languages ready is a safe yet not so excessive bet. Assuming an en:Zipf's law/en:Long tail curve for languages and their speakers, we can still predict that at least one out of 10~20 new languages' speakers will lack a local wordlist. But together with OlafBot's lists, we move from 6% toward 90% of our languages having a solid, usefulness-based roadmap to walk forward. Yug (talk) 14:21, 3 March 2021 (UTC)

Well, I believe the idea to import Unilex lists is very good. One of the things a new user needs most is an idea of what to record. The Unilex lists suit this function, especially in the case of new languages, where there is no other list available, and no words have been already recorded. The only question I see is how to import the Unilex lists. Perhaps the best idea is to import 1000 most frequent words from each list. It would be even better if the recorded words were automatically removed from the lists and replaced by new ones (like in the case of Olafbot-managed lists), but even a static list is good as bait if the goal is just to attract more speakers of rare languages.
One remark: you should translate the file names from Unilex to match LiLi's language codes (or perhaps you did it, I don't know, I didn't examine the code). It's not always the same, for example, Polish is "pl" in Unilex, and "Pol" in Lili. If you leave the old codes, the list won't be automatically found when a new user presses the "Local List" button. Anyway, the newbies are likely not to notice the lists at all regardless of all our efforts. Olaf (talk) 00:55, 4 March 2021 (UTC)
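
The two suggestions above (keep the 1000 most frequent words, and translate Unilex codes into Lili's codes) can be sketched as follows. This is not Dragons Bot's code. The mapping table is a deliberately tiny illustrative subset, and the function names and the list title "Unilex-top-1000" are hypothetical.

```python
# Illustrative subset only: Unilex-style two-letter codes mapped to the
# capitalized three-letter codes used in Lili list titles (e.g. List:Mar/...).
UNILEX_TO_LILI = {
    "pl": "Pol",  # Polish
    "mr": "Mar",  # Marathi
    "bn": "Ben",  # Bengali
}


def lili_list_title(unilex_code: str, title: str) -> str:
    """Build a List:{iso}/{title} page name from a Unilex language code."""
    try:
        lili_code = UNILEX_TO_LILI[unilex_code]
    except KeyError:
        raise KeyError(f"no Lili code known for Unilex code {unilex_code!r}") from None
    return f"List:{lili_code}/{title}"


def most_frequent(items_by_frequency: list[str], count: int = 1000) -> list[str]:
    """Keep the first `count` items of a list already sorted by frequency."""
    return items_by_frequency[:count]


print(lili_list_title("pl", "Unilex-top-1000"))  # List:Pol/Unilex-top-1000
```

Failing loudly on an unknown code, rather than reusing the Unilex code as-is, avoids creating lists that the "Local List" button would never find.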

jQuery.Deferred exception: this.pastRecords is undefined

This discussion may be moved to LinguaLibre:Technical board.

Hello, there.

When I try to load a list of words to record from the FR wiktionary, the modal does not disappear when I click "Done" and seems blocked trying to load the words. During this time, the JS console complains that "jQuery.Deferred exception: this.pastRecords is undefined", and the last resource loaded is, in cURL format: curl 'https://fr.wiktionary.org/w/api.php?action=query&format=json&origin=*&formatversion=2&prop=pageterms&wbptterms=label&generator=categorymembers&gcmnamespace=0&gcmtitle=%3ACat%C3%A9gorie%3ALocutions%20verbales%20en%20fran%C3%A7ais&gcmtype=page&gcmlimit=max' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:85.0) Gecko/20100101 Firefox/85.0' -H 'Accept: application/json, text/javascript, */*; q=0.01' -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' --compressed -H 'Origin: https://lingualibre.org' -H 'DNT: 1' -H 'Connection: keep-alive' -H 'Referer: https://lingualibre.org/' -H 'TE: Trailers'

Looks like there is a bug…

Regards. LoquaxFR (talk) 17:21, 24 February 2021 (UTC)
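
For anyone trying to reproduce the report, the query string of the request quoted above can be rebuilt with Python's standard library. The sketch only constructs the URL and sends nothing. It mirrors the parameters visible in the cURL command, and the function name is hypothetical. It says nothing about the cause of the `this.pastRecords` error.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

API_URL = "https://fr.wiktionary.org/w/api.php"


def category_members_url(category_title: str) -> str:
    """Rebuild the category-members query seen in the bug report."""
    params = {
        "action": "query",
        "format": "json",
        "origin": "*",
        "formatversion": "2",
        "prop": "pageterms",
        "wbptterms": "label",
        "generator": "categorymembers",
        "gcmnamespace": "0",
        "gcmtitle": category_title,
        "gcmtype": "page",
        "gcmlimit": "max",
    }
    # quote_via=quote encodes spaces as %20, as in the original request.
    return f"{API_URL}?{urlencode(params, quote_via=quote)}"


url = category_members_url(":Catégorie:Locutions verbales en français")
query = parse_qs(urlsplit(url).query)
print(query["gcmtitle"][0])
```

Swapping in another category title gives a quick way to compare the API's response across categories outside the RecordWizard.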

Hi LoquaxFR, can you describe precisely what you do when you write "when I try to load a list of words to record from the FR wiktionary"? How do you load the word list: with the "Wikimedia category" option (« Catégorie Wikimedia ») on the right, or by creating the word list yourself one word at a time? If you use "Wikimedia category", can you tell us which category you want to use? Can you reproduce the problem whatever category you work with? Thanks in advance for these details, which I hope will help pin down the problem as precisely as possible. Pamputt (talk) 17:58, 24 February 2021 (UTC)
In French it will indeed be simpler. The problem occurs systematically when I try to use a Wikimedia category (from the French Wiktionary in this case); this is the only way I load words, and the problem appears for every category I try, whether I have already recorded almost all of its words or only a small part of its thousands of terms. The problem also occurs in private browsing, so it does not seem to be the cache or the cookies. If you need more info, just ask. LoquaxFR (talk) 18:08, 24 February 2021 (UTC)
Thanks for the additional info. I just tested with Firefox 78.7 and I do not run into this problem. Can you try with another browser (Chromium or another one) to see whether the problem is specific to your Firefox (including in private browsing)? It could for instance come from a gadget you have installed. Pamputt (talk) 18:40, 24 February 2021 (UTC)
A Firefox add-on breaking the JS? Yug (talk) 18:57, 24 February 2021 (UTC)
Chrome and Safari give me the same result; I also tried from another machine and another OS, with no improvement: the JS error still shows up and nothing happens when I confirm the modal. Could I have recorded too many words, making the JS fail when it tries to remove the ones already recorded? Since only a few of us have recorded that many, it is possible. I had already noticed that loading lists from the Wiktionary was taking longer and longer for me (relatively speaking: a few seconds of waiting at most). Is this another problem tied to my account? LoquaxFR (talk) 06:30, 25 February 2021 (UTC)
Thanks for the extra info. I opened T275734. We should check with Lepticed7 and WikiLucas00, who have roughly the same number of recordings as you, to test whether they run into the same problem. Pamputt (talk) 06:54, 25 February 2021 (UTC)
Hi, personally, I don't know if it is related, but there are some recordings that the Record Wizard does not remove when I ask it to remove the words already recorded. As evidence, this file, which I recorded three times. Lepticed7 (talk) 10:45, 28 February 2021 (UTC)
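LoquaxFR's hypothesis concerns the step that filters out words already recorded. As a rough illustration only, and not the Record Wizard's actual JavaScript, that step amounts to a set difference between the loaded list and the speaker's past records. The function name `remove_recorded` and the file names are made up for this sketch. It also shows the behaviour one would expect when the past records are missing or empty: the list is left unchanged rather than the step failing.

```shell
#!/bin/sh
# remove_recorded LIST RECORDED
# Print the lines of LIST that do not appear verbatim in RECORDED.
# Hypothetical helper: illustrates the filtering idea, not the Record Wizard code.
remove_recorded() {
    list=$1
    recorded=$2
    if [ ! -s "$recorded" ]; then
        # No past records (missing or empty file): keep every word.
        cat "$list"
    else
        # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
        # "|| true" because grep exits 1 when nothing is left to record.
        grep -vxF -f "$recorded" "$list" || true
    fi
}

# Demo with throwaway files.
tmp=$(mktemp -d)
printf '%s\n' 'prendre part' 'avoir lieu' 'faire face' > "$tmp/category.txt"
printf '%s\n' 'avoir lieu' > "$tmp/recorded.txt"
remove_recorded "$tmp/category.txt" "$tmp/recorded.txt"
```

With the demo files, only the two words that were not yet recorded are printed.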

50,000

February 2021. This month, we have seen 50,000 pronunciations recorded (see LinguaLibre:Statistics). This is the first time we have seen 50,000 entries in a month. This is great. --টিটো দত্ত (Titodutta) (কথা) 08:51, 28 February 2021 (UTC)

That's really amazing. The same month we passed 400k recordings! AND the shortest month in the year! I'm going to prepare a small News to be published every month (inspired by what you did in September if I remember correctly), I think February is a very good month to start with! I'll publish it on your talk page if you'd like 🙂 All the best ! — WikiLucas (🖋️) 16:11, 28 February 2021 (UTC)
I would say why not, but I cannot lead such a project, so if you are motivated to write and lead such a newsletter, go ahead. Pamputt (talk) 18:39, 28 February 2021 (UTC)
On the LinguaLibre:Technical board/intro Poslovitch has started a /News section which keeps a log of important milestones. It's an interesting idea because it's minimalist, therefore low maintenance.
I am also interested in a Newsletter for both external and internal purposes. I would help around, yes. The editorial line would gain from being clarified: who are the expected readers, writing style, overall length, major sections, section lengths, etc. But this can "appear" with the first few issues :) Please keep a balance so the writing workload stays modest. Yug (talk) 18:57, 28 February 2021 (UTC)
The /News of the technical board is mostly about technical news. I fully agree to the idea of a Newsletter, yet quarterly. We could grab some ideas from the French Wiktionary's Actualités. --Poslovitch (talk) 20:33, 28 February 2021 (UTC)
  • Salut, let's start with the newsletter of March. I'll add the stories I know such as 400,000 audios, 50,000 this month, the Wikimedia Wikimeet India, upcoming France-India call, French Wiktionary missed recording work etc. I'll start the draft tomorrow and ping you here.
    In future we will need mw:Extension:MassMessage to send newsletter to subscribers' talk page. A system admin is needed with access to the server and localsettings.php etc pages. I understand this will take time, so it can wait. Kind regards. --টিটো দত্ত (Titodutta) (কথা) 21:24, 28 February 2021 (UTC)
@Titodutta hi, we are having another discussion on the mailing list about networking, cooperation and outward communication. I think the LinguaLibre:Newsletter page can be modeled upon Technical board and LinguaLibre:Bot, a kind of hub for a subgroup of active users dedicated to a common goal, in this case communication. The bimonthly Newsletter could be a core, founding element, but other discussions about outreach could take place there. We have so much to push in this direction: academic outreach, rare languages and under-represented countries, partner institutions, calling for new Wikimedians, reminding far-away Wikimedia chapters of Lingualibre, etc. Having a hub dedicated to writing elegant co-edited texts, defining targets and leading the call for communication campaigns would be a strong plus. I am still focused on code but I could help in a few weeks. You seem to love it as well. Do we have other users interested in joining such efforts? It would be good to have a few more folks. Yug (talk) 20:39, 2 March 2021 (UTC)

Newsletter : March 2021 review ?

You can co-edit this text. PS Titodutta: a rough summary of past months and emerging directions based on a message to an ex-contributor.

In January and February, the « Lili » community has taken back control of the technical stack (access to servers, GitHub codes, bots, etc.) and made a call for more diverse speakers. The Indian community started to show up, with key Indic languages being Bengali (50,000) and Marathi (~10,000). Romanian, Polish, Ukrainian are also on the rise around 20,000 audios each. We continue to have some dozen smaller languages showing up but no powerful push yet.

Right now, an external software company is upgrading our MediaWiki and its modules thanks to Wikimedia France's funding. The volunteer dev team is also strong and internal organization is increasing. We now have LinguaLibre:Technical board as a tech hub, LinguaLibre:Bot as a bot hub, LinguaLibre:Events as an IRL/Online event hub. When the main software upgrade settles down in a month we plan a [yet to create] LinguaLibre:Newsletter/room as an inward and outward communication hub.

In that last dimension, we could reach out to « relay users » on other wikis, who can share our news about LinguaLibre with communities of wiktionaries, wikisources, wikipedias, wikidata. We are also considering formally reaching out to non-Wikimedia groups such as Common Voice, Unicode, governmental and NGO agencies, research centers, possibly in the form of group work and/or an online editathon where we gather to spread the news. This hub, summarizing the community's discussions, will therefore also clarify goals and strategies. We are looking for help with this matter.

This current forward dynamic is thanks to the early Autumn 2020's efforts. We weren't able to immediately convert those into actions but it still injected energy and vision into LinguaLibre which helped snowball the current dynamic. Also, many thanks to all those who got involved in this journey! Yug (talk) 07:20, 3 March 2021 (UTC)

Also, I just found out Commons grows at a speed of about 1 million files per month. So with 50,000 audios last month, Lili makes up about 5% of Commons' new files. Yug (talk) 14:57, 3 March 2021 (UTC)

Marathi women speakers celebrate 'Women's Day' & 'Women History Month' on Lingua Libre

Greetings for the coming World Women's Day!
Glad to share this news. The Marathi language community in Maharashtra State of India has taken the initiative to record their language over the last 2 months. Out of a total of 26 speakers, 24 are women from 4 different places in the state. The group has decided to reach the 10,000 recordings mark to celebrate 'Women's Day' and the 15,000 mark in March. As of now 8600+ recordings are uploaded. A small group of women has also started working on lexicographical data, the recordings of which would be done simultaneously. The activity is being coordinated by institutional partner Jnana Prabodhini, Pune and facilitated by CIS-A2K, affiliate of WMF in India. The community needs support from all of you. Thanks, सुबोध कुलकर्णी (talk) 06:28, 5 March 2021 (UTC)

Greetings सुबोध कुलकर्णी, nice to witness this enthusiasm.
I imported UNILEX lists for Marathi. In RecordWizard's Step 3, when you pick a list, go for Local list, then mar/M and you will see lists of the most used words. I propose a gentle ramp approach: the first list has just 200 words, see List:Mar/Most_used_words,_UNILEX_1:_words_00001_to_00200. In my experience it allows better on-the-ground sessions with new users. 200 is gently ambitious, lets users get past the uncanny valley of the first 20 words and move on to the joyful Lingualibre flow of rapid recording. Perfect for demos and on-boarding. :)
The following lists are for motivated users who choose to return. To consolidate skills, list 2 has 800 words while list 3 has 1000. At this stage a nice 2,000 audios have been recorded by the speaker, and these words likely make up 90% of daily conversations.
It then moves on to committed users. List 4 has 3000 words, the following ones 5,000 words each. These lists are not expected to be done in one go but over several sessions of one hour or less, during a dedicated day or over a week or so.
I hope these may help your language community to better on-board interested contributors :)
We also encourage development of women speakers networks, so thanks a lot for your lead. Yug (talk) 08:57, 5 March 2021 (UTC)
Added Marathi lists :
Yug (talk) 09:01, 5 March 2021 (UTC)
Many thanks Yug for detailed explanation. These are useful to start with. Our group has taken lexicographical approach now to develop lists. So we need alphabetical lists to get forms of words. For example we create list like this - शरीर, शरीरभर, शरीराकडून, शरीराकडे, शरीराचं, शरीराचा, शरीराची, शरीराचे, शरीराच्या, शरीरात...etc. The members distribute work according to letters. Therefore it will be good if we can get modified lists. - सुबोध कुलकर्णी (talk) 11:22, 5 March 2021 (UTC)
I see. सुबोध कुलकर्णी, you could use frequency-sorted-count/mr.txt, keep the 30,000 most frequent, then sort alphabetically and split by hand on each letter. See Help:How_to_create_a_frequency_list?#UNILEX.27s_lists. Yug (talk) 11:53, 5 March 2021 (UTC)
I tried to push it forward but it's a bit more complex than I anticipated. Ideally, you would 1) add a prefix so औ.txt becomes /Marathi_words_starting_with_औ.txt, 2) merge the rarest letters together. I must refocus on non-wiki projects, can you call for help from local wiki-developers ?
# Define language
iso=mr
# Get the file, skip the metadata header, sort by 2nd column (frequency, descending), keep the top 50000, keep only the word column, sort alphabetically, save to ${iso}.txt
curl "https://raw.githubusercontent.com/unicode-org/unilex/master/data/frequency/${iso}.txt" | tail -n +6 | sort -k 2,2 -n -r | head -n 50000 | cut -d$'\t' -f1 | sort -k 1,1 > "${iso}.txt"
# For each line of ${iso}.txt: if it starts with an alphanumeric character, use its lowercased first character as file name, otherwise use "symbol"; append the line to that file
awk '{file = (/^[[:alnum:]]/ ? tolower(substr($0,1,1)) : "symbol") ".txt"; print >> file; close(file)}' "${iso}.txt"
# Remove the a.txt to z.txt files (Latin-script entries)
find . -regex '\./[a-z]\.txt' -delete
# Convert the one-character files to wiki list format: `# {item}`
sed -i -E 's/^/# /' $(find . -type f -name "?.txt")
# See line counts, sorted numerically in descending order
wc -l * | sort -n -r
# See line counts; if n<200 then print the file name and append that file's content to merged.txt
wc -l * | awk '$1 < 200 {print $2}' | xargs cat >> merged.txt
This already provides the lists by letters. It should put you solidly on the way. Yug (talk) 12:52, 5 March 2021 (UTC)
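Step 1 mentioned above (adding a prefix to each per-letter file) is not covered by the commands. A minimal sketch, assuming the per-letter files sit in one directory and that `mr.txt`, `merged.txt` and `symbol.txt` should be left alone; the function name `prefix_letter_files` and the default prefix are illustrative choices, not something taken from the thread's tooling:

```shell
#!/bin/sh
# prefix_letter_files DIR [PREFIX]
# Rename each per-letter list, e.g. औ.txt -> Marathi_words_starting_with_औ.txt.
# Hypothetical helper; the skip list is an assumption about which files
# are not per-letter lists.
prefix_letter_files() {
    dir=$1
    prefix=${2:-Marathi_words_starting_with_}
    for path in "$dir"/*.txt; do
        [ -e "$path" ] || continue                      # no .txt files at all
        name=$(basename "$path")
        case $name in
            mr.txt|merged.txt|symbol.txt) continue ;;   # not per-letter lists
            "$prefix"*) continue ;;                     # already renamed
        esac
        mv -- "$path" "$dir/$prefix$name"
    done
}

# Usage, from the directory holding the generated files:
#   prefix_letter_files .
```

Running it a second time is harmless, since files that already carry the prefix are skipped.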
Line counts per file, first without merging (50 files), then with merging (32 files):
  99860 total
  50000 mr.txt
   4976 स.txt
   4462 प.txt
   3745 म.txt
   3545 क.txt
   3195 व.txt
   2201 न.txt
   2183 ब.txt
   2134 अ.txt
   1789 र.txt
   1666 द.txt
   1623 आ.txt
   1568 ग.txt
   1524 ज.txt
   1507 त.txt
   1376 श.txt
   1132 ल.txt
   1102 ह.txt
   1089 च.txt
   1076 उ.txt
   1025 भ.txt
    809 य.txt
    791 फ.txt
    766 ख.txt
    652 ट.txt
    645 घ.txt
    480 ए.txt
    456 इ.txt
    446 ध.txt
    420 ड.txt
    318 ठ.txt
    273 झ.txt
    182 थ.txt
    163 ओ.txt
    118 छ.txt
    115 ऑ.txt
     64 ऐ.txt
     55 ढ.txt
     44 औ.txt
     29 २.txt
     26 ई.txt
     20 ष.txt
     20 ऊ.txt
     20 १.txt
     14 ऋ.txt
      6 ऱ.txt
      4 ३.txt
      2 ९.txt
      2 ८.txt
      1 ॐ.txt
      1 ४.txt
With merging (32 files):
   4976 स.txt
   4462 प.txt
   3745 म.txt
   3545 क.txt
   3195 व.txt
   2201 न.txt
   2183 ब.txt
   2134 अ.txt
   1789 र.txt
   1666 द.txt
   1623 आ.txt
   1568 ग.txt
   1524 ज.txt
   1507 त.txt
   1376 श.txt
   1132 ल.txt
   1102 ह.txt
   1089 च.txt
   1076 उ.txt
   1025 भ.txt
    886 merged.txt
    809 य.txt
    791 फ.txt
    766 ख.txt
    652 ट.txt
    645 घ.txt
    480 ए.txt
    456 इ.txt
    446 ध.txt
    420 ड.txt
    318 ठ.txt
    273 झ.txt
There is also a list List:Mar/Lemmas-without-audio-sorted-by-number-of-wiktionaries which is updated every day by a bot, so it should be always fresh. The list consists of words that are present in one or more Wiktionaries, but have no recording in Commons. At the top of the list, there are words with the largest number of Wiktionaries. You could probably give it a try too, सुबोध कुलकर्णी. Olaf (talk) 16:34, 5 March 2021 (UTC)

Automatically updated lists of unrecorded audio

Probably not everybody here is aware that there are lists of unrecorded words available for 72 languages. The lists are sorted by the number of language versions of Wiktionary where a corresponding word is described, with the most popular words at the top, so the lists should in a way maximize the usefulness of the recordings. Words with audio recordings present in Commons are removed automatically from the lists every night. In this way, the lists should always be fresh. The lists always have a title in the form of <language code>/Lemmas-without-audio-sorted-by-number-of-wiktionaries: afr, ang, ara, ast, aze, bel, ben, bul, cat, ceb, ces, cmn, csb, cym, dan, deu, ekk, eng, epo, est, eus, fao, fas, fin, fra, gla, gle, glg, grc, gre, guj, hau, heb, hin, hrv, hun, hye, ido, ina, ind, isl, ita, jav, jpn, kan, kat, kaz, khm, kor, kur, lat, lit, ltz, lvs, mal, mar, mkd, mlg, mlt, mon, msa, nld, nor, oci, pan, pnb, pol, por, ron, rus, san, slk, slv, spa, sqi, swa, swe, tam, tel, tgl, tha, tur, ukr, urd, vie, wuu, yid, yue. Olaf (talk) 16:51, 5 March 2021 (UTC)
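For anyone scripting against these lists, the page title can be derived from the language code. This is a small sketch, assuming every list follows the capitalised pattern of the Marathi example quoted earlier in this chat room (List:Mar/Lemmas-without-audio-sorted-by-number-of-wiktionaries); the function name `list_title` is made up here and the pattern has not been checked for every language:

```shell
#!/bin/sh
# list_title CODE
# Print the list page title for a language code, e.g. mar -> List:Mar/Lemmas-...
# Assumption: all lists use the same capitalised pattern as the Marathi one.
list_title() {
    code=$1
    first=$(printf '%s' "$code" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$code" | cut -c2-)
    printf 'List:%s%s/Lemmas-without-audio-sorted-by-number-of-wiktionaries\n' "$first" "$rest"
}

# Print the titles for a few of the codes listed above.
for code in mar ben fra; do
    list_title "$code"
done
```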

This is a game changer. Welcoming new contributors in 72 languages will no longer involve the tricky question of providing relevant lists. More lists coming. We can refocus on outreach and calling for new contributors to audio-document their voices, their languages, their cultures. Yug (talk) 18:15, 5 March 2021 (UTC)

Outreach

Dialects of Catalan.

I used the opportunity of bumping into a currently inactive user to go to his Wikipedia (Catalan), ask him where I could announce that we now have a cat list, and went to make a gentle announcement. I don't expect it to pay off soon, but after several pings, we should have some folks landing back here on Lingualibre. I didn't contact the ca:wikt community but you see the idea: leaving many small announcements here and there so people know our name. Smaller pings are ok. "Sorry all, I've been busy on the LinguaLibre project these days", this would be helpful too. I tried to emphasize what service Lili provides to them (not sure I was good at that, but it's just a ping :) ). Please, when you have the opportunity, reach out to local communities, especially those not currently active. We have nice lists in 72+ languages now. Let the wiki folks know and record more. Yug (talk) 08:24, 7 March 2021 (UTC)

@Pamputt hi, they started a light conversation-description of Catalan about cat valencia, cat central, cat balearic and cat Western (? not sure it was 3 or 4 different) pronunciations. Do you have any understanding on this Catalan issue ? Is this like Marseille French VS Paris French accents or something else ? Yug (talk) 18:25, 7 March 2021 (UTC)
I do not know precisely how different these Catalan varieties are, but they are more different than French from Paris and French from Marseille because these varieties are considered different dialects. So it is something like Gascon (Q930) and Occitan auvernhat (Q1186) for the Occitan language. So we could start importing these dialects into Lingua Libre to be able to record in them. At least, we should import the main dialects here, namely Northwestern Catalan, Valencian, Central Catalan, Balearic, Rossellonese and Alguerese. Pamputt (talk) 18:58, 7 March 2021 (UTC)
It seems to be the wish expressed by User:Vriullop too, and in another discussion I had. Yug (talk) 19:22, 7 March 2021 (UTC)
Northwestern Catalan (Q518078), Valencian (Q518079), Central Catalan (Q518087), Balearic (Q518106), Northern Catalan (Q518118), Algherese (Q518128) are now available, so we can record right now words in these dialects. Pamputt (talk) 20:09, 7 March 2021 (UTC)

License ?

Done
I bumped again into cc-by-sa license for contributions. Aren't we supposed to contribute it all under CC-0 so it's Wikidata compatible ? Yug (talk) 21:39, 8 March 2021 (UTC)

The licence is up to the user's choice. --Poslovitch (talk) 21:54, 8 March 2021 (UTC)
Then what do we do on wikidata ? Ooohhh... It's just a link toward Commons, not a copy of the audio file.... Yug (talk) 22:53, 8 March 2021 (UTC)

Metrics > Accounts creations

Hi everyone !
We got about 5 times more account creations this January 2021 (~60) compared to January 2020 (~12).
Welcoming is largely done by hand these days. Having a bot for that may help.
And, given that we are all overloaded, it may be wise to reach out for help. Yug (talk) 23:19, 8 March 2021 (UTC)

Help - to delete word

Hi, please guide me on how I can delete a recorded word from Lili. It was already uploaded to Wikimedia Commons by mistake. The recorded Marathi word is 'कालका', which I want to delete. Thanks in advance.

Hi Aparna Gondhalekar, there are two options depending on whether "कालका" exists. If "कालका" exists but you recorded it badly, then you just need to record it again and the new recording will replace the previous one. If "कालका" does not exist, we need to delete the file directly on Wikimedia Commons. Pamputt (talk) 21:18, 9 March 2021 (UTC)

Wikimania 2021

It's not a big surprise, but it has been confirmed: Wikimania_2021 will be online only. It will limit our outreach. We used to go there and record 10~20 languages, give 5-minute demos to 30 people, and run workshops for 40+ others. We also had plenty of small chats (100+) raising awareness about Lili and connecting with devs for fast discussions. We will need to find another way this year too. Yug (talk) 21:34, 9 March 2021 (UTC)

Return with Return

So, we are back. After almost 50 days, we are back to work. Thanks to User:VIGNERON, User:Yug, User:Pamputt etc. who were around. Let's make some noise.

Idea: I have an idea, can you record the word "Return" or "Come back" (or something similar) in your language and put it in the gallery below? Please mention the language name, and meaning in the caption. --টিটো দত্ত (Titodutta) (কথা) 02:09, 23 April 2021 (UTC)

"Return/Come back" as in "LinguaLibre is back", en:The Lord of the Rings: The Return of the King (70 languages) or en:Return of the Jedi (63), right ? Titodutta, please provide some examples / context. Yug (talk) 04:58, 23 April 2021 (UTC)

Return Gallery

Translate doesn't seem to work

I can't seem to be able to translate pages, is this an error on my part or is there something wrong with the servers? --Sabelöga (talk) 17:01, 23 April 2021 (UTC)

Indeed, something is broken. There is a Phabricator ticket to track this issue. Pamputt (talk) 18:30, 23 April 2021 (UTC)
Okay, thank you. --Sabelöga (talk) 22:01, 23 April 2021 (UTC)
Hello Pamputt, I tried to translate several pages from the Wiki directly, to test, taking inspiration from the T:xx translation markers (example: https://lingualibre.org/wiki/Translations:Help:Main/14/fr). An error occurs, always the same. I added a line in your task, notifying Tgr who may be interested. He may add the tag of the "OAuthAuthentication" project. Cordially. —Eihel (talk) 14:31, 25 April 2021 (UTC)

Translation error

Translations are back. Thanks. Pamputt (talk) 18:54, 27 April 2021 (UTC)

HIGH PRIORITY: Audio recordings have dust and clicks

Under investigation: Some users experience parasitic saturation (“Pock!”) or dust while others don't. This irregular occurrence is reminiscent of the earlier, unsolved “speed up bug”.

Discussion

I've had friends record German and Romanian lists. They're using separate hardware, and have recorded thousands of words before, so I know their hardware is fine. The recordings they've done today suffer from loud clicks on half the recordings, so there seems to be a problem with the recording studio. I clearly have no idea what the problem is or how to fix it, but I hope someone else will!

Here are examples:

  • — LL-Q188_(deu)-Natschoba-der_Wunsch.wav
  • — LL-Q7913_(ron)-Andreea_Teodoraa-muscă.wav
  • — LL-Q150 (fra)-Hélène (Hsarrazin)-corné.wav

Julien Baley (talk) 16:24, 24 April 2021 (UTC)

I have the same problem. DSwissK (talk) 17:49, 24 April 2021 (UTC)
Hmm, very annoying. I've opened a Phabricator ticket. I hope the issue will be fixed soon. Pamputt (talk) 18:38, 24 April 2021 (UTC)
HIGH priority. No idea who can fix it. Can someone refine the diagnosis ? Can more people test with their configuration and report here ? Yug (talk) 15:33, 25 April 2021 (UTC)
I notified Mr. Vion, the original coder of the JS recorder. He may have some insights. I suspect it's a bug with either :
  • RecordWizard (studio), the mw extension interfacing the user speaking and the audio processing layers. It got recent changes due to migration to mw 1.35.
  • LinguaRecorder JS, the core JS library processing audio signal. No changes in past week.
Recent changes may have affected how the audio cuts are done. Either mw extension or the JS could need a fix.
This is a core bug preventing LinguaLibre core mission. Any insight is welcome. Yug (talk) 15:43, 25 April 2021 (UTC)
So der Wunsch (Q522922) (deu:der_Wunsch), muscă (Q522753) (ron:muscă) and corné (Q523386) (fra:corné). —Eihel (talk) 17:26, 25 April 2021 (UTC)
@Eihel the 1st and 3rd ones sound good to me. Yug (talk) 20:38, 25 April 2021 (UTC)
@Yug the 1st and 3rd ones do not sound good to me, there's a clear click on the "der" and "cor". If you have populated the table below, perhaps your numbers are too optimistic (if we have a different judgement on these three). Julien Baley (talk) 12:56, 26 April 2021 (UTC)
@Julien Baley, DSwissK, & Eihel
I reviewed recent recordings of 4 users.
  • Two contributors have perfect audios (100% good on 8 audios checked for each user).
  • Two new users have the bug (30% of audios with saturation).
I first thought it could be new users not using their hardware properly: the microphone must not be overly sensitive, we should not let it vibrate, etc. It's know-how we pass on when doing IRL workshops and that tech-friendly people fix quickly. Self-taught users have not been warned of this.
But it does not explain why experienced users such as DSwissK and Julien's friend have such noise. So I am confused.
DSwissK, did you try alternative microphone settings, with lower volume? Are you sure you have not recently been speaking louder, or made a change you did not notice before? Yug (talk) 22:02, 25 April 2021 (UTC)
Hello Yug, I concede that the difference may be minimal on some records. You have to listen carefully, it's like "a diamond on a vinyl which jumps on a dust". Some files are more affected than others (depending on the vocal intonation), but all of the ones I have cited are problematic. To fully understand, you can try recording with Schtooka (former LiLi), then immediately redo the same recording on LiLi. As I said to Hélène, you can also compare with an existing recording corné (Q499309). Cordially. —Eihel (talk) 15:12, 26 April 2021 (UTC)
@Eihel & Julien Baley I am officially deaf in one ear so I am not the best judge of audios. I pushed the review as far as I can, but could other users help review more audios so Mr. Vion can attack this investigation with clean clues and ratios? Yug (talk) 16:15, 26 April 2021 (UTC)
@Yug I'm very happy to help review some recordings, if you want; could you suggest a list of users? (I don't know how to find users that have recently recorded). Julien Baley (talk) 17:41, 26 April 2021 (UTC)
@Julien Baley process added below. Thank you ! Note: the users I reviewed (all those below) may have a higher noise ratio since I don't have a musical ear. Yug (talk) 16:56, 26 April 2021 (UTC)
@Yug ; I've checked the entire table and added a few people (Hsarazin has only 1 recent recording, so I've amended the "14" that was shown). Some people have 0% problem, some close to 100%... the problems are very characteristic. Julien Baley (talk) 19:25, 26 April 2021 (UTC)
@Pamputt & DSwissK & others, I really need help on this one. We need to review and report 10+ recordings for each user uploading audios to Commons, and likely send a custom message to each affected user, on their talk page and on their Commons talk page (ex msg, ex ping). Yug (talk) 16:36, 26 April 2021 (UTC)
@Yug not fully helpful but I added a section on LinguaLibre:Stats#The most prolific speakers for the current month, it may help to narrow down to who did recent recordings. Cheers, VIGNERON (talk) 07:20, 27 April 2021 (UTC)

/!\ The dust bug issue is confirmed as core and relatively widespread. I sent an email this morning to Wikimedia France (Adelaide, Remy, Michael) with suggested solutions: immediately, restoring a sitenotice ribbon to inform our users; in the short term, hiring Vion for analysis and possibly a fix. We should not be claiming to be back online and on our feet when we aren't. Yug (talk) 14:09, 27 April 2021 (UTC)

Good. The CSS fixes have been deployed. → Sitenotice is back. → Indentation is back. Yug (talk) 14:11, 27 April 2021 (UTC)
@WikiLucas00 & DSwissK hi,
Given you are the two active users having this issue we need you most.
Could you record 15~30 other audios with another web browser, such as Firefox or another one, then report the result with this ?
If you have any other hypothesis to test I am interested. (Changing microphones, etc.) Yug (talk) 18:23, 27 April 2021 (UTC)
I had the impression (and DSwissK confirmed on Discord) that using Firefox slightly reduces the amount of problems encountered. — WikiLucas (🖋️) 19:53, 27 April 2021 (UTC)
Yup, I installed Firefox and could finally send some more audios (me and my daughter), with internal microphone on my laptop. Please review. DSwissK (talk) 00:45, 28 April 2021 (UTC)

Limiting the number of words to record

@Yug, DSwissK, VIGNERON, Seb35, Pamputt, & Titodutta I think that one important cause of the bugs is related to the RAM. Thus, loading a long list into the Record Wizard results in a maximum amount of bugs in the recordings (the length of this list -- its weight -- may vary, depending on the user's hardware and software).

I think we should try limiting (to 100 or 200 maximum) the number of words that can be put into the Record Wizard, at least temporarily. There is no point in loading lists that are 1000 words long into the RW; taking a little break during the recording is never wrong, and it could help reduce the amount of bugs for the moment, while we try to find the source of the issue.
Best — WikiLucas (🖋️) 19:53, 27 April 2021 (UTC)

We have to test this hypothesis. Yug (talk) 21:35, 27 April 2021 (UTC)
Tested and reporting : I used very small lists (less than 10 words) and still have the same issue. I encounter that bug on my smartphone, both my computers (desktop and laptop) under Chrome (latest version). Using internal or external microphone doesn't change anything. DSwissK (talk) 00:42, 28 April 2021 (UTC)
@DSwissK thank you. This is helpful. It clearly seems to be a software issue. I contacted Wikimedia France and Vion, requesting them to jump in.
We need people with audio software skills to inspect those audios and people with JS+audio skills to review the audio input chains. Mr. Vion has both skills. Yug (talk) 10:52, 28 April 2021 (UTC)
I do not think it's RAM related.
Even with 1000 words we are dealing with 1000 words x 7 KB per file = 7 MB.
Let's assume the browser stores the words in a very, very detail-rich way, so the files are 1000 times heavier. We are still at 7 GB.
Most computers have 8~16 GB of RAM by now.
I also recorded a small list and apparently had the issue.
Most (all?) affected users had recorded a few dozen words. Worst affected users: Natschoba → 149, Andreea Teodoraa → 247, WikiLucas00 → 64.
All but 3 users this month have recorded fewer than 300 words. Yug (talk) 11:02, 28 April 2021 (UTC)
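The back-of-the-envelope estimate above can be written out as follows. The 7 KB per file and the x1000 overhead factor are the figures given in the message; the function name `estimate_kb` and the use of decimal units (1 MB = 1000 KB) are choices made for this sketch only.

```shell
#!/bin/sh
# estimate_kb WORDS KB_PER_FILE OVERHEAD
# Print the estimated memory footprint in KB (integer arithmetic).
estimate_kb() {
    echo $(( $1 * $2 * $3 ))
}

words=1000        # list length discussed above
kb_per_file=7     # approximate size of one recording, from the message above

plain=$(estimate_kb "$words" "$kb_per_file" 1)
inflated=$(estimate_kb "$words" "$kb_per_file" 1000)   # pessimistic x1000 overhead

echo "plain:    $plain KB (about $(( plain / 1000 )) MB)"
echo "inflated: $inflated KB (about $(( inflated / 1000000 )) GB)"
```

The pessimistic 7 GB figure only applies to a 1000-word list; the estimate scales linearly, so with the same overhead factor a 247-word session would come to roughly a quarter of that.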

Review process

To review recordings by another user :

  1. Go to Special:RecentChanges > Find recent recordings > Pick a user who is not already in the table below
  2. Open 10~20 of this user's recent recordings > Listen to each > Count how many have unusual audio artifacts
  3. Add this user to the table below with the associated results and your comment
  4. If you feel it is necessary, please notify the user on Lili (ex msg) and ping the user on Commons (ex ping)

To be reviewed :

  1. With your usual web browser, go to Record Wizard (studio) > Step 3, enter your web browser name then 15 words in your language > Record, publish.
  2. Come on LinguaLibre:Chat room#Reviews > Post a message with your web browser, its version [optional], and your OS.

To be reviewed, recording with another browser or device :

  1. With another web browser or device, go to Record Wizard (studio) > Step 3, enter your web browser name then 15 words in your language > Record, publish.
  2. Come on LinguaLibre:Chat room#Reviews > Post a message with your web browser, its version [optional], and your OS.
  3. Add some information so we know which of your recordings are associated with this alternative browser or device.

Review-ready

  • I recorded 10+ audios with Chrome 89.0.4389.114 (Official Build) (64-bit) : all good for me, no review needed. Yug (talk) 14:35, 27 April 2021 (UTC)
@Yug Could you try 20 more with an up-to-date version of Chrome? — WikiLucas (🖋️) 18:38, 27 April 2021 (UTC)
@WikiLucas00 Done. I am not sure, but I may have the bug as well. Yug (talk) 19:42, 27 April 2021 (UTC)
@Yug The majority of your last recordings contain at least a click. — WikiLucas (🖋️) 19:56, 27 April 2021 (UTC)

Samples

Under investigation: Some contributors experience parasitic saturation (“Pock!”) or dust while others don't.
Please review your recent recordings and help expand the table below so we can identify a recurring pattern among affected contributors vs non-affected ones.
Username | # reviewed | % affected | Example file | Web browser + version | Comment
User:DSwissK | 15 | 33% (5) | | |
User:Natschoba | 20 | 95% (19) | | | Several thousands of recordings before. No hardware change.
User:Andreea Teodoraa | 11 | 75% (8) | | | Several thousands of recordings before. Tried different mics and platforms, same behaviour.
User:GeoMechain | 15 | 0% (0) | | |
User:ClasseNoes | 15 | 0% (0) | | |
User:Hsarrazin | 14 | 30% (4) | | |
User:ᱥᱟᱹᱜᱩᱱ ᱗ | 2 | 100% (2) | | | Only 2 audios.
User:Zoyahssn | 2 | 100% (2) | File:LL-Q1860 (eng)-Md Anan Islam (Zoyahssn)-Md Anan Islam.wav | | Suspects: Hardware & sound setting issue
User:Olaf | 15 | 0% (0) | | | All recent recordings ok.
User:WikiLucas00 | 60 | 75% (45) | | Brave 1.23.73 (Chromium: 90.0.4430.85) | See my 2021-04-26 10pm series
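The "% affected" column is the number of affected recordings divided by the number reviewed. A small sketch for recomputing it when adding rows, assuming input lines of the form `user reviewed affected` with no spaces in the user name; the function name `affected_ratio` is made up here and the counts in the demo are copied from the table:

```shell
#!/bin/sh
# affected_ratio: read "user reviewed affected" lines on stdin and print
# "user affected/reviewed percent%", the percentage rounded to an integer.
# Lines without exactly three fields, or with zero reviewed, are skipped.
affected_ratio() {
    awk 'NF == 3 && $2 > 0 { printf "%s %d/%d %.0f%%\n", $1, $3, $2, 100 * $3 / $2 }'
}

# Counts copied from the table (user, reviewed, affected).
affected_ratio <<'EOF'
DSwissK 15 5
Natschoba 20 19
Olaf 15 0
WikiLucas00 60 45
EOF
```

Computing the ratio from the raw counts keeps the rows comparable when different reviewers round by hand.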