LinguaLibre

Technical board/Audio click bug

Revision as of 20:30, 8 September 2023 by Adithyak1997 (talk | contribs) (Corrected typo)

All discussions related to this audio click bug are gathered here. See also phabricator:T281041.

HIGH PRIORITY: Audio recordings have dust and clicks

phab:T281041. Moved from LinguaLibre:Chat_room/Archives/2021.
Under investigation: Some users experience parasitic saturation (“Pock!”) or dust while others don't. This irregular occurrence is reminiscent of the earlier, unsolved “speed up bug”.

I've had friends record German and Romanian lists. They're using separate hardware, and have recorded thousands of words before, so I know their hardware is fine. The recordings they've done today suffer from loud clicks on half the recordings, so there seems to be a problem with the recording studio. I clearly have no idea what the problem is or how to fix it, but I hope someone else will!

Here are examples:

  • commons (rerecorded)— LL-Q188_(deu)-Natschoba-der_Wunsch.wav
  • commons (rerecorded) — LL-Q7913_(ron)-Andreea_Teodoraa-muscă.wav
  • commons (rerecorded) — LL-Q150 (fra)-Hélène (Hsarrazin)-corné.wav

Julien Baley (talk) 16:24, 24 April 2021 (UTC)

I have the same problem. DSwissK (talk) 17:49, 24 April 2021 (UTC)
Hmm, very annoying. I've opened a Phabricator ticket. I hope the issue will be fixed soon. Pamputt (talk) 18:38, 24 April 2021 (UTC)
HIGH priority. No idea who can fix it. Can someone refine the diagnosis? Can more people test with their configuration and report here? Yug (talk) 15:33, 25 April 2021 (UTC)
I notified Mr. Vion, the original coder of the JS recorder. He may have some insights. I suspect it's a bug with either:
  • RecordWizard (studio), the MediaWiki extension that interfaces between the speaking user and the audio processing layers. It got recent changes due to the migration to MediaWiki 1.35.
  • LinguaRecorder JS, the core JS library processing the audio signal. No changes in the past week.
Recent changes may have affected how the audio cuts are done. Either the MediaWiki extension or the JS library could need a fix.
This is a core bug blocking Lingua Libre's core mission. Any insight is welcome. Yug (talk) 15:43, 25 April 2021 (UTC)
So der Wunsch (Q522922) (deu:der_Wunsch), muscă (Q522753) (ron:muscă) and corné (Q523386) (fra:corné). —Eihel (talk) 17:26, 25 April 2021 (UTC)
@Eihel the 1st and 3rd ones sound good to me. Yug (talk) 20:38, 25 April 2021 (UTC)
@Yug the 1st and 3rd ones do not sound good to me, there's a clear click on the "der" and "cor". If you have populated the table below, perhaps your numbers are too optimistic (if we have a different judgement on these three). Julien Baley (talk) 12:56, 26 April 2021 (UTC)
@Julien Baley, DSwissK, & Eihel
I reviewed recent recordings of 4 users.
  • Two contributors have perfect audios (100% good on 8 audios checked for each user).
  • Two new users have the bug (30% of audios with saturation).
I first thought it could be new users not using their hardware properly: the microphone must not be overly sensitive, it should not be allowed to vibrate, etc. This is know-how we pass on during in-person workshops, and tech-friendly people pick it up quickly. Self-taught users have not been warned of this.
But that does not explain why experienced users such as DSwissK and Julien's friend have such noise. So I'm confused.
DSwissK, did you try alternative microphone settings, with lower volume? Could you be speaking louder recently, or could something else have changed that you did not notice? Yug (talk) 22:02, 25 April 2021 (UTC)
Hello Yug, I concede that the difference may be minimal on some records. You have to listen carefully, it's like "a diamond on a vinyl which jumps on a dust". Some files are more affected than others (depending on the vocal intonation), but all of the ones I have cited are problematic. To fully understand, you can try recording with Schtooka (former LiLi), then immediately redo the same recording on LiLi. As I said to Hélène, you can also compare with an existing recording corné (Q499309). Cordially. —Eihel (talk) 15:12, 26 April 2021 (UTC)
@Eihel & Julien Baley I'm officially deaf in one ear, so I'm not the best judge of audio. I pushed the review as far as I can, but could other users help review more audios so that Mr. Vion can start this investigation with clean clues and ratios? Yug (talk) 16:15, 26 April 2021 (UTC)
@Yug I'm very happy to help review some recordings, if you want; could you suggest a list of users? (I don't know how to find users that have recently recorded). Julien Baley (talk) 17:41, 26 April 2021 (UTC)
@Julien Baley process added below. Thank you! Note: the users I reviewed (all those below) may have a higher noise ratio than I reported, since I don't have a musical ear. Yug (talk) 16:56, 26 April 2021 (UTC)
@Yug; I've checked the entire table and added a few people (Hsarrazin has only 1 recent recording, so I've amended the "14" that was shown). Some people have 0% problems, some close to 100%... the problems are very characteristic. Julien Baley (talk) 19:25, 26 April 2021 (UTC)
@Pamputt & DSwissK & others, I really need help on this one. We need to review and report 10+ recordings for each user uploading audios to Commons, and will likely need to send a custom message to each affected user, on their talk page and on their Commons talk page (ex msg, ex ping). Yug (talk) 16:36, 26 April 2021 (UTC)
@Yug not fully helpful but I added a section on LinguaLibre:Stats#The most prolific speakers for the current month, it may help to narrow down to who did recent recordings. Cheers, VIGNERON (talk) 07:20, 27 April 2021 (UTC)

/!\ The dust bug issue is confirmed as core and relatively widespread. I sent an email this morning to Wikimedia France (Adelaide, Remy, Michael) with suggested solutions: immediately, restoring a sitenotice ribbon to inform our users; short term, hiring Vion for analysis and possibly a fix. We should not be claiming to be back online and on our feet when we aren't. Yug (talk) 14:09, 27 April 2021 (UTC)

Good. The CSS fixes have been deployed. → Sitenotice is back. → Indentation is back. Yug (talk) 14:11, 27 April 2021 (UTC)
@WikiLucas00 & DSwissK hi,
Given you are the two active users having this issue, we need you most.
Could you record 15~30 more audios with another web browser, such as Firefox, then report the result with this?
If you have any other hypothesis to test, I'm interested. (Changing microphones, etc.) Yug (talk) 18:23, 27 April 2021 (UTC)
I had the impression (and DSwissK confirmed on Discord) that using Firefox slightly reduces the amount of problems encountered. — WikiLucas (🖋️) 19:53, 27 April 2021 (UTC)
Yup, I installed Firefox and could finally send some more audios (me and my daughter), with internal microphone on my laptop. Please review. DSwissK (talk) 00:45, 28 April 2021 (UTC)
@Yug I checked with Andreea_Teodoraa and Natschoba what browsers they're using: Chrome and Safari. I asked Andreea_Teodoraa to try Firefox; she did 22 recordings (https://commons.wikimedia.org/wiki/Special:ListFiles?limit=20&user=Andreea+Teodoraa): 20 are clearly perfect, and for 2 (însene and "pe scurt") I feel I hear a problem but cannot see anything in Audacity. Considering we were at 75% bugged on Chrome, this seems to be a move in the right direction. Julien Baley (talk) 02:33, 30 April 2021 (UTC)
@Yug I have tried with another friend (https://commons.wikimedia.org/w/index.php?title=Special:ListFiles&limit=100&user=LangPao) and everything sounds bug-free, on both Chrome and Firefox (the Firefox recordings are the 10 most recent). Julien Baley (talk) 13:11, 30 April 2021 (UTC)
(Answered below on 15:16, 4 May 2021 Yug (talk) 15:48, 4 May 2021 (UTC))

I think that could raise your interest : same smartphone, same internal microphone, same list (1 word). The only difference is using Chrome and Firefox version. DSwissK (talk) 19:20, 1 May 2021 (UTC)

@Julien Baley & DSwissK thanks to you both. The recent A/B testing, where only one parameter is changed, is what we are looking for. Testing the same users with different browsers seems fruitful. Thanks also to Julien for your Audacity inspections; our dev will eventually have to dig into that.
@DSwissK, from your 2 examples I mainly see a difference in volume (dB). It may be nothing, but when reviewing audios I also noticed that many seemed to have a low dB level. Could it be that Chrome changed its default audio recording levels, which increases the presence of noise? In that case other projects like Forvo (fake open license) and others should also be affected.
Anyway, if a recent Chrome version is faulty, maybe we could recommend using Firefox for a while. Yug (talk) 15:16, 4 May 2021 (UTC)
@Yug there is indeed a difference in volume but the problem is not the noise but the clicks. There is more noise in the Firefox version, but it isn't disturbing. At least, not as much as these clicks... DSwissK (talk) 18:29, 4 May 2021 (UTC)
Is there any chance it is related to the versions of Firefox or Chrome? I guess people upgraded their browser versions in recent months. If I understand correctly there were a few issues before the OVH fire; perhaps more people have upgraded since. (Personally I hardly hear the issue except when there is a loud click; I don’t have an ear as developed as others here.) Seb35 (talk) 21:05, 4 May 2021 (UTC)

I reinstalled the LinguaRecorder demo on https://lingualibre.org/demo/sandbox.html with the settings identical to the RecordWizard extension (on the gear on the 'Studio' (4th) step and here in the PHP+JS code). You can play with the settings, perhaps there is something to move around the saturation? (You have to click on "Apply new options" then "start" when you change one, and the "ready" counter should be incremented.) Seb35 (talk) 20:54, 4 May 2021 (UTC)

Limiting the number of words to record

@Yug, DSwissK, VIGNERON, Seb35, Pamputt, & Titodutta I think that one important cause of the bugs is related to the RAM. Thus, loading a long list into the Record Wizard results in a maximum amount of bugs in the recordings (the length of this list -- its weight -- may vary, depending on the user's hardware and software).

I think we should try limiting the number of words that can be loaded into the Record Wizard (to 100 or 200 maximum), at least temporarily. There is no point in loading 1000-word lists into the RW; taking a little break during recording is never wrong, and it could help reduce the number of bugs for the moment, while we try to find the source of the issue.
Best — WikiLucas (🖋️) 19:53, 27 April 2021 (UTC)

We have to test this hypothesis. Yug (talk) 21:35, 27 April 2021 (UTC)
Tested and reporting : I used very small lists (less than 10 words) and still have the same issue. I encounter that bug on my smartphone, both my computers (desktop and laptop) under Chrome (latest version). Using internal or external microphone doesn't change anything. DSwissK (talk) 00:42, 28 April 2021 (UTC)
@DSwissK thank you. This is helpful. It clearly seems to be a software issue. I contacted Wikimedia France and Vion, requesting them to jump in.
We need people with audio software skills to inspect those audios and people with JS+audio skills to review the audio input chains. Mr. Vion has both skills. Yug (talk) 10:52, 28 April 2021 (UTC)
I do not think it's RAM related.
Even with 1000 words we are dealing with 1000 words x 7 KB per file = 7 MB.
Let's assume the browser stores the words in a very, very detail-rich way, so the files are 1000 times heavier. We are still at only 7 GB.
Most computers have 8~16 GB of RAM by now.
I also recorded a small list and apparently had the issue.
Most (all?) affected users had recorded only a few dozen words. Worst affected users: Natschoba → 149, Andreea Teodoraa → 247, WikiLucas00 → 64.
All but 3 users this month have recorded fewer than 300 words. Yug (talk) 11:02, 28 April 2021 (UTC)
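A quick way to sanity-check the estimate above is to redo the arithmetic under more than one assumption. The sketch below is illustrative only: the 7 KB figure is the one quoted in this comment, while the raw-PCM parameters (48 kHz, 32-bit float, mono, 2 seconds per word) and the function names are assumptions made for the example, not measured values from LinguaRecorder or the Record Wizard.

```python
def estimate_mb(words, bytes_per_word):
    """Total size in megabytes (10**6 bytes) for a list of recorded words."""
    return words * bytes_per_word / 1_000_000


def raw_pcm_bytes(seconds, sample_rate=48_000, bytes_per_sample=4, channels=1):
    """Size of uncompressed audio; all default parameters are assumptions."""
    return int(seconds * sample_rate * bytes_per_sample * channels)


if __name__ == "__main__":
    words = 1000
    # Figure quoted in the discussion: about 7 KB per recorded word.
    print("7 KB per word:", estimate_mb(words, 7_000), "MB")
    # Hypothetical worst case from the discussion: 1000 times heavier.
    print("1000x heavier:", estimate_mb(words, 7_000 * 1000), "MB")
    # Assumed raw buffer: 2 s of mono 32-bit float audio at 48 kHz per word.
    print("raw PCM, 2 s per word:", estimate_mb(words, raw_pcm_bytes(2.0)), "MB")
```

Under these assumed raw-PCM parameters the total lands between the two figures discussed above, so the real per-word size is worth measuring before ruling RAM in or out.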
Folks, I inspected our GitHub code:
I can't find a clear recent change which could have affected our audio recording stream.
@VIGNERON & Seb35 are you aware of any (environmental) change which could have affected the audio stream of RecordWizard recently? Yug (talk) 07:57, 29 April 2021 (UTC)
I am still in the process of properly publishing code from the server to Github and Gerrit for the various extensions, but there is indeed no change related to audio.
Specifically, LinguaRecorderJS is exactly what was installed in 1.31 and in 1.35, no change here (on the server there is only a micro-instruction to register LinguaRecorderJS in the MediaWiki environment)
For the RecordWizard, main changes are maintenance, a technical thing about serialization of Wikibase items, and related to interface (vue.js, which changed from 2.6.11 to 2.6.12, which is mainly a security release).
Seb35 (talk) 19:46, 4 May 2021 (UTC)

@VIGNERON, Seb35, Pamputt, Yug, & Poslovitch
Update: Another user (Le Commissaire) reported an audio bug (on the WMFr Discord server). This was not the "click"/"pop" bug but the speeding-up bug; the user said it occurred when loading a list of 1000 words into the RW. I suggested he try loading a shorter list; he tried with 250 words and it worked fine, no issue. This is another clue that RAM matters and that long lists are a problem for several users in the RW.
In addition to a potential limitation of the RW to 350 words (for example), see this related ticket:

  • T276014, Feature request to be able to load parts of lists in RW (only possible for Categories at the moment)


Best — WikiLucas (🖋️) 15:09, 6 May 2021 (UTC)

Worth investigating. I assumed 7 kB per word, but the audio stream could be completely different from my assumption. The natural path would be to call back Mr. Vion or User:0x010C to investigate (neither currently active), or to dive into LinguaRecorderJS, the browser's memory, and RAM. Maybe more. Yug (talk) 18:41, 6 May 2021 (UTC)

Review process

To review recordings by another user :

  1. Go to Special:RecentChanges > Find recent recordings > Pick a user who is not already in the table below
  2. Open 10~20 of this user's recent recordings > Listen to each > Count how many have unusual audio artifacts
  3. Add this user to the table below with the associated results and your comment
  4. If you feel it is necessary, please notify the user on Lili (ex msg) and ping the user on Commons (ex ping)
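
To complement listening by ear in step 2, a reviewer could pre-screen downloaded files for abrupt sample-to-sample jumps, which is roughly what a click looks like in a waveform editor. The following is a minimal sketch, not a tool used in this discussion: it assumes 16-bit PCM WAV input, and the function names and the threshold value are hypothetical and would need tuning against known-bad recordings such as the examples listed at the top of this page.

```python
import array
import sys
import wave


def read_mono_16bit(path):
    """Return the first channel of a 16-bit PCM WAV file as a list of ints."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return list(samples[::channels])


def find_jumps(samples, threshold=12000):
    """Indices where the value changes by more than `threshold` in one step.

    The threshold is an assumed starting point, not a calibrated value.
    """
    return [
        i
        for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) > threshold
    ]
```

A file with no flagged index is not guaranteed to be clean, and loud plosives may be flagged by mistake, so this can only narrow down which recordings to listen to first.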

To be reviewed :

  1. With your usual web browser, go to Record Wizard (studio) > Step 3, enter your web browser name then 15 words in your language > Record, publish.
  2. Go to LinguaLibre:Chat room#Reviews-ready > Post a message with your web browser, its version [optional], and your OS.

To be reviewed, recording with another browser or device :

  1. With the alternative web browser or device, go to Record Wizard (studio) > Step 3, enter that web browser's name then 15 words in your language > Record, publish.
  2. Go to LinguaLibre:Chat room#Reviews-ready > Post a message with that web browser, its version [optional], and your OS.
  3. Add some information so we know which of your recordings are associated with this alternative browser or device.

Review-ready

  • I recorded 10+ audios with Chrome 89.0.4389.114 (Official Build) (64-bit) : all good for me, no review needed. Yug (talk) 14:35, 27 April 2021 (UTC)
@Yug Could you try 20 more with an up-to-date version of Chrome? — WikiLucas (🖋️) 18:38, 27 April 2021 (UTC)
@WikiLucas00 Done. I'm not sure, but I may have the bug as well. Yug (talk) 19:42, 27 April 2021 (UTC)
@Yug The majority of your last recordings contain at least a click. — WikiLucas (🖋️) 19:56, 27 April 2021 (UTC)
Username # reviewed % affected Example file Browser + version OS + Version Bug within the RecordWizard Review ? Bug After upload to Commons ? Comment
c User:DSwissK 15 33% (5) New echo bug?
c User:Natschoba 20 95% (19) Several thousands of recordings before. No hardware change.
c User:Andreea Teodoraa 11 75% (8) Several thousands of recordings before. Tried different mics and platforms, same behaviour.
c User:GeoMechain 15 0% (0)
c User:ClasseNoes 15 0% (0)
c User:Hsarrazin 14 30% (4)
c User:ᱥᱟᱹᱜᱩᱱ ᱗ 2 100% (2) Only 2 audios.
c User:Olaf 15 0% (0) All recent recordings ok. (I have these clicks in every recording session, but I remove all such occurrences during the review phase. That is the only reason it's 0%. Olaf (talk) 23:44, 1 May 2021 (UTC))
c User:WikiLucas00 60 75% (45) Brave 1.23.73 (Chromium: 90.0.4430.85) See my 2021-04-26 10pm CEST series
c User:WikiLucas00 300 0% (0) All files are OK Firefox 88.0.1, External microphone Perfectly fine. See my 2021-05-06 9am CEST series
c User:Le Commissaire ?? ?% (?) Opera, Desktop Computer, External microphone Speed-up bug occurred when loading a 1000-words-long list into RW. Tried with loading only 250 words and recording again, went fine.
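
The "% affected" cell in the table above can be recomputed from the two counts a reviewer collects. This is a minimal sketch with a hypothetical helper name; the sample counts are taken from rows of the table, and rounding to the nearest whole percent is an assumption about how the cells were filled in.

```python
def affected_cell(reviewed, affected):
    """Format a '% affected' cell as 'NN% (count)' from two counts."""
    if reviewed <= 0:
        return "?% (?)"  # nothing reviewed yet, as in the last row of the table
    if not 0 <= affected <= reviewed:
        raise ValueError("affected must be between 0 and reviewed")
    percent = round(100 * affected / reviewed)
    return f"{percent}% ({affected})"


if __name__ == "__main__":
    # (reviewed, affected) pairs copied from rows of the table above.
    for reviewed, affected in [(15, 5), (20, 19), (60, 45), (15, 0)]:
        print(reviewed, "reviewed ->", affected_cell(reviewed, affected))
```

Recomputing from the counts can differ slightly from a hand-entered value: for instance, 8 affected out of 11 reviewed comes out at 73%.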

Audio click bug (June 2022)

Investigating the audio click bug is a major priority for Lingua Libre's growth: the bug cannot be fixed until its cause has been identified.

I was able to reliably reproduce the audio click bug at Wikimedia France office, on their smartphones:

  • Samsung Galaxy A51 (SM-A515F- Build/RP1A.200720.012)
  • Android 11 (updated May 2022)
  • Chrome 102.0.5005

Those phones are still in their office, and could be provided to a developer investigating this bug.


Yug (talk) 14:15, 9 June 2022 (UTC)

Speed up bug

phab:T256663

New Sign language videos not displaying properly

Done: see meta:Lingua_Libre/SignIt/2023/Phase_1
phab:T312554 and discussion on Commons:Commons:Village pump#New Sign language videos not displayed properly.

Audio click bug solved ?

Record Wizard update! The audio click bug should now be solved. Wikimedia France hired 0x010C to rewrite the LinguaRecorder library, and he has found and fixed what was causing the click bug. Please see here for technical details: https://github.com/lingua-libre/LinguaRecorder/commit/88444f2d

We now need your help to test it! Please check all new audio recordings you make from today onwards for any glitches, clicks, parasitic sounds or data loss. Please do so before and after uploading them to Commons. Use the table below to list your findings. Many thanks in advance! -Adélaïde Calais WMFr (talk) 09:38, 8 September 2023 (UTC)


Testing
Username # reviewed % affected Example file Bug within the RecordWizard Review ? Bug After upload to Commons ? Comment


If you find any other bugs, please report them below :

