LinguaLibre

Chat room/Archives/2020

Revision as of 11:42, 1 December 2020 by WikiLucas00 (talk | contribs) (Archiving of posts older than 6 months)

Custom Commons filename

Done

Sorry if this is obvious.

How can I upload files to Commons with my preferred file name?

For example File:LL-Q1860 (eng)-Commander Keane-phonate.wav should ideally be named File:En-au-phonate.ogg. Regards --Commander Keane (talk) 05:37, 21 January 2020 (UTC)

Hi, the files are named this way to allow several people to record the same word. Thus, it is possible to have several recordings of "phonate" from different places in Australia (and other countries), and also different pronunciations from the same location (gender, age, ...). One way to avoid reusing the same file name is to include the username in the file name. Pamputt (talk) 06:58, 21 January 2020 (UTC)
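As a rough illustration of the naming scheme described above, the sketch below splits a Lingua Libre file name into its parts. The pattern is inferred only from the single example in this thread (LL-Q1860 (eng)-Commander Keane-phonate.wav); the function name and the field names are hypothetical, and real file names may not all follow this shape.

```python
import re
from typing import NamedTuple, Optional


class LLName(NamedTuple):
    language_qid: str  # e.g. "Q1860"
    iso_code: str      # three-letter code in parentheses, e.g. "eng"
    speaker: str       # username of the speaker
    word: str          # recorded word
    extension: str     # file extension without the dot


# Assumed layout: LL-<Qid> (<iso>)-<speaker>-<word>.<ext>
_PATTERN = re.compile(r"^LL-(Q\d+) \(([a-z]{3})\)-(.+?)-(.+)\.(\w+)$")


def parse_ll_filename(name: str) -> Optional[LLName]:
    """Return the parts of a Lingua Libre style file name, or None if it does not match."""
    if name.startswith("File:"):
        name = name[len("File:"):]
    match = _PATTERN.match(name)
    # Caveat: a hyphen inside the username makes the speaker/word split ambiguous.
    return LLName(*match.groups()) if match else None


print(parse_ll_filename("File:LL-Q1860 (eng)-Commander Keane-phonate.wav"))
```

Because the speaker name is part of the file name, two speakers recording the same word in the same language never collide.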
Is it possible to have my dialect (AU) specified on the Commons file description page?--Commander Keane (talk) 03:54, 23 January 2020 (UTC)
@Commander Keane a dialect is a language without an army. Most dialects actually have a dedicated ISO 639-3 code. In your case it seems more like an accent. Your files will be tagged with you as the author, as well as with the linguistic properties defined on your speaker profile (mainly name, gender, place of learning). As for tagging your files with AU, it would require a bot. Bots are simple programs that perform automated actions via the mw:API, possibly on Commons, to enrich the files' pages. You are not the first to request such a feature, so it may come one day. Yug (talk) 18:48, 22 September 2020 (UTC)
See mw:API:Edit and mw:API:Get for what you want; there are Python and JavaScript routes. Yug (talk) 05:47, 23 September 2020 (UTC)
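To make the bot idea a little more concrete, here is a minimal sketch of the request parameters such a bot might send to the MediaWiki action API to append a line to a file description page. It only builds the parameter dictionary and sends nothing. The parameter names (action=edit, title, appendtext, summary, bot, token, format) are standard mw:API:Edit parameters; the category name, the helper name and the token value are placeholders, and a real bot would first log in and fetch a CSRF token.

```python
from typing import Dict


def build_append_edit(title: str, text: str, summary: str, csrf_token: str) -> Dict[str, str]:
    """Build the POST parameters for an action=edit request that appends text to a page."""
    return {
        "action": "edit",
        "title": title,
        "appendtext": "\n" + text,  # added after the existing page content
        "summary": summary,
        "bot": "1",                 # mark the edit as a bot edit
        "format": "json",
        "token": csrf_token,        # obtained beforehand via action=query&meta=tokens
    }


params = build_append_edit(
    title="File:LL-Q1860 (eng)-Commander Keane-phonate.wav",
    text="[[Category:Australian English pronunciation]]",  # hypothetical category name
    summary="Tag accent",
    csrf_token="PLACEHOLDER",
)
print(params["action"], params["title"])
```

These parameters would be sent as a POST request to the wiki's api.php endpoint by an authenticated session.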

Have your say on Lingua Libre's 2020-2021 strategy and more

Dear all,

I hope you are all doing fine in these difficult times.

For those of you who couldn't take part in the volunteer meeting, you can find the key points that were addressed here : https://notes.wikimedia.fr/p/2020-02-29_Lingua_Libre

Following the discussions we had during this meeting and the vote of the budget by the association last Saturday, here are some elements that I would like to open up for discussion :


- with regard to the logo: given that the graphic redesign in progress is finishing soon and that the search for a new logo has not been successful https://phabricator.wikimedia.org/T240552, I suggest that we adapt the current logo to the new colour and keep it until further notice.


- with regard to the strategy: several of you expressed the need to make the Lingua Libre community bigger and more international. I therefore suggest that we work on this aspect in the September 2020-June 2021 timeframe with:

1) the integration of a more ergonomic and multilingual discussion space that invites discussion and collaboration, notably for newcomers to the website

2) the development of Say It, an audio variant of the Sign It extension https://addons.mozilla.org/fr/firefox/addon/lingua-libre-signit/#&gid=1&pid=3 that, instead of showing the word in sign language, would play its pronunciation from the LL audio in a pop-up window when a word is highlighted on any webpage. This would allow for a more concrete and immediate impact for a broader range of web users, not only Wikimedians

3) the organisation of a series of hackathons intended to develop Lingua Libre bots for other Wiktionaries; these could make Lingua Libre better known quickly and efficiently

4) once the graphic redesign is complete, a series of edit-a-thons for the renewal and development of the website contents: the improvement of tutorial pages aiding contribution in particular, but also the writing of project pages exemplifying what certain communities have achieved and are working on with Lingua Libre, and translation into as many languages as necessary and possible


For those who wish to take part in the development of Lingua Libre in the long term, I suggest that we schedule a remote meeting to discuss the execution of the 4 points above (or their revision!) before the official beginning of the 2020-2021 year in June.

In order to ensure the stability and coherence of the project, we could also discuss on this occasion the constitution of a strategic volunteer committee to supervise the phases of development of Lingua Libre along the semesters.


Thank you for taking the time to vote for the future of the logo here : https://framadate.org/gSfBpVYeqzYWXLn6

and write down your availability for a meeting in April-May 2020 if interested : https://framadate.org/kdn7tGoqDgjpAj5i

you can also give us your feedback on the 4 points above on this pad : https://notes.wikimedia.fr/p/2020-02-29_Lingua_Libre_Follow-up


Thank you in advance for your replies, apologies for the length of this email, and good luck for the lockdown...

One last thing: do not hesitate to forward this email to all those who you suspect may not be on the mailing list but would be interested to join, or to post this message below in various discussion pages (to be polished at will).

Hi! In case you have ever contributed to Lingua Libre but are not on the mailing list, please consider signing up here : https://lists.wikimedia.fr/info/lingualibre to receive updates and take part in the discussions around the project :)

Best regards, Emma Vadillo Quesada


#Wikicheznous contest on Wikimedia projects

Hello everyone,
Wikimédia France is launching the #wikicheznous contest during the lockdown: from 8 April to 6 May, you are invited to contribute to the Wikimedia projects as well as to Lingua Libre and Vikidia. On Lingua Libre, contributions in languages other than English and French are eligible. Simply sign up on the dashboard so that your contributions are counted via Commons. So grab your microphones! And feel free to take a look at the other Wikimedia projects.
For more information, see: https://www.wikimedia.fr/lancement-de-loperation-wikicheznous/
Good luck and have fun. --Adélaïde Calais WMFr (talk) 12:00, 8 April 2020 (UTC)


Record Wizard translation

Done @0x010C I assume this issue is closed. Yug (talk) 18:40, 22 September 2020 (UTC)

Hi, can someone please merge the translation at Translate.wiki? I translated Record Wizard into Japanese several weeks ago. Thanks in advance. Higa4 (talk) 14:31, 9 April 2020 (UTC)

Hi Higa4 and thank you for the Japanese translations. I do not know how often Lingua Libre gets the new translations from Translate Wiki. 0x010C certainly knows. Pamputt (talk) 14:33, 12 April 2020 (UTC)
Thanks for your comment. Anyway, I hope it happens someday when you have time. Higa4 (talk) 07:41, 15 April 2020 (UTC)
Hi Higa4, translations are usually pulled every day from TranslateWiki, but due to a major technical overhaul in recent weeks, this has been temporarily stopped. Translations will be pulled again in a week or two :). Thanks for your involvement! — 0x010C ~talk~ 08:04, 21 April 2020 (UTC)


First try

I made a first attempt at recording! I find the whole thing quite nice. I recorded more than 50 words, I'm proud! See https://commons.wikimedia.org/w/index.php?title=Special:ListFiles/Touam

The inevitable questions:

  • What happens next for these words to get onto Wiktionary?
  • I noticed that the category "Lingua Libre pronunciation by Touam" on Commons stays a red link... Why? What needs to be done for it to turn blue, or at least become something that can be browsed?
  • Are there any plans for anything other than words?

In any case, I really like this tool. I will try to keep going. --Touam (talk) 20:20, 23 April 2020 (UTC)

Hello Touam and welcome to Lingua Libre. First of all, congratulations on these first recordings. Hopefully this is the start of a very long series. Regarding your questions:
  • What happens next for these words to get onto Wiktionary?
They are added automatically every night by Lingua Libre Bot.
  • I noticed that the category "Lingua Libre pronunciation by Touam" on Commons stays a red link... Why? What needs to be done for it to turn blue, or at least become something that can be browsed?
Yes, I believe this category has to be created manually. You can use the category about me as a model.
  • Are there any plans for anything other than words?
What do you have in mind in particular? It is possible to record videos for words in sign language. For the rest, feel free to say what you would like.
Pamputt (talk) 19:54, 27 April 2020 (UTC)
Thanks Pamputt for your help, I would like to continue, but uploads to Commons no longer work? --Touam (talk) 13:01, 29 April 2020 (UTC)
Strange, it works for others. Can you try again? Pamputt (talk) 09:56, 30 April 2020 (UTC)
I just tried again and it still won't upload to Commons. I recorded "acronymie" and "anonymie". It just tells me "Aucun téléversement n'a réussi" ("No upload succeeded") and puts a red exclamation mark next to each word in the Record Wizard. I use Firefox on Linux Mint. That's everything, I think. And I am indeed logged in, as you can see from my signature (which, by the way, isn't among the wikicode editor's buttons??). --Touam (talk) 20:01, 30 April 2020 (UTC)
Tried again this morning with "je", "tu", "il" (I keep making my words shorter and shorter) and still the same. I am in utter despair. --Touam (talk) 06:14, 1 May 2020 (UTC)
So this is the last step, where Lingua Libre tries to send the recordings to Wikimedia Commons. It sometimes happens to me on a few recordings. Can you try another browser, just to see whether it goes better? Pamputt (talk) 11:04, 1 May 2020 (UTC)
Yes, I just tried with Chromium Version 81.0.4044.122 (Official Build) Built on Ubuntu, running on LinuxMint 19.3 (64-bit), same bad result. --Touam (talk) 13:05, 1 May 2020 (UTC)
Hmmm, that is indeed frustrating. I have no idea why this happens. Maybe 0x010C has some explanation. And just to be sure, you have no problem uploading files directly from the Wikimedia Commons site? Pamputt (talk) 15:20, 1 May 2020 (UTC)
Hello,
A steward blocked a large IP range on all Wikimedia projects. The block of the Lingua Libre server was accidental collateral damage. Thibaut, a Commons administrator, has just lifted the block on Commons to solve the problem; thanks to him.
Regards, — 0x010C ~talk~ 15:49, 1 May 2020 (UTC)
Yes, thanks to both of you, I was finally able to record "je", "tu", "il"! I will now move on to recording more complex words... If you have any advice for me... --Touam (talk) 16:59, 1 May 2020 (UTC)
Great. For word lists, it depends on what you feel like. If you want, you can record the names of the villages around you or in your department. You can import word lists by pulling the contents of a dictionary category directly (verbs, names of professions, animals, technical vocabulary, etc.). In short, there is plenty of choice. Pamputt (talk) 19:02, 1 May 2020 (UTC)
There, I just did a little over 100 words and everything went well. I noticed that on Wiktionary it does not add the words to my watchlist, whereas on Commons it does. Bug or feature? ("it's not a bug, it's a feature" on Wiktionary). Personally, I would prefer to have these words added to my Wiktionary watchlist. --Touam (talk) 12:50, 2 May 2020 (UTC)
For the watchlist, it is easy on Commons because, as the uploader, you are the creator of the file. On Wiktionary, it is Lingua Libre Bot that adds the audio pronunciations to the pages, so I do not know whether it is technically possible to add a page to someone else's watchlist (I would tend to think not). Pamputt (talk) 08:47, 3 May 2020 (UTC)


One word, one language, one page

The most important change I think to attract people to the project is to make it super user-friendly to browse existing content. That should be done by having a separate page per word per language. When people can browse and listen to the existing content easily, they will be motivated to add content themselves. Compare a site like Forvo which has 9535 NL speakers contributing https://forvo.com/languages/nl/, whereas LinguaLibre has only 1 NL speaker.

A possible structure would be:

  • /fr/ would be the home page for French words
  • /fr/word/chien would be the page for the French word "chien"
  • /fr/audio/joe-bloggs would be the page listing the links to all the recordings from Joe Bloggs

A more sophisticated URL scheme could be:

  • /fr/word/chien-12345 would be the page for the French word "chien" with unique id 12345
  • /en/fr/word/dog-12345 would be the EN URL for the French word "chien" with unique id 12345
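The proposed URL structure can be sketched as two small path builders. This is only an illustration of the scheme suggested above, not an existing Lingua Libre feature; the function and parameter names are hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def word_path(lang: str, slug: str, uid: Optional[int] = None,
              ui_lang: Optional[str] = None) -> str:
    """Build a path such as /fr/word/chien or /en/fr/word/dog-12345."""
    segment = quote(slug, safe="")       # percent-encode anything unsafe in the word
    if uid is not None:
        segment = f"{segment}-{uid}"     # optional unique id suffix
    prefix = f"/{ui_lang}" if ui_lang else ""  # optional interface-language prefix
    return f"{prefix}/{lang}/word/{segment}"


def speaker_path(lang: str, speaker: str) -> str:
    """Build the path listing all recordings by one speaker in one language."""
    return f"/{lang}/audio/{quote(speaker, safe='')}"


print(word_path("fr", "chien"))
print(word_path("fr", "dog", uid=12345, ui_lang="en"))
print(speaker_path("fr", "joe-bloggs"))
```

Keeping the unique id in the path means the translated slug can change without breaking links, which is the property the Pixabay example further down relies on.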


The current "Browsing the sound library" is extremely user-unfriendly. It is really only a "track down the zip which you then have to download and unzip" feature. The "Browsing the sound library" page https://lingualibre.fr/wiki/Help:Sound_library which then takes you to https://lingualibre.fr/wiki/LinguaLibre:Records which then spends forever loading is a complete turn-off to all but the most dedicated visitors.

Some may say that the content will be used by other sites (such as Wiktionary or Wikidata), so it is not necessary, but Lingua Libre will only be able to do a good job of feeding these other sites if it does a good job itself of being a fantastic browsing tool for the source recordings.

In future, it would be great if all power users could use an API to go directly to a particular page and get the recording(s), e.g. /lingua-libre/fr/chien would give me the recording of that particular word in that particular language.

Examples from other sites

Shtooka

Shtooka used to be like this: you could simply browse to the page for a particular recording. This was great. Lingua Libre has lost a lot of this simplicity, and so has taken a large step backwards in terms of easily attracting people to the concept.

Lingopolo

My own site https://lingopolo.org/, has one word per language per page, e.g. https://lingopolo.org/dutch/word/dog https://lingopolo.org/thai/word/dog https://lingopolo.org/french/word/dog although I think in some ways it makes more sense to use the word in the language on the URL. Notice too how I have a page per audio author, e.g. https://lingopolo.org/dutch/audio/J.vdleeNL and a page listing all audio authors (per language) https://lingopolo.org/dutch/audio

Forvo

https://forvo.com/ gives a good example of one way this can be organised, but also of just how eagerly people pitch in to help. The https://forvo.com/languages/ page gives an excellent overview of which languages are well supported. Each language has its own home page, e.g. https://forvo.com/languages/nl/ for Dutch, where you see a link stating the number of pronounced words. https://forvo.com/languages-pronunciations/nl/ shows you all the pronunciations, by word. Any individual word has its own page, like https://forvo.com/word/jongen/ Forvo, though, thinks of the pronunciation of a word first and the language second: for example, "main" means very different things in English and French, but Forvo puts them both on the same page https://forvo.com/word/main/ even though they are separated by language. I would not propose that Lingua Libre go that far, but rather that it link to words spelt the same in other languages, so the Lingua Libre pages would be "English" -> "main" with a link to the "French" -> "main" page.

Pixabay

https://pixabay.com/ is not a recording site at all, but it has a very well thought-out structure for its URLs. Each image has its own page, but the interesting thing is that each image page can be accessed in different languages. For example, the EN image https://pixabay.com/photos/pug-puppy-dog-animal-cute-690566/ also has an FR URL https://pixabay.com/fr/photos/pug-chiot-chien-des-animaux-mignon-690566/ and an NL version https://pixabay.com/nl/photos/pug-puppy-hond-dierlijke-cute-690566/ etc. -- Rugops

Hi Rugops. Thank you for this very interesting feedback. For sure, Lingua Libre needs to be improved to become more "user-friendly". Browsing audio recordings may be one way to do it. Finding a way to propose lists of words to record more easily would be another. I have opened a Phabricator ticket to think more about your ideas. Pamputt (talk) 07:16, 10 May 2020 (UTC)
Agree. More could be done for browsability and maintenance (verifying audio, requesting re-recordings from speakers), etc.
It should also be noted that LinguaLibre is the DIRECT CHILD of Shtooka. Nicolas Vion, who created Shtooka, and I, then a PhD student at INALCO and a Wikimedia volunteer, looked for a way to move Shtooka from a C++ desktop-based recorder toward an online HTML5-based recording app. We then got in touch with Wikimedia, Lyokoi and Remy Gerbet, so Nicolas was hired as a freelancer by Wikimédia France and got a month or two dedicated to creating LinguaLibre v1.0, which was PHP-based.
One core issue is that Nicolas, a young and passionate developer in his 20s back in the 2000s, has since moved on to new life priorities: the classic evolution of an open-source project, where the benevolent dictator drifts away to new adventures and hands the project over to a new generation. Since then, software development has mainly been done in sprints, by another talented developer, but integrating the project into the Wikimedia and MediaWiki galaxy has naturally absorbed a lot of development energy (second issue). Overall, not all of Shtooka's strengths have been carried over to LinguaLibre, while LinguaLibre also has strengths that Shtooka never had.
As of today (May 29), a new UI has been rolled out, showing Wikimedia France's will to improve the project. But more should be done in terms of sound-specific UX and features. Feature requests must be collected here or, better, on Phabricator, and WM France must be notified of users' requests :) Yug (talk) 15:53, 30 May 2020 (UTC)
Agree. I think it's important to also manage alternative spellings and accents. This is an issue that Forvo is currently failing to manage, and it takes up considerable editor time. Right now, Lingua Libre uses geographic region as a crude stand-in for accent, but I think it's important to state the accent explicitly. I also think that we need a discussion of whether or not to include (in)definite articles. Lastly, we should also think about how to make the files easily scrapable so that they can be batch imported into a program such as Anki. We should also do more to delete poorly pronounced or poorly recorded pronunciations. Languageseeker (talk) 17:07, 20 October 2020 (UTC)
Edit: @Rugops You inspired me to create a phabricator ticket on how to do this. I'd love your feedback on it. Languageseeker (talk) 02:01, 23 October 2020 (UTC)


TypeError: this.pastRecords is undefined

Done

Hello, everyone.

When I try to select words to record from a French Wiktionary category while excluding terms already recorded, I get the error TypeError: this.pastRecords is undefined. Unchecking the option that removes already recorded terms solves the problem, but I don't really want to re-record words that are already done. This smells like a bug; could someone fix it?

LoquaxFR (talk) 08:59, 12 May 2020 (UTC)

Hello,
I will look into it, but I am not sure I will have time to fix it quickly, because I am currently preparing the deployment of the new major version of Lingua Libre and the RecordWizard (around the beginning of next week, I think). Until that version (which does not have this bug) is out, you can deselect the option in the generator and remove the words once they are in the main list by clicking the "Remove words already recorded" button.
Sorry for the problem :/.
Regards — 0x010C ~talk~ 12:44, 12 May 2020 (UTC)
Well, I just tried again and it works, even without your workaround. So much the better. Thanks for the tip anyway, and good luck!
LoquaxFR (talk) 13:35, 12 May 2020 (UTC)

Wikidata

Hello, step by step we are spreading Lingua Libre among the community in Wikimedia Spain, and two questions have arisen. On the one hand, would it be possible for a bot to automatically add the audio files from Wikidata to the different Wiktionaries? On the other hand, can different accents of the same language be added to the audio statement in Wikidata? Thanks. Rodelar (talk) 12:09, 22 May 2020 (UTC)

Hello Rodelar, thanks for adding Spanish pronunciations.
About Wikidata, I added your request on this Phabricator ticket in order to remember it.
To add audio pronunciations to the Wiktionary pages, there are at least two options:
  1. the current method is to write a bot that adds them. The code is available here. Lingua Libre Bot is already running for the fr and oc Wiktionaries. You can have a look at the code for the oc Wiktionary and try to adapt it for eswiktionary. If so, you can send a pull request to have it added to the Lingua Libre Bot code. Then, the bot will add the new audio pronunciations (in any language) to the Spanish Wiktionary. This has to be done separately for each Wiktionary because the page structure differs from one Wiktionary to another. And here comes the second "solution".
  2. the other method is to get the pronunciation data (and other data) directly from Wikidata and to display them in the Spanish Wiktionary. Wikipedia already does that (with the infobox for example). This requires that the access to the lexicographical data be enabled. The T212843 ticket follows progress on that but it is currently not yet possible to access them.
Cheers Pamputt (talk) 06:29, 23 May 2020 (UTC)

Text compilation

Done

Hello,

Is there a tool to which you can submit a text and which will automatically compile the word recordings?

AirSThib (talk) 13:11, 2 May 2020 (UTC).

Hello AirSThib, just to be sure I understand what you want: you would like to copy and paste a long text into the word-adding window and have Lingua Libre automatically "split" it into words so that you can then record them one by one. Is that right? Pamputt (talk) 08:48, 3 May 2020 (UTC)
Hello @Pamputt. Actually it's rather the opposite: I would like to enter a text and have Lingua Libre compile the words, joining them end to end to create a recorded text. AirSThib (talk), 08:46, 4 May 2020 (UTC).
No, it is not yet possible to record a text, a poem, or anything else long. For now Lingua Libre detects silences and moves on to the next word; that is its only mode of operation. But your request comes up regularly, so I opened a Phabricator ticket to keep track of it. Pamputt (talk) 10:23, 5 May 2020 (UTC)
@AirSThib Did Pamputt answer your question? Yug (talk) 18:39, 22 September 2020 (UTC)

2020.05.29 - new LinguaLibre UI and UX

Hi, let's create below a list of points to review and improve. The discussion must be centered around finding practical, rapid solutions to the issues found :) Yug (talk) 16:20, 30 May 2020 (UTC)

CSS

  • Done CSS could be improved. As admins, where could we edit it or suggest modifications? (e.g. h2 { margin-top: 1em; })
    I guess we should edit MediaWiki:Common.css to modify that. Pamputt (talk) 12:51, 31 May 2020 (UTC)
    The best would be to make a pull request on the skin's git repository. — 0x010C ~talk~ 14:36, 1 June 2020 (UTC)
  • Recording icon: the previous version had a reddish microphone icon to highlight the "Record Wizard" button. I guess the icon was deliberately dropped.
    Which icon are you talking about? Pamputt (talk) 12:51, 31 May 2020 (UTC)
    MediaWiki:Common.css now contains guidelines on how to submit new CSS to Lingua Libre. Thanks Pamputt & 0x ;) Yug (talk) 10:15, 3 June 2020 (UTC)
    Breezeicons-status-22-mic-red-LinguaLibre.svg <- this icon, which we previously integrated via CSS. The bright red was chosen on purpose to draw the visitor's eye to the recording button. Also, the new skin is centered on white and blue. All of this must be balanced. Yug (talk) 10:19, 3 June 2020 (UTC)
    @Yug This icon was purposely removed by the UI/UX specialist who created the new mockups. — 0x010C ~talk~ 11:03, 3 June 2020 (UTC)

Content

  • Done LinguaLibre:Stats#Number_of_records_per_languages (edit SPARQL query): the table is not human-readable. English names or ISO 639-3 codes are needed. LL's language items locally just have an English name and a Wikidata ID. Editing the query so that it displays the English name and/or queries Wikidata for the ISO 639-3 code would be appreciated.
    Full language names have been dropped since the database has grown too much for the request to respond without a timeout. This may be restored once the work on either the SPARQL endpoint performance or the QueryViz caching feature has been done. — 0x010C ~talk~ 14:36, 1 June 2020 (UTC)

Baleswari Odia (dialect of Odia language) and Odia

Some requested features for Lingua Libre, including an option for changing the default naming and custom metadata (particularly multilingual descriptions in wiki-code). (details below)

Just recorded over 300 words in the Baleswari dialect of the Odia language. The new UI is certainly better, more effective and faster than the last one.

a. Multiple recordings of the same word

Done (part of a group of 4 sections)

However, the "remove words already recorded" feature does not detect words recorded by the same user on a different date. So, a newer version of the same recording gets updated on Commons. This is not useful. Ideally:
One should be able to upload multiple recordings of the same word. While uploading, they should be able to see the duplicates and have an option to remove some or all of them from the new list. If a user decides to re-record an existing word, a new file should be created (instead of uploading a new version of an existing file, as happens now), e.g. if the old file was "OLDNAME.wav", the new file should be "OLDNAME_01.wav". If both "OLDNAME.wav" and "OLDNAME_01.wav" exist, then the third recording should be "OLDNAME_02.wav".
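The suffix rule requested above can be sketched as follows. This only illustrates the proposal and does not describe how Lingua Libre actually behaves; the function name is hypothetical, and the set of existing names is assumed to be already known (for example, fetched from Commons beforehand).

```python
import os
from typing import Set


def next_recording_name(base_name: str, existing: Set[str]) -> str:
    """Return base_name if it is free, else the first free NAME_01, NAME_02, ... variant."""
    if base_name not in existing:
        return base_name
    stem, ext = os.path.splitext(base_name)  # "OLDNAME.wav" -> ("OLDNAME", ".wav")
    counter = 1
    while True:
        candidate = f"{stem}_{counter:02d}{ext}"
        if candidate not in existing:
            return candidate
        counter += 1


print(next_recording_name("OLDNAME.wav", {"OLDNAME.wav", "OLDNAME_01.wav"}))
```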

  • Thanks for your recordings Psubhashish.
    About the first point (removal of duplicates), I will check and open a Phabricator ticket if I can reproduce it, because this is definitely a regression compared to the previous version of the website.
    Thanks, please do add "psubhashish1" to the subscriber list when you create the ticket. --Subhashish Panigrahi (talk) 13:18, 4 June 2020 (UTC)
    Psubhashish, I just tested and if I click on "remove all the duplicates" at the "recording list" step, the words that I added and I have already recorded are removed. Could you try again? If it does not work for you, can you open a Phabricator ticket and describe exactly what you do in order to be able to reproduce? Pamputt (talk) 15:33, 2 June 2020 (UTC)
    It is working for recordings made from a particular list on LinguaLibre. But, when I try to record the pronunciation of a word that I myself had recorded earlier, it doesn't flag that a duplicate exists on Commons. Ideally, it should let me know that a recording that I myself made already exists so that I can decide if I want to record or not. If I decide to rerecord, it should go as a new recording with a suffix "_01.FILENAME". --Subhashish Panigrahi (talk) 13:18, 4 June 2020 (UTC)
    Currently it works this way. When you create (or load) a list of words to record, there is always a button that lets you remove from the list all the words you have already recorded. Lingua Libre does not tell you directly that the list contains words you have already recorded (I think because the user experience is better this way, with fewer messages). Having different recordings is currently only possible if you add information in brackets after the word you want to save (example: "cat (some information)"). That said, I do not really see what use case would need such a feature. Pamputt (talk) 07:45, 5 June 2020 (UTC)
    @Psubhashish Regarding the deduplication feature, it is working fine on my side. If it's not working on your side, it may be related to the Odia script, which in Unicode can have several code point sequences for the same symbol, which causes trouble when we do comparisons. This issue will need to be investigated further. Could you provide one or two examples of transcriptions with which the deduplication feature doesn't work for you?
    Regarding your second point, this will not be done, as we don't want to create duplicate files on Commons. If a record has the exact same metadata, it should replace the previous one. If you want to record another speaker, create a new speaker profile for them (step 2 of the Record Wizard). If you want to record in a different dialect/language, add this new language/dialect to your speaker profile (at step 2) and select it for your records (at step 3); see also my answer in section D below. If you want to record heteronyms, you can add a textual qualifier in brackets at the end of the transcription of that word, for example: "desert (arid region)" and "desert (leave)".
    Best regards — 0x010C ~talk~ 23:28, 8 June 2020 (UTC)
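The Unicode point can be illustrated with a short sketch. Whether this is really the cause of the deduplication problem is not established in this thread; the example only shows that two different code point sequences can stand for the same Odia letter, and that normalizing both sides before comparing makes them equal. The helper name is hypothetical.

```python
import unicodedata


def same_transcription(a: str, b: str) -> bool:
    """Compare two transcriptions after Unicode normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# ORIYA LETTER RRA as a single code point, and as DDA followed by the NUKTA sign.
single = "\u0b5c"
sequence = "\u0b21\u0b3c"

print(single == sequence)                    # naive comparison of the raw strings
print(same_transcription(single, sequence))  # comparison after normalization
```

A deduplication step that compares raw strings would treat these two spellings as different words, while one that normalizes first would not.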
    I feel that for heteronyms it would be better to have the textual qualifier as a separate metadata item. Otherwise, it might get difficult to find them. It's also easier to ensure accurate formatting if it's done automatically than if users do it manually. Currently, Forvo has a big problem: there is no standard way to distinguish heteronyms, which makes them quite difficult to find. Languageseeker (talk) 19:38, 20 October 2020 (UTC)
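As a sketch of how the bracketed qualifier could be turned into a separate metadata item, the helper below splits a transcription such as "desert (arid region)" into the word and its qualifier. The function name and the idea of storing the two parts separately are assumptions made for illustration only.

```python
import re
from typing import Optional, Tuple

# A trailing "(...)" group at the end of the transcription is treated as the qualifier.
_QUALIFIER = re.compile(r"^(.*?)\s*\(([^()]*)\)\s*$")


def split_qualifier(transcription: str) -> Tuple[str, Optional[str]]:
    """Return (word, qualifier); qualifier is None when there is no trailing bracket."""
    match = _QUALIFIER.match(transcription)
    if match and match.group(1):
        return match.group(1), match.group(2).strip()
    return transcription.strip(), None


print(split_qualifier("desert (arid region)"))
print(split_qualifier("desert (leave)"))
print(split_qualifier("desert"))
```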

b. Custom-categorization:

Done (part of a group of 4 sections)

There is no option for a user to decide about the categories. I might want to add a custom category (say a category for each date) for a particular batch.

  • About categories, you would like to be able to create your own categories on Wikimedia Commons. For example, instead of automatic categorisation in Category:Lingua Libre pronunciation by Psubhashish, you would like to be able to set a custom name for a category. This category would be categorized in Category:Lingua Libre pronunciation by Psubhashish, which means all custom categories would be subcategories of the main categories created automatically by Lingua Libre. Do you agree? If so, I think it looks like what is asked in T201135.
    Pardon for repeating the question. I see a discussion from 2018 but it doesn't tell me how to add a custom category for a batch. Can you probably explain here or, much better, add to the help page? --Subhashish Panigrahi (talk) 13:18, 4 June 2020 (UTC)
    This feature does not exist yet. The Phabricator ticket is just there to remind that this feature is frequently asked and should be considered by developers. Pamputt (talk) 22:04, 8 June 2020 (UTC)
    @Psubhashish This is indeed an interesting feature, I will prioritize it for the next update. — 0x010C ~talk~ 23:07, 8 June 2020 (UTC)
    @Pamputt & Psubhashish one way to go would be to have a bot which uses mw:API:Edit on the list of audio files. Basic JS skills are enough to move forward, and LinguaLibre will sooner or later need such a bot for maintaining Commons pages. Just... let's keep it in mind. Yug (talk) 18:31, 22 September 2020 (UTC) (PS: I am learning about Commons bots at the moment, come back to me if needed. Do we have other bot masters here?)
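The text-editing half of such a bot can be sketched without any network access: given the wikitext of a file page, add a category link unless it is already there. The category name below is hypothetical, and the result would still have to be saved through the edit API by an authenticated bot.

```python
def add_category(wikitext: str, category: str) -> str:
    """Append [[Category:...]] to the page text unless that exact link is already present."""
    link = f"[[Category:{category}]]"
    if link in wikitext:
        return wikitext  # nothing to do, so running the bot twice is harmless
    return wikitext.rstrip("\n") + "\n" + link + "\n"


page = "== Summary ==\nsome description\n[[Category:Lingua Libre pronunciation by Psubhashish]]\n"
updated = add_category(page, "Baleswari Odia batch 2020-06-02")  # hypothetical category
print(updated)
```

Making the function a no-op when the link already exists keeps a batch run safe to repeat over the same list of files.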

c. Custom metadata parameters:

Done (part of a group of 4 sections)

There should be at least one additional parameter for the metadata (description, etc. that appear on Commons) so that the user can add some additional information. I personally speak in multiple accents and I'd like to denote the accent used for each batch separately. Having this option would be of great help. It is not practical to edit hundreds of files manually to make such changes once they are uploaded to Commons. --Subhashish Panigrahi (talk) 07:40, 2 June 2020 (UTC)

  • Metadata: this is an interesting idea. You would like to have something like « free text » that would be a parameter attached to one speaker (you can define several speakers for yourself, one for each accent). A Phabricator ticket should be opened to track this idea.
    Pamputt (talk) 11:46, 2 June 2020 (UTC)
    I have added this to Phabricator (T254241). --Subhashish (talk) 06:43, 5 June 2020 (UTC)
    See the image above for an example of how I'd personally like to use a bilingual wikicode-based description as opposed to the current one. The latter is not very helpful for someone who is viewing a standalone file. --Subhashish (talk)

d. Standardized naming:

Check-green.svg Done (part of a group of 4 sections)

Currently, Lingua Libre follows a naming scheme that prefixes a long text, whereas audio recordings of pronunciations are generally named on Commons in the "LANGUAGECODE-DIALECT OR VARIATION CODE-WORD" format. For example, if the word "color" needs to be recorded in an American accent, an ideal name would be "File:En-us-color.wav", where "en" stands for English and "us" for American. In the picture uploaded above, I have used a similar format ("ori" being the language code for "Odia" and "nor" being the code for the Northern Balasore (or Baleswari Odia) dialect). I understand that Lingua Libre follows a different format. But can a user choose (or modify in a batch) the naming that they prefer? Better, can Lingua Libre suggest a standardized naming for users so that the recordings on Commons are named much more consistently? The naming that I've suggested is something I learned from others on Commons, but it makes sense from a linguistics standpoint. It's simple, short and does the job. I had to use other code and spend hours to rename only a few hundred files, whereas an option to change the name at upload time would have been much easier. --Subhashish (talk)

It has been decided to record only the language and the place where the speakers learned their language, rather than a variety code. If I understood correctly, this is more relevant from a linguist's point of view because most people are not aware that they speak a specific variety of their language. For example, we could use en-us, but why would it be more relevant than en-us-Texas or en-us-Florida, where the accents are probably different? Maybe Lyokoï or Noé could say more about this point. Pamputt (talk) 22:11, 8 June 2020 (UTC)
Hi @Psubhashish
We know this Commons naming convention well, and it's true that its shortness is an advantage. But we purposely decided not to use it. We designed our naming convention so that our filenames can be as precise and unique as possible: a file corresponds to a transcription recorded in a specific language/dialect by a specific person.
  • The other naming convention doesn't allow 2 people to record the same word in the same language without resorting to tricks like appending 2, 3, 4,... ; that's why we include the name of the speaker in the title.
  • Language codes used on Commons are sometimes a bit random, especially for minor languages, which often have no standardized code. But as Lingua Libre aims to be able to record all languages, common or minor, we preferred using Wikidata QIDs for every language; it may be less pleasant to read but it fits every known language/dialect. To follow Pamputt's example, we have a standardized code for Texan English, which is Q7707309 ;).
By applying those rules to each recording, we are sure to have a consistent naming convention for all languages and dialects, supporting recordings of the same word by multiple speakers, and that's why we can safely overwrite a file if the same speaker records the same word in the same language.
(for your case of two dialects, you won't get any problem if you want to record the same words one time in standard(?) Odia (WD:Q33810 = LL:Q336) and another time in Baleswari Odia (WD:Q4850727 = LL:Q322719), as long as you select the right language/dialect before the recording process.)
The real issue is that we have imported on Lingua Libre only a subset of all available languages/dialects on Wikidata (as it's growing fast), and we have to manually import missing ones from time to time :/
Best regards — 0x010C ~talk~ 23:05, 8 June 2020 (UTC)
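
As an illustration of the convention described above, here is a small Python sketch. The pattern is inferred from an existing filename ("LL-Q1860 (eng)-Commander Keane-phonate.wav"); the helper name, the handling of languages without an ISO code, and the speaker names "Alice" and "Bob" are assumptions made for the example, not Lingua Libre's actual code.

```python
def lingua_libre_filename(qid, iso_code, speaker, transcription, ext="wav"):
    """Build a filename in the 'LL-<QID> (<iso>)-<speaker>-<word>.<ext>' pattern.

    Assumption: when a language has no ISO code, only the QID is used.
    """
    language = f"{qid} ({iso_code})" if iso_code else qid
    return f"LL-{language}-{speaker}-{transcription}.{ext}"


def short_filename(lang, variant, word, ext="wav"):
    """Build a filename in the short 'Lang-variant-word' Commons pattern."""
    return f"{lang.capitalize()}-{variant}-{word}.{ext}"


# Two hypothetical speakers recording the same word in Texan English (Q7707309).
name_a = lingua_libre_filename("Q7707309", None, "Alice", "color")
name_b = lingua_libre_filename("Q7707309", None, "Bob", "color")
print(name_a)  # LL-Q7707309-Alice-color.wav
print(name_b)  # LL-Q7707309-Bob-color.wav

# With the short pattern both recordings would get the same name.
print(short_filename("en", "us", "color"))  # En-us-color.wav
```

The speaker name is what keeps the two files distinct; the short pattern has no slot for it, which is why a second recording needs a numeric suffix.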

Bugs

ratelimited

Check-green.svg Done See LinguaLibre:User rights, T260649, T245214, Commons:Commons:Guide_to_batch_uploading#Rate_limits, mw:Special:MyLanguage/Manual:$wgRateLimits. Test your ratelimit : here.

Luilui6666 has also had about a quarter of her audio files fail to upload over the past 3 months. She either has to wait and retry or give up. The last occurrence, on July 10th, returned an error mentioning

[RequestQueue] Reject ratelimited
Object:
 *: "See https://commons.wikimedia.org/w/api.php for API usage. Suscrib..."
 code:"ratelimited"
 info:"You've exceeded your rate limit. Please wait some time and try again...

I can't say more. Have any of you encountered such an error? Yug (talk) 16:30, 16 July 2020 (UTC)

I do not know either. You should open a Phabricator ticket about this issue. Pamputt (talk) 07:46, 18 July 2020 (UTC)
@Luilui6666 @Yug @Pamputt This error could be due to your user rights on Wikimedia Commons. Without the "autopatrolled" rights, you are not able to upload more than ~400 files/hour. — WikiLucas (🖋️) 23:40, 17 August 2020 (UTC)
Indeed, this is a possible explanation. I opened T260649 to keep track. Pamputt (talk) 06:23, 18 August 2020 (UTC)
@Pamputt, WikiLucas00, & 0x010C The classic 1-hour recording sprint generates 800 recordings. It is common to do a 2-hour sprint of 1500+ recordings per day. Also, we must note that 0x010C won't be able to save us from now on. Do we have an identified fallback? Yug (talk) 05:59, 5 September 2020 (UTC)
@Pamputt I searched in the following without success:
I didn't find anything relevant to the number of uploads. Luilui6666's limitation happened more than 4 days after her account creation. I am not sure what went on. Yug (talk) 07:35, 18 September 2020 (UTC)
@Pamputt Found it in Commons:Guide_to_batch_uploading#Rate_limits !
Rate limits

Normal users on Commons are rate limited to 380 uploads per 72 minutes. Users granted image-reviewer, patroller, or autopatrolled status have a ratelimit of 999 uploads per 1 second. Users can apply for these user rights at COM:RFR.

It mentions mw:Special:MyLanguage/Manual:$wgRateLimits. This "380 uploads" matches the "around 400 uploads" described by User:Luilui6666. Yug (talk) 12:37, 18 September 2020 (UTC)
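
To put these figures side by side with the recording sprints mentioned above, here is a rough Python sketch. It assumes the limit behaves like a fixed window of 380 uploads per 72 minutes; the real limiter may count differently, so the waiting times are only an estimate. The function names are made up for the example.

```python
import math

LIMIT_HITS = 380            # uploads allowed per window for normal users
LIMIT_SECONDS = 72 * 60     # 72 minutes = 4320 seconds


def hourly_rate(hits=LIMIT_HITS, window=LIMIT_SECONDS):
    """Average number of uploads allowed per hour."""
    return hits * 3600 / window


def upload_plan(total_files, hits=LIMIT_HITS, window=LIMIT_SECONDS):
    """Return (number of batches, minimum total waiting time in seconds).

    Assumes a full window must elapse between two consecutive batches.
    """
    batches = math.ceil(total_files / hits)
    wait_seconds = max(batches - 1, 0) * window
    return batches, wait_seconds


print(round(hourly_rate()))   # 317 uploads per hour on average
print(upload_plan(800))       # (3, 8640): 3 batches, at least 144 minutes of waiting
print(upload_plan(1500))      # (4, 12960): 4 batches, at least 216 minutes of waiting
```

Under this assumption a one-hour sprint of 800 recordings cannot be uploaded in one go by a normal user, which is consistent with part of the files being rejected.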

Add the Mozabite language

Check-green.svg Done

Hello, my mother tongue is Mozabite (mzb), Q36149 on Wikidata. I cannot contribute in this language. Could you please add it? --Arha06 (talk) 19:24, 22 July 2020 (UTC)

@Arha06 Hello, and thank you for your recordings on Lingua Libre. I have just added Mozabite to Lingua Libre, so it is now possible to record words in this language. Happy contributing. Pamputt (talk) 08:26, 24 July 2020 (UTC)

Adding list into RecordWizard

Check-green.svg Done

Hello. The Record Wizard offers the Local List, Nearby and Wikimedia category buttons as ways to generate a list of words to record. The alternative seems to be typing words, one by one, in the "Type here the word to record". I have my own list in a file, but when I paste it in that field, it thinks it's 1 word instead of many. Is there a way I can provide my own list without having to type the words one by one? Julien Baley (talk) 22:19, 27 August 2020 (UTC)

Hi Julien, you can create as many local lists as you want or need. To create your own, paste the content of your file into List:Fra/Julien Baley (for example); each word is separated by #. Pamputt (talk) 18:19, 28 August 2020 (UTC)
Thanks a lot! Is there any naming convention, or can I create whatever I want? Julien Baley (talk) 19:19, 28 August 2020 (UTC)
No, so far there is no naming convention. Yet, if you want your list to be recorded by other users, you should use a self-explanatory name. You can see some examples here. Pamputt (talk) 19:46, 28 August 2020 (UTC)
@Julien Baley please browse Help:Main ;) Yug (talk) 18:04, 22 September 2020 (UTC)
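
For a word list kept in a file with one word per line, the conversion can be sketched in Python as follows. It assumes that "separated by #" means each word sits on its own line starting with "# " (a wikitext numbered list); the function name and the sample words are invented for the example.

```python
def to_local_list(raw_text):
    """Turn 'one word per line' text into '# word' lines for a List: page.

    Blank lines are dropped and duplicates are removed, keeping the first
    occurrence so the original order is preserved.
    """
    words = []
    seen = set()
    for line in raw_text.splitlines():
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return "\n".join(f"# {word}" for word in words)


sample = "chat\n\nchien \nchat\noiseau\n"
print(to_local_list(sample))
# # chat
# # chien
# # oiseau
```

The output can then be pasted into the list page in one go instead of typing the words one by one.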

How much to record at once?

Check-green.svg Done

Hello! I have a little question regarding risk management; when I record words, where is the data stored before it's uploaded? What happens if my Internet connection dies out? Can I refresh the page, or am I losing all the non-uploaded recordings? What are your suggestions in that respect? Julien Baley (talk) 09:32, 2 September 2020 (UTC)

@Julien Baley I am not sure (to be tested) but I would say that the recordings are stored on the server as long as your browser is not closed. So if your internet connection dies but you do not leave the web page, you may be able to send the words to the server, starting from the last one you recorded, as soon as your internet connection is restored. Maybe it is even possible to continue recording if recordings are stored locally on your computer before being sent to the server. Yet, as I said, it should be tested. Pamputt (talk) 22:26, 9 September 2020 (UTC)
@Julien Baley The audio files are first stored locally, within your browser's tab memory. In case of a stalled upload, KEEP THIS TAB OPEN, then click "Upload" (or "Retry Upload"?) again a few hours later. Yug (talk) 07:58, 18 September 2020 (UTC)
For new accounts, there may be some limit in the number of daily uploads. We suspect something around 400 uploads. If so, keep the computer and browser tab open, stay on that page, and retry upload later. Yug (talk) 11:58, 18 September 2020 (UTC)
Hello! As you noticed in the other section, I confirmed that "new users" (according to Wikimedia Commons) are limited to 380 uploads per 72 minutes.
Checking on this Commons API, I can see that User:Titodutta belongs to many higher-rights groups providing a rate limit of 999 uploads / sec, while user:Julien Baley is still a new user within the initial groups [ "*", "user", "autoconfirmed" ], which provide a maximum of 380 uploads per 72 minutes. See LinguaLibre:User_rights#User_rights_on_Commons.
@Julien Baley, you previously recorded 60 audio files. If your recorded files beyond the first 380 are not uploading... keep the browser tab open, then after an hour and a half, click again to upload. You may also request higher rights on Commons, in line with this request. Yug (talk) 18:19, 22 September 2020 (UTC)
@Yug Oh, it may not show on my account, because I'm recruiting people to record words in other languages, and I'd like them not to encounter any problem with the uploading. I usually have a list of several hundreds of words ready, but I can aim to keep it under 380 to avoid any issue. Julien Baley (talk) 11:32, 30 September 2020 (UTC)
@Julien Baley I think the uploader is your account user:Julien Baley, and the speaker doesn't necessarily need an account on either LinguaLibre or Commons. They just need to be defined in LinguaLibre.
If your speaker creates their own account and works independently, you could follow LinguaLibre:User_rights#Request_new_user_rights to request user rights on Commons for that account. Such requests have been granted quickly; see the request I made for Luilui6666. Yug (talk) 19:39, 3 October 2020 (UTC)
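
The group check described above can be sketched in Python. The snippet only builds the query URL for the Commons API (`action=query` with `list=users` and `usprop=groups`) and does not perform the request. The set of groups with the higher limit is taken from the Commons guide quoted earlier; the function names are invented for the example.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"

# Groups listed in the Commons batch-uploading guide as having the higher limit.
HIGHER_LIMIT_GROUPS = {"image-reviewer", "patroller", "autopatrolled"}


def user_groups_url(usernames):
    """Build the API URL that lists the groups of the given users."""
    params = {
        "action": "query",
        "list": "users",
        "ususers": "|".join(usernames),
        "usprop": "groups",
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def has_higher_upload_limit(groups):
    """True if at least one group grants the higher upload rate limit."""
    return any(group.lower() in HIGHER_LIMIT_GROUPS for group in groups)


url = user_groups_url(["Julien Baley", "Titodutta"])
print(has_higher_upload_limit(["*", "user", "autoconfirmed"]))  # False
print(has_higher_upload_limit(["*", "user", "autopatrolled"]))  # True
```

Opening the generated URL in a browser returns the groups as JSON, which can then be passed to `has_higher_upload_limit` before planning a large recording session.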

Userrights

Check-green.svg Done
On user rights, I think it would be good to change a bunch of them on LinguaLibre. LinguaLibre is not a central Wikimedia project, so it is better to have lower "gates". (We depend on Commons.)
@Titodutta As I understand it, we depend on Commons for upload rights. Each MediaWiki installation has built-in groups such as users, autopatrolled, admin, etc., which have specific user rights allowing actions, and rate limits defining how many times an action is allowed within a time period. So far, I found:
  • user group's upload right: "user": { "hits": 380, "seconds": 4320 } (72 mins)
  • autopatrolled group's upload right: "autopatrolled": { "hits": 32, "seconds": 60 }
Most LinguaLibre contributions are made via confirmed Wikimedian accounts, which are likely to be part of the autopatrolled group on Commons, so we rarely run into any upload limit. We run into it when we have a non-Wikimedian who suddenly rushes into LinguaLibre, as Luilui6666 did: 5000 uploads within one month (I made a student-rate donation in exchange for this dedicated work, worth it!).
Can we tell the Commons API "Hey, this user account is OK, please grant it <userright>", or should we specifically ask for user rights there via Commons:Requests_for_rights and a mentor-based application? For example, I would ask there for User:Luilui6666 to be granted autopatrolled status so that her upload rate limit moves from 380 per 72 mins (user group) to 999/sec. Some digging in this direction would be welcome, so I just created:
  • LinguaLibre:User rights, a new page to expand according to our emerging knowledge on both user rights and ratelimit. Yug (talk) 14:50, 20 September 2020 (UTC)
  • For uploads we need to depend on Wikimedia Commons' rates and rules, and I believe that's for the best. I agree with you that we'll very rarely face this problem, for 2 reasons: a) most of us have rights on Wikimedia Commons, b) 380 per 72 mins is already quite high. Anyway, the only solution I can think of, if an editor is uploading too many words per hour and has already uploaded around 500–1,000 files to Commons, is to request the "autopatrolled" right on Wikimedia Commons. --টিটো দত্ত (Titodutta) (কথা) 18:57, 20 September 2020 (UTC)

Is the language importer no longer working?

Check-green.svg Done -- it works, issue closed. Yug (talk) 19:52, 6 October 2020 (UTC)

Hello, I am trying to import the Dagbani language (Q32238 on WD) following a request on Twitter, but the import tool does not work. Is there a way to fix it or to work around it? Lyokoï (talk) 18:00, 19 September 2020 (UTC)

Along with the sped-up audio files, it seems we have a few very annoying bugs. Yug (talk) 14:05, 21 September 2020 (UTC)
I have opened a ticket on Phabricator. Creating the item manually may work around the problem. Pamputt (talk) 15:49, 21 September 2020 (UTC)
@Pamputt How do I do it manually? Lyokoï (talk) 18:30, 23 September 2020 (UTC)
@Lyokoï Through Special:NewItem, you can create a new item for your language. Then you need to add the properties (you can use Q21 as a model). I am not sure the system will recognize it automatically, though. It costs little to try, so we will know soon enough. Pamputt (talk) 01:01, 24 September 2020 (UTC)
@Pamputt Thanks, I'll give it a try! Lyokoï (talk) 18:32, 27 September 2020 (UTC)
@Lyokoï I just ran the test with Bankon, Q386221, and it seems to work. More details here on how to do it, although all the images have disappeared. Pamputt (talk) 16:44, 29 September 2020 (UTC)
@Pamputt It's fine, I tested it too. Lyokoï (talk) 14:43, 30 September 2020 (UTC)
@Lyokoï & Pamputt I tested with Western Kurdish (Kurmanji), via the administrator language import tool in the top-right Action tab: it worked. Chrome + Ubuntu 20.04. I think we can close this bug and just keep an eye on it. Yug (talk) 15:29, 6 October 2020 (UTC)
Indeed, it works here as well. I really do not understand what happened because no one has touched the code of MediaWiki:Gadget-LinguaImporter.js in the last few days ... So OK to close the bug report. Pamputt (talk) 19:02, 6 October 2020 (UTC)
Maybe you imported an existing language ? Yug (talk) 19:52, 6 October 2020 (UTC)
No, definitely not. A few weeks ago, the gadget did not allow entering anything in the field where you type the Wikidata QID. So, no explanation so far, but not a big deal. Pamputt (talk) 19:54, 6 October 2020 (UTC)

{{done}}