LinguaLibre

Difference between revisions of "Chat room"

Welcome to the Chat room! This is the place to discuss any and all aspects of Lingua Libre: the project itself, its operations, policy and proposals, technical issues, etc. Other forums exist for code-oriented issues. Feel free to participate in any language you want.

 
(17 intermediate revisions by 2 users not shown)
 
== Merging of items about languages ==
 
:''See also [[Help:SPARQL]] and [[Help:SPARQL for maintenance]].''
 
Hi y'all,
 
 
Love the MediaWiki skin of LinguaLibre, and I am curious about the skin and the customizations made. Who are the authors? (I cannot see any credits.) --[[User:Zblace|Zblace]] ([[User talk:Zblace|talk]]) 10:15, 19 February 2022 (UTC)
 
:The skin is known as BlueLL. The source code is available on [https://github.com/lingua-libre/BlueLL github]. It was developed by Wikimedia France in 2020. That said, it is true that there is no licence or credits on GitHub. I will ask {{u|Adélaïde Calais WMFr}} if she remembers anything so that I can add the missing information. [[User:Pamputt|Pamputt]] ([[User talk:Pamputt|talk]]) 16:58, 19 February 2022 (UTC)
 
::Hi {{ping|Zblace}}, this skin's author is [[User:0x010C]], and it's open source. It can be reused freely. [[User:Yug|Yug]] ([[User talk:Yug|talk]]) 22:45, 22 May 2022 (UTC)
  
 
== New property: translation ==
 
 
I wrote a [https://github.com/rkosov/Lingua-Libre-User-Audio-Downloader Python program] that downloads all the files created by one user. For video files, it downloads the full WebM. For audio files, the default is to download the WAV file; optionally, you can choose MP3 or Ogg files instead. Currently, changing the configuration requires a minor modification of lluad.py. If there is strong demand, I will write a command-line parser for it. Please report any bugs or errors on the GitHub page. Feature requests are welcome. [[User:Languageseeker|Languageseeker]] ([[User talk:Languageseeker|talk]]) 02:28, 20 May 2022 (UTC)
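
The post mentions that a command-line parser could replace editing lluad.py by hand. As a minimal sketch of what such an interface might look like, using Python's standard argparse module; the option names here are hypothetical and are not the tool's actual interface:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical command-line interface for a per-user downloader.

    The option names are illustrative assumptions; lluad.py itself is
    currently configured by editing the script.
    """
    parser = argparse.ArgumentParser(
        description="Download all Lingua Libre files created by one user."
    )
    parser.add_argument("username", help="the contributor whose files to fetch")
    parser.add_argument(
        "--audio-format",
        choices=["wav", "mp3", "ogg"],
        default="wav",  # mirrors the default described in the post
        help="format for audio files (video files are always fetched as webm)",
    )
    parser.add_argument("--output", default=".", help="destination directory")
    return parser
```

For example, `build_parser().parse_args(["Languageseeker", "--audio-format", "ogg"])` yields a namespace whose `audio_format` is `"ogg"`; `choices` makes argparse reject any other format with a usage message.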
 
:{{Ping|Languageseeker}} please add your tool to [[Help:Download datasets]]. It lists several tools with different features; your tool is welcome and may help some Python users as well. [[User:Yug|Yug]] ([[User talk:Yug|talk]]) 22:41, 22 May 2022 (UTC)
 
== Garbage Values in prop:P14 ==

:''See also [[Help:SPARQL for maintenance]] and [[Help:SPARQL_for_maintenance#.E2.9C.85_Speakers_.E2.86.92_Undefined_place_of_residence]].''

As part of my Anki project, I queried the entire LL database and I'm trying to parse the output of ?speaker prop:P14 ?residence. I've noticed that a number of garbage values are provided for P14, such as Q1, Q2, Q103962887, Q6099648 and Strasbourg. There seem to be three cases:

# Users wishing to enter an extremely vague place such as Earth or the Universe. These should be set to None.
# Users accidentally linking to a disambiguation page. These require correction.
# Users not entering a Wikidata item at all. These require manual correction.

To solve the root of the problem, I propose that P14 should be restricted to Wikidata items that exist and have P17. [[User:Languageseeker|Languageseeker]] ([[User talk:Languageseeker|talk]]) 21:22, 25 May 2022 (UTC)
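
The three cases above, plus the proposed P17 restriction, can be checked mechanically once the relevant Wikidata facts are at hand. This is a minimal sketch, assuming the caller has already fetched (for example with a SPARQL query) the sets of vague items, disambiguation items and items that have a country (P17) statement; the function name and its inputs are hypothetical:

```python
import re

# Wikidata item IDs are "Q" followed by a positive integer without leading zeros.
QID = re.compile(r"Q[1-9][0-9]*")


def classify_residence(value, vague, disambiguation, has_country):
    """Classify one place-of-residence (P14) value.

    vague, disambiguation and has_country are sets of item IDs that the
    caller is assumed to have fetched from Wikidata beforehand.
    """
    if value is None:
        return "missing"
    if not QID.fullmatch(value):
        return "not an item"      # case 3: free text such as "Strasbourg"
    if value in vague:
        return "too vague"        # case 1: Earth, the universe, ...
    if value in disambiguation:
        return "disambiguation"   # case 2: a disambiguation page
    if value not in has_country:
        return "no P17"           # fails the proposed "must have P17" rule
    return "ok"
```

For instance, with `vague = {"Q1", "Q2"}` (Q1 is the universe and Q2 is Earth on Wikidata), the value `"Q2"` is reported as `"too vague"` and the plain string `"Strasbourg"` as `"not an item"`. The checks run from cheapest to most specific, so free text is rejected before any set lookup.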

:{{Ping|Languageseeker}} It's a good find. If you still have that SPARQL query at hand, please add it to [[Help:SPARQL for maintenance]]. Yes, I think it's something we should clean up. There may be a few cases where the speaker doesn't want to share their location, but in 95% of cases I think we can go ahead and correct it, or ask them to correct it. [[User:Yug|Yug]] ([[User talk:Yug|talk]]) 12:39, 26 May 2022 (UTC)
:I noticed that when creating a new speaker, place of learning is optional. Not cool. [[User:Yug|Yug]] ([[User talk:Yug|talk]]) 21:32, 27 May 2022 (UTC)
:: {{ping|YUG}} For the life of me, I can't get the federated query to work, but I have a separate query to get the location and country labels from Wikidata. These are the problematic ones. Note that Q20 is on the list because Q20 "Norway" is missing P17.

* ['MichaelSchoenitzer', None]
* ['D.Muralidharan', None]
* ['Kaderousse', None]
* ['Krokus', None]
* ['विदुला टोकेकर', 'Q103962887']
* ['DoctorandusManhattan', 'Q2']
* ['Justforoc', 'Q2']
* ['Student16 de', None]
* ['Didierwiki', 'Q6099648']
* ['Sarah2149', None]
* ['DomesticFrog', 'Q1']
* ['Drkanchi', None]
* ['Satdeep Gill', None]
* ['Iwan.Aucamp', 'Q20']
* ['Skimel', 'Q2']
* ['Abeɣzan', None]
* ['Gibraltar Rocks', None]
* ['Bomdapatrick', None]
* ['Ibtissam RAHMOUNI', None]
* ['Trabelsiismail', None]
* ['Ziko', 'Q2']
* ['Youcefelallali', None]
* ['Foxxipeter7', None]
* ['Celevra089', None]
* ['Bodhisattwa', None]
* ['Atudu', None]
* ['KageyamaxNishinoya', 'Q30915818']
* ['Darkdadaah', None]
* ['JayashreeVI', None]
* ['रश्मीमहेश', 'Q103962887']
* ['गीता गोविंद नेने', 'Q103893785']
* ['Awangba Mangang', None]
* ['Abigaljo', None]
* ['FaelDaug', 'Q29423162']

[[User:Languageseeker|Languageseeker]] ([[User talk:Languageseeker|talk]]) 02:16, 30 May 2022 (UTC)

== Anki Extension Release ==

I just released [https://ankiweb.net/shared/info/124265771 Lingua Libre and Forvo Addon]. It has a number of advanced options to improve search results and can run either as a batch operation or on an individual note.

By default, it first checks Lingua Libre and, if there are no results on Lingua Libre, it then checks Forvo. To run it as a pure Lingua Libre extension, set <code>disable_Forvo</code> to <code>True</code> in your configuration section.

Please report bugs, issues and ideas on [https://github.com/rkosov/Lingua-Libre-and-Forvo-Audio-Downloader GitHub]. I would love any feedback. [[User:Languageseeker|Languageseeker]] ([[User talk:Languageseeker|talk]]) 02:23, 31 May 2022 (UTC)
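
The lookup order described in this announcement (Lingua Libre first, Forvo only as a fallback, unless `disable_Forvo` is set) can be sketched as a small function. This is an illustration, not the add-on's code: the two lookups are passed in as hypothetical callables that return lists of audio URLs.

```python
from typing import Callable, List

# A lookup takes a word and returns a (possibly empty) list of audio URLs.
Lookup = Callable[[str], List[str]]


def find_audio(word: str, lingua_libre: Lookup, forvo: Lookup,
               disable_forvo: bool = False) -> List[str]:
    """Try Lingua Libre first and fall back to Forvo only when needed."""
    results = lingua_libre(word)
    if results or disable_forvo:
        # Either Lingua Libre answered, or the fallback is switched off.
        return results
    return forvo(word)
```

Passing the lookups as parameters keeps the fallback rule testable without network access; with `disable_forvo=True`, a word missing from Lingua Libre simply yields an empty list.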

== Results of Coverage Test of French Lemma and Non-Lemma forms in English Wiktionary ==

While playing around with generating lists for pronunciation from Wiktionary, I decided to run a few tests on the current coverage of French lemma and non-lemma forms in English Wiktionary. I chose French because it is the largest dataset in LL.

Current coverage of French in Lingua Libre
* Total French entries in Lingua Libre by a native speaker: 233 982
* Unique French entries in Lingua Libre by a native speaker: 154 358
* Percentage of overlap: 34%
* Term with the greatest number of pronunciations: "blanc" with 40

Current coverage of [https://en.wiktionary.org/wiki/Category:French_lemmas Category:French lemmas]
* Total entries in Category:French lemmas: 84 482
* Pronounced entries: 50 917
* Entries without pronunciation: 33 565
* Coverage percentage: 60.27%

Current coverage of [https://en.wiktionary.org/wiki/Category:French_non-lemma_forms Category:French non-lemma forms]
* Total entries in Category:French non-lemma forms: 291 225
* Pronounced entries: 26 791
* Entries without pronunciation: 264 434
* Coverage percentage: 9.20%

For me, there are several lessons to be drawn.
# First, there has been amazing growth on LL. Covering 60.27% of French lemmas is a real achievement.
# The overlap percentage is quite small overall.
# There needs to be a clearer sense of when LL should stop requesting pronunciations for a certain term, because 40 pronunciations of "blanc" seems a bit excessive.
# There is a need to continue proactively targeting entries in Wiktionary that are not in Lingua Libre. Currently, 297 999 French lemma and non-lemma forms require pronunciations.
# Generating lists from Wiktionary and checking coverage is not as hard as I thought.
# Lingua Libre has almost caught up with Forvo in the number of French pronunciations (233 982 vs 254 703). Overall, Lingua Libre has shown amazing and healthy progress in a very short period of time. I'm excited about these results. [[User:Languageseeker|Languageseeker]] ([[User talk:Languageseeker|talk]]) 03:07, 1 June 2022 (UTC)
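
The percentages above can be reproduced from the raw counts. This sketch assumes that coverage is pronounced entries divided by total entries, and that overlap is the share of recordings repeating an already-recorded term; with those definitions the arithmetic gives exactly the posted figures, including the 297 999 entries still to record.

```python
def coverage(total, pronounced):
    """Percentage of entries that already have at least one pronunciation."""
    return 100 * pronounced / total


def overlap(recordings, unique_terms):
    """Percentage of recordings that repeat an already-recorded term."""
    return 100 * (recordings - unique_terms) / recordings


lemmas = {"total": 84_482, "pronounced": 50_917}
non_lemmas = {"total": 291_225, "pronounced": 26_791}

print(f"Lemma coverage: {coverage(**lemmas):.2f}%")          # Lemma coverage: 60.27%
print(f"Non-lemma coverage: {coverage(**non_lemmas):.2f}%")  # Non-lemma coverage: 9.20%
print(f"Overlap: {overlap(233_982, 154_358):.0f}%")          # Overlap: 34%

# Entries still needing a pronunciation, summed over both categories.
missing = sum(c["total"] - c["pronounced"] for c in (lemmas, non_lemmas))
print(missing)  # 297999
```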

:{{Ping|Languageseeker}} This investigation is pretty cool. (I'm not sure I understand all your numbers yet, but I will read them again when I'm back on my PC.) It's quite nice to see we are reaching Forvo's level for our lead language. It's possible we have more unique words than Forvo, since we have [[user:Olafbot]] actively guiding and pushing us down that path.
:On LiLi we have chosen to be both a learning AND a linguistic-diversity audio database. When you account for gender, regional accent, age and voice type, having 40 French audios for a word is still 400+ voices short.
:Also, not all contributors are able to contribute perfect audio files, due to various shortcomings (hardware, no recording room, no noise-cancelling system, etc.). We lack a proper rating and review system. It's on our [slow] roadmap though. 😉
:PS: Should I answer you in French? I get the feeling you are French or learning it. [[User:Yug|Yug]] ([[User talk:Yug|talk]]) 15:07, 1 June 2022 (UTC)
:: {{Ping|YUG}} Hi, Yug. Yes, I am learning French. As we discussed during our meeting, it is difficult to define the limits of a language. As I see it, lemma forms are not enough. I am now creating an Olafbot on steroids for French. My plan is to write a Python program that can analyze the templates used on Wiktionary. [[User:Languageseeker|Languageseeker]] ([[User talk:Languageseeker|talk]]) 15:48, 7 June 2022 (UTC)

Latest revision as of 15:48, 7 June 2022

Chat rooms in various languages:
English · 🌐

Chatroom FAQ

How to download all audios of one language? By speaker?

Datasets are available here. A script updates the datasets every 2 days, using CommonsDownloadTool. For more, see Help:Download datasets.

How to add missing languages?

Administrators can add new languages on demand; they do so within a few days. Please provide your language's ISO 639-3 code and/or its Wikidata ID. For more, see Help:Add a new language.

How to keep my wikimedia project up to date?

Contact Poslovitch, the master of Lingua Libre Bot. For more info, check out Help:Bots and LinguaLibre:Bot.

What IRL events are coming? When? Where?

Please see LinguaLibre:Events.

How to translate LinguaLibre User Interface into a new language?

Go to translatewiki.net. For more, see Help:Translate.

How to archive sections which have been answered?

After reviewing the section, add {{done}} ~~~~ to the top of the section. After a few days to 2 weeks, move the section's code to [[LinguaLibre:Chat_room/Archives/year]].
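
The archiving rule above is done by hand, but it is mechanical enough to sketch in code. A minimal sketch, assuming a 14-day delay (the upper end of the range above), sections marked with the literal text {{done}}, and standard UTC signature timestamps; the function names are hypothetical and this is not an existing Lingua Libre bot:

```python
import re
from datetime import datetime, timedelta

# A level-2 heading such as "== Title ==" (level-3+ headings are ignored).
HEADING = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)
# The timestamp part of a MediaWiki signature, e.g. "15:48, 7 June 2022 (UTC)".
TIMESTAMP = re.compile(r"\d{2}:\d{2}, \d{1,2} [A-Z][a-z]+ \d{4} \(UTC\)")


def split_sections(wikitext):
    """Return (title, body) pairs for each level-2 section of the page."""
    matches = list(HEADING.finditer(wikitext))
    sections = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(wikitext)
        sections.append((match.group(1).strip(), wikitext[match.end():end]))
    return sections


def archivable(wikitext, now, min_age=timedelta(days=14)):
    """List (title, archive page) for sections marked {{done}} long enough ago.

    The age is taken from the first signature timestamp after {{done}}.
    """
    result = []
    for title, body in split_sections(wikitext):
        done = body.find("{{done}}")
        if done == -1:
            continue
        stamp = TIMESTAMP.search(body, done)
        if stamp is None:
            continue
        signed = datetime.strptime(stamp.group(0), "%H:%M, %d %B %Y (UTC)")
        if now - signed >= min_age:
            result.append((title, f"LinguaLibre:Chat_room/Archives/{signed.year}"))
    return result
```

The archive year comes from the {{done}} signature, matching the per-year archive pages listed below; a real bot would also need to handle variants such as {{Done}}.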

Archives
2021 · 2020 · 2019 · 2018

Datasets out of date

Hello. It seems that the datasets page, although it claims to be updated every 2 days, is completely out of date: all the available zips are from April 2020 or November 2019 (and the full zip from May 2019). Is this a known problem? Is there a plan to address it? Julien Baley (talk) 23:17, 27 August 2020 (UTC)

Indeed, there seems to be an issue with the dataset updates. I opened a Phabricator ticket about it. Pamputt (talk) 18:24, 28 August 2020 (UTC)

Publish on Wikimedia Commons

Hello, I just tested, but my recordings are not published on Commons. My tests: on Firefox, then on Chrome, with 50 expressions, then with 1, under the licenses CC3.0-BY-SA and CC1.0. —Eihel (talk) 06:51, 2 May 2021 (UTC)

Publishing problem on Wikimedia Commons
phab:T281636 Eihel (talk) 07:10, 2 May 2021 (UTC)
Usually the same happens to me with the first two recordings in a session; I can then upload them again at the end. Try again with more recordings, and use the "retry failed upload" button. Poemat (talk) 08:07, 2 May 2021 (UTC)
Yup, I had this bug many times. (I say "had" because I don't remember having encountered it after the fire incident.) Just don't give up and it should be published eventually. DSwissK (talk) 11:56, 2 May 2021 (UTC)
(As of 3 May 2021, as far as I checked, I'm not aware of any code changes (history) that may have affected this. Seb35 made another code change that same day.) Yug (talk) 09:47, 3 May 2021 (UTC)

Adding a user who has the same problem: Le Commissaire. —Eihel-LiLi (talk) 15:33, 6 May 2021 (UTC)

Hello @Seb35, we should check with Le Commissaire whether the problem persists for them too (before closing the Phab ticket). Kind regards. —Eihel (talk) 10:01, 4 June 2021 (UTC)
I left a message for Le Commissaire on their talk page.
The problem you had was specific to your account; it may have happened to other people, but it seems fairly rare. Also, once a user has successfully uploaded to Commons, it is a different problem from yours (this one, which looks similar but where the error is intermittent). More generally, the error message should be explicit instead of having to be dug out of the browser console; I will open a Phabricator ticket to that effect. Seb35 (talk) 10:28, 4 June 2021 (UTC)

Exclusion lists

If you use Olafbot's regularly updated lists of wanted words (List:Fra/Lemmas-without-audio-sorted-by-number-of-wiktionaries, etc.) and spot an item that should be removed without being recorded, you can use the brand-new exclusion lists to remove it. For example, the list List:Fra/Lemmas-without-audio-sorted-by-number-of-wiktionaries contained the word "abandonar", which apparently doesn't belong to the contemporary French corpus. Once it was added to the exclusion list (here: user:Olafbot/exclusion list/Fra), the bot knows this item should never appear in the French lists it maintains, and removes it during the next update.

Each "Lemmas without audio" list (afr, ang, ara, ast, aze, bel, ben, bul, cat, ces, cmn, cym, dan, deu, ekk, ell, eng, epo, eus, fao, fas, fin, fra, gla, gle, glg, grc, heb, hin, hrv, hun, hye, ina, ind, isl, ita, jav, jpn, kan, kat, kaz, kor, lat, lit, ltz, lvs, mar, mkd, mlg, nld, nor, oci, pan, pol, por, ron, rus, san, slk, slv, spa, sqi, swa, swe, tam, tel, tha, tur, ukr, vie, yid, yue) has a corresponding exclusion list (afr, ang, ara, ast, aze, bel, ben, bul, cat, ces, cmn, cym, dan, deu, ekk, ell, eng, epo, eus, fao, fas, fin, fra, gla, gle, glg, grc, heb, hin, hrv, hun, hye, ina, ind, isl, ita, jav, jpn, kan, kat, kaz, kor, lat, lit, ltz, lvs, mar, mkd, mlg, nld, nor, oci, pan, pol, por, ron, rus, san, slk, slv, spa, sqi, swa, swe, tam, tel, tha, tur, ukr, vie, yid, yue). I hope it will help.

Normally I would add a link to the exclusion list in the description of each lemmas list, but unfortunately, the Lingua Libre engine doesn't allow adding any kind of comment or description to lists, so this announcement is the only way to spread the word about the new functionality. Olaf (talk) 09:54, 13 September 2021 (UTC)
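
In effect, each exclusion list is subtracted from the corresponding generated list at every update. A minimal sketch of that filtering step, assuming both lists are available as plain Python lists of words (an illustration only, not Olafbot's code; how the bot reads the list pages is not described here):

```python
def apply_exclusions(wanted, excluded):
    """Remove excluded words from a wanted-words list, preserving order.

    Surrounding whitespace is ignored on both sides, so " abandonar "
    and "abandonar" count as the same entry.
    """
    excluded_set = {word.strip() for word in excluded}
    return [word for word in wanted if word.strip() not in excluded_set]
```

Using a set keeps each membership test cheap even for long lists, while the list comprehension keeps the original ranking of the wanted words.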

@Olaf Thank you so much for this useful new function! Indeed, the Record Wizard does not yet understand comments, categories or templates on List pages, but this will be considered for future updates. — WikiLucas (🖋️) 18:48, 13 September 2021 (UTC)

Adding a new language

Hello!

I would like to add the language Q3196953, but when following the procedure, I don't see LinguaImporter. Can someone tell me why?

Regards, BamLifa

@BamLifa it's because you are not an administrator. I have just imported Nande (Q646152). Pamputt (talk) 17:16, 13 September 2021 (UTC)
@Pamputt, thank you very much for this clarification. If this option is reserved for admins only, why mention it in the documentation without saying so? Moreover, given how many of our languages do not exist on Lingua Libre yet, don't you think this task should be simplified? I have yet another language to add, Bira (Bila). BamLifa (talk) 12:41, 20 September 2021 (UTC)
@BamLifa it is stated on this page (it is even the title of the section, "Tool for administrators"). I don't remember why it is reserved for admins, but at least it keeps out vandals who would import things that are not languages. Anyway, I have imported Bira (Q656403) and Bila (Q656404). If these are not the right languages, can you give me the corresponding ISO 639-3 code (or at least the Wikidata ID)? Pamputt (talk) 14:06, 20 September 2021 (UTC)
@Pamputt, thank you very much. BamLifa (talk) 05:34, 22 September 2021 (UTC)

Lists still don't work properly

@WikiLucas00 @Poslovitch It's better than before, but the Record Wizard still sometimes hangs when a list is chosen. I then have to reload the page and try again. Usually on the second or third attempt with the same list, it starts to work. Probably a race condition. Olaf (talk) 09:47, 30 September 2021 (UTC)

@Olaf It also happens to me sometimes, but I think that it could be related to the button for removing words you already recorded. When you load a list of words you never recorded (typically Olafbot's lists), ticking the button seems to kill the loading. Best — WikiLucas (🖋️) 10:23, 30 September 2021 (UTC)
Thank you. Indeed, with this switch unchecked everything seems to work. Olaf (talk) 16:02, 1 October 2021 (UTC)

List of words to pronounce

Hi! Is there a page where words can be added so that a good Samaritan can pronounce them? Vivaelcelta (talk) 11:30, 3 October 2021 (UTC)

Hello Vivaelcelta, that is what lists are for. You can create your own list, which can then be recorded by anyone. Pamputt (talk) 16:50, 3 October 2021 (UTC)
Thank you, Pamputt. — Vivaelcelta (talk) 22:38, 3 October 2021 (UTC)

Patrol tools project

See LinguaLibre:Events/Patrol assistance tool prototyping project.

Hi,

This week, a project led by students of University Toulouse 3 - Paul Sabatier is starting. It is about prototyping patrolling tools. I am supervising this project, assisted by Adélaïde Calais. The students study computer science with a specialization in Artificial Intelligence. The aim is to have them prototype (or even develop) tools that help Lingua Libre's patrol by automatically detecting any kind of mistake or error related to the files. We have already identified a few types of mistakes: clicks, crackles, pops and labelling issues (wrong label/wrong language).

We need the community on two points :

  1. Are there other problems you can think of?
  2. We need some recordings with issues so that the students can work. If you have already recorded them again, it is not a big deal: Commons keeps a file history. Don't hesitate to send us the files that have or had problems.

Lastly, I created a project page, available here.

See you, Lepticed7 (talk) 09:19, 19 October 2021 (UTC)
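
As an illustration of the simplest kind of automatic check such a project could start from, here is a naive click detector. It is a sketch, not the students' method: it assumes 16-bit mono WAV input and flags sudden jumps between consecutive samples, using an arbitrary threshold of 0.5 on a -1 to 1 scale. Real detectors would also look at spectral features and surrounding context.

```python
import array
import sys
import wave


def load_pcm16_mono(path):
    """Read a 16-bit mono WAV file into floats in the range [-1.0, 1.0)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)  # signed 16-bit integers
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return [s / 32768 for s in samples]


def find_clicks(samples, threshold=0.5):
    """Return the indices where the signal jumps by more than threshold
    between two consecutive samples, a crude proxy for clicks and pops."""
    return [
        i
        for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) > threshold
    ]
```

A call such as `find_clicks(load_pcm16_mono("recording.wav"))` (file name hypothetical) returns sample positions worth listening to; dividing an index by the file's frame rate converts it to seconds.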

Hello Lepticed7, Translated page —Eihel (talk) 19:49, 22 October 2021 (UTC)
Lepticed7, Adélaïde, could you specify the dates for this project?
Also, were your points 1 and 2 answered by the community somewhere? (If not, I could give it a try.) Yug (talk) 13:19, 15 November 2021 (UTC)
@Yug Hi, I updated the project page with the dates. And I didn’t get any answers to my questions. Lepticed7 (talk) 11:25, 28 November 2021 (UTC)

Rashidun Caliphate

Hello @Zinou2go, LL-Q13955 (ara)-Zinou2go-الخلافة الراشدة.wav is problematic (currently الخلافة الراشدة (Q204439) on LiLi): it contains several cuts (clicks). I proposed the file for deletion on Commons. Recording seems to be working better now; could you record Rashidun Caliphate again? I didn't check your other recordings, but they are likely to have "clicks" as well. Also, can an admin delete this item on LiLi, please? Cordially. —Eihel (talk) 15:31, 12 November 2021 (UTC)

@Eihel Please do not nominate files for deletion before asking the speaker to record them again and waiting a while for their answer. Also, these recordings will be useful for the team currently working on Lingua Libre's audio issues, so we'd better not delete them (I thought you had read my messages on Discord about this). — WikiLucas (🖋️) 15:48, 12 November 2021 (UTC)
@WikiLucas00, I removed the deletion request on Commons. —Eihel (talk) 15:54, 12 November 2021 (UTC)

Code of Conduct

Hi everyone, I just noticed again MediaWiki's mw:Code of Conduct (2015) and the Wikimedia Foundation's foundation:Universal Code of Conduct (2021/02). Back in 2015, 0x010C included the first one as a condition for contributing to RecordWizard's codebase. As far as I know, Lili.org and its community have no Code of Conduct so far. We may be implicitly bound by one of these or by some Wikimedia France Code of Conduct, but it would be cleaner to explicitly adopt one and display it here, in writing. We could therefore do the following:

  1. Short round to confirm we have nothing in place so far.
  2. Vote for 2 months to adopt the most recent foundation:Universal Code of Conduct (2021/02)
  3. Copy the text into LinguaLibre:Universal Code of Conduct.

Yug (talk) 14:48, 14 November 2021 (UTC)

Pre-discussion

Do we already have a Code of Conduct binding LinguaLibre? Yug (talk) 14:48, 14 November 2021 (UTC)

Vote

Are you for or against adopting the foundation:Universal Code of Conduct (2021) as a code of conduct for LinguaLibre's community?
Possible votes : {{Support}} • {{Weak support}} • {{Weak oppose}} • {{Oppose}}

  • Support (proposer): better to be explicit and have a framework in place, just to be clear to all about where we stand. Yug (talk) 14:48, 14 November 2021 (UTC)

Lingua Libre website should be more appealing to Language Learners

See also Forvo.com.

It would be useful if LinguaLibre followed the example of Forvo to increase the number of language learners interested in the project.

Forvo.com displays information in a way that engages users and makes it very easy to find pronunciations.

For example, if someone wants to learn how to pronounce "Honoré de Balzac" in French, it would be faster to find the audio on Forvo than on LinguaLibre. Also, Forvo displays the data in a way more appealing to language learners:

Would it be possible to improve the way data is displayed on LinguaLibre to make it more appealing to language learners? That way, the number of active users recording audios would increase significantly. -- Marreromarco

Some people have previously reported such an "issue". There is a ticket on Phabricator to keep this in mind. However, priority is currently given to developing patrol tools for Lingua Libre, and we do not expect major improvements to audio browsing in the coming months (at least not unless we get more external developers). I think this is because Lingua Libre was designed to help with recording, not listening; the latter is left to the other Wikimedia projects, mainly the Wiktionaries and Wikidata. Pamputt (talk) 16:00, 14 November 2021 (UTC)
YES! There are oral discussions and proposals in this direction, but since LinguaLibre is run by a team of volunteers, we are moving slowly. Forvo is a for-profit entity: it locks the copyright and resale rights of recordings made on its platform to the speaker-creator and to itself, and then sells those recordings at a profit. It therefore has the money and swift decision-making to sustain its UI/UX efforts. We are shorter on those sides. --Yug (talk) 16:30, 14 November 2021 (UTC)

Sound Library's forking and hacking

On the Sound Library side, I was able to duplicate/fork it, which makes it possible to start hacking its CSS. Copy those pages into your own namespace:

In those pages, replace all occurrences of "Yug" with your username, and it should work. You can then start hacking toward a more elegant interface. Note: the JS copy is in your *personal* JS and has a "stop" condition so that the various JS instances won't conflict. --Yug (talk) 16:30, 14 November 2021 (UTC)

Allow recording only in the user's Native Language to avoid passing "mispronunciations" to Wiktionary

I started a discussion on the German Wiktionary because some words on LinguaLibre are not available on de.wiktionary. The German community told me that LinguaLibre adds words to Commons, but the bot only accepts audios from a “few” trusted users, using a filter.

The English and German Wiktionaries use a bot called "DerbethBot" to add audios from Commons. However, the English Wiktionary community asked to block Lingua Libre's recordings because non-native speakers were recording audios and the bot had no way to differentiate them from native speakers. After the audios were introduced in the English Wiktionary, they had to forbid adding audios from LinguaLibre:

https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2020/July#Labeling_non-native_audio

I believe that it is necessary to avoid giving “mispronunciations” to Wiktionaries. It is similar to vandalism on a Wiktionary if readers don't know that they are hearing a bad pronunciation and believe it comes from a “native speaker”:

Some suggestions: 1) Would it be possible to name the audio files so as to specify whether the speaker is native or not? For example, if a French speaker records the word "maison", it could be named "maison-fr-native.ogg". If a language learner records the same word: "maison-fr-learner.ogg".

2) A radical way to address the issue would be to only allow to record in one's native language. Of course, users could change it, but strong warnings could be added and always remind people to record only their native language. Forvo seems to take this approach.

It might be valuable for linguists to have recordings of non-native speakers to study their accent features in an L2 language. However, in my humble opinion, the pronunciations added to Wiktionary should come only from native speakers, and bots should have a way to differentiate them.

Link to the German Wiktionary discussion about LinguaLibre: https://de.wiktionary.org/wiki/Wiktionary:Teestube#:~:text=von%20technischer%20seite%20gibt%20es%20keinem%20problem%2C%20zwei%20bots%20auf%20de.wiktionary%20arbeiten%20zu%20lassen.
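
Suggestion 1 would be easy for a bot to consume, because the speaker level becomes the last hyphen-separated part of the file name. A minimal sketch of the proposed scheme; the helper names are hypothetical, and this is not Lingua Libre's current naming pattern:

```python
LEVELS = ("native", "learner")


def proposed_filename(word, lang, native, ext="ogg"):
    """Build a name like "maison-fr-native.ogg" under the proposed scheme."""
    level = "native" if native else "learner"
    return f"{word}-{lang}-{level}.{ext}"


def speaker_level(filename):
    """Read the level suffix back, or return None if the name has none."""
    stem = filename.rsplit(".", 1)[0]
    level = stem.rsplit("-", 1)[-1]
    return level if level in LEVELS else None
```

Splitting from the right means words that themselves contain hyphens, such as "arc-en-ciel", still parse correctly.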

Hi, this depends on each Wiktionary's policy, and it could differ from one language to another. Anyway, it is already possible to select only recordings made by native speakers. To do that, the speaker has to fill the language level (P16) property with the value native (Q15) (see for example Pamputt (Q466)). Other values for language level (P16) are given here. Pamputt (talk) 16:38, 16 November 2021 (UTC)
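
Once speakers have filled in language level (P16), a consumer such as a bot can keep only recordings whose speaker declared the recorded language as native (Q15). A minimal sketch, assuming the data has already been fetched from Lingua Libre into the simple in-memory layout described in the docstring (that layout and the function name are hypothetical):

```python
NATIVE = "Q15"  # "native" value of language level (P16), per the reply above


def native_only(records, levels):
    """Keep only recordings whose speaker declared the language as native.

    records: list of dicts with "file", "speaker" and "language" keys.
    levels: {speaker: {language: level}} holding each speaker's P16 values.
    """
    return [
        rec
        for rec in records
        if levels.get(rec["speaker"], {}).get(rec["language"]) == NATIVE
    ]
```

Speakers with no declared level are excluded rather than assumed native, which is the conservative choice for a Wiktionary import.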


Sursilvan

-- Done

User:Franz.Roos.1955 made 2 recordings in en:wp:Sursilvan: rauna (rauna (Q689785)), tschitta (tschitta (Q689786)). Sursilvan has no ISO code. Do we have a procedure for such languages? (I forget whether this case has already come up.) Yug (talk) 20:37, 17 November 2021 (UTC)

There is no issue. It simply uses the Wikidata identifier when there is no ISO code. See for example Auvernhat dialect (Q1186). To record in such languages, we have to create an item for the language/dialect on Lingua Libre, and this is already done for Sursilvan (Q74905). Pamputt (talk) 21:59, 17 November 2021 (UTC)
Thanks, Pamputt, for the clarification. Yug (talk) 23:12, 17 November 2021 (UTC)

commons:Commons:Structured data

I've been very pleased with LL's tooling, that does so much of the process of uploading to Commons, sensible naming, description-writing, and categorisation for me; however, I have an idea for an additional step LL could automate. This is in Commons' no-longer-so-new structured data section, which manifests (among other ways) as a tab on the file page.

As an example of what could be automatically added to a file's datastore, there is a property called 'audio transcription' which serves a similar role to Commons' TimedText subtitle functionality (silly example: commons:TimedText:051226-kakapo-billbooming.ogg.en.srt) but for shorter clips -- in other words, seemingly designed with applications like LinguaLibre in mind.

Since these are of the so-called 'monolingual text' datatype, the source language can be specified (or, where it is not part of the main set of languages Wikimedia uses, the special code 'mis' is used with 'language of work or name' as a qualifier) at the same time as the actual text being spoken, which LL has access to, since the audio file started out as a text prompt!

What think y'all? Arlo Barnes (talk) 04:25, 19 November 2021 (UTC)
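
To make the idea concrete, here is a sketch of the statement LL might generate, written as a Python dict in the JSON shape Wikibase uses for statements with a monolingual-text value. The property IDs are left as parameters because this sketch does not assume which IDs 'audio transcription' and 'language of work or name' have; the function itself is hypothetical:

```python
def transcription_claim(text, lang_code, language_item, supported_codes,
                        transcription_prop, language_prop):
    """Build a Wikibase-style statement holding a monolingual-text transcription.

    If lang_code is not among supported_codes, the special code "mis" is
    used and the language item is attached as a qualifier instead.
    """
    code = lang_code if lang_code in supported_codes else "mis"
    claim = {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": transcription_prop,
            "datatype": "monolingualtext",
            "datavalue": {
                "type": "monolingualtext",
                "value": {"text": text, "language": code},
            },
        },
    }
    if code == "mis":
        claim["qualifiers"] = {
            language_prop: [{
                "snaktype": "value",
                "property": language_prop,
                "datatype": "wikibase-item",
                "datavalue": {
                    "type": "wikibase-entityid",
                    "value": {
                        "entity-type": "item",
                        "id": language_item,
                        "numeric-id": int(language_item[1:]),  # "Q123" -> 123
                    },
                },
            }]
        }
    return claim
```

The qualifier is only added in the 'mis' case, mirroring the rule described in the post: a supported language code already identifies the language, so no qualifier is needed.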

Hi Arlo Barnes, there is a Phabricator ticket about this topic. Currently, Wikidata does not yet have properties matching all Lingua Libre properties. For example, I proposed creating a property for the language level of a speaker, but it did not get enough support. So I guess we should first list all the properties we would like to add to SDC. Pamputt (talk) 07:18, 19 November 2021 (UTC)

[Feature Request] Play next sound automatically while checking recordings

After recording sounds it is important to check them to verify their quality. However, it is very tiring to record 380 words and afterwards have to click 380 times on the “Next button” while checking them.

After recording, would it be possible to add a "Play next sound automatically" button? Screenshot here. Marreromarco (talk) 04:09, 20 November 2021 (UTC)

Agreed, it is already tracked on Phabricator. Pamputt (talk) 09:45, 20 November 2021 (UTC)

"How to use Lingua Libre for your language learning"

I recently found a "new" way to benefit from the sounds on Lingua Libre. I suggest advertising it on the main Lingua Libre website and on the French/English Wikipedia:

  • GoldenDict is a FOSS Dictionary application very valuable for language learners.

One way to benefit from Lingua Libre recordings is to download the datasets, unzip them and "load" the sounds into GoldenDict (as Sound Directories; screenshot here). This way, users easily get an offline "pronunciation dictionary". It is very easy to do. Here is a screenshot of how the French word "fuir" looks in GoldenDict. Another example here.

Lingua Libre sounds can be used with GoldenDict OFFLINE. That is a huge advantage in developing countries, where language learners often do not have reliable internet connection.

It would be valuable to create a description on the Lingua Libre website about "How to use Lingua Libre sounds for your language learning".

There it would be possible to describe how to use the audios offline with GoldenDict, etc. If more methods are developed (an Anki add-on, a better GUI, an Android app, etc.), they could be explained there. --Marreromarco (talk) 04:41, 20 November 2021 (UTC)

1) Reuse of datasets: Yes! Dataset download and reuse must be showcased and strengthened. I think a "Reuses gallery" page could be created, with screenshots and a minimal how-to for GoldenDict, Anki and others.
2) Anki: You are the 4th or 5th contributor to raise the need for an Anki add-on. We need to do something on this side, yes. It's more than 1-2 days of work and too big for volunteer work, so we need to apply for a grant. I'm looking into and mapping our options at the moment ({{Grants table}}). At some point we have to jump in and design a project, yes.
3) For an e-learning app, I designed a 5k€ project a year ago. The funding by the local regional government was declined, but the project could easily be refreshed.
We have to redesign some projects and apply in early 2022. Yug (talk) 09:28, 23 November 2021 (UTC)
The core question is human resources.
*Daily routines* keep WikiLucas, Pamputt, Poslovitch and myself (aka the community-side contributors) busy maintaining the place, welcoming and guiding new users, cleaning pages, etc. We are now quite smooth, successful and stable on this side.
To *push forward* on development, UI, tools, e-learning, communication and grants, we each have one or two side projects in mind that we push slowly. But as always in FOSS projects, the task ahead is much larger and we could achieve much more with more human resources.
Overall, we may be at a new turning point right now. As things are stable, with roadmaps available, we just need 1 to 3 new coordinators and communication contributors to tip the dynamic into forward-offensive mode: communication, and therefore new arrivals, new speakers, new devs and new coordinators, to really push forward with new events/workshops, funds and SMART features.
@Marreromarco, I'm currently writing structuring "community how-to" pages to ease new contributors' onboarding (see LinguaLibre:Roles, LinguaLibre:Workshops, {{Grants table}}). You are doing a nice push on communication (It's FOSS), and with your questions you are mapping out Lili's needs. Pamputt and WikiLucas are following our progress. All this is pretty interesting. Yug (talk) 10:48, 23 November 2021 (UTC)
I would like to work in the "Public Relations" department of LinguaLibre! EDIT (28th Nov. 2021): Any PR campaign would fail miserably if there is no search function. I explain the reasons at the end of this section: LinguaLibre:Events/Winter 2021-2022 Public Relations Campaign

Marreromarco (talk) 23:49, 23 November 2021 (UTC)

Sounds good :) Your outreach to YouTubers and popular FOSS blogs is spot on.
I am back from a wikibreak and am cleaning up some last pages. Then, since the maintenance side is stable, I would like to focus my energy on project design (recording rare languages, technology, PR campaign) and the associated grant requests, to secure funding and the actual realization of those visions. We can collaborate. You lead on PR: design your campaign. I can review it and help it fit some grant formats. Yug (talk) 18:00, 24 November 2021 (UTC)

I created a new wiki page for a "PR Campaign for 2022" in the "events" section. Please visit LinguaLibre:Events/Winter 2021-2022 Public Relations Campaign and participate in the discussion with new ideas. EDIT (28th Nov. 2021): I will NOT contribute to a PR campaign anymore. The reasons are explained in a comment in the relevant section. Marreromarco (talk) 21:20, 25 November 2021 (UTC)

Creating a LL category for a dialect

Would be grateful if someone could tell me if it's possible to create a LL category for a dialect?

We're working in Konkani, which has its own (but small) Wikipedia at http://gom.wikipedia.org Under Konkani, there are some dialects spoken, the pronunciation of one can be different from the other.

Would like to create a category for Saxtti (the Salcete dialect of Konkani). This will ensure that readings don't get overwritten by other dialects. Also, it would allow the recordings of many others, which might already have been made in Konkani as a whole.

Question: How do we create space for the dialects of a language?

Thanks very much, in advance! --Fredericknoronha (talk) 13:34, 27 November 2021 (UTC)

Hello @Fredericknoronha and welcome to Lingua Libre. I imported Goan Konkani (Q700683) (gom) as it was not on Lingua Libre yet. On Lingua Libre, dialects are treated the same way as languages. You can create an item for your dialect on Wikidata (example for the Auvergnat dialect) and tell us once it is ready, so that we can import it on Lingua Libre with an admin tool. You can also directly create an item for your dialect on Lingua Libre, following the steps described at Help:Add a new language and using Auvernhat dialect (Q1186) as an example. Don't hesitate to ping an admin if you have any questions.
All the best — WikiLucas (🖋️) 15:35, 27 November 2021 (UTC)
« there are some dialects spoken, the pronunciation of one can be different from the other. […] This will ensure that readings don't get overwritten by other dialects. »
If the spelling is the same and only the pronunciation differs depending on where the speaker comes from, it sounds like different accents.
Recordings are specific to a word, a language and a speaker. This means that my recording of the French word "bonjour" will be one audio file on Lili. WikiLucas can record the same French word "bonjour"; that will create another audio file on Lili. My recording(s), since I come from the South West, will carry the southern accent. Recordings by WikiLucas, who lives 700 km east of me, will carry the Lyon area accent. Lingua Libre will store 2 recordings, one per user. Yug (talk) 21:59, 27 November 2021 (UTC)
Hello Fredericknoronha, I have imported Salcete Konkani (Q701734) so that you can now record words in that dialect. Pamputt (talk) 17:21, 28 November 2021 (UTC)
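Yug's explanation can be pictured as a store keyed by (language, word, speaker): a second speaker adds a new recording of "bonjour", while the same speaker recording the word again replaces their earlier take (the overwrite behaviour Pamputt describes further down this page). The Python sketch below models only that rule; the `Recording` type, its fields and the file names are hypothetical, not Lingua Libre's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recording:
    language: str   # language code, e.g. "fra"
    word: str       # the transcription
    speaker: str    # the speaker (locutor) who recorded it
    file: str       # audio file name (illustrative only)


def add_recording(store: dict, rec: Recording) -> None:
    """Keep one recording per (language, word, speaker).

    The same speaker recording the same word again replaces the earlier
    entry; another speaker recording that word adds a separate entry.
    """
    store[(rec.language, rec.word, rec.speaker)] = rec


store: dict = {}
add_recording(store, Recording("fra", "bonjour", "Yug", "take1.wav"))
add_recording(store, Recording("fra", "bonjour", "WikiLucas", "take1.wav"))
add_recording(store, Recording("fra", "bonjour", "Yug", "take2.wav"))
print(len(store))                             # 2
print(store[("fra", "bonjour", "Yug")].file)  # take2.wav
```

A dialect imported as its own language item simply gets a different `language` value, so its recordings never collide with those of the parent language.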

Feedback about Lingua Libre by Professor Carol Genetti, PhD

Dear Members of Lingua Libre, I am pleased to share a message from Professor Carol Genetti, a linguist and leading expert in endangered languages. Professor Genetti is author of one of the best books in the field of Linguistics called "How Languages Work". Her vast knowledge and experience are extremely valuable and after reviewing Lingua Libre she said:

Thank you for contacting me and letting me know about this initiative. It is an interesting idea. I especially like the multilingual menus -- very helpful.

Are you aware of this website, hosted by the University of Hawaii (and, I believe, funded by Google). So one thing that occurs to me is the proliferation of such sites. How will people in an endangered-language community find out about their options, and then make an informed choice about which of these online resources will be best over time for their communities? Should such efforts cross-reference each other?

My second thought has to do with longevity. It takes a significant commitment to support a site like this over time. The challenge is having someone who can keep such sites funded, working, organized, relevant, and engaging users over time. How will you make sure that the data will be available in 10, 50, 150 years? Maybe you get that automatically by being associated with Wikipedia. If so, state that. Also, there should be a clear statement of how such data might be used, and by whom, so speakers know that if they record a wordlist, someone might use it for some purpose without their permission (is that right?). I'm sorry to have to bring a down-to-earth message to the inspiration and passion for endangered languages that has clearly fueled this work, but having seen other initiatives stumble in this way, I wanted to be sure that you are thinking about this. Speakers will be entrusting you with such valuable pieces of their lives and their cultures. How will you safeguard this over time? Let people know. Those issues aside, here are a couple of other comments:

  • There should be a statement targeted for speakers of endangered languages - why would they want to do this? What is the value for them and their communities? What will happen to the recordings? etc.
  • Will you provide speakers with suggestions for what vocabulary to record, e.g. greetings, colors, verb forms?
  • It would be helpful if it was clear from the large list of languages which ones have recordings. Maybe put those in a different color font?
  • It would be helpful to include translations of the words into one of the world's major languages or the national language. Otherwise, someone's grandkids coming to this in 30 years will not know what the words mean.
  • Do you want to move beyond single words to a piece of connected discourse, such as a short poem or story, a song, or the reading of some common text (such as a sentence from the UN Declaration for Linguistic Rights)?
  • Should there be a means to flag inappropriate content?

I hope that you find this helpful. And I'm so glad you liked my book! It is lovely to hear that people have found it helpful.

Carol Genetti Vice Provost for Graduate and Postdoctoral Programs NYU Abu Dhabi (she/her/hers)

Marreromarco (talk) 09:23, 4 December 2021 (UTC)

Hey, this is some interesting feedback.
  • "What will happen to the recordings?": Our homepage lacks such important information. We should plan a redesign for 2022 (inspired by the homepage of Common Voice?) so that we finally have a homepage that properly explains what Lingua Libre is and can do.
  • "Suggestions of things to record?": This already exists. They're called Lists. We have some pending improvements on that matter (easier to find and contribute to, etc.)
  • "Show which languages have recordings": The datasets page could help, but I guess it would be interesting to put that on an easy-to-find page (again, like Common Voice's languages page?)
  • "Include translations of the words into one of the world's major languages or the national language": we only support "transcription" for now.
    • How could we even "link" the recordings to translations? (Lexemes? Plain text?)
    • Who would have to do that? (the locutor? a dedicated team of contributors?)
    • Where would it be done? (in the RecordWizard?)
    • -> That's an interesting thing to think about, but might be slightly out of scope right now
  • "Sentences, stories, songs...?": Yes, indeed. The Record Wizard is already able to do that (with some config tweaks that have to be done by the locutor), but it would be great to streamline this further. Dedicated UI, ability to record an audiobook (or Wikipedia, Wikisource, Wikinews article) as a mixture of sentences that can be stored locally before being all merged together into one audio file sent to Commons, ability for multiple contributors to work on the same book/article... That's something we should also discuss with the Librivox folks: they use Audacity so far, but they might be interested in a tool that's better suited to their needs.
  • "flag inappropriate content?": My insight is focused on technical stuff. This sounds more like some editorial guidelines that would have to be debated by the community.
  • "longevity?": Should Lingua Libre vanish tomorrow, the audio recordings would not be lost. They're all stored on Wikimedia Commons, and that makes them as "immortal" as files stored on hard disks, SSDs, CDs or magnetic tapes and mirrored half a dozen times around the world can be. However, I can't say much about our Wikibase, which, at the current time, is the only place where all the recordings and locutor-related metadata are stored. That's a serious single point of failure. There are no dumps and therefore no mirroring. We'll definitely have to discuss it with Wikimedia France and the Tech Team.
Hopefully my answers are clear and comprehensible. I'm pleased to have received feedback from Pr. Genetti. Now it's our turn to take matters in our hands! --Poslovitch (talk) 13:13, 5 December 2021 (UTC)

How to delete lists?

-- Done

Hello, recently I completed some lists. Now everything is done and those lists are needless. Is there any possibility to delete lists? Greetings --Onkel Tomm (talk) 10:02, 10 December 2021 (UTC)

@Onkel Tomm hello, admins can delete those lists. The lists you created are here. Which ones should I delete ? Yug (talk) 10:25, 10 December 2021 (UTC)
Hello Yug, please delete all 8 lists, because they are all finally finished. Thanks. --Onkel Tomm (talk) 13:44, 10 December 2021 (UTC)
@Onkel Tomm We are clean! Thanks for asking, it keeps the place clean :) Yug (talk) 15:10, 10 December 2021 (UTC)

Case study

Hello all, I noticed a file upload which gathers several interesting use cases.

  • Item: Ingenieur (Q709231), a misspelling of "Ingénieur"
  • Label: (arch.) "Ingenieur"
  • Speaker: fleur (Q674858) 'fleur'
  • Account: User:Beat_Ruest
  • Filename: File:LL-Q150_(fra)-fleur_(Beat_Ruest)-Ingenieur.wav (carries the misspelling)
  • Category: commons:Category:Lingua Libre pronunciation by Beat Ruest. The category page was not created, therefore the file is virtually "lost" to Wikimedia Commons and to commons:Category:Lingua_Libre_pronunciation_by_user.

Questions:

  • Question 1: How do we handle misspellings? I assume by renaming ALL THREE: the label of Ingenieur (Q709231), its Property:P3 'recording', and the Wikimedia Commons file File:LL-Q150_(fra)-fleur_(Beat_Ruest)-Ingenieur.wav. Is that OK, or will it break something?
  • Question 2: The category should be created automatically. How do we go about this? I assume through a request on LinguaLibre:Bot.
  • Question 3: What about a category by *speaker/voice* (Ingenieur (Q709231) 'fleur'), which currently doesn't exist, and where multiple speakers can have the same name 'fleur'?

Yug (talk) 10:39, 10 December 2021 (UTC)

Question 1: it is a good start. I guess, we need to fix it both on Lingua Libre and on Wikimedia Commons
Question 2: you speak about categories on Wikimedia Commons? If so, I guess a bot can do it (Lingua Libre Bot or another one).
Question 3: actually the speaker is identified as "fleur (Beat Ruest)". Only one locutor of Beat Ruest can use the nickname "fleur".
Pamputt (talk) 11:23, 20 December 2021 (UTC)
Q1, Q2 agree.
Q3 : @Pamputt check the categories on commons:File:LL-Q150_(fra)-fleur_(Beat_Ruest)-Ingenieur.wav. Yug (talk) 14:56, 20 December 2021 (UTC)
@Yug you mean the problem is c:File:LL-Q150_(fra)-fleur_(Beat_Ruest)-Ingenieur.wav is categorized in "Category:Lingua Libre pronunciation by Beat Ruest" and not in "Category:Lingua Libre pronunciation by fleur (Beat Ruest)" or similar name? Pamputt (talk) 07:57, 5 January 2022 (UTC)
Yes, we don't have categorization by speaker "Fleur (Beat Ruest)". Low importance, but it could be a feature request. Yug (talk) 18:01, 5 January 2022 (UTC)
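The file name itself already encodes most of the case-study fields, including the speaker as "fleur (Beat Ruest)". A minimal Python sketch that splits a name into its parts and derives both the existing per-account category and a hypothetical per-speaker one (the Question 3 feature request). The pattern is inferred from the two Lingua Libre file names quoted on this page, not from any specification, and it assumes the speaker part contains no hyphen.

```python
import re
from typing import NamedTuple

# Pattern inferred from file names quoted on this page (an assumption):
#   LL-<language Qid>_(<ISO code>)-<speaker>-<word>.<extension>
# where <speaker> is "Account" or "Locutor_(Account)". The speaker part is
# assumed to contain no hyphen; the word may contain hyphens.
FILENAME_RE = re.compile(r"^LL-(Q\d+)_\(([^)]+)\)-([^-]+)-(.+)\.(\w+)$")
SPEAKER_RE = re.compile(r"^(.+)_\((.+)\)$")


class LLFile(NamedTuple):
    language_qid: str
    iso: str
    locutor: str
    account: str
    word: str
    ext: str


def _human(text: str) -> str:
    """Commons stores spaces as underscores in file names."""
    return text.replace("_", " ")


def parse_ll_filename(name: str) -> LLFile:
    name = name.removeprefix("File:").replace(" ", "_")
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a Lingua Libre file name: {name!r}")
    qid, iso, speaker, word, ext = m.groups()
    s = SPEAKER_RE.match(speaker)
    # Without "(Account)", the locutor is named after the account itself.
    locutor, account = s.groups() if s else (speaker, speaker)
    return LLFile(qid, iso, _human(locutor), _human(account), _human(word), ext)


def category_by_account(f: LLFile) -> str:
    """Existing per-account category, as on the Ingenieur file."""
    return f"Category:Lingua Libre pronunciation by {f.account}"


def category_by_speaker(f: LLFile) -> str:
    """Hypothetical per-speaker category (not created by any bot today)."""
    if f.locutor == f.account:
        return category_by_account(f)
    return f"Category:Lingua Libre pronunciation by {f.locutor} ({f.account})"
```

For example, `parse_ll_filename("File:LL-Q150_(fra)-fleur_(Beat_Ruest)-Ingenieur.wav")` yields locutor "fleur" and account "Beat Ruest", which is the pair Pamputt describes as the speaker's identity.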

Handling duplicates

See also Help:Homographs (new, needs review!)

Good evening!

Is there any handling of duplicates in LL for words of the same language? BamLifa (talk) 13:45, 18 December 2021 (UTC)

Hello BamLifa, if the same speaker records the same word, the previous recording is overwritten (a given speaker can only record a given word once). On the other hand, nothing prevents the same word from being recorded by several different speakers; that is even one of Lingua Libre's goals: highlighting the diversity of pronunciations. Pamputt (talk) 11:19, 20 December 2021 (UTC)
@Pamputt: How, then, are homographs that are not homophones handled? ^^ Totodu74 (talk) 00:03, 5 January 2022 (UTC)
Hello Totodu74, it is possible to add indications in parentheses (this information is stored using qualifier (P18)). See for example fils (pluriel de fil) (Q1685) and fils (enfant) (Q1686). Pamputt (talk) 07:55, 5 January 2022 (UTC)
@Totodu74, hi, the question of homographs is partly solved in our African languages, which are mostly tonal languages. --Rçag (talk) 11:18, 9 January 2022 (UTC)
Rçag, could you explain your solution a bit so we learn from it.
@BamLifa, Rçag, Pamputt, & Totodu74 the page Help:Homographs is there to gather best practices. It's new, review and edits welcome. Yug (talk) 15:05, 12 January 2022 (UTC)

How to change username

Hello, on Wikimedia projects my username is Manjiro91 (formerly GamissimoYT). How do I change my username here? GamissimoYT (talk) 17:13, 11 January 2022 (UTC)

Hello GamissimoYT. Lingua Libre uses the same username as the one in use on Wikimedia Commons. So if you want to use the username Manjiro91, log out of Lingua Libre, then out of Wikimedia Commons. Then log in to Commons as Manjiro91 and finally log back in to Lingua Libre. Pamputt (talk) 21:05, 11 January 2022 (UTC)

@Pamputt My Wikimedia Commons username is Manjiro91 (formerly GamissimoYT), but the username change does not carry over to LiLi. GamissimoYT (talk) 13:38, 12 January 2022 (UTC)

@GamissimoYT, did you do the logouts and logins in the order I gave? If you are sure you are logged in as Manjiro91 on Wikimedia Commons, you can try logging out of Lingua Libre and logging back in right away. Clearing the browser cache might also help. Pamputt (talk) 07:37, 13 January 2022 (UTC)

Merging of items about languages

See also Help:SPARQL and Help:SPARQL for maintenance.

Hi y'all,

For the record, I just merged a couple of items about the same language:

I detected them with this SPARQL query:

SELECT ?idWD (COUNT(?item) AS ?compte) (GROUP_CONCAT(?item) AS ?items) WHERE {
  ?item prop:P2 entity:Q4 ; prop:P12 ?idWD .
}
GROUP BY ?idWD
HAVING ( ?compte > 1 )

Ping @WikiLucas00 it seems you are responsible for some of them...

Cheers, VIGNERON (talk) 09:29, 19 February 2022 (UTC)

Thanks VIGNERON for finding them and cleaning them up. Now, what should we do with recording items that use the duplicate language item (for example with Duala)? I think we must modify language (P4) on all those recording items so that languages are not counted twice, and also to clean up the database (there are also transcription problems for items listed in the Duala example). Pamputt (talk) 16:16, 19 February 2022 (UTC)
Thank you @VIGNERON for pointing these out. As you can see, most of them were not created manually but with the tool (the pages weighed circa 4 kB, with labels in many languages). It seems that the Lingua Importer tool has (or had?) a problem, but I could not reproduce it (trying to import languages that are already in the LL Wikibase).
During last summer's hackathon we talked a bit about languages in our wikibase, but I can't remember why we need to have language elements in our Wikibase, and not just use the existing base of WikiData 🤔 — WikiLucas (🖋️) 23:23, 19 February 2022 (UTC)
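To act on Pamputt's suggestion, VIGNERON's query can be run from a script and its output turned into a list of language (P4) edits. A minimal Python sketch: the endpoint URL is the one quoted later on this page, the User-Agent string is a placeholder, and keeping the lowest Q-number of each group is an arbitrary assumption, not necessarily the item VIGNERON kept when merging. It only plans the edits; it does not perform them.

```python
import json
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "https://lingualibre.org/bigdata/namespace/wdq/sparql"

DUPLICATE_LANGUAGES = """
SELECT ?idWD (COUNT(?item) AS ?compte) (GROUP_CONCAT(?item) AS ?items) WHERE {
  ?item prop:P2 entity:Q4 ; prop:P12 ?idWD .
}
GROUP BY ?idWD
HAVING ( ?compte > 1 )
"""


def run_query(query: str) -> dict:
    """Send the query as a GET request and decode the SPARQL JSON results."""
    req = urllib.request.Request(
        ENDPOINT + "?" + urlencode({"query": query}),
        headers={"Accept": "application/sparql-results+json",
                 "User-Agent": "ll-maintenance-sketch/0.1 (placeholder)"},
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def qid(value: str) -> str:
    """Entity IRI -> bare Q-id (keeps the part after the last '/')."""
    return value.rsplit("/", 1)[-1]


def duplicate_groups(results: dict) -> dict:
    """Wikidata ID -> Lingua Libre language items sharing it, sorted by Q-number."""
    groups = {}
    for row in results["results"]["bindings"]:
        # GROUP_CONCAT uses a single space as its default separator.
        items = [qid(i) for i in row["items"]["value"].split()]
        groups[row["idWD"]["value"]] = sorted(items, key=lambda q: int(q[1:]))
    return groups


def retarget_map(groups: dict) -> dict:
    """Map each duplicate item to the kept one (assumption: lowest Q-number)."""
    mapping = {}
    for keep, *others in groups.values():
        for other in others:
            mapping[other] = keep
    return mapping


def recordings_to_fix(recordings, mapping):
    """Yield (record, old language, new language) for P4 values needing an update."""
    for record, language in recordings:
        if language in mapping:
            yield record, language, mapping[language]
```

Typical use would be `retarget_map(duplicate_groups(run_query(DUPLICATE_LANGUAGES)))`, followed by a second query listing (record, language) pairs for the affected language items.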

MediaWiki customizations of LinguaLibre

Love the MediaWiki skin of LinguaLibre and I am curious of skin and customizations made. Who are the authors? (can not see credits) --Zblace (talk) 10:15, 19 February 2022 (UTC)

The skin is known as BlueLL. The source code is available on GitHub. It was developed by Wikimedia France in 2020. That said, it is true that there is no licence and no credits on GitHub. I will ask Adélaïde Calais WMFr if she remembers anything so that I can add the missing information. Pamputt (talk) 16:58, 19 February 2022 (UTC)
Hi @Zblace, this skin's author is User:0x010C, and it's open source. It can be reused freely. Yug (talk) 22:45, 22 May 2022 (UTC)

New property: translation

Hello, I've created translation (P38) to be used in case there is no writing in the recording language but instead a translation in the vehicular language. See for example what I did here and there. Do you agree with that? Any comment? Pamputt (talk) 16:33, 19 February 2022 (UTC)

It's a good idea! Many users tend to add a translation as they find it important for other people to have. It will also be handy for cases like your second example, where we only have the translation but not the transcription of the source language: we will be able to query the base to see all audios of a language that have a translation. — WikiLucas (🖋️) 23:28, 19 February 2022 (UTC)
I am thinking about a way to populate automatically this property via the Record Wizard. Currently, it seems that the Record Wizard populates qualifier (P18) when something is written between brackets (see fils (pluriel de fil) (Q1685) for example but I have not checked recently). So, if we modify the Record Wizard code, it is possible to recognize this is a translation in another language and so to populate translation (P38). But I would like to be sure to propose the best way to do it before asking for such development. The idea is to be managed automatically (or at least not completely manually). Pamputt (talk) 00:18, 20 February 2022 (UTC)
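To make the bracket convention concrete, here is a Python sketch that splits an entry such as "fils (pluriel de fil)" into a transcription and a qualifier (P18) value. It is not the Record Wizard's code. The `(en: dog)` form for marking a translation (P38) is a purely hypothetical convention, shown only to illustrate how a translation could be told apart from a qualifier.

```python
import re

# "word (text in brackets)" at the end of an entry.
BRACKET_RE = re.compile(r"^(?P<word>.*?)\s*\((?P<inner>[^()]*)\)\s*$")
# Hypothetical convention, not an existing Record Wizard feature:
# "(en: dog)" marks a translation into the language with code "en".
TRANSLATION_RE = re.compile(r"^(?P<lang>[a-z]{2,3}):\s*(?P<text>.+)$")


def split_entry(entry: str) -> dict:
    """Split an entry into transcription plus qualifier or translation."""
    m = BRACKET_RE.match(entry)
    if m is None:
        return {"transcription": entry.strip()}
    word, inner = m.group("word"), m.group("inner").strip()
    t = TRANSLATION_RE.match(inner)
    if t:
        return {"transcription": word,
                "translation": (t.group("lang"), t.group("text"))}
    return {"transcription": word, "qualifier": inner}
```

With this sketch, "fils (pluriel de fil)" and "fils (enfant)" keep the same transcription and differ only by qualifier, as the two existing items do.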

Lingua Libre Wishlist for 2022-2023

Hi everyone !
This week, Wikimedia France is preparing its budget for the fiscal year to come : July 2022 to June 2023. If there are things you would like to see done or to do with our help on Lingua Libre, please share it on this page : https://lingualibre.org/wiki/LinguaLibre:2022-2023_projection
Have a great week-end ! --Adélaïde Calais WMFr (talk) 17:23, 11 March 2022 (UTC)

marreromarco Thank you for your suggestions. However, I have some reservations about "Add function to "Request" a Pronunciation to Native Speakers" at this stage, for two reasons. First, this will require quite a bit of moderation to correct the grammar and spelling of requests (e.g. HASBAND) as well as to remove terrible requests. This will place a large burden on a few users and can easily lead to questionable decisions by moderators. Second, Forvo is flooded with requests that are overly specific (e.g. "He came back from abyss and won the tie.") and, therefore, likely benefit only one user. IMHO, Rdrg109's proposal to focus on providing pronunciations for entries on the various Wiktionaries is a better approach to building up LL at this point. It will provide a solid foundation for users to find any word in LL. It might be a better time to open up LL to general requests once this project is completed and the community has grown. Languageseeker (talk) 15:49, 21 May 2022 (UTC)

How to get the city country label in SPARQL

See also Help:SPARQL.

I'm working on an Anki extension for LL, but I'm having a little trouble writing the SPARQL query. In short, I want to get the city and country for a recording in LL. However, when I query P14, I only get the item ID instead of its labels: 'residence': {'type': 'literal', 'value': 'Q142'} or 'residence': {'type': 'literal', 'value': 'Q142'}. Instead, I hope to get city: "" and country: "France" for the first query, and city: "Paris" and country: "France" for the second one. Any ideas? Languageseeker (talk) 20:23, 19 May 2022 (UTC)

Hi Languageseeker, thanks for your work on an Anki extension. Could you post the query you have now here? Pamputt (talk) 16:58, 20 May 2022 (UTC)
Hi Pamputt . The query that I'm using is a very lightly modified version of the bot query.
ENDPOINT = "https://lingualibre.org/bigdata/namespace/wdq/sparql"
API = "https://lingualibre.org/api.php"
# "#filters" is the placeholder for extra FILTER clauses (left empty here to fetch everything).
BASEQUERY = """
SELECT DISTINCT
    ?record ?file ?transcription ?recorded
    ?languageIso ?languageQid ?languageWMCode
    ?residence ?learningPlace ?languageLevel
    ?speaker ?linkeduser
WHERE {
  ?record prop:P2 entity:Q2 .
  ?record prop:P3 ?file .
  ?record prop:P4 ?language .
  ?record prop:P5 ?speaker .
  ?record prop:P6 ?recorded .
  ?record prop:P7 ?transcription .
  ?language prop:P13 ?languageIso.
  ?speaker prop:P11 ?linkeduser .
  ?speaker prop:P14 ?residence .
  ?speaker llp:P4 ?speakerLanguagesStatement .
  ?speakerLanguagesStatement llv:P4 ?speakerLanguages .
  # languageLevel is optional: a mandatory llq:P16 triple would drop speakers without a level
  OPTIONAL { ?speakerLanguagesStatement llq:P16 ?languageLevel . }
  FILTER( ?speakerLanguages = ?language) .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
  #filters
}"""
Currently, I'm running it with filters = "" because it seems that a query for a single term takes around 70s, while fetching a single transcription takes about 145 seconds. My plan is to group the results by transcription and then write that into a json file to avoid the costly query. Basically, I need the speaker name, the term, their country, their city, the ISO code of the language, date created, and the filename, languageLevel.
For example, for the term un chien, the json would look like:
{ "term": {"un chien": {"speaker": "Julien Baley", "language": "fra", "city": "", "country": "France", "recorded": "2020-11-27", "filename": "LL-Q150_(fra)-Julien_Baley-un_chien.wav", "languageLevel": "Q15"}}} Languageseeker (talk) 23:17, 20 May 2022 (UTC)
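Grouping the full result set by transcription can be done once, offline, after the slow query. A minimal Python sketch of that step, assuming rows in the standard SPARQL JSON results format from the query above. It uses `?linkeduser` as the speaker name (an assumption), expects the (city, country) pairs to be resolved separately (for example from Wikidata), and stores a list per term because one term can have several speakers; that list is a deviation from the single-object example above.

```python
from collections import defaultdict
from urllib.parse import unquote


def value(row: dict, key: str) -> str:
    """Value of one SPARQL JSON binding, or "" when the variable is unbound."""
    return row.get(key, {}).get("value", "")


def last_segment(text: str) -> str:
    """Part after the last '/', e.g. '.../entity/Q15' -> 'Q15'; plain names pass through."""
    return unquote(text.rsplit("/", 1)[-1])


def group_by_term(bindings: list, places: dict) -> dict:
    """Group result rows by transcription.

    `places` maps a residence Qid (the P14 value) to a (city, country) pair
    resolved separately; unknown Qids give empty strings.
    """
    terms = defaultdict(list)
    for row in bindings:
        city, country = places.get(value(row, "residence"), ("", ""))
        terms[value(row, "transcription")].append({
            "speaker": value(row, "linkeduser"),
            "language": value(row, "languageIso"),
            "city": city,
            "country": country,
            "recorded": value(row, "recorded")[:10],   # keep YYYY-MM-DD
            "filename": last_segment(value(row, "file")),
            "languageLevel": last_segment(value(row, "languageLevel")),
        })
    return {"term": dict(terms)}
```

The result of `group_by_term(results["results"]["bindings"], places)` can then be written once with `json.dump(..., ensure_ascii=False)` and reused by the add-on without re-running the query.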

Contribution: Python program to download all files created by a specific user

See also Help:Download datasets.

I wrote a python program that downloads all the files created by one user. For video files, it downloads the full webm. For audio files, the default is to download the wave file. However, for audio files, you can optionally choose either mp3 or ogg files. Currently, the configuration requires a minor modification of lluad.py. If there is strong demand, I will write a command line parser for it. Please report any bugs or errors on the github page. Feature requests are welcome. Languageseeker (talk) 02:28, 20 May 2022 (UTC)

@Languageseeker please add your tool to Help:Download datasets. It lists several tools with different specifics, your tool is welcome and may help some Python users as well. Yug (talk) 22:41, 22 May 2022 (UTC)
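For readers who want the gist of such a downloader, here is a minimal Python sketch of the two pieces it needs: listing a speaker's files and building direct download URLs. It is an independent sketch, not how lluad.py works. It assumes the per-account category name seen in the case study above, the MediaWiki `list=categorymembers` API on Commons, and Commons' upload path layout (an MD5-based directory for originals, and a `transcoded/` tree with `.mp3`/`.ogg` derivatives for audio); treat that layout as an assumption to check against a real file page.

```python
import hashlib
from typing import Optional
from urllib.parse import quote

UPLOAD = "https://upload.wikimedia.org/wikipedia/commons"


def commons_url(filename: str, fmt: str = "original") -> str:
    """Direct URL of a Commons file, or of one of its audio transcodes (assumed layout)."""
    name = filename.removeprefix("File:").replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    path = f"{digest[0]}/{digest[:2]}/{quote(name)}"
    if fmt == "original":
        return f"{UPLOAD}/{path}"
    if fmt in ("mp3", "ogg"):
        return f"{UPLOAD}/transcoded/{path}/{quote(name)}.{fmt}"
    raise ValueError(f"unsupported format: {fmt!r}")


def category_members_params(account: str, cont: Optional[dict] = None) -> dict:
    """Parameters for https://commons.wikimedia.org/w/api.php listing a speaker's files."""
    params = {
        "action": "query", "list": "categorymembers", "cmtype": "file",
        "cmtitle": f"Category:Lingua Libre pronunciation by {account}",
        "cmlimit": "500", "format": "json",
    }
    if cont:
        params.update(cont)   # the "continue" object from the previous response
    return params
```

Each API response lists up to 500 titles; passing its `continue` object back in fetches the next page until no `continue` is returned.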

Garbage Values in prop:P14

See also Help:SPARQL for maintenance and Help:SPARQL for maintenance#✅ Speakers → Undefined place of residence.

As part of my Anki project, I queried the entire LL database and I'm trying to parse the output of ?speaker prop:P14 ?residence. I've noticed that a number of garbage values are provided for P14, such as Q1, Q2, Q103962887, Q6099648 and Strasbourg. There seem to be three cases.

  1. Users wishing to enter an extremely vague place such as Earth or the Universe. These should be set to None.
  2. Users accidentally linking to a disambiguation page. These require correction.
  3. Users not entering a Wikidata item at all. These require manual correction.

To solve the root of the problem, I propose that P14 should be restricted to only Wikidata items that exist and have P17. Languageseeker (talk) 21:22, 25 May 2022 (UTC)

@Languageseeker it's a good find. If you still have that SPARQL query at hand, please add it to Help:SPARQL for maintenance. Yes, it's something we should clean up, I think. There may be a few cases where the speaker doesn't want to share their location, but in 95% of cases I think we can go ahead and correct it, or ask them to correct it. Yug (talk) 12:39, 26 May 2022 (UTC)
I noticed that when creating a new speaker, place of learning is optional. Not cool. Yug (talk) 21:32, 27 May 2022 (UTC)
@YUG For the life of me, I can't get the federated query to work, but I have a separate query to get the location and country labels from Wikidata. These are the problematic ones. Note that Q20 is on the list because Q20 ("Norway") is missing P17.
  • ['MichaelSchoenitzer', None]
  • ['D.Muralidharan', None]
  • ['Kaderousse', None]
  • ['Krokus', None]
  • ['विदुला टोकेकर', 'Q103962887']
  • ['DoctorandusManhattan', 'Q2']
  • ['Justforoc', 'Q2']
  • ['Student16 de', None]
  • ['Didierwiki', 'Q6099648']
  • ['Sarah2149', None]
  • ['DomesticFrog', 'Q1']
  • ['Drkanchi', None]
  • ['Satdeep Gill', None]
  • ['Iwan.Aucamp', 'Q20']
  • ['Skimel', 'Q2']
  • ['Abeɣzan', None]
  • ['Gibraltar Rocks', None]
  • ['Bomdapatrick', None]
  • ['Ibtissam RAHMOUNI', None]
  • ['Trabelsiismail', None]
  • ['Ziko', 'Q2']
  • ['Youcefelallali', None]
  • ['Foxxipeter7', None]
  • ['Celevra089', None]
  • ['Bodhisattwa', None]
  • ['Atudu', None]
  • ['KageyamaxNishinoya', 'Q30915818']
  • ['Darkdadaah', None]
  • ['JayashreeVI', None]
  • ['रश्मीमहेश', 'Q103962887']
  • ['गीता गोविंद नेने', 'Q103893785']
  • ['Awangba Mangang', None]
  • ['Abigaljo', None]
  • ['FaelDaug', 'Q29423162']

Languageseeker (talk) 02:16, 30 May 2022 (UTC)
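Without a federated query, the check can be done in two steps: collect the P14 values from Lingua Libre, then fetch those items from Wikidata with the `wbgetentities` API and inspect them locally. A minimal Python sketch of the second step, following the cases listed above and the proposed rule (an existing item that has P17). The `entities` argument is assumed to be the `"entities"` object of a `wbgetentities` JSON response; treating a place whose P17 points to itself as a country is an assumption.

```python
import re

QID_RE = re.compile(r"^Q[1-9][0-9]*$")
TOO_VAGUE = {"Q1", "Q2"}        # universe and Earth on Wikidata
DISAMBIGUATION = "Q4167410"     # Wikidata class "Wikimedia disambiguation page"


def wbgetentities_params(ids):
    """Parameters for https://www.wikidata.org/w/api.php (at most 50 IDs per call)."""
    return {"action": "wbgetentities", "ids": "|".join(ids),
            "props": "labels|claims", "languages": "en", "format": "json"}


def claim_ids(entity, prop):
    """Item IDs used as values of `prop` in one wbgetentities entity."""
    ids = []
    for claim in entity.get("claims", {}).get(prop, []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            ids.append(value["id"])
    return ids


def label(entity, lang="en"):
    return entity.get("labels", {}).get(lang, {}).get("value", "")


def classify_residence(value, entities):
    """Sort one P14 value into the cases discussed above."""
    if not value:
        return "missing"
    if not QID_RE.match(value):
        return "not a Wikidata item"
    if value in TOO_VAGUE:
        return "too vague"
    entity = entities.get(value)
    if entity is None or "missing" in entity:
        return "item does not exist"
    if DISAMBIGUATION in claim_ids(entity, "P31"):
        return "disambiguation page"
    if not claim_ids(entity, "P17"):
        return "no country (P17)"
    return "ok"


def city_and_country(value, entities):
    """(city, country) labels; a place whose P17 points to itself counts as a country."""
    entity = entities.get(value, {})
    countries = claim_ids(entity, "P17")
    if not countries:
        return ("", "")
    country = countries[0]
    country_label = label(entities.get(country, {}))
    if country == value:
        return ("", country_label)
    return (label(entity), country_label)
```

The country items referenced by P17 need their own `wbgetentities` call if they were not in the first batch, so that their labels are available to `city_and_country`.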

Anki Extension Release

I just released Lingua Libre and Forvo Addon. It has a number of advanced options to improve search results and can run either as a batch operation or on an individual note.

By default, it first checks Lingua Libre and, if there are no results on Lingua Libre, it then checks Forvo. To run as a pure Lingua Libre extension, you will need to set "disable_Forvo" to True in your configuration section.

Please report bugs, issues and ideas on GitHub. I would love any feedback. Languageseeker (talk) 02:23, 31 May 2022 (UTC)
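The lookup order described above fits in a few lines of Python. This only models the documented behaviour, not the add-on's code: `lingua_libre` and `forvo` are hypothetical stand-ins for its real search functions, and the configuration is treated as a plain dict (in a JSON config file the value would be written as lowercase `true`).

```python
from typing import Callable, List

Lookup = Callable[[str], List[str]]   # term -> list of audio files


def find_pronunciations(term: str, config: dict,
                        lingua_libre: Lookup, forvo: Lookup) -> List[str]:
    """Try Lingua Libre first; use Forvo only if LL found nothing and Forvo is enabled."""
    results = lingua_libre(term)
    if results or config.get("disable_Forvo", False):
        return results
    return forvo(term)
```

Passing the lookups in as functions keeps the fallback rule testable without any network access.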

Results of Coverage Test of French Lemma and Non-Lemma forms in English Wiktionary

While playing around with generating pronunciation lists from Wiktionary, I decided to run a few tests on the current coverage of French lemma and non-lemma forms in English Wiktionary. I chose French because it is the largest dataset in LL.

Current Coverage of French in Lingua Libre

  • Total French Entries in Lingua Libre by a native speaker: 233 982
  • Unique French Entries in Lingua Libre by a native speaker: 154 358
  • Percentage of overlap (recordings beyond the first one for a term): 34%
  • Term with the greatest number of pronunciations: "blanc" with 40

Current Coverage of Category:French lemmas

  • Total entries in Category:French lemmas: 84 482
  • Pronounced entries: 50 917
  • Entries without pronunciation: 33 565
  • Coverage percentage: 60.27%

Current Coverage of Category:French non-lemma forms

  • Total entries in Category:French non-lemma forms: 291 225
  • Pronounced entries: 26 791
  • Entries without pronunciation: 264 434
  • Coverage percentage: 9.20%
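The percentages above follow directly from the raw counts. A quick check in Python, reading "overlap" as the share of recordings beyond the first one for a term (the interpretation that reproduces the 34%); `missing_terms` is a hypothetical helper showing how a to-record list falls out of a simple set difference once both term lists are available.

```python
def coverage(pronounced: int, total: int) -> float:
    """Percentage of category entries that have a Lingua Libre recording."""
    return 100 * pronounced / total


def overlap(recordings: int, unique_terms: int) -> float:
    """Percentage of recordings that repeat an already-recorded term."""
    return 100 * (recordings - unique_terms) / recordings


def missing_terms(wiktionary: set, recorded: set) -> set:
    """Entries of a Wiktionary category that still need a recording."""
    return wiktionary - recorded


print(f"{coverage(50_917, 84_482):.2f}")        # 60.27
print(f"{coverage(26_791, 291_225):.2f}")       # 9.20
print(f"{overlap(233_982, 154_358):.0f}")       # 34
print((84_482 - 50_917) + (291_225 - 26_791))   # 297999
```

The last figure matches the 297 999 lemma and non-lemma forms still requiring pronunciations mentioned below.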

For me, there are several lessons to be drawn.

  1. First, there has been amazing growth on LL. Covering 60.27% of French lemmas is a real achievement.
  2. The overlap percentage is quite small overall.
  3. There needs to be a clearer sense of when LL should stop requesting pronunciations for a certain term because 40 pronunciations of "blanc" seems a bit excessive.
  4. A need exists to continue pro-actively targeting entries in Wiktionary that are not in Lingua Libre. Currently, 297 999 French lemma and non-lemma forms require pronunciations.
  5. Generating lists from Wiktionary and checking coverage is not as hard as I thought.
  6. Lingua Libre has almost caught up with Forvo in the number of French pronunciations (233 982 vs 254 703). Overall, Lingua Libre has shown amazing and healthy progress in a very short period of time. I'm excited about these results. Languageseeker (talk) 03:07, 1 June 2022 (UTC)
@Languageseeker This investigation is pretty cool. (I'm not sure I understand all your numbers yet, but I will read it again when back on my PC.) It's quite nice to see we are reaching Forvo's level for our lead language. It's possible we have more unique words than Forvo since we have User:Olafbot actively guiding and pushing us on that path.
On Lili we have chosen to be a learning AND linguistic diversity audio database. When you account for gender, regional accents, age and voice type, having 40 French audios for a word is still 400+ voices short.
Also, not all contributors are able to contribute perfect audio files, due to various shortcomings (hardware, no recording room, no noise-cancelling system, etc.). We lack a proper rating and review system. It's on our [slow] roadmap though. 😉
PS: Should I answer you in French? I get the feeling you are French or learning it. Yug (talk) 15:07, 1 June 2022 (UTC)
@YUG Hi, Yug. Yes, I am learning French. As we discussed during our meeting, it is difficult to define the limits of a language. As I see it, the lemma forms are not enough. I am now creating an Olafbot on steroids for French. My plan is to write a Python program that can analyse the templates used on Wiktionary. Languageseeker (talk) 15:48, 7 June 2022 (UTC)