LinguaLibre

Difference between revisions of "Chat room"

Welcome to the Chat room! This is the place to discuss any and all aspects of Lingua Libre: the project itself, its operations, policy and proposals, technical issues, etc. Other forums include for code-oriented issues, . Feel free to participate in any language you want.

m (→‎Feedback about Lingua Libre by Professor Carol Genetti, PhD.: Corrected text to remove private information.)
 
I am pleased to share a message from Professor [https://en.wikipedia.org/wiki/Carol_Genetti Carol Genetti], a linguist and leading expert in endangered languages.  Professor Genetti is author of one of the best books in the field of Linguistics called "How Languages Work". Her vast knowledge and experience are extremely valuable and after reviewing Lingua Libre she said:
 
  
''Thank you for contacting me and letting me know about this initiative. It is an interesting idea. I especially like the multilingual menus -- very helpful.''
  
''Are you aware of [https://www.endangeredlanguages.com/ this website], hosted by the University of Hawaii (and, I believe, funded by Google). So one thing that occurs to me is the proliferation of such sites. How will people in an endangered-language community find out about their options, and then make an informed choice about which of these online resources will be best over time for their communities? Should such efforts cross-reference each other?''
  
''My second thought has to do with longevity. It takes a significant commitment to support a site like this over time. The challenge is having someone who can keep such sites funded, working, organized, relevant, and engaging users over time. How will you make sure that the data will be available in 10, 50, 150 years? Maybe you get that automatically by being associated with Wikipedia. If so, state that. Also, there should be a clear statement of how such data might be used, and by whom, so speakers know that if they record a wordlist, someone might use it for some purpose without their permission (is that right?).''

''I'm sorry to have to bring a down-to-earth message to the inspiration and passion for endangered languages that has clearly fueled this work, but having seen other initiatives stumble in this way, I wanted to be sure that you are thinking about this. Speakers will be entrusting you with such valuable pieces of their lives and their cultures. How will you safeguard this over time? Let people know.''

 
''Those issues aside, here are a couple of other comments:''
 
  
* ''There should be a statement targeted for speakers of endangered languages - why would they want to do this? What is the value for them and their communities? What will happen to the recordings? etc.''
* ''Will you provide speakers with suggestions for what vocabulary to record, e.g. greetings, colors, verb forms?''
* ''It would be helpful if it was clear from the large list of languages which ones have recordings. Maybe put those in a different color font?''
* ''It would be helpful to include translations of the words into one of the world's major languages or the national language. Otherwise, someone's grandkids coming to this in 30 years will not know what the words mean.''
* ''Do you want to move beyond single words to a piece of connected discourse, such as a short poem or story, a song, or the reading of some common text (such as a sentence from the UN Declaration for Linguistic Rights)?''
* ''Should there be a means to flag inappropriate content?''
  
 
''I hope that you find this helpful. And I'm so glad you liked my book! It is lovely to hear that people have found it helpful.''
 
  
''All the best,''

''Carol Genetti''

''Vice Provost for Graduate and Postdoctoral Programs''

''NYU Abu Dhabi''

''(she/her/hers)''
 
  
 
[[User:Marreromarco|Marreromarco]] ([[User talk:Marreromarco|talk]]) 09:23, 4 December 2021 (UTC)
 

Revision as of 07:14, 5 December 2021

Chat rooms in various languages:
English · 🌐

Chatroom FAQ

How to download all audios of one language? By speaker?

Datasets are available here. A script updates the datasets every 2 days, using CommonsDownloadTool. For more, see Help:Download datasets.

How to add missing languages?

Administrators can add new languages on request, usually within a few days. Please provide your language's ISO 639-3 code and/or its Wikidata ID. For more, see Help:Add a new language.

How to keep my Wikimedia project up to date?

Contact Poslovitch, the master of Lingua Libre Bot. For more info, check out Help:Bots and LinguaLibre:Bot.

What IRL events are coming? When? Where?

Please see LinguaLibre:Events.

How to translate LinguaLibre User Interface into a new language?

Go to translatewiki.net. For more, see Help:Translate.

How to archive sections which have been answered?

After reviewing the section, add {{done}} ~~~~ to the top of the section. After a few days to two weeks, move the section's code to [[LinguaLibre:Chat_room/Archives/year]].

Archives
2023 · 2022 · 2021 · 2020 · 2019 · 2018

Datasets out of date

Hello. It seems that the datasets page, although it claims to run every 2 days, is completely out of date: all the available zips are from April 2020 or November 2019 (and the full zip from May 2019). Is this a known problem? Is there a plan to address it? Julien Baley (talk) 23:17, 27 August 2020 (UTC)

Indeed, there seems to be an issue with the dataset update. I opened a Phabricator ticket about it. Pamputt (talk) 18:24, 28 August 2020 (UTC)

Publish on Wikimedia Commons

Hello, I just tested, but my recordings are not published on Commons. My tests: on Firefox, then on Chrome; with 50 expressions, then with 1; with licenses CC3.0-BY-SA and CC1.0. —Eihel (talk) 06:51, 2 May 2021 (UTC)

Publication problem on Wikimedia Commons
phab:T281636 Eihel (talk) 07:10, 2 May 2021 (UTC)
I usually have the same problem with the first two recordings in a session. Then I can upload them again at the end. Try again with more recordings, using the "retry failed upload" button. Poemat (talk) 08:07, 2 May 2021 (UTC)
Yup, I had this bug many times. (I say "had" because I don't remember having encountered it after the fire incident.) Just don't give up and it should be published eventually. DSwissK (talk) 11:56, 2 May 2021 (UTC)
(As of 3 May 2021, as far as I have checked, I'm not aware of any code changes (history) which may have affected this. Seb35 made some other code change that same day.) Yug (talk) 09:47, 3 May 2021 (UTC)

I am adding a user who has the same problem: Le Commissaire. —Eihel-LiLi (talk) 15:33, 6 May 2021 (UTC)

Hello @Seb35, we should check with Le Commissaire whether the problem still persists (before closing the Phab ticket). Kind regards. —Eihel (talk) 10:01, 4 June 2021 (UTC)
I left a message for Le Commissaire on his talk page.
The problem you had was specific to your account; it may have happened to other people, but it seems fairly rare. Also, once a user has managed to upload to Commons, it is a different problem from yours (this one, which looks similar but where the error is intermittent). More generally, the error message should be explicit rather than requiring a search in the browser console; I will open a Phabricator ticket to that effect. Seb35 (talk) 10:28, 4 June 2021 (UTC)

Exclusion lists

If you use Olafbot's regularly updated lists of wanted words (List:Fra/Lemmas-without-audio-sorted-by-number-of-wiktionaries, etc.) and spot an item that should be removed without being recorded, you can use the brand-new exclusion lists to remove it. For example, the list List:Fra/Lemmas-without-audio-sorted-by-number-of-wiktionaries contained the word "abandonar", which apparently doesn't belong to the contemporary French corpus. Once it is added to the exclusion list (here: user:Olafbot/exclusion list/Fra), the bot knows this item should never appear in the French lists it maintains, and removes it during the next update.
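The effect of an exclusion list can be pictured with a minimal sketch. This is not Olafbot's actual code; the function name `apply_exclusions` and the sample words are assumptions made for illustration only.

```python
def apply_exclusions(wanted, excluded):
    """Return the wanted words with every excluded item removed, order kept."""
    # Normalise whitespace and ignore empty lines in the exclusion list.
    blocked = {word.strip() for word in excluded if word.strip()}
    return [word for word in wanted if word.strip() not in blocked]


# Hypothetical sample data: a wanted-words list and its exclusion list.
wanted_fra = ["abandonar", "maison", "fuir"]
exclusion_fra = ["abandonar"]
print(apply_exclusions(wanted_fra, exclusion_fra))  # ['maison', 'fuir']
```

A set is used for the excluded items so that each lookup stays cheap even when the wanted list is long.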

Each "Lemmas without audio" list (afr, ang, ara, ast, aze, bel, ben, bul, cat, ceb, ces, cmn, csb, cym, dan, deu, ekk, eng, epo, est, eus, fao, fas, fin, fra, gla, gle, glg, grc, gre, guj, hau, heb, hin, hrv, hun, hye, ido, ina, ind, isl, ita, jav, jpn, kan, kat, kaz, khm, kor, kur, lat, lit, ltz, lvs, mal, mar, mkd, mlg, mlt, mon, msa, nld, nor, oci, pan, pnb, pol, por, ron, rus, san, slk, slv, spa, sqi, swa, swe, tam, tel, tgl, tha, tur, ukr, urd, vie, wuu, yid, yue) has a corresponding exclusion list (afr, ang, ara, ast, aze, bel, ben, bul, cat, ceb, ces, cmn, csb, cym, dan, deu, ekk, eng, epo, est, eus, fao, fas, fin, fra, gla, gle, glg, grc, gre, guj, hau, heb, hin, hrv, hun, hye, ido, ina, ind, isl, ita, jav, jpn, kan, kat, kaz, khm, kor, kur, lat, lit, ltz, lvs, mal, mar, mkd, mlg, mlt, mon, msa, nld, nor, oci, pan, pnb, pol, por, ron, rus, san, slk, slv, spa, sqi, swa, swe, tam, tel, tgl, tha, tur, ukr, urd, vie, wuu, yid, yue). I hope it will help.

Normally I would add a link to the exclusion list in the description of each lemmas list, but unfortunately the Lingua Libre engine doesn't allow adding any kind of comments or descriptions to lists, so this announcement is the only way to spread the word about the new functionality. Olaf (talk) 09:54, 13 September 2021 (UTC)

@Olaf Thank you so much for this useful new function! Indeed, the Record Wizard does not yet understand comments, categories, or templates on List pages, but this will be considered for future updates. — WikiLucas (🖋️) 18:48, 13 September 2021 (UTC)

Adding a new language

Hello!

I would like to add the language Q3196953, but when following the procedure I do not see LinguaImporter. Can someone tell me why?

Regards, BamLifa

@BamLifa that is because you are not an administrator. I have just imported Nande (Q646152). Pamputt (talk) 17:16, 13 September 2021 (UTC)
@Pamputt , thank you very much for the clarification. If this option is reserved for admins only, why mention it in the documentation without saying so? Besides, given the multitude of our languages that do not yet exist on Lingua Libre, don't you think you should simplify this task? I have yet another language to add, Bira (Bila). BamLifa (talk) 12:41, 20 September 2021 (UTC)
@BamLifa it is stated on this page (it is even the title of the section: "Tool for administrators"). I do not remember why it is reserved for admins, but it at least limits vandals who would want to import things that are not languages. Anyway, I have imported Bira (Q656403) and Bila (Q656404). If these are not the right languages, can you give me the corresponding ISO 639-3 code (or at least the Wikidata identifier)? Pamputt (talk) 14:06, 20 September 2021 (UTC)
@Pamputt , thank you very much. BamLifa (talk) 05:34, 22 September 2021 (UTC)

Lists still don't work properly

@WikiLucas00 @Poslovitch It's better than before, but the Record Wizard still sometimes hangs when a list is chosen. Then I have to reload the page and try again. Usually it starts working on the second or third attempt with the same list. Probably a race condition. Olaf (talk) 09:47, 30 September 2021 (UTC)

@Olaf It also happens to me sometimes, but I think that it could be related to the button for removing words you already recorded. When you load a list of words you never recorded (typically Olafbot's lists), ticking the button seems to kill the loading. Best — WikiLucas (🖋️) 10:23, 30 September 2021 (UTC)
Thank you. Indeed, with this switch unchecked everything seems to work. Olaf (talk) 16:02, 1 October 2021 (UTC)

List of words to pronounce

Hi! Is there a page where words can be added so that a good Samaritan can pronounce them? Vivaelcelta (talk) 11:30, 3 October 2021 (UTC)

Hello Vivaelcelta, lists are made for that. You can create your own list, which can then be recorded by anyone. Pamputt (talk) 16:50, 3 October 2021 (UTC)
Thank you Pamputt. — Vivaelcelta (talk) 22:38, 3 October 2021 (UTC)

Patrol tools project

See LinguaLibre:Events/Patrol assistance tool prototyping project.

Hi,

This week, a project led by students of University Toulouse 3 - Paul Sabatier is starting. It is about prototyping patrolling tools. I supervise this project, assisted by Adélaïde Calais. The students study computer science with a specialization in Artificial Intelligence. The aim is to have them prototype (or even develop) tools to help Lingua Libre's patrol by automatically detecting any kind of mistake or error in the files. We have already identified a few types of mistakes: clicks, crackles, pops and labelling issues (wrong label/wrong language).
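To give an idea of what automatic click detection could look like, here is a deliberately naive sketch. It is not the students' method: the function `find_clicks`, the threshold value and the sample signal are all assumptions for illustration. It simply flags places where the amplitude jumps sharply between two consecutive samples.

```python
def find_clicks(samples, threshold=0.5):
    """Return indices where the jump from the previous sample exceeds threshold.

    `samples` is a sequence of amplitudes normalised to the range [-1.0, 1.0].
    """
    return [
        i
        for i in range(1, len(samples))
        if abs(samples[i] - samples[i - 1]) > threshold
    ]


# Hypothetical signal: a smooth ramp with one abrupt spike at index 3.
signal = [0.0, 0.1, 0.2, 0.9, 0.2, 0.1]
print(find_clicks(signal))  # [3, 4]
```

A real tool would most likely need something more robust, since loud consonants can also produce steep jumps; this only shows the basic idea of looking for discontinuities.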

We need the community's help on two points:

  1. Are there other problems you can think of?
  2. We need some recordings that have issues, so that the students are able to work. If you have already recorded them again, it is not a big deal: Commons has a file history. Don't hesitate to send us the files that have or had problems.

Lastly, I created a project page, available here.

See you, Lepticed7 (talk) 09:19, 19 October 2021 (UTC)

Hello Lepticed7, Translated page —Eihel (talk) 19:49, 22 October 2021 (UTC)
Lepticed7, Adélaïde, could you specify the dates for this project?
Also, were your points 1 and 2 answered by the community somewhere? (If not, I could give it a try.) Yug (talk) 13:19, 15 November 2021 (UTC)
@Yug Hi, I updated the project page with the dates. And I didn’t get any answers to my questions. Lepticed7 (talk) 11:25, 28 November 2021 (UTC)

Rashidun Caliphate

Hello @Zinou2go , LL-Q13955 (ara)-Zinou2go-الخلافة الراشدة.wav is problematic (currently الخلافة الراشدة (Q204439) on LiLi): it contains several cuts (clicks). I proposed the file for deletion on Commons. Recording seems to be working better now; could you record Rashidun Caliphate again? I didn't check the other recordings, but they are likely to have "clicks" as well. Also, can an admin delete this item on LiLi, please? Cordially. —Eihel (talk) 15:31, 12 November 2021 (UTC)

@Eihel Please do not nominate files for deletion before asking the speaker to record them again and waiting a while for their answer. Also, these recordings will come in useful for the team currently working on the audio issues of Lingua Libre, so we'd better not delete them (I thought you had read my messages on Discord about this). — WikiLucas (🖋️) 15:48, 12 November 2021 (UTC)
@WikiLucas00 , I have withdrawn the deletion request on Commons. —Eihel (talk) 15:54, 12 November 2021 (UTC)

Code of Conduct

Hi everyone, I just noticed again MediaWiki's mw:Code of Conduct (2015) and Wikimedia Foundation's foundation:Universal Code of Conduct (2021/02). Back in 2015, 0x010C included the first one as a condition to contribute to RecordWizard's codebase. As far as I know, Lili.org and its community have no Code of Conduct so far. We may be implicitly bound by it or by some Wikimedia France Code of Conduct, but it would be cleaner to explicitly adopt one and display it here, in writing. We could therefore do the following:

  1. Short round to confirm we have nothing in place so far.
  2. Vote for 2 months to adopt the most recent foundation:Universal Code of Conduct (2021/02)
  3. Copy the text into LinguaLibre:Universal Code of Conduct.

Yug (talk) 14:48, 14 November 2021 (UTC)

Pre-discussion

Do we already have a Code of Conduct binding LinguaLibre ? Yug (talk) 14:48, 14 November 2021 (UTC)

Vote

Are you for or against adopting the foundation:Universal Code of Conduct (2021) as a code of conduct for LinguaLibre's community ?
Possible votes : {{Support}} • {{Weak support}} • {{Weak oppose}} • {{Oppose}}

  • Support (proposer): better to be explicit and have a framework in place, just to be clear to all on where we stand. Yug (talk) 14:48, 14 November 2021 (UTC)

Lingua Libre website should be more appealing to Language Learners

See also Forvo.com.

It would be useful if LinguaLibre followed the example of Forvo to increase the number of language learners interested in the project.

Forvo.com displays information in a way that engages users and makes it very easy to find pronunciations.

For example, if someone wants to learn how to pronounce "Honoré de Balzac" in French, it is faster to find the audio on Forvo than on LinguaLibre. Forvo also displays the data in a way that is more appealing to language learners.

Would it be possible to improve the way data is displayed on LinguaLibre to make it more appealing to language learners? That way, the number of active users recording audio would increase significantly. -- Marreromarco

Some people have previously reported this "issue". There is a ticket on Phabricator to keep it in mind. However, priority is currently given to developing patrol tools for Lingua Libre, and we do not expect major improvements to audio browsing in the coming months (at least unless we get more external developers). I think this is because Lingua Libre was designed to help with recording, not with listening; the latter is left to the other Wikimedia projects, mainly the Wiktionaries and Wikidata. Pamputt (talk) 16:00, 14 November 2021 (UTC)
YES! There are oral discussions and proposals in this direction, but since LinguaLibre is a volunteer-based team, we are moving slowly. Forvo is a for-profit entity: it locks the copyright and resale of recordings made on its platform to the speaker-creator and to itself, and then sells those recordings at a profit. It therefore has the money and swift decision-making to sustain its UI/UX efforts. We are shorter on those sides. --Yug (talk) 16:30, 14 November 2021 (UTC)

Sound Library's forking and hacking

On the Sound Library side, I was able to duplicate/fork it, which makes it possible to start hacking its CSS. Copy those codes into your own namespace :

In those codes, you then have to replace all occurrences of "Yug" with your username, and it should work. You can start hacking toward a more elegant interface. Note: the JS copy is in your *personal* JS and has a "stop" condition so the various JS instances won't fight. --Yug (talk) 16:30, 14 November 2021 (UTC)

Allow recording only in the user's Native Language to avoid passing "mispronunciations" to Wiktionary

I started a discussion on the German Wiktionary because some words on LinguaLibre are not available on the DeWikt. The German Community told me that LinguaLibre adds words into Commons, but the Bot only accepts audios from “few” trusted users using a filter.

The English and German Wiktionaries use a bot called "DerbethBot" to add audios from Commons. However, the English Wiktionary community asked to block Lingua Libre's recordings because there were non-native speakers recording audios and the Bot had no way to differentiate them from Native speakers. After the audios were introduced in the English Wiktionary they had to forbid adding audios from LinguaLibre:

https://en.wiktionary.org/wiki/Wiktionary:Beer_parlour/2020/July#Labeling_non-native_audio

I believe that it is necessary to avoid giving “mispronunciations” to Wiktionaries. It is similar to vandalism on a Wiktionary if readers don't know that they are hearing a bad pronunciation and believe that it comes from a native speaker.

Some suggestions: 1) Would it be possible to name the audio files so that they specify whether the speaker is a native speaker or not? For example, if a French speaker records the word "maison", it could be named "maison-fr-native.ogg". If a language learner records the same word: "maison-fr-learner.ogg".

2) A radical way to address the issue would be to only allow recording in one's native language. Of course, users could change it, but strong warnings could be added to always remind people to record only in their native language. Forvo seems to take this approach.
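Suggestion 1 could be sketched as follows. This only illustrates the proposed naming scheme, not how Lingua Libre actually names its files; the helper `audio_filename` is hypothetical.

```python
def audio_filename(word, lang, is_native):
    """Build a file name following the scheme proposed in suggestion 1."""
    level = "native" if is_native else "learner"
    return f"{word}-{lang}-{level}.ogg"


print(audio_filename("maison", "fr", True))   # maison-fr-native.ogg
print(audio_filename("maison", "fr", False))  # maison-fr-learner.ogg
```

With such names, a bot could tell the two kinds of recordings apart from the file name alone, without querying any other data.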

It might be valuable for linguists to have recordings of non-native speakers to study the features of their accent in an L2 language. However, in my humble opinion, the pronunciations added to Wiktionary should come only from native speakers, and bots should have a way to differentiate them.

Link to the German Wiktionary discussion about LinguaLibre: https://de.wiktionary.org/wiki/Wiktionary:Teestube#:~:text=von%20technischer%20seite%20gibt%20es%20keinem%20problem%2C%20zwei%20bots%20auf%20de.wiktionary%20arbeiten%20zu%20lassen.

Hi, this depends on the Wiktionary policy, and it can differ from one language to another. Anyway, it is already possible to select only recordings made by native speakers. To do that, the speaker has to fill in the language level (P16) property with the value native (Q15) (see for example Pamputt (Q466)). Other values for language level (P16) are given here. Pamputt (talk) 16:38, 16 November 2021 (UTC)
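As a rough sketch of what a bot could do with this information: once it has looked up the language level (P16) for each recording, keeping only native speakers is a simple filter. The record layout below (a dict with `file` and `language_level` keys) is an assumption for illustration and is not the actual Lingua Libre data format; only the identifier Q15 for "native" comes from the reply above.

```python
NATIVE = "Q15"  # item for "native" on Lingua Libre, per the reply above


def native_only(recordings):
    """Keep only recordings whose speaker declared the native language level."""
    return [rec for rec in recordings if rec.get("language_level") == NATIVE]


# Hypothetical records; a missing level is treated as "not known to be native".
recordings = [
    {"file": "a.wav", "language_level": "Q15"},
    {"file": "b.wav", "language_level": None},
    {"file": "c.wav"},
]
print([rec["file"] for rec in native_only(recordings)])  # ['a.wav']
```

Treating a missing level as non-native is the cautious choice here, since the concern raised in this thread is about unlabelled recordings reaching Wiktionary.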


Sursilvan

Done

User:Franz.Roos.1955 made 2 recordings in en:wp:Sursilvan : rauna (rauna (Q689785)), ‎tschitta (tschitta (Q689786)). Sursilvan has no ISO code. Do we have a procedure for such languages? (I forget whether this case has already come up.) Yug (talk) 20:37, 17 November 2021 (UTC)

There is no issue. It simply uses the Wikidata identifier when there is no ISO code. See for example Occitan auvernhat (Q1186). To record in such languages, we have to create an item for the language/dialect on Lingua Libre, and this is already done for Sursilvan (Q74905). Pamputt (talk) 21:59, 17 November 2021 (UTC)
Thanks, Pamputt, for the clarification. Yug (talk) 23:12, 17 November 2021 (UTC)

commons:commons:structured data

I've been very pleased with LL's tooling, that does so much of the process of uploading to Commons, sensible naming, description-writing, and categorisation for me; however, I have an idea for an additional step LL could automate. This is in Commons' no-longer-so-new structured data section, which manifests (among other ways) as a tab on the file page.

As an example of what could be automatically added to a file's datastore, there is a property called 'audio transcription' which serves a similar role to Commons' TimedText subtitle functionality (silly example: commons:TimedText:051226-kakapo-billbooming.ogg.en.srt) but for shorter clips -- in other words, seemingly designed with applications like LinguaLibre in mind.

Since these are of the so-called 'monolingual text' datatype, the source language can be specified (or where not part of the main set of languages Wikimedia uses, the special code 'mis' is used and 'language of work or name' used as a qualifier) at the same time as the actual text that is being spoken, which LL has access to since the audio file started out as a text prompt!
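The fallback described above can be sketched like this. It is only an illustration of the logic, not the Commons or Wikibase API: the function `transcription_value`, the dictionary layout, the set of supported codes, and the placeholder code and item identifier are all assumptions.

```python
def transcription_value(text, lang_code, lang_item, supported_codes):
    """Build a monolingual-text value, falling back to 'mis' when needed.

    When the language code is not supported, the code 'mis' is used and the
    language is carried by a 'language of work or name' qualifier instead.
    """
    if lang_code in supported_codes:
        return {"text": text, "language": lang_code}
    return {
        "text": text,
        "language": "mis",
        "qualifiers": {"language of work or name": lang_item},
    }


supported = {"fr", "en"}  # hypothetical subset of supported language codes
print(transcription_value("fuir", "fr", "Q_FRENCH_PLACEHOLDER", supported))
print(transcription_value("rauna", "xx-unsupported", "Q_PLACEHOLDER", supported))
```

Since LL already knows both the prompt text and the language item of each recording, both branches could be filled in without asking the speaker anything.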

What think y'all? Arlo Barnes (talk) 04:25, 19 November 2021 (UTC)

Hi Arlo Barnes, there is a Phabricator ticket about this topic. Wikidata does not yet have all the properties needed to match all Lingua Libre properties. For example, I proposed creating a property for the language level of a speaker, but it did not get enough support. So I guess we should first list all the properties we would like to add to SDC. Pamputt (talk) 07:18, 19 November 2021 (UTC)

[Feature Request] Play next sound automatically while checking recordings

After recording sounds it is important to check them to verify their quality. However, it is very tiring to record 380 words and afterwards have to click 380 times on the “Next button” while checking them.

After recording, would it be possible to add a button to "Play next sound automatically" ? Screenshot Here Marreromarco (talk) 04:09, 20 November 2021 (UTC)

Agreed, it is already tracked on Phabricator. Pamputt (talk) 09:45, 20 November 2021 (UTC)

"How to use Lingua Libre for your language learning"

I recently found a "new" way to benefit from the sounds on Lingua Libre. I would suggest that it could be advertised on the Lingua Libre main website and on the Wikipedia in French/English:

  • GoldenDict is a FOSS Dictionary application very valuable for language learners.

One way to benefit from Lingua Libre recordings is to download the datasets, unzip them and "load" the sounds in GoldenDict (as Sound Directories. Screenshot here). That way, users easily get an offline "Pronunciation Dictionary". It is very easy to do. Here is a screenshot of how the French word "fuir" looks in GoldenDict. Another example here.
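The "unzip" step can also be scripted. Below is a minimal sketch using only Python's standard library; the function name `extract_dataset` and the paths in the usage comment are hypothetical, and adding the resulting folder as a Sound Directory in GoldenDict remains a manual step.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path, target_dir):
    """Unzip a downloaded dataset and return the names of the extracted files."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # List every extracted file so the result can be checked quickly.
    return sorted(p.name for p in target.rglob("*") if p.is_file())


# Example call (hypothetical paths, adjust to where the dataset was saved):
# extract_dataset("downloads/dataset-fra.zip", "sounds/fra")
```

The target folder is then the one to point GoldenDict at.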

Lingua Libre sounds can be used with GoldenDict OFFLINE. That is a huge advantage in developing countries, where language learners often do not have a reliable internet connection.

It would be valuable to create a description on the Lingua Libre website about "How to use Lingua Libre sounds for your language learning" .

There it would be possible to describe how to use the audio files offline with GoldenDict, etc. If more methods are developed (Anki add-on, better GUI, Android app, etc.), they could be explained there. --Marreromarco (talk) 04:41, 20 November 2021 (UTC)

1) Reuse of datasets: Yes! Dataset download and reuse must be showcased and strengthened. I think a "Reuses gallery" page could be created, with screenshots and a minimal how-to for GoldenDict, Anki and others.
2) Anki: You are the 4th or 5th contributor to raise the need for an Anki add-on. We need to do something on this side, yes. It's more than 1~2 days of work and too big for volunteer work, so we need to apply for a grant. I am looking into and mapping our options at the moment ({{Grants table}}). At some point we have to jump in and design a project, yes.
3) For an e-learning app, I designed a 5k€ project a year ago. The local regional government declined to fund it, but it could easily be refreshed.
We have to redesign some projects and apply in early 2022. Yug (talk) 09:28, 23 November 2021 (UTC)
The core question is the Human Resources.
*Daily routines* keep WikiLucas, Pamputt, Poslovitch and myself (aka the community-side contributors) busy maintaining the place, welcoming and guiding new users, cleaning pages, etc. We are now quite smooth, successful and stable on this side.
To *push forward* on developments, UI, tools, e-learning, communication and grants, we each have one or two side projects in mind and are pushing them slowly. But as always in FOSS projects, the task ahead is much larger, and we could achieve much more with more human resources.
Overall, we may be at a turning point right now. As things are stable, with road maps available, we just need 1 to 3 new coordinators and communication contributors to tip the dynamic into forward-offensive mode: communication, and therefore new arrivals, new speakers, new devs and new coordinators, to really push forward with new events/workshops, funds and SMART features.
@Marreromarco, I am currently writing structured "community how-to" pages to ease new contributors' jumping in (see LinguaLibre:Roles, LinguaLibre:Workshops, {{Grants table}}). You are doing a nice push on communication (It's FOSS) and with your questions you are mapping out Lili's needs. Pamputt and WikiLucas are following our progress. All this is pretty interesting. Yug (talk) 10:48, 23 November 2021 (UTC)
I would like to work on the "Public Relations" Department of LinguaLibre! - EDIT (28th Nov. 2021) : Any PR campaign would fail miserably if there is no search function. I explain the reasons at the end of this section: LinguaLibre:Events/Winter 2021-2022 Public Relations Campaign

Marreromarco (talk) 23:49, 23 November 2021 (UTC)

Sounds good :) Your outreach to YouTubers and popular FOSS blogs is spot on.
I am back from a wikibreak, I am cleaning up some last pages, then since the maintenance side is stable I would like to focus my energy on projects design –recording rare languages, technology, PR campaign– and associated grant requests to secure funding and the actual realization of those visions. We can collaborate. You lead on the PR : design your campaign. I can review and help it to fit some Grants formats. Yug (talk) 18:00, 24 November 2021 (UTC)

I created a new wiki page in the "events" section for a "PR Campaign for 2022". Please visit LinguaLibre:Events/Winter 2021-2022 Public Relations Campaign and participate in the discussion with new ideas. EDIT (28th Nov. 2021): I will NOT contribute anymore to a PR campaign. The reasons are explained in a comment in the relevant section. Marreromarco (talk) 21:20, 25 November 2021 (UTC)

Creating an LL category for a dialect

I would be grateful if someone could tell me whether it's possible to create an LL category for a dialect.

We're working in Konkani, which has its own (but small) Wikipedia at http://gom.wikipedia.org. Several dialects are spoken under Konkani, and the pronunciation can differ from one dialect to another.

I would like to create a category for Saxtti (the Salcete dialect of Konkani). This will ensure that readings don't get overwritten by other dialects. It would also allow for the recordings by many others which might already have been done in Konkani as a whole.

Question: How do we create space for the dialects of a language?

Thanks very much, in advance! --Fredericknoronha (talk) 13:34, 27 November 2021 (UTC)

Hello @Fredericknoronha and welcome to Lingua Libre. I imported Goan Konkani (Q700683) (gom) as it was not on Lingua Libre yet. On Lingua Libre, dialects are treated the same way as languages. You can create an element for your dialect on Wikidata (example for auvergnat dialect) and tell us once it is ready, so that we can import it into Lingua Libre with an admin tool. You can also create an element for your dialect directly on Lingua Libre, following the steps described at Help:Add a new language and using Occitan auvernhat (Q1186) as an example. Don't hesitate to ping an admin if you have any questions.
All the best — WikiLucas (🖋️) 15:35, 27 November 2021 (UTC)
« there are some dialects spoken, the pronunciation of one can be different from the other. […] This will ensure that readings don't get overwritten by other dialects. »
If the writing is the same and only the pronunciation differs depending on where the speaker comes from, these look like different accents.
Recordings are specific to a word, a language and a speaker. This means that when I record the word "bonjour" in French, it becomes one audio file on Lili. WikiLucas can record the same word "bonjour" in French, and it will create another audio file on Lili. My recording(s), since I come from the South West, will carry the southern accent. Recordings by WikiLucas, who lives 700 km east of me, will carry the Lyon area accent. Lingua Libre will store 2 recordings, one per user. Yug (talk) 21:59, 27 November 2021 (UTC)
Hello Fredericknoronha, I have imported Salcete Konkani (Q701734) so that you can now record words in that dialect. Pamputt (talk) 17:21, 28 November 2021 (UTC)
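As an illustration of the point above that recordings are specific to a word, a language and a speaker, here is a minimal Python sketch. It is not Lingua Libre's actual storage code: the `RecordingKey` type, the `store_recording` helper and the file names are hypothetical and only model the idea that two speakers recording the same word in the same language yield two separate audio files.

```python
from typing import Dict, NamedTuple


class RecordingKey(NamedTuple):
    """Hypothetical identifier: one recording per (word, language, speaker)."""
    word: str
    language: str
    speaker: str


def store_recording(store: Dict[RecordingKey, str], word: str,
                    language: str, speaker: str, audio_file: str) -> None:
    # The speaker is part of the key, so another speaker's recording of the
    # same word in the same language is stored as a separate entry.
    store[RecordingKey(word, language, speaker)] = audio_file


recordings: Dict[RecordingKey, str] = {}
# File names below are made up for the example.
store_recording(recordings, "bonjour", "French", "Yug", "yug_bonjour.wav")
store_recording(recordings, "bonjour", "French", "WikiLucas", "wikilucas_bonjour.wav")

print(len(recordings))  # 2: one recording per speaker
```

Under this model, a dialect imported as its own language entry would simply appear as a different `language` value in the key.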

Feedback about Lingua Libre by Professor Carol Genetti, PhD.

Dear Members of Lingua Libre, I am pleased to share a message from Professor Carol Genetti, a linguist and leading expert in endangered languages. Professor Genetti is the author of one of the best books in the field of linguistics, "How Languages Work". Her vast knowledge and experience are extremely valuable, and after reviewing Lingua Libre she said:

Thank you for contacting me and letting me know about this initiative. It is an interesting idea. I especially like the multilingual menus -- very helpful.

Are you aware of this website, hosted by the University of Hawaii (and, I believe, funded by Google)? So one thing that occurs to me is the proliferation of such sites. How will people in an endangered-language community find out about their options, and then make an informed choice about which of these online resources will be best over time for their communities? Should such efforts cross-reference each other?

My second thought has to do with longevity. It takes a significant commitment to support a site like this over time. The challenge is having someone who can keep such sites funded, working, organized, relevant, and engaging users over time. How will you make sure that the data will be available in 10, 50, 150 years? Maybe you get that automatically by being associated with Wikipedia. If so, state that. Also, there should be a clear statement of how such data might be used, and by whom, so speakers know that if they record a wordlist, someone might use it for some purpose without their permission (is that right?). I'm sorry to have to bring a down-to-earth message to the inspiration and passion for endangered languages that has clearly fueled this work, but having seen other initiatives stumble in this way, I wanted to be sure that you are thinking about this. Speakers will be entrusting you with such valuable pieces of their lives and their cultures. How will you safeguard this over time? Let people know.

Those issues aside, here are a couple of other comments:

* There should be a statement targeted for speakers of endangered languages - why would they want to do this? What is the value for them and their communities? What will happen to the recordings? etc.
* Will you provide speakers with suggestions for what vocabulary to record, e.g. greetings, colors, verb forms?
* It would be helpful if it was clear from the large list of languages which ones have recordings. Maybe put those in a different color font?
* It would be helpful to include translations of the words into one of the world's major languages or the national language. Otherwise, someone's grandkids coming to this in 30 years will not know what the words mean.
* Do you want to move beyond single words to a piece of connected discourse, such as a short poem or story, a song, or the reading of some common text (such as a sentence from the UN Declaration for Linguistic Rights)?
* Should there be a means to flag inappropriate content?

I hope that you find this helpful. And I'm so glad you liked my book! It is lovely to hear that people have found it helpful.

Carol Genetti
Vice Provost for Graduate and Postdoctoral Programs
NYU Abu Dhabi
(she/her/hers)

Marreromarco (talk) 09:23, 4 December 2021 (UTC)