LinguaLibre

Difference between revisions of "Chat room"

Welcome to the Chat room! This is the place to discuss any and all aspects of Lingua Libre: the project itself, operations, policy and proposals, technical issues, etc. Other forums include for code-oriented issues, . Feel free to participate in any language you want.

 
Preparing words to be used in Lingua Libre has always been challenging. But I think this is a shared challenge. Crawling text from different sources and creating a clean list of words is very important. I've used [[User:Titodutta/Bengali_words_from_pages|Tito's]] instructions in the past, but using multiple tabs and multiple tools is not the best user experience. So, I thought I'd create something that is functional for me and simple enough to be tweaked. Introducing [[User:Psubhashish/tools/Prepare words for Lingua Libre|"Prepare words for Lingua Libre"]]. The tool is currently set for Odia but can be easily tweaked for other languages using non-Latin scripts. I'd request Lingua Libre core team to incorporate the tool into Lingua Libre so that users can use the platform to create a wordlist. Extracting words from any random text is always hard, especially new contributors. --[[User:Psubhashish|Subhashish]] ([[User talk:Psubhashish|talk]]) 03:44, 14 March 2023 (UTC)
 
 
:Hi [[User:Psubhashish|Psubhashish]]. This is really nice. Do you think it would be easy to adapt it to create a [[Help:Create_a_new_generator|new generator]]? Generators can be used by anyone after they import them in their common.js. [[User:Pamputt|Pamputt]] ([[User talk:Pamputt|talk]]) 06:44, 14 March 2023 (UTC)
 
:: Thanks [[User:Pamputt]]. That would be fantastic, but I probably don't have the right knowhow for doing that. I did take ChatGPT's help to create a [[User:Psubhashish/common.js|.js version]] from the [[User:Psubhashish/tools/Prepare words for Lingua Libre|HTML code]] I had shared earlier but would appreciate any help. I think having a tool inside Lingua Libre would be great so really liked the idea of new generators. Common users would like things well packaged rather than jumping from one platform to another. --[[User:Psubhashish|Subhashish]] ([[User talk:Psubhashish|talk]]) 13:09, 14 March 2023 (UTC)

Revision as of 13:11, 14 March 2023

Chat rooms in various languages:
English · 🌐

Chatroom FAQ

How to download all audios of one language? By speaker?

Datasets are available here. A script updates the datasets every 2 days, using CommonsDownloadTool. For more, see Help:Download datasets.

How to add missing languages?

Administrators can add new languages on request, usually within a few days. Please provide your language's ISO 639-3 code and/or its Wikidata ID. For more, see Help:Add a new language.

How to keep my Wikimedia project up to date?

Contact Poslovitch, the master of Lingua Libre Bot. For more info, check out Help:Bots and LinguaLibre:Bot.

What IRL events are coming? When? Where?

Please see LinguaLibre:Events.

How to translate LinguaLibre User Interface into a new language?

Go to translatewiki.net. For more, see Help:Translate.

How to archive sections which have been answered?

After reviewing the section, add {{done}} ~~~~ to the top of the section. After a few days to two weeks, move the section's code to [[LinguaLibre:Chat_room/Archives/year]].

Archives
2022 · 2021 · 2020 · 2019 · 2018

Results of Coverage Test of French Lemma and Non-Lemma forms in English Wiktionary

While playing around with generating lists for pronunciation from Wiktionary, I decided to run a few tests on the current coverage of French lemma and non-lemma forms in English Wiktionary. I chose French because it is the largest dataset in LL.

Current Coverage of French in Lingua Libre

  • Total French Entries in Lingua Libre by a native speaker: 233 982
  • Unique French Entries in Lingua Libre by a native speaker: 154 358
  • Percentage of overlap: 34%
  • Term with the greatest number of pronunciations: "blanc" with 40

Current Coverage of Category:French lemmas

  • Total entries in Category:French lemmas: 84 482
  • Pronounced entries: 50 917
  • Entries without pronunciation: 33 565
  • Coverage Percentage: 60.27%

Current Coverage of Category:French non-lemma forms

  • Total entries in Category:French non-lemma forms: 291 225
  • Pronounced entries: 26 791
  • Entries without pronunciation: 264 434
  • Coverage Percentage: 9.20%

For me, there are several lessons to be drawn.

  1. First, there has been amazing growth on LL. Covering 60.27% of French lemmas is a real achievement.
  2. The overlap percentage is quite small overall.
  3. There needs to be a clearer sense of when LL should stop requesting pronunciations for a certain term because 40 pronunciations of "blanc" seems a bit excessive.
  4. A need exists to continue pro-actively targeting entries in Wiktionary that are not in Lingua Libre. Currently, 297 999 French lemma and non-lemma forms require pronunciations.
  5. Generating lists from Wiktionary and checking coverage is not as hard as I thought.
  6. Lingua Libre has almost caught up with Forvo in the number of French pronunciations (233 982 vs 254 703). Overall, Lingua Libre has shown amazing and healthy progress in a very short period of time. I'm excited about these results. Languageseeker (talk) 03:07, 1 June 2022 (UTC)
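The figures in this report can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic, assuming the category totals are 84 482 lemmas and 291 225 non-lemma forms, and that the remaining entries in each category are the ones still lacking a pronunciation. The `coverage` helper is a hypothetical name, not part of any Lingua Libre tool.

```python
def coverage(total, pronounced):
    """Return (entries without pronunciation, coverage in percent)."""
    missing = total - pronounced
    return missing, round(100 * pronounced / total, 2)


# Figures quoted in the report (assumed to be entry counts).
lemmas = coverage(84_482, 50_917)
non_lemmas = coverage(291_225, 26_791)

# Share of recordings that repeat an already-recorded term.
total_recordings, unique_terms = 233_982, 154_358
overlap = round(100 * (total_recordings - unique_terms) / total_recordings)

# Lemma and non-lemma entries that still need a pronunciation.
still_missing = lemmas[0] + non_lemmas[0]

print(lemmas, non_lemmas, overlap, still_missing)
```

Under these assumptions the script reproduces the 60.27% and 9.20% coverage figures, the 34% overlap, and the 297 999 entries still requiring pronunciations.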
@Languageseeker This investigation is pretty cool. (I'm not sure I understand all your numbers yet, but I will read again when back on my PC.) It's quite nice to see we are reaching Forvo's level for our lead language. It's possible we have more unique words than Forvo, since we have user:Olafbot actively guiding and pushing us on that path.
On Lili we have chosen to be a learning AND linguistic diversity audio database. When you account for gender, regional accents, age and voice type, having 40 French audios for a word is still 400+ voices short.
Also, not all contributors are able to contribute perfect audio files, due to various shortcomings (hardware, no recording room, no noise-cancelling system, etc.). We lack a proper rating and review system. It's on our [slow] roadmap, though. 😉
PS: Should I answer you in French? I get the feeling you are French or learning it. Yug (talk) 15:07, 1 June 2022 (UTC)
@YUG Hi, Yug. Yes, I am learning French. As we discussed during our meeting, it is difficult to define the limits of a language. As I see it, lemma forms are not enough. Right now, I am building an Olafbot on steroids for French. My plan is to write a Python program that can parse the templates used on Wiktionary. Languageseeker (talk) 15:48, 7 June 2022 (UTC)
Hi @Languageseeker. I'm sorry I have not visited the Chat Room in a long time, and missed your report. Very interesting, good job! I remember a request I made to Olaf some time ago: it would be interesting to have a list similar to the one Olafbot is updating, but containing only lemmas of the target language (to quickly have nearly all lemmas of a dictionary illustrated with an audio pronunciation). Also, I suggest you use the categories of the French version of Wiktionary when you plan to work on French (and some other languages that are more extensively described there). As you can see here, the category gathering French lemmas is more than 3 times more complete on the fr. version than on the en. version of Wiktionary. As you mentioned, these numbers are exciting, let's keep up the good work! All the best, WikiLucas (🖋️) 15:47, 26 November 2022 (UTC)

How to create user page

Hello, my user name is Ngangaesther, from Kenya. I am still stuck on how I am supposed to create my user page. Kindly help. Regards, Esther

Odia language missing from Stats/Languages

Hi there, for some reason, the Odia-language stats are missing from the Stats/Languages page. Also, the "The most prolific speakers for the current month" section in the Stats/Speakers page has not been loading at all since I last checked (about 10 days ago). I have tried on Chromium and Firefox and the result is the same even after clearing the cache. --Subhashish (talk) 19:40, 28 July 2022 (UTC)

Hello Subhashish, it should be back online. We had a hackathon to put it back. We are calling for devs to push forwards. Yug (talk) 11:07, 10 August 2022 (UTC)
Thank you for the update, Yug. --Subhashish (talk) 14:00, 10 August 2022 (UTC)

Manually-coded languages

I came across meta:Lingua Libre/SignIt recently (via betawiki) and was wondering if manually-coded languages would be appropriate for this as well? These are languages in sign modality, but strongly tied to a spoken/written language; they usually adopt the grammar of the nonmanual language, choosing instead to simply transpose the vocabulary. This means they are most often used in application-specific and pidgin contexts (Pidgin Sign for English and diver's signs are examples). In particular, I am interested in toki pona luka, a manual form of toki pona (Q338540). Since the vocab is the same as spoken/written toki pona, there are a minimal number of lexemes overall, so having a complete set of signs is easily achievable. Manually-coded languages including toki pona luka are generally not given a separate ISO 639 code since they are in effect equivalent to scripts. Would this cause a problem for the infrastructure as currently designed? Arlo Barnes (talk) 05:56, 17 August 2022 (UTC)


Hello Arlo Barnes,

I understand "manually coded languages" as synonymous with "signed languages", am I correct?
If there is no distinct ISO code for the signed language, we could still:

  • Create a new Wikidata item without an ISO code, which will be used as the identifier by the LinguaLibre infrastructure
  • Use the spoken/written language's ISO code, and create lists of words all suffixed by (signed).

Either of those solutions could work.

If you have some knowledge of signed toki pona luka please let me know. We are adding features on Lingualibre and SignIt in order to be able to record video of signed words by late 2022. We are almost there. If you would like to record some basic signed words to share with the world, then let me know. Yug (talk) 20:58, 17 August 2022 (UTC)

Signed languages and manually-coded languages share similarities (the manual modality) and differences (since sign languages are 'native' to the signed modality, they use it more fully, having complete deixis and time-reference systems, use of handshape classifiers, etc.) -- 'luka' means 'hand'/'five', so that's the part of the name that indicates the manual modality, but otherwise it's just garden-variety toki pona. I am interested in using SignIt to record this vocab, yes. The '(signed)' suffix seems like a good way to do it. Arlo Barnes (talk) 13:16, 19 August 2022 (UTC)
Arlo Barnes: We increasingly have tools to update and correct sign language recordings, so if the suffix (signed), or whichever solution we choose, turns out to be incorrect, we can still correct it later using a bot.
I would encourage you to first train yourself and learn that manually-coded language over the coming months. Indeed, we still have one last bug within our video recording chain, which makes valid videos appear as audio on Commons. We expect to solve this last issue this fall (September or October?). So for now, I encourage you to rest well and recharge, to get ready to record later this year. Maybe identify a suitable place near you with an elegant monochrome wall to film against, or consider building yourself a low-cost recording studio, etc. We can discuss how to keep it low-cost and effective if you are interested, as I'm also looking for such walls and/or considering building one for myself.
See also: Minimal Sign Language Studio guideline. Yug (talk) 22:30, 19 August 2022 (UTC)

Update my username

I have changed my Wikimedia username but the previous name still appears in Lingua Libre. I know it's not included in unified logins. Anyway, please update my username to Aishik Rehman. Hirok Raja (talk) 15:14, 1 September 2022 (UTC)

Hi Hirok Raja, do you have an example of what you would like to see changed? I think you are talking about the filename but I am not sure, so with one example it would be clearer. Pamputt (talk)
@Pamputt
1. Top menubar of lingualibre.org showing 'Hirok Raja' as my profile name.
2. After uploading when I try to check my uploads in Commons, it takes me to https://commons.m.wikimedia.org/wiki/Special:ListFiles/Hirok_Raja page.
3. 'Hirok Raja' being used as Default recorder in the file names and description
4. Changing the speaker name to 'Aishik Rehman' every time I record is quite annoying.
5. Even here 'Hirok Raja' is showing as my signature by default ): Hirok Raja (talk) 19:16, 2 September 2022 (UTC)
I suspect this is due to long-lived cookies. It would be worth clearing your login cookies for Lingualibre: this will log you out, then you can log back in and come back here. On Firefox:
Open about:preferences#privacy > Go to "Cookies and Site Data" > Click "Manage Data" > Search "Lingualibre" > Remove selected. Yug (talk) 21:10, 2 September 2022 (UTC)

Siège communautaire de Wikimédia France – ouverture du vote / Community representative to Wikimédia France’s board - voting is open

(English version below. Do not hesitate to correct my English translation.)

(Message copié depuis le bistro du jour par Lepticed7 (talk))

Bonjour,

En tant que président de la commission électorale pour l'élection du siège communautaire au conseil d'administration de Wikimédia France, je vous annonce que le vote ouvre aujourd'hui (13 septembre) à 0h CEST. Il se terminera le 26 septembre à 23h59 CEST.

Comme il y a trois ans, le scrutin est public sur Meta. Les pages de votes sont disponibles dans la catégorie correspondante ou en lien sur la page principale. C'est un scrutin par approbation, le candidat qui aura le plus grand nombre de voix sera donc déclaré élu. Vous pouvez voter pour autant de candidats que vous le souhaitez.

Si vous avez des questions, vous pouvez les poser sur la page de discussion ou par courriel à election@wikimedia.fr.

Pour la commission électorale, Mathis B, le 12 septembre 2022 à 22:00 (CEST)


(Message copied from the French Wikipedia Bistro by Lepticed7 (talk))

Hello,

As the chairman of the electoral commission for the election of the community representative to Wikimédia France’s board, I announce that voting opens today (13 September) at 0:00 CEST. It will close on 26 September at 23:59 CEST.

As was the case three years ago, voting is public on Meta. Voting pages are available in the corresponding category or as links on the main page. This is an approval vote, so the candidate with the most votes will be declared elected. You can vote for as many candidates as you wish.

If you have any questions, you can ask them on the Talk page on Meta, or by email at election@wikimedia.fr.

For the electoral commission, Mathis B, 22:00, 12 September 2022 (CEST)

Is there a way to exclude username from Wikimedia Commons upload file name?

See also Help:Renaming.

This seems redundant and takes up a lot of space --Middle river exports (talk) 20:22, 9 October 2022 (UTC)

@Middle river exports Welcome MRE,
You could name your speaker with a single character I guess.
But keeping the name is intentional. Each speaker has his/her own voice, which we want to document. If, outside of Wikimedia, you want to remove part of the filename, we have a technical tutorial to do so. See Help:Download datasets and Help:Renaming. Ping us back if your dataset is not up to date. Yug (talk) 13:16, 10 October 2022 (UTC)
I have solved this now by just changing my username to something shorter. This way I can upload English as Usmaan (عثمان) for example where instead of just repeating the username it shows two scripts which is more useful. (Apparently few enough people have Arabic script usernames that short common words are mostly available.) --عثمان (talk) 20:23, 10 October 2022 (UTC)
All Unicode characters should be ok, in words and usernames ;) Yug (talk) 19:46, 11 October 2022 (UTC)

Username update request

I realised my username on Mediawiki didn't carry over here when I changed it. On this site could I please have it changed to: عُثمان --عثمان (talk) 08:45, 10 November 2022 (UTC)

Data on LinguaLibre:Stats isn't consistent with the Wikimedia Commons category

On the Stats page, French has 254,387 records

https://lingualibre.org/wiki/LinguaLibre:Stats/Languages

Meanwhile, the Category on commons.wikimedia.org has 253,464 records

https://commons.wikimedia.org/wiki/Category:Lingua_Libre_pronunciation-fra

The stats display more records. This data inconsistency is strange. -- User:Shenlebantongying, 10:36, 23 December 2022.

This means some item pages exist here but have no corresponding audio on Commons.
Item creation here and upload are done at step 5 of the recording, nearly simultaneously.
So I don't know what is going on. Yug (talk) 17:41, 26 December 2022 (UTC)
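One way to track this gap over time is to read the file count of the Commons category through the MediaWiki API (`action=query` with `prop=categoryinfo`) and compare it with the figure shown on the Stats page. The sketch below only builds the request URL and parses a response; the helper names are hypothetical, the sample response is hand-written for illustration, and the two counts are the ones quoted in this thread, not freshly fetched data.

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def categoryinfo_url(category):
    """Build a MediaWiki API URL asking for the size of a Commons category."""
    params = {
        "action": "query",
        "prop": "categoryinfo",
        "titles": f"Category:{category}",
        "format": "json",
    }
    return f"{COMMONS_API}?{urlencode(params)}"


def file_count(response):
    """Extract the number of files from a decoded categoryinfo response."""
    page = next(iter(response["query"]["pages"].values()))
    # A missing category has no "categoryinfo" key; treat it as empty.
    return page.get("categoryinfo", {}).get("files", 0)


url = categoryinfo_url("Lingua Libre pronunciation-fra")

# Hand-written sample shaped like an API reply (illustrative values only).
sample = {"query": {"pages": {"1": {
    "title": "Category:Lingua Libre pronunciation-fra",
    "categoryinfo": {"size": 253464, "pages": 0, "files": 253464, "subcats": 0},
}}}}

stats_page_count = 254_387  # figure quoted from LinguaLibre:Stats/Languages
gap = stats_page_count - file_count(sample)
print(url)
print(gap)
```

With the two numbers reported above, the gap is 923 records, which would be the number of items to look for without a matching file on Commons.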

c:Category:Lingua Libre pronunciation-bxg

All files in this category are tagged with the wrong language. I have requested moves for files in the category, but what more needs to be done?--GZWDer (talk) 13:05, 12 January 2023 (UTC)

Thanks for reporting. Actually all these items are erroneous (see Special:WhatLinksHere/Q590228):
I have not checked yet if corresponding recordings are still on Commons. Pamputt (talk) 16:11, 13 January 2023 (UTC)

I cannot publish my recordings made via Lingua Libre.

Dear Colleagues,

It records, but when I press the button to publish it on Wikimedia Commons, it does not work. It returns "Retry failed upload". Any idea? Thank you. Key Mîrza (talk) 05:09, 28 January 2023 (UTC)

Is it happening for all your recordings or only some of them? Pamputt (talk) 08:49, 28 January 2023 (UTC)
Everything worked until a month ago. I am currently on vacation in another city and trying to log in to my account and make some more recordings. I can log in and I can create recordings, but I cannot publish them. I am stuck at the publishing stage: none of my recordings get published. I even tried recording on my cell phone, and nothing gets published there either. By the way, I just saw your previous message welcoming me. Thank you for your kind wishes. Best wishes... Key Mîrza (talk) 09:57, 28 January 2023 (UTC)
Hmmm, I do not know what to say. Sometimes some recordings do not upload while others do. When no recording uploads at all, I do not know what the cause could be. Could you try with another web browser (Firefox or Chrome)? To go further, I think we would need a JavaScript expert who could give some hints. @Poslovitch & Lepticed7 maybe? Another question: how many words are you trying to record? If it is a lot, could you try with only a few (fewer than 10, for example)? Pamputt (talk) 15:42, 28 January 2023 (UTC)
I tried 11 words together, then just 1 word for testing purposes. Nothing worked. You mentioned Java. Do I need Java to be able to work with the application? If so, then I need to install Java: I formatted my PC, so it may not be installed. Thank you. Key Mîrza (talk) 17:06, 28 January 2023 (UTC)
Java is different from JavaScript. JavaScript is a language supported by the web browser, so you do not need to install anything other than a web browser to record pronunciations on Lingua Libre. Unfortunately, I cannot dig further in this direction because I know almost nothing about JavaScript. Pamputt (talk) 21:18, 28 January 2023 (UTC)
Thank you, anyway. Key Mîrza (talk) 22:38, 28 January 2023 (UTC)
Key Mîrza, thank you a lot for your voice, it lets us discover new languages. Please be aware Lili works best on solid desktop computers. Also, you likely have a limit of 380 uploads per 72 minutes. So you may need to leave your tab open, and click "retry" after that. You can expand those rights by making a request on Commons. See LinguaLibre:User rights. Contact us if you think it may be that. Yug (talk) 15:07, 5 February 2023 (UTC)
It's confirmed: like all new contributors, you are limited to 380 uploads per 72 minutes. You can get more user rights by requesting them on Commons. Yug (talk) 15:15, 5 February 2023 (UTC)

Late 2022-2023 Winter report

Hello all, allow me to share some general news from the various recent, ongoing, or near-future efforts.

  • 🤖 User:Pamputt has taken over Lingualibre Bot and added support for the Kurdish wiktionary. See github.
  • 🌏 Melody (WMFr intern) and I held a mini-editathon on writing template emails for outreach. See Lingualibre:Events.
  • ⚡ User:Elfix and I are collaborating on SPARQL queries (me) and their optimization (Elfix). We aim to create a languages gallery this spring.
  • 🔴 Wikimedia France's freelancer on the record wizard is back on track; delivery of fixes should occur around May-June.
  • 🙋‍♀️ Adelaide (WMFr) mentioned the wish for a second intern on Lingualibre outreach this summer, to reuse Melody's assets and expand actions and geographic diversity.
  • 🫱🏼‍🫲🏽 Wikimedia France's yearly strategic meetup is this week, and is expected to strengthen its (linguistic) diversity and metrics axes, for which Lingualibre is one of their champions.
  • 🧓 Eve and I will (likely) be present at Toulouse's Forom des Langues, in May, where ~60+ language associations are present.

For specific deadlines and upcoming events, please also check Lingualibre:Events/Program. We always welcome contributors. When necessary, WMFr may refund transportation costs. Worth a try! Yug (talk) 15:07, 5 February 2023 (UTC)

Edit your nickname

Good evening, I would like to change my nickname because it did not update when I was renamed Manjiro91 and then Manjiro5, instead of GamissimoYT, on Wikimedia projects. Thanks in advance. Regards, manȷıro💬 22:53, 23 February 2023 (UTC)

Tool to prepare words for Lingua Libre

Preparing words to be used in Lingua Libre has always been challenging. But I think this is a shared challenge. Crawling text from different sources and creating a clean list of words is very important. I've used Tito's instructions in the past, but using multiple tabs and multiple tools is not the best user experience. So, I thought I'd create something that is functional for me and simple enough to be tweaked. Introducing "Prepare words for Lingua Libre". The tool is currently set for Odia but can be easily tweaked for other languages using non-Latin scripts. I'd request the Lingua Libre core team to incorporate the tool into Lingua Libre so that users can use the platform to create a wordlist. Extracting words from any random text is always hard, especially for new contributors. --Subhashish (talk) 03:44, 14 March 2023 (UTC)

Hi Psubhashish. This is really nice. Do you think it would be easy to adapt it to create a new generator? Generators can be used by anyone after they import them in their common.js. Pamputt (talk) 06:44, 14 March 2023 (UTC)
Thanks User:Pamputt. That would be fantastic, but I probably don't have the right know-how for doing that. I did take ChatGPT's help to create a .js version from the HTML code I had shared earlier, but would appreciate any help. I think having a tool inside Lingua Libre would be great, so I really liked the idea of new generators. Common users would like things well packaged rather than jumping from one platform to another. --Subhashish (talk) 13:09, 14 March 2023 (UTC)
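To illustrate the kind of word extraction discussed in this thread, here is a minimal Python sketch. It is not the code of the "Prepare words for Lingua Libre" tool or of any generator; it assumes that a "word" is a run of characters from the Odia Unicode block (U+0B00 to U+0B7F), and the `extract_words` name is hypothetical. Adapting it to another script would mean swapping in that script's Unicode range.

```python
import re

# Runs of characters from the Odia block, plus the zero-width (non-)joiners
# that can occur inside a word. Assumption: this is a good enough "word".
ODIA_WORD = re.compile(r"[\u0B00-\u0B7F\u200C\u200D]+")
# Odia digits U+0B66..U+0B6F, used to drop purely numeric tokens.
ODIA_DIGITS = {chr(code) for code in range(0x0B66, 0x0B70)}


def extract_words(text):
    """Return unique Odia-script words from text, in order of first appearance."""
    seen = set()
    words = []
    for token in ODIA_WORD.findall(text):
        token = token.strip("\u200c\u200d")  # drop stray joiners at the edges
        if not token or all(ch in ODIA_DIGITS for ch in token):
            continue
        if token not in seen:
            seen.add(token)
            words.append(token)
    return words


if __name__ == "__main__":
    sample = "ଓଡ଼ିଆ ଭାଷା, Odia language ଓଡ଼ିଆ ୨୦୨୩।"
    # One word per line; any list markup is left to the reader.
    print("\n".join(extract_words(sample)))
```

Latin text, punctuation, numbers and repeated words are dropped, so the output is a deduplicated list with one word per line. Unicode normalization and script-specific cleanup are deliberately left out of this sketch.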