Results of Coverage Test of French Lemma and Non-Lemma forms in English Wiktionary
While playing around with generating pronunciation lists from Wiktionary, I decided to run a few tests on the current coverage of French lemma and non-lemma forms in English Wiktionary. I chose French because it is the largest dataset in LL.
Current Coverage of French in Lingua Libre
- Total French Entries in Lingua Libre by a native speaker: 233 982
- Unique French Entries in Lingua Libre by a native speaker: 154 358
- Percentage of overlap: 34%
- Term with the greatest number of pronunciations: "blanc" with 40
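The 34% overlap figure is consistent with counting the share of recordings that repeat an already-recorded term, i.e. (total - unique) / total. A minimal Python check using only the two totals quoted above (the function name is illustrative, not part of any Lingua Libre tool):

```python
def overlap_percentage(total: int, unique: int) -> float:
    """Share of recordings that duplicate an already-recorded term, in percent."""
    return 100 * (total - unique) / total

# Native-speaker French recordings on Lingua Libre (figures from this post)
print(f"{overlap_percentage(233_982, 154_358):.0f}%")  # prints 34%
```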
Current Coverage of Category:French lemmas
- Total entries in Category:French lemmas: 84 482
- Pronounced entries: 50 917
- Entries without pronunciation: 33 565
- Coverage Percentage: 60.27%
Current Coverage of Category:French non-lemma forms
- Total entries in Category:French non-lemma forms: 291 225
- Pronounced entries: 26 791
- Entries without pronunciation: 264 434
- Coverage Percentage: 9.20%
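Both coverage percentages, and the 297 999 terms still missing, can be reproduced from the counts above. The sketch below also shows the set-based comparison that produces such counts; it assumes you already have the category members and the recorded terms as Python sets (fetching them, e.g. from the Wiktionary and Lingua Libre APIs, is not shown), and the function names are hypothetical.

```python
def coverage(category_terms: set[str], recorded_terms: set[str]) -> tuple[int, int, float]:
    """Return (pronounced, missing, coverage %) for one Wiktionary category."""
    pronounced = len(category_terms & recorded_terms)  # terms that have a recording
    missing = len(category_terms) - pronounced
    return pronounced, missing, 100 * pronounced / len(category_terms)

def coverage_from_counts(total: int, pronounced: int) -> tuple[int, float]:
    """Same figures when only the counts are known: (missing, coverage %)."""
    return total - pronounced, 100 * pronounced / total

lemmas_missing, lemmas_pct = coverage_from_counts(84_482, 50_917)
forms_missing, forms_pct = coverage_from_counts(291_225, 26_791)
print(lemmas_missing, f"{lemmas_pct:.2f}%")  # 33565 60.27%
print(forms_missing, f"{forms_pct:.2f}%")    # 264434 9.20%
print(lemmas_missing + forms_missing)        # 297999

# Toy example of the set-based version
print(coverage({"blanc", "noir", "rouge"}, {"blanc", "rouge", "vert"}))
```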
For me, there are several lessons to be drawn.
- First, there has been amazing growth on LL. Covering 60.27% of French lemmas is a real achievement.
- The overlap percentage is quite small overall.
- There needs to be a clearer sense of when LL should stop requesting pronunciations for a certain term because 40 pronunciations of "blanc" seems a bit excessive.
- There is a need to continue proactively targeting entries in Wiktionary that are not in Lingua Libre. Currently, 297 999 French lemma and non-lemma forms still need pronunciations.
- Generating lists from Wiktionary and checking coverage is not as hard as I thought.
- Lingua Libre has almost caught up with Forvo in the number of French pronunciations (233 982 vs 254 703). Overall, Lingua Libre has shown amazing and healthy progress in a very short period of time. I'm excited about these results. Languageseeker (talk) 03:07, 1 June 2022 (UTC)
- @Languageseeker This investigation is pretty cool. (I'm not sure I understand all your numbers yet, but I will read them again when I'm back on my PC.) It's quite nice to see we are reaching Forvo's level for our lead language. We may even have more unique words than Forvo, since we have user:Olafbot actively guiding and pushing us on that path.
- On Lili we have chosen to be a learning AND linguistic diversity audio database. When you account for gender, regional accents, age and voice type, having 40 French recordings of a word still leaves us 400+ voices short.
- Also, not all contributors are able to produce perfect audio files, due to various shortcomings (hardware, no recording room, no noise-cancelling system, etc.). We lack a proper rating and review system. It's on our [slow] roadmap tho. 😉
- PS: Should I answer you in French? I get the feeling you are French or learning it. Yug (talk) 15:07, 1 June 2022 (UTC)
- @YUG Hi, Yug. Yes, I am learning French. As we discussed during our meeting, it is difficult to define the limits of a language. As I see it, lemma forms are not enough. I am now building an Olafbot on steroids for French. My plan is to write a Python program that can analyze the templates used on Wiktionary. Languageseeker (talk) 15:48, 7 June 2022 (UTC)
- Hi @Languageseeker . I'm sorry I have not visited the Chat Room in a long time, and missed your report. Very interesting, good job! I remember a request I made to Olaf some time ago: it would be interesting to have a list similar to the one Olafbot is updating, but containing only lemmas of the target language (to quickly have nearly all lemmas of a dictionary illustrated with an audio pronunciation). Also, I suggest you use the categories of the French version of Wiktionary when you plan to work on French (and some other languages that are more extensively described there). As you can see here, the category gathering French lemmas is more than three times larger on the fr. version than on the en. version of Wiktionary. As you mentioned, these numbers are exciting, let's keep up the good work! All the best. WikiLucas (🖋️) 15:47, 26 November 2022 (UTC)
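The plan Languageseeker describes above, a Python program that analyzes the templates used on Wiktionary, could start from something as small as counting template names in an entry's wikitext. This is only a sketch under simplifying assumptions: the wikitext is already downloaded, and a regular expression stands in for a real wikitext parser (a library such as mwparserfromhell would handle edge cases more robustly). The sample entry is made up for illustration.

```python
import re
from collections import Counter

# Captures the name right after "{{", stopping at the first "|" or "}}".
# Simplification: parser functions such as {{#if:...}} are not special-cased.
TEMPLATE_NAME = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")

def count_templates(wikitext: str) -> Counter:
    """Count how often each template name is used in a page's wikitext."""
    return Counter(m.group(1) for m in TEMPLATE_NAME.finditer(wikitext))

# Hypothetical, simplified entry
sample = """==French==
===Pronunciation===
* {{IPA|fr|/blɑ̃/}}
===Adjective===
{{fr-adj|f=blanche}}
# [[white]]
===Noun===
{{fr-noun|m}}
"""
print(count_templates(sample).most_common())
```

Counting which inflection templates (such as the hypothetical `fr-adj` above) appear across many entries would be one way to discover which non-lemma forms a lemma generates.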
How to create user page
Hello, my user name is Ngangaesther, from Kenya. I am still stuck on how I am supposed to create my user page. Kindly help. Regards, Esther
Odia language missing from Stats/Languages
Hi there, for some reason the Odia-language stats are missing from the Stats/Languages page. Also, the "The most prolific speakers for the current month" section of the Stats/Speakers page has not been loading at all since I last checked (about 10 days ago). I have tried Chromium and Firefox, and the result is the same even after clearing the cache. --Subhashish (talk) 19:40, 28 July 2022 (UTC)
- Hello Subhashish, it should be back online. We held a hackathon to restore it. We are calling for developers to push forward. Yug (talk) 11:07, 10 August 2022 (UTC)
I came across meta:LinguaLibre/SignIt recently (via betawiki) and was wondering whether manually-coded languages would be appropriate for it as well. These are languages in the sign modality, but strongly tied to a spoken/written language: they usually adopt the grammar of the non-manual language and simply transpose its vocabulary into signs. This means they are most often used in application-specific and pidgin contexts (Pidgin Sign for English and diver's signs are examples). In particular, I am interested in toki pona luka, a manual form of toki pona (Q338540). Since the vocabulary is the same as in spoken/written toki pona, there are only a small number of lexemes overall, so a complete set of signs is easily achievable. Manually-coded languages, including toki pona luka, are generally not given a separate ISO 639 code, since they are in effect equivalent to scripts. Would this cause a problem for the infrastructure as currently designed? Arlo Barnes (talk) 05:56, 17 August 2022 (UTC)
Hello Arlo Barnes,
I understand "manually coded languages" as synonymous with "signed languages", am I correct?
If there is no distinct ISO code for the signed language, we could still:
- Create a new Wikidata item without an ISO code, which would be used as the identifier by the Lingua Libre infrastructure.
- Use the spoken/written language's ISO code, and create lists of words all suffixed with (signed).
Either of those solutions could work.
If you have some knowledge of signed toki pona luka please let me know. We are adding features on Lingualibre and SignIt in order to be able to record video of signed words by late 2022. We are almost there. If you would like to record some basic signed words to share with the world, then let me know. Yug (talk) 20:58, 17 August 2022 (UTC)
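The second option above, reusing the spoken language's ISO code and suffixing every entry with (signed), is easy to automate. A hypothetical sketch: it assumes a list page holding one `# word` entry per line, which is an assumption here rather than a documented Lingua Libre format, and the helper name is made up.

```python
SUFFIX = " (signed)"

def to_signed_list(words: list[str]) -> str:
    """Build list-page text with one '# word (signed)' line per word (hypothetical helper)."""
    lines = []
    for word in words:
        word = word.strip()
        if not word:
            continue  # skip blank entries
        if not word.endswith(SUFFIX):
            word += SUFFIX  # avoid doubling the suffix when re-run on an existing list
        lines.append(f"# {word}")
    return "\n".join(lines)

print(to_signed_list(["toki", "pona", "luka"]))
```

Keeping the suffix in a single constant means that if another convention is chosen later, only one line changes.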
- Signed languages and manually-coded languages share similarities (the manual modality) and differences (since sign languages are 'native' to the signed modality, they use it more fully, having complete deixis and time-reference systems, use of handshape classifiers, etc.) -- 'luka' means 'hand'/'five', so that's the part of the name that indicates the manual modality, but otherwise it's just garden-variety toki pona. I am interested in using SignIt to record this vocab, yes. The '(signed)' suffix seems like a good way to do it. Arlo Barnes (talk) 13:16, 19 August 2022 (UTC)
- Arlo Barnes: We increasingly have tools to update and correct sign language recordings, so if the suffix (signed), or whichever solution we choose, turns out to be wrong, we can still correct it later using that bot.
- I would encourage you to first train yourself in that manually-coded language over the coming months. We still have one last bug in our video recording chain, which makes valid videos appear as audio on Commons. We expect to solve this last issue this fall (September or October?). So for now, I encourage you to rest well and recharge, to be ready to record later this year. Maybe look for a suitable place near you with an elegant monochrome wall to film against, or consider building yourself a low-cost recording studio, etc. We can discuss how to keep it low-cost and effective if you are interested, as I'm also looking for such walls and/or considering building one for myself.
- See also: Minimal Sign Language Studio guideline. Yug (talk) 22:30, 19 August 2022 (UTC)
Update my username
I have changed my Wikimedia username but the previous name still appears in Lingua Libre. I know it's not included in unified logins. Anyway, please update my username to Aishik Rehman. Hirok Raja (talk) 15:14, 1 September 2022 (UTC)
- Hi Hirok Raja, would you have an example of what you would like to see changed? I think you are talking about the filename but I am not sure, so an example would make it clearer. Pamputt (talk)
1. Top menubar of lingualibre.org showing 'Hirok Raja' as my profile name.
2. After uploading when I try to check my uploads in Commons, it takes me to https://commons.m.wikimedia.org/wiki/Special:ListFiles/Hirok_Raja page.
3. 'Hirok Raja' being used as the default recorder in the file names and descriptions.
4. Having to change the speaker name to 'Aishik Rehman' every time I record is quite annoying.
5. Even here 'Hirok Raja' is showing as my signature by default ): Hirok Raja (talk) 19:16, 2 September 2022 (UTC)
- I suspect this is due to long-term cookies. It would be worth clearing your Lingua Libre connection cookies; this will log you out, then log back in and come back here. In Firefox:
about:preferences#privacy > Go to "Cookies and Site Data" > Click "Manage Data" > Search "Lingualibre" > Remove selected. Yug (talk) 21:10, 2 September 2022 (UTC)
Siège communautaire de Wikimédia France – ouverture du vote / Community representative to Wikimédia France’s board - votes are opened
(English version below. Do not hesitate to correct my English translation.)
En tant que président de la commission électorale pour l'élection du siège communautaire au conseil d'administration de Wikimédia France, je vous annonce que le vote ouvre aujourd'hui (13 septembre) à 0h CEST. Il se terminera le 26 septembre à 23h59 CEST.
Comme il y a trois ans, le scrutin est public sur Meta. Les pages de votes sont disponibles dans la catégorie correspondante ou en lien sur la page principale. C'est un scrutin par approbation, le candidat qui aura le plus grand nombre de voix sera donc déclaré élu. Vous pouvez voter pour autant de candidats que vous le souhaitez.
Si vous avez des questions, vous pouvez les poser sur la page de discussion ou par courriel à email@example.com.
Pour la commission électorale, Mathis B, le 12 septembre 2022 à 22:00 (CEST)
As the chairman of the electoral commission for the election of the community representative to Wikimédia France's board, I announce that voting opens today (13 September) at 0:00 CEST. It will close on 26 September at 23:59 CEST.
As three years ago, voting is public and takes place on Meta. Voting pages are available in the corresponding category or as links on the main page. This is an approval vote: the candidate with the most votes will be elected. You can vote for as many candidates as you wish.
If you have any questions, you can ask them on the Talk page on Meta, or by email at firstname.lastname@example.org.
For the electoral commission, Mathis B, 22:00, 12 September 2022 (CEST)
Is there a way to exclude username from Wikimedia Commons upload file name?
- See also Help:Renaming.
- @Middle river exports Welcome MRE,
- You could name your speaker with a single character, I guess.
- But keeping the name is intentional: each speaker has their own voice, which we want to document. If, outside of Wikimedia, you want to remove part of the filename, we have a technical tutorial for that: see Help:Download datasets and Help:Renaming. Ping us back if your dataset is not up to date. Yug (talk) 13:16, 10 October 2022 (UTC)
- I have solved this by simply changing my username to something shorter. This way I can upload English recordings as, for example, Usmaan (عثمان): instead of just repeating the username, it shows two scripts, which is more useful. (Apparently few enough people have Arabic-script usernames that short common words are mostly available.) --عثمان (talk) 20:23, 10 October 2022 (UTC)
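Help:Renaming, mentioned above, covers removing the speaker's name from filenames outside Wikimedia. As a rough illustration only, assuming filenames shaped like `LL-Q150 (fra)-Speaker-word.wav` (language Q-id, ISO code in parentheses, speaker, then the word) and a speaker name that contains no hyphen, the speaker segment can be dropped with a regular expression. The example filenames and the helper name are made up.

```python
import re

# Assumed pattern: "LL-<language Q-id> (<ISO code>)-<speaker>-<word>.<ext>".
# Assumes the speaker name itself contains no "-"; the word may.
LL_NAME = re.compile(r"^(LL-Q\d+ \([^)]*\))-([^-]+)-(.+)$")

def strip_speaker(filename: str) -> str:
    """Drop the speaker segment; return the name unchanged if it does not match."""
    m = LL_NAME.match(filename)
    if m is None:
        return filename
    prefix, _speaker, rest = m.groups()
    return f"{prefix}-{rest}"

print(strip_speaker("LL-Q150 (fra)-ExampleSpeaker-blanc.wav"))
```

Leaving non-matching names untouched is a deliberate choice: a rename script should never guess when a filename does not fit the expected pattern.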