LinguaLibre
Difference between revisions of "Chat room"
Welcome to the Chat room! This is the place to discuss any and all aspects of Lingua Libre: the project itself, discussions of the operations, policy and proposals, technical issues, etc. Other forums include for code-oriented issues, . Feel free to participate in any language you want to.
Mass deletion
Hi, can you delete all recordings from 12:55 on the 5 June 2021 I made, I recorded them in the wrong language. See [https://lingualibre.org/wiki/Special:Contributions/Berrely here]. Thanks! [[User:Berrely|Berrely]] ([[User talk:Berrely|talk]]) 10:42, 2 August 2021 (UTC)
:Hi {{u|Berrely}}, for now I have listed the recordings [[LinguaLibre:Misleading_items/Berrely_20210802|here]]. Currently, I am developing a bot that will fix such issues in bulk. I hope it will be finished before the end of August. Could you confirm that the "correct" language is Farsi (instead of English)? [[User:Pamputt|Pamputt]] ([[User talk:Pamputt|talk]]) 11:10, 2 August 2021 (UTC)
:{{u|Pamputt}} yes the correct language is Farsi, not English. I already deleted the files from Commons, and intend to rerecord them. [[User:Berrely|Berrely]] ([[User talk:Berrely|talk]]) 13:21, 14 August 2021 (UTC)
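For anyone who wants to list such contributions themselves, here is a minimal sketch of the request parameters for the standard MediaWiki `list=usercontribs` query. The endpoint URL and the assumption that 12:55 is meant in UTC are guesses, not something stated in this thread, and this is not the bot mentioned above.

```python
from datetime import datetime, timezone

# Assumed endpoint; not confirmed anywhere in this discussion.
API_URL = "https://lingualibre.org/api.php"


def contribs_params(user, start, limit=500):
    """Build parameters for a MediaWiki usercontribs query.

    Lists contributions by `user` made at or after `start`, oldest first.
    """
    return {
        "action": "query",
        "list": "usercontribs",
        "ucuser": user,
        # MediaWiki accepts ISO 8601 timestamps; normalise to UTC first.
        "ucstart": start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "ucdir": "newer",  # enumerate forward in time from ucstart
        "uclimit": limit,
        "format": "json",
    }


# Assumption: the 12:55 mentioned above is UTC.
params = contribs_params(
    "Berrely", datetime(2021, 6, 5, 12, 55, tzinfo=timezone.utc)
)
```

The resulting dictionary can be passed to any HTTP client as the query string of a GET request; long result sets would additionally need the API's continuation mechanism.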
Revision as of 13:21, 14 August 2021
Chatroom FAQ
Datasets are available here. A script updates the datasets every 2 days, using CommonsDownloadTool. For more, see Help:Download datasets.
Administrators can add new languages on demand; they do so within a few days. Please provide your language's ISO 639-3 code and/or its Wikidata ID. For more, see Help:Add a new language.
Contact Poslovitch, the master of Lingua Libre Bot. For more info, check out Help:Bots and LinguaLibre:Bot.
Please see LinguaLibre:Events.
Go to translatewiki.net. For more, see Help:Translate.
After reviewing the section, add {{done}} ~~~~ to the top of the section. After a few days to 2 weeks, move the section's code to [[LinguaLibre:Chat_room/Archives/year]].
Datasets out of date
Hello. It seems that the datasets page, although it claims to run every 2 days, is completely out of date: all the available zips are from April 2020 or November 2019 (and the full zip from May 2019). Is this a known problem? Is there a plan to address it? Julien Baley (talk) 23:17, 27 August 2020 (UTC)
- Indeed, there seems to be an issue with the dataset updating. I opened a Phabricator ticket about this issue. Pamputt (talk) 18:24, 28 August 2020 (UTC)
Publish on Wikimedia Commons
Hello, I just tested, but my records are not published on Commons. My tests: on Firefox, then on Chrome, with 50, then with 1 expression (s), with license CC3.0-BY-SA and CC1.0. —Eihel (talk) 06:51, 2 May 2021 (UTC)
- phab:T281636 —Eihel (talk) 07:10, 2 May 2021 (UTC)
- Usually I have the same with the first two recordings in a session. Then I can upload them again at the end. Try again with more recordings, and using the "retry failed upload" button. Poemat (talk) 08:07, 2 May 2021 (UTC)
I add a user who has the same problem: Le Commissaire. —Eihel-LiLi (talk) 15:33, 6 May 2021 (UTC)
- Hello @Seb35, we should also check with Le Commissaire whether the problem persists (before closing the Phab ticket). Kind regards. —Eihel (talk) 10:01, 4 June 2021 (UTC)
- I left a message for Le Commissaire on their talk page.
- The problem you had was specific to your account; it may have happened to other people but it seems to be quite rare. Also, once a user has managed to upload to Commons, it is a different problem from yours (this one, which looks similar but where the error is intermittent). More generally, the error message should be explicit instead of requiring people to dig through the browser console; I will open a Phabricator ticket about that. Seb35 (talk) 10:28, 4 June 2021 (UTC)
Translation admins
Done — WikiLucas (🖋️) 14:05, 19 July 2021 (UTC)
I updated this ticket, explaining our need for translation admins. I'm especially thinking of Sabelöga and Eihel, who have the skills and the need to get these rights (e.g. here).
If the community agrees, we can ask the developer team currently working on the project to implement this new status in Lingua Libre, and we will then be able to elect new translation admins on LiLi.
You can vote by using {{Support}} or {{Oppose}}.
All the best, — WikiLucas (🖋️) 12:21, 4 May 2021 (UTC)
- Hello WikiLucas, Especially since the tvar translation variables have just changed. —Eihel-LiLi (talk) 16:32, 5 May 2021 (UTC)
- UPDATE: Translation admins should now "exist" on Lingua Libre. See [T262855] Implement new user rights. --Poslovitch (talk) 19:35, 3 July 2021 (UTC)
Vote
- Support (proposer) — WikiLucas (🖋️)
- Support We are at an early stage for the community; having 3 active referents for any given administrative task is required (see also en:Bus factor). It is also necessary to document processes as we see them appear, in a concise and therefore maintainable way. Yug (talk) 15:09, 4 May 2021 (UTC)
- In this project, the rights associated (example: pagetranslation) with translation administrators are already contained in the administrators. In addition, an administrator can self-grant the right without going through a formal request (on any WM). I therefore think that we are far from the indispensable (wo)man (especially after Strasbourg IMHO). Also, if I want to continue on this project and following the previous section… —Eihel-LiLi (talk) 16:29, 5 May 2021 (UTC)
- @Eihel-LiLi "Active" [and skilled] is an important word. I'm admin but not active on translations pages. We have about 4 admins truly active this past 6 months, AFAIK only WikiLucas was admin while truly active [and skilled] on pagetranslation. Adding 2+ more is required. Seems on the way. Yug (talk) 09:59, 6 May 2021 (UTC)
- And Pamputt too (already TA on WD for example). Cordially. —Eihel-LiLi (talk) 15:14, 6 May 2021 (UTC)
- Support Agree to ask for this new status. Pamputt (talk) 15:46, 4 May 2021 (UTC)
- Support Agreed. DSwissK (talk) 18:31, 4 May 2021 (UTC)
- Weak support —Eihel-LiLi (talk) 15:49, 6 May 2021 (UTC)
- Support I am confident. Lyokoï (talk) 17:57, 10 May 2021 (UTC)
- Support I'm up for it! --Sabelöga (talk) 18:53, 19 May 2021 (UTC)
Discussion
- I'd rather see Titodutta. —Eihel-LiLi (talk) 01:20, 6 May 2021 (UTC)
- @Eihel-LiLi Titodutta is already an admin on LiLi, which means he has the pagetranslation right. Implementing this translation admin status would allow us to grant some users the pagetranslation right without granting them all admin rights (like the right to delete pages or block users for instance). — WikiLucas (🖋️) 07:31, 6 May 2021 (UTC)
- Ah OK. I took the most prolific users, but I remembered that you and Pamputt are TAs… —Eihel-LiLi (talk) 15:04, 6 May 2021 (UTC)
Browsing the sound library
Nicolas NALLET is currently working on the page that will display the recordings of Lingua Libre, and would like to know the list of filters that we would like to use on this page (e.g. by language, by speaker, by date...)
Feel free to suggest other filters or give your opinion on suggested filters 🙂 — WikiLucas (🖋️) 12:58, 20 May 2021 (UTC)
(pinging @Yug, Pamputt, & Titodutta — WikiLucas (🖋️) 15:48, 20 May 2021 (UTC))
- Great news!
- The most obvious ones are, I guess, the following:
- by language
- by speaker
- by speaker's language proficiency (beginner, etc.)
- by genre (male, female, etc.)
- --Poslovitch (talk) 13:38, 20 May 2021 (UTC)
- Hello WikiLucas00 and Poslovitch
- by cat (deepcat, incategory)
- by coord (nearcoord, boost-nearcoord)
- by link (linksto)
- The codes in parentheses are those of CirrusSearch, an extension that can be added to LiLi. Poslovitch's proposals also have filters contained in WikibaseCirrusSearch (haswbstatement). Tell me what you think of this. Cordially. —Eihel (talk) 20:36, 20 May 2021 (UTC)
- @Eihel could you describe a bit how you imagine this would work? (since the recordings on Lingua Libre don't have cat or coord at all, and could have link but I couldn't find any examples, I'm a bit confused and would like to know more). Same question for CirrusSearch, we could look into it to see if it can be installed, but what use do you see for it? (the only use I know is for WikibaseCirrusSearch). Cheers, VIGNERON (talk) 14:42, 26 May 2021 (UTC)
- Code on github please. You may check Forvo and Codepen to find elegant html5 audio element and css. Yug (talk) 22:00, 26 May 2021 (UTC)
- Hello @VIGNERON , The WikibaseCirrusSearch extension requires the installation of the CirrusSearch extension. This means that it does not change much. It is true that my proposals are not very Catholic, but this project will evolve over time. To begin with, this page contains a cat (not all LiLi TPs contain a cat, this should be corrected). However, since you want an example, here is one (the TPs where we both participated with insource). Best regards. —Eihel (talk) 09:54, 4 June 2021 (UTC)
- For example, the lists - which are the way to correctly make a significant number of records - were already numerous before Strasbourg. Now only one language letter appears (a). A search on its history for its own lists is possible knowing how they were recorded. But for example, if I want the lists in French in a search, "List:Fra" is not sufficient, because we only get a part. In the future, categories should be created for lists: by user, by language, by set (from the same record session) and by subject (fruit, animals, etc.). Otherwise it will quickly be insurmountable from a moment. Cordially. —Eihel (talk) 14:04, 4 June 2021 (UTC)
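To make the CirrusSearch keywords mentioned in this thread more concrete, here is a rough sketch of how such a filter string could be assembled and sent through the standard MediaWiki `list=search` API. It assumes CirrusSearch and WikibaseCirrusSearch would be installed on Lingua Libre, which this discussion says is not yet the case; the category name and the property/item IDs are made-up placeholders.

```python
def cirrus_query(incategory=None, deepcat=None, linksto=None, statements=()):
    """Assemble a CirrusSearch query string from optional filters.

    `statements` is an iterable of (property_id, value) pairs that are
    rendered with the WikibaseCirrusSearch keyword `haswbstatement`.
    """
    parts = []
    if incategory:
        parts.append(f'incategory:"{incategory}"')
    if deepcat:
        parts.append(f'deepcat:"{deepcat}"')
    if linksto:
        parts.append(f'linksto:"{linksto}"')
    for prop, value in statements:
        parts.append(f"haswbstatement:{prop}={value}")
    return " ".join(parts)


def search_params(query, limit=10):
    """Parameters for the MediaWiki full-text search API (list=search)."""
    return {
        "action": "query",
        "list": "search",
        "srsearch": query,
        "srlimit": limit,
        "format": "json",
    }


# Hypothetical category and IDs, for illustration only.
query = cirrus_query(incategory="Lists in French", statements=[("P1", "Q2")])
params = search_params(query)
```

Filters by language, speaker, or gender would map onto `haswbstatement` pairs, while `incategory` and `deepcat` only become useful once lists or recordings are actually categorised.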
Plans for the next armageddon?
Are there any contingency plans implemented after the Big Fire? A regular backup for example? Poemat (talk) 22:49, 24 May 2021 (UTC)
- @Poemat good question, thanks for asking. There are obviously some plans. I'll let @Seb35, Nicolas NALLET, & Michael Barbereau WMFr complete and/or correct me but right now, there are daily backups on a server in another datacenter. Cheers, VIGNERON (talk) 12:47, 26 May 2021 (UTC)
Request for Mon language Code= mnw
Done
The Mon language is not available here, so I added Thai instead. I would like to have this problem resolved, thanks. message posted by User:咽頭べさ (talk)
- Hello again @咽頭べさ thank you for pointing out that Mon language was missing on Lingua Libre! I added it, you should from now on be able to record words in this language 🙂 Please read the message I posted on your talk page before recording new words.
- All the best, — WikiLucas (🖋️) 16:40, 27 May 2021 (UTC)
Celebrating the coming 500k milestone
Hello @DenisdeShawi, DSwissK, Eihel-LiLi, Julien Baley, KlaudiuMihaila, Lepticed7, Lyokoï, Olaf, Pamputt, Poemat, Poslovitch, Sabelöga, Theklan, Titodutta, Yug, & सुबोध कुलकर्णी
As you may have seen, we recorded 30,000 pronunciations during the current month (2nd most active month ever), the very first full calendar month since the rebirth of the website, after the datacenter fire that stalled the project for 6 weeks. If we keep a similar pace, we should reach in June the important milestone of 500,000 recordings made on Lingua Libre. That is incredible.
I wanted to ask you all, how do you want to celebrate this milestone? Feel free to suggest anything below, and let's try to celebrate it properly 🙂
All the best
— WikiLucas (🖋️) 14:33, 27 May 2021 (UTC)
- Hi there, I remember recording numbers up to 1399 in French (c:File:LL-Q150 (fra)-Poslovitch-1399.wav). I commit to getting that number up to 4242 once we reach that milestone! --Poslovitch (talk) 18:18, 27 May 2021 (UTC)
- Some kind of reward would be nice, like a star for the home-pedia user page. Or a physical sticker sent by post, similar to what Wikimedia does from time to time. Or an online event of sorts. KlaudiuMihaila (talk) 16:45, 29 May 2021 (UTC)
- We gather and make an apéro. Lepticed7 (talk) 16:54, 29 May 2021 (UTC)
- Maybe an online event is actually the simplest thing to do. What do you think about a live stream on Twitch with some guests talking about Lingua Libre, its history, how people did some very big recording sessions, how it helps describe languages, etc.? Lyokoï (talk) 10:22, 1 June 2021 (UTC)
- It's possible to have some budget for celebrating :) Xenophôn (talk) 08:54, 8 June 2021 (UTC)
Failed to upload on Wiki Commons
Hi, I am an editor from Central Bikol Wiktionary. I tried to record words and the recording went through, but the upload to Commons failed. I think it's the second time this has happened. It has only happened since Lingua Libre came back. My internet connection is stable so I guess there might be some internal problems. I hope not. Kunokuno (talk) 14:58, 28 May 2021 (UTC)
- Hello @Kunokuno I'm truly sorry that this problem occurred, thanks for warning us about it.
- Could you please tell us your current setup (device, browser, microphone)? How many words did you record? Could you try to reproduce the bug with 10 words, and then look at your browser's console (instructions here) to tell us the error message if there is one?
- Thank you in advance.
- All the best. — WikiLucas (🖋️) 16:21, 28 May 2021 (UTC)
- Hello everyone, sorry for the late response. My records are still not getting through to Commons. The recording was successful, but it cannot be uploaded to Commons. My device is an Intel Core i5 laptop, the browser is Google Chrome, and I'm using a headset with a built-in microphone. I have also tried recording on my phone but it gives the same error. I have tried taking a screenshot of the error message, if there is any. Please check here. Sorry, I am not very knowledgeable about code and programming languages. Kunokuno (talk) 13:53, 18 June 2021 (UTC)
500000!
Lili reached 500 000 recordings. Congratulations to everybody! Olaf (talk) 12:56, 15 June 2021 (UTC)
Lingua Libre video tutorial
Hi everyone! I made a short video tutorial for Lingua Libre, in French. If you like it, I could create one in English and we could include it in the {{Welcome}} template, to help newcomers.
Here is the video, please tell me your thoughts about it! also available here on YouTube
All the best — WikiLucas (🖋️) 10:04, 23 June 2021 (UTC)
- I really like it. It is not too long, very clear, etc. So I think it would be a good idea to create one in English. A few remarks:
- if you create a video in English, would it be possible to record it with the interface in English and provide the text as subtitles (Wikimedia Commons supports subtitles)? It would then be easy to translate the subtitles into several languages (the problem of the interface itself being in English remains).
- on Wikimedia Commons, I think you should state which music is used in the video and where it comes from, in order to be sure it is free-licence music
- Very nice job. Pamputt (talk) 18:56, 23 June 2021 (UTC)
- Thank you @Pamputt! I think I will indeed make a video with the interface in English, with no built-in subtitles as you suggested, and we will then be able to add TimedText subtitles on Commons. I think I'll also make a version with built-in subtitles (so basically the same video as here but with everything in English), in order to have a cleaner English version to post and share on YouTube.
- EDIT: I added English subtitles on the French video, to test the functionality, it seems to work well!
- Thank you for your remark about the music, I added the information on the file's description.
- See you! 🙂 — WikiLucas (🖋️) 10:08, 24 June 2021 (UTC)
Auto-inserting recorded words to Wiktionary
Done (discussion in progress on the LL:Technical board) — WikiLucas (🖋️) 15:17, 26 July 2021 (UTC)
Hi, I am back after a long hiatus! :) I wanted to ask about auto-inserting recorded words to Wiktionary. Is it possible to automatically insert recorded files into the respective Wiktionary entries if I had imported those words from a specific Wiktionary category? For instance, I did a test batch today from "ଶ୍ରେଣୀ:ବାଲେଶ୍ୱରୀ_ଶବ୍ଦ" from the Odia Wiktionary. The uploaded words do appear on Commons but I need to manually add each recording. Is there a way to automate that?
My second question is something that I had asked long ago: is there a way to change (or choose from two options) the filename? For instance, I would like to use the Commons convention of "TWO_LETTER_ISO_CODE-WORDNAME.EXTENSION" format (e.g. "or-କଳା.wav"). If there is already a file that exists, then the new file can be "or-କଳା-01.wav". In that way, viewing the words in the Commons category would be easier, meaning "or-କଳା.wav" and "or-କଳା-01.wav" will appear close to each other. One can even check which of the recordings is better to use on Wikimedia projects. In the backend you can of course connect the files to your Wikibase by providing unique IDs to each recording.
Hugs of solidarity for your grave loss because of the fire! With everything going on with COVID last and this year, this was horrible! <3 --Subhashish (talk) 14:45, 24 June 2021 (UTC)
- Hi @Psubhashish I'm Lingua Libre Bot's operator. It cannot operate on Wiktionaries on which it has not received the bot flag. Feel free to file a request on LinguaLibre:Bot. I'm falling behind with the various currently pending requests since I've been the handyman of Lingua Libre on and off, but at some point I'll be able to tackle these ;) --Poslovitch (talk) 15:23, 24 June 2021 (UTC)
- Hello all! @Poslovitch done, please let me know if there is anything that I could do.
File name
- Hi Psubhashish regarding the second question about the filename, it has been decided to have only one record per word and per speaker. This means that if you record the same word again, the previous record will be replaced by the new one. Thus, it is possible to correct a bad/wrong pronunciation. Why would you like to record the same word twice? Pamputt (talk) 17:50, 24 June 2021 (UTC)
- @Psubhashish & Pamputt I think that Psubhashish is referring to the historical naming convention (NB there is no actual naming convention on Commons, it is merely some advice for naming files) of pronunciation files on Commons (see here), which had been unchanged since 2005 and was clearly insufficient. This page suggested just putting a 2-letter language code and the pronounced word in the filename, which was problematic as soon as another speaker pronounced the same word (that's why they suggested adding a number in that case). I changed this page recently, to advise users to display in the filename at least the language spoken (ISO 639-3 if possible), the word written in the language's writing system, an identifier for the user, and a place related to the speaker (the place where they learned the language and/or where they live). Lingua Libre's automatic naming already does that, except for the place of learning/residence (which is for the moment only available on the speaker's element, on Lingua Libre). @Psubhashish I don't understand why you would want to change your filenames for more reductive ones. The more precise the filename is, the better it is for knowing information about the speaker! And it is still very easy to search for a precise word: you just have to type the word+.wav in Commons, or the word itself directly in Lingua Libre's search bar.
- All the best — WikiLucas (🖋️) 18:48, 24 June 2021 (UTC)
- @Pamputt and @WikiLucas00 I didn't actually mean to create ambiguous filenames based on the older convention. I was worried for the multiple kinds of naming inside the category c:commons:Odia pronunciation. The way the files are organized there are or-NAME.extension (e.g. File:Or-ଅନ୍ୟ.wav). What I am proposing is slightly different than how you want to capture the information in the file. I am all for metadata being captured inside the description. In fact, I'd support to add a field to describe the ISO 639-2/639-3 three-letter-codes (e.g. Ori-nor-ଏଇଚି.wav). There is currently no link to the Lingua Libre QID and I'd propose to add that too.
- What I was proposing was not to reduce information collected but simplifying the filename. We're struggling at the moment to use a bot, find and search and insert a file from Commons into a Wiktionary entry. I'd love to hear from you all what the issue would be if the file descriptions template ({{Lingua Libre record}}) contains information such as language name, language ISO (including variation), language Glottocode (which linguists prefer because ISO is faulty. ref. requirements by language archives such as Living Tongues, ELR and Language Archive Cologne), and information about speaker's age range, gender and region (as dialects also vary from region to region, optional field as this is personal data).
- The filename, however, can be simpler, as using a bot to search for duplicates is hard now for the community because the QID and username are included in the filename. What if all that information, as I explained above, is included in the template and the file name is just the ISO 639-1 code (for standard spoken forms or macrolanguages) or the ISO 639-2 or 639-3 code (for dialects/variations)? As I had explained in my previous comment, nor-NAME.wav and nor-NAME-01.wav will appear close to each other because of alphabetical sorting. An average user without the knowledge of bots can even manually test the quality of recordings if they are using files on different Wikimedia projects. Can at least this be piloted for one language? --Subhashish (talk) 02:12, 25 June 2021 (UTC)
I have created a sub-section just to make the discussion clearer. I am completely lost. Currently, the files created on Lingua Libre are all named like File:LL-Q33810_(ori)-Psubhashish-ଫସ୍କା.wav, which means File:LL-QID (LANGUAGE_CODE)-(LOCUTOR NAME)-WORD.wav, with QID the identifier of the language on Wikidata, LANGUAGE CODE either two letters or three letters (ISO 639-3) if there is no 2-letter code for the language, LOCUTOR NAME the name of the person who recorded on Lingua Libre, and WORD the word that has been recorded. So could you give us an example pointing to a file that does not have a suitable name from your point of view? I think it will help me to get your point. Pamputt (talk) 06:13, 25 June 2021 (UTC)
- @Pamputt Watch out. The QID is not the recording's, it's the language's Wikidata QID ;) --Poslovitch (talk) 08:13, 25 June 2021 (UTC)
- Indeed :) Thank you, I correct in my previous message. Pamputt (talk) 08:34, 25 June 2021 (UTC)
- Hello all, what I meant to say is I understand that you have a convention for LL. But I personally do not want my username or the QID of the language or too many signs or even blank spaces. All of that is a problem when it comes to a few thousand recordings by multiple authors where the same word recorded by different people does not even appear close to each other in a sorted list. As I had written earlier, metadata can be better captured in a more formatted way inside Commons and you're capturing it even better inside the Wikibase of LL. The question is whether the file name should have all the metadata or can have only the most essential metadata. The username is irrelevant in a filename. If I click a picture of the Eiffel Tower or the Taj Mahal, my username appearing in the filename can only indicate a copyright owner pride. :D QID is a Wikimedian's paradise. It makes no sense to a common user. Entries on Commons are not just for use by Wikimedians but for the larger public. An ISO code (or a Glottolog ID) does this job (though one can argue that not all the people understand ISO codes). The three letter ISO code would address the language-dialect-variation in most cases. The word itself in the preferred script is self explanatory. All the metadata can be included inside the page using the LL template. I do not understand the insistence on adding additional info (QID and username). Also, just curious what really is the issue with ISO-FILENAME.EXTN (ori-କ.wav) for the first occurrence and ISO-FILENAME.EXTN (ori-କ-1.wav) for the second occurrence and so on? --Subhashish (talk) 09:19, 25 June 2021 (UTC)
- Not all the languages we allow to record on LinguaLibre have an ISO code. That's why the QID is useful. --Poslovitch (talk) 09:33, 25 June 2021 (UTC)
- @Psubhashish, Poslovitch replied about the QID. About the username, the goal is to ensure that there is only one record per speaker. With such a name, if you record the same word twice, only the latest record will remain. It is very useful if you want to correct a wrong/bad pronunciation because the previous recording is automatically replaced by the new one. Thus, there is no need for the user to ask for a deletion of the previous file on Wikimedia Commons.
- That said, I do not see the benefit of shortening the filename. If you are looking for a given word, the search engine on Wikimedia Commons should find the recordings. If you are interested in mass import, then Lingua Libre Bot is probably the tool you are looking for. If you want to do it by yourself, there is already some Python code (other than LLBot) that does this job. See for example this code that is used on the French Wiktionary. Pamputt (talk) 11:09, 25 June 2021 (UTC)
@Pamputt thanks for sharing this. I share the same concern with you when it comes to ISO and had shared about Glottolog ID. Glottolog ID is something field researchers such as Gregory Anderson (Living Tongues) or organizations such as LAC and ELP use. But apart from Glottolog being used by field researchers, the classification is indeed really detailed. Does using QID solve any particular issue? I have yet to explore the LL Bot but have made a request. BTW can the LL Bot be used for inserting files that are already there in a Commons folder? You still didn't share why the ISO-FILENAME.EXTN and ISO-FILENAME-01.EXTN option is a bad one and why "LL-QID (ISO)-USERNAME-NAME.wav" is preferred over the former for the languages with ISO standards. Also, have you considered the need for the same word being recorded multiple times by someone who speaks in different accents or there is a need for different intonations/moods? A word might be written the same way in a particular writing system but there are often aforementioned needs. If a new recording overwrites an existing one, many might accidentally overwrite audio files that are needed. --Subhashish (talk) 14:34, 25 June 2021 (UTC)
- @Psubhashish I let Poslovitch answer concerning LLBot.
- Does using QID solve any particular issue?
- Using QID allow us to be able to record any language/dialect even those that would not be yet available in Glottolog. In addition, we are sure that the QID is stable and will not change in the future.
- You still didn't share why the ISO-FILENAME.EXTN and ISO-FILENAME-01.EXTN option is a bad one and why "LL-QID (ISO)-USERNAME-NAME.wav" is preferred over the former for the languages with ISO standards.
- This is what I tried to explain in the previous message. This is used to manage double recording and to correct bad pronunciation files easily. If we use "ISO-FILENAME.EXTN", it is not linked to a locutor and so it means several files can be created by the same locutor, and the "bad" files will be kept. A name such as "LL-QID (ISO)-USERNAME-NAME.wav" solves this problem (maybe "LL" is not needed but it is only two letters). In addition, how would you record words from dialects or languages that do not have ISO codes if we use something like "ISO-FILENAME.EXTN"?
- Also, have you considered the need for the same word being recorded multiple times by someone who speaks in different accents or there is a need for different intonations/moods? A word might be written the same way in a particular writing system but there are often aforementioned needs. If a new recording overwrites an existing one, many might accidentally overwrite audio files that are needed.
- These are really rare cases. If a user wants to record himself/herself with several accents, most of the recordings will probably not be "natural", which means the audio files will be of poor quality for reuse. That said, there is a way to manage words that are spelled the same but have different pronunciations. In such cases, it is possible to add a clarification in brackets about the word we want to record. For example in French, we have File:LL-Q150 (fra)-0x010C-fils (pluriel de fil).wav (fils (plural of fil)) and File:LL-Q150 (fra)-0x010C-fils (enfant).wav (fils (child)). With the brackets, we are sure about the user's intent. Pamputt (talk) 16:53, 25 June 2021 (UTC)
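For readers who want to see the naming scheme discussed above in concrete form, here is a minimal sketch in Python. It is not LinguaLibre's actual code: the function names `build_filename` and `parse_filename` and the regular expression are hypothetical, and the sketch assumes that the username contains no hyphen and that the ISO part is a three-letter code that may be absent for languages without one.

```python
import re

# Hypothetical pattern for names such as
# "LL-Q150 (fra)-0x010C-fils (enfant).wav".
# Assumptions: no hyphen in the username; optional three-letter ISO code.
FILENAME_RE = re.compile(
    r"^LL-(?P<qid>Q\d+)(?: \((?P<iso>[a-z]{3})\))?"
    r"-(?P<user>[^-]+)-(?P<word>.+)\.wav$"
)


def build_filename(qid, username, word, iso=None):
    """Assemble "LL-QID (ISO)-USERNAME-NAME.wav"; the ISO part is optional."""
    lang = f"{qid} ({iso})" if iso else qid
    return f"LL-{lang}-{username}-{word}.wav"


def parse_filename(name):
    """Split a file name into its parts, or return None if it does not match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


# Two homographs by the same speaker get distinct names thanks to the
# bracketed clarification, so neither recording overwrites the other.
child = build_filename("Q150", "0x010C", "fils (enfant)", iso="fra")
plural = build_filename("Q150", "0x010C", "fils (pluriel de fil)", iso="fra")
```

Because the username is part of the name, a second recording of the same word by the same speaker maps to the same file name, while recordings by other speakers never collide with it.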
LinguaLibreBot for the Chaoui Wiktionary
- To be later moved to LinguaLibre:Bot.
Hello, I would like to restart the discussion about allowing the LinguaLibre bot to add audio to shy.wiktionary.org. I am the only admin of this project. I can help you with the page algorithm, if you have no objection. Thanks in advance.--Reda Kerbouche (talk) 12:48, 2 July 2021 (UTC)
- Hi Reda Kerbouche: we are currently thinking about how best to support the different Wiktionaries. I will try to motivate myself to propose something during the summer. Pamputt (talk) 12:54, 2 July 2021 (UTC)
- Hello Reda! I am the bot's handler. The form at LL:Bot needs to be filled in so that we have the basic information. By the end of the summer, we should have something functional. --Poslovitch (talk) 14:02, 2 July 2021 (UTC)
Hi Poslovitch and Yug, Pamputt should normally have passed on all the necessary information to you. Pamputt has even requested bot status on the Chaoui Wiktionary, and I have just created the bot's user page, which you can edit as you like. If there is any news, keep me posted, especially regarding the bot test. Best regards--Reda Kerbouche (talk) 14:45, 6 August 2021 (UTC)
Extracting Words
Hello, over the last few days I have watched the videos about the Annual Plan 2021/2022, and afterwards I wondered whether it would be helpful to extract the words spoken in the videos and add them as files to Wikimedia Commons. From my point of view it would be useful: it would be a chance to get a lot of words in a short time. What do you think about that? Would it be helpful or not?--Hogü-456 (talk) 19:04, 27 July 2021 (UTC)
- Hogü-456 welcome. Can you share a link to the webpage or material you have in mind, so we can better see what you are talking about?
- If you want to extract unique words from text, we already have some scripts to extract lists from raw text or to extract unique words from wikipages. Yug (talk) 14:18, 28 July 2021 (UTC)
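To illustrate what extracting unique words from raw text can look like, here is a minimal sketch in Python. It is not one of the existing scripts mentioned above: the function name `unique_words` and the tokenisation rule (runs of letters, optionally joined by an apostrophe or hyphen, lowercased) are assumptions chosen for illustration only.

```python
import re

# Assumed tokenisation rule: a word is a run of Unicode letters, optionally
# joined by apostrophes or hyphens (e.g. "arc-en-ciel"). Digits are skipped.
WORD_RE = re.compile(r"[^\W\d_]+(?:['’-][^\W\d_]+)*")


def unique_words(text):
    """Return the distinct lowercased words of `text`, in first-seen order."""
    seen = set()
    words = []
    for word in WORD_RE.findall(text.lower()):
        if word not in seen:
            seen.add(word)
            words.append(word)
    return words


sample = "The plan, the Annual Plan 2021/2022!"
word_list = unique_words(sample)  # ['the', 'plan', 'annual']
```

A list produced this way would still need manual review before recording, since a transcript can contain names, abbreviations, or words from other languages.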
Mass deletion
Hi, can you delete all the recordings I made from 12:55 on 5 June 2021? I recorded them in the wrong language. See here. Thanks! Berrely (talk) 10:42, 2 August 2021 (UTC)
- Hi Berrely, for now I have listed the recordings here. I am currently developing a bot that will fix such issues in bulk. I hope it will be finished before the end of August. Could you confirm that the "correct" language is Farsi (instead of English)? Pamputt (talk) 11:10, 2 August 2021 (UTC)
- Pamputt yes the correct language is Farsi, not English. I already deleted the files from Commons, and intend to rerecord them. Berrely (talk) 13:21, 14 August 2021 (UTC)