I AM ON A BREAK to recalibrate, focus on other life priorities, and regain the energy to come back to this beautiful project soon.

I am a Wikimedian, documentary filmmaker and National Geographic Explorer. I am interested in studying access, decolonization of knowledges and the free-culture movement. I have been active in language documentation with a focus on endangered languages and the use of multimedia as a democratic tool. I have also been an organizational leader and have served in both professional and volunteer advisory roles at the Internet Society, the Wikimedia Foundation, Mozilla, the Centre for Internet and Society, Creative Commons, the Digital Language Diversity Project (DLDP), Wikitongues and the now-defunct ScholarlyHub.

I am very interested in publicly owned and publicly governed multimedia archives, and I put some volunteer time into them. I have contributed over 68,000 pronunciation recordings on Lingua Libre and over 4,000 sentence recordings on Mozilla Common Voice. My primary contribution to Lingua Libre is in the Central (Mugalbandi) and Baleswari dialects of the Odia language.

LinguaLibre/other pronunciation-related publications

Things I have made/broken

Personal lists

Potential bugs or required features

Kind (issue/new feature request) | Summary | Context/Steps to reproduce | Response
Suspected issue | Words already uploaded using LL do not get removed while creating a new list
  1. Included the word "ଉଦ୍ଦେଶ୍ୟରେ" in a new batch and selected "Remove words already recorded" while loading words from a local list.
  2. Even though the word already exists in two places on Commons (first, second; both uploaded using LL), it does not appear as a duplicate in the LL Record Wizard.

Hi @Psubhashish could you please try to reproduce this issue with recordings that were not renamed? Just to be sure: the Record wizard can only remove words that the current speaker already recorded, for the moment it can't remove words recorded by other speakers (there is a ticket on phabricator asking for this feature). — WikiLucas (🖋️) 12:13, 18 August 2021 (UTC)

Feature | LL helps remove words that have already been recorded, but there is no way to download the list of those words. Such a list would help a lot in creating a new list locally.
Could you develop a little bit your idea please? You would like to export a textual file containing the words that you already recorded? Or you would like to download the sound files you uploaded? — WikiLucas (🖋️) 12:15, 18 August 2021 (UTC)
@WikiLucas00 ha ha, you caught me off guard! I was trying to make a rough list, as I am discovering new things here first before fleshing out suggestions for improvement. By listing, I mean a text file containing the words recorded, not the audio files. I guess one can download audio files from Commons in bulk too. But that's another question, and do share if there is a way that you might know. --Subhashish (talk) 09:32, 21 August 2021 (UTC)
@Psubhashish Using Petscan and your Lingua Libre category on Commons, you can export the text list of all your recorded files. Here is the query. You can change the output to plain text, wikicode, json etc if you want to (in the Output tab). I hope this fits to your needs. All the best — WikiLucas (🖋️) 15:17, 21 August 2021 (UTC)
Feature | Number counter while reviewing recorded audio | While reviewing recorded audio, it is not possible to see the counter at the bottom change. For instance, if I am reviewing recording number 10 out of a total of 300 recorded sounds, I cannot see the position of that particular sound in the counter.
Issue | The RecordWizard field "Spoken languages" is confusing. | Should one add all the languages/dialects one knows, or only the one to be spoken in the next step for a particular batch? As a multilingual speaker (which is the case for most people in South Asia), I'd prefer that the form ask for the specific dialect/language I am going to speak in a batch. I might speak six languages, but they are not all relevant for each word in a particular batch.
Issue | "Place of residence" is meaningless without the "place of language learning". | One might have learned a language in one place but be living in another. The place of residence might or might not have an impact on the language one speaks. However, where one learned the language is very important (in most cases).
Feature | Need an option to record offline and upload/sync when connected to the internet
  • I am planning a workshop to record the pronunciation of words in an indigenous language in a remote place. This would mean traveling to places with probably no internet connectivity, recording there offline, and uploading to LL later when connected to the internet.
  • It might be possible to run a MediaWiki + Wikibase environment locally by forking LL. There are two challenges:
a. I don't know yet how to set up such an environment locally.
b. I don't know how to enable the local wiki to talk to LL once it is connected to the internet.
Potential feature | How to record words in a language with no writing system/script?
  • When a language is only oral and has no formal writing system/script of its own, linguists often use the International Phonetic Alphabet (IPA) to "write" the pronunciations. Will IPA-based word listing work on LL?
  • Another possibility in such a case is to rely on the speaker's familiarity with a neighboring dominant script. This can be problematic on many levels (for starters, colonization by users of dominant scripts) but can be a temporary fix just for the field recording. If such recordings are made and uploaded, how can they be converted into IPA later so that the file names do not show the dominant script?
Feature | Parsing words from any public web page | Legally and technically, words per se are not copyrighted. Hence, parsing a page and creating a list of its words is a great way to prepare recordings of words from different topics. Wikipedia categories and Wiktionary entries are not always diverse: their scope is limited to the personal interests of active Wikimedians, and a good amount of content does not make its way to these projects because of citation issues. Not everything that is public is citable, but such pages might still contain many words on a particular topic and hence are of interest to LL.
Bug | All words under a dialect (e.g. Baleswari-Odia) should be listed under the language (e.g. Odia) in Statistics | A language being a superset of its dialects, all words recorded under a dialect should be counted under the language as well. Right now each dialect has its own category on the Statistics page, which is great. But these words are not counted in the total number of recordings under the respective language name.
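The Petscan export discussed above yields file titles rather than bare words. Here is a minimal sketch of turning such titles into a word list and filtering a local list against it. It assumes titles have the shape `LL-Q<digits> (<code>)-<speaker>-<word>.<ext>`; that shape, the function names `word_from_title` and `remove_recorded`, and the sample titles are assumptions for illustration, not part of LL or Petscan. Titles that do not fit the shape (for example, renamed files) are simply skipped.

```python
import re
import unicodedata


def word_from_title(title, speaker):
    """Return the recorded word from a file title, or None if it does not match."""
    # Assumed title shape (hypothetical, for illustration):
    #   "LL-Q<digits> (<code>)-<speaker>-<word>.<ext>", optionally prefixed "File:".
    speaker = speaker.replace("_", " ")
    pattern = re.compile(
        r"^(?:File:)?LL-Q\d+ \([^)]*\)-"
        + re.escape(speaker)
        + r"-(?P<word>.+)\.\w+$"
    )
    # Exports may use underscores where the wiki shows spaces.
    match = pattern.match(title.strip().replace("_", " "))
    if match is None:
        return None
    # Normalize so visually identical spellings compare equal.
    return unicodedata.normalize("NFC", match.group("word"))


def remove_recorded(candidates, titles, speaker):
    """Drop candidate words that already appear in the speaker's file titles."""
    recorded = set()
    for title in titles:
        word = word_from_title(title, speaker)
        if word is not None:
            recorded.add(word)
    return [
        word
        for word in candidates
        if unicodedata.normalize("NFC", word.strip()) not in recorded
    ]


if __name__ == "__main__":
    titles = [
        "File:LL-Q150 (fra)-ExampleSpeaker-bonjour.wav",  # illustrative title
        "Some renamed file.wav",  # does not fit the assumed shape, so skipped
    ]
    print(remove_recorded(["bonjour", "merci"], titles, "ExampleSpeaker"))
```

Knowing the speaker name up front avoids guessing where the speaker ends and the word begins when either contains a hyphen.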
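For the request above about parsing words from a public web page, a rough sketch using only the Python standard library might look like the following. The names `tokenize` and `words_from_html` are hypothetical, and the page is passed in as a string so that fetching stays out of scope. Letters and combining marks are kept together, because a plain `\w+` pattern would split Odia words at vowel signs and the virama.

```python
import unicodedata
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def tokenize(text):
    """Split text into runs of letters, combining marks and joiner characters."""
    words, current = [], []
    for ch in text:
        # Categories L* are letters, M* are combining marks (vowel signs, virama).
        if unicodedata.category(ch)[0] in "LM" or ch in "\u200c\u200d":
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


def words_from_html(html):
    """Return the unique words of an HTML page, in order of first appearance."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    seen, unique = set(), []
    for word in tokenize(" ".join(parser.parts)):
        word = unicodedata.normalize("NFC", word)
        if word not in seen:
            seen.add(word)
            unique.append(word)
    return unique


if __name__ == "__main__":
    page = "<p>Hello, world! Hello again.</p><script>var x = 1;</script>"
    print(words_from_html(page))
```

This tokenizer treats hyphens, apostrophes and digits as separators, so hyphenated words are split; a list produced this way would still need a manual review before recording.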
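The roll-up asked for in the Statistics bug above can be sketched as follows, assuming the per-label counts and a dialect-to-language mapping are available as plain dictionaries. The function name `rollup`, the mapping, and all numbers are made up for illustration and do not reflect how LL computes its statistics.

```python
from collections import Counter


def rollup(counts, parent_of):
    """Add each label's count to itself and to every ancestor label."""
    totals = Counter()
    for label, n in counts.items():
        totals[label] += n
        seen = {label}  # guards against accidental cycles in parent_of
        parent = parent_of.get(label)
        while parent is not None and parent not in seen:
            totals[parent] += n
            seen.add(parent)
            parent = parent_of.get(parent)
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical counts: recordings filed directly under each label.
    counts = {"Odia": 100, "Baleswari-Odia": 40, "Central Odia": 60}
    # Hypothetical mapping from dialect label to its parent language label.
    parent_of = {"Baleswari-Odia": "Odia", "Central Odia": "Odia"}
    print(rollup(counts, parent_of))
```

Each dialect keeps its own total, so the existing per-dialect categories would remain intact while the language total also includes them.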