User
Difference between revisions of "Psubhashish"
Psubhashish (talk | contribs) m (→Personal lists) |
Psubhashish (talk | contribs) m (I AM ON A BREAK) |
||
(9 intermediate revisions by 2 users not shown) | |||
Line 7: | Line 7: | ||
{{Speaker of the month|01/2022|7259}} | {{Speaker of the month|01/2022|7259}} | ||
{{Speaker of the month|08/2022|5003}} | {{Speaker of the month|08/2022|5003}} | ||
+ | {{Speaker of the month|03/2023|3346}} | ||
+ | {{Speaker of the month|06/2023|3974}} | ||
{{Userboxbottom}} | {{Userboxbottom}} | ||
{{#babel:records-ori}} | {{#babel:records-ori}} | ||
+ | |||
+ | :: ''I AM ON A BREAK to recalibrate, focus on other life priorities, regain energy to come back to this beautiful project soon.'' | ||
I am a Wikimedian, documentary filmmaker and [https://www.nationalgeographic.org/find-explorers/subhashish-panigrahi National Geographic Explorer]. I am interested in studying access, decolonization of knowledges and the free-culture movement. I have been active in language documentation with a focus on endangered languages and the use of multimedia as a democratic tool. I also have been an organizational leader and have served both in professional and volunteer-advisory roles at the Internet Society, Wikimedia Foundation, Mozilla, Centre for Internet Society, Creative Commons, Digital Language Diversity Project (DLDP), Wikitongues and now defunct ScholarlyHub. | I am a Wikimedian, documentary filmmaker and [https://www.nationalgeographic.org/find-explorers/subhashish-panigrahi National Geographic Explorer]. I am interested in studying access, decolonization of knowledges and the free-culture movement. I have been active in language documentation with a focus on endangered languages and the use of multimedia as a democratic tool. I also have been an organizational leader and have served both in professional and volunteer-advisory roles at the Internet Society, Wikimedia Foundation, Mozilla, Centre for Internet Society, Creative Commons, Digital Language Diversity Project (DLDP), Wikitongues and now defunct ScholarlyHub. | ||
− | I am very interested in publicly-owned and public-governed multimedia archives, and I put some volunteer time into action. I have contributed over [https://lingualibre.org/wiki/LinguaLibre:Stats/Speakers | + | I am very interested in publicly-owned and public-governed multimedia archives, and I put some volunteer time into action. I have contributed over [https://lingualibre.org/wiki/LinguaLibre:Stats/Speakers 68,000 pronunciation recordings On Lingua Libre] and over 4,000 sentence recordings on Mozilla Common Voice. My primary contribution to Lingua Libre is in the [[:w:Odia_language#Standardization_and_dialects|Central]] (''Mugalbandi'') and [[:w:Baleswari Odia|Baleswari]] dialects of the Odia language. |
== LinguaLibre/other pronunciation-related publications == | == LinguaLibre/other pronunciation-related publications == | ||
Line 23: | Line 27: | ||
== Personal lists == | == Personal lists == | ||
+ | * [https://w.wiki/77Hs All lexeme forms in Odia missing a pronunciation] (inspired by Adithya K's [https://w.wiki/77Hu query]) | ||
* [[List:Ory/All standard Odia|All standard Odia]] ([https://lingualibre.org/index.php?search=&search=List%3AOry all lists]) | * [[List:Ory/All standard Odia|All standard Odia]] ([https://lingualibre.org/index.php?search=&search=List%3AOry all lists]) | ||
* [[List:Ory/Baleswaria]] ([[:w:Baleswari Odia|Baleswaria]] dialect of Odia; ongoing, total #words: 1046 by June 1, 2020; [[List:Ory/Baleswaria/recording_complete|words with recording completed]]) | * [[List:Ory/Baleswaria]] ([[:w:Baleswari Odia|Baleswaria]] dialect of Odia; ongoing, total #words: 1046 by June 1, 2020; [[List:Ory/Baleswaria/recording_complete|words with recording completed]]) | ||
* TBD: [[:or:wikt:ଶ୍ରେଣୀ:ବାଲେଶ୍ୱରୀ ଶବ୍ଦ|Baleswari words from Ordia Purnachandra Bhashakosha]] | * TBD: [[:or:wikt:ଶ୍ରେଣୀ:ବାଲେଶ୍ୱରୀ ଶବ୍ଦ|Baleswari words from Ordia Purnachandra Bhashakosha]] | ||
+ | * [[List:Ori/Places of Odisha|List of Places in Odisha]] (villages, Towns, Administrative blocks, etc.)। Words collected from: | ||
+ | ** [https://kalahandi.nic.in/od/%e0%ac%97%e0%ad%8d%e0%ac%b0%e0%ac%be%e0%ac%ae-%e0%ac%93-%e0%ac%aa%e0%ac%9e%e0%ad%8d%e0%ac%9a%e0%ac%be%e0%ad%9f%e0%ac%a4/ Kalahandi district official site] | ||
+ | ** [https://malkangiri.nic.in/od/ Malkangiri] (village names missing) | ||
+ | ** [https://koraput.nic.in/od/ Koraput] (village names exist) | ||
+ | ** [https://bhadrak.nic.in/od/ Bhadrak] (village names exist) | ||
+ | ** [https://rayagada.nic.in/od/ Rayagada] (village names missing) | ||
− | == | + | == Potential bugs or required features == |
{| class="wikitable" | {| class="wikitable" | ||
|- | |- | ||
Line 82: | Line 93: | ||
|| Parsing words from any public web page | || Parsing words from any public web page | ||
|| Legally and technically, words per se are not copyrighted. Hence, parsing and creating a list of words is a great way to make way for recording words from different topics. Wikipedia categories or Wiktionary entries are not always diverse, considering their diversity scope is limited to the personal interest of active Wikimedians and/or a good amount of content don't make their way to these projects because of citation issues (not everything that is public is citable -- they might have many words in a particular topic though and hence are of interest to LL). | || Legally and technically, words per se are not copyrighted. Hence, parsing and creating a list of words is a great way to make way for recording words from different topics. Wikipedia categories or Wiktionary entries are not always diverse, considering their diversity scope is limited to the personal interest of active Wikimedians and/or a good amount of content don't make their way to these projects because of citation issues (not everything that is public is citable -- they might have many words in a particular topic though and hence are of interest to LL). | ||
+ | || | ||
+ | |- | ||
+ | || Bug | ||
+ | || All words under a dialect (e.g. Baleswari-Odia) should be listed under the language (e.g. Odia) in Statistics | ||
+ | || A language being a superset of a dialect, all words recorded under a dialect should be listed under a language as well. Right now each dialect has its own category in the Statistics page which is great. But these words do not appear in the total number of recordings its respective language name. | ||
|| | || | ||
|} | |} |
Latest revision as of 18:29, 19 November 2023
- I AM ON A BREAK to recalibrate, focus on other life priorities, and regain energy to come back to this beautiful project soon.
I am a Wikimedian, documentary filmmaker and National Geographic Explorer. I am interested in studying access, decolonization of knowledges and the free-culture movement. I have been active in language documentation with a focus on endangered languages and the use of multimedia as a democratic tool. I have also been an organizational leader and have served in both professional and volunteer-advisory roles at the Internet Society, Wikimedia Foundation, Mozilla, Centre for Internet and Society, Creative Commons, Digital Language Diversity Project (DLDP), Wikitongues and the now-defunct ScholarlyHub.
I am very interested in publicly owned and publicly governed multimedia archives, and I put some of my volunteer time into them. I have contributed over 68,000 pronunciation recordings on Lingua Libre and over 4,000 sentence recordings on Mozilla Common Voice. My primary contribution to Lingua Libre is in the Central (Mugalbandi) and Baleswari dialects of the Odia language.
LinguaLibre/other pronunciation-related publications
- Subhashish Panigrahi (2022), Building a Public Domain Voice Database for Odia, Companion Proceedings of the Web Conference 2022, Virtual Event, Lyon, France, pp. 1331–1338, DOI: 10.1145/3487553.3524931, ISBN: 978-1-4503-9130-6
- Subhashish Panigrahi (2022), Building a 50,000 pronunciation data repository in the Odia language, Diff
Things I have made/broken
- Prepare words for Lingua Libre: a tool to copy text from any source and clean up to create a list of words ready to be used in RecordWizard of Lingua Libre.
- Kathabhidhana: an open-source toolkit to record a large number of words in any language (inspired by another open project by T. Shrinivasan) (see tweet thread, coverage on Rising Voices, blog, selected talk at Wikimania 2017 and coverage on French Wikipedia newsletter RAW)
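The clean-up idea behind a tool like Prepare words for Lingua Libre can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the tool's actual code: the names `strip_noise` and `prepare_word_list` and the `exclude` parameter are hypothetical. Tokens are split on whitespace and only edge punctuation, symbols and digits are trimmed, because a plain `\w+` regular expression in Python can break Odia words at combining vowel signs.

```python
import unicodedata


def _is_noise(ch: str) -> bool:
    # Unicode general categories starting with P (punctuation),
    # S (symbol) or N (number) are treated as noise at word edges.
    return unicodedata.category(ch)[0] in "PSN"


def strip_noise(token: str) -> str:
    """Trim noise characters from both ends of a token, keeping the inside intact."""
    start, end = 0, len(token)
    while start < end and _is_noise(token[start]):
        start += 1
    while end > start and _is_noise(token[end - 1]):
        end -= 1
    return token[start:end]


def prepare_word_list(text: str, exclude=frozenset()) -> list:
    """Return unique words in first-seen order.

    `exclude` is a hypothetical hook for words that are already recorded;
    its entries are assumed to be NFC-normalized like the input text.
    """
    seen = set(exclude)
    words = []
    for token in unicodedata.normalize("NFC", text).split():
        word = strip_noise(token)
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


if __name__ == "__main__":
    # One word per line, ready to paste into a list page.
    print("\n".join(prepare_word_list("Hello, world! Hello (again) 123")))
```

Keeping the first-seen order makes the resulting list easier to compare against the source text, and passing previously recorded words through `exclude` gives a local way to skip them.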
Personal lists
- All lexeme forms in Odia missing a pronunciation (inspired by Adithya K's query)
- All standard Odia (all lists)
- List:Ory/Baleswaria (Baleswaria dialect of Odia; ongoing, total #words: 1046 by June 1, 2020; words with recording completed)
- TBD: Baleswari words from Ordia Purnachandra Bhashakosha
- List of Places in Odisha (villages, towns, administrative blocks, etc.). Words collected from:
  - Kalahandi district official site
  - Malkangiri (village names missing)
  - Koraput (village names exist)
  - Bhadrak (village names exist)
  - Rayagada (village names missing)
Potential bugs or required features
Kind (issue/new feature request) | Summary | Context/Steps to reproduce | Response |
---|---|---|---|
Suspected issue | Words already uploaded using LL do not get removed while creating a new list | | Hi @Psubhashish could you please try to reproduce this issue with recordings that were not renamed? Just to be sure: the Record wizard can only remove words that the current speaker already recorded, for the moment it can't remove words recorded by other speakers (there is a ticket on phabricator asking for this feature). -- WikiLucas (🖋️) 12:13, 18 August 2021 (UTC) |
Feature | LL helps remove words recorded already. But there is no way to download that word. This would help a lot in creating a list locally. | | |
Feature | Number counter while reviewing recorded audio | While reviewing recorded audio it is not possible to see the change in the counter at the bottom. For instance, I am reviewing the recorded audio number 10 and the total number of recorded sounds is 300. I cannot see the exact number of a particular sound in the counter. | |
Issue | RecordWizard field "Spoken languages" is confusing. | Should one add all the languages/dialects they know or the one they are going to speak in the next step in a particular batch? If I am a speaker who is multilingual (which is the case for most people in South Asia), I'd prefer that the form asks me the specific dialect/language I am going to speak in a batch. I might speak six languages but they are not relevant for each word in a particular batch. | |
Issue | "Place of residence" is meaningless without the "place of language learning". | One might have learned a language in one place but might be living in another. The latter might or might not have an impact on the language that they speak. However, where they learned the language is very important (in most cases). | |
Feature | Need an option to record offline and upload/sync when connected to the internet | | |
Potential feature | How to record words in a language with no writing system/script? | | |
Feature | Parsing words from any public web page | Legally and technically, words per se are not copyrighted. Hence, parsing a page and creating a list of words is a great way to enable recording words from different topics. Wikipedia categories or Wiktionary entries are not always diverse: their scope is limited to the personal interests of active Wikimedians, and a good amount of content does not make its way to these projects because of citation issues (not everything that is public is citable, though such sources might have many words on a particular topic and hence are of interest to LL). | |
Bug | All words under a dialect (e.g. Baleswari-Odia) should be listed under the language (e.g. Odia) in Statistics | A language being a superset of a dialect, all words recorded under a dialect should be listed under the language as well. Right now each dialect has its own category in the Statistics page, which is great. But these words do not appear in the total number of recordings under the respective language name. | |
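
The roll-up requested in the last row (dialect recordings also counting toward the parent language) could look something like the sketch below. It is only an illustration of the idea, not how Lingua Libre computes its statistics: the function `language_totals`, the `DIALECT_PARENT` mapping and every count shown are made up for the example.

```python
from collections import Counter

# Hypothetical mapping from a dialect label to its parent language label.
DIALECT_PARENT = {"Baleswari Odia": "Odia"}


def language_totals(counts: dict, parents: dict) -> dict:
    """Sum recording counts per language, folding each dialect into its parent."""
    totals = Counter()
    for label, n in counts.items():
        # Labels without a parent entry are treated as languages themselves.
        totals[parents.get(label, label)] += n
    return dict(totals)


if __name__ == "__main__":
    # Made-up numbers, used only to show the shape of the input.
    per_label = {"Odia": 100, "Baleswari Odia": 25, "French": 7}
    print(language_totals(per_label, DIALECT_PARENT))
```

The per-dialect categories can stay as they are; the roll-up only adds a second, aggregated view on top of them.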