User

Titodutta/Bengali Lexeme workflow

Revision as of 22:49, 8 November 2020 by Titodutta (Update)

This page briefly documents the Bengali Lexeme workflow used on LinguaLibre. It explains:

  • the teamwork and the process
  • pros and cons
  • upcoming plans

As of 1 November, we have uploaded more than 35,000 words on LinguaLibre. A large share of these uploads (around 16,000) support Wikidata lexicographical data in the Bengali language. One word often has several forms, such as:

  • Go → went, gone, going etc. (verb)
  • Good → Better, Best etc. (adjective)

Languages differ in their word forms, both in type and in number. An English verb mostly has 5 forms. A Bengali verb may have around 98 forms.

A few Bengali community members are now working on Wikidata to improve its lexicographical data. So, if you see a bunch of words from the same root being uploaded, it is to stay in sync with the Bengali project on Wikidata.

Teamwork: Procedure

  • 2–3 editors are working on creating Bengali Lexemes on Wikidata. User:Bodhisattwa, a Bengali Wikisource admin, is quite active in this.
  • User:Titodutta, as of 1 November 2020, mostly uploads the word pronunciation files. We have a query that lists all Lexeme words without audio, and we use it to track which words remain. I generally download the result in CSV format, upload it to Google Sheets, and make a local list on Lingua Libre. A Lingua Libre local list typically looks like this. I initially created 6–7 such lists with 1,000 words each. However, I now avoid creating new pages and reuse the same page by overwriting its words.
  • Once the words are uploaded and we have around 1,000 new words, User:Mahir256, a Wikidata admin, uses a script and, with the help of the QuickStatements tool, adds the words to Wikidata. Note: as of now, we do not use the LinguaLibre bot for Bengali lexemes on Wikidata.
  • If there is confusion about grammar, spelling, or related issues, User:Hrishikes is often approached for help or suggestions.
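
The CSV-to-local-list step above can be sketched roughly as follows. This is only an illustration, not the script the team actually uses: the column name `form`, the helper names, and the one-word-per-line output are all assumptions, and the exact markup a Lingua Libre list page expects is not covered here.

```python
import csv
import io


def read_forms(csv_text, column="form"):
    """Return unique, non-empty words from one CSV column, in first-seen order.

    The column name "form" is a hypothetical header for the query export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    words = []
    for row in reader:
        word = (row.get(column) or "").strip()
        # Skip blanks and words already collected from an earlier row.
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def chunk(words, size=1000):
    """Split the word list into consecutive batches of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def to_local_list(words):
    """Join a batch as plain text, one word per line (assumed layout)."""
    return "\n".join(words)
```

Each batch returned by `chunk` could then be pasted over the contents of the same local list page, which matches the habit of reusing one page rather than creating a new one for every 1,000 words.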

To coordinate, we use a chat group on Facebook, Telegram, and sometimes phone calls.

So, all this teamwork may not be visible when you only see the words, but it is good to be aware of the work on the other side.

Calendar

  • September–October were a bit slow, as the group was focusing on other areas of work. I was mostly uploading non-lexeme pronunciations.
  • In November 2020 you'll see an increase in lexeme-related file uploads, as the group is actively creating Bengali lexemes this month and we need matching Bengali pronunciations.

Future plans

  • As you can see, as of 1 November around 40% of my total uploads were for Bengali lexemes; the others were mostly Wikipedia article titles or words from a dictionary. I use all the options on LinguaLibre to generate words, including the "nearby" option. I have also tried the PagePile and PetScan tools to generate word lists (and used them as local lists). In future I may write separate stories about that work. However, I am mostly interested in working on the Lexeme project on LinguaLibre.
  • Once I have done some satisfactory work with Bengali on LinguaLibre, God willing, I wish to move on to "Indian English" or English (In) and record another series of words for Indian English. As of 1 November 2020, English on Lingua Libre is listed only as "English"; I have not seen any other variant or dialect.