Titodutta/Bengali Lexeme workflow


This page is a simple documentation of the Bengali Lexeme workflow used on LinguaLibre. It explains:

  • the teamwork and the process
  • pros and cons
  • upcoming plans

As of 1 November 2020, we have uploaded more than 35,000 words on LinguaLibre. A large share of these uploads (around 16,000) support Wikidata lexicographical data in the Bengali language. One word often has several forms, such as:

  • Go → went, gone, going etc. (verb)
  • Good → Better, Best etc. (adjective)

Languages differ in the forms a word takes, both in type and in number. An English verb mostly has 5 forms. A Bengali verb may have around 98 forms.

A few Bengali community members are now working on Wikidata to improve lexicographical data. So, if you see a bunch of words with the same root being uploaded, it is to sync with the Bengali project on Wikidata.

Teamwork: Procedure

  • 2–3 editors are working on creating Bengali Lexemes on Wikidata. User:Bodhisattwa, a Bengali Wikisource admin, is particularly active in this.
  • User:Titodutta, as of 1 November 2020, mostly uploads the word pronunciation files. We have a query (and an alternative query in 2022) that lists all Lexeme words without audio, and we use it to track the words that still need recordings. I generally download the results in CSV format, upload them to Google Sheets, and make a local list on Lingua Libre. A LL local list typically looks like this. I initially created 6–7 such lists with 1,000 words each. However, I now avoid creating a new page and reuse the same page by overwriting the words.
  • Once the words are uploaded and we have around 1,000 new words, User:Mahir256, a Wikidata admin, uses a script and, with the help of the QuickStatements tool, adds the words on Wikidata. Note: we do not use the LinguaLibre bot on Wikidata Bengali lexemes as of now.
  • If there is confusion about grammar, spelling or other related issues, User:Hrishikes is often approached for help or suggestions.
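
The CSV-to-local-list step described above can be sketched in a few lines of Python. This is only an illustration, not the script the team actually uses: the column name `lemma`, the `# ` line prefix, and the helper names are all assumptions, so adjust them to match the real query export and the list format Lingua Libre expects.

```python
import csv
import io


def words_from_csv(csv_text, column="lemma"):
    """Return unique, non-empty words from one CSV column, in first-seen order.

    The column name "lemma" is an assumption; use the header of the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    words = []
    for row in reader:
        word = (row.get(column) or "").strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def batches(words, size=1000):
    """Split the word list into chunks of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def as_local_list(words, prefix="# "):
    """Render one word per line; the "# " prefix is an assumed list format."""
    return "\n".join(prefix + word for word in words)


# Tiny made-up sample standing in for a downloaded query result.
sample = "lexeme,lemma\nL1,করা\nL2,যাওয়া\nL3,করা\nL4, \n"
for batch in batches(words_from_csv(sample)):
    print(as_local_list(batch))
```

Removing duplicates before batching keeps each 1,000-word list free of repeated recordings, and each rendered batch can be pasted over the contents of the same local list page.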

Coordination happens through a Facebook chat group, Telegram, and sometimes phone calls.

So, all this teamwork may not be visible when you see only the words, but it is good to be aware of the work on the other side.


  • September–October were a bit slow, as the group was focusing on other areas of work. I was mostly uploading non-lexeme pronunciations.
  • In November 2020 you'll see an increase in lexeme-related file uploads, as the group is working actively this month to create Bengali lexemes and will need Bengali pronunciations accordingly.
  • February 2021: Work will be slower till 24 February, because of focus on other work.
  • Last week of February to end of March 2021: The pace of work should increase because of the planned dedicated activity around Bengali lexemes.
  • 13 May 2022: The next batch of Bangla Lexeme begins with this list generated by Mahir256

Future plans

  • As you can see, as of 1 November 2020 around 40% of my total uploads were for Bengali lexemes. Update: as of 1 February 2021, ~62% of the uploads were used on Lexeme projects; the others were mostly related to Wikipedia article titles or words from a dictionary. I use all the options on LinguaLibre to generate words, including the "nearby" option. I have also tried the PagePile and PetScan tools to generate word lists (and used them as local lists). In future, I can write separate stories on those works. However, I am mostly interested in working on the Lexeme project on LinguaLibre.
  • Once I can do some satisfactory work with Bengali on LinguaLibre, if God allows, I wish to move to "Indian English" or English (In) and record another series of words for Indian English. As of 1 November 2020, English on Lingua Libre is just "English"; I have not seen any other variant or dialect.

See also