Difference between revisions of "Titodutta/Bengali Lexeme workflow"

Revision as of 22:25, 8 November 2020

This page briefly documents the Bengali lexeme workflow used on Lingua Libre. It explains:

  • the teamwork and the process
  • pros and cons
  • upcoming plans

As of 1 November 2020, we have uploaded more than 35,000 words to Lingua Libre. A large number of these uploads support Wikidata lexicographical data in the Bengali language. One word often has several forms, such as:

  • Go → went, gone, going etc. (verb)
  • Good → Better, Best etc. (adjective)

Languages differ in both the types and the number of forms a word can take. An English verb usually has 5 forms. A Bengali verb may have around 98 forms.

A few Bengali community members are now working on Wikidata to improve its lexicographical data. So, if you see a bunch of words with the same root being uploaded, it is to stay in sync with the Bengali project on Wikidata.

Teamwork: Procedure

  • 2–3 editors are working on creating Bengali lexemes on Wikidata. User:Bodhisattwa, a Bengali Wikisource admin, is quite active in this.
  • User:Titodutta, as of 1 November 2020, mostly uploads the word pronunciation files. We have a query that lists all lexeme words without audio, and we use it to track the words still to be recorded. I generally download the results in CSV format, upload them to Google Sheets, and make a local list on Lingua Libre. A Lingua Libre (LL) local list typically looks like this. I initially created 6–7 such lists with 1,000 words each. However, I now avoid creating new pages and reuse the same page by overwriting the words.
  • Once the words are uploaded and we have around 1,000 new words, User:Mahir256, a Wikidata admin, uses a script and, with the help of the QuickStatements tool, adds them to Wikidata. Note: as of now, we do not use the Lingua Libre bot for Bengali lexemes on Wikidata.
  • If there is confusion about grammar, spelling, or related issues, User:Hrishikes is often approached for help or suggestions.
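
The list-preparation step above (CSV export, then batches of about 1,000 words) could be sketched roughly as follows. This is only an illustration, not the script the team actually uses: the column name `lemma`, the helper names `read_words` and `chunk`, and the sample rows are all assumptions, and the real query export may use different headers.

```python
import csv
import io


def read_words(csv_text, column="lemma"):
    """Return unique, non-empty words from one CSV column, in first-seen order.

    `column` is a hypothetical header name; adjust it to the real export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    words = []
    for row in reader:
        word = (row.get(column) or "").strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def chunk(words, size=1000):
    """Split the word list into consecutive batches of at most `size` words."""
    return [words[i:i + size] for i in range(0, len(words), size)]


# Made-up sample export: three rows, one duplicated word.
sample = "lexeme,lemma\nL1,করা\nL2,যাওয়া\nL3,করা\n"
batches = chunk(read_words(sample), size=2)
for number, batch in enumerate(batches, start=1):
    print(f"List {number}: {len(batch)} words")
```

Each batch can then be pasted into the local list page, one word per line, overwriting the previous batch.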

We coordinate through a chat group on Facebook, through Telegram, and sometimes by phone calls.

So, all this teamwork may not be visible when you see only the words, but it is good to note the work going on on the other side.

Calendar

  • In November 2020 you'll see an increase in lexeme-related file uploads. The group is actively creating Bengali lexemes this month, so we will need the matching Bengali pronunciations.

See also

  • Count of forms with IPA or audio by language (on Wikidata)