Bot
Lingua Libre Bot (aka LiLiBot or LLBot) is passionate about audio recordings, languages and Wikimedia projects. Every day, it adds Lingua Libre's latest audio recordings to the relevant pages on various Wikimedia projects.
Due to the large number of recordings added every day, LiLiBot needs to obtain bot status on each wiki it works on in order to contribute safely. As of today, the bot is allowed and able to contribute to four Wikimedia projects.
YOU can help LiLiBot pursue its mission! Follow the guidelines on this page to request LiLiBot on your wiki!
This page serves as a request page for Lingua Libre Bot on specific wikis.
Copy and adapt this template to your needs, then paste it in a new section at the bottom of this page:
== Bot request for the {language} Wiktionary ==
{{Bot steps}}
* '''Example pages (≥3):''' a few links to pages of your Wiktionary that are examples of the usual page structure.
* '''Target section:''' title of the section in which the recordings should be listed (on the French Wiktionary, this is {{S|prononciation}}).
* '''Local audio template(s) example(s):''' an example of how the audio recording template is used, e.g. {{deng|en|en-us-apple.ogg|Deng (DYA)}}
* '''Local audio template(s) explained:''' explain the various parameters of that template (especially if the documentation of your template is not available in English or in French)
** {Deng} means "audio" and takes the following parameters...
** <code>en</code> is the ISO 639-2 code of the audio.
** <code>en-us-apple.ogg</code> is the filename.
** <code>Deng (DYA)</code> means audio (deng) and USA (DYA), which is the local variant or accent.
* '''Edit summary text:''' the summary text you would like to be displayed on your wiki when Lingua Libre Bot adds an audio file.
* Request by: ~~~~
You can also read and publish useful information about bots in general on the current collaborative page.
Requests
Add your request below, following this template:
== Bot request for {language} Wiktionary ==
* '''Example pages (3):''' [[:ku:wikt:Apple]], [[:ku:wikt:Pomme]] - You can see the best audio integration there.
* '''Target section:''' the audio file should be added at the end of the <code>==Bilêvkirin==</code> section, which means ...
* '''Local audio template(s) example(s):''' {{deng|en|en-us-apple.ogg|Deng (DYA)}}
* '''Local audio template(s) explained:'''
** {Deng} means "audio" and takes the following parameters...
** <code>en</code> is the ISO 639-2 code of the audio.
** <code>en-us-apple.ogg</code> is the filename.
** <code>Deng (DYA)</code> means audio (deng) and USA (DYA), which is the local variant or accent.
* Request by: ~~~~
Bot request for {or} Wiktionary
- Example pages (3): or:wikt:ଓଡ଼ିଆ, or:wikt:ଓଡ଼ିଆ - You can see the best audio integration there.
- Target section: the audio file should be added at the end of the ==ଉଚ୍ଚାରଣ== section, which means "pronunciation".
- Local audio template(s) example(s): Template:ଅଡ଼ିଓ
- Local audio template(s) explained:
  - {ଅଡ଼ିଓ} means "audio" and takes the following parameters...
  - or is the ISO 639-2 code of the audio.
  - or-ଓଡ଼ିଆ.wav is the filename.
  - ଧ୍ୱନି (ମାନକ ଓଡ଼ିଆ) means audio (ଧ୍ୱନି) in "ମାନକ ଓଡ଼ିଆ" (standard spoken Odia), which is the standard spoken variant or accent.
- Request by: Subhashish (talk) 19:04, 24 June 2021 (UTC)
- Hi @Poslovitch is there anything that I need to share with you here? --Subhashish (talk) 01:33, 29 June 2021 (UTC)
- Hi @Psubhashish the request looks fine. I'm attending a hackathon this week; I probably won't get to go through all of the requests, but we will probably find a way to speed up the process. I'll keep you posted once the bot is ready to contribute on orwikt ;) --Poslovitch (talk) 13:15, 29 June 2021 (UTC)
- Hi @Poslovitch, all the best for the hackathon! :) This can wait after all. Quick question -- there are more than 6K words in the ory category on Commons which have the File:or-WORD.wav/ogg/flac format (some are ori-nor-NAME.wav and are categorized properly). 3.7K of these 6K+ files were recorded using LL but became disconnected from the LL Wikibase when they were renamed on Commons (meaning the LL Wikibase still has their former filenames). Can we use your bot to insert them into the respective pages on orwikt? --Subhashish (talk) 15:07, 29 June 2021 (UTC)
- If they got renamed, the bot won't be able to find them, sadly. I'm noting we need a way to detect if a file got renamed on Commons so that the Wikibase here gets updated, which in turn would allow the bot to find the recordings. --Poslovitch (talk) 16:07, 29 June 2021 (UTC)
- @Poslovitch , indeed, linking changed files would be very helpful. Please let me know once your bot starts to insert the recorded files to the Odia Wiktionary. --Subhashish (talk) 04:14, 5 July 2021 (UTC)
Bot request for ku.wiktionary
- Example pages (3): ku:wikt:beran, ku:wikt:başûr, ku:wikt:keskesor - You can see the best audio integration there.
- Target section: The audio file should be added at the end of the === Bilêvkirin === section, which means "Pronunciation". If there is no === Bilêvkirin === section on the page, please create one after the language section, that is == {{ziman|<lang code>}} ==. If there is no language section, the audio file should not be added. (See the sketch after this request.)
- Local audio template(s) example(s): {{deng|ku|LL-Q36368 (kur)-Mihemed Qers-keskesor.wav|Deng|dever=Qers}}
- Local audio template(s) explained:
  - {Deng} is the template name, which means "audio", and takes the following parameters...
  - ku is the language code of the audio, from ISO 639-1; ISO 639-3 and ISO 639-2 are also in use.
  - LL-Q36368 (kur)-Mihemed Qers-keskesor.wav is the filename.
  - Deng means audio and should always be present.
  - |dever= means place of origin; it can be a local variant or accent, or a country or city name. In the example, "Qers" is the Kurdish name for the city en:Kars.
- Request by: Balyozxane (talk) 04:05, 22 February 2021 (UTC)
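For illustration only, here is a rough Python sketch of the placement rules described in this request. It is a reading aid, not LiLiBot's actual implementation; the function name and the regex-based approach are assumptions.

import re

def place_audio(wikitext, lang_code, audio_template):
    # Sketch of the rules above: append the template at the end of the
    # "=== Bilêvkirin ===" section; create that section right after the
    # "== {{ziman|<lang code>}} ==" heading if it is missing; do nothing
    # if there is no language section at all.
    lang_heading = re.compile(
        r'^==\s*\{\{ziman\|' + re.escape(lang_code) + r'\}\}\s*==\s*$', re.M)
    pron_heading = re.compile(r'^===\s*Bilêvkirin\s*===\s*$', re.M)

    lang = lang_heading.search(wikitext)
    if lang is None:
        return None  # no language section: the audio file should not be added

    pron = pron_heading.search(wikitext, lang.end())
    if pron is None:
        # No pronunciation section: create one right after the language heading.
        return (wikitext[:lang.end()] + '\n\n=== Bilêvkirin ===\n'
                + audio_template + '\n' + wikitext[lang.end():])

    # End of the Bilêvkirin section = next level-2/3 heading, or end of page.
    nxt = re.search(r'^={2,3}[^=]', wikitext[pron.end():], re.M)
    cut = pron.end() + nxt.start() if nxt else len(wikitext)
    return (wikitext[:cut].rstrip('\n') + '\n' + audio_template + '\n\n'
            + wikitext[cut:])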
- Here are two examples [1], [2]. If there are multiple part-of-speech sections, we still collect them all at the top of the page, like this [3]. The |dever= parameter should fetch the Kurdish names for places from Wikidata if possible. Lingua Libre uses the "kur" code for Kurdish, but we use "ku" and sometimes "kmr" on ku.wikt. Even when the language code is "kmr" in the language section, the lang code in {{deng|<lang code>}} should be "ku". I think that's all I can remember. Any questions? --Balyozxane (talk) 00:26, 21 February 2021 (UTC)
  - You can also take a look at this page [4] for guidance. --Balyozxane (talk) 00:47, 21 February 2021 (UTC)
- @Balyozxane, your last link is a diff, is that normal? Also, can you reformat your request a bit so it follows the template above? You can also allow me to edit your text and I will happily do it. cc: user:Poslovitch. Yug (talk) 18:52, 21 February 2021 (UTC)
- @Yug The last link was only an example, to show there are other variants; the first two are the desired outcome from LiLiBot. Feel free to correct my use of the template as freely as you like. Balyozxane (talk) 04:05, 22 February 2021 (UTC)
- @Balyozxane Thanks! I'll get to work ASAP. I'll notify you once I'm ready to test the bot ;) --Poslovitch (talk) 13:19, 23 February 2021 (UTC)
- Thank you! Balyozxane (talk) 08:00, 24 February 2021 (UTC)
Connection via OAuth and bots for Unilex list editing
@Olaf & Poslovitch Hello folks. I'm having some connection issues with my WikiAPI (JS) code when connecting to LinguaLibre. Is there something special to do so my bot can edit LiLi? As a human using Chrome, being connected to Commons alone doesn't connect you to LinguaLibre: you have to come here and click login, which sends an OAuth query (I guess), checks your login status on Commons, then does something so that you are logged into both Commons and Lingua Libre. I suspect some additional OAuth query is needed inside my bot. Yug (talk) 21:22, 1 March 2021 (UTC)
- Normally the login procedure here is very complicated (mw:OAuth/For_Developers) and I've never managed to implement it. However, if you use a bot account, you can create a password in Special:BotPasswords and then log in directly on the Lingua Libre wiki, without Commons. Alternatively, you can use one of the JS frameworks to log in. Finally, if you are logged in manually in the browser, the authorization proof should be in cookies, so JS scripts running in the browser should work fine. Olaf (talk) 21:19, 1 March 2021 (UTC)
- Special:BotPasswords/Dragons_Bot. Progress underway. Thank you.
- I see: "Allowed IP ranges: 0.0.0.0/0 ::/0". Any explanation for this? Dragons Bot (talk) 21:30, 1 March 2021 (UTC)
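For reference, below is a minimal sketch of the bot-password login flow Olaf describes, written in Python with the requests library. The bot name and password are placeholders, and the endpoint is the one noted in the "API endpoint" section further down this page.

import requests

API = 'https://www.lingualibre.org/api.php'  # endpoint noted further down on this page
session = requests.Session()

# 1) Fetch a login token.
token = session.get(API, params={
    'action': 'query', 'meta': 'tokens', 'type': 'login', 'format': 'json',
}).json()['query']['tokens']['logintoken']

# 2) Log in with credentials created on Special:BotPasswords (placeholders here).
result = session.post(API, data={
    'action': 'login',
    'lgname': 'Dragons Bot@mybot',      # hypothetical bot-password name
    'lgpassword': 'bot-password-here',  # hypothetical secret
    'lgtoken': token,
    'format': 'json',
}).json()
print(result['login']['result'])  # 'Success' when the session is authenticated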
Lists: approach and limits
Dragons Bot's script is ready to run. A test batch is visible on Special:Contributions/Dragons_Bot.
I propose the following ranges of words for list creation:
var ranges = [
    [ '00001', '00200' ], // 1) 'List:Ibo/Most_used_words,_UNILEX_1:_words_00001_to_00200'
    [ '00201', '01000' ], // 2) 'List:Ibo/Most_used_words,_UNILEX_2:_words_00201_to_01000'
    [ '01001', '02000' ], // …
    [ '02001', '04000' ], // 4) <←——— 1st threshold
    [ '05001', '10000' ], // …
    [ '10001', '15000' ], //
    [ '15001', '20000' ], //
    [ '20001', '25000' ], //
    [ '25001', '30000' ], // 9) <←——— 2nd threshold
    [ '30001', '35000' ], //
    [ '35001', '40000' ], //
    [ '40001', '45000' ], //
    [ '45001', '50000' ]  //
];
I willfully created a smooth ramp to onboard newcomers. I tested it: 200 is a nice balance to start with. It is gently ambitious and about 10 minutes of work. It is typically the kind of list size I was looking for when demoing at IRL events with new users. They can do just 20 if they wish, but the length alone, 200, encourages them to carry on and to try out the Lingua Libre productive flow, which appears after 20~30 words but requires 50 words to "see the power" of LinguaLibre.
As for the depth, I first thought of a deal:
// `corpus-limit`:
// - default: x = 6000;
// - active: x = 30000
// - rule: if recordings > 2000 according to https://lingualibre.org/wiki/LinguaLibre:Stats, then `active`.
With that rule, our 17 most active languages get 30,000 words via 9 files. All others get 5,000 words via 4 files.
But after some thought, I'm wondering if this first 5,000 limit is too small. It allows good onboarding, but then nothing. Still waiting a bit. Yug (talk) 22:22, 4 March 2021 (UTC)
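To make the rule above concrete, here is a small illustrative Python sketch. It is my own reading of the rule (a range is kept when its upper bound fits within the language's limit); the recording counts would come from LinguaLibre:Stats.

# Ranges as proposed above (word positions in the Unilex frequency list).
RANGES = [('00001', '00200'), ('00201', '01000'), ('01001', '02000'),
          ('02001', '04000'), ('05001', '10000'), ('10001', '15000'),
          ('15001', '20000'), ('20001', '25000'), ('25001', '30000'),
          ('30001', '35000'), ('35001', '40000'), ('40001', '45000'),
          ('45001', '50000')]

def corpus_limit(recordings):
    # default 6000 words; "active" languages (> 2000 recordings) get 30000
    return 30000 if recordings > 2000 else 6000

def ranges_for(recordings):
    limit = corpus_limit(recordings)
    return [r for r in RANGES if int(r[1]) <= limit]

print(len(ranges_for(2500)))  # 9 ranges for an active language
print(len(ranges_for(100)))   # 4 ranges for the others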
Hi everyone! The Unilex import has started; you may see it happen in recent bot edits or on Special:Contributions/Dragons_Bot. At the end:
- breadth: 1001 languages will be covered
- type: with their frequency lists, from Unilex.
- amount: ~100 major languages (those whose ISO 639 code has 2 characters) will have by default a maximum of 30,000 words. Minor languages will have at most 6,000 words.
Note: I noticed some errors after launching. Those will be fixed in my code and on Lingualibre.
These uploads are done using User:Kanasimi's Wikiapijs framework. :) Yug (talk) 10:15, 17 May 2021 (UTC)
Imported list names ?
- @Pamputt, I derived from the IETF column on the right a new `iso639-3` column on the left. These `iso639-3` codes will provide the Iso in List:{Iso}/{Title}{range}. But I often didn't know the `iso639-3` version, so I kept the IETF tag (the ones with a -). Could you review languages.js and share possible corrections with me? Or is it OK if I use those? (I don't think so, the Record Wizard will have difficulties finding them.) Yug (talk) 22:28, 4 March 2021 (UTC)
  - About two-letter codes, you can find the "equivalent" ISO 639-3 code using this Wikipedia page. For example, "ae-Latn" corresponds to "ave" on LinguaLibre. Pamputt (talk) 06:46, 5 March 2021 (UTC)
- Yes, I need to be sure I'm converting correctly from composite IETF tags into the current conventions (ISO 639-3, right?). Thanks for the lead on `ave`, I will check those. I gathered below the list of items I'm confused by.
{ 'iso639-3':'ave', file:'ae-Latn' },
{ 'iso639-3':'', file:'be-tarask' },
{ 'iso639-3':'', file:'blt-Latn' },
{ 'iso639-3':'', file:'ca-valencia' },
{ 'iso639-3':'', file:'ctd-Latn' },
{ 'iso639-3':'', file:'el-Latn-u-sd-it75' },
{ 'iso639-3':'', file:'gsw-u-sd-chag' },
{ 'iso639-3':'', file:'gsw-u-sd-chbe' },
{ 'iso639-3':'', file:'gsw-u-sd-chfr' },
{ 'iso639-3':'', file:'kab-Arab' },
{ 'iso639-3':'', file:'kab-Tfng' },
{ 'iso639-3':'', file:'rm-puter' },
{ 'iso639-3':'', file:'rm-rumgr' },
{ 'iso639-3':'', file:'rm-surmiran' },
{ 'iso639-3':'', file:'rm-sursilv' },
{ 'iso639-3':'', file:'rm-sutsilv' },
{ 'iso639-3':'', file:'rm-vallader' },
{ 'iso639-3':'', file:'sr-Latn' },
{ 'iso639-3':'', file:'vec-u-sd-itpd' },
{ 'iso639-3':'', file:'vec-u-sd-itts' },
{ 'iso639-3':'', file:'vec-u-sd-itvr' },
- Do we have a naming convention for cases like gsw, rm and vec, which each have several sub-elements? Should I do List:{gsw}/u-sd-chag/{title}? I will return here later to complete all those I can. Yug (talk)
  - What I know is that Lingua Libre uses ISO 639-3 to identify the language in the lists, so we should use pure ISO 639-3 to name the lists. Let us take rm for example: the ISO 639-3 code is "roh". The text after the hyphen is used to discriminate the dialects. On LinguaLibre, we can create a list for a given dialect, but it should be named such as "List:Roh/Puter-namelist" or maybe "List:Roh/Puter/Namelist". I have not checked what the behaviour of a list named following the last proposal would be.
  - The same remark applies to IETF codes such as "kab-Arab". In that case, "Arab" is about the script, so we could name the list "List:Kab/Arab/namelist" for example. Pamputt (talk) 08:59, 5 March 2021 (UTC)
- It's a more general question: how should the resulting recording files look? For example, be-tarask is standard Belarusian but written with Latin script instead of Cyrillic. Still, most Wiktionaries consider it a separate language (example: wikt:fr:Catégorie:biélorusse_(tarashkevitsa)). If the LiLi bot is supposed to attach the recordings properly, the language should have a separate ISO code in LiLi; you can't just put it as a version of bel. But LiLi has only one Belarusian language code defined. sr-latin is a Latinized version of Serbian, but the French Wiktionary puts it in the same bag as Cyrillic (wikt:fr:Catégorie:serbe), in the Polish Wiktionary we allow only the Cyrillic script for Serbian, and in the English Wiktionary everything is together with Croatian as the Serbo-Croatian language. Total mess. A few other codes are also different script versions of standard languages. gsw-*, on the other hand, are various dialects of Swiss German; I believe they are all treated in Lingua Libre and the Wiktionaries as dialects of German (deu). Perhaps the code gsw could also be created here, but it isn't. rm-* are dialects of the Romansh language; LiLi treats them as one language, roh.
  - In general, if we want to have rare languages on board, they should be defined here first. It's not enough to make a list if you can't select the proper language while recording. Maybe you should import only lists for the languages defined in LiLi? Olaf (talk) 09:26, 5 March 2021 (UTC)
- Special:RecordWizard's Step 3, "Details" (which should be called "List", IMHO), does 2 things:
  - List picking: it seems to load the list via a simple search by name. The list's name (and its iso prefix) does NOT influence the recordings.
  - « You record words in: {pick your language} »: this defines how the words' Qids will be tagged, imported to Commons, and categorized.
- You can load List:Mar/wiktionary and pick the language Japanese; then your recordings' Q-items will have iso639-3 = jpn. So, for today's case (list creation), I just need my lists to start with a recognizable iso639-3 code so they show up properly.
- The question of languages is a Wikidata/LanguageImporter issue.
- I'm cognitively tired after these past coding days, so I will simply not upload those composite-name languages for now. But it remains a practical question with implications and side effects (Wikidata, Wiktionary) to think about. Yug (talk) 21:36, 5 March 2021 (UTC)
Bots ?
Let's welcome User:Babel AutoCreate (t•c), User:FuzzyBot (t•c) :D Yug (talk) 23:00, 8 March 2021 (UTC)
Bot request for Catalan Wiktionary
- Example pages (3): ca:wikt:mariner, ca:wikt:activity, ca:wikt:fèr - You can see the best audio integration there.
- Target section: There is no specific section. The audio file should be added under the language heading == {{-xx-}} == and before the first POS section. It should be added on a new line after the pronunciation templates, if any: either {{pron|xx|...}}, {{pronafi|xx|...}} or {{xx-pron}}. In these templates, xx means the ISO 639-1 or ISO 639-3 language code.
- Local audio template(s) example(s): {{àudio|en-us-activity.ogg|lang=en|accent=EUA}}
- Local audio template(s) explained:
  - {àudio} means "audio" and takes the following parameters...
  - en-us-activity.ogg is the filename.
  - lang=en is the ISO 639-1 code of the language.
  - accent=EUA means USA, which is the accent or local variant. This parameter is optional and may be codified for Catalan as explained at ca:wikt:Template:àudio.
- Request by: Vriullop (talk) 14:41, 9 March 2021 (UTC)
API endpoint
Please note: the Lingua Libre API endpoint is atypical: https://www.lingualibre.org/api.php --Yug (talk) 21:01, 25 April 2021 (UTC)
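For example, a quick sanity check of that endpoint with Python's requests library (a sketch; any standard MediaWiki API query works the same way):

import requests

API = 'https://www.lingualibre.org/api.php'  # the atypical endpoint noted above

# Ask the wiki for general site information.
info = requests.get(API, params={
    'action': 'query', 'meta': 'siteinfo', 'format': 'json',
}).json()
print(info['query']['general']['sitename'])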
Welcome to Kanashimi & Cewbot !
Welcome to Cewbot and its bot master Kanashimi! Kanashimi is the creator of Wikiapi.js (doc), a JS framework for creating Node.js bots that edit Wikimedia wikis via API queries. I use this framework as well, so we are creating a little testing group and building tools to ease the onboarding of junior developers. Yug (talk) 08:18, 27 April 2021 (UTC)
- Thank you. cewbot runs on several wikis, including enwiki. I think I may transfer some tasks already running on other wiki projects to Lingua Libre:
Nickname | Definition | Helpfulness | Difficulty to code |
---|---|---|---|
Topic list | Add a topic list to discussion pages, including LinguaLibre:Chat room, LinguaLibre:Administrators' noticeboard and LinguaLibre:Technical board. The topics listed here are a sample. | ? | ? |
Signature fixer | On talk pages, when a signature is missing, add the user's signature and date. | 2/4 | ? |
Discussions archiver | On defined pages marked by a category via a template, when a section is inactive for n days, archive it. | 3/4 | ? |
Anchors fixer | Fix broken anchors, including those that were archived. | 2/4 | ? |
Sandbox cleaner | Periodically blank LinguaLibre:Sandbox. | 1/4 | Easy |
Welcome bot | When a user makes an edit but has an empty user_talk page, post {{subst:welcome|~~~~}} | 2/4 | Easy |
- How about these? --Kanashimi (talk) 09:52, 27 April 2021 (UTC)
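As an illustration of the simplest task in the table, the "Sandbox cleaner", here is a minimal Pywikibot sketch. It assumes the 'lingualibre' family setup shown in the Draft bot section below; Cewbot itself is written with Wikiapi.js, so this is not its actual code.

import pywikibot

site = pywikibot.Site('lingualibre:lingualibre')  # assumes a 'lingualibre' family is configured
sandbox = pywikibot.Page(site, 'LinguaLibre:Sandbox')
sandbox.text = ''  # blank the sandbox content
sandbox.save(summary='Bot: periodically blanking the sandbox')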
Draft bot
Is there anywhere a piece of code showing how to connect a bot to Lingua Libre and how to retrieve a page? Or maybe someone can share such code? I think it would be very helpful to point to such a "draft bot" that could be used as a skeleton for several tasks. Pamputt (talk) 18:25, 7 July 2021 (UTC)
- @Pamputt I also started to look for a way to have a bot working here but couldn't find simple instructions. Poslovitch gave me some advice though, to build a bot working on the same base as LLBot (i.e. not based on Pywikibot). I can send you the info if you are interested. Best — WikiLucas (🖋️) 08:24, 8 July 2021 (UTC)
- EDIT @Pamputt Thank you very much for phabricator:T286303! I will try to implement a welcome bot soon, and a bot similar to what Olafbot does (i.e. updating local lists) but based on Wiktionary categories (lists are easier to find and suggest to newcomers than wiki categories). All the best — WikiLucas (🖋️) 09:36, 8 July 2021 (UTC)
And so, using Pywikibot, a draft would be:
- get a MediaWiki page
import pywikibot
site = pywikibot.Site('lingualibre:lingualibre')  # requires a 'lingualibre' family configured in Pywikibot
page = pywikibot.Page(site, 'User:Example')
print(page.text)  # print the page's wikicode
- get a Wikibase element
import pywikibot
site = pywikibot.Site('lingualibre:lingualibre')
repo = site.data_repository()  # the Wikibase repository for the given site
item = pywikibot.ItemPage(repo, 'Q42')  # a repository item
print(item.get())  # fetch and print the item's content (labels, claims, ...)
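Building on the draft above, a slightly longer sketch (still assuming the 'lingualibre' family is configured; the page title and edit summary are placeholders) that lists an item's claims and saves a page edit:

import pywikibot

site = pywikibot.Site('lingualibre:lingualibre')
repo = site.data_repository()

# Read a Wikibase item and list its claims.
item = pywikibot.ItemPage(repo, 'Q42')
data = item.get()  # labels, descriptions, claims, ...
for prop_id, claims in data['claims'].items():
    for claim in claims:
        print(prop_id, claim.getTarget())

# Edit a wiki page (placeholder title) with an edit summary.
page = pywikibot.Page(site, 'User:Example/sandbox')
page.text += '\nTest edit from the draft bot.'
page.save(summary='Draft bot: test edit')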