
Difference between revisions of "Olafbot"

(no uppercase)
(10 intermediate revisions by 2 users not shown)
Line 1:
  [[File:Wiktionary_Bots.png|thumb]]
- Bot, created by {{u|Olaf}}, updates various lists of missing audio recordings every night. Much more active in [[:pl:wikt:Specjalna:Wkład/Olafbot|Polish Wiktionary]].
+ The bot, created by {{u|Olaf}}, continuously updates various lists of missing audio recordings. Much more active in [[:pl:wikt:Specjalna:Wkład/Olafbot|Polish Wiktionary]].
  
  Lists named "Lemmas-without-audio-sorted-by-number-of-wiktionaries" are created in the following way:
  * For a given language, the bot traverses categories on all wiktionaries and a few open dictionaries and collects statistics - for each lemma it counts dictionaries that describe this word in this language. This is something the bot has been doing for 11 years, generating different [[:pl:wikt:Kategoria:Rankingi_brakujących_słów_według_wystąpień_w_innych_wikisłownikach|lists for Polish Wiktionary]].
  * Titles written in wrong alphabets are removed.
- * Lemmas with audio recording in Commons are also removed for this set.
+ * Titles containing uppercase letters are removed, except German, because of a [https://lingualibre.org/index.php?title=Q71505&diff=458373&oldid=356456 bug in Lingua Libre], which makes recording uppercase lemmas problematic.
- * For a few languages minor corrections are done, in order to extract the set of dictionary lemmas, if possible without inflected forms.
+ * Lemmas with audio recording in Commons are also removed from this set. Not only files created with LiLi are removed, but also other recordings found in the "pronunciation" category for a given language or in its subcategories.
- * The resulting set is sorted descending by the number of wiktionaries and limited to 5000 entries.
+ * For a few languages, minor corrections are done, in order to extract the set of dictionary lemmas, if possible without inflected forms.
+ * The resulting list is sorted descending by the number of dictionaries and limited to 380 entries.
+ * The recorded words are removed from the lists every three hours.
+
+ Lists maintained (72) : {{Olafbot-wikt}}.

Revision as of 09:32, 23 May 2021


The bot, created by Olaf, continuously updates various lists of missing audio recordings. It is much more active on the Polish Wiktionary.

Lists named "Lemmas-without-audio-sorted-by-number-of-wiktionaries" are created in the following way:

  • For a given language, the bot traverses categories on all wiktionaries and a few open dictionaries and collects statistics: for each lemma, it counts the dictionaries that describe this word in this language. The bot has been doing this for 11 years, generating various lists for Polish Wiktionary.
  • Titles written in the wrong alphabet are removed.
  • Because a bug in Lingua Libre makes recording uppercase lemmas problematic, titles containing uppercase letters are removed for every language except German.
  • Lemmas that already have an audio recording on Commons are also removed from this set. This covers not only files created with Lingua Libre (LiLi) but also other recordings found in the "pronunciation" category for the given language or in its subcategories.
  • For a few languages, minor corrections are made in order to extract the set of dictionary lemmas, if possible without inflected forms.
  • The resulting list is sorted in descending order by the number of dictionaries and limited to 380 entries.
  • Recorded words are removed from the lists every three hours.
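
The steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the bot's actual code: the function name `build_missing_audio_list`, the input shapes, the `is_latin` script check, and the alphabetical tie-break are all assumptions made for the example, and the language-specific "minor corrections" step is left out.

```python
import unicodedata
from collections import Counter

MAX_ENTRIES = 380  # list size limit stated in the description above


def is_latin(title):
    """Toy script check (assumption): every letter in the title is a Latin letter."""
    return all(
        unicodedata.name(ch, "").startswith("LATIN")
        for ch in title
        if ch.isalpha()
    )


def build_missing_audio_list(lang, dictionaries, recorded, in_expected_script):
    """Rank lemmas without audio by the number of dictionaries describing them.

    dictionaries: mapping of dictionary name -> set of lemma titles (hypothetical shape)
    recorded: set of lemmas that already have a recording on Commons
    in_expected_script: callable(title) -> bool, True if the alphabet is acceptable
    """
    counts = Counter()
    for lemmas in dictionaries.values():
        counts.update(set(lemmas))  # each dictionary counts at most once per lemma

    candidates = []
    for title, n in counts.items():
        if not in_expected_script(title):
            continue  # wrong alphabet
        if lang != "deu" and any(ch.isupper() for ch in title):
            continue  # uppercase titles are dropped, except for German
        if title in recorded:
            continue  # audio already exists
        candidates.append((title, n))

    # Most widely described lemmas first; ties broken alphabetically (assumption).
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:MAX_ENTRIES]


# Made-up sample data for Polish ("pol").
sample = {
    "en": {"kot", "pies", "Polska", "żółw"},
    "de": {"kot", "pies", "żółw"},
    "fr": {"kot", "кот"},
}
ranking = build_missing_audio_list("pol", sample, {"pies"}, is_latin)
```

With this sample data, `ranking` keeps only "kot" (three dictionaries) and "żółw" (two): "pies" already has a recording, "Polska" contains an uppercase letter, and "кот" is in the wrong alphabet.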

Lists maintained (72): afr, ang, ara, ast, aze, bel, ben, bul, cat, ceb, ces, cmn, csb, cym, dan, deu, ekk, eng, epo, est, eus, fao, fas, fin, fra, gla, gle, glg, grc, gre, guj, hau, heb, hin, hrv, hun, hye, ido, ina, ind, isl, ita, jav, jpn, kan, kat, kaz, khm, kor, kur, lat, lit, ltz, lvs, mal, mar, mkd, mlg, mlt, mon, msa, nld, nor, oci, pan, pnb, pol, por, ron, rus, san, slk, slv, spa, sqi, swa, swe, tam, tel, tgl, tha, tur, ukr, urd, vie, wuu, yid, yue.