User
Difference between revisions of "Olafbot"
Revision as of 01:29, 27 February 2021
Olafbot is a bot created by Olaf that updates various lists of missing audio recordings every night. It is much more active on Polish Wiktionary.
Lists named "Lemmas-without-audio-sorted-by-number-of-wiktionaries" are created in the following way:
- For a given language, the bot traverses categories on all wiktionaries and a few open dictionaries and collects statistics: for each lemma, it counts how many dictionaries describe the word in that language. The bot has been doing this for 11 years, generating various lists for Polish Wiktionary.
- Titles written in the wrong alphabet for the language are removed.
- Lemmas with audio recording in Commons are also removed from this set.
- For a few languages, minor corrections are made in order to extract the set of dictionary lemmas, where possible without inflected forms.
- The resulting set is sorted in descending order by the number of dictionaries and limited to 5000 entries.
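
The steps above can be illustrated with a minimal Python sketch. This is not the bot's actual code: the function name `rank_missing_audio`, its parameters, the sample data, and the Latin-script check are all assumptions made for illustration. The per-language corrections for inflected forms are omitted, and the alphabetical tie-break is a choice made here, not something the page specifies.

```python
import unicodedata
from collections import Counter


def rank_missing_audio(dictionaries, has_audio, is_valid_script, limit=5000):
    """Rank lemmas without audio by the number of dictionaries describing them.

    dictionaries    -- hypothetical mapping: dictionary name -> iterable of lemma titles
    has_audio       -- hypothetical set of lemmas that already have a recording
    is_valid_script -- hypothetical predicate: is the title in the expected alphabet?
    """
    counts = Counter()
    for lemmas in dictionaries.values():
        # set() ensures each dictionary is counted at most once per lemma.
        counts.update(set(lemmas))

    candidates = [
        (lemma, n)
        for lemma, n in counts.items()
        if is_valid_script(lemma) and lemma not in has_audio
    ]
    # Descending by dictionary count; ties broken alphabetically (an assumption).
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:limit]


def is_latin(title):
    """Illustrative script check: every letter must have a LATIN Unicode name."""
    return all(
        not ch.isalpha() or "LATIN" in unicodedata.name(ch, "") for ch in title
    )


# Made-up sample data for a language written in the Latin alphabet.
dictionaries = {
    "en": ["kot", "pies", "dom"],
    "de": ["kot", "pies"],
    "fr": ["kot", "кот"],
}
has_audio = {"pies"}

ranking = rank_missing_audio(dictionaries, has_audio, is_latin)
print(ranking)
```

In this sample, "кот" is dropped by the alphabet check and "pies" is dropped because it already has a recording, so only "kot" (three dictionaries) and "dom" (one dictionary) remain, in that order.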