Lists named "Lemmas-without-audio-sorted-by-number-of-wiktionaries" are created in the following way:
- For a given language, the bot traverses categories on all Wiktionaries and a few open dictionaries and collects statistics: for each lemma, it counts the dictionaries that describe this word in this language. The bot has been doing this for 11 years to generate various lists for the Polish Wiktionary.
- Titles written in the wrong script for the language are removed.
- Titles containing uppercase letters are removed (except for German), because a bug in Lingua Libre makes recording uppercase lemmas problematic.
- Lemmas that already have an audio recording on Commons are also removed. This covers not only files created with LiLi but also any other recordings found in the "pronunciation" category for the given language or in its subcategories.
- For a few languages, minor corrections are applied to extract the set of dictionary lemmas, excluding inflected forms where possible.
- The resulting list is sorted in descending order by the number of dictionaries and truncated to 380 entries.
- Words that have since been recorded are removed from the lists every three hours.
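The filtering and sorting steps above can be sketched as a small Python function. This is a minimal illustration, not the bot's actual code: the input format (pairs of dictionary name and lemma), the function names `build_list` and `in_script`, and the script check based on Unicode character names are all assumptions. The language-specific corrections for inflected forms are not modeled.

```python
import unicodedata
from collections import Counter
from typing import Iterable

MAX_ENTRIES = 380  # list length stated in the description above


def in_script(title: str, script: str) -> bool:
    """Hypothetical script check: every letter's Unicode name must start with
    `script`, e.g. "LATIN", "CYRILLIC", "GREEK". Non-letters (hyphens, digits) are ignored."""
    return all(unicodedata.name(ch, "").startswith(script) for ch in title if ch.isalpha())


def build_list(
    occurrences: Iterable[tuple[str, str]],  # assumed input: (dictionary, lemma) pairs
    script: str,
    recorded: set[str],                      # lemmas that already have audio on Commons
    keep_uppercase: bool = False,            # True for German
) -> list[tuple[str, int]]:
    # Count how many distinct dictionaries describe each lemma
    # (set() drops duplicate pairs, so each dictionary counts once per lemma).
    counts = Counter(lemma for _, lemma in set(occurrences))
    candidates = [
        (lemma, n)
        for lemma, n in counts.items()
        if in_script(lemma, script)
        and (keep_uppercase or not any(ch.isupper() for ch in lemma))
        and lemma not in recorded
    ]
    # Sort by dictionary count (descending), alphabetically as a tie-breaker, then truncate.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:MAX_ENTRIES]


if __name__ == "__main__":
    occ = [
        ("enwiktionary", "chat"), ("frwiktionary", "chat"), ("plwiktionary", "chat"),
        ("enwiktionary", "chien"), ("frwiktionary", "chien"),
        ("frwiktionary", "Paris"), ("frwiktionary", "кот"), ("enwiktionary", "maison"),
    ]
    print(build_list(occ, "LATIN", recorded={"maison"}))
```

In this example, "Paris" is dropped for its uppercase letter, "кот" for being in the wrong script, and "maison" because it is already recorded, leaving `[('chat', 3), ('chien', 2)]`. Rerunning the function with an updated `recorded` set corresponds to the periodic refresh.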
Lists maintained (72): afr, ang, ara, ast, aze, bel, ben, bul, cat, ces, cmn, cym, dan, deu, ekk, ell, eng, epo, eus, fao, fas, fin, fra, gla, gle, glg, grc, heb, hin, hrv, hun, hye, ina, ind, isl, ita, jav, jpn, kan, kat, kaz, kor, lat, lit, ltz, lvs, mar, mkd, mlg, nld, nor, oci, pan, pol, por, ron, rus, san, slk, slv, spa, sqi, swa, swe, tam, tel, tha, tur, ukr, vie, yid, yue.