Help:Download datasets
Revision as of 10:21, 5 February 2021
Requirements
Java Runtime Environment.
Ubuntu: sudo apt-get install default-jre
Install
- Open the Wiki-java-tools project page on GitHub.
- Find the latest Imker release.
- Download the Imker_vxx.xx.xx.zip archive.
- Extract the .zip file.
- Run Imker as follows:
  - On Windows: start the .exe file.
  - On Ubuntu: open a shell, then run:
$ java -jar imker-cli.jar -o ./myFolder/ -c 'CategoryName'
Find your target category
- Commons:Category:Lingua Libre pronunciation by user
- Commons:Category:Lingua Libre pronunciation by language
Manual
Imker -- Wikimedia Commons batch downloading tool.
Usage: java -jar imker-cli.jar [options]
  Options:
      --category, -c     Use the specified Wiki category as download source.
      --domain, -d       Wiki domain to fetch from
                         Default: commons.wikimedia.org
      --file, -f         Use the specified local file as download source.
    * --outfolder, -o    The output folder.
      --page, -p         Use the specified Wiki page as download source.
The download source must be ONE of the following:
 ↳ A Wiki category (Example: --category="Denver, Colorado")
 ↳ A Wiki page (Example: --page="Sandboarding")
 ↳ A local file (Example: --file="Documents/files.txt"; One filename per line!)
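The usage text above can be wrapped in a small Python helper that assembles an Imker command line. This is a hypothetical sketch, not part of Imker: it uses only the short flags listed in the manual (-o, -c, -p, -f, -d), assumes `java` is on your PATH and `imker-cli.jar` is in the working directory, and enforces the rule that exactly one download source is given.

```python
def build_imker_command(outfolder, category=None, page=None, file=None,
                        domain=None, jar="imker-cli.jar"):
    """Return an argument list for imker-cli, mirroring the options in its usage text.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    sources = [("-c", category), ("-p", page), ("-f", file)]
    chosen = [(flag, value) for flag, value in sources if value is not None]
    # Per the usage text, the download source must be exactly ONE of these.
    if len(chosen) != 1:
        raise ValueError("pass exactly one of category, page or file")
    flag, value = chosen[0]
    # -o (--outfolder) is the option marked as required in the usage text.
    cmd = ["java", "-jar", jar, "-o", outfolder, flag, value]
    if domain is not None:  # when omitted, Imker uses commons.wikimedia.org
        cmd += ["-d", domain]
    return cmd
```

To launch Imker, pass the result to `subprocess.run(build_imker_command("./myFolder/", category="CategoryName"), check=True)`. Building an argument list rather than a shell string avoids quoting problems with category names that contain spaces or commas, such as "Denver, Colorado".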
Note
There are also ways to take a category name as input, query the API for the list of files in that category, and then download them. For a starting point on API queries, see this pen.
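A minimal Python sketch of that approach, assuming the standard MediaWiki Action API on Commons: `list=categorymembers` with `cmtype=file`, following the server's `continue` tokens until the listing is complete, and the `Special:FilePath` redirect as the download URL. The function names, the User-Agent string and `CategoryName` are illustrative placeholders, not part of any Lingua Libre tool. Note that `categorymembers` returns only files placed directly in the category, not those in its subcategories.

```python
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"  # Commons MediaWiki Action API
# Assumption: Wikimedia asks API clients to send a descriptive User-Agent.
HEADERS = {"User-Agent": "lili-dataset-sketch/0.1 (example)"}


def fetch_json(params):
    """GET the API with the given query parameters and decode the JSON reply."""
    url = API + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_category_files(category, fetch=fetch_json):
    """Yield every file title directly in `category`, following API continuation."""
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": "Category:" + category,
        "cmtype": "file",   # only files, not subcategories or pages
        "cmlimit": "500",   # maximum batch size for regular (non-bot) users
    }
    while True:
        data = fetch(params)
        for member in data.get("query", {}).get("categorymembers", []):
            yield member["title"]
        if "continue" not in data:
            break
        # Merge the server's continuation tokens into a fresh request dict.
        params = {**params, **data["continue"]}


def file_download_url(title):
    """Build a Special:FilePath URL, which redirects to the actual media file."""
    name = title.split(":", 1)[1]  # strip the "File:" prefix
    return ("https://commons.wikimedia.org/wiki/Special:FilePath/"
            + urllib.parse.quote(name.replace(" ", "_")))
```

For example, `for title in list_category_files("CategoryName"): print(file_download_url(title))` prints one download URL per file; the actual download can then be done with any HTTP client. The `fetch` parameter exists so the paging logic can be exercised without network access.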
Using CommonsDownloadTool
Be aware of the size of Lingua Libre's data:
- Average audio file size on Lili: 100 kB
- Audio files on Lili: 300,000+
- Total data size: about 30 GB
- Safe error margin: 5-10x
Required disk space: 150-300 GB.
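The disk-space figure is simple arithmetic: 300,000 files × 100 kB ≈ 30 GB, multiplied by a 5 to 10× safety margin. A minimal Python sketch of that estimate, which also checks free space with the standard-library `shutil.disk_usage`; the function names are hypothetical, the figures are the ones on this page, and sizes use decimal units (1 GB = 1,000,000 kB).

```python
import shutil

KB_PER_GB = 1_000_000  # decimal units: 1 GB = 1,000,000 kB


def estimate_required_gb(n_files, avg_kb, margin):
    """Total payload size in GB, multiplied by a safety margin."""
    return n_files * avg_kb / KB_PER_GB * margin


def enough_space(path, required_gb):
    """True if the filesystem holding `path` has at least `required_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9  # bytes -> decimal GB
    return free_gb >= required_gb


# Figures from this page: 300,000 files of ~100 kB, margin of 5x to 10x.
low = estimate_required_gb(300_000, 100, 5)    # 150.0
high = estimate_required_gb(300_000, 100, 10)  # 300.0
```

Running `enough_space("/path/to/target", high)` before starting the download tells you whether the chosen disk meets the upper estimate.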
To download all datasets as zip archives:
- Download the following scripts onto a machine with enough disk space:
  - create_datasets.sh
  - CommonsDownloadTool/commons_download_tool.py
- Read through them and move them wherever suits your computer best.
- Edit them as needed so that the paths are correct.
- Run them until they complete successfully.
- Check that the number of files in the downloaded zips matches the number of files in Commons:Category:Lingua Libre pronunciation.
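For the final check, a hypothetical Python helper can count the entries in the downloaded zips with the standard `zipfile` module and read a category's file count from a MediaWiki API reply to `action=query&prop=categoryinfo&titles=Category:CategoryName&format=json`. The function names are illustrative, not part of create_datasets.sh or commons_download_tool.py. One caveat: `categoryinfo` counts only files placed directly in that category, not in its subcategories, so you may need to sum the counts of each subcategory you downloaded.

```python
import zipfile


def count_files_in_zips(zip_paths):
    """Count file entries (skipping directory entries) across the given zip archives."""
    total = 0
    for path in zip_paths:
        with zipfile.ZipFile(path) as zf:
            total += sum(1 for info in zf.infolist() if not info.is_dir())
    return total


def category_file_count(categoryinfo_response):
    """Extract the 'files' count from a prop=categoryinfo JSON reply for one title."""
    pages = categoryinfo_response["query"]["pages"]
    page = next(iter(pages.values()))  # the reply is keyed by page ID
    return page["categoryinfo"]["files"]
```

If `count_files_in_zips(...)` differs from the expected total, some downloads likely failed or were skipped and are worth re-running.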