LinguaLibre

About

Revision as of 21:08, 26 December 2018 by Yug (talk | contribs)

Draft. To improve.

Lingua Libre is an audio recording tool as well as a sound library designed by Wikimedians to improve several Wikimedia projects (Wiktionaries, Wikipedias, Wikimedia Commons, Wikidata...).

LinguaLibre.fr is a massive open audio recording platform and web application that eases the mass recording of word lists or text into clean, well-cut, well-named, app-friendly audio files. It was designed from the start to ease the creation of consistent audio datasets. We believe it is the best tool available for creating datasets ranging from a few dozen to several thousand audio files. Recording productivity can reach up to 1,000 recordings per hour, given a clean word list and an experienced user. Lingua Libre received start-up funding from both Wikimedia France and the Wikimedia Foundation's grant programs. Today, it is actively used by the Wikimedia community and maintained by passionate contributors as an open source project.


Background

  • Shtooka Recorder (2010) by Nicolas Vion - a notable desktop application that had a deep impact on the open audio recording ecosystem. Hundreds of applications use data produced by this software.
  • SWAC Recorder (2013) by Nicolas Vion - a revamp of the earlier software: lesser known, but easier to install and with a better user experience.
  • LinguaLibre.fr v1 (2016) by Nicolas Vion - a cloud-based variation of the earlier versions. The project was funded by Wikimedia France (Remy Gerbet & user:Lyokoi) and created with feedback from local academic linguists. The grant was associated with a project to record and preserve the endangered minority languages of France. Available in French only, this platform was demoed to the global Wikimedia community and demonstrated the need for a v2.
  • LinguaLibre.fr v2 (2018) by 0x010C - a full rebuild using Wikibase and OAuth login to better integrate with the Wikimedia ecosystem. It can be used by all communities thanks to a user interface available in several major languages (EN, FR, ES, ...). The clean, sharply cut audio files ease the creation or enhancement of various derivative applications. Language learning and language preservation are both common use cases.

Functionalities

To provide very consistent, app-friendly files, the current features are:

  • [x] Easy to use, with no download or installation, via LinguaLibre.fr
  • [x] Speaker profiles with language, gender, age, origin and a few other data points recommended to us by linguists
  • [x] Word list support
  • [x] Intuitive interface showing the audio waveform while speaking
  • [x] On-demand roll-back using the left arrow key
  • [x] Automatic roll-back / redo when saturation is detected
  • [x] Consistent cut before and after the spoken words
  • [x] Automatic equalization of the sound level
  • [x] Download of all audio files by language or by speaker
  • [x] User interface in English and various other languages
  • [x] OAuth login via a Wikimedia account
  • [x] Automatic upload to Wikimedia Commons
  • [x] Automatic integration into Wikimedia projects via bots
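
To make the saturation, cutting and level features above more concrete, here is a minimal Python sketch of how such processing could work on a single take. It is an illustration only, not Lingua Libre's actual code: the function names (`is_saturated`, `trim_silence`, `normalize_peak`) and all thresholds are hypothetical, and samples are assumed to be floats between -1.0 and 1.0.

```python
from typing import List, Sequence


def is_saturated(samples: Sequence[float], limit: float = 0.99,
                 max_clipped: int = 3) -> bool:
    """Return True when more than `max_clipped` samples reach the clipping limit.

    The limit and the tolerated count are assumptions chosen for illustration.
    """
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped > max_clipped


def trim_silence(samples: Sequence[float], threshold: float = 0.02,
                 padding: int = 2) -> List[float]:
    """Cut quiet samples before and after the spoken part.

    `padding` samples are kept on each side so every take is cut the same way.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # nothing above the threshold: treat the take as empty
    start = max(loud[0] - padding, 0)
    end = min(loud[-1] + padding + 1, len(samples))
    return list(samples[start:end])


def normalize_peak(samples: Sequence[float],
                   target_peak: float = 0.9) -> List[float]:
    """Scale the take so that its loudest sample reaches `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent take: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


# Hypothetical take: silence, a short "word", silence.
take = [0.0] * 10 + [0.1, 0.3, -0.45, 0.2] + [0.0] * 10
if not is_saturated(take):
    processed = normalize_peak(trim_silence(take))
```

In a real recorder, a saturated take would be discarded and recorded again rather than processed, which matches the automatic redo behaviour listed above.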

Wishlist (secondary):

  • [ ] Noise reduction [#29](./issues/29)
  • [ ] Fade-in / fade-out [#40](./issues/40)

Equipment (recommendation)

  • Quiet room or recording studio
  • 1 x [Scarlett2 Solo Studio Pack 2nd Generation](https://www.amazon.com/dp/B01E6T54E2/), a portable kit comprising:
    • 1 x microphone
    • 1 x headset
    • 1 x external sound card
    • cables
  • [Microphone add-ons](https://www.amazon.com/dp/B01KHMUQ2M/):
    • 1 x pod / arm stand
    • 1 x anti-pop filter
    • 1 x anti-vibration system
  • 1 x modest PC (the audio recording chain is external)
  • Internet connection

Cost: US$250 for the external audio equipment + US$300 for the optional PC = US$250 to US$550 in total.

<a href="https://www.amazon.com/dp/B01E6T54E2/"><img src="https://i.stack.imgur.com/dvreq.jpg" alt="Audio hardware" style="width:400px;"/></a>

Working process

  1. Data gathering: prepare a text file with a list of words or sentences, one per line.
  2. Speaker: find a willing speaker.
  3. Facility: find a quiet studio or room.
  4. Hardware installation: set up the equipment in the room so that you can work comfortably.
  5. Software settings: connect to LinguaLibre.fr's studio and edit the settings according to your needs.
  6. Recording: start your high-quality mass audio recording. 800 items per hour for 2 hours in a row is a fair pace.
  7. Applications: be creative, invent your apps! :D
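
As an illustration of step 1, the following Python sketch cleans a raw word list (one item per line) and estimates the recording time at the pace mentioned in step 6. The helper names `clean_word_list` and `estimate_hours` are hypothetical and not part of Lingua Libre; the 800 items per hour default is simply the figure quoted above.

```python
from typing import Iterable, List


def clean_word_list(lines: Iterable[str]) -> List[str]:
    """Return the items to record: trimmed, non-empty, without duplicates.

    The original order is kept so the speaker reads items as prepared.
    """
    seen = set()
    cleaned = []
    for line in lines:
        item = " ".join(line.split())  # strip ends and collapse inner whitespace
        if not item or item in seen:
            continue  # skip blank lines and repeated items
        seen.add(item)
        cleaned.append(item)
    return cleaned


def estimate_hours(item_count: int, items_per_hour: int = 800) -> float:
    """Estimate the recording time in hours at a given pace."""
    return item_count / items_per_hour


# Hypothetical raw input, as it might be read from a text file.
raw = ["apple\n", "  pear  \n", "\n", "apple\n", "red  wine\n"]
words = clean_word_list(raw)
hours = estimate_hours(len(words))
```

With a real file, the same function can be fed directly from `open("words.txt", encoding="utf-8")`, where `words.txt` is a placeholder file name.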

Useful links

License