LinguaLibre

Difference between revisions of "About"

m (stats update)
 
(47 intermediate revisions by 9 users not shown)
Latest revision (wikitext):

<div class="section gap-l">
<div class="columns v-center padded-m">
<div>
<languages/>
<translate>
<!--T:12-->
'''Lingua Libre''' is a project of the association '''''Wikimédia France''''' which aims to build a collaborative, multilingual, ''audiovisual corpus'' under a free licence in order to:
* ''Expand knowledge'' '''''about languages''''' and '''''in languages''''' in an audiovisual way on the web, both on Wikimedia projects and beyond;
* ''Support the development'' of '''''online language communities''''', particularly those of poorly endowed, minority, regional, oral or signed languages, in order to help these communities access online information and to ensure the vitality of their languages.
</translate>
</div>
<div style="text-align: center;">
[[File:Lingua libre illustration - interface.svg|frameless|440px]]
</div>
</div>
</div>

<div class="section section-blue gap-s">
<div class="columns v-center">
<span style="font-size: 35px; line-height: normal;">
<translate><!--T:13-->
Already '''<tvar|num1>{{formatnum:2000}}</> members''' and '''<tvar|num2>{{formatnum:1250000}}</>+ recordings''' on Lingua Libre. Join us!</translate>
</span>
<div style="margin-top: -5px; text-align: center;">
[[Special:RecordWizard|<span class="mw-ui-button mw-ui-neutral" role="button" aria-disabled="false"><translate><!--T:20--> Record your voice</translate></span>]]
</div>
</div>
</div>

<div class="section gap-l">
<div class="columns v-center padded-m">
<div style="text-align: center;">
[[File:Music-technology-guitar-microphone-studio-amplifier-846852-pxhere (cropped).jpg|frameless|440px|class=shadow]]
</div>
<div>
<translate>== How to participate? == <!--T:14-->

<!--T:21-->
You can use '''Lingua Libre''' to explore and reuse recordings, contribute to the corpus by recording words, or improve the website itself in consultation with the community.

<!--T:15-->
The '''''Record Wizard''''' tab lets you record short audio clips (one word or one phrase), categorize them and publish them on '''Wikimedia Commons''' from a computer or smartphone. To do so, you will need to '''[<tvar|login>https://lingualibre.org/index.php?title=Special:UserLogin&returnto=Special%253AMyLanguage%252FLinguaLibre%253AAbout&returntoquery=title%3DSpecial%253AMyLanguage%252FLinguaLibre%253AAbout</> log in]''' or create a user account. The user guide is available on the help page.

<!--T:16-->
To modify website pages, simply log in and click Modify. Adding a page takes two steps: enter the title of the page you wish to create, with the prefix "LinguaLibre:", in the search engine; a message will then appear inviting you to create the page. For any substantial change, please consult the community beforehand.
</translate>
</div>
</div>
</div>

<div class="section section-grey gap-m">
<div class="columns padded-m v-center">
<div>
<translate>==== Interact with the community ==== <!--T:17-->

<!--T:22-->
Do not hesitate to tell the team about anything that could be improved. Discussions take place in the Chat room, on the mailing list or on Discord.

</translate>
</div>
<div style="text-align: right;">
[https://discord.gg/Bqn3yXCp89 <span class="mw-ui-button mw-ui-neutral" role="button" aria-disabled="false" style="margin-right: 15px; margin-bottom: 11px;"><translate><!--T:23--> Discord</translate></span>]
[https://meta.wikimedia.org/wiki/Special:MyLanguage/Lingua_Libre <span class="mw-ui-button mw-ui-neutral" role="button" aria-disabled="false" style="margin-bottom: 11px;"><translate><!--T:24--> Project on Meta</translate></span>]
<br>
[https://lingualibre.org/wiki/LinguaLibre:Chat_room <span class="mw-ui-button mw-ui-neutral" role="button" aria-disabled="false" style="margin-right: 15px;"><translate><!--T:25--> Chat room</translate></span>]
[https://phabricator.wikimedia.org/tag/lingua_libre/ <span class="mw-ui-button mw-ui-neutral" role="button" aria-disabled="false" style="margin-right: 15px;"><translate><!--T:26--> Phabricator</translate></span>]
[https://github.com/lingua-libre <span class="mw-ui-button mw-ui-neutral" role="button" aria-disabled="false"><translate><!--T:27--> GitHub</translate></span>]
</div>
</div>
</div>

<div class="section section-white gap-l">
<translate>
== Why participate? == <!--T:18-->
<!--T:28-->
Lingua Libre stems from the observation of several gaps on Wikimedia projects and on the web in general:
<!--T:29-->
* Lack of diversity: While the web is in theory open to everyone, its content is far from representing all languages proportionally. More than 50% of websites are in English; only 301 of the world's 7,000+ languages have a free encyclopedia <tvar|1><sup>[https://w3techs.com/technologies/overview/content_language/all <nowiki>[1]</nowiki>]</sup></>, and the content of these encyclopedias is inferior in quality and quantity to that of encyclopedias in better-endowed languages, such as the English Wikipedia<tvar|2><sup>[https://w3techs.com/technologies/overview/content_language/all <nowiki>[1]</nowiki>]</>,<tvar|3>[https://athenaeum.libs.uga.edu/handle/10724/37877 <nowiki>[2]</nowiki>]</sup></>. In addition, these websites host content that broadly reflects and meets Western standards and needs through the medium of the written word, which explains and helps to perpetuate their lack of linguistic diversity.
<!--T:30-->
* Lack of orality: Although languages are essentially spoken (only 4,000 of the world's 7,000 languages have a writing system)<tvar|4><sup>[https://www.ethnologue.com/enterprise-faq/how-many-languages-world-are-unwritten-0 <nowiki>[4]</nowiki>]</sup></>, knowledge sharing and communication via new information and communication technologies (NICTs) take place mainly in writing, particularly on the web, despite the rich multimedia formats the web allows. This mediation of the oral through the written word raises many barriers to contribution, such as the use of Unicode characters, the culture of the written word, the orthographic standardisation of the language or the literacy rate of the community.
<!--T:31-->
* This lack of diversity and orality limits the ability of Internet users to communicate and contribute on web platforms where they cannot find content or communities sharing their language. Among regional minority languages that are oral or signed, it particularly threatens the poorly endowed ones, many of which are currently in danger of extinction and for which inclusion on the web is both a major challenge and an opportunity.
<!--T:32-->
* Indeed, of the 7,000 languages in existence today, it is estimated that only 2,500 will survive into the next century and only 250 (less than 5%!) will make their digital ascent (i.e. be used regularly for communication in the digital space by native speakers who are comfortable on the web), a step that is nonetheless essential for their vitality<tvar|5><sup>[https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0077056 <nowiki>[5]</nowiki>]</sup></>. Current initiatives by linguists and activists to document and share data, resources and content online in the languages to be preserved do not directly contribute to the development of a digitally ascendant linguistic community of Internet users, and thus remain limited in their impact.
<!--T:33-->
* Lingua Libre aims to make up for this lack of support by serving linguistic communities that wish to bring their language into the digital space and promote it there, exploring alternatives to the written word in the hope of unlocking online communication in a growing number of languages. By its very nature, this objective favours poorly endowed regional minority languages, whether oral or signed, but it also benefits better-endowed languages that wish to highlight their oral and visual aspects. To fulfil its mission, Lingua Libre offers an online solution for mass recording, leading to the publication of a collaborative, multilingual audiovisual corpus under a free licence. This corpus serves two purposes: information, through consultation, and revitalisation, by prompting new language communities to contribute, first on Lingua Libre and then beyond.</translate>
<center>{{2020 Coolest tool award|Lingua Libre|Diversity}}</center>
</div>

<div class="section gap-m">
<h2 style="text-align: center;"><translate><!--T:19-->
Partners</translate></h2>
<gallery mode="packed" heights=180>
File:Ministere Culture soutient.png|link=https://www.culture.gouv.fr/Thematiques/Langue-francaise-et-langues-de-France
</gallery>
<gallery mode="packed" heights=150>
File:lo congres.jpg|link=https://locongres.org/
File:Mdlnc.png|link=https://www.mncparis.fr/
File:olca.png|link=https://www.olcalsace.org/
</gallery>
</div>
__NOTOC__
__NOEDITSECTION__

Earlier revision (wikitext):

<languages/>
Draft. To improve.

<translate>
<!--T:1-->
'''Lingua Libre''' is an audio recording tool as well as a sound library designed by Wikimedians to improve several Wikimedia projects (Wiktionaries, Wikipedias, Wikimedia Commons, Wikidata...).

'''LinguaLibre.fr''' is a massive open audio recording platform and web application that eases mass recording of word lists or texts into clean, well-cut, well-named, app-friendly audio files. It was designed from the start to ease the creation of consistent datasets of audio files. We believe it is the best tool available for creating datasets of a few dozen to several thousand audio files. Recording productivity can reach up to 1,000 audio recordings per hour, given a clean word list and an experienced user. Lingua Libre has received kick-starter funding from both [https://www.wikimedia.fr/ Wikimedia France] and the [https://wikimediafoundation.org/ Wikimedia Foundation]'s grant projects. Today, it is actively used by the Wikimedia community and maintained by passionate contributors as an open source project.
</translate>
__NOTOC__
== Background ==
* '''Shtooka Recorder''' (2010) by Nicolas Vion - a notable desktop application which had a deep impact on the open audio recording ecosystem. Hundreds of applications use data produced by this software.
* '''SWAC Recorder''' (2013) by Nicolas Vion - a revamp of the earlier tool; lesser known, but easier to install and with a better user experience.
* '''LinguaLibre.fr v1''' (2016) by Nicolas Vion - a cloud-based variation of the earlier versions. The project was funded by Wikimedia France (Remy Gerbet & [[user:Lyokoi]]) and created with feedback from local linguistics academics. The grant was associated with a project to record and preserve endangered minority languages of France. Available in French only, this platform was demoed to the global Wikimedia community and demonstrated the need for a v2.
* '''LinguaLibre.fr v2''' (2018) by [[user:0x010C|0x010C]] - a full rebuild using Wikibase and OAuth login to better integrate with the Wikimedia ecosystem. It can be used by all communities thanks to a user interface available in several major languages (EN, FR, ES, ...). The clean, sharp audio files ease the creation or enhancement of various derivative applications; language learning and language preservation are common use cases.

== Functionalities ==
To provide very consistent, app-friendly files, the current functionalities are:
* [x] easy to use, with no download or installation, via LinguaLibre.fr
* [x] speaker profiles, with language, gender, age, origin and a few other data points recommended to us by linguists
* [x] word list support
* [x] intuitive interface showing a live audio curve while speaking
* [x] on-demand roll-back using the left arrow key
* [x] automatic roll-back / redo when saturation is detected
* [x] consistent cuts before / after each spoken word
* [x] automatic equalization of the sound level
* [x] download of all recordings by language or by speaker
* [x] English user interface, also available in various other languages
* [x] OAuth login via a Wikimedia account
* [x] automatic upload to Wikimedia Commons
* [x] automatic integration into Wikimedia projects via [[Help:Bots|bots]]

Wishlist (secondary):
* [ ] Noise reduction [#29](./issues/29)
* [ ] Fade-in / fade-out [#40](./issues/40)

== Equipment (recommendation) ==
* Silent room / recording studio
* 1 x [https://www.amazon.com/dp/B01E6T54E2/ Scarlett2 Solo Studio Pack 2nd Generation], a portable kit comprising:
** 1 x microphone
** 1 x pair of headphones
** 1 x external sound card
** 1 x set of cables
* [https://www.amazon.com/dp/B01KHMUQ2M/ Microphone add-ons]:
** 1 x pod / arm stand
** 1 x anti-pop filter
** 1 x anti-vibration mount
* 1 x modest PC (the audio recording chain is external)
* Internet connection

'''Cost:''' US$250 for external audio equipment + US$300 for an optional PC = US$250 to US$550.

<p align="center">
<a href="https://www.amazon.com/dp/B01E6T54E2/"><img src="https://i.stack.imgur.com/dvreq.jpg" alt="Audio hardware" style="width:400px;"/></a>
</p>

== Working process ==
# Data gathering: prepare a text file with a list of words/sentences, one per line.
# Speaker: find a willing speaker.
# Facility: find a quiet studio or room.
# Hardware installation: set up the equipment in the room so as to work comfortably.
# Software settings: connect to LinguaLibre.fr's studio and adjust the settings to your needs.
# Recording: start your high-quality mass audio recording. '''800 items per hour for 2 hours in a row''' is a fair pace.
# Applications: be creative, invent your apps! :D

== <translate><!--T:2--> Useful links</translate> ==
<translate>
<!--T:3-->
* IRC channel: <code>#lingualibre</code> on Freenode ([https://kiwiirc.com/client/irc.freenode.net/#lingualibre join with Kiwiirc from a web browser])
* Phabricator: https://phabricator.wikimedia.org/project/profile/3393/ for issue/bug tracking
* Code: https://github.com/lingua-libre on GitHub
* Twitter: https://twitter.com/LingLibre_WMFr (mainly in French)
</translate>

== License ==
* All content is available under [https://creativecommons.org/licenses/by-sa/4.0/ Creative Commons CC-BY-SA-4.0]

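The earlier revision's Functionalities list mentions three pieces of audio processing: automatic roll-back when saturation is detected, consistent cuts before and after each word, and automatic equalization of the sound level. The page does not describe how Lingua Libre implements any of these. The Python sketch below shows one simple way each could work on a mono clip stored as floats in [-1.0, 1.0]. Several parts are assumptions made for illustration: the function names (`is_saturated`, `trim_silence`, `normalize_peak`), the amplitude thresholds, and the choice of peak normalization over, say, loudness normalization.

```python
from typing import List, Sequence

# Hypothetical helpers: samples are assumed to be floats in [-1.0, 1.0].
# Thresholds are illustrative defaults, not Lingua Libre settings.


def is_saturated(samples: Sequence[float], clip_level: float = 0.99,
                 max_clipped_ratio: float = 0.001) -> bool:
    """Flag a take for re-recording if too many samples hit full scale."""
    if not samples:
        return False
    clipped = sum(1 for s in samples if abs(s) >= clip_level)
    return clipped / len(samples) > max_clipped_ratio


def trim_silence(samples: Sequence[float], threshold: float = 0.02,
                 padding: int = 0) -> List[float]:
    """Cut quiet samples before and after the spoken word, keeping
    `padding` samples of margin on each side."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    start = max(loud[0] - padding, 0)
    end = min(loud[-1] + padding + 1, len(samples))
    return list(samples[start:end])


def normalize_peak(samples: Sequence[float],
                   target_peak: float = 0.9) -> List[float]:
    """Scale the clip so its loudest sample reaches `target_peak`."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


take = [0.0, 0.01, 0.3, -0.45, 0.2, 0.005, 0.0]
if is_saturated(take):
    print("saturation detected: record this item again")
else:
    word = normalize_peak(trim_silence(take, padding=1))
    print(len(word))  # 5
```

Applying a fixed threshold and a fixed padding to every take is what makes the cuts consistent from one recording to the next. A production tool would more likely measure energy over short windows than test individual samples.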
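The wishlist in the earlier revision asks for fade-in / fade-out (issue #40). One hedged way to do this is a linear fade, which ramps the gain over the first and last few samples of a clip. The function `apply_fades` and its sample-count parameters are hypothetical and not part of Lingua Libre. A real implementation might prefer a curved ramp.

```python
from typing import List, Sequence


def apply_fades(samples: Sequence[float], fade_in: int,
                fade_out: int) -> List[float]:
    """Apply a linear fade-in over the first `fade_in` samples and a
    linear fade-out over the last `fade_out` samples (hypothetical helper)."""
    out = list(samples)
    n = len(out)
    fade_in = min(fade_in, n)
    fade_out = min(fade_out, n)
    for i in range(fade_in):
        out[i] *= i / fade_in           # gain rises from 0 to (k-1)/k
    for i in range(fade_out):
        out[n - 1 - i] *= i / fade_out  # the very last sample ends at gain 0
    return out


print(apply_fades([1.0] * 6, fade_in=2, fade_out=2))
# [0.0, 0.5, 1.0, 1.0, 0.5, 0.0]
```

At a 48 kHz sample rate, for instance, `fade_in=480` would correspond to a 10 ms fade.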
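Two steps of the earlier revision's working process can be combined into a small planning helper:

* Step 1 asks for a text file with one word or sentence per line.
* Step 6 suggests a pace of 800 items per hour for two hours in a row, i.e. about 1,600 items per session.

The sketch below parses such a list and estimates the recording time. Skipping blank lines and dropping duplicates are assumptions about what a "clean word list" means. `load_word_list` and `hours_needed` are hypothetical names, not part of Lingua Libre.

```python
from typing import List


def load_word_list(text: str) -> List[str]:
    """Parse a word list with one word or sentence per line.

    Blank lines are skipped and duplicates dropped (first occurrence kept),
    so each item is recorded only once.
    """
    seen = set()
    items = []
    for line in text.splitlines():
        item = line.strip()
        if item and item not in seen:
            seen.add(item)
            items.append(item)
    return items


def hours_needed(item_count: int, items_per_hour: int = 800) -> float:
    """Estimate recording time at a steady pace (800 items/hour by default)."""
    return item_count / items_per_hour


words = load_word_list("bonjour\nmerci\n\nbonjour\nau revoir\n")
print(words)               # ['bonjour', 'merci', 'au revoir']
print(hours_needed(1600))  # 2.0
```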
Latest revision as of 18:43, 11 June 2024

Lingua Libre is a project of the association Wikimédia France which aims to build a collaborative, multilingual, audiovisual corpus under a free licence in order to:

  • Expand knowledge about languages and in languages in an audiovisual way on the web, both on Wikimedia projects and beyond;
  • Support the development of online language communities, particularly those of poorly endowed, minority, regional, oral or signed languages, in order to help these communities access online information and to ensure the vitality of their languages.

Already 2,000 members and 1,250,000+ recordings on Lingua Libre. Join us!

How to participate?

You can use Lingua Libre to explore and reuse recordings, contribute to the corpus by recording words, or improve the website itself in consultation with the community.

The Record Wizard tab lets you record short audio clips (one word or one phrase), categorize them and publish them on Wikimedia Commons from a computer or smartphone. To do so, you will need to log in or create a user account. The user guide is available on the help page.

To modify website pages, simply log in and click Modify. Adding a page takes two steps: enter the title of the page you wish to create, with the prefix "LinguaLibre:", in the search engine; a message will then appear inviting you to create the page. For any substantial change, please consult the community beforehand.

Interact with the community

Do not hesitate to tell the team about anything that could be improved. Discussions take place in the Chat room, on the mailing list or on Discord.

Why participate?

Lingua Libre stems from the observation of several gaps on Wikimedia projects and on the web in general:

  • Lack of diversity: While the web is in theory open to everyone, its content is far from representing all languages proportionally. More than 50% of websites are in English; only 301 of the world's 7,000+ languages have a free encyclopedia [1], and the content of these encyclopedias is inferior in quality and quantity to that of encyclopedias in better-endowed languages, such as the English Wikipedia[1],[2]. In addition, these websites host content that broadly reflects and meets Western standards and needs through the medium of the written word, which explains and helps to perpetuate their lack of linguistic diversity.
  • Lack of orality: Although languages are essentially spoken (only 4,000 of the world's 7,000 languages have a writing system)[4], knowledge sharing and communication via new information and communication technologies (NICTs) take place mainly in writing, particularly on the web, despite the rich multimedia formats the web allows. This mediation of the oral through the written word raises many barriers to contribution, such as the use of Unicode characters, the culture of the written word, the orthographic standardisation of the language or the literacy rate of the community.
  • This lack of diversity and orality limits the ability of Internet users to communicate and contribute on web platforms where they cannot find content or communities sharing their language. Among regional minority languages that are oral or signed, it particularly threatens the poorly endowed ones, many of which are currently in danger of extinction and for which inclusion on the web is both a major challenge and an opportunity.
  • Indeed, of the 7,000 languages in existence today, it is estimated that only 2,500 will survive into the next century and only 250 (less than 5%!) will make their digital ascent (i.e. be used regularly for communication in the digital space by native speakers who are comfortable on the web), a step that is nonetheless essential for their vitality[5]. Current initiatives by linguists and activists to document and share data, resources and content online in the languages to be preserved do not directly contribute to the development of a digitally ascendant linguistic community of Internet users, and thus remain limited in their impact.
  • Lingua Libre aims to make up for this lack of support by serving linguistic communities that wish to bring their language into the digital space and promote it there, exploring alternatives to the written word in the hope of unlocking online communication in a growing number of languages. By its very nature, this objective favours poorly endowed regional minority languages, whether oral or signed, but it also benefits better-endowed languages that wish to highlight their oral and visual aspects. To fulfil its mission, Lingua Libre offers an online solution for mass recording, leading to the publication of a collaborative, multilingual audiovisual corpus under a free licence. This corpus serves two purposes: information, through consultation, and revitalisation, by prompting new language communities to contribute, first on Lingua Libre and then beyond.

Lingua Libre: 2020 Coolest Tool Award winner in the category Diversity

Partners