LinguaLibre

Difference between revisions of "2022-2023 projection"

(5 intermediate revisions by the same user not shown)
This week, Wikimedia France is establishing its budget for the period of July 2022 to June 2023.
#REDIRECT [[:meta:User:Adélaïde Calais WMFr/2022-2023 Lingualibre wishlist]]
 
''This page has been moved to [[:meta:User:Adélaïde Calais WMFr/2022-2023 Lingualibre wishlist]], where the Visual Editor is available. Please contribute [https://meta.wikimedia.org/w/index.php?title=User:Adélaïde_Calais_WMFr/2022-2023_Lingualibre_wishlist&veaction=edit here].''
Please share here what you think we should get done this year on Lingua Libre. Feel free to add projects of yours that would require funding, as well as bugs and foreseeable technical needs. Please remember to link Phabricator tickets to the bugs and technical issues you raise. A maximum of 10 suggestions per person would be best.
 
{| class="wikitable"
 
|+Write your suggestions here
 
! colspan="2" |Submitted by
 
! colspan="3" |Definition & evaluation
 
! colspan="2" |Estimated costs
 
|-
 
!Username
 
!Keep me informed?
 
!What: project or problem
 
!Importance, Impact
 
!Priority for 2022-2023
 
!Human resources
 
!Budget
 
|-
 
|Rdrg109
 
|Yes
 
|Program for extracting sentences from any audio stream for their inclusion in Lingua Libre.
 
|Each extracted audio clip would correspond to one sentence. Each sentence could be added to lexemes as a [[wikidata:Property:P5831|"usage example"]]. Having [[wikidata:Property:P5831|usage examples]] with [[wikidata:Property:P443|pronunciation audio]] makes Wikidata lexicographical data more useful. With SPARQL, we could then answer queries such as: [[wikidata:Property:P5831|usage examples]] with [[wikidata:Property:P443|pronunciation audio]] that were retrieved from interviews where the participant is a native speaker of the language. More information about this idea is available on [[wikidata:User:Rdrg109/0/13#Program_for_extracting_sentences_from_any_audio_stream_for_their_inclusion_in_Lingua_Libre|this page]].
 
|
 
|3 months
 
|Unknown (I have little experience with MediaWiki development so it will be more of a learning experience)
 
|-
 
|Rdrg109
 
|Yes
 
|Interface in Lingua Libre that focuses on adding pronunciation audio that is missing (forms and [[wikidata:Property:P5831|usage examples]] that have zero pronunciation audios)
 
|Many entries in Wikidata lexicographical data are missing [[wikidata:Property:P443|pronunciation audio]]. As of 2022/03/18 22:24:21 UTC, only 1 [[wikidata:Property:P5831|usage example]] has pronunciation audio. English has 129942 forms, but only 340 have pronunciation audio (i.e. ~0.26% of English forms); the situation is similar for other languages. More statistics are available on [[wikidata:User:Rdrg109/0/13#Interface_in_Lingua_Libre_that_focuses_on_adding_pronunciation_audio_that_is_missing|this page]].
 
|
 
|2 months
 
|Unknown (I have little experience with MediaWiki development so it will be more of a learning experience)
 
|-
 
|marreromarco
 
|Yes (with feedback/ideas). I am not a programmer, but I would like to provide as much feedback as possible and report bugs.
 
|Improving the search function to make LinguaLibre useful for language learning
 
|The current user interface makes it impossible to use Lingua Libre for language learning as a competitor to Forvo. Without language learners interested in the project, very few people would be interested in contributing, since the audio would sit in a database with no practical use. LinguaLibre could be a FOSS alternative to Forvo that lets people listen to recordings easily and quickly. It is important to solve this problem in 2022-2023 to attract more contributors and expand the number of recordings. Otherwise, only a few "Wikimedians" would collaborate, and the database would never grow.
 
|
 
|6 months
 
|30.000 Euros (cost of hiring a full time developer with experience)
 
|-
 
|marreromarco
 
|Yes (with feedback/ideas)
 
|Public Relations (PR) Campaign
 
|LinguaLibre is essentially unknown among language learners. In its current state, the project has no way to attract learners because it lacks an efficient search function. If LinguaLibre could hire a developer to improve the search function, it would then be necessary to promote the website to attract language learners (and new contributors). An efficient way to promote the website is to write posts on blogs, social media, magazines and newspapers, create YouTube videos, etc. A PR campaign is necessary in 2022-2023 to increase the number of active contributors and become a viable FOSS alternative to Forvo.
 
|
 
|6 months
 
|6.000 Euros (Cost of hiring an intern to work at WikimediaFrance Headquarters)
 
|-
 
|marreromarco
 
|Yes (with feedback/ideas).
 
|Anki Integration with LinguaLibre
 
|An Anki Add-on would be helpful for language learners
 
|
 
|3 months
 
|15.000 Euros (depends on the number of hours that a developer would have to invest)
 
|-
 
|marreromarco
 
|Yes (with feedback/ideas).
 
|Add function to "Request" a Pronunciation to Native Speakers
 
|It is very useful for language learners to be able to request the specific word or phrase whose pronunciation they are unsure of. Forvo offers this function and users make very creative requests. It is especially helpful for technical terms and proper names.
 
|
 
|3 months
 
|15.000 Euros (depends on the number of hours that a developer would have to invest)
 
|-
 
|marreromarco
 
|Yes
 
|Establish a “Month of Voices” on Wikipedia
 
|Propose to Wikimedia Headquarters the development of a "Month of Voices" in which LinguaLibre would be promoted on Wikipedia Articles in the Section of "Languages" at the left side of the Main Page. The idea was discussed previously: https://lingualibre.org/wiki/LinguaLibre:Events/Winter_2021-2022_Public_Relations_Campaign
 
|
 
|6 months
 
|6000 Euros (Payment of an Intern in charge of the PR Campaign)
 
|-
 
|Poslovitch
 
|Yes (can actually do this)
 
|Improve the Datasets page
 
|The Datasets index is unsightly and at best off-putting for people wanting to reuse our recordings through the datasets. We could take some inspiration from [https://commonvoice.mozilla.org/fr/datasets CommonVoice's], especially regarding statistics for each dataset.
 
|
 
|1 week
 
|< 400 € (whether we rely on a pro or a volunteer dev)
 
|-
 
|0x010C
 
|Yes
 
|New tools for our power-users
 
|Lingua Libre lacks a couple of tools to help experienced users perform maintenance tasks:
 
- patrolling
 
- batch-editing metadata
 
- batch importing records (like the one we had on LinguaLibre v1)
 
- ...
 
Those tools could be directly integrated as new special pages into the RecordWizard MediaWiki-extension.
 
|
 
|~3 months
 
|15000 €
 
|-
 
|0x010C
 
|Yes
 
|Allow users to easily explore our fantastic audio-database
 
|Since we launched v2 of this website in July 2018, hardly anything has changed, with one major exception: QueryViz, the extension used to display SPARQL queries inside wiki pages. Now that Lingua Libre has almost 700,000 audio recordings in its database, it would be good to take the time to improve this extension so that everyone can explore our dataset in an easy-to-use, responsive and powerful online interface. This will have the side effect of attracting more people to the website, thereby increasing public awareness of the tool and the number of contributors.
 
|
 
|~3 months
 
|15000€
 
|-
 
|0x010C
 
|Yes
 
|Global MediaWiki upgrade
 
|Time goes by and MediaWiki versions increase. If the schedule is respected, the next LTS version (1.39) will be released in November 2022. At that point we will have to think about migrating to stay up to date and keep our users safe. This will involve small but numerous adjustments in LinguaLibre-specific extensions.
 
Beyond that, many improvements could still be made to the user experience on our MediaWiki: the main search bar, the lack of a Visual Editor, special pages and the wikicode-editing UI ([[Special:Search]], [[Special:Recent changes]],...), etc.
 
|
 
|~1.5 months
 
|8000€
 
|-
 
|0x010C
 
|Yes
 
|RecordWizard improvements
 
|There is plenty of work to do on the RecordWizard to improve its user experience: mitigating several major bugs, upgrading the word-list generator's capabilities, enhancing the first step, and adding new features such as automatic audio corrections, quality checks and support for URL parameters.
 
|
 
|~4 months
 
|20000€
 
|-
 
|Poslovitch
 
|Yes (can do this, but would need help from Micka)
 
|Update to MediaWiki 1.35.5
 
|MediaWiki 1.35 has received a few security releases since last year. Proper mitigations were applied in due time, but mitigations are never as foolproof as a proper upgrade!
 
|
 
|1 week
 
|Unknown
 
|-
 
|Poslovitch
 
|Yes (can do this, but would need help from Micka)
 
|Update MLEB
 
|'''M'''ediaWiki '''L'''anguage '''E'''xtension '''B'''undle is a pack of extensions that should be updated "as a group" and not individually (an attempt to update them individually in December did not succeed). As raised in [[phab:T295250|T295250]], updating the MLEB would allow the use of a "tvar" syntax (which I'm unfamiliar with).
 
|
 
|1 week
 
|Unknown
 
|-
 
|Poslovitch
 
|Yes (but would need help from Micka)
 
|Set up a live testing environment for development and testing of new features in real conditions of use
 
|Started back in July 2021, this has not yet been completed. This testing environment would allow the Tech Team to make sure changes to the RecordWizard (and other extensions) do not risk causing issues downstream.
 
|
 
|1 month
 
|Unknown
 
|-
 
|Poslovitch
 
|Yes
 
|Implement the Lists suggestions from July 2021's Hackathon
 
|Ideas from July 2021's Hackathon would improve the UX for lists and make them easier to discover.
 
|
 
|3 months
 
|Unknown
 
|-
 
|Languageseeker
 
|Yes
 
|Pull common linguistic data from Wiktionaries to Wikidata
 
|Some linguistic data (part of speech, pronunciation, conjugation, etc.) is universal, and it would be useful to be able to pull it from Wikidata. However, most of it is currently entered manually on Wiktionaries. This project would pull these common pieces into Wikidata. Part of it would involve developing a system for representing linguistic data in Wikidata. It would also enable the disambiguation of heteronyms.
 
|
 
|3 months
 
|Unknown.
 
|}
 

Latest revision as of 08:28, 2 June 2022

This page has been moved to meta:User:Adélaïde Calais WMFr/2022-2023 Lingualibre wishlist, where the Visual Editor is available. Please contribute here.