LinguaLibre

Difference between revisions of "2022-2023 projection"


Revision as of 17:59, 21 May 2022

This week, Wikimedia France is establishing its budget for the period of July 2022 to June 2023.

Please share here what you think we should get done this year on Lingua Libre. Feel free to add projects of yours that would require funding, as well as bugs and foreseeable technical needs. Please remember to link Phabricator tickets to the bugs and technical issues you raise. A maximum of 10 suggestions per person would be best.

Write your suggestions here
{| class="wikitable"
! Username !! Project or problem !! Reason why this is important to accomplish or solve in 2022-2023 !! Approximate human time needed !! Estimated budget !! Would you like to be involved in this?
|-
| Rdrg109 || Program for extracting sentences from any audio stream for their inclusion in Lingua Libre. Each extracted audio would correspond to a sentence. || Each sentence could be added to lexemes as a "usage example". Having usage examples with pronunciation audio makes Wikidata lexicographical data more useful. With SPARQL, we could then answer questions such as: which usage examples have pronunciation audio retrieved from interviews where the participant is a native speaker of that language? More information about this idea on this page. || 3 months || Unknown (I have little experience with MediaWiki development so it will be more of a learning experience) || Yes
|-
| Rdrg109 || Interface in Lingua Libre that focuses on adding pronunciation audio that is missing (forms and usage examples that have zero pronunciation audios) || Many entries in Wikidata lexicographical data are missing pronunciation audio. As of 2022/03/18 22:24:21 UTC, only 1 usage example has a pronunciation audio. English has 129942 forms, but only 340 have pronunciation audio (i.e. ~0.26% of English forms); the situation is the same for other languages. More statistics on this page. || 2 months || Unknown (I have little experience with MediaWiki development so it will be more of a learning experience) || Yes
|-
| marreromarco || Improving the search function to make LinguaLibre useful for language learning || The current user interface makes it impossible to use Lingua Libre for language learning as a competitor to Forvo. Without language learners interested in the project, very few people would be interested in contributing, since the audio would be stored in a database with no practical use. LinguaLibre could be a FOSS alternative to Forvo that allows people to listen to recordings easily and quickly. It is important to solve this problem in 2022-2023 to attract more contributors and expand the number of recordings. Otherwise, only a few "Wikimedians" would collaborate, and the database would never grow. || 6 months || 30.000 Euros (cost of hiring a full-time developer with experience) || Yes (with feedback/ideas). I am not a programmer, but I would like to provide as much feedback as possible and report bugs.
|-
| marreromarco || Public Relations (PR) Campaign || LinguaLibre is essentially unknown among language learners. In its current state, the project has no way to attract learners because it lacks an efficient “Search Functionality”. If LinguaLibre could hire a developer to improve the search function, it would then be necessary to promote the website to attract language learners (and new contributors). An efficient way to promote the website is to write posts on blogs, social media, magazines and newspapers, create YouTube videos, etc. A PR campaign is necessary in 2022-2023 to increase the number of active contributors and become a viable FOSS alternative to Forvo. || 6 months || 6.000 Euros (cost of hiring an intern to work at WikimediaFrance Headquarters) || Yes (with feedback/ideas)
|-
| marreromarco || Anki Integration with LinguaLibre || An Anki add-on would be helpful for language learners || 3 months || 15.000 Euros (depends on the number of hours that a developer would have to invest) || Yes (with feedback/ideas).
|-
| marreromarco || Add function to "Request" a Pronunciation to Native Speakers || It is very useful for language learners to be able to request the specific word or phrase whose pronunciation they are unsure about. Forvo offers such a function and users make very creative requests. It is also especially helpful for technical terms and proper names. || 3 months || 15.000 Euros (depends on the number of hours that a developer would have to invest) || Yes (with feedback/ideas).
|-
| marreromarco || Establish a “Month of Voices” on Wikipedia || Propose to Wikimedia Headquarters the development of a "Month of Voices" in which LinguaLibre would be promoted on Wikipedia articles in the "Languages" section on the left side of the Main Page. The idea was discussed previously: https://lingualibre.org/wiki/LinguaLibre:Events/Winter_2021-2022_Public_Relations_Campaign || 6 months || 6000 Euros (payment of an intern in charge of the PR campaign) || Yes
|-
| Poslovitch || Improve the Datasets page || The Datasets index is unsightly and, at best, off-putting for people wanting to re-use our recordings through the datasets. We could get some inspiration from CommonVoice's, especially regarding statistics for each dataset. || 1 week || < 400 € (whether we rely on a pro or a volunteer dev) || Yes (can actually do this)
|-
| 0x010C || New tools for our power-users || Lingua Libre lacks a couple of tools to help experienced users carry out a number of maintenance tasks:
* patrolling
* batch-editing metadata
* batch-importing records (like the one we had on LinguaLibre v1)
* ...
Those tools could be directly integrated as new special pages into the RecordWizard MediaWiki extension.
| ~3 months || 15000 € || Yes
|-
| 0x010C || Allow users to easily explore our fantastic audio-database || Since we launched v2 of this website in July 2018, hardly anything has changed, with one major exception: QueryViz, the extension used to display SPARQL queries inside wiki pages. Now that Lingua Libre has almost 700,000 audio recordings in its database, it would be good to take the time to improve this extension to allow everyone to explore our dataset in an easy-to-use, responsive and powerful online interface. This will have the side effect of attracting more people to the website, thereby increasing public awareness of the tool and the number of contributors. || ~3 months || 15000 € || Yes
|-
| 0x010C || Global MediaWiki upgrade || Time goes by and MediaWiki versions increase. If the schedule is respected, the next LTS version (1.39) will be released in November 2022. At that point we will have to think about migrating to stay up to date and keep our users safe. This will involve small but numerous adjustments in LinguaLibre-specific extensions.
Beyond that, there are still many possible improvements to the user experience on our MediaWiki: the main search bar, the lack of a Visual Editor, special pages and the wikicode-editing UI (Special:Search, Special:Recent changes, ...), etc.
| ~1.5 months || 8000 € || Yes
|-
| 0x010C || RecordWizard improvements || Mitigating several major bugs, upgrading the word list generator's capabilities, enhancing the first step, or adding new features like automatic audio corrections, quality checks and support for URL parameters: there is plenty of work to do on the RecordWizard to improve its user experience. || ~4 months || 20000 € || Yes
|-
| Poslovitch || Update to MediaWiki 1.35.5 || MediaWiki 1.35 has received a few security releases since last year. Proper mitigations were applied in due time, but mitigations are never as foolproof as a proper upgrade! || 1 week || Unknown || Yes (can do this, but would need help from Micka)
|-
| Poslovitch || Update MLEB || The MediaWiki Language Extension Bundle is a pack of extensions that should be updated "as a group" and not individually (attempting to update them individually in December did not succeed). As raised in T295250, updating the MLEB would allow the use of a "tvar" syntax (which I'm unfamiliar with). || 1 week || Unknown || Yes (can do this, but would need help from Micka)
|-
| Poslovitch || Set up a live testing environment for development and testing of new features in real conditions of use || Started back in July 2021, this has not yet been completed. This testing environment would allow the Tech Team to make sure changes to the RecordWizard (and other extensions) do not risk causing issues downstream. || 1 month || Unknown || Yes (but would need help from Micka)
|-
| Poslovitch || Implement the Lists suggestions from July 2021's Hackathon || Ideas from July 2021's Hackathon would improve the UX for lists and improve their discoverability || 3 months || Unknown || Yes
|-
| Languageseeker || Pull common linguistic data from Wiktionaries to Wikidata || Some linguistic data (part of speech, pronunciation, conjugation, etc.) is universal and would be useful to be able to pull from Wikidata. However, most of it is currently entered manually on Wiktionaries. This project would pull these common bits into Wikidata. Part of it would involve developing a system for representing linguistic data in Wikidata. It will enable the disambiguation of heteronyms. || 3 months || Unknown || Yes
|}