LinguaLibre

Weekly updates

Week 24

I've continued the work on linking Lingua Libre to the WM projects:

  • Finalize the Wikimedia Category word generator;
  • Develop a Lingua Libre Bot connector for the French Wiktionary
  • Request the bot flag on the French Wiktionary and take part in the discussions on the bot's implementation
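
A category-based word generator of this kind has to list the pages of a category on a Wikipedia or Wiktionary. A minimal sketch in Python, assuming the standard MediaWiki `list=categorymembers` API query; the helper name `category_members_url` and the example category are hypothetical, and the actual generator may well work differently:

```python
from urllib.parse import urlencode


def category_members_url(api_url, category, limit=500):
    """Build a MediaWiki API URL listing the pages of a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmnamespace": 0,   # main namespace only: the entries themselves
        "cmlimit": limit,
        "format": "json",
    }
    return f"{api_url}?{urlencode(params)}"


# Hypothetical example: an (assumed) category on the French Wiktionary.
url = category_members_url("https://fr.wiktionary.org/w/api.php", "Exemple")
```

The page titles returned by such a query would then become the words to record.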

Week 23

This week, I've focused on the reuse of sounds from Lingua Libre on WM wikis:

  • Create a new generator to extract word lists from WP / WT categories
  • Fix a categorization issue on the template on Commons
  • Bot:
    • Creation of Lingua Libre Bot
    • Create a new Git repository for it
    • Create a main structure, which manages command-line arguments, record fetching and dispatching
    • Develop a Wikidata connector
    • First discussions with the frwiktionary community to set up a bot connection for this wiki
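
The main structure described above (argument parsing, record fetching, dispatching to per-wiki connectors) could look roughly like the following Python sketch. Every name in it (`CONNECTORS`, `fetch_records`, the `--wiki` flag, the sample record) is hypothetical and chosen for illustration; none of it is taken from the bot's actual code.

```python
import argparse


def wikidata_connector(record):
    """Hypothetical connector: describe what would be done on Wikidata."""
    return f"wikidata: link {record['file']} to {record['word']}"


def frwiktionary_connector(record):
    """Hypothetical connector: describe what would be done on the French Wiktionary."""
    return f"frwiktionary: add {record['file']} to entry {record['word']}"


# One entry per target wiki; adding a connector means adding one line here.
CONNECTORS = {
    "wikidata": wikidata_connector,
    "frwiktionary": frwiktionary_connector,
}


def fetch_records():
    """Stand-in for record fetching; a real bot would query Lingua Libre."""
    return [{"word": "bonjour", "file": "LL-bonjour.wav"}]


def main(argv):
    parser = argparse.ArgumentParser(description="Dispatch records to wiki connectors")
    parser.add_argument("--wiki", choices=sorted(CONNECTORS), action="append",
                        help="connector to run (repeatable); default: all")
    args = parser.parse_args(argv)
    wikis = args.wiki or sorted(CONNECTORS)
    # Dispatch every fetched record to every selected connector.
    return [CONNECTORS[wiki](record) for record in fetch_records() for wiki in wikis]


actions = main(["--wiki", "wikidata"])
```

Keeping the connectors in a plain dictionary means the French Wiktionary connector can be added without touching the dispatching code.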

Week 22

  • Create a new Phabricator project
    • Issue centralisation and sorting
  • Enhance the UX of the publish step:
    • Replace the HTML5 <audio> player with a custom button
    • Add a remove button next to each record
    • Reword the next button in publish on Commons to be clearer
  • Support #-separated lists as input inside the word input field, in the details step
  • Manage homograph words by allowing contributors to add qualifiers inside brackets after the transcription of the word
  • Many tests and bug chasing
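
The two input conventions above (#-separated lists and bracketed qualifiers for homographs) can be handled by one small parser. A minimal Python sketch, assuming that the qualifier is written in parentheses at the end of each word (the text only says "brackets") and that surrounding whitespace is ignored; `parse_word_list` is a hypothetical name, not the RecordWizard's:

```python
import re

# A word optionally followed by a qualifier in parentheses, e.g. "fils (enfant)".
WORD_RE = re.compile(r"^(?P<word>.*?)\s*(?:\((?P<qualifier>[^()]*)\))?$", re.DOTALL)


def parse_word_list(raw):
    """Split a #-separated input into (transcription, qualifier) pairs."""
    entries = []
    for chunk in raw.split("#"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty items such as a trailing '#'
        match = WORD_RE.match(chunk)
        entries.append((match.group("word"), match.group("qualifier")))
    return entries


words = parse_word_list("chat # fils (enfant) # fils (cordes) #")
```

Here the two homographs "fils" stay distinct entries because their qualifiers differ, while "chat" gets no qualifier at all.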

Week 21

This weekend I attended the Wikimedia Hackathon in Barcelona.

  • Outreach:
    • Introduced Lingua Libre to many people, including WMDE and WMF teams;
    • Talked about the best way to integrate Lingua Libre with the upcoming Lexemes on Wikidata and Structured Data on Commons;
    • Made ~500 recordings of words in Catalan, French, Occitan and Fon-gbe;
    • Presented Lingua Libre during the showcase;
  • Tech:
    • Use Wikidata QIDs internally as language identifiers rather than ISO 639-3 codes;
    • Fix the first reported bugs;
    • Add a favicon
    • Enhance the Template:Lingua Libre record on Commons: add i18n support and machine-readable encoding
  • Move legacy version of Lingua Libre to https://v1.lingualibre.fr

Week 20

This week was focused on preparing the launch of the beta version.

  • Improve the help pages structure, and create the main ones;
  • Finish the microphone checker in the RecordWizard;
  • Set up content translation for the pages that need it;
  • Set up a new production environment:
    • Duplicate the code-base and configuration files;
    • Duplicate the database and clean up the test data inside it;
    • Import ~400 languages from Wikidata (more to come later);
  • Communication on Lingua Libre's social networks;

The first public beta version of Lingua Libre is out!

Week 19

  • Many tests and bug fixes;
  • OAuth:
    • R&D and enhancements around its configuration;
    • Ask for the needed tokens, to be able to connect and upload files on Wikimedia Commons;
  • Turn the first step of the RecordWizard into a microphone checker.

Week 18

This week was focused on code quality improvement (in anticipation of the public beta release):

  • Further document the code I've written so far;
  • Switch JS scripts to strict mode;
  • Plug ESLint into the project and fix all the lint errors.

Week 17

  • Some LinguaLibre outreach at the WikiWorkshop (held during the Web Conference in Lyon);
  • Many minor server configuration improvements;
  • Meeting with Remy from Wikimédia France to start thinking about a communication plan.


Week 16

  • Some LinguaLibre outreach at the Wikimedia pre-hackathon in Montpellier;
  • Create a Wikidata item selector;
    • Switch all location fields of the RecordWizard to use Wikidata items;
  • Enhance the SPARQL endpoint configuration.

Week 15

  • Turn the query display script into a MediaWiki extension, to improve performance and reusability:
    • Code available here: QueryViz on GitHub;
    • Add a new <query> tag to the parser;
    • See a usage example here (code).
    • Add some user-friendly filtering capabilities to the results, using an optional in-query syntax, for example:
      #extra:{"type": "wikibase-item", "filter":"Q3", "label": "P5", "multiple": true} ?record prop:P5 entity:[EXTRA] .
      
  • Create more turnkey SPARQL queries, see DataViz:Stats.
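
To illustrate the in-query filter syntax shown above, here is a rough Python sketch of how such a directive could be read and applied. It assumes that the `#extra:` directive sits on its own SPARQL comment line, that its payload is plain JSON, and that `[EXTRA]` is replaced by the entity picked by the user; the function names and the `Q42` value are hypothetical, and the real QueryViz extension may process the directive differently.

```python
import json

EXTRA_PREFIX = "#extra:"
PLACEHOLDER = "[EXTRA]"


def read_extra_options(query):
    """Return the JSON options of every '#extra:' comment line in the query."""
    options = []
    for line in query.splitlines():
        line = line.strip()
        if line.startswith(EXTRA_PREFIX):
            options.append(json.loads(line[len(EXTRA_PREFIX):]))
    return options


def apply_filter(query, value):
    """Replace the [EXTRA] placeholder with the value picked by the user."""
    return query.replace(PLACEHOLDER, value)


query = """SELECT ?record WHERE {
  #extra:{"type": "wikibase-item", "filter":"Q3", "label": "P5", "multiple": true}
  ?record prop:P5 entity:[EXTRA] .
}"""

options = read_extra_options(query)
filtered = apply_filter(query, "Q42")  # Q42 is an arbitrary, hypothetical item ID
```

Because the directive is a SPARQL comment, the endpoint itself ignores it; only the placeholder has to be filled in before the query is sent.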

Week 14

  • Set up a Blazegraph SPARQL endpoint, using the following documentation pages: [1], [2] and [3];
prefixes.conf
PREFIX entity: <https://v2.lingualibre.fr/entity/>
PREFIX prop: <https://v2.lingualibre.fr/prop/direct/>
PREFIX statement: <https://v2.lingualibre.fr/entity/statement/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX llp: <https://v2.lingualibre.fr/prop/>
PREFIX llv: <https://v2.lingualibre.fr/prop/statement/>
PREFIX llq: <https://v2.lingualibre.fr/prop/qualifier/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
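
As an illustration of what these declarations mean, the following Python sketch expands a prefixed name into the full IRI it abbreviates, and prepends explicit PREFIX lines to a query for endpoints that do not load prefixes.conf. Only four of the prefixes above are copied here, and the helper names `expand` and `with_prefixes` are hypothetical.

```python
# A subset of the prefixes declared in prefixes.conf above.
PREFIXES = {
    "entity": "https://v2.lingualibre.fr/entity/",
    "prop": "https://v2.lingualibre.fr/prop/direct/",
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
}


def expand(prefixed_name):
    """Turn 'prefix:local' into the full IRI it abbreviates."""
    prefix, _, local = prefixed_name.partition(":")
    return PREFIXES[prefix] + local


def with_prefixes(body):
    """Prepend explicit PREFIX declarations to a SPARQL query body."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items())
    return f"{header}\n{body}"


iri = expand("entity:Q3")
full_query = with_prefixes("SELECT ?record WHERE { ?record prop:P5 ?value . }")
```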

Week 13

  • Small bugfixes on the RecordWizard;
  • Start a testing period of the RecordWizard;
  • Back to the skin:
    • Switch from Timeless to Foreground, which is also responsive but lighter, closer to the visual identity we want and easier to customize;
    • Start to customize it to LinguaLibre's colors, and tweak the navigation bar.

Week 12

A big week to compensate for week 11. The RecordWizard has reached a first stable version at the end of this week. That will allow us next week to launch a first testing period on a small group.

  • create a second generator, based on lists in a new List: namespace;
  • deep reorganization of steps 2 and 3:
    • Before, we had a Details step, which contained both the speaker's information and the record parameters; and a License step, which contained only a license agreement;
    • This caused many issues, in terms of UX (we had to disable some fields until the user had filled in the previous ones), performance and code complexity (each user interaction had to be tracked to update or disable other fields) and code readability for the Details step (far too many elements in one class);
    • We now have a first Locutor step, which contains only the details about the speaker and their license agreement; and a Details step, which contains the record parameters. This organization is much more balanced, logical and easy to understand for the end user;
  • Add an optional list randomizer;
  • Add a license picker, using the standard MediaWiki:Licenses page as its configuration;
  • implement the locutor profile and add a profile picker;
  • consolidate the wikibase item saving process:
    • manage secondary locutors;
    • smart update if an item already exists (for the locutor and the records);
    • factorization and cleaning of some wikibase API interactions;
  • add a placeholder for the Tutorial step;
  • Translate all messages into French;
  • Correctly format the output sent to Wikimedia Commons (file titles and descriptions);
  • many small bug fixes during the whole week.

Week 11

For personal reasons, work was cut a little short this week; this was made up for in week 12. The work on the RecordWizard nevertheless continued:

  • R&D on the internal structure and mechanisms of Wikibase;
  • continue and finish the work around data search in Wikibase;
  • a Wikibase item is now created for each locutor and record (but it is not yet updated if it already exists);
  • create some needed structural Wikibase items.

Week 10

The work on the RecordWizard continues:

  • enrich the word-list structure to be able to store word-related metadata;
  • add LocalSettings options to let sysadmins configure the RecordWizard, instead of using hardcoded values;
  • start the connection with the Wikibase by pulling some data;
  • create a locutor profile structure and save it to the user preferences.

Week 9

The work on the RecordWizard continues:

  • UI research and implementation tests for the Details step;
  • Develop the concept of generator;
    • Technical design:
      • Generators are JS components intended to dynamically provide a list of words to record inside the RecordWizard;
      • They should inherit the generic mw.recordWizard.generator.Generator class;
      • They can be registered from anywhere, including site-wide scripts like MediaWiki:Common.js or gadgets, by adding them to the mw.recordWizard.generator object;
      • Each one can ask for user inputs inside a dialog box.
    • Implemented it inside the Details step;
    • Create a first generator to show off these capabilities, Nearby items, which fetches words from Wikidata items geographically close to the user.
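
The technical design above (a generic parent class, a shared object in which generators register themselves, and user input collected per generator) is sketched below in Python, purely as an analogy. The real generators are JS components built on `mw.recordWizard.generator.Generator`; the names `REGISTRY`, `register`, `fetch` and `StaticList` are hypothetical and do not reflect the actual JS API.

```python
class Generator:
    """Python stand-in for the generic parent class every generator inherits."""

    label = "Generic generator"

    def fetch(self, user_input):
        """Return the list of words to record; subclasses override this."""
        raise NotImplementedError


# Stand-in for the shared object in which generators register themselves.
REGISTRY = {}


def register(name, generator_class):
    """Make a generator available to the wizard under the given name."""
    if not issubclass(generator_class, Generator):
        raise TypeError("a generator must inherit from Generator")
    REGISTRY[name] = generator_class


class StaticList(Generator):
    """Hypothetical generator: the user types the words directly."""

    label = "Static list"

    def fetch(self, user_input):
        return [word.strip() for word in user_input.split(",") if word.strip()]


register("static", StaticList)
words = REGISTRY["static"]().fetch("un, deux, trois")
```

The point of the registry is that the wizard only needs to know the parent class: any script or gadget can contribute a new generator without the wizard's code changing.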

Week 8

The work on the RecordWizard continues:

  • Implement a controller and a UI for the studio step;
  • Plug the UploadManager class and the new Upload2Commons extension into the studio;
  • Improve the step management;
  • Improve the UX in particular through:
    • an auto-scroll
    • small voice amplitude graphs
    • keyboard controllable actions (start/stop with the spacebar, move to the next/previous word with the arrows,...);
  • Add full warning and error handling;
  • See the last 24 commits on the RecordWizard repository.

Week 7

  • Start to develop the core of the RecordWizard:
    • Create an UploadManager;
    • Set up a step controller and a step UI parent class (inspired by what the UploadWizard does);
    • Initialize 6 empty steps: tutorial, details, license, studio, confirm, thanks.
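
One way to picture how a wizard could move through these six steps is sketched below in Python. It is only an illustration: the `Step` and `Wizard` classes and their `load`/`unload` hooks are hypothetical and are not the RecordWizard's (JS) classes.

```python
STEP_NAMES = ["tutorial", "details", "license", "studio", "confirm", "thanks"]


class Step:
    """Hypothetical parent class: concrete steps would override load/unload."""

    def __init__(self, name):
        self.name = name
        self.active = False

    def load(self):
        self.active = True

    def unload(self):
        self.active = False


class Wizard:
    """Keeps exactly one step active and moves between neighbouring steps."""

    def __init__(self, names):
        self.steps = [Step(name) for name in names]
        self.index = 0
        self.steps[0].load()

    @property
    def current(self):
        return self.steps[self.index]

    def move(self, offset):
        """Move by `offset` steps; out-of-range moves are ignored."""
        target = self.index + offset
        if 0 <= target < len(self.steps):
            self.current.unload()
            self.index = target
            self.current.load()
        return self.current.name


wizard = Wizard(STEP_NAMES)
wizard.move(+1)  # tutorial -> details
wizard.move(+1)  # details -> license
```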

Week 6

  • Develop another extension, Upload2Commons, to add a new module to the API, which allows a user to upload a local file to a remote wiki through OAuth:
Doc from Special:ApiHelp/upload-to-commons

action=upload-to-commons

(main | upload-to-commons)
  • This module requires read rights.
  • This module requires write rights.
  • This module only accepts POST requests.
  • Source: Upload2Commons
  • License: GPL-2.0+

Upload a local wiki file to a remote wiki using OAuth.

The file must already be uploaded on the local wiki. Several methods are available:

  • Upload from an on-wiki file, using the localfilename parameter.
  • Upload from a stashed file, using the filekey parameter.

Note that the filename, comment, tags, text and ignorewarnings parameters are the same as in action=upload.

Parameters:
localfilename

Name of a file to upload (without the "File:" namespace).

filekey

Filekey of a stashed file to upload.

filename

Target filename.

comment

Upload comment.

tags

Change tags to apply to the upload log entry and file page revision on the remote wiki.

Separate values with | or alternative.
Maximum number of values is 50 (500 for clients allowed higher limits).
text

Initial page text for new files. If not specified, the page text of the local file will be used instead.

ignorewarnings

Ignore any warnings on the remote wiki.

Type: boolean (details)
removeafterupload

Remove the stashed file if the upload succeeded (doesn't delete any already on-wiki files, see action=delete for that).

Type: boolean (details)
logtags

Set custom tags to the remoteupload log entry.

Values (separate with | or alternative): manual-remote-upload, record-wizard
token

A "csrf" token retrieved from action=query&meta=tokens

This parameter is required.
  • add a feature in the oauthclient-php library: T186739
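
Based on the parameter documentation above, a client could assemble the POST body for action=upload-to-commons as in the following Python sketch. The helper name `build_upload_request` is hypothetical, and treating localfilename and filekey as mutually exclusive is an assumption drawn from the two upload methods listed; sending the request to the local wiki's api.php is left out.

```python
def build_upload_request(token, filename, localfilename=None, filekey=None,
                         comment="", text=None, ignorewarnings=False,
                         removeafterupload=False):
    """Build the POST parameters for action=upload-to-commons."""
    # Assumption: exactly one source (on-wiki file or stashed file) is given.
    if (localfilename is None) == (filekey is None):
        raise ValueError("pass exactly one of localfilename or filekey")
    params = {
        "action": "upload-to-commons",
        "format": "json",
        "filename": filename,
        "comment": comment,
        "token": token,  # a "csrf" token from action=query&meta=tokens
    }
    if localfilename is not None:
        params["localfilename"] = localfilename
    else:
        params["filekey"] = filekey
    if text is not None:
        params["text"] = text
    # The MediaWiki API treats a boolean parameter as true whenever it is present.
    if ignorewarnings:
        params["ignorewarnings"] = "1"
    if removeafterupload:
        params["removeafterupload"] = "1"
    return params


# Placeholder values: a real call needs a valid token and an existing stashed file.
params = build_upload_request(token="<csrf token>", filename="Example.wav",
                              filekey="<filekey>", removeafterupload=True)
```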

Week 5

  • Many enhancements to LinguaRecorder:
    • Add new export methods (Wav-encoded blob, client-download, <audio> element,...);
    • Comment the code and document the whole library;
    • Create a sandbox to easily demonstrate all the features;
    • Test and add support for old browsers (Firefox 25+, Chrome 22+,... see the complete list here).
  • Present LinguaLibre during the monthly Wikimedia Foundation metrics and activities meetings
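
To make the "Wav-encoded blob" export above more concrete, here is a Python sketch of what encoding raw samples as WAV involves. LinguaRecorder itself is a JS library, so this is not its code; the function name `encode_wav`, the mono 16-bit PCM format and the 44100 Hz sample rate are assumptions made for the example.

```python
import io
import math
import struct
import wave


def encode_wav(samples, sample_rate=44100):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    frames = bytearray()
    for sample in samples:
        clipped = max(-1.0, min(1.0, sample))      # avoid integer overflow
        frames += struct.pack("<h", int(clipped * 32767))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)                        # mono
        wav.setsampwidth(2)                        # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()


# 0.1 s of a 440 Hz test tone standing in for recorded audio.
tone = [math.sin(2 * math.pi * 440 * n / 44100) for n in range(4410)]
blob = encode_wav(tone)
```

The resulting bytes are a complete WAV file (header plus samples), which is what allows the same data to be offered as a download, played back, or uploaded.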

Week 4

Week 3

  • Set up the server with a fresh primary MediaWiki installation to use as a development environment, accessible at https://v2.lingualibre.fr.
  • Install the OAuthAuthentication extension to delegate the login to another wiki. Here are the settings currently in use inside the LocalSettings.php file:
LocalSettings.php configuration
# Remove the default TemporaryPassword and LocalPassword authentication provider
# to leave OAuth as the only usable authentication provider.
$wgAuthManagerAutoConfig['primaryauth'] = [];

# Activate the OAuthAuthentication extension
wfLoadExtension( 'OAuthAuthentication' );

$wgOAuthAuthenticationUrl = 'https://oauth.0x010c.fr/index.php?title=Special:OAuth';
$wgOAuthAuthenticationConsumerKey = '<consumer_key>';
$wgOAuthAuthenticationConsumerSecret = '<consumer_secret>';
$wgOAuthAuthenticationCanonicalUrl = 'https://oauth.0x010c.fr';
$wgOAuthAuthenticationRemoteName = 'OauthWiki';
$wgOAuthAuthenticationAllowLocalUsers = false;
$wgOAuthAuthenticationReplaceLoginLink = true;
  • Set up a secondary MediaWiki installation to be used by the first one as a remote authentication provider (to replace Wikimedia Commons during the development phase), accessible at https://oauth.0x010c.fr.
  • Install the Wikibase extension. Here is its LocalSettings.php configuration:
LocalSettings.php configuration
# Activate the Wikibase Repository extension
$wgEnableWikibaseRepo = true;
$wgEnableWikibaseClient = false;
require_once "$IP/extensions/Wikibase/repo/Wikibase.php";
require_once "$IP/extensions/Wikibase/repo/ExampleSettings.php";

# Create a new namespace to host properties
define( 'WB_NS_PROPERTY', 102 );
define( 'WB_NS_PROPERTY_TALK', 103 );

$wgExtraNamespaces[WB_NS_PROPERTY] = 'Property';
$wgExtraNamespaces[WB_NS_PROPERTY_TALK] = 'Property_talk';

# Store the items in the main namespace, the properties in their newly created one
$wgWBRepoSettings['entityNamespaces']['item'] = NS_MAIN;
$wgWBRepoSettings['entityNamespaces']['property'] = WB_NS_PROPERTY;

# We don't need sitelinks
$wgWBRepoSettings['siteLinkGroups'] = array();

# see https://github.com/wikimedia/mediawiki-extensions-Wikibase/blob/master/docs/options.wiki
$wgWBRepoSettings['formatterUrlProperty'] = 'P20';
  • Draft the data structure and create the first properties and items needed.
  • Install the Timeless skin.
  • First commit to initialize the new RecordWizard extension.
  • Split the recording studio out of the LinguaLibre repository, into its own new repo, LinguaRecorder.
