
SPARQL (intermediate)

Help:SPARQL 2 explores federated queries that fetch data from both the LinguaLibre and Wikidata endpoints, then Wikidata Lexemes, an emerging source of lexicographic data. Together, the two can provide lexicographic and multimedia data (audio recordings and images) to Wikimedia modules and web developers.


Revision as of 01:59, 22 January 2022


This page is a work in progress.


Lexemes Queries Generator

SPARQL to persistent data

Some SPARQL queries are useful but heavy and slow. This administrator tool stores or updates their response data in a wiki page on LinguaLibre. Stored data can then be loaded in under 0.1 second. Several stored datasets can also be merged on a common property, if they share one.
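The merge on a common property can be pictured as a join between two stored result sets. The following is a minimal sketch, not the tool's actual code: the function name `merge_on`, the column names and the sample values are all hypothetical and only illustrate the idea.

```python
def merge_on(key, left, right):
    """Inner-join two lists of row dicts on the shared column `key`.

    Hypothetical helper: it only illustrates merging two stored
    query results that have one property in common.
    """
    # Index the right-hand rows by their key value.
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    merged = []
    for row in left:
        for match in index.get(row[key], []):
            # On a column name clash, the left-hand value wins.
            merged.append({**match, **row})
    return merged


# Made-up sample rows, keyed by a Wikidata Qid column.
records = [{"wdQid": "Q12107", "records": 120}, {"wdQid": "Q150", "records": 900}]
speakers = [{"wdQid": "Q12107", "speakers": 7}]

print(merge_on("wdQid", records, speakers))
```

Rows without a counterpart in the other dataset are dropped here; a tool that wants to keep them would need an outer join instead.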

Federated queries

  • To query LinguaLibre from Wikidata, use SERVICE <>.
  • To query Wikidata from LinguaLibre, use SERVICE <>.

Retrieve data of LinguaLibre from Wikidata

The following query is a simple example of retrieving LinguaLibre data from the Wikidata Query Service. It lists the existing levels in LinguaLibre.

PREFIX prop: <>
PREFIX entity: <>

SELECT ?item ?itemLabel
WHERE {
  SERVICE <> {
    SELECT ?item ?itemLabel {
      ?item prop:P2 entity:Q5.   # P2 'instance of' is Q5
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }
  }
}

Number of audios for each dead or extinct language that exists in LinguaLibre

This query obtains the list of dead and extinct languages from Wikidata. It is meant to be run on the Lingua Libre SPARQL endpoint. It should not be run on the Wikidata SPARQL endpoint or the Wikidata Query Service.

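SELECT ?deadLanguageLinguaLibre ?deadLanguageLinguaLibreLabel ?count
# Get dead and extinct languages from Wikidata
WITH {
  SELECT DISTINCT ?deadLanguage {
    SERVICE <> {
      { ?deadLanguage wdt:P31/wdt:P279* wd:Q45762. }
      UNION
      { ?deadLanguage wdt:P31/wdt:P279* wd:Q38058796. }
    }
  }
} AS %deadLanguage
# Match them to LinguaLibre language items through their Wikidata Qid
WITH {
  SELECT ?deadLanguageLinguaLibre {
    INCLUDE %deadLanguage.

    BIND(REPLACE(STR(?deadLanguage), '.*/', '') AS ?deadLanguageQid)

    ?deadLanguageLinguaLibre
      prop:P2 entity:Q4;
      prop:P12 ?deadLanguageQid.
  }
} AS %deadLanguageLinguaLibre
# Count the audios of each matched language
WITH {
  SELECT ?deadLanguageLinguaLibre (COUNT(*) AS ?count) {
    INCLUDE %deadLanguageLinguaLibre.

    ?audio
      prop:P2 entity:Q2;
      prop:P4 ?deadLanguageLinguaLibre.
  }
  GROUP BY ?deadLanguageLinguaLibre
} AS %count
WHERE {
  INCLUDE %deadLanguageLinguaLibre.
  INCLUDE %count.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}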
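The BIND(REPLACE(STR(?deadLanguage), '.*/', '') AS ?deadLanguageQid) step turns a Wikidata entity IRI into a bare Qid, so that it can be compared with the string stored in P12. A minimal Python sketch of the same greedy pattern is shown below; the function name `qid_from_iri` is hypothetical.

```python
import re


def qid_from_iri(iri: str) -> str:
    """Drop everything up to and including the last '/' of an entity IRI.

    Mirrors the SPARQL expression REPLACE(STR(?x), '.*/', ''):
    the pattern is greedy, so only the final path segment remains.
    """
    return re.sub(r".*/", "", iri)


print(qid_from_iri("http://www.wikidata.org/entity/Q45762"))
```

Because `.*` is greedy, the match runs to the last slash, which is why a single replacement is enough.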

Notable elements

LinguaLibre endpoint Wikidata endpoint


✅ Language () → List of LL languages with wd speaker population

❌ Language (Marathi (Q34)) → Wikidata Qid(s) → Geo-coordinates

PREFIX wd: <>
PREFIX wdt: <>
PREFIX ll: <>
PREFIX llt: <>
PREFIX lltn: <>

select distinct ?record ?transcription ?languageLabel ?wdQid ?wdQidLabel ?wdLabel ?coord
where {
  ?record llt:P2 ll:Q2 .           # Filter: P2 'instance of' is Q2 'record'
  ?record llt:P4 ll:Q34 .          # Filter: record's P4 'language' is Q34 'Marathi'
  ?record llt:P4 ?language .       # Assign value: record's P4 'language' to variable ?language
  ?record llt:P7 ?transcription .  # Assign value: record's P7 'transcription' to variable ?transcription
  ?record lltn:P12 ?wdQid .        # Assign value: record's P12 'wikidata id' to variable ?wdQid
  SERVICE <> {
    OPTIONAL { ?wdQid wdt:P625 ?coord . }  # Assign value: wikidata item's wdt:P625 'coordinates' to variable ?coord
    ?wdQid rdfs:label ?wdLabel .           # Assign value: wikidata item's label to variable ?wdLabel
    FILTER (LANG(?wdLabel) = "en") .       # Filter: keep English labels only
  }

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  }
}
Result type:
record	transcription	languageLabel	wdQid	wdQidLabel	wdLabel	coord
Q196212	Tathavade	Marathi	Q2719024	Q2719024	Tathavade	Point(73.74 18.62)
Q428904	Jambavade	Marathi	Q24894740	Q24894740	Jambavade	Point(73.85 18.51)
Q428900	Dhangavhan	Marathi	Q24885008	Q24885008	Dhangavhan	Point(73.85 18.52)
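The coord column holds WKT point literals, in which longitude comes first and latitude second. A reuser who needs numeric values could split them as in this minimal sketch; the function name `parse_point` is hypothetical, and the pattern only covers plain decimal numbers like those in the result above.

```python
import re

# Matches literals such as "Point(73.74 18.62)" (plain decimals only).
_POINT = re.compile(
    r"^Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)$",
    re.IGNORECASE,
)


def parse_point(wkt: str) -> dict:
    """Split a WKT point into longitude and latitude floats."""
    match = _POINT.match(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    # WKT order is "longitude latitude".
    return {"lon": float(match.group(1)), "lat": float(match.group(2))}


print(parse_point("Point(73.74 18.62)"))
```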


✅ Language (d:Q12107) → List of wd lexemes

Example: Q12107 (Breton).

✅ Language () → List of wd lexemes with LL audio

✅ Language () → List of wd lexemes with LL audio and wd translation (d:Q150)

✅ Language () → List of wd lexemes (d:Q150)

Strange query from User:VIGNERON/common.js
SELECT DISTINCT ?lexemeLabel ?lexeme
WITH {
  SELECT ?lexeme ?lexemeLabel ?lexical_category WHERE {
    ?lexeme a ontolex:LexicalEntry ;
            dct:language wd:Q12107 ;
            wikibase:lemma ?lexemeLabel .
    ?lexeme wikibase:lexicalCategory ?lexical_category .
  }
} AS %results
WHERE {
  INCLUDE %results
  OPTIONAL {
    ?lexical_category rdfs:label ?lexical_categoryLabel .
    FILTER (LANG(?lexical_categoryLabel) = "en")
  }
}


✅ Speakers → Largest number of languages recorded and known

#Title: Speakers with recordings in the largest number of languages, and their known languages
SELECT ?speaker ?speakerLabel ?count ?languages
# Get the distinct (speaker, language) pairs from the audios
WITH {
  SELECT DISTINCT ?speaker ?language {
    ?audio prop:P4 ?language;
           prop:P5 ?speaker.
  }
} AS %speakers
# Get the count of languages for each speaker
WITH {
  SELECT ?speaker (COUNT(?speaker) AS ?count) {
    INCLUDE %speakers.
  }
  GROUP BY ?speaker
  ORDER BY DESC(?count)
} AS %countOfLanguagesRecordedPerSpeaker
# Get the maximum number of languages recorded by a single speaker
WITH {
  SELECT (MAX(?count) AS ?maxNumberOfLanguagesRecorded) {
    INCLUDE %countOfLanguagesRecordedPerSpeaker.
  }
} AS %maxNumberOfLanguagesRecorded
# Get those speakers whose count equals the maximum number of languages
WITH {
  SELECT ?speaker ?count {
    INCLUDE %countOfLanguagesRecordedPerSpeaker.
    INCLUDE %maxNumberOfLanguagesRecorded.
    FILTER(?count = ?maxNumberOfLanguagesRecorded).
  }
} AS %speakersWithMostNumberOfLanguagesRecorded
# Get the languages of those speakers who have recorded audios in the
# largest number of languages
WITH {
  SELECT ?speaker (GROUP_CONCAT(?languageLabel; SEPARATOR = ", ") AS ?languages) {
    INCLUDE %speakersWithMostNumberOfLanguagesRecorded.
    ?speaker prop:P4 [
        rdfs:label ?languageLabel
    ].
    FILTER(LANG(?languageLabel) = "en").
  }
  GROUP BY ?speaker
} AS %languagesOfSpeakersWithMostNumberOfLanguagesRecorded
WHERE {
  INCLUDE %speakersWithMostNumberOfLanguagesRecorded.
  INCLUDE %languagesOfSpeakersWithMostNumberOfLanguagesRecorded.
  ?speaker rdfs:label ?speakerLabel.
  FILTER(LANG(?speakerLabel) = "en")
}
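The chain of named subqueries (%speakers, %countOfLanguagesRecordedPerSpeaker, %maxNumberOfLanguagesRecorded, %speakersWithMostNumberOfLanguagesRecorded) follows a count, take the maximum, then filter pattern. The same logic is sketched below in Python on made-up (speaker, language) pairs; the function name and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def speakers_with_most_languages(pairs):
    """Return {speaker: sorted languages} for the speakers tied at the maximum.

    `pairs` is an iterable of (speaker, language) tuples, one per audio;
    duplicates are collapsed, as SELECT DISTINCT does in the query.
    """
    languages = defaultdict(set)
    for speaker, language in pairs:
        languages[speaker].add(language)
    if not languages:
        return {}
    # Equivalent of MAX(?count) over the per-speaker counts.
    top = max(len(langs) for langs in languages.values())
    # Equivalent of FILTER(?count = ?maxNumberOfLanguagesRecorded).
    return {s: sorted(langs) for s, langs in languages.items() if len(langs) == top}


# Made-up sample: speaker "A" recorded in two languages, "B" in one.
sample = [("A", "fra"), ("A", "bre"), ("A", "fra"), ("B", "mar")]
print(speakers_with_most_languages(sample))
```

Several speakers can share the maximum, which is why the query filters on equality instead of taking only the first row of the ordered counts.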

✅ Speakers → Countries with most speakers

SELECT ?country ?continentLabel ?ISO3 ?countryLabel (COUNT(?country) AS ?count)
# Get the speakers
WITH {
  SELECT DISTINCT ?speaker {
    ?speaker prop:P2 entity:Q3.
  }
} AS %speakers
# Get the country of residence of each speaker
WITH {
  SELECT ?speaker ?country ?countryLabel ?ISO3 ?continentLabel {
    INCLUDE %speakers.
    ?speaker prop:P14 ?residence.
    # Keep only residences that are well-formed Qids.
    FILTER(REGEX(?residence, "^Q[0-9]+$"))
    BIND(IRI(CONCAT('', ?residence)) AS ?residenceId)
    # Get country from wikidata
    SERVICE <> {
      ?residenceId wdt:P17 ?country.
      ?country rdfs:label ?countryLabel;
               wdt:P298 ?ISO3;
               wdt:P30 ?continent.
      ?continent rdfs:label ?continentLabel.
      FILTER(LANG(?countryLabel) = "en").
      FILTER(LANG(?continentLabel) = "en").
    }
  }
} AS %speakersWithCountries
WHERE {
  INCLUDE %speakersWithCountries.
}
GROUP BY ?country ?continentLabel ?ISO3 ?countryLabel
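The FILTER(REGEX(?residence, "^Q[0-9]+$")) and BIND(IRI(CONCAT(...))) pair guards against residence values that are not Qids before building an entity IRI from them. A minimal Python sketch of that guard follows. The function name is hypothetical, and the default `base` is an assumption: it is the standard Wikidata entity namespace, whereas the prefix string is left empty in the query above.

```python
import re
from typing import Optional

_QID = re.compile(r"Q[0-9]+")


def residence_iri(residence: str,
                  base: str = "http://www.wikidata.org/entity/") -> Optional[str]:
    """Build an entity IRI from a Qid string, or return None if it is malformed.

    `base` is an assumed namespace, not taken from the query.
    """
    # fullmatch plays the role of the ^...$ anchors in the SPARQL regex.
    if _QID.fullmatch(residence) is None:
        return None
    return base + residence


print(residence_iri("Q90"))
print(residence_iri("Paris"))
```

Dropping malformed values early avoids sending IRIs to the remote endpoint that cannot match any item.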

Speakers → Map of speakers by place

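PREFIX ll: <>
PREFIX llt: <>

SELECT ?lLabel ?loc ?coord
# Get the speakers who have at least one record, with their place of residence
WITH {
  SELECT ?lLabel ?loc WHERE {
    SERVICE <> {
      select DISTINCT ?lLabel ?loc {
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
        ?l llt:P2 ll:Q3 ;
           llt:P14 ?loc .
        ?record llt:P5 ?l.
        FILTER (regex(?loc, '^Q'))
      }
    }
  }
} AS %i
WHERE {
  INCLUDE %i
  BIND (URI(CONCAT("", ?loc)) AS ?locURL)
  # Get the coordinates of each place
  SERVICE <> {
    select * {
      ?locURL wdt:P625 ?coord .
    }
  }
}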
