Help

Difference between revisions of "SPARQL (intermediate)"

Help:SPARQL 2 will explore federated queries fetching data from both LinguaguaLibre and Wikidata's endpoints, then Wikidata Lexemes, an emerging source of lexicographic data. The duo can be a solid combo to provide lexicographic and multimedia (audio recordings and images) for either Wikimedia modules or web developers.

Line 7: Line 7:
  
 
=== SPARQL to persitent data ===
 
=== SPARQL to persitent data ===
''Some SPARQL queries are meaningful but heavy and overly slow. This tool is used to store and update the response data on LinguaLibre, within a wikipage. Stored data can then be loaded in <1 second via a variation of <code>mw.loader.load('/index.php?title=MediaWiki:Mydata.js&action=raw&ctype=text/javascript');</code>.
+
''Some SPARQL queries are meaningful but heavy and overly slow. His administrator tool stores or updates the response data on LinguaLibre, within a wikipage. Stored data can then be loaded in <1 second via a variation of <code>mw.loader.load('/index.php?title=MediaWiki:Mydata.js&action=raw&ctype=text/javascript');</code>.
 
{{Sparql2data}}
 
{{Sparql2data}}
  

Revision as of 07:58, 18 January 2022


Draft
Twemoji12 1f3d7.svg
Twemoji12 1f3d7.svg

This page is a work in progress.

Tools

Lexemes Queries Generator


SPARQL to persitent data

Some SPARQL queries are meaningful but heavy and overly slow. His administrator tool stores or updates the response data on LinguaLibre, within a wikipage. Stored data can then be loaded in <1 second via a variation of mw.loader.load('/index.php?title=MediaWiki:Mydata.js&action=raw&ctype=text/javascript');.


Federate queries

  • To query Lingualibre from Wikidata, use SERVICE <https://lingualibre.org/sparql>.
  • To query Wikidata from LinguaLibre, use SERVICE <https://query.wikidata.org/sparql>.

The following query shows a simple example of retrieving data of LinguaLibre from Wikidata Query Service. It lists the existing levels in LinguaLibre.

PREFIX prop: <https://lingualibre.org/prop/direct/>
PREFIX entity: <https://lingualibre.org/entity/>

SELECT * {
  SERVICE <https://lingualibre.org/sparql> {
    SELECT ?item ?itemLabel {
      ?item prop:P2 entity:Q5.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }
  }
}

Languages

✅ Language () → List of LL languages with wd speaker population

Lexemes

✅ Language (d:Q12107) → List of wd lexemes

Example : Q12107 breton.

✅ Language () → List of wd lexemes with LL audio

✅ Language () → List of wd lexemes with LL audio and wd translation (d:Q150)

✅ Language () → List of wd lexemes (d:Q150)

Strange query from User:VIGNERON/common.js
SELECT DISTINCT ?lexemeLabel ?lexeme
WITH {
  SELECT ?lexeme ?lexemeLabel ?lexical_category WHERE {
    ?lexeme a ontolex:LexicalEntry ;
            dct:language wd:Q12107 ; 
            wikibase:lemma ?lexemeLabel .
    OPTIONAL {
      ?lexeme wikibase:lexicalCategory ?lexical_category .
    }
  }
} AS %results
WHERE {
  INCLUDE %results
  OPTIONAL {        
    ?lexical_category rdfs:label ?lexical_categoryLabel .
    FILTER (LANG(?lexical_categoryLabel) = "en")
  }
}

Speakers

✅ Speakers → Largest number of languages recorded and known

#Title: Speakers with recordings largest number of languages and known languages
SELECT ?speaker ?speakerLabel ?count ?languages
# Get audios, language, speaker triplet
WITH {
  SELECT DISTINCT ?speaker ?language {
    ?audio prop:P4 ?language;
           prop:P5 ?speaker.
  }
} AS %speakers
# Get the count of languages per each speaker
WITH {
  SELECT ?speaker (COUNT(?speaker) AS ?count) {
    INCLUDE %speakers.
  }
  GROUP BY ?speaker
  ORDER BY DESC(?count)
} AS %countOfLanguagesRecordedPerSpeaker
# Get the maximum number of languages per each speaker
WITH {
  SELECT (MAX(?count) AS ?maxNumberOfLanguagesRecorded) {
    INCLUDE %countOfLanguagesRecordedPerSpeaker.
  }
} AS %maxNumberOfLanguagesRecorded
# Get those speakers whose count equals the maximum number of languages
WITH {
  SELECT ?speaker ?count {
    INCLUDE %countOfLanguagesRecordedPerSpeaker.
    INCLUDE %maxNumberOfLanguagesRecorded.
    FILTER(?count = ?maxNumberOfLanguagesRecorded).
  }
} AS %speakersWithMostNumberOfLanguagesRecorded
# Get the languages of those speakers that have recorded audios in the
# most number of languages
WITH {
  SELECT ?speaker (GROUP_CONCAT(?languageLabel; SEPARATOR = ", ") AS ?languages) {
    INCLUDE %speakersWithMostNumberOfLanguagesRecorded.
    ?speaker prop:P4 [
        rdfs:label ?languageLabel
      ]
    FILTER(LANG(?languageLabel) = "en").
  }
  GROUP BY ?speaker
} AS %languagesOfSpeakersWithMostNumberOfLanguagesRecorded
{
  INCLUDE %speakersWithMostNumberOfLanguagesRecorded.
  INCLUDE %languagesOfSpeakersWithMostNumberOfLanguagesRecorded.
  ?speaker rdfs:label ?speakerLabel.
  FILTER(LANG(?speakerLabel) = "en")
}
... Loading ...


✅ Speakers → Countries with most speakers

SELECT ?country ?continentLabel ?ISO3 ?countryLabel (COUNT(?country) AS ?count)
WITH {
  SELECT DISTINCT ?speaker {
    ?speaker prop:P2 entity:Q3;
  }
} AS %speakers
WITH {
  SELECT DISTINCT
    ?speaker
    ?country
    ?countryLabel
    ?ISO3
    ?continentLabel
  {
    INCLUDE %speakers.
    ?speaker prop:P14 ?residence.
    # Avoids weird errors.
    FILTER(REGEX(?residence, "^Q[0-9]+$"))
    BIND(IRI(CONCAT('http://www.wikidata.org/entity/', ?residence)) AS ?residenceId)
    
    # Get country from wikidata
    SERVICE <https://query.wikidata.org/sparql> {
      ?residenceId wdt:P17 ?country.
      ?country rdfs:label ?countryLabel;
               wdt:P298 ?ISO3;
               wdt:P30 ?continent.
      ?continent rdfs:label ?continentLabel.
      FILTER(LANG(?countryLabel) = "en").
      FILTER(LANG(?continentLabel) = "en").
    }
  }
} AS %speakersWithCountries
{
  INCLUDE %speakersWithCountries.
}
GROUP BY ?country ?continentLabel ?ISO3 ?countryLabel
ORDER BY DESC(?count)
... Loading ...

Speakers → Map of speakers by place

PREFIX ll: <https://lingualibre.org/entity/>
PREFIX llt: <https://lingualibre.org/prop/direct/>

SELECT DISTINCT ?lLabel ?coord WITH {
  SELECT ?lLabel ?loc WHERE {
    SERVICE <https://lingualibre.org/sparql> { 
      select DISTINCT ?lLabel ?loc { 
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
        ?l llt:P2 ll:Q3 ;
           llt:P14 ?loc . 
        ?record llt:P5 ?l.   
        FILTER (regex(?loc, '^Q')) 
      } 
    }
  }
} AS %i
WHERE {
  INCLUDE %i
  BIND (URI(CONCAT("http://www.wikidata.org/entity/", ?loc)) AS ?locURL)
  SERVICE <https://query.wikidata.org/sparql> { 
    select * { 
      ?locURL wdt:P625 ?coord . 
    } 
  }
}
... Loading ...

See also

Lingua Libre technical helps
Template {{Speakers category}} • {{Recommended lists}} • {{To iso 639-2}} • {{To iso 639-3}} • {{Userbox-records}} • {{Bot steps}}
Audio files How to create a frequency list?Convert files formatsDenoise files with SoXRename and mass rename
Bots Help:BotsLinguaLibre:BotHelp:Log in to Lingua Libre with PywikibotLingua Libre Bot (gh) • OlafbotPamputtBotDragons Bot (gh)
MediaWiki MediaWiki: Help:Documentation opérationelle MediawikiHelp:Database structureHelp:CSSHelp:RenameHelp:OAuthLinguaLibre:User rights (rate limit) • Module:Lingua Libre record & {{Lingua Libre record}}JS scripts: MediaWiki:Common.jsLastAudios.jsSoundLibrary.jsItemsSugar.jsLexemeQueriesGenerator.js (pad) • Sparql2data.js (pad) • LanguagesGallery.js (pad) • Gadgets: Gadget-LinguaImporter.jsGadget-Demo.jsGadget-RecentNonAudio.jsLiLiZip.js
Queries Help:APIsHelp:SPARQLSPARQL (intermediate) (stub) • SPARQL for lexemes (stub) • SPARQL for maintenanceLingualibre:Wikidata (stub) • Help:SPARQL (HAL)
Reuses Help:Download datasetsHelp:Embed audio in HTML
Unstable & tests Help:SPARQL/test
Categories Category:Technical reports