Help:SPARQL
Help:SPARQL gathers a list of basic SPARQL queries in the context of Lingua Libre, demoed and ready to test, together with beginner-friendly explanations, inline comments, introductions to concepts, code snippets and a few tools. This page lets users unfamiliar with SPARQL quickly learn the basics, query the LinguaLibre database, and download that data or feed it directly to an application. To fit the most frequent usage, the examples assume a web developer with basic JavaScript skills.
Revision as of 20:54, 9 December 2021
Base
- LinguaLibre Query Service – paste SPARQL queries there to run and test them, and download the data as JSON, CSV or TSV.
- Special:ListProperties – exhaustive list of LinguaLibre's Wikibase properties.
- LinguaLibre:List of languages – exhaustive list of LinguaLibre's languages
- DataViz:Speakers
- DataViz:Records
Code snippets
Fetch data using SPARQL
LinguaLibre data can be fetched from various programming languages such as Python, JavaScript, R and others, returning JSON or other formats.
- For a code snippet in your language: open query.wikidata.org (Wikidata Query Service, aka WDQS), run your SPARQL query, then click "Code": a pop-up window appears with various implementations.
- To download data, click "Download".
JavaScript:
At least three methods exist (code snippet), for example:
Query | Result's basic unit |
---|---|
SPARQL:SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10
|
{ … },
{
"item": {
"type": "uri",
"value": "https://lingualibre.org/entity/Q12"
},
"itemLabel": {
"xml:lang": "en",
"type": "literal",
"value": "beginner"
}
},
{ … }
|
Javascript:
var endpoint = 'https://lingualibre.org/sparql';
var sparql = 'SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10';
$.getJSON(endpoint,
{ query: sparql, format: 'json' },
function(data){ console.log('JQuery: ',data)}
);
|
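As another method, the same request can be sketched with the standard `fetch` API instead of jQuery. The endpoint URL and the `query` / `format` parameters are taken from the jQuery snippet above. The helper names `buildSparqlUrl`, `flattenBindings` and `runSparql` are hypothetical, and the sketch assumes the endpoint answers with the standard SPARQL JSON results layout (`results.bindings`), as the result unit above suggests.

```javascript
// Endpoint taken from the jQuery example above.
const endpoint = 'https://lingualibre.org/sparql';

// Build the GET url; URLSearchParams takes care of URL-encoding the query.
function buildSparqlUrl(endpoint, sparql) {
  const params = new URLSearchParams({ query: sparql, format: 'json' });
  return endpoint + '?' + params.toString();
}

// Turn each result unit { item: { type, value }, ... } into { item: value, ... }.
function flattenBindings(bindings) {
  return bindings.map(binding => {
    const row = {};
    for (const [name, cell] of Object.entries(binding)) {
      row[name] = cell.value;
    }
    return row;
  });
}

// Fetch and flatten (hypothetical helper; needs network access and a runtime with fetch).
async function runSparql(sparql) {
  const response = await fetch(buildSparqlUrl(endpoint, sparql));
  if (!response.ok) throw new Error('SPARQL request failed: ' + response.status);
  const data = await response.json();
  return flattenBindings(data.results.bindings); // assumed standard SPARQL JSON layout
}

// Usage: runSparql('SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10').then(console.log);
```

Flattening the bindings early keeps later steps, such as merging several result sets, free of the `type` / `value` wrappers.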
Merging data
Advanced SPARQL queries with COUNT() and similar aggregate functions are often slow (>3 seconds, sometimes >100 seconds). You are encouraged to run several smaller SPARQL queries and then merge the returned data. For example, the complementary JavaScript snippet below helps web developers do so.
// Data from 3 SPARQL queries.
// Important: one key must be shared by all datasets, here: 'qid'
const langs = [{ qid: 'Q209', label: 'Breton', iso:'bre' }, { qid: 'Q21', label: 'French', iso: 'fra' }],
      speakersFemales = [{ qid: 'Q209', genderF: 3, recordsF: 60 }, { qid: 'Q21', genderF: 21, recordsF:15046 }],
      speakersMales = [{ qid: 'Q209', genderM: 7, recordsM: 112 }, { qid: 'Q21', genderM: 85, recordsM:82964 }];
// Toolbox for merging two arrays by a shared id
var merge2ArraysBySameId = function(arr1,arr2,id1){
  return arr1.map( item1 => {
    // Matching object in arr2, or undefined when arr2 has no such id
    var identical = arr2.find(obj => obj[id1] === item1[id1]);
    // Copy into a new object so that neither input array is mutated
    return Object.assign({}, identical, item1)
  } );
}
// Merges
var step1 = merge2ArraysBySameId(langs,speakersFemales,'qid');
var step2 = merge2ArraysBySameId(step1,speakersMales,'qid');
alert(JSON.stringify(step2))
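The pairwise merge above can be generalised to any number of datasets. The sketch below is one possible approach rather than this page's original helper: the function name `mergeByKey` is hypothetical, and it assumes every row carries the shared key (`qid` here). A row that is missing from one dataset is kept with whatever fields are available.

```javascript
// Merge any number of arrays of objects on a shared key.
function mergeByKey(datasets, key) {
  const byKey = new Map(); // key value -> merged row, kept in insertion order
  for (const dataset of datasets) {
    for (const row of dataset) {
      const merged = byKey.get(row[key]) || {};
      // Later datasets add fields to, or overwrite fields of, the merged row.
      byKey.set(row[key], Object.assign(merged, row));
    }
  }
  return Array.from(byKey.values());
}

// Sample data shaped like the three query results above.
const langs = [{ qid: 'Q209', label: 'Breton', iso: 'bre' }, { qid: 'Q21', label: 'French', iso: 'fra' }];
const speakersFemales = [{ qid: 'Q209', genderF: 3, recordsF: 60 }, { qid: 'Q21', genderF: 21, recordsF: 15046 }];
const speakersMales = [{ qid: 'Q209', genderM: 7, recordsM: 112 }, { qid: 'Q21', genderM: 85, recordsM: 82964 }];

const merged = mergeByKey([langs, speakersFemales, speakersMales], 'qid');
console.log(JSON.stringify(merged));
```

Using a `Map` means each dataset is read once, instead of searching the second array again for every row of the first.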
Lingualibre's ground
✅ Is Language (language/dialect (Q4)) → List existing languages
SELECT ?lang ?iso ?langLabel WHERE {
?lang prop:P2 entity:Q4 .
?lang prop:P13 ?iso .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
✅🇶 Is Speaker (speaker (Q3)) → List existing speakers
SELECT ?speaker ?speakerLabel
WHERE {
?speaker prop:P2 entity:Q3 . # Condition 1, P2 'instance of' is Q3 'speaker'.
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
✅ Is Language level (language level (Q5)) → List existing levels
SELECT ?item ?itemLabel
WHERE {
?item prop:P2 entity:Q5 # Condition 1, P2 'instance of' is Q5 'language level'.
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
✅ Is Sex or Gender (sex or gender (Q7)) → List existing sexes or genders
SELECT ?item ?itemLabel
WHERE {
?item prop:P2 entity:Q7 # Condition 1, P2 'instance of' is Q7 'sex or gender'.
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
Speaker
✅ Speaker name(s) → Speaker Qid(s)
SELECT ?speakerName ?speakerId
WHERE {
VALUES ?speakerName { "Yug" "VIGNERON" } # One or multiple values
BIND ( STRLANG(?speakerName, "en") AS ?speakerLabel )
# P2: instance of; Q3: speaker.
?speakerId prop:P2 entity:Q3 ; rdfs:label ?speakerLabel .
}
|
|
✅🇶 Speaker Qid (0x010C (Q42)) → Speaker data, all
# Get Q42 (User:0x010C)'s data
SELECT ?predicate ?object ?objectLabel
WHERE {
entity:Q42 ?predicate ?object .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
✅🇶 Speaker Qid (0x010C (Q42)) → Speaker languages (P4)
SELECT ?languages ?languagesLabel
WHERE {
entity:Q42 prop:P4 ?languages .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
✅ Speaker Qid + language Qid → List associated audios
SELECT ?audio ?audioLabel
WHERE {
?audio prop:P5 entity:Q42 . # Condition 1, P5 Speaker is Q42 User:0x010C
?audio prop:P4 entity:Q21 . # Condition 2, P4 language is Q21 French
# Labels
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
Languages
?✅ Language's name(s) in English → Language Qid(s)
SELECT ?languageId ?languageName
WHERE {
VALUES ?languageName { "Marathi" "Breton" "Atikamekw" "Central Bikol" } # One or multiple values
BIND ( STRLANG(?languageName, "en") AS ?languageLabel )
# P2: instance of; Q4: language.
?languageId prop:P2 entity:Q4 ; rdfs:label ?languageLabel .
}
|
|
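When the names come from a form or another program, the `VALUES` line can be generated rather than typed by hand. The sketch below rebuilds the query above from a JavaScript array. `toSparqlLiteral` and `buildLanguageQuery` are hypothetical helper names, and the escaping covers only backslashes, double quotes and line breaks, which is an assumption about what the inputs may contain.

```javascript
// Escape a string so it can sit inside a double-quoted SPARQL literal.
function toSparqlLiteral(text) {
  const escaped = String(text)
    .replace(/\\/g, '\\\\') // backslash first, so later escapes are not doubled
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
  return '"' + escaped + '"';
}

// Build the "English name(s) -> language Qid(s)" query for a list of names.
function buildLanguageQuery(names) {
  const values = names.map(toSparqlLiteral).join(' ');
  return [
    'SELECT ?languageId ?languageName',
    'WHERE {',
    '  VALUES ?languageName { ' + values + ' }',
    '  BIND ( STRLANG(?languageName, "en") AS ?languageLabel )',
    '  ?languageId prop:P2 entity:Q4 ; rdfs:label ?languageLabel .',
    '}'
  ].join('\n');
}

console.log(buildLanguageQuery(['Marathi', 'Breton']));
```

The same literal helper can be reused for the speaker-name and ISO-code queries, which follow the same `VALUES` pattern.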
✅ Language LL Qid (Q21) → Count items
SELECT ?language (COUNT(?audio) AS ?nbAudio) WHERE {
VALUES ?language { entity:Q21 }
?audio prop:P4 ?language .
}
GROUP BY ?language
|
|
✅ Language LL Qid (Q21) → Count records
SELECT ?language (COUNT(?audio) AS ?nbAudio) WHERE {
VALUES ?language { entity:Q21 }
?audio prop:P2 entity:Q2 . # P2 'instance of' is Q2 'record'
?audio prop:P4 ?language . # P4 'language' is Q21 'French'
}
GROUP BY ?language
|
|
?✅ Language LL Qid (Q21) → Count unique words
SELECT ?language (COUNT(?audio) AS ?Audios) (COUNT(DISTINCT(?itemLabel)) AS ?Words)
(ROUND(10000*?Words/?Audios)/100 AS ?Percent)
WHERE {
VALUES ?language { entity:Q21 }
?audio prop:P2 entity:Q2 . # P2 'instance of' is Q2 'record'
?audio prop:P4 ?language . # P4 'language' is Q21 'French'
?audio rdfs:label ?itemLabel. # Assign value: label to ?itemLabel
}
GROUP BY ?language
|
|
✅ Language LL Qid (Q21) → Count speakers
SELECT ?language (COUNT(?speaker) AS ?nbSpeaker) WHERE {
VALUES ?language { entity:Q21 }
?speaker prop:P2 entity:Q3 . # P2 'instance of' is Q3 'speaker'
?speaker prop:P4 ?language . # P4 'language' is Q21 'French'
}
GROUP BY ?language
|
|
✅ Language LL Qid (Q209) → List speakers
SELECT ?language ?speaker ?speakerLabel WHERE {
VALUES ?language { entity:Q209 }
?speaker prop:P2 entity:Q3 . # P2 'instance of' is Q3 'speaker'
?speaker prop:P4 ?language . # P4 'language' is Q209 'Breton'
# Labels
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
✅ Language LL Qid (French (Q21)) + Speaker (0x010C (Q42)) → Count records
SELECT ?language ?speakerLabel (COUNT(?audio) AS ?nbAudio)
WHERE {
VALUES ?language { entity:Q21 }
VALUES ?speaker { entity:Q42 }
?audio prop:P4 ?language . # P4 'language' is Q21 'French'
?audio prop:P2 entity:Q2 . # P2 'instance of' is Q2 'record'
?audio prop:P5 ?speaker . # P5 'speaker' is Q42 '0x010C'
# Labels
SERVICE wikibase:label {bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"}
}
GROUP BY ?language ?speakerLabel
|
|
✅ Languages → List existing languages' iso-639-3
SELECT * WHERE {
?lang prop:P13 ?code .
}
|
|
✅ Language's ISO-639-3 → Language LL Qid
SELECT ?langIso ?langId
WHERE {
VALUES ?langIso { "ban" "bre" } # One or multiple values
# P2 'instance of'; Q4 'language'; P13 'ISO 639-3 code'
?langId prop:P2 entity:Q4 ; prop:P13 ?langIso .
}
|
|
✅ Language's ISO-639-3 → Language Wikidata Qid
SELECT * # print all variables, synonymous to ?langId ?langIso ?langWDQid
WHERE {
VALUES ?langIso { "ban" "bre" } # One or multiple values
?langId
prop:P2 entity:Q4 ; # Filter P2 'instance of' is Q4 'language' AND
prop:P13 ?langIso ; # Assign value: P13 'Iso-639-3' to ?langIso AND
prop:P12 ?langWDQid . # Assign value: P12 'Wikidata id' to ?langWDQid
}
|
|
✅ Language LL Qid (Breton (Q209)) → Language data, all
Case: get all data for language Q209 'Breton'.
SELECT * WHERE {
# Given Q209 'Breton language', get all properties and values
entity:Q209 ?predicate ?object .
}
|
|
✅ Language LL Qid (Breton (Q209)) → Language data, core
Case: get only the core data for language Q209 'Breton'.
SELECT * WHERE {
# Given Q209 'Breton language', get its datatype properties and their values
entity:Q209 ?predicate ?object .
?predicate rdf:type owl:DatatypeProperty .
}
|
|
✅ Language WD Qid → Language data, core
SELECT * WHERE {
?lang prop:P12 "Q12107" . # P12 'Wikidata id' is Wikidata's "Q12107"
?lang ?predicate ?object . #
?predicate rdf:type owl:DatatypeProperty .
}
|
|
Records
✅ Item name + language → Qid(s)
SELECT ?itemLabel ?item
WHERE {
?item prop:P2 entity:Q2 . # Filter: P2 'instance of' is Q2 'record'
?item rdfs:label ?itemLabel. # Assign value: label to ?itemLabel
FILTER(CONTAINS(?itemLabel, "apple"@en)).
} limit 10
|
|
Audio Qid → Audio data
✅ Language (Breton (Q209)) + speaker (ThonyVezbe (Q584098)) + word (ni) → Audio's Qid
Case: search in the Breton language, with speaker 'ThonyVezbe', for the word 'ni'.
SELECT ?audio
WHERE {
?audio prop:P4 entity:Q209 . # P4 'language' is Q209 'Breton'
?audio prop:P5 entity:Q584098 . # P5 'speaker' is Q584098 'ThonyVezbe'
?audio rdfs:label ?word . #word
FILTER ( STR(?word) = "ni" ) # word = 'ni'
}
|
|
Language + speaker + word → Audio's Commons URL pointer (P3)
SELECT ?word ?audio ?url (STR(?url) AS ?urlStr)
WHERE {
?audio prop:P4 entity:Q209 . # P4 'language' is Q209 'Breton'
?audio prop:P5 entity:Q584098 . # P5 'speaker' is Q584098 'ThonyVezbe'
?audio rdfs:label ?word . # word
FILTER ( STR(?word) = "achantour" ) # Filter: word is 'achantour'
?audio prop:P3 ?url
}
|
|
Heavy queries
The queries below are too large to run on LinguaLibre's wiki pages, or even on the LinguaLibre Query Service.
To do: split them into smaller sub-queries, each with a single COUNT() function.
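The advice above can be sketched as follows: one small query per gender, each with a single `COUNT()`, whose results are then combined on the language Qid in JavaScript. The property and item ids (P2, P4, P8, Q3, Q16, Q17) are the ones used on this page. The helper names `buildSpeakerCountQuery` and `combineCounts` are hypothetical, and the sample rows are made up for illustration, not real results.

```javascript
// One small query: number of speakers per language for a single gender.
function buildSpeakerCountQuery(genderQid) {
  return [
    'SELECT ?language (COUNT(DISTINCT ?speaker) AS ?speakers)',
    'WHERE {',
    '  ?speaker prop:P2 entity:Q3 .  # P2 instance of: Q3 speaker',
    '  ?speaker prop:P8 entity:' + genderQid + ' .  # P8 sex or gender',
    '  ?speaker prop:P4 ?language .  # P4 language',
    '}',
    'GROUP BY ?language'
  ].join('\n');
}

// Combine per-gender rows ({ language, speakers }) into one row per language.
function combineCounts(resultsByField) {
  const byLanguage = new Map();
  for (const [field, rows] of Object.entries(resultsByField)) {
    for (const row of rows) {
      const merged = byLanguage.get(row.language) || { language: row.language };
      merged[field] = Number(row.speakers); // SPARQL JSON values arrive as strings
      byLanguage.set(row.language, merged);
    }
  }
  return Array.from(byLanguage.values());
}

const queryMales = buildSpeakerCountQuery('Q16');   // Q16 'male'
const queryFemales = buildSpeakerCountQuery('Q17'); // Q17 'female'

// Made-up rows standing in for the two flattened query results.
const combined = combineCounts({
  speakerM: [{ language: 'Q21', speakers: '4' }],
  speakerF: [{ language: 'Q21', speakers: '2' }, { language: 'Q209', speakers: '1' }]
});
console.log(combined);
```

Each query stays cheap because it filters on one gender and aggregates once; the join across genders happens client-side.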
❌ Languages → Name, Wikidata Qid, LLQid, Iso-639-3, and genders
Query | Result |
---|---|
SELECT ?languageQidLabel ?wdQid ?languageQid ?isoCode
(COUNT(DISTINCT(?record)) AS ?recordCount)
(COUNT(DISTINCT(?speakerLangM)) AS ?speakerM)
(COUNT(DISTINCT(?speakerLangF)) AS ?speakerF)
WHERE {
?record prop:P2 entity:Q2 . # Filter: items where P2 'instance of' is Q2 'record'
?record prop:P4 ?languageQid . # Assign value: P4 'language' into variable ?languageQid
?languageQid prop:P12 ?wdQid . # Assign value: P12 'wikidata id' into variable ?wdQid
?languageQid prop:P13 ?isoCode. # Assign value: P13 'iso639-3' into ?isoCode
#?record prop:P5 ?speakerQidM . # Assign value: P5 'speaker' into variable ?speakerQidM
#?speakerQidM prop:P8 entity:Q16 . # Filter: P8 'sex or gender' is Q16 'male'
#?speakerQidM prop:P4 ?speakerLangM . # Assign value: P4 'language' into variable ?speakerLangM
?record prop:P5 ?speakerQidF . # Assign value: P5 'speaker' into variable ?speakerQidF
?speakerQidF prop:P8 entity:Q17 . # Filter: P8 'sex or gender' is Q17 'female'
?speakerQidF prop:P4 ?speakerLangF . # Assign value: P4 'language' into variable ?speakerLangF
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?languageQidLabel ?languageQid ?wdQid ?isoCode
ORDER BY DESC(?recordCount)
|
languageQidLabel | wdQid | languageQid | isoCode | recordCount | speakerM | speakerF |
---|---|---|---|---|---|---|
French | Q150 | Q21 | fra | 16761 | 0 | 18 |
Marathi | Q1571 | Q34 | mar | 13153 | 0 | 5 |
Polish | Q809 | Q298 | pol | 11686 | 0 | 1 |
… |
❌ Is Language (language/dialect (Q4)) → List all languages with number of unique words and speakers
SELECT ?language (COUNT(?audio) AS ?nbAudio) (COUNT(?speaker) AS ?nbSpeaker) WHERE {
?language prop:P2 entity:Q4 .
?audio prop:P4 ?language .
?speaker prop:P4 ?language .
}
GROUP BY ?language
Tools
- LinguaLibre Query Service – run SPARQL Queries to LinguaLibre here.
- Special:ApiSandbox – API queries generator for Lingualibre wikipage and wikibase contents.
Wikidata lexemes
- You may hack me: https://jsfiddle.net/hugolpz/rygo9s5b/
It is possible to use Wikidata and Wiktionary to extract lexicographical information. Developer @sina_ahm created a SPARQL query generator that helps search for words in both Wikidata and Dbnary. See the demo: https://sinaahmadi.github.io/posts/sparql-query-generator-for-lexicographical-data.html