Help
Difference between revisions of "SPARQL"
Revision as of 16:23, 8 December 2021
Base
Fetch data using SPARQL
LinguaLibre data can be fetched with various programming languages such as Python, JavaScript, and R, returning JSON or other formats.
- For a code snippet in your language: open query.wikidata.org (Wikidata Query Service, aka WDQS), run your SPARQL query, then click "Code"; a pop-up window appears with implementations in various languages.
- To download data, click "Download".
JavaScript:
At least three methods exist (code snippet); for example:
Query | Result's basic unit |
---|---|
SPARQL:SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10
|
{ … },
{
"item": {
"type": "uri",
"value": "https://lingualibre.org/entity/Q12"
},
"itemLabel": {
"xml:lang": "en",
"type": "literal",
"value": "beginner"
}
},
{ … }
|
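JavaScript:
// Requires jQuery ($) to be loaded on the page.
var endpoint = 'https://lingualibre.org/sparql';
var sparql = 'SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10';
// Send the query as GET parameters and log the parsed JSON response.
$.getJSON(endpoint,
{ query: sparql, format: 'json' },
function (data) { console.log('jQuery: ', data); }
);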
|
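The jQuery call above can also be written with the standard `fetch` API. The following is a minimal sketch, assuming the endpoint accepts the same `query` and `format` GET parameters that the jQuery example sends; the function names `buildQueryUrl` and `runQuery` are hypothetical and not part of any Lingualibre library.

```javascript
// Endpoint taken from the jQuery example above.
const ENDPOINT = 'https://lingualibre.org/sparql';

// Build a GET URL carrying the SPARQL query and the requested format.
// URLSearchParams takes care of percent-encoding the query text.
function buildQueryUrl(endpoint, sparql) {
  const params = new URLSearchParams({ query: sparql, format: 'json' });
  return `${endpoint}?${params.toString()}`;
}

// Run the query and return the parsed JSON response.
// Assumes a global fetch (browsers, Node 18+).
async function runQuery(sparql) {
  const response = await fetch(buildQueryUrl(ENDPOINT, sparql));
  if (!response.ok) {
    throw new Error(`SPARQL request failed: ${response.status}`);
  }
  return response.json();
}
```

A call such as `runQuery('SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10').then(console.log)` would then log the same kind of data as the jQuery version.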
Lingualibre descriptors
✅ Is Language level (language level (Q5)) → list possible values
SELECT ?item ?itemLabel
WHERE {
?item prop:P2 entity:Q5 # Condition 1, P2 'instance of' is Q5 'language level'.
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
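Results of a query like the one above arrive as a list of the "basic units" shown earlier on this page. The sketch below flattens them into plain rows, assuming the standard SPARQL JSON results layout (`results.bindings`); `simplifyBindings` and `qidFromUri` are hypothetical helper names, and the sample response is rebuilt from the unit shown on this page.

```javascript
// Flatten SPARQL JSON results into plain objects, one per result row.
function simplifyBindings(data) {
  return data.results.bindings.map((binding) => {
    const row = {};
    for (const [name, cell] of Object.entries(binding)) {
      row[name] = cell.value;
    }
    return row;
  });
}

// Extract the trailing Qid from an entity URI, or null if there is none.
function qidFromUri(uri) {
  const match = /\/(Q\d+)$/.exec(uri);
  return match ? match[1] : null;
}

// Sample response built from the "basic unit" shown earlier on this page.
const sampleResponse = {
  head: { vars: ['item', 'itemLabel'] },
  results: {
    bindings: [
      {
        item: { type: 'uri', value: 'https://lingualibre.org/entity/Q12' },
        itemLabel: { 'xml:lang': 'en', type: 'literal', value: 'beginner' },
      },
    ],
  },
};

const rows = simplifyBindings(sampleResponse).map((row) => ({
  qid: qidFromUri(row.item),
  label: row.itemLabel,
}));
console.log(rows);
```

For the sample above this yields a single row with `qid` equal to `'Q12'` and `label` equal to `'beginner'`.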
✅ Is Sex or Gender (sex or gender (Q7)) → list possible values
SELECT ?item ?itemLabel
WHERE {
?item prop:P2 entity:Q7 # Condition 1, P2 'instance of' is Q7 'sex or gender'.
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
✅🇶 Is Speaker (speaker (Q3)) → list possible speakers
SELECT ?speaker ?speakerLabel
WHERE {
?speaker prop:P2 entity:Q3 . # Condition 1, P2 'instance of' is Q3 'speaker'.
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
Speaker
✅ Speaker name(s) → Speaker Qid(s)
SELECT ?speakerName ?speakerId
WHERE {
VALUES ?speakerName { "Yug" "VIGNERON" } # One or multiple values
BIND ( STRLANG(?speakerName, "en") AS ?speakerLabel )
# P2: instance of; Q3: speaker.
?speakerId prop:P2 entity:Q3 ; rdfs:label ?speakerLabel .
}
|
|
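When the speaker names come from user input, the `VALUES` list of the query above can be generated. This is a sketch under assumptions: `escapeLiteral` and `buildSpeakerQuery` are hypothetical names, and the escaping covers only backslashes, double quotes, and line breaks.

```javascript
// Escape a string so it can sit inside a double-quoted SPARQL literal.
function escapeLiteral(text) {
  return text
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
}

// Build the "Speaker name(s) -> Speaker Qid(s)" query for a list of names.
function buildSpeakerQuery(names) {
  if (names.length === 0) {
    throw new Error('At least one speaker name is required');
  }
  const values = names.map((name) => `"${escapeLiteral(name)}"`).join(' ');
  return [
    'SELECT ?speakerName ?speakerId',
    'WHERE {',
    `  VALUES ?speakerName { ${values} }`,
    '  BIND ( STRLANG(?speakerName, "en") AS ?speakerLabel )',
    '  # P2: instance of; Q3: speaker.',
    '  ?speakerId prop:P2 entity:Q3 ; rdfs:label ?speakerLabel .',
    '}',
  ].join('\n');
}

console.log(buildSpeakerQuery(['Yug', 'VIGNERON']));
```

Generating the list this way keeps the query text identical to the hand-written version while avoiding broken literals when a name contains a quote.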
✅🇶 Speaker Qid (0x010C (Q42)) → Speaker data
# Get Q42 (User:0x010C)'s data
SELECT ?predicate ?object ?objectLabel
WHERE {
entity:Q42 ?predicate ?object .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
✅🇶 Speaker Qid (0x010C (Q42)) → Speaker data → Speaker languages (P4)
SELECT ?languages ?languagesLabel
WHERE {
entity:Q42 prop:P4 ?languages .
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
✅ Speaker Qid + language → list of all associated audios
SELECT ?audio ?audioLabel
WHERE {
?audio prop:P5 entity:Q42 . # Condition 1, P5 Speaker is Q42 User:0x010C
?audio prop:P4 entity:Q21 . # Condition 2, P4 language is Q21 French
# Labels
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
Languages
✅ Language LL Qid (Q21) → Count items
SELECT ?language (COUNT(?audio) AS ?nbAudio) WHERE {
VALUES ?language { entity:Q21 }
?audio prop:P4 ?language .
}
GROUP BY ?language
|
|
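Counts come back as string literals, so they need converting before use. A minimal sketch, assuming the standard SPARQL JSON results layout; `readCount` is a hypothetical helper and the sample response (including the number 1234) is invented for illustration.

```javascript
// Read an aggregate such as ?nbAudio from the first result row.
// Returns 0 when the query produced no row (nothing matched the pattern).
function readCount(data, variable) {
  const first = data.results.bindings[0];
  if (!first || !first[variable]) return 0;
  return Number.parseInt(first[variable].value, 10);
}

// Hypothetical response: the count 1234 is made up for illustration.
const sampleCountResponse = {
  head: { vars: ['language', 'nbAudio'] },
  results: {
    bindings: [
      {
        language: { type: 'uri', value: 'https://lingualibre.org/entity/Q21' },
        nbAudio: {
          type: 'literal',
          datatype: 'http://www.w3.org/2001/XMLSchema#integer',
          value: '1234',
        },
      },
    ],
  },
};

console.log(readCount(sampleCountResponse, 'nbAudio'));
```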
✅ Language LL Qid (Q21) → Count records
SELECT ?language (COUNT(?audio) AS ?nbRecords) WHERE {
VALUES ?language { entity:Q21 }
?audio prop:P2 entity:Q2 . # P2 'instance of' is Q2 'record'
?audio prop:P4 ?language . # P4 'language' is Q21 'French'
}
GROUP BY ?language
|
|
Language LL Qid (Q21) → Count unique words
✅ Language LL Qid (Q21) → Count speakers
SELECT ?language (COUNT(?speaker) AS ?nbSpeakers) WHERE {
VALUES ?language { entity:Q21 }
?speaker prop:P2 entity:Q3 . # P2 'instance of' is Q3 'speaker'
?speaker prop:P4 ?language . # P4 'language' is Q21 'French'
}
GROUP BY ?language
|
|
✅ Language LL Qid (Q209) → List speakers
SELECT ?language ?speaker ?speakerLabel WHERE {
VALUES ?language { entity:Q209 }
?speaker prop:P2 entity:Q3 . # P2 'instance of' is Q3 'speaker'
?speaker prop:P4 ?language . # P4 'language' is Q209 'Breton'
# Labels
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
|
✅ Language LL Qid (French (Q21)) + Speaker (0x010C (Q42)) → Count records
SELECT ?language ?speakerLabel (COUNT(?audio) AS ?nbRecords)
WHERE {
VALUES ?language { entity:Q21 }
VALUES ?speaker { entity:Q42 }
?audio prop:P4 ?language . # P4 'language' is Q21 'French'
?audio prop:P2 entity:Q2 . # P2 'instance of' is Q2 'record'
?audio prop:P5 ?speaker . # P5 'speaker' is Q42 '0x010C'
# Labels
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
GROUP BY ?language ?speakerLabel
|
|
Isolang → Language LL Qid
SELECT * WHERE {
?lang prop:P13 ?code .
}
|
|
✅ Isolang → Language WD Qid
SELECT ?langIso ?langId
WHERE {
VALUES ?langIso { "ban" "bre" } # One or multiple values
# P2 'instance of'; Q4 'language'; P13 'ISO 639-3 code'
?langId prop:P2 entity:Q4 ; prop:P13 ?langIso .
}
|
|
✅ Language WD Qid → Language data, all
SELECT * WHERE {
?lang prop:P12 "Q12107" . # P12 'Wikidata id' is Wikidata's "Q12107"
?lang ?predicate ?object . # Get all its properties and values
}
|
|
✅ Language LL Qid (Breton (Q209)) → Language data, all
Case: Get all data for language Q209 'Breton'.
SELECT * WHERE {
# Given Q209 'Breton language', get all properties and values
entity:Q209 ?predicate ?object .
}
|
|
✅ Language LL Qid (Breton (Q209)) → Language data, core
Case: Get all CORE data for language Q209 'Breton'.
SELECT * WHERE {
# Given Q209 'Breton language', get only its datatype properties and their values
entity:Q209 ?predicate ?object .
?predicate rdf:type owl:DatatypeProperty .
}
|
|
✅ Language (Breton (Q209)) + speaker (ThonyVezbe (Q584098)) + word (ni) → Audio's Qid
Case: Search in the Breton language, with speaker 'ThonyVezbe', for the word 'ni'.
SELECT ?audio
WHERE {
?audio prop:P4 entity:Q209 . # P4 'language' is Q209 'Breton'
?audio prop:P5 entity:Q584098 . # P5 'speaker' is Q584098 'ThonyVezbe'
?audio rdfs:label ?word . #word
FILTER ( STR(?word) = "ni" ) # word = 'ni'
}
|
|
Records
✅ Item name → Qid(s)
SELECT ?item ?itemLabel
WHERE {
?item rdfs:label ?itemLabel.
FILTER(CONTAINS(LCASE(?itemLabel), "yug"@en)). # Search term must be lowercase to match LCASE(...)
} LIMIT 10
|
|
Audio Qid → Audio data
Language + speaker + word → Audio's Commons URL
Heavy queries
❌ Is Language (language (Q4)) → list all languages with number of unique words and speakers
Too large to run, even on Lingualibre Query (https://lingualibre.org/bigdata/#query).
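SELECT ?language (COUNT(DISTINCT ?audio) AS ?nbAudio) (COUNT(DISTINCT ?speaker) AS ?nbSpeaker) WHERE {
?language prop:P2 entity:Q4 . # P2 'instance of' is Q4 'language'
?audio prop:P2 entity:Q2 ; prop:P4 ?language . # Records in that language
?speaker prop:P2 entity:Q3 ; prop:P4 ?language . # Speakers of that language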
}
GROUP BY ?language
To do: split this into smaller sub-queries. For now, it works only for one counter and one language at a time.
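One way to approach the smaller sub-queries mentioned above is to generate one count query per language and per counter, then run them one after another. This is only a sketch under assumptions: `buildCountQuery` and `countPerLanguage` are hypothetical names, the query shape mirrors the per-language count queries earlier on this page, and `run` is a caller-supplied function (not defined here) that sends a SPARQL string and resolves to a number.

```javascript
// Item types used by the count queries on this page: Q2 'record', Q3 'speaker'.
const COUNTERS = { records: 'Q2', speakers: 'Q3' };

// Build a single-language, single-counter query.
function buildCountQuery(languageQid, typeQid) {
  for (const qid of [languageQid, typeQid]) {
    if (!/^Q\d+$/.test(qid)) throw new Error(`Not a Qid: ${qid}`);
  }
  return [
    'SELECT ?language (COUNT(DISTINCT ?item) AS ?count) WHERE {',
    `  VALUES ?language { entity:${languageQid} }`,
    `  ?item prop:P2 entity:${typeQid} .`,
    '  ?item prop:P4 ?language .',
    '}',
    'GROUP BY ?language',
  ].join('\n');
}

// Run one small query per (language, counter) pair, sequentially,
// so the endpoint never has to evaluate the heavy combined query.
async function countPerLanguage(languageQids, run) {
  const totals = {};
  for (const languageQid of languageQids) {
    totals[languageQid] = {};
    for (const [name, typeQid] of Object.entries(COUNTERS)) {
      totals[languageQid][name] = await run(buildCountQuery(languageQid, typeQid));
    }
  }
  return totals;
}
```

Running the queries sequentially rather than in parallel is a deliberate choice here, to keep the load on the endpoint low; whether this is fast enough for all languages has not been verified.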
❌ Languages → Name, Wikidata Qid, LLQid, ISO 639-3, and genders
Query | Result |
---|---|
SELECT ?languageQidLabel ?wdQid ?languageQid ?isoCode
(COUNT(DISTINCT(?record)) AS ?recordCount)
(COUNT(DISTINCT(?speakerLangM)) AS ?speakerM)
(COUNT(DISTINCT(?speakerLangF)) AS ?speakerF)
WHERE {
?record prop:P2 entity:Q2 . # Filter: items where P2 'instance of' is Q2 'record'
?record prop:P4 ?languageQid . # Assign value: P4 'language' into variable ?languageQid
?languageQid prop:P12 ?wdQid . # Assign value: P12 'wikidata id' into variable ?wdQid
?languageQid prop:P13 ?isoCode . # Assign value: P13 'iso639-3' into variable ?isoCode
#?record prop:P5 ?speakerQidM . # Assign value: P5 'speaker' into variable ?speakerQidM
#?speakerQidM prop:P8 entity:Q16 . # Filter: P8 'sex or gender' is Q16 'male'
#?speakerQidM prop:P4 ?speakerLangM . # Assign value: P4 'language' into variable ?speakerLangM
?record prop:P5 ?speakerQidF . # Assign value: P5 'speaker' into variable ?speakerQidF
?speakerQidF prop:P8 entity:Q17 . # Filter: P8 'sex or gender' is Q17 'female'
?speakerQidF prop:P4 ?speakerLangF . # Assign value: P4 'language' into variable ?speakerLangF
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
GROUP BY ?languageQidLabel ?languageQid ?wdQid ?isoCode
ORDER BY DESC(?recordCount)
|
languageQidLabel | wdQid | languageQid | isoCode | recordCount | speakerM | speakerF |
---|---|---|---|---|---|---|
French | Q150 | Q21 | fra | 16761 | 0 | 18 |
Marathi | Q1571 | Q34 | mar | 13153 | 0 | 5 |
Polish | Q809 | Q298 | pol | 11686 | 0 | 1 |
… |
Tools
- Special:ApiSandbox – API queries generator for Lingualibre wikipage and wikibase contents.