Help:SPARQL gathers a list of basic SPARQL queries in the context of Lingua Libre, demonstrated and ready to test, together with beginner-friendly explanations, inline comments, introductions to concepts, code snippets and a few tools. This page lets users unfamiliar with SPARQL quickly learn the basics of SPARQL, query the LinguaLibre database, and download that data or feed it directly into an application. To match the most frequent use case, the examples assume a web developer with basic JavaScript skills.
Draft
December 2021 rewrite: work in progress, please do not translate yet.
Done: gather SPARQL queries related to core, speakers, languages and audio records.
NOW: Improve Base section with core SPARQL concepts.
NOW: General review, volunteers wanted. You may help by: a) reading and copy-editing the page's English, b) testing queries on LLQS and editing in or discussing improvements, c) harmonising in-line comments.
NOW: De-Westernization, replacing Q21 (French) and Q42 (User:0x010C) with smaller non-Western languages and users.
Endpoint
Wikidata Query Service (WDQS): run SPARQL queries on Wikidata. Run, test, and download the data as JSON, CSV or TSV. It has advanced user-friendly features such as hovering over a term to see its meaning, code optimization, etc.
LinguaLibre data can be fetched using various programming languages such as Python, JavaScript, R and others, returning JSON or other formats.
For a code snippet in your language: open query.wikidata.org (Wikidata Query Service, aka WDQS), run your SPARQL query, then click "Code": a pop-up window appears with implementations in various languages.
To download the data, click "Download".
JavaScript:
At least three methods exist; the "Code" pop-up of the query service lists them.
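One of these methods is the standard `fetch` API. The sketch below assumes the endpoint accepts a `query` URL parameter and returns the standard SPARQL 1.1 JSON results format when asked through the `Accept` header, as WDQS does. `buildQueryUrl`, `flattenBindings` and `runSparql` are hypothetical helper names, and the endpoint URL is passed in by the caller rather than hard-coded.

```javascript
// Build a GET URL for a SPARQL endpoint (endpoint URL supplied by the caller).
function buildQueryUrl(endpoint, query) {
  return endpoint + '?query=' + encodeURIComponent(query);
}

// Turn SPARQL 1.1 JSON results into plain objects: { varName: value, ... }.
// Unbound variables (e.g. from OPTIONAL) are simply absent from the object.
function flattenBindings(json) {
  return json.results.bindings.map(binding => {
    const row = {};
    for (const [name, cell] of Object.entries(binding)) {
      row[name] = cell.value;
    }
    return row;
  });
}

// Run a query and return flattened rows (browsers, or Node.js 18+ which has fetch).
async function runSparql(endpoint, query) {
  const response = await fetch(buildQueryUrl(endpoint, query), {
    headers: { Accept: 'application/sparql-results+json' }
  });
  if (!response.ok) {
    throw new Error('SPARQL request failed: HTTP ' + response.status);
  }
  return flattenBindings(await response.json());
}
```

For example, with WDQS: `runSparql('https://query.wikidata.org/sparql', myQuery).then(rows => console.log(rows));`, where `myQuery` holds one of the queries on this page.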
Advanced SPARQL queries using COUNT() and similar functions are often slow (more than 3 seconds, sometimes more than 100 seconds). You are encouraged to run several smaller SPARQL queries and then merge the returned data. For example, the complementary JavaScript snippet below helps web developers do so.
// Data from 3 SPARQL queries.
// Important: one key must be shared by all datasets, here: 'qid'
const langs = [
    { qid: 'Q209', label: 'Breton', iso: 'bre' },
    { qid: 'Q21', label: 'French', iso: 'fra' }
  ],
  speakersFemales = [
    { qid: 'Q209', genderF: 3, recordsF: 60 },
    { qid: 'Q21', genderF: 21, recordsF: 15046 }
  ],
  speakersMales = [
    { qid: 'Q209', genderM: 7, recordsM: 112 },
    { qid: 'Q21', genderM: 85, recordsM: 82964 }
  ];

// Toolbox: merge two arrays of objects that share the same id key
const merge2ArraysBySameId = function (arr1, arr2, id1) {
  return arr1.map(item1 => {
    const identical = arr2.find(obj => obj[id1] === item1[id1]);
    // Copy into a new object so the input arrays are not mutated.
    // If arr2 has no match, `identical` is undefined and is ignored.
    return Object.assign({}, identical, item1);
  });
};

// Merging
const step1 = merge2ArraysBySameId(langs, speakersFemales, 'qid');
const step2 = merge2ArraysBySameId(step1, speakersMales, 'qid');
console.log(JSON.stringify(step2));
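The merge above scans `arr2` once per item, which is fine for a few hundred rows. For larger result sets, a `Map` index avoids the repeated scans. Also, SPARQL returns entity URIs rather than bare Qids, so the shared `qid` key usually has to be extracted first. The sketch below assumes the Qid is the last path segment of the entity URI; `toQid` and `mergeByKey` are hypothetical names.

```javascript
// Extract the Qid from an entity URI, assuming it is the last path segment,
// e.g. ".../entity/Q209" -> "Q209". A bare Qid is returned unchanged.
function toQid(uri) {
  return uri.substring(uri.lastIndexOf('/') + 1);
}

// Merge any number of arrays of objects sharing the field `key`.
// The first array decides which rows are kept (like a left join) and is
// assumed to have unique keys; later arrays only add fields to existing rows.
function mergeByKey(key, first, ...others) {
  const index = new Map(first.map(row => [row[key], { ...row }]));
  for (const arr of others) {
    for (const row of arr) {
      const target = index.get(row[key]);
      if (target) Object.assign(target, row);
    }
  }
  return [...index.values()];
}
```

Usage: `mergeByKey('qid', langs, speakersFemales, speakersMales)` gives the same rows as the two-step merge, in a single call.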
Lingua Libre core
✅ Is Language (Q4) → List existing languages with: LL Qid, ISO 639-3, Name
SELECT ?item ?itemLabel ?iso
WHERE {
  ?item prop:P2 entity:Q4 .            # Condition 1: P2 'instance of' is Q4 'language'.
  OPTIONAL { ?item prop:P13 ?iso . }   # Optional: P13 'ISO 639-3' code, when set.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
✅ Is Sex or Gender (Q7) → List existing sexes or genders
SELECT ?item ?itemLabel
WHERE {
  ?item prop:P2 entity:Q7 .   # Condition 1: P2 'instance of' is Q7 'sex or gender'.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Speaker
✅ Speaker name(s) → Speaker Qid(s)
SELECT ?speakerName ?speakerId
WHERE {
  VALUES ?speakerName { "Yug" "VIGNERON" }              # One or multiple values
  BIND(STRLANG(?speakerName, "en") AS ?speakerLabel)
  # P2: instance of; Q3: speaker.
  ?speakerId prop:P2 entity:Q3 ;
             rdfs:label ?speakerLabel .
}
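When the names come from user input, the `VALUES` list is usually generated in code. Quotes and backslashes inside a name must then be escaped, otherwise the query breaks or can be altered. A minimal sketch; `sparqlString` and `speakerIdQuery` are hypothetical helpers, the latter rebuilding the query above for any list of names.

```javascript
// Escape a JavaScript string as a SPARQL double-quoted string literal.
function sparqlString(text) {
  return '"' + text
    .replace(/\\/g, '\\\\')   // backslashes first, so later escapes are not doubled
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r') + '"';
}

// Rebuild the "speaker name(s) -> speaker Qid(s)" query for any list of names.
function speakerIdQuery(names) {
  return [
    'SELECT ?speakerName ?speakerId',
    'WHERE {',
    '  VALUES ?speakerName { ' + names.map(sparqlString).join(' ') + ' }',
    '  BIND(STRLANG(?speakerName, "en") AS ?speakerLabel)',
    '  ?speakerId prop:P2 entity:Q3 ;',
    '             rdfs:label ?speakerLabel .',
    '}'
  ].join('\n');
}
```

Usage: `speakerIdQuery(['Yug', 'VIGNERON'])` returns the query text shown above, ready to send to the endpoint.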
# Get Q42 (User:0x010C)'s data
SELECT ?predicate ?object ?objectLabel
WHERE {
  entity:Q42 ?predicate ?object .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
✅ Speaker Qid (Q42) + Language LL Qid (Q21) → List records
SELECT ?audio ?audioLabel
WHERE {
  ?audio prop:P5 entity:Q42 .   # Filter: P5 'speaker' is Q42 'User:0x010C'
  ?audio prop:P4 entity:Q21 .   # Filter: P4 'language' is Q21 'French'
  # Labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
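For speakers with many recordings, the list can be fetched page by page, in line with the advice above to run several smaller queries. A sketch, assuming the base query has no ORDER BY, LIMIT or OFFSET of its own; `pagedQuery` is a hypothetical helper. An ORDER BY is added because, without it, the engine may return rows in a different order on each request, so pages could overlap or skip rows.

```javascript
// Append ORDER BY / LIMIT / OFFSET to a SELECT query to fetch one page.
// pageIndex starts at 0; orderVar is the variable that gives a stable order.
function pagedQuery(baseQuery, orderVar, pageSize, pageIndex) {
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new RangeError('pageSize must be a positive integer');
  }
  if (!Number.isInteger(pageIndex) || pageIndex < 0) {
    throw new RangeError('pageIndex must be a non-negative integer');
  }
  return baseQuery.trim() +
    '\nORDER BY ' + orderVar +
    '\nLIMIT ' + pageSize +
    '\nOFFSET ' + pageSize * pageIndex;
}
```

A client would request pages 0, 1, 2, … and stop as soon as a page returns fewer than `pageSize` rows.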
✅ Speaker Qid (Q42) + Language LL Qid (Q21) → Count records
SELECT ?language ?speakerLabel (COUNT(?audio) AS ?audios)
WHERE {
  VALUES ?language { entity:Q21 }
  VALUES ?speaker { entity:Q42 }
  ?audio prop:P5 ?speaker .    # Filter: P5 'speaker' is Q42 '0x010C'
  ?audio prop:P4 ?language .   # Filter: P4 'language' is Q21 'French'
  ?audio prop:P2 entity:Q2 .   # Filter: P2 'instance of' is Q2 'record'
  # Labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?language ?speakerLabel
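In the SPARQL JSON results, a COUNT() value arrives as a typed literal whose `value` is a string (for instance `{"type": "literal", "datatype": "http://www.w3.org/2001/XMLSchema#integer", "value": "42"}`), so it has to be converted before doing arithmetic in JavaScript. `toNumber` is a hypothetical helper covering the usual numeric XSD datatypes; very large integers (beyond 2^53) would lose precision.

```javascript
// XSD datatypes whose lexical value can be parsed as a JavaScript number.
const XSD = 'http://www.w3.org/2001/XMLSchema#';
const NUMERIC_TYPES = new Set(
  ['integer', 'int', 'long', 'decimal', 'double', 'float', 'nonNegativeInteger']
    .map(t => XSD + t)
);

// Convert one binding cell to a number; returns NaN for non-numeric cells.
// Accepts both 'literal' (SPARQL 1.1) and the older 'typed-literal' type.
function toNumber(cell) {
  if (!cell || (cell.type !== 'literal' && cell.type !== 'typed-literal')) {
    return NaN;
  }
  if (!NUMERIC_TYPES.has(cell.datatype)) {
    return NaN;
  }
  return Number(cell.value);
}
```

Usage: `toNumber(binding.audios)` on a raw result binding of the query above.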
Languages
?✅ Language name(s) in English → Language LL Qid(s)
SELECT ?languageId ?languageName
WHERE {
  VALUES ?languageName { "Marathi" "Atikamekw" "Central Bikol" }   # Target values
  BIND(STRLANG(?languageName, "en") AS ?languageLabel)             # Tag names as English; BIND must precede any use of ?languageLabel
  ?languageId prop:P2 entity:Q4 ;          # Filter: P2 'instance of' is Q4 'language' AND
              rdfs:label ?languageLabel .  # Filter: label equals ?languageLabel
}
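Names that match no LL item are simply absent from the results, so an application usually needs to report which ones were not found. A minimal sketch, assuming the results have been flattened into objects with a `languageName` field (the query above selects that variable); `missingNames` is a hypothetical helper.

```javascript
// Return the requested names that do not appear in the query results.
// Comparison is exact and case-sensitive, like the label match in the query.
function missingNames(requested, rows) {
  const found = new Set(rows.map(row => row.languageName));
  return requested.filter(name => !found.has(name));
}
```

Usage: `missingNames(["Marathi", "Atikamekw", "Central Bikol"], rows)` lists the names to double-check (spelling, capitalisation, or a language not yet on Lingua Libre).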
✅ Language ISO-639-3 → Language LL Qid(s), Wikidata Qid, Label
SELECT ?langIso ?langId ?langWDQid ?langIdLabel
WHERE {
  VALUES ?langIso { "mar" "bre" "bcl" "atj" "ban" }   # Target ISO values
  ?langId prop:P2 entity:Q4 ;     # Filter: P2 'instance of' is Q4 'language' AND
          prop:P13 ?langIso ;     # Filter: P13 'ISO 639-3' matches ?langIso AND
          prop:P12 ?langWDQid .   # Assign value: P12 'Wikidata Qid' to ?langWDQid
  # Labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
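The ISO query returns one row per language; for lookups in an application it can be turned into a dictionary keyed by ISO 639-3 code. The sketch assumes flattened rows using the variable names of the query above, where ?langId and ?langWDQid may be bare Qids or full URIs (only the last path segment is kept). `indexByIso` is a hypothetical helper; `https://www.wikidata.org/wiki/` followed by a Qid is Wikidata's usual item page URL.

```javascript
// Keep only the last path segment, so both "Q21" and ".../entity/Q21" give "Q21".
const lastSegment = value => value.substring(value.lastIndexOf('/') + 1);

// Build { iso: { llQid, wdQid, label, wikidataUrl } } from flattened rows.
// If two rows share an ISO code, the last one wins.
function indexByIso(rows) {
  const index = {};
  for (const row of rows) {
    const wdQid = lastSegment(row.langWDQid);
    index[row.langIso] = {
      llQid: lastSegment(row.langId),
      wdQid: wdQid,
      label: row.langIdLabel,
      wikidataUrl: 'https://www.wikidata.org/wiki/' + wdQid
    };
  }
  return index;
}
```

Usage: `indexByIso(rows).mar.llQid` gives the LL Qid for Marathi, if it was among the requested codes.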
SELECT ?language (COUNT(?audio) AS ?audios)
WHERE {
  VALUES ?language { entity:Q21 }
  ?audio prop:P2 entity:Q2 .    # P2 'instance of' is Q2 'record'
  ?audio prop:P4 ?language .    # P4 'language' is Q21 'French'
}
GROUP BY ?language
?✅ Language LL Qid (Q21) → Count unique words
SELECT ?language
       (COUNT(?audio) AS ?audios)                 # Count records and assign the value to ?audios
       (COUNT(DISTINCT ?itemLabel) AS ?words)     # Count distinct labels (words)
       (ROUND(10000 * ?words / ?audios) / 100 AS ?percent)
WHERE {
  VALUES ?language { entity:Q21 }
  ?audio prop:P2 entity:Q2 .       # Filter: P2 'instance of' is Q2 'record'
  ?audio prop:P4 ?language .       # Filter: P4 'language' is Q21 'French'
  ?audio rdfs:label ?itemLabel .   # Assign value: label to ?itemLabel
}
GROUP BY ?language
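The ?percent column is the share of distinct words among all records, rounded to two decimals: ROUND(10000 × words / audios) / 100. For example, 2 distinct words over 3 records gives ROUND(6666.67) / 100 = 66.67. The same arithmetic in JavaScript is handy for checking the numbers, or for recomputing them after merging counts from smaller queries; `uniqueWordsPercent` is a hypothetical name.

```javascript
// Percentage of distinct words among all records, with two decimals,
// mirroring ROUND(10000 * ?words / ?audios) / 100 in the query.
function uniqueWordsPercent(words, audios) {
  // Division by zero is an error in SPARQL, leaving ?percent unbound;
  // return null here to mirror that.
  if (audios === 0) return null;
  return Math.round(10000 * words / audios) / 100;
}
```

For positive values, `Math.round` and SPARQL's ROUND both round halves upwards, so the two should agree.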
✅ Language LL Qid (Q21) → Count speakers
SELECT ?language (COUNT(?speaker) AS ?speakers)
WHERE {
  VALUES ?language { entity:Q21 }
  ?speaker prop:P2 entity:Q3 .   # P2 'instance of' is Q3 'speaker'
  ?speaker prop:P4 ?language .   # P4 'language' is Q21 'French'
}
GROUP BY ?language
✅ Language LL Qid (Q209) → List speakers
SELECT ?language ?speaker ?speakerLabel
WHERE {
  VALUES ?language { entity:Q209 }
  ?speaker prop:P2 entity:Q3 .   # P2 'instance of' is Q3 'speaker'
  ?speaker prop:P4 ?language .   # P4 'language' is Q209 'Breton'
  # Labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
SELECT ?itemLabel ?item
WHERE {
  ?item prop:P2 entity:Q2 .       # Filter: P2 'instance of' is Q2 'record'
  ?item prop:P4 entity:Q22 .      # Filter: P4 'language' is Q22 'English'
  ?item rdfs:label ?itemLabel .   # Assign value: label to ?itemLabel
  FILTER(CONTAINS(?itemLabel, "apple"@en)) .   # Filter: label contains 'apple' (case-sensitive)
}
LIMIT 10
✅ Language (Q209) + Speaker (Q584098) + String (ni) → Record LL Qid
Case: search Breton records by speaker 'ThonyVezbe' for the word 'ni'.
SELECT ?audio
WHERE {
  ?audio prop:P4 entity:Q209 .      # P4 'language' is Q209 'Breton'
  ?audio prop:P5 entity:Q584098 .   # P5 'speaker' is Q584098 'ThonyVezbe'
  ?audio rdfs:label ?word .         # Assign value: label to ?word
  FILTER(STR(?word) = "ni")         # word = 'ni'
}
SELECT ?word ?audio ?urlPointer
       (REPLACE(REPLACE(REPLACE(SUBSTR(STR(?urlPointer), 52), "%20", "_"), "%28", "("), "%29", ")") AS ?filename)
WHERE {
  ?audio prop:P4 entity:Q21 .          # Filter: P4 'language' is Q21 'French'
  ?audio prop:P5 entity:Q137047 .      # Filter: P5 'speaker' is Q137047 'Justforoc'
  ?audio rdfs:label ?word .            # Assign value: label to ?word
  FILTER REGEX(?word, "pomme", "i") .  # Filter: ?word contains 'pomme', case-insensitive
  ?audio prop:P3 ?urlPointer .         # Assign value: P3 file URL to ?urlPointer
}
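The query above rebuilds the file name from the P3 URL by dropping its first 51 characters (presumably a prefix such as `http://commons.wikimedia.org/wiki/Special:FilePath/`, which is 51 characters long) and decoding only %20, %28 and %29. Doing it client-side with `decodeURIComponent` handles every percent-escape, including accented letters. A sketch, assuming the URL ends with the encoded file name and the name itself contains no `/`; `commonsFileName` is a hypothetical helper.

```javascript
// Extract a file name from a file URL, decoding all percent-escapes
// and using underscores instead of spaces, as the SPARQL version does.
function commonsFileName(url) {
  const encoded = url.substring(url.lastIndexOf('/') + 1);
  return decodeURIComponent(encoded).replace(/ /g, '_');
}
```

Unlike SUBSTR with a fixed offset, this does not depend on the exact prefix, so it keeps working if the URL scheme (http or https) changes.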
Heavy queries
The queries below are too heavy to run on Lingua Libre's wiki pages, or even on the Lingua Libre Query Service.
To do: split them into smaller sub-queries, each with a single COUNT() function.
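Splitting a heavy query gives several independent requests. Sending them all at once may run into the service's rate limits, so a small concurrency cap is a reasonable compromise. A sketch, where each task is a function returning a Promise (for instance one that sends a single sub-query with `fetch`); `runWithLimit` is a hypothetical name and the limit value is up to you.

```javascript
// Run async tasks with at most `limit` of them in flight at the same time.
// Results keep the order of `tasks`; the first rejection rejects the whole run.
async function runWithLimit(tasks, limit) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(tasks.length);
  let next = 0;
  // Each worker takes the next unstarted task until none are left.
  // JavaScript runs this code on one thread, so `next++` is not racy.
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  const workers = [];
  for (let w = 0; w < Math.min(limit, tasks.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

The results can then be merged by Qid, as in the merging snippet earlier on this page.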
❌ Languages → Name, Wikidata Qid, LLQid, Iso-639-3, and genders
SELECT ?languageQidLabel ?wdQid ?languageQid ?isoCode
       (COUNT(DISTINCT ?record) AS ?recordCount)
       (COUNT(DISTINCT ?speakerLangM) AS ?speakerM)
       (COUNT(DISTINCT ?speakerLangF) AS ?speakerF)
WHERE {
  ?record prop:P2 entity:Q2 .             # Filter: items where P2 'instance of' is Q2 'record'
  ?record prop:P4 ?languageQid .          # Assign value: P4 'language' into ?languageQid
  ?languageQid prop:P12 ?wdQid .          # Assign value: P12 'Wikidata Qid' into ?wdQid
  ?languageQid prop:P13 ?isoCode .        # Assign value: P13 'ISO 639-3' into ?isoCode
  # Male speaker lines are commented out, so ?speakerM is 0 in the result.
  #?record prop:P5 ?speakerQidM .         # Assign value: P5 'speaker' into ?speakerQidM
  #?speakerQidM prop:P8 entity:Q16 .      # Filter: P8 'sex or gender' is Q16 'male'
  #?speakerQidM prop:P4 ?speakerLangM .   # Assign value: P4 'language' into ?speakerLangM
  ?record prop:P5 ?speakerQidF .          # Assign value: P5 'speaker' into ?speakerQidF
  ?speakerQidF prop:P8 entity:Q17 .       # Filter: P8 'sex or gender' is Q17 'female'
  ?speakerQidF prop:P4 ?speakerLangF .    # Assign value: P4 'language' into ?speakerLangF
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?languageQidLabel ?languageQid ?wdQid ?isoCode
ORDER BY DESC(?recordCount)
Result (first rows):
languageQidLabel  wdQid  languageQid  isoCode  recordCount  speakerM  speakerF
French            Q150   Q21          fra      16761        0         18
Marathi           Q1571  Q34          mar      13153        0         5
Polish            Q809   Q298         pol      11686        0         1
…
❌ Is Language (Q4) → List all languages with their number of unique words and speakers