Difference between revisions of "SPARQL"

Revision as of 20:04, 9 December 2021

Draft

2021/12/10 : Work in progress. Please do not translate yet, as some aspects still need polishing. You can help by reading and fixing the page, testing the queries here, replacing Q21 (French) with a smaller non-Western language, harmonising inline comments, and adding the right category. --Yug

Base

Code snippets

Fetch data using SPARQL

LinguaLibre data can be fetched with various programming languages such as Python, JavaScript, R and others, and returned as JSON or other formats.

To get a code snippet in your language: open query.wikidata.org (Wikidata Query Service, aka WDQS), run your SPARQL query, then click "Code": a pop-up window appears with implementations in various languages.
To download the data, click "Download".

Javascript:
At least 3 methods exist (see the code snippet); example:

Query

Result's basic unit

SPARQL:

SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10

{ … },
{
  "item": {
    "type": "uri",
    "value": "https://lingualibre.org/entity/Q12"
  },
  "itemLabel": {
    "xml:lang": "en",
    "type": "literal",
    "value": "beginner"
  }
},
{ … }

Javascript:

var endpoint = 'https://lingualibre.org/sparql';
var sparql = 'SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10';
$.getJSON(endpoint,
	{ query: sparql, format: 'json' },
	function(data){ console.log('JQuery: ',data)}
);
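
The same request can also be sketched without jQuery. The example below assumes a runtime with the standard fetch API (modern browsers, Node.js 18+) and that the endpoint accepts the same query and format parameters as in the jQuery call above. The names buildQueryUrl, flattenBindings and runSparql are illustrative, not part of any LinguaLibre library. It also shows one way to flatten each result unit into a plain object.

```javascript
// Endpoint and query taken from the jQuery example above.
const endpoint = 'https://lingualibre.org/sparql';
const sparql = 'SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10';

// Build the GET url; URLSearchParams takes care of url-encoding the query.
function buildQueryUrl(endpointUrl, query) {
  const params = new URLSearchParams({ query: query, format: 'json' });
  return endpointUrl + '?' + params.toString();
}

// Turn SPARQL JSON bindings ({ item: { type, value }, ... }) into { item: value, ... }.
function flattenBindings(data) {
  return data.results.bindings.map(binding => {
    const row = {};
    for (const [name, cell] of Object.entries(binding)) {
      row[name] = cell.value;
    }
    return row;
  });
}

// Hypothetical helper: send the query and return the flattened rows.
async function runSparql(endpointUrl, query) {
  const response = await fetch(buildQueryUrl(endpointUrl, query));
  if (!response.ok) {
    throw new Error('SPARQL request failed: ' + response.status);
  }
  return flattenBindings(await response.json());
}
```

Usage would look like runSparql(endpoint, sparql).then(rows => console.log(rows)); each row is then a simple object such as { item: 'https://lingualibre.org/entity/Q12' }.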

Merging data

Advanced SPARQL queries using COUNT() and other aggregates are often slow (more than 3 seconds, sometimes more than 100 seconds). You are encouraged to run several smaller SPARQL queries and then merge the returned data. For example, the complementary JavaScript snippet below helps web developers do so.

// Data from 3 sparql queries.
// Important: One key must be similar in all datasets, here: 'qid'
const langs = [{ qid: 'Q209', label: 'Breton', iso:'bre' }, { qid: 'Q21', label: 'French', iso: 'fra' }],
    speakersFemales = [{ qid: 'Q209', genderF: 3, recordsF: 60 }, { qid: 'Q21', genderF: 21, recordsF:15046 }],
    speakersMales = [{ qid: 'Q209', genderM: 7, recordsM: 112 }, { qid: 'Q21', genderM: 85, recordsM:82964 }];
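// Toolbox for merging data by same id
var merge2ArraysBySameId = function(arr1, arr2, id1){
	return arr1.map( item1 => {
		// Matching object in arr2, or undefined when the id is missing there
		var identical = arr2.find(obj => obj[id1] === item1[id1]);
		// Copy into a new object: avoids a TypeError when there is no match
		// and leaves the input objects unmodified
		return Object.assign({}, identical, item1);
	} );
};
// Mergings
var step1 = merge2ArraysBySameId(langs, speakersFemales, 'qid');
var step2 = merge2ArraysBySameId(step1, speakersMales, 'qid');
alert(JSON.stringify(step2)); // In a browser; use console.log() elsewhere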
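
When more than two result sets must be combined, or when some ids are missing from some of them, a single pass over all datasets may be easier to maintain. The sketch below is one possible approach, not the page's reference method; mergeById is a hypothetical helper and the sample values are illustrative only.

```javascript
// Hypothetical helper: merge any number of arrays of objects sharing one key.
// Rows missing from some datasets are kept, and input objects are not modified.
function mergeById(key, ...datasets) {
  const byId = new Map();
  for (const dataset of datasets) {
    for (const row of dataset) {
      // Start from the rows already merged for this id, or from a new object.
      const merged = byId.get(row[key]) || {};
      byId.set(row[key], Object.assign(merged, row));
    }
  }
  return Array.from(byId.values());
}

// Illustrative sample data, shaped like flattened SPARQL results.
const languages = [{ qid: 'Q209', label: 'Breton' }, { qid: 'Q21', label: 'French' }];
const females = [{ qid: 'Q21', genderF: 21 }];
const males = [{ qid: 'Q209', genderM: 7 }, { qid: 'Q21', genderM: 85 }];

const mergedRows = mergeById('qid', languages, females, males);
console.log(JSON.stringify(mergedRows));
```

A Map keyed by id avoids the repeated find() lookups, which matters once the arrays hold thousands of rows.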

Lingualibre's ground

✅ Is Language (language/dialect (Q4)) → List existing languages

SELECT ?lang ?iso ?langLabel WHERE {
  ?lang prop:P2 entity:Q4 .
  ?lang prop:P13 ?iso .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}


✅🇶 Is Speaker (speaker (Q3)) → List existing speakers

SELECT ?speaker ?speakerLabel
WHERE {
  ?speaker prop:P2 entity:Q3 .  # Condition 1, P2 'instance of' is Q3 'speaker'.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}


✅ Is Language level (language level (Q5)) → List existing levels

SELECT ?item ?itemLabel
WHERE {
  ?item prop:P2 entity:Q5 .  # Condition 1, P2 'instance of' is Q5 'language level'.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}


✅ Is Sex or Gender (sex or gender (Q7)) → List existing sexes or genders

SELECT ?item ?itemLabel
WHERE {
  ?item prop:P2 entity:Q7 .  # Condition 1, P2 'instance of' is Q7 'sex or gender'.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}


Speaker

✅ Speaker name(s) → Speaker Qid(s)

SELECT ?speakerName ?speakerId
WHERE {
  VALUES ?speakerName { "Yug" "VIGNERON" } # One or multiple values
  BIND ( STRLANG(?speakerName, "en") AS ?speakerLabel )
  # P2: instance of; Q3: speaker.
  ?speakerId prop:P2 entity:Q3 ; rdfs:label ?speakerLabel .
}
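
In an application, the list of names usually comes from a variable rather than being typed into the query. The sketch below builds the query above from a JavaScript array; toSparqlLiteral and buildSpeakerQuery are hypothetical helper names, and names are assumed to be single-line strings.

```javascript
// Escape backslashes and double quotes so a name is a valid SPARQL string literal.
// Assumption: names are single-line, so newlines are not handled here.
function toSparqlLiteral(text) {
  return '"' + String(text).replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"';
}

// Hypothetical helper: build the speaker lookup query for a list of user names.
function buildSpeakerQuery(names) {
  const values = names.map(toSparqlLiteral).join(' ');
  return [
    'SELECT ?speakerName ?speakerId',
    'WHERE {',
    '  VALUES ?speakerName { ' + values + ' }',
    '  BIND ( STRLANG(?speakerName, "en") AS ?speakerLabel )',
    '  ?speakerId prop:P2 entity:Q3 ; rdfs:label ?speakerLabel .',
    '}'
  ].join('\n');
}

console.log(buildSpeakerQuery(['Yug', 'VIGNERON']));
```

Escaping the values matters as soon as the names come from user input, since an unescaped double quote would break the query.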


✅🇶 Speaker Qid (0x010C (Q42)) → Speaker data, all

# Get Q42 (User:0x010C)'s data
SELECT ?predicate ?object ?objectLabel
WHERE {
  entity:Q42 ?predicate ?object .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}
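
A query like the one above returns one row per predicate and object pair, so the rows often need to be grouped before use. The sketch below groups raw SPARQL JSON bindings by predicate. The entity base URL comes from the result example earlier on this page; the predicate URIs in the sample are placeholders, since the actual property URIs are not shown here, and groupByPredicate is a hypothetical helper.

```javascript
// Entity base URL as seen in the result example earlier on this page.
const ENTITY_BASE = 'https://lingualibre.org/entity/';

// Keep the last segment of a URI, e.g. '.../P4' -> 'P4'.
function lastSegment(uri) {
  return uri.split(/[\/#]/).pop();
}

// Group raw bindings by predicate; each predicate maps to a list of values.
// LinguaLibre entity URIs are shortened to their Qid, other values are kept as-is.
function groupByPredicate(bindings) {
  const grouped = {};
  for (const { predicate, object } of bindings) {
    const key = lastSegment(predicate.value);
    const value = object.value.startsWith(ENTITY_BASE)
      ? object.value.slice(ENTITY_BASE.length)
      : object.value;
    (grouped[key] = grouped[key] || []).push(value);
  }
  return grouped;
}

// Illustrative bindings; the example.org predicate URIs are placeholders.
const sampleBindings = [
  { predicate: { type: 'uri', value: 'https://example.org/prop/P2' },
    object: { type: 'uri', value: ENTITY_BASE + 'Q3' } },
  { predicate: { type: 'uri', value: 'https://example.org/prop/P4' },
    object: { type: 'uri', value: ENTITY_BASE + 'Q21' } },
  { predicate: { type: 'uri', value: 'https://example.org/prop/P4' },
    object: { type: 'uri', value: ENTITY_BASE + 'Q209' } }
];

console.log(groupByPredicate(sampleBindings));
```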


✅🇶 Speaker Qid (0x010C (Q42)) → Speaker languages (P4)

SELECT ?languages ?languagesLabel
WHERE {
  entity:Q42 prop:P4 ?languages .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}


✅ Speaker Qid + language Qid → List associated audios

SELECT ?audio ?audioLabel
WHERE {
  ?audio prop:P5 entity:Q42 .   # Condition 1, P5 Speaker is Q42 User:0x010C
  ?audio prop:P4 entity:Q21 .   # Condition 2, P4 language is Q21 French
  # Labels
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}


Languages

✅ Language LL Qid (Q21) → Count items

SELECT ?language (COUNT(?audio) AS ?nbAudio) WHERE {
  VALUES ?language { entity:Q21 }
  ?audio prop:P4 ?language .
}
GROUP BY ?language
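
In the SPARQL JSON results format, numeric values such as this count arrive as strings, with the numeric type given in a separate datatype field. A minimal sketch of the conversion follows; toNumber is a hypothetical helper and the sample row uses an illustrative count, not an actual LinguaLibre figure.

```javascript
// Convert a numeric cell from SPARQL JSON results into a JavaScript number.
function toNumber(cell) {
  const number = Number(cell.value);
  if (Number.isNaN(number)) {
    throw new TypeError('Not a numeric literal: ' + cell.value);
  }
  return number;
}

// Illustrative result row for the query above (the count is made up).
const sampleRow = {
  language: { type: 'uri', value: 'https://lingualibre.org/entity/Q21' },
  nbAudio: {
    type: 'literal',
    datatype: 'http://www.w3.org/2001/XMLSchema#integer',
    value: '1234'
  }
};

console.log(toNumber(sampleRow.nbAudio) + 1);
```

Without the conversion, '1234' + 1 would be string concatenation rather than addition.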


✅ Language LL Qid (Q21) → Count records

SELECT ?language (COUNT(?audio) AS ?nbRecords) WHERE {
  VALUES ?language { entity:Q21 }
  ?audio prop:P2 entity:Q2 .  # P2 'instance of' is Q2 'record'
  ?audio prop:P4 ?language .  # P4 'language' is Q21 'French'
}
GROUP BY ?language


?✅ Language LL Qid (Q21) → Count unique words

SELECT ?language (COUNT(?audio) AS ?Audios) (COUNT(DISTINCT(?itemLabel)) AS ?Words)
(ROUND(10000*?Words/?Audios)/100 AS ?Percent)
WHERE {
  VALUES ?language { entity:Q21 }
  ?audio prop:P2 entity:Q2 .  # P2 'instance of' is Q2 'record'
  ?audio prop:P4 ?language .  # P4 'language' is Q21 'French'
  ?audio rdfs:label ?itemLabel. # Assign value: label to ?itemLabel
}
GROUP BY ?language
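
The ?Percent expression multiplies by 10000 before rounding and divides by 100 afterwards, which yields a percentage with two decimals. The same arithmetic can be sketched in JavaScript; uniqueWordPercent is a hypothetical helper and the numbers are illustrative, not actual LinguaLibre counts.

```javascript
// Same arithmetic as ROUND(10000 * ?Words / ?Audios) / 100 in the query above:
// a percentage rounded to two decimals.
function uniqueWordPercent(words, audios) {
  if (audios === 0) {
    return null; // no recordings: the ratio is undefined
  }
  return Math.round(10000 * words / audios) / 100;
}

// Illustrative numbers only.
console.log(uniqueWordPercent(750, 1000)); // 75
console.log(uniqueWordPercent(1, 3));      // 33.33
```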


✅ Language LL Qid (Q21) → Count speakers

SELECT ?language (COUNT(?speaker) AS ?nbSpeakers) WHERE {
  VALUES ?language { entity:Q21 }
  ?speaker prop:P2 entity:Q3 .  # P2 'instance of' is Q3 'speaker'
  ?speaker prop:P4 ?language .  # P4 'language' is Q21 'French'
}
GROUP BY ?language


✅ Language LL Qid (Q209) → List speakers

SELECT ?language ?speaker ?speakerLabel WHERE {
  VALUES ?language { entity:Q209 }
  ?speaker prop:P2 entity:Q3 .  # P2 'instance of' is Q3 'speaker'
  ?speaker prop:P4 ?language .  # P4 'language' is Q209 'Breton'
  # Labels
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}


✅ Language LL Qid (French (Q21)) + Speaker (0x010C (Q42)) → Count records

SELECT ?language ?speakerLabel (COUNT(?audio) AS ?nbRecords)
WHERE {
  VALUES ?language { entity:Q21 }
  VALUES ?speaker { entity:Q42 }
  ?audio prop:P4 ?language .  # P4 'language' is Q21 'French'
  ?audio prop:P2 entity:Q2 .  # P2 'instance of' is Q2 'record'
  ?audio prop:P5 ?speaker . # P5 'speaker' is Q42 '0x010C'
  # Labels
  SERVICE wikibase:label {bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"} 
}
GROUP BY ?language ?speakerLabel


✅ Languages → Languages iso-639-3

SELECT * WHERE {
  ?lang prop:P13 ?code .
}


✅ Isolang → Language LL Qid

SELECT ?langIso ?langId
WHERE {
  VALUES ?langIso { "ban" "bre" } # One or multiple values
  # P2 'instance of'; Q4 'language'; P13 'ISO 639-3 code'
  ?langId prop:P2 entity:Q4 ; prop:P13 ?langIso .
}


Isolang → Language WD Qid

✅ Language WD Qid → Language data, all

SELECT * WHERE {
  ?lang prop:P12 "Q12107" .  # P12 'Wikidata id' is Wikidata's "Q12107"
  ?lang ?predicate ?object . # Get all properties and values of that language
}


✅ Language LL Qid (Breton (Q209)) → Language data, all

Case: Get all data for language Q209 'Breton'.

SELECT * WHERE {
  # Given Q209 'Breton language', get all properties and values
  entity:Q209 ?predicate ?object .
}


✅ Language LL Qid (Breton (Q209)) → Language data, core

Case: Get all CORE data for language Q209 'Breton'.

SELECT * WHERE {
  # Given Q209 'Breton language', get its properties and values
  entity:Q209 ?predicate ?object .
  ?predicate rdf:type owl:DatatypeProperty .  # Keep datatype properties only
}


Records

✅ Item name + language → Qid(s)

SELECT ?itemLabel ?item
WHERE { 
  ?item prop:P2 entity:Q2 .    # Filter: P2 'instance of' is Q2 'record'
  ?item rdfs:label ?itemLabel. # Assign value: label to ?itemLabel
  FILTER(CONTAINS(?itemLabel, "apple"@en)). 
} limit 10


Audio Qid → Audio data

✅ Language (Breton (Q209)) + speaker (ThonyVezbe (Q584098)) + word (ni) → Audio's Qid

Case: Search in the Breton language, with speaker 'ThonyVezbe', for the word 'ni'.

SELECT ?audio
WHERE {
  ?audio prop:P4 entity:Q209 .    # P4 'language' is Q209 'Breton'
  ?audio prop:P5 entity:Q584098 . # P5 'speaker' is Q584098 'ThonyVezbe'
  ?audio rdfs:label ?word .       # word
  FILTER ( STR(?word) = "ni" )    # Filter: word is 'ni'
}


Language + speaker + word → Audio's Commons URL pointer (P3)

SELECT ?word ?audio ?url (STR(?url) AS ?urlStr)
WHERE {
  ?audio prop:P4 entity:Q209 .    # P4 'language' is Q209 'Breton'
  ?audio prop:P5 entity:Q584098 . # P5 'speaker' is Q584098 'ThonyVezbe'
  ?audio rdfs:label ?word .       # word
  FILTER ( STR(?word) = "achantour" )    # Filter: word is 'achantour'
  ?audio prop:P3 ?url
}


Heavy queries

The queries below are too large to run on LinguaLibre's wiki pages, or even on the LinguaLibre Query Service.
To do: do smaller sub-queries, with one COUNT() function.
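
One way to follow this advice is to generate one small query per gender, each with a single COUNT(), and then merge the results by language on the client side. The sketch below only builds the query strings. The property and entity ids (P2, P4, P8, Q3, Q16, Q17) are those used in the queries on this page; buildSpeakerCountQuery is a hypothetical helper, and whether these smaller queries run fast enough has not been tested here.

```javascript
// Gender Qids as used in the heavy query on this page (Q16 'male', Q17 'female').
const GENDERS = { M: 'Q16', F: 'Q17' };

// Hypothetical helper: one small query with a single COUNT() for one gender.
function buildSpeakerCountQuery(genderQid, alias) {
  return `SELECT ?language (COUNT(DISTINCT ?speaker) AS ?${alias}) WHERE {
  ?speaker prop:P2 entity:Q3 .            # P2 'instance of' is Q3 'speaker'
  ?speaker prop:P8 entity:${genderQid} .  # P8 'sex or gender'
  ?speaker prop:P4 ?language .            # P4 'language'
}
GROUP BY ?language`;
}

// One query per gender, with aliases speakerM and speakerF.
const subQueries = Object.entries(GENDERS).map(
  ([suffix, qid]) => buildSpeakerCountQuery(qid, 'speaker' + suffix)
);

subQueries.forEach(query => console.log(query + '\n'));
```

Each query returns rows keyed by ?language, so the result sets can be combined with the same merge-by-id approach described under "Merging data".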

❌ Languages → Name, Wikidata Qid, LLQid, Iso-639-3, and genders

Query

Result

SELECT ?languageQidLabel ?wdQid ?languageQid ?isoCode 
(COUNT(DISTINCT(?record)) AS ?recordCount)
(COUNT(DISTINCT(?speakerLangM)) AS ?speakerM) 
(COUNT(DISTINCT(?speakerLangF)) AS ?speakerF)
WHERE {
  ?record prop:P2 entity:Q2 .     # Filter: items where P2 'instance of' is Q2 'record'
  ?record prop:P4 ?languageQid .  # Assign value: P4 'language' into variable ?languageQid
  ?languageQid prop:P12 ?wdQid .  # Assign value: P12 'wikidata id' into variable ?wdQid
  ?languageQid prop:P13 ?isoCode. # Assign value: P13 'iso639-3' into variable ?isoCode

  #?record prop:P5 ?speakerQidM .        # Assign value: P5 'speaker' into variable ?speakerQidM
  #?speakerQidM prop:P8 entity:Q16 .     # Filter: P8 'sex or gender' is Q16 'male'
  #?speakerQidM prop:P4 ?speakerLangM .  # Assign value: P4 'language' into variable ?speakerLangM

  ?record prop:P5 ?speakerQidF .        # Assign value: P5 'speaker' into variable ?speakerQidF
  ?speakerQidF prop:P8 entity:Q17 .     # Filter: P8 'sex or gender' is Q17 'female'
  ?speakerQidF prop:P4 ?speakerLangF .  # Assign value: P4 'language' into variable ?speakerLangF
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } 
}
GROUP BY ?languageQidLabel ?languageQid ?wdQid ?isoCode
ORDER BY DESC(?recordCount)

languageQidLabel	wdQid	languageQid	isoCode	recordCount	speakerM	speakerF
French	Q150	Q21	fra	16761	0	18
Marathi	Q1571	Q34	mar	13153	0	5
Polish	Q809	Q298	pol	11686	0	1
…

❌ Is Language (language/dialect (Q4)) → List all languages with number of audios and speakers

SELECT ?language (COUNT(?audio) AS ?nbAudio) (COUNT(?speaker) AS ?nbSpeaker) WHERE {
  ?language prop:P2 entity:Q4 .
  ?audio prop:P4 ?language .
  ?speaker prop:P4 ?language .
}
GROUP BY ?language

Tools

LinguaLibre Query Service – run SPARQL queries on LinguaLibre data here.
Special:ApiSandbox – API query generator for LinguaLibre wiki page and Wikibase contents.

Wikidata lexemes

You may [https://jsfiddle.net/hugolpz/rygo9s5b/ Hack me!]

It is possible to use Wikidata & Wiktionary to extract lexicographical information. Developer @sina_ahm created a SPARQL query generator that helps search for words in both Wikidata & Dbnary. See the demo: https://sinaahmadi.github.io/posts/sparql-query-generator-for-lexicographical-data.html

{{#Subtitle:'''Help:SPARQL''' gathers a list of basic SPARQL queries in the context of Lingua Libre, together with a few tools, tips, inline comments and beginner-friendly SPARQL concepts. This page allows users not familiar with SPARQL to rapidly learn the basics of SPARQL on LinguaLibre, query the database, and download that data or feed it to an application. For convenience, and to fit the most frequent usage, the case of a web developer with basic JavaScript skills is taken.}}
[[Category:?]]
