Difference between revisions of "SPARQL"

Revision as of 16:23, 8 December 2021

Base

Fetch data using SPARQL

LinguaLibre data can be fetched with various programming languages such as Python, JavaScript, R and others, returning JSON or other formats.

For a code snippet in your language: open query.wikidata.org (Wikidata Query Service, aka WDQS), run your SPARQL query, then click "Code"; a pop-up window appears with various implementations.
To download data, click "Download".

JavaScript:
At least three methods exist (code snippet); for example:

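Query (SPARQL):

SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10

Result's basic unit (the itemLabel field appears only when the query also selects ?itemLabel through the label service):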

{ … },
{
  "item": {
    "type": "uri",
    "value": "https://lingualibre.org/entity/Q12"
  },
  "itemLabel": {
    "xml:lang": "en",
    "type": "literal",
    "value": "beginner"
  }
},
{ … }
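
The result unit above can be flattened into plain objects before use. The following is a minimal sketch, assuming the standard SPARQL JSON results layout in which rows live under results.bindings; the function name simplifyBindings and the sample object are illustrative and were not fetched from the live endpoint.

```javascript
// Flatten SPARQL JSON results into plain objects such as
// { item: 'https://…/Q12', itemLabel: 'beginner' }.
// Assumption: data.results.bindings is an array of rows (standard layout).
function simplifyBindings(data) {
  return data.results.bindings.map(function (row) {
    var flat = {};
    Object.keys(row).forEach(function (name) {
      flat[name] = row[name].value; // keep the value, drop type and xml:lang
    });
    return flat;
  });
}

// Mock response shaped like the result unit above (illustrative only).
var sample = {
  head: { vars: ['item', 'itemLabel'] },
  results: { bindings: [{
    item: { type: 'uri', value: 'https://lingualibre.org/entity/Q12' },
    itemLabel: { 'xml:lang': 'en', type: 'literal', value: 'beginner' }
  }] }
};
console.log(simplifyBindings(sample));
```

Dropping the type and language tag is a simplification; keep the full binding if you need to tell URIs from literals.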

JavaScript:

var endpoint = 'https://lingualibre.org/sparql';
var sparql = 'SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10';
$.getJSON(endpoint,
	{ query: sparql, format: 'json' },
	function(data){ console.log('JQuery: ',data)}
);
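
The same request can also be made without jQuery. The sketch below uses the standard fetch and URLSearchParams APIs; the helper names buildSparqlUrl and runSparql are illustrative, and it assumes the endpoint accepts cross-origin GET requests in the same way as the jQuery call above.

```javascript
// Build the GET URL for a SPARQL endpoint; format=json mirrors the jQuery call.
function buildSparqlUrl(endpoint, sparql) {
  var params = new URLSearchParams({ query: sparql, format: 'json' });
  return endpoint + '?' + params.toString();
}

// Run the query with fetch and resolve to the array of result rows.
// Assumption: the endpoint answers with the standard SPARQL JSON layout.
function runSparql(endpoint, sparql) {
  return fetch(buildSparqlUrl(endpoint, sparql), {
    headers: { Accept: 'application/sparql-results+json' }
  })
    .then(function (response) {
      if (!response.ok) {
        throw new Error('SPARQL request failed: ' + response.status);
      }
      return response.json();
    })
    .then(function (data) { return data.results.bindings; });
}

// Usage (performs a network request, so it is left commented out):
// runSparql('https://lingualibre.org/sparql',
//   'SELECT ?item WHERE { ?item prop:P2 entity:Q5 } LIMIT 10')
//   .then(function (rows) { console.log('fetch: ', rows); });
```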

Lingualibre descriptors

✅ Is Language level (language level (Q5)) → list possible values

SELECT ?item ?itemLabel
WHERE {
  ?item prop:P2 entity:Q5    # Condition 1, P2 'instance of' is Q5 'language level'.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}

✅ Is Sex or Gender (sex or gender (Q7)) → list possible values

SELECT ?item ?itemLabel
WHERE {
  ?item prop:P2 entity:Q7    # Condition 1, P2 'instance of' is Q7 'sex or gender'.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}

✅🇶 Is Speaker (speaker (Q3)) → list possible speakers

SELECT ?speaker ?speakerLabel
WHERE {
  ?speaker prop:P2 entity:Q3 .  # Condition 1, P2 'instance of' is Q3 'speaker'.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}

Speaker

✅ Speaker name(s) → Speaker Qid(s)

SELECT ?speakerName ?speakerId
WHERE {
  VALUES ?speakerName { "Yug" "VIGNERON" } # One or multiple values
  BIND ( STRLANG(?speakerName, "en") AS ?speakerLabel )
  # P2: instance of; Q3: speaker.
  ?speakerId prop:P2 entity:Q3 ; rdfs:label ?speakerLabel .
}
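
When the speaker names come from user input, the VALUES clause can be generated rather than typed by hand. This is a minimal sketch; sparqlString and buildSpeakerQuery are hypothetical helper names, and the escaping covers only backslashes, double quotes and line breaks.

```javascript
// Quote a JavaScript string as a SPARQL string literal.
function sparqlString(text) {
  return '"' + String(text)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r') + '"';
}

// Build the "Speaker name(s) → Speaker Qid(s)" query for a list of names.
function buildSpeakerQuery(names) {
  var values = names.map(sparqlString).join(' ');
  return [
    'SELECT ?speakerName ?speakerId',
    'WHERE {',
    '  VALUES ?speakerName { ' + values + ' }',
    '  BIND ( STRLANG(?speakerName, "en") AS ?speakerLabel )',
    '  # P2: instance of; Q3: speaker.',
    '  ?speakerId prop:P2 entity:Q3 ; rdfs:label ?speakerLabel .',
    '}'
  ].join('\n');
}

console.log(buildSpeakerQuery(['Yug', 'VIGNERON']));
```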

✅🇶 Speaker Qid (0x010C (Q42)) → Speaker data

# Get Q42 (User:0x010C)'s data
SELECT ?predicate ?object ?objectLabel
WHERE {
  entity:Q42 ?predicate ?object .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}

✅🇶 Speaker Qid (0x010C (Q42)) → Speaker data → Speaker languages (P4)

SELECT ?languages ?languagesLabel
WHERE {
  entity:Q42 prop:P4 ?languages .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}

✅ Speaker Qid + language → list of all associated audios

SELECT ?audio ?audioLabel
WHERE {
  ?audio prop:P5 entity:Q42 .   # Condition 1, P5 Speaker is Q42 User:0x010C
  ?audio prop:P4 entity:Q21 .   # Condition 2, P4 language is Q21 French
  # Labels
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}
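
To reuse this query for other speakers and languages, the two Qids can be passed as parameters. The sketch below is one possible way to do it; assertQid and buildAudioQuery are hypothetical names, and the Qid check is a simple guard against malformed input rather than a lookup against the wikibase.

```javascript
// Accept only identifiers of the form Q<number>, e.g. Q42.
function assertQid(id) {
  if (!/^Q[1-9]\d*$/.test(id)) {
    throw new Error('Not a Qid: ' + id);
  }
  return id;
}

// Build the "Speaker Qid + language → audios" query for any speaker/language pair.
function buildAudioQuery(speakerQid, languageQid) {
  return [
    'SELECT ?audio ?audioLabel',
    'WHERE {',
    '  ?audio prop:P5 entity:' + assertQid(speakerQid) + ' .  # P5 speaker',
    '  ?audio prop:P4 entity:' + assertQid(languageQid) + ' .  # P4 language',
    '  SERVICE wikibase:label {',
    '    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .',
    '  }',
    '}'
  ].join('\n');
}

console.log(buildAudioQuery('Q42', 'Q21')); // speaker 0x010C, language French
```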

Languages

✅ Language LL Qid (Q21) → Count items

SELECT ?language (COUNT(?audio) AS ?nbAudio) WHERE {
  VALUES ?language { entity:Q21 }
  ?audio prop:P4 ?language .
}
GROUP BY ?language
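
Counts come back as string literals, so they need converting before any arithmetic. A minimal sketch, assuming the standard SPARQL JSON results layout; readCounts is a hypothetical helper and the value 123 in the mock is made up for illustration, not a real count.

```javascript
// Turn rows of (key, count) bindings into an object: { '<key value>': <number> }.
function readCounts(data, keyVar, countVar) {
  var counts = {};
  data.results.bindings.forEach(function (row) {
    counts[row[keyVar].value] = parseInt(row[countVar].value, 10);
  });
  return counts;
}

// Mock response for the query above (illustrative value, not real data).
var mock = { results: { bindings: [{
  language: { type: 'uri', value: 'https://lingualibre.org/entity/Q21' },
  nbAudio: {
    type: 'literal',
    datatype: 'http://www.w3.org/2001/XMLSchema#integer',
    value: '123'
  }
}] } };

console.log(readCounts(mock, 'language', 'nbAudio'));
```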

✅ Language LL Qid (Q21) → Count records

SELECT ?language (COUNT(?audio) AS ?nbAudio) WHERE {
  VALUES ?language { entity:Q21 }
  ?audio prop:P2 entity:Q2 .  # P2 'instance of' is Q2 'record'
  ?audio prop:P4 ?language .  # P4 'language' is Q21 'French'
}
GROUP BY ?language

Language LL Qid (Q21) → Count unique words

✅ Language LL Qid (Q21) → Count speakers

SELECT ?language (COUNT(?speaker) AS ?nbSpeaker) WHERE {
  VALUES ?language { entity:Q21 }
  ?speaker prop:P2 entity:Q3 .  # P2 'instance of' is Q3 'speaker'
  ?speaker prop:P4 ?language .  # P4 'language' is Q21 'French'
}
GROUP BY ?language

✅ Language LL Qid (Q209) → List speakers

SELECT ?language ?speaker ?speakerLabel WHERE {
  VALUES ?language { entity:Q209 }
  ?speaker prop:P2 entity:Q3 .  # P2 'instance of' is Q3 'speaker'
  ?speaker prop:P4 ?language .  # P4 'language' is Q209 'Breton'
  # Labels
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}

✅ Language LL Qid (French (Q21)) + Speaker (0x010C (Q42)) → Count records

SELECT ?language ?speakerLabel (COUNT(?audio) AS ?nbAudio)
WHERE {
  VALUES ?language { entity:Q21 }
  VALUES ?speaker { entity:Q42 }
  ?audio prop:P4 ?language .  # P4 'language' is Q21 'French'
  ?audio prop:P2 entity:Q2 .  # P2 'instance of' is Q2 'record'
  ?audio prop:P5 ?speaker . # P5 'speaker' is Q42 '0x010C'
  # Labels
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}
GROUP BY ?language ?speakerLabel

Isolang → Language LL Qid

SELECT * WHERE {
  ?lang prop:P13 ?code .
}

✅ Isolang → Language WD Qid

SELECT ?langIso ?langId
WHERE {
  VALUES ?langIso { "ban" "bre" } # One or multiple values
  # P2 'instance of'; Q4 'language'; P13 'ISO 639-3 code'
  ?langId prop:P2 entity:Q4 ; prop:P13 ?langIso .
}
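
The rows returned by this query can be turned into a lookup table from ISO code to Qid. The following sketch assumes entity URIs end in the Qid, as in https://lingualibre.org/entity/Q12 shown earlier; qidFromUri and isoToQid are hypothetical names and the response is a hand-written mock.

```javascript
// Extract the trailing Qid from an entity URI, or null if there is none.
function qidFromUri(uri) {
  var match = /\/(Q\d+)$/.exec(uri);
  return match ? match[1] : null;
}

// Build a Map from ISO 639-3 code to Lingualibre Qid.
function isoToQid(data) {
  var map = new Map();
  data.results.bindings.forEach(function (row) {
    map.set(row.langIso.value, qidFromUri(row.langId.value));
  });
  return map;
}

// Mock response (hand-written): 'bre' paired with Q209 'Breton'.
var mockIso = { results: { bindings: [{
  langIso: { type: 'literal', value: 'bre' },
  langId: { type: 'uri', value: 'https://lingualibre.org/entity/Q209' }
}] } };

console.log(isoToQid(mockIso).get('bre'));
```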

✅ Language WD Qid → Language data, all

SELECT * WHERE {
  ?lang prop:P12 "Q12107" .  # P12 'Wikidata id' is Wikidata's "Q12107"
  ?lang ?predicate ?object . # Get all of its properties and values
}

✅ Language LL Qid (Breton (Q209)) → Language data, all

Case: Get all data for language Q209 'Breton'.

SELECT * WHERE {
  # Given Q209 'Breton language', get all properties and values
  entity:Q209 ?predicate ?object .
}
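
A query of this shape returns one row per predicate/object pair, so grouping the rows by predicate makes the result easier to read. A minimal sketch, assuming the standard SPARQL JSON results layout; groupByPredicate is a hypothetical helper and the rows in the mock are hand-written, not real Q209 data.

```javascript
// Group (predicate, object) rows into { '<predicate URI>': [values…] }.
function groupByPredicate(data) {
  var grouped = {};
  data.results.bindings.forEach(function (row) {
    var key = row.predicate.value;
    if (!grouped[key]) { grouped[key] = []; }
    grouped[key].push(row.object.value);
  });
  return grouped;
}

// Mock response (hand-written): two labels under the rdfs:label predicate.
var RDFS_LABEL = 'http://www.w3.org/2000/01/rdf-schema#label';
var mockData = { results: { bindings: [
  { predicate: { type: 'uri', value: RDFS_LABEL },
    object: { type: 'literal', 'xml:lang': 'en', value: 'Breton' } },
  { predicate: { type: 'uri', value: RDFS_LABEL },
    object: { type: 'literal', 'xml:lang': 'br', value: 'brezhoneg' } }
] } };

console.log(groupByPredicate(mockData));
```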

✅ Language LL Qid (Breton (Q209)) → Language data, core

Case: Get all CORE data for language Q209 'Breton'.

SELECT * WHERE {
  # Given Q209 'Breton language', get its core properties and values (datatype properties only)
  entity:Q209 ?predicate ?object .
  ?predicate rdf:type owl:DatatypeProperty .
}

✅ Language (Breton (Q209)) + speaker (ThonyVezbe (Q584098)) + word (ni) → Audio's Qid

Case: Search in Breton language, with speaker 'ThonyVezbe', for the word 'ni'.

SELECT ?audio
WHERE {
  ?audio prop:P4 entity:Q209 .    # P4 'language' is Q209 'Breton'
  ?audio prop:P5 entity:Q584098 . # P5 'speaker' is Q584098 'ThonyVezbe'
  ?audio rdfs:label ?word .       # The audio's label is the recorded word
  FILTER ( STR(?word) = "ni" )    # word = 'ni'
}

Records

✅ Item name → Qid(s)

SELECT ?item ?itemLabel
WHERE { 
  ?item rdfs:label ?itemLabel. 
  FILTER(CONTAINS(LCASE(STR(?itemLabel)), "yug")).  # Lowercase on both sides for a case-insensitive match
} LIMIT 10

Audio Qid → Audio data

Langue + speaker + word → Audio's Commons url

Heavy queries

❌ Is Language (language (Q4)) → list all languages with number of unique words and speakers

Too large to run (not even on Lingualibre Query).

SELECT ?language (COUNT(?audio) AS ?nbAudio) (COUNT(?speaker) AS ?nbSpeaker) WHERE {
  ?language prop:P2 entity:Q4 .
  ?audio prop:P4 ?language .
  ?speaker prop:P4 ?language .
}
GROUP BY ?language

To do: split this into smaller sub-queries. For now, it works only for one counter and one language at a time (see the count queries in the Languages section).
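
One way to approach the smaller sub-queries is to generate one count query per language and per counter, then run them one after another. The sketch below is only an outline under that assumption; buildCountQuery and countPerLanguage are hypothetical names, and runQuery stands for any caller-supplied function that sends a query and returns a promise of the result.

```javascript
// Build a single-language, single-counter query.
// instanceQid is the class to count: Q2 'record' or Q3 'speaker'.
function buildCountQuery(languageQid, instanceQid) {
  [languageQid, instanceQid].forEach(function (id) {
    if (!/^Q\d+$/.test(id)) { throw new Error('Not a Qid: ' + id); }
  });
  return [
    'SELECT ?language (COUNT(?item) AS ?count) WHERE {',
    '  VALUES ?language { entity:' + languageQid + ' }',
    '  ?item prop:P2 entity:' + instanceQid + ' .',
    '  ?item prop:P4 ?language .',
    '}',
    'GROUP BY ?language'
  ].join('\n');
}

// Run the sub-queries sequentially so the endpoint handles one small query at a time.
async function countPerLanguage(languageQids, runQuery) {
  var results = [];
  for (const qid of languageQids) {
    results.push({
      language: qid,
      records: await runQuery(buildCountQuery(qid, 'Q2')),
      speakers: await runQuery(buildCountQuery(qid, 'Q3'))
    });
  }
  return results;
}

console.log(buildCountQuery('Q21', 'Q2'));
```

Sequential execution is slower than a single query but keeps each request within what the endpoint can answer.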

❌ Languages → Name, Wikidata Qid, LLQid, Iso-639-3, and genders

Query:

SELECT ?languageQidLabel ?wdQid ?languageQid ?isoCode
  (COUNT(DISTINCT(?record)) AS ?recordCount)
  (COUNT(DISTINCT(?speakerLangM)) AS ?speakerM)
  (COUNT(DISTINCT(?speakerLangF)) AS ?speakerF)
WHERE {
  ?record prop:P2 entity:Q2 .     # Filter: items where P2 'instance of' is Q2 'record'
  ?record prop:P4 ?languageQid .  # Assign value: P4 'language' into variable ?languageQid
  ?languageQid prop:P12 ?wdQid .  # Assign value: P12 'wikidata id' into variable ?wdQid
  ?languageQid prop:P13 ?isoCode. # Assign value: P13 'iso639-3' into variable ?isoCode
  
  #?record prop:P5 ?speakerQidM .        # Assign value: P5 'speaker' into variable ?speakerQidM
  #?speakerQidM prop:P8 entity:Q16 .     # Filter: P8 'sex or gender' is Q16 'male'
  #?speakerQidM prop:P4 ?speakerLangM .  # Assign value: P4 'language' into variable ?speakerLangM
  
  ?record prop:P5 ?speakerQidF .        # Assign value: P5 'speaker' into variable ?speakerQidF
  ?speakerQidF prop:P8 entity:Q17 .     # Filter: P8 'sex or gender' is Q17 'female'
  ?speakerQidF prop:P4 ?speakerLangF .  # Assign value: P4 'language' into variable ?speakerLangF
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } 
}
GROUP BY ?languageQidLabel ?languageQid ?wdQid ?isoCode
ORDER BY DESC(?recordCount)

Result:

languageQidLabel	wdQid	languageQid	isoCode	recordCount	speakerM	speakerF
French	Q150	Q21	fra	16761	0	18
Marathi	Q1571	Q34	mar	13153	0	5
Polish	Q809	Q298	pol	11686	0	1
…

Tools

Special:ApiSandbox – API queries generator for Lingualibre wikipage and wikibase contents.

Help


@@ Line 5: / Line 5: @@
 * [[DataViz:Records]]
-== Fetch SPARQL data ==
+== Fetch data using SPARQL ==
-Data can be fetched using various coding languages such as Python, Javascript, R and others. On the [https://query.wikidata.org/ Wikidata Query Service page], after running your SPARQL query, click "Code" : a pop up window appears with various implementations.
+LinguaLibre data can be fetched using various coding languages such as Python, Javascript, R and others, returning JSON or other formats.
+* For code snippet in your language : open [https://query.wikidata.org query.wikidata.org] (WikiData Query Service, aka WDQS), run your SPARQL query, click "Code" : a pop up window appears with various implementations.
+* For downloading data, click "Download".
 '''Javascript:'''<br>
@@ Line 46: / Line 48: @@
 |}
-== ✅ Is Language level ([[Q5]]) → list possible values ==
+== Lingualibre descriptors ==
+=== ✅ Is Language level ([[Q5]]) → list possible values ===
 {| style="width:100%"
 |- style="vertical-align:top;"
@@ Line 71: / Line 74: @@
 |}
-== ✅ Is Sex or Gender([[Q7]]) → list possible values ==
+=== ✅ Is Sex or Gender ([[Q7]]) → list possible values ===
 {| style="width:100%"
 |- style="vertical-align:top;"
@@ Line 96: / Line 99: @@
 |}
-== ✅🇶 Is Speaker ([[Q3]]) → list all speakers ==
+=== ✅🇶 Is Speaker ([[Q3]]) → list possible speakers ===
 <!-- Q: add grouping per language ?-->
 {| style="width:100%"
@@ Line 122: / Line 125: @@
 |}
+== Speaker ==
-== ✅ Item name → Qid(s) ==
+=== ✅ Speaker name(s) → Speaker Qid(s) ===
-{| style="width:100%"
-|- style="vertical-align:top;"
-|style="padding: 0 3em;width:60%"|
-<syntaxhighlight lang="sparql">
-SELECT ?item ?itemLabel
-WHERE {
-  ?item rdfs:label ?itemLabel.
-  FILTER(CONTAINS(LCASE(?itemLabel), "Yug"@en)).
-} limit 10
-</syntaxhighlight>
-||
-<query _pagination="5">
-SELECT ?item ?itemLabel
-WHERE {
-  ?item rdfs:label ?itemLabel.
-  FILTER(CONTAINS(LCASE(?itemLabel), "Yug"@en)).
-} limit 10
-</query>
-|}
-== ✅ Speaker name(s) → Speaker Qid(s) ==
 {| style="width:100%"
@@ Line 174: / Line 153: @@
 |}
-== ✅🇶 Speaker Qid ([[Q42]]) → Speaker data ==
+=== ✅🇶 Speaker Qid ([[Q42]]) → Speaker data ===
 <!-- Q: alternative words for "predicate" and "object". "property" and "value" ?-->
 {| style="width:100%"
@@ Line 201: / Line 180: @@
 |}
-== ✅🇶  Speaker Qid ([[Q42]]) → Speaker data → Speaker languages ([[Property:P4|P4]]) ==
+=== ✅🇶  Speaker Qid ([[Q42]]) → Speaker data → Speaker languages ([[Property:P4|P4]]) ===
 <!-- Q: Add languages iso P:13 -->
 {| style="width:100%"
@@ Line 227: / Line 206: @@
 |}
-== ✅ Speaker Qid + language → list of all associated audios ==
+=== ✅ Speaker Qid + language → list of all associated audios ===
 {| style="width:100%"
 |- style="vertical-align:top;"
@@ Line 256: / Line 235: @@
 |}
-== ❌ Is Language ([[Q3]]) → list all languages with number of unique words and speakers ==
+== Languages ==
-Too large to run (not even on [https://lingualibre.org/bigdata/#query Lingualibre Query]).
-<syntaxhighlight lang="sparql">
-SELECT ?language (COUNT(?audio) AS ?nbAudio) (COUNT(?speaker) AS ?nbSpeaker) WHERE {
-  ?language prop:P2 entity:Q4 .
-  ?audio prop:P4 ?language .
-  ?speaker prop:P4 ?language .
-}
-GROUP BY ?language
-</syntaxhighlight>
-To do: do smaller sub-queries. For now, works only for one counter and one language at a time:
-=== Sub-queries ===
+=== ✅ Language LL Qid (Q21) → Count items ===
-==== ✅ Language LL Qid (Q21) → All items ====
 {| style="width:100%"
 |- style="vertical-align:top;"
@@ Line 290: / Line 257: @@
 </query>
 |}
+=== ✅ Language LL Qid (Q21) → Count records ===
-==== ✅ Language LL Qid (Q21) → Number of records ====
 {| style="width:100%"
 |- style="vertical-align:top;"
@@ Line 314: / Line 280: @@
 |}
-==== <!--✅-->Language LL Qid (Q21) → Number of unique words ====
+=== <!--✅-->Language LL Qid (Q21) → Count unique words ===
-==== ✅ Language LL Qid (Q21) → Number of speakers ====
+=== ✅ Language LL Qid (Q21) → Count speakers ===
 {| style="width:100%"
 |- style="vertical-align:top;"
@@ Line 339: / Line 305: @@
 |}
+=== ✅ Language LL Qid (Q209) → List speakers ===
-==== ✅ Language LL Qid (Q209) → List of speakers ====
 {| style="width:100%"
 |- style="vertical-align:top;"
@@ Line 368: / Line 333: @@
 |}
-==== ✅ Language LL Qid ([[Q21]]) + Speaker ([[Q42]]) → Number of records ====
+=== ✅ Language LL Qid ([[Q21]]) + Speaker ([[Q42]]) → Count records ===
 {| style="width:100%"
 |- style="vertical-align:top;"
@@ Line 405: / Line 370: @@
 |}
-== Isolang → Language LL Qid ==
+=== Isolang → Language LL Qid ===
 {| style="width:100%"
@@ Line 423: / Line 388: @@
 |}
-== ✅ Isolang → Language WD Qid ==
+=== ✅ Isolang → Language WD Qid ===
 {| style="width:100%"
 |- style="vertical-align:top;"
@@ Line 445: / Line 410: @@
 |}
-== ✅ Language WD Qid → Language data ==
+=== ✅ Language WD Qid → Language data, all ===
 {| style="width:100%"
 |- style="vertical-align:top;"
@@ Line 464: / Line 429: @@
 |}
-== ✅ Language LL Qid ([[Q209]]) → Language data ==
+=== ✅ Language LL Qid ([[Q209]]) → Language data, all ===
 '''Case:'' Get for language Q209 'Breton' all its data.
 {| style="width:100%"
@@ Line 483: / Line 448: @@
 |}
-== ✅ Language LL Qid ([[Q209]]) → core Language data ==
+=== ✅ Language LL Qid ([[Q209]]) → Language data, core ===
 '''Case:'' Get for language Q209 'Breton' all its CORE data.
 {| style="width:100%"
@@ Line 504: / Line 469: @@
 |}
-== ✅ Language ([[Q209]]) + speaker ([[Q584098]]) + word (ni) → Audio's Qid ==
+=== ✅ Language ([[Q209]]) + speaker ([[Q584098]]) + word (ni) → Audio's Qid ===
 '''Case:''' Search in Breton language, with speaker 'ThonyVezbe',
 {| style="width:100%"
@@ Line 530: / Line 495: @@
 |}
-== Audio Qid → Audio data ==
+== Records ==
-== ✅ Langue + speaker + word → Audio's Commons url  ==
+=== ✅ Item name → Qid(s) ===
+{| style="width:100%"
+|- style="vertical-align:top;"
+|style="padding: 0 3em;width:60%"|
+<syntaxhighlight lang="sparql">
+SELECT ?item ?itemLabel
+WHERE {
+  ?item rdfs:label ?itemLabel.
+  FILTER(CONTAINS(LCASE(?itemLabel), "Yug"@en)).
+} limit 10
+</syntaxhighlight>
+||
+<query _pagination="5">
+SELECT ?item ?itemLabel
+WHERE {
+  ?item rdfs:label ?itemLabel.
+  FILTER(CONTAINS(LCASE(?itemLabel), "Yug"@en)).
+} limit 10
+</query>
+|}
+=== Audio Qid → Audio data ===
+=== <!--✅--> Langue + speaker + word → Audio's Commons url  ===
-== Languages → Name, Wikidata Qid, LLQid, Iso-639-3, and genders  ==
+== Heavy queries ==
+=== ❌ Is Language ([[Q3]]) → list all languages with number of unique words and speakers ===
+Too large to run (not even on [https://lingualibre.org/bigdata/#query Lingualibre Query]).
+<syntaxhighlight lang="sparql">
+SELECT ?language (COUNT(?audio) AS ?nbAudio) (COUNT(?speaker) AS ?nbSpeaker) WHERE {
+  ?language prop:P2 entity:Q4 .
+  ?audio prop:P4 ?language .
+  ?speaker prop:P4 ?language .
+}
+GROUP BY ?language
+</syntaxhighlight>
+To do: do smaller sub-queries. For now, works only for one counter and one language at a time:
+=== ❌ Languages → Name, Wikidata Qid, LLQid, Iso-639-3, and genders  ===
 {| style="width:100%"
 |-