Help
Difference between revisions of "SPARQL (intermediate)"
Help:SPARQL 2 will explore federated queries fetching data from both LinguaguaLibre and Wikidata's endpoints, then Wikidata Lexemes, an emerging source of lexicographic data. The duo can be a solid combo to provide lexicographic and multimedia (audio recordings and images) for either Wikimedia modules or web developers.
m ('Dead languages' : move query to Languages section ; add wikidata Qid to #Notable elements) |
m (→Languages) |
||
Line 63: | Line 63: | ||
=== ✅ Language () → List of LL languages with wd speaker population === | === ✅ Language () → List of LL languages with wd speaker population === | ||
− | + | === ✅ Languages () → List of LL languages with wikidata dead or extinct status === | |
− | === | ||
− | |||
It obtains the list of dead and extinct languages from Wikidata. This query is expected to be run in Lingua Libre SPARQL endpoint. It shouldn't be run in the SPARQL endpoint of Wikidata or Wikidata Query Service. | It obtains the list of dead and extinct languages from Wikidata. This query is expected to be run in Lingua Libre SPARQL endpoint. It shouldn't be run in the SPARQL endpoint of Wikidata or Wikidata Query Service. | ||
Revision as of 08:06, 22 January 2022
Tools
Lexemes Queries Generator
SPARQL to persitent data
Some SPARQL queries are meaningful but heavy and overly slow. This administrator tool stores or updates the response data on LinguaLibre, within a wikipage. Stored data can then be loaded in <0.1 second. Multiple data can also be merged via a common property if any.
Federated queries
- To query Lingualibre from Wikidata, use
SERVICE <https://lingualibre.org/sparql>
. - To query Wikidata from LinguaLibre, use
SERVICE <https://query.wikidata.org/sparql>
.
Retrieve data of LinguaLibre from Wikidata
The following query shows a simple example of retrieving data of LinguaLibre from Wikidata Query Service. It lists the existing levels in LinguaLibre.
PREFIX prop: <https://lingualibre.org/prop/direct/>
PREFIX entity: <https://lingualibre.org/entity/>
SELECT * {
SERVICE <https://lingualibre.org/sparql> {
SELECT
?item
?itemLabel
{
?item prop:P2 entity:Q5.
SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
}
}
}
Notable elements
LinguaLibre endpoint | Wikidata endpoint |
---|---|
|
Languages
✅ Language () → List of LL languages with wd speaker population
✅ Languages () → List of LL languages with wikidata dead or extinct status
It obtains the list of dead and extinct languages from Wikidata. This query is expected to be run in Lingua Libre SPARQL endpoint. It shouldn't be run in the SPARQL endpoint of Wikidata or Wikidata Query Service.
SELECT
?deadLanguageLinguaLibre
?deadLanguageLinguaLibreLabel
?count
WITH {
SELECT DISTINCT ?deadLanguage {
SERVICE <https://query.wikidata.org/sparql> {
{ ?deadLanguage wdt:P31/wdt:P279* wd:Q45762. }
UNION
{ ?deadLanguage wdt:P31/wdt:P279* wd:Q38058796. }
}
}
} AS %deadLanguage
WITH {
SELECT ?deadLanguageLinguaLibre {
INCLUDE %deadLanguage.
BIND(REPLACE(STR(?deadLanguage), '.*/', '') AS ?deadLanguageQid)
?deadLanguageLinguaLibre
prop:P2 entity:Q4;
prop:P12 ?deadLanguageQid.
}
} AS %deadLanguageLinguaLibre
WITH {
SELECT
?deadLanguageLinguaLibre
(COUNT(*) AS ?count)
{
INCLUDE %deadLanguageLinguaLibre.
?audio
prop:P2 entity:Q2;
prop:P4 ?deadLanguageLinguaLibre.
}
GROUP BY ?deadLanguageLinguaLibre
} AS %count
{
INCLUDE %deadLanguageLinguaLibre.
OPTIONAL{INCLUDE %count.}
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?count)
❌ Language (Marathi (Q34)) → Wikidata Qid(s) → Geo-coordinates
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX ll: <https://lingualibre.org/entity/>
PREFIX llt: <https://lingualibre.org/prop/direct/>
PREFIX lltn: <https://lingualibre.org/prop/direct-normalized/>
select distinct ?record ?transcription ?languageLabel ?wdQid ?wdQidLabel ?wdLabel ?coord
where {
?record llt:P2 ll:Q2 . # Filter: P2 'instance of' is Q2 'record'
?record llt:P4 ll:Q34 . # Filter: record's P4 'language' is Q34 'Marathi'
?record llt:P4 ?language . # Assign value: record's P4 'language' to variable ?language
?record llt:P7 ?transcription . # Assign value: record's P7 'transcription' to variable ?transcription
?record lltn:P12 ?wdQid . # Assign value: record's P12 'wikidata id' to variable ?wikidataItem
SERVICE <https://query.wikidata.org/sparql> {
OPTIONAL { ?wdQid wdt:P625 ?coord . } # Assign value: wikidata item's wd:P625 'coordinates' to variable ?coord
OPTIONAL {
?wdQid rdfs:label ?wdLabel . # Assign value: wikidata item's label to variable ?wikidataLabel
FILTER (LANG(?wdLabel) = "en") . # Filter: default language, else English
}
}
SERVICE wikibase:label {
bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
}
}
|
Result type:record transcription languageLabel wdQid wdQidLabel wdLabel coord Q196212 Tathavade Marathi Q2719024 Q2719024 Tathavade Point(73.74 18.62) Q428904 Jambavade Marathi Q24894740 Q24894740 Jambavade Point(73.85 18.51) Q428900 Dhangavhan Marathi Q24885008 Q24885008 Dhangavhan Point(73.85 18.52) … |
Lexemes
✅ Language (d:Q12107) → List of wd lexemes
Example : Q12107 breton.
✅ Language () → List of wd lexemes with LL audio
✅ Language () → List of wd lexemes with LL audio and wd translation (d:Q150)
✅ Language () → List of wd lexemes (d:Q150)
- Strange query from User:VIGNERON/common.js
SELECT DISTINCT ?lexemeLabel ?lexeme
WITH {
SELECT ?lexeme ?lexemeLabel ?lexical_category WHERE {
?lexeme a ontolex:LexicalEntry ;
dct:language wd:Q12107 ;
wikibase:lemma ?lexemeLabel .
OPTIONAL {
?lexeme wikibase:lexicalCategory ?lexical_category .
}
}
} AS %results
WHERE {
INCLUDE %results
OPTIONAL {
?lexical_category rdfs:label ?lexical_categoryLabel .
FILTER (LANG(?lexical_categoryLabel) = "en")
}
}
|
Speakers
✅ Speakers → Largest number of languages recorded and known
#Title: Speakers with recordings largest number of languages and known languages
SELECT ?speaker ?speakerLabel ?count ?languages
# Get audios, language, speaker triplet
WITH {
SELECT DISTINCT ?speaker ?language {
?audio prop:P4 ?language;
prop:P5 ?speaker.
}
} AS %speakers
# Get the count of languages per each speaker
WITH {
SELECT ?speaker (COUNT(?speaker) AS ?count) {
INCLUDE %speakers.
}
GROUP BY ?speaker
ORDER BY DESC(?count)
} AS %countOfLanguagesRecordedPerSpeaker
# Get the maximum number of languages per each speaker
WITH {
SELECT (MAX(?count) AS ?maxNumberOfLanguagesRecorded) {
INCLUDE %countOfLanguagesRecordedPerSpeaker.
}
} AS %maxNumberOfLanguagesRecorded
# Get those speakers whose count equals the maximum number of languages
WITH {
SELECT ?speaker ?count {
INCLUDE %countOfLanguagesRecordedPerSpeaker.
INCLUDE %maxNumberOfLanguagesRecorded.
FILTER(?count = ?maxNumberOfLanguagesRecorded).
}
} AS %speakersWithMostNumberOfLanguagesRecorded
# Get the languages of those speakers that have recorded audios in the
# most number of languages
WITH {
SELECT ?speaker (GROUP_CONCAT(?languageLabel; SEPARATOR = ", ") AS ?languages) {
INCLUDE %speakersWithMostNumberOfLanguagesRecorded.
?speaker prop:P4 [
rdfs:label ?languageLabel
]
FILTER(LANG(?languageLabel) = "en").
}
GROUP BY ?speaker
} AS %languagesOfSpeakersWithMostNumberOfLanguagesRecorded
{
INCLUDE %speakersWithMostNumberOfLanguagesRecorded.
INCLUDE %languagesOfSpeakersWithMostNumberOfLanguagesRecorded.
?speaker rdfs:label ?speakerLabel.
FILTER(LANG(?speakerLabel) = "en")
}
|
|
✅ Speakers → Countries with most speakers
SELECT ?country ?continentLabel ?ISO3 ?countryLabel (COUNT(?country) AS ?count)
WITH {
SELECT DISTINCT ?speaker {
?speaker prop:P2 entity:Q3;
}
} AS %speakers
WITH {
SELECT DISTINCT
?speaker
?country
?countryLabel
?ISO3
?continentLabel
{
INCLUDE %speakers.
?speaker prop:P14 ?residence.
# Avoids weird errors.
FILTER(REGEX(?residence, "^Q[0-9]+$"))
BIND(IRI(CONCAT('http://www.wikidata.org/entity/', ?residence)) AS ?residenceId)
# Get country from wikidata
SERVICE <https://query.wikidata.org/sparql> {
?residenceId wdt:P17 ?country.
?country rdfs:label ?countryLabel;
wdt:P298 ?ISO3;
wdt:P30 ?continent.
?continent rdfs:label ?continentLabel.
FILTER(LANG(?countryLabel) = "en").
FILTER(LANG(?continentLabel) = "en").
}
}
} AS %speakersWithCountries
{
INCLUDE %speakersWithCountries.
}
GROUP BY ?country ?continentLabel ?ISO3 ?countryLabel
ORDER BY DESC(?count)
|
|
Speakers → Map of speakers by place
#defaultView:Map
PREFIX ll: <https://lingualibre.org/entity/>
PREFIX llt: <https://lingualibre.org/prop/direct/>
SELECT DISTINCT ?lLabel ?coord WITH {
SELECT ?lLabel ?loc WHERE {
SERVICE <https://lingualibre.org/sparql> {
select DISTINCT ?lLabel ?loc {
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
?l llt:P2 ll:Q3 ;
llt:P14 ?loc .
?record llt:P5 ?l.
FILTER (regex(?loc, '^Q'))
}
}
}
} AS %i
WHERE {
INCLUDE %i
BIND (URI(CONCAT("http://www.wikidata.org/entity/", ?loc)) AS ?locURL)
SERVICE <https://query.wikidata.org/sparql> {
select * {
?locURL wdt:P625 ?coord .
}
}
}
|
|