Help

Difference between revisions of "SPARQL (intermediate)"

Help:SPARQL 2 explores federated queries that fetch data from both the LinguaLibre and Wikidata endpoints, then Wikidata Lexemes, an emerging source of lexicographic data. Together, the two can supply lexicographic and multimedia data (audio recordings and images) to Wikimedia modules and web developers.

m (Reverted edits by Yug (talk) to last revision by Rdrg109)
Tag: Rollback
m (Fix.)
Tag: Undo

Revision as of 09:30, 18 January 2022


Draft

This page is a work in progress.

Tools

Lexemes Queries Generator


SPARQL to persistent data

Some SPARQL queries are meaningful but heavy and slow. This administrator tool stores or updates the response data on LinguaLibre, within a wiki page. The stored data can then be loaded in under one second via a variation of mw.loader.load('/index.php?title=MediaWiki:Mydata.js&action=raw&ctype=text/javascript');.
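The URL passed to mw.loader.load above follows MediaWiki's index.php?action=raw pattern. As a minimal sketch, assuming the wiki is served from https://lingualibre.org and the data page is named MediaWiki:Mydata.js as in the example, the same URL could be assembled outside the wiki like this. The helper raw_page_url is hypothetical and is not part of the administrator tool.

```python
from urllib.parse import urlencode

# Assumption: host of the wiki that stores the data page.
WIKI_BASE = "https://lingualibre.org"


def raw_page_url(title: str, ctype: str = "text/javascript") -> str:
    """Build an index.php URL that returns the raw content of a wiki page."""
    params = {"title": title, "action": "raw", "ctype": ctype}
    # Keep ':' and '/' readable so the result matches the URL shown above.
    return f"{WIKI_BASE}/index.php?{urlencode(params, safe=':/')}"


print(raw_page_url("MediaWiki:Mydata.js"))
```

Any HTTP client can then fetch that URL; the stored page is served as a static file, so no SPARQL query runs at load time.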


Federated queries

  • To query LinguaLibre from Wikidata, use SERVICE <https://lingualibre.org/sparql>.
  • To query Wikidata from LinguaLibre, use SERVICE <https://query.wikidata.org/sparql>.

The following query is a simple example of retrieving LinguaLibre data from the Wikidata Query Service. It lists the levels that exist in LinguaLibre.

PREFIX prop: <https://lingualibre.org/prop/direct/>
PREFIX entity: <https://lingualibre.org/entity/>

SELECT * {
  SERVICE <https://lingualibre.org/sparql> {
    SELECT ?item ?itemLabel {
      ?item prop:P2 entity:Q5.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }
  }
}
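For web developers, the query above can also be sent to the Wikidata endpoint over HTTP. The sketch below only builds the GET URL and sends nothing. It assumes the endpoint accepts the query and format=json parameters; build_query_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Endpoint named in the bullet list above.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

# The federated example from this section, copied as-is.
QUERY = """\
PREFIX prop: <https://lingualibre.org/prop/direct/>
PREFIX entity: <https://lingualibre.org/entity/>

SELECT * {
  SERVICE <https://lingualibre.org/sparql> {
    SELECT ?item ?itemLabel {
      ?item prop:P2 entity:Q5.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
    }
  }
}
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL asking the endpoint for JSON results."""
    return endpoint + "?" + urlencode({"query": query, "format": "json"})


url = build_query_url(WIKIDATA_ENDPOINT, QUERY)
print(url[:80] + "...")
```

When actually sending the request, include a descriptive User-Agent header, as Wikimedia services ask of automated clients.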

Languages

✅ Language () → List of LL languages with wd speaker population

Lexemes

✅ Language (d:Q12107) → List of wd lexemes

Example: Q12107 (Breton).

✅ Language () → List of wd lexemes with LL audio

✅ Language () → List of wd lexemes with LL audio and wd translation (d:Q150)

✅ Language () → List of wd lexemes (d:Q150)

Strange query from User:VIGNERON/common.js
SELECT DISTINCT ?lexemeLabel ?lexeme
WITH {
  SELECT ?lexeme ?lexemeLabel ?lexical_category WHERE {
    ?lexeme a ontolex:LexicalEntry ;
            dct:language wd:Q12107 ; 
            wikibase:lemma ?lexemeLabel .
    OPTIONAL {
      ?lexeme wikibase:lexicalCategory ?lexical_category .
    }
  }
} AS %results
WHERE {
  INCLUDE %results
  OPTIONAL {        
    ?lexical_category rdfs:label ?lexical_categoryLabel .
    FILTER (LANG(?lexical_categoryLabel) = "en")
  }
}
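When a query like this one is run through an HTTP client, the endpoint can return the standard SPARQL JSON results format, where each variable is wrapped in an object with type and value keys. The sketch below flattens such a response into plain rows. The sample payload is a placeholder written for illustration, not real Wikidata output, and flatten is a hypothetical helper.

```python
import json

# Placeholder response shaped like the SPARQL 1.1 JSON results format.
# The lemma and lexeme ID are made up for illustration.
SAMPLE = json.loads("""
{
  "head": {"vars": ["lexemeLabel", "lexeme"]},
  "results": {"bindings": [
    {"lexemeLabel": {"type": "literal", "xml:lang": "br", "value": "example"},
     "lexeme": {"type": "uri", "value": "http://www.wikidata.org/entity/L0"}}
  ]}
}
""")


def flatten(payload: dict) -> list:
    """Turn each binding into a plain {variable: value} dict."""
    variables = payload["head"]["vars"]
    return [
        # Unbound variables (e.g. from OPTIONAL) are absent from a row.
        {var: row[var]["value"] for var in variables if var in row}
        for row in payload["results"]["bindings"]
    ]


rows = flatten(SAMPLE)
print(rows)
```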

Speakers

✅ Speakers → Largest number of languages recorded and known

#Title: Speakers who recorded in the largest number of languages, with their known languages
SELECT ?speaker ?speakerLabel ?count ?languages
# Get the distinct (speaker, language) pairs from the audio records
WITH {
  SELECT DISTINCT ?speaker ?language {
    ?audio prop:P4 ?language;
           prop:P5 ?speaker.
  }
} AS %speakers
# Count the languages recorded by each speaker
WITH {
  SELECT ?speaker (COUNT(?speaker) AS ?count) {
    INCLUDE %speakers.
  }
  GROUP BY ?speaker
  ORDER BY DESC(?count)
} AS %countOfLanguagesRecordedPerSpeaker
# Get the maximum number of languages recorded by any single speaker
WITH {
  SELECT (MAX(?count) AS ?maxNumberOfLanguagesRecorded) {
    INCLUDE %countOfLanguagesRecordedPerSpeaker.
  }
} AS %maxNumberOfLanguagesRecorded
# Get those speakers whose count equals the maximum number of languages
WITH {
  SELECT ?speaker ?count {
    INCLUDE %countOfLanguagesRecordedPerSpeaker.
    INCLUDE %maxNumberOfLanguagesRecorded.
    FILTER(?count = ?maxNumberOfLanguagesRecorded).
  }
} AS %speakersWithMostNumberOfLanguagesRecorded
# Get the languages (P4) listed for the speakers who recorded in the
# largest number of languages
WITH {
  SELECT ?speaker (GROUP_CONCAT(?languageLabel; SEPARATOR = ", ") AS ?languages) {
    INCLUDE %speakersWithMostNumberOfLanguagesRecorded.
    ?speaker prop:P4 [
        rdfs:label ?languageLabel
      ]
    FILTER(LANG(?languageLabel) = "en").
  }
  GROUP BY ?speaker
} AS %languagesOfSpeakersWithMostNumberOfLanguagesRecorded
{
  INCLUDE %speakersWithMostNumberOfLanguagesRecorded.
  INCLUDE %languagesOfSpeakersWithMostNumberOfLanguagesRecorded.
  ?speaker rdfs:label ?speakerLabel.
  FILTER(LANG(?speakerLabel) = "en")
}
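The chain of named subqueries above (distinct pairs, count per speaker, maximum, filter on the maximum) can be easier to follow in procedural form. The sketch below mirrors those steps on made-up data. The speaker and language names are placeholders, and top_speakers is a hypothetical helper.

```python
from collections import Counter

# Made-up (speaker, language) pairs, one per recording; not real data.
RECORDINGS = [
    ("speakerA", "fra"), ("speakerA", "fra"), ("speakerA", "bre"),
    ("speakerB", "eng"),
    ("speakerC", "eng"), ("speakerC", "deu"),
]


def top_speakers(recordings):
    """Return (speakers tied for the most distinct languages, that count)."""
    distinct_pairs = set(recordings)                 # %speakers
    counts = Counter(s for s, _ in distinct_pairs)   # %countOfLanguagesRecordedPerSpeaker
    best = max(counts.values())                      # %maxNumberOfLanguagesRecorded
    winners = sorted(s for s, n in counts.items() if n == best)
    return winners, best


print(top_speakers(RECORDINGS))
```

As in the query, ties are kept: every speaker whose count equals the maximum is returned. The sketch assumes a non-empty input.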


✅ Speakers → Countries with most speakers

SELECT ?country ?continentLabel ?ISO3 ?countryLabel (COUNT(?country) AS ?count)
WITH {
  SELECT DISTINCT ?speaker {
    ?speaker prop:P2 entity:Q3.
  }
} AS %speakers
WITH {
  SELECT DISTINCT
    ?speaker
    ?country
    ?countryLabel
    ?ISO3
    ?continentLabel
  {
    INCLUDE %speakers.
    ?speaker prop:P14 ?residence.
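    # Keep only residences that look like Wikidata IDs (e.g. Q90);
    # other values would produce invalid IRIs in the BIND below.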
    FILTER(REGEX(?residence, "^Q[0-9]+$"))
    BIND(IRI(CONCAT('http://www.wikidata.org/entity/', ?residence)) AS ?residenceId)
    
    # Get country from wikidata
    SERVICE <https://query.wikidata.org/sparql> {
      ?residenceId wdt:P17 ?country.
      ?country rdfs:label ?countryLabel;
               wdt:P298 ?ISO3;
               wdt:P30 ?continent.
      ?continent rdfs:label ?continentLabel.
      FILTER(LANG(?countryLabel) = "en").
      FILTER(LANG(?continentLabel) = "en").
    }
  }
} AS %speakersWithCountries
{
  INCLUDE %speakersWithCountries.
}
GROUP BY ?country ?continentLabel ?ISO3 ?countryLabel
ORDER BY DESC(?count)
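The FILTER and BIND pair in this query is what links the two databases: the residence is stored on LinguaLibre as a plain string, and is turned into a Wikidata entity IRI only if it looks like a Q-identifier. A minimal sketch of the same check, with residence_to_iri as a hypothetical helper name:

```python
import re
from typing import Optional

WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"
# Same pattern as the FILTER: the letter Q followed by digits only.
QID = re.compile(r"Q[0-9]+")


def residence_to_iri(residence: str) -> Optional[str]:
    """Return the Wikidata entity IRI, or None if the value is not a Q-ID."""
    if QID.fullmatch(residence) is None:
        return None
    return WIKIDATA_ENTITY + residence


print(residence_to_iri("Q90"))
print(residence_to_iri("Paris"))
```

Rows rejected here correspond to speakers that the query silently drops, so the country counts only cover speakers whose residence is a well-formed ID.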

Speakers → Map of speakers by place

PREFIX ll: <https://lingualibre.org/entity/>
PREFIX llt: <https://lingualibre.org/prop/direct/>

SELECT DISTINCT ?lLabel ?coord WITH {
  SELECT ?lLabel ?loc WHERE {
    SERVICE <https://lingualibre.org/sparql> { 
      select DISTINCT ?lLabel ?loc { 
        SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
        ?l llt:P2 ll:Q3 ;
           llt:P14 ?loc . 
        ?record llt:P5 ?l.   
        FILTER (regex(?loc, '^Q')) 
      } 
    }
  }
} AS %i
WHERE {
  INCLUDE %i
  BIND (URI(CONCAT("http://www.wikidata.org/entity/", ?loc)) AS ?locURL)
  SERVICE <https://query.wikidata.org/sparql> { 
    select * { 
      ?locURL wdt:P625 ?coord . 
    } 
  }
}
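To plot the results outside the query service, the ?coord values need to be parsed. The sketch below assumes they arrive as WKT literals of the form Point(longitude latitude), which is how the Wikidata Query Service usually serializes P625. The helper parse_point is hypothetical.

```python
import re
from typing import Tuple

# WKT point: "Point(<longitude> <latitude>)", longitude first.
POINT = re.compile(r"Point\(([^\s()]+)\s+([^\s()]+)\)", re.IGNORECASE)


def parse_point(wkt: str) -> Tuple[float, float]:
    """Parse a WKT point and return (latitude, longitude)."""
    match = POINT.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    longitude, latitude = float(match.group(1)), float(match.group(2))
    # Many mapping libraries expect latitude first, so swap the order.
    return latitude, longitude


print(parse_point("Point(2.3522 48.8566)"))
```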
