Help

SPARQL for maintenance


SPARQL for maintenance gather some SPARQL queries which help to monitors errors and bring fix to our resources.


Speakers

✅ Speakers → Username containing reserved sign `-`

See also T297635
#defaultEndpoint:Lingualibre
SELECT *
WHERE {
  ?id prop:P2 entity:Q3 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    ?id rdfs:label ?name .
  }
  BIND (regex(STR(?name),"-") AS ?has_separator)
}
ORDER BY DESC (?has_separator)
... Loading ...
#defaultEndpoint:Lingualibre
SELECT ?has_separator (COUNT(?has_separator) AS ?found)
WHERE {
  ?id prop:P2 entity:Q3 .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    ?id rdfs:label ?name .
  }
  
  BIND (regex(STR(?name),"-") AS ?has_separator)
  # filter( regex(?name, "-" ))
}
#ORDER BY DESC (?has_separator)
GROUP BY (?has_separator)
... Loading ...

✅ Speakers → Speakers of `zho` to change to `cmn`

Should be maintained to 0.
#defaultEndpoint:Lingualibre
SELECT ?langLabel ?isoLabel (SUBSTR(STR(?speaker),32) AS ?speaker) ?speakerLabel
WHERE {
  ?speaker prop:P4 entity:Q130 .   # Filter: P4 language is Q130 zho  ;
  ?speaker prop:P2 entity:Q3 .   # Filter: speakers
  ?speaker prop:P4 ?lang .
  ?lang  prop:P13 ?iso .
  # Add Labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . } 
} ORDER BY ASC(?isoLabel)
# Helper: Chinese writing zho:Q130/wd:Q7850 ; Chinese mandarin cmn:Q113/wd:Q9192 .
... Loading ...

✅ Speakers → Undefined place of residence

The following may need precision. If stale, it mean none was found.
#defaultEndpoint:Lingualibre
SELECT ?item ?itemLabel ?filledGenderLabel
WHERE {
  ?item prop:P2 entity:Q3 .   # Filter: is Q3 `speaker`
  ?item prop:P14  .   # Filter: missing P14 `place of residence`   <-------------------------------------
  # Add labels to each variable used.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}
... Loading ...

✅ Speakers → Non-documented genders values

This query returns speakers who may have willfully declined to document a gender. See also list of genders.
#defaultEndpoint:Lingualibre
SELECT ?item ?itemLabel ?filledGenderLabel
WHERE {
  ?item prop:P2 entity:Q3 .  
  { ?item prop:P8 entity:Q710460 . }    # Filter: 'gender' is 'not documented'
  ?item prop:P8 ?filledGender .
  # Add labels to each variable used.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}
... Loading ...

✅ Speakers → missing gender value

This query returns speakers who have missing gender value. See also list of genders.
#defaultEndpoint:Lingualibre
SELECT ?speaker ?speakerLabel ?gender ?languageLabel (COUNT(?audio) AS ?audio)
WHERE {
  ?speaker prop:P2 entity:Q3 .
  FILTER NOT EXISTS { ?speaker prop:P8 ?gender }
  ?audio prop:P5 ?speaker .   # Filter: P5 'speaker' is Q445757 'SangeetaRH‎'
  ?audio prop:P4 ?language .  # Filter: P4 'language' is Q34 'Marathi'
  # Add labels
  SERVICE wikibase:label {bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en"} 
}
GROUP BY ?speaker ?speakerLabel ?genderLabel ?languageLabel  # Sorting first groups per language and speaker
ORDER BY DESC (?audio)
... Loading ...

Recordings

✅ Recordings → With ISO-639-3 `zho` to change to `cmn`

#defaultEndpoint:Lingualibre
SELECT ?langLabel ?isoLabel (SUBSTR(STR(?audio),32) AS ?Audio) ?audioLabel
WHERE {
  ?audio prop:P4 entity:Q130 .   # Filter: P4 language is Q130 zho  ;
  ?audio prop:P2 entity:Q2 .   # Filter: audios
  ?audio prop:P4 ?lang .
  ?lang  prop:P13 ?iso .
  # Add Labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . } 
} ORDER BY ASC(?isoLabel)
# Helper: Chinese writing zho:Q130/wd:Q7850 ; Chinese mandarin cmn:Q113/wd:Q9192 .
... Loading ...


#defaultEndpoint:Lingualibre
SELECT ?isoLabel (COUNT(?iso) AS ?found)
WHERE {
  ?audio prop:P4 entity:Q130 .   # Filter: P4 language is Q130 zho  ;
  ?audio prop:P2 entity:Q2 .   # Filter: audios
  ?audio prop:P4 ?lang .
  ?lang  prop:P13 ?iso .
  # Add Labels
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . } 
} GROUP BY ?isoLabel
# Helper: Chinese writing zho:Q130/wd:Q7850 ; Chinese mandarin cmn:Q113/wd:Q9192
... Loading ...


Languages

✅ Languages → duplicata by Wikidata ID

Duplicated languages can be merged. If stale, it mean none was found.
#defaultEndpoint:Lingualibre
SELECT ?idWD (COUNT(?item) AS ?count) (GROUP_CONCAT(?item) AS ?items)
WHERE {
  ?item prop:P2 entity:Q4 ;
        prop:P12 ?idWD .
}
GROUP BY ?idWD
HAVING ( ?count > 1 )
... Loading ...

✅ Languages → languages by type (if it exists)

Note: query to improve.
#defaultEndpoint:Lingualibre
SELECT ?language ?languageLabel ?code ?typeLabel
WHERE {
  ?language prop:P2 entity:Q4 .
  OPTIONAL { ?language prop:P13 ?code } .
  OPTIONAL { 
    ?language rdfs:label ?languageLabel .
    FILTER( lang(?languageLabel) = "en" ) } .
  OPTIONAL { ?language prop:P24 ?type } .
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
  } 
}
... Loading ...

✅ Languages → language with media type (P24) is video (Q88890)

#defaultEndpoint:Lingualibre
SELECT ?id ?idLabel WHERE { 
  ?id prop:P2 entity:Q4 . 
  ?id prop:P24 entity:Q88890 . 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr,en". } 
}
... Loading ...

✅ Languages with media type video → written word (P7), url (P3) speakers (P5)

#defaultEndpoint:Lingualibre
SELECT ?word ?filename ?speaker WHERE { 
  ?record prop:P4 entity:Q99628 . 
  ?record prop:P2 entity:Q2 . 
  ?record prop:P7 ?word . 
  ?record prop:P3 ?filename . 
  ?record prop:P5 ?speakerItem . 
  ?speakerItem rdfs:label 
  ?speaker filter ( lang( ?speaker ) = "en" ) . 
}
... Loading ...


✅ Languages iso from SignIt → relevant data

See Translatewiki:Translating:Lingua_Libre_SignIt.
#defaultEndpoint:Wikidata
SELECT ?wdQid ?wikimediaCode ?englishName (GROUP_CONCAT(?nativeLabel ; separator=" | ") as ?nativeName)
WHERE {
    VALUES ?langCode { 
    "ar" "bn" "de" "en" "es" "fr" "he" "hi" "hy" "id" "ja" 
    "kk-cyrl" "ko" "mk" "nb" "ru" "scn" "sv" "sw" "tl" "tk" 
    "tr" "zh-hans" "zh-hant"     
 }
    ?langId wdt:P31 wd:Q34770;
            wdt:P218 ?langCode;
            wdt:P424 ?wikimediaCode;
            wdt:P1705 ?native;
            # wdt:P10369 ?lingualibreQid ;
            rdfs:label ?englishName .
  FILTER ( lang(?englishName) = "en" )
  BIND(REPLACE(STR(?langId), ".*/Q", "Q") AS ?wdQid)
  BIND(CONCAT(UCASE(SUBSTR(?native, 1 , 1 )), SUBSTR(?native, 2 )) AS ?nativeLabel)
}
GROUP BY ?wdQid ?wikimediaCode ?englishName
... Loading ...

Other

All properties with their Wikidata equivalent (if it exists)

#defaultEndpoint:Lingualibre
SELECT * WHERE {
  { ?propLili rdf:type owl:ObjectProperty }
  UNION
  { ?propLili rdf:type owl:DatatypeProperty }
  OPTIONAL { 
    ?propLiliClaim <http://wikiba.se/ontology#directClaim> ?propLili ;
       prop:P12 ?propWD
  }
}
... Loading ...

See also

Lingua Libre technical helps
Template {{Speakers category}} • {{Recommended lists}} • {{To iso 639-2}} • {{To iso 639-3}} • {{Userbox-records}} • {{Bot steps}}
Audio files How to create a frequency list?Convert files formatsDenoise files with SoXRename and mass rename
Bots Help:BotsLinguaLibre:BotHelp:Log in to Lingua Libre with PywikibotLingua Libre Bot (gh) • OlafbotPamputtBotDragons Bot (gh)
MediaWiki MediaWiki: Help:Documentation opérationelle MediawikiHelp:Database structureHelp:CSSHelp:RenameHelp:OAuthLinguaLibre:User rights (rate limit) • Module:Lingua Libre record & {{Lingua Libre record}}JS scripts: MediaWiki:Common.jsLastAudios.jsSoundLibrary.jsItemsSugar.jsLexemeQueriesGenerator.js (pad) • Sparql2data.js (pad) • LanguagesGallery.js (pad) • Gadgets: Gadget-LinguaImporter.jsGadget-Demo.jsGadget-RecentNonAudio.js
Queries Help:APIsHelp:SPARQLSPARQL (intermediate) (stub) • SPARQL for lexemes (stub) • SPARQL for maintenanceLingualibre:Wikidata (stub) • Help:SPARQL (HAL)
Reuses Help:Download datasetsHelp:Embed audio in HTML
Unstable & tests Help:SPARQL/test
Categories Category:Technical reports