Difference between revisions of "YaronSh"

Latest revision as of 08:04, 31 December 2023

Welcome

YaronSh, I noticed the many translations you provided. Thank you very much.
As for your recordings, they are sent to Wikimedia Commons, then a few bots inject them into various Wiktionaries (fr, en, pl, ori, oci, ...). You can also download all your recordings from there.
Again, thank you for those recordings. Yug (talk) 20:37, 28 December 2023 (UTC)

@Yug Thank you so much. This is a very important project and I'm really glad I could help. Nonetheless, there are some issues I need your help with: the Wikidata query that builds the list of Hebrew words without recordings is misleading. First, we cannot ask the user to record a term without Niqqud (the Hebrew diacritical vowel marks), since such a term can have several meanings and it's hard to distinguish between the variants that way. Second, we're collecting the names of the items, which cannot have an audio pronunciation attached to them; audio can only be attached to a form.

I consulted the Lexicographical Wikidata community and we've come up with a solution that requires some SPARQL magic and will probably require two different lists (or maybe more):

Shallow list - the names of the items, given that the matching form has no audio file.
Deep list - all the different forms that have Niqqud and no audio file (this is trickier, since most of the forms don't have Niqqud).

Niqqud is represented by the spelling variant he-x-Q21283070. I created a working example (with the obvious issue that it checks for an audio file on the item itself, where no audio file can be attached directly):

SELECT *
WHERE {
  ?l dct:language wd:Q9288;
     wikibase:lemma ?heb filter (lang(?heb) = "he-x-q21283070").
  # Known issue: P443 (pronunciation audio) is checked on the lexeme ?l, not on a form
  minus { ?l wdt:P443 [] }
} LIMIT 50

The correction from the Wikidata community:

select * {
  ?l dct:language wd:Q9288;
     wikibase:lemma ?heb filter (lang(?heb) = "he-x-q21283070").
  ?l ontolex:lexicalForm ?f.
  ?f ontolex:representation ?representation filter (lang(?representation) = "he-x-q21283070").
  minus { ?f wdt:P443 [] }
} limit 50

Combining those two approaches will give us the "shallow list". I also started a new discussion on the chat, but my approach has evolved since then.

EDIT:

This is the correct query for the shallow list (contributed by the Wikidata community, thanks Nikki!):

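select * {
  ?l dct:language wd:Q9288;
     wikibase:lemma ?heb filter (lang(?heb) = "he-x-q21283070").
  ?l ontolex:lexicalForm ?f.
  # Keep only the form whose representation is identical to the lemma
  ?f ontolex:representation ?heb.
  # Drop forms that already have pronunciation audio (P443)
  minus { ?f wdt:P443 [] }
} limit 50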
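
To turn the shallow query into an actual word list, something along the following lines might work. This is only a sketch: it assumes the public Wikidata Query Service endpoint at https://query.wikidata.org/sparql and its standard SPARQL JSON results format. The function names (`build_request`, `extract_lemmas`, `fetch_lemmas`) and the User-Agent string are made up for illustration and are not part of Lingua Libre.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# The shallow-list query from the discussion above, unchanged.
SHALLOW_QUERY = """
select * {
  ?l dct:language wd:Q9288;
     wikibase:lemma ?heb filter (lang(?heb) = "he-x-q21283070").
  ?l ontolex:lexicalForm ?f.
  ?f ontolex:representation ?heb.
  minus { ?f wdt:P443 [] }
} limit 50
"""


def build_request(query, endpoint=WDQS_ENDPOINT):
    """Build a GET request that asks the endpoint for JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={
            # Placeholder User-Agent; replace with a descriptive one of your own.
            "User-Agent": "niqqud-wordlist-sketch/0.1 (example)",
            "Accept": "application/sparql-results+json",
        },
    )


def extract_lemmas(results):
    """Return the distinct ?heb values from a SPARQL JSON result, in order."""
    lemmas = []
    for row in results["results"]["bindings"]:
        lemma = row["heb"]["value"]
        if lemma not in lemmas:
            lemmas.append(lemma)
    return lemmas


def fetch_lemmas(query=SHALLOW_QUERY):
    """Run the query over the network and return the list of lemmas."""
    with urllib.request.urlopen(build_request(query), timeout=60) as resp:
        return extract_lemmas(json.load(resp))
```

Calling `fetch_lemmas()` needs network access and returns at most 50 lemmas because of the `limit 50` in the query. Deduplication happens on the client side only because the query uses `select *`, so a lemma can appear in more than one row.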

@@ Line 1: / Line 1: @@
-{{Welcome/lang|user=YaronSh|welcominguser=Pamputt|1=[[User:Pamputt|Pamputt]] ([[User talk:Pamputt|talk]]) 13:45, 16 December 2023 (UTC)}}
+<!-- {{Welcome/lang|user=YaronSh|welcominguser=Pamputt|1=[[User:Pamputt|Pamputt]] ([[User talk:Pamputt|talk]]) 13:45, 16 December 2023 (UTC)}} -->
 == Welcome ==
 YaronSh, I noticed the many translations you provided. Thank you very much.<br>
 As for your recordings, they are sent to Wikimedia Commons, then a few bots inject them into various Wiktionaries (fr, en, pl, ori, oci, ...). You can also download all your recordings from [https://lingualibre.org/LanguagesGallery/?search=hebr there].<br>Again, thank you for those recordings. [[User:Yug|Yug]] ([[User talk:Yug|talk]]) 20:37, 28 December 2023 (UTC)
+:{{reply to|Yug}} Thank you so much. This is a very important project and I'm really glad I could help. Nonetheless, there are some issues I need your help with: the Wikidata query that builds the list of Hebrew words without recordings is misleading. First, we cannot ask the user to record a term without Niqqud (the Hebrew diacritical vowel marks), since such a term can have several meanings and it's hard to distinguish between the variants that way. Second, we're collecting the names of the items, which cannot have an audio pronunciation attached to them; audio can only be attached to a form.
+:I consulted the Lexicographical Wikidata community and we've come up with a solution that requires some SPARQL magic and will probably require two different lists (or maybe more):
+# Shallow list - the names of the items, given that the matching form has no audio file.
+# Deep list - all the different forms that have Niqqud and no audio file (this is trickier, since most of the forms don't have Niqqud).
+:Niqqud is represented by the spelling variant he-x-Q21283070. I created a working example (with the obvious issue that it checks for an audio file on the item itself, where no audio file can be attached directly):
+:<syntaxhighlight lang="sparql">
+SELECT  *
+WHERE {
+      ?l dct:language wd:Q9288;
+      wikibase:lemma ?heb filter (lang(?heb) = "he-x-q21283070").
+      minus { ?l wdt:P443 [] }
+} LIMIT 50
+</syntaxhighlight>
+:The correction from the Wikidata community:
+:<syntaxhighlight lang="sparql">
+select * {
+  ?l dct:language wd:Q9288;
+     wikibase:lemma ?heb filter (lang(?heb) = "he-x-q21283070").
+  ?l ontolex:lexicalForm ?f.
+  ?f ontolex:representation ?representation filter (lang(?representation) = "he-x-q21283070").
+  minus { ?f wdt:P443 [] }
+} limit 50
+</syntaxhighlight>
+:Combining those two approaches will give us the "shallow list". I also started a new discussion on the [[LinguaLibre:Chat_room#Hebrew_diacritics_.28Niqqud.29|chat]], but my approach has evolved since then.
+:<h5> EDIT: </h5>
+:This is the correct query for the shallow list (contributed by the Wikidata community, thanks Nikki!):
+:<syntaxhighlight lang="sparql">
+select * {
+  ?l dct:language wd:Q9288;
+     wikibase:lemma ?heb filter (lang(?heb) = "he-x-q21283070").
+  ?l ontolex:lexicalForm ?f.
+  ?f ontolex:representation ?heb.
+  minus { ?f wdt:P443 [] }
+} limit 50
+</syntaxhighlight>