LinguaLibre
Chat room/Archives/2022
How to change a username
Hello, on Wikimedia projects my username is Manjiro91 (formerly GamissimoYT); how does one change their username here? GamissimoYT (talk) 17:13, 11 January 2022 (UTC)
- Hello GamissimoYT. Lingua Libre uses the same username as the one in use on Wikimedia Commons. So if you want to use the username Manjiro91, log out of Lingua Libre, then out of Wikimedia Commons. Then log in to Commons with the username Manjiro91 and finally log back in to Lingua Libre. Pamputt (talk) 21:05, 11 January 2022 (UTC)
@Pamputt My Wikimedia Commons username is Manjiro91 (formerly GamissimoYT), but the username change does not carry over to LiLi. GamissimoYT (talk) 13:38, 12 January 2022 (UTC)
- @GamissimoYT, did you do the logouts and logins in the order I indicated? If you are sure you are logged in as Manjiro91 on Wikimedia Commons, you can try logging out of Lingua Libre and logging back in right away. Clearing your browser cache might help too. Pamputt (talk) 07:37, 13 January 2022 (UTC)
Merging of items about languages
- Done final fix for Chinese writing, Duala, Mossi on 07:01, 24 December 2023 (UTC) by User:Dragons_Bot/User:Yug.
- See also Help:SPARQL and Help:SPARQL for maintenance.
Hi y'all,
For the record, I just merged a couple of items about the same language:
- Duala (Q52071) into Duala (Q73)
- Māori (Q139228) into Māori (Q183)
- Mossi (Q170137) into Mossi (Q359)
- Meitei language (Q683869) into Meitei language (Q418)
- Algerian Arabic (Q646169) into Algerian Arabic (Q6714)
- Eton language (Q570518) into Eton language (Q52069)
- Guianan Creole (Q538624) into Guianan Creole (Q84030)
- Egyptian Arabic (Q646173) into Egyptian Arabic (Q390314)
- Cypriot Arabic (Q646161) into Cypriot Arabic (Q502754)
- Futunan (Q570510) into Futunan (Q489393)
I detected them with this SPARQL query:
# Language items (P2 = Q4) that share the same Wikidata ID (P12)
SELECT ?idWD (COUNT(?item) AS ?compte) (GROUP_CONCAT(?item) AS ?items) WHERE {
  ?item prop:P2 entity:Q4 ;
        prop:P12 ?idWD .
}
GROUP BY ?idWD
HAVING ( ?compte > 1 )
Ping @WikiLucas00, it seems you are responsible for some of them...
Cheers, VIGNERON (talk) 09:29, 19 February 2022 (UTC)
- Thanks VIGNERON for finding and cleaning them. Now, what should we do with recording items that use the duplicate language item (for example Duala)? I think we must modify language (P4) for all affected recording items so that languages are not counted twice, and also to clean up the database (there are also transcription problems for the items listed in the Duala example). Pamputt (talk) 16:16, 19 February 2022 (UTC)
- Thank you @VIGNERON for pointing these out. As you can see, most of them were not created manually but using the tool (the pages weighed circa 4 kB, with labels in many languages). It seems that the Lingua Importer tool has (or had?) a problem, but I could not reproduce it (trying to import languages that are already in the LL wikibase).
During last summer's hackathon we talked a bit about languages in our wikibase, but I can't remember why we need to have language elements in our Wikibase, and not just use the existing base of Wikidata 🤔 — WikiLucas (🖋️) 23:23, 19 February 2022 (UTC)
- Hello @WikiLucas00, VIGNERON, & Pamputt
- Done Issue solved. After more than a year of monitoring the issue, learning bots (Dragons_Bot), and 6+ hours of coding, all existing recordings with
- the erroneous Q130 (Chinese writing),
- the erroneous Q52071 (Duala), or
- the erroneous Q170137 (Mossi)
- were edited to point to the correct Qid value, on both Lingualibre and Wikimedia Commons, so there are no remaining duplicated languages.
- Wishing everyone a good Xmas season! Yug (talk) 07:01, 24 December 2023 (UTC)
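For future maintenance, a quick check along these lines (an untested sketch, using only the property and item IDs mentioned above) would list any recording still pointing to one of the merged, erroneous items; it should return no results once the cleanup is complete:
# Untested sketch: recordings whose language (P4) still points to a merged, erroneous item
SELECT ?record ?language WHERE {
  VALUES ?language { entity:Q130 entity:Q52071 entity:Q170137 }   # Chinese writing, Duala, Mossi duplicates
  ?record prop:P2 entity:Q2 ;    # recording items
          prop:P4 ?language .    # language of the recording
}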
MediaWiki customizations of LinguaLibre
Love the MediaWiki skin of LinguaLibre and I am curious about the skin and the customizations that were made. Who are the authors? (I cannot see any credits.) --Zblace (talk) 10:15, 19 February 2022 (UTC)
- The skin is known as BlueLL. The source code is available on GitHub. It was developed by Wikimedia France in 2020. That said, it is true that there is no licence or credits on GitHub. I will ask Adélaïde Calais WMFr if she remembers anything so that I can add the missing information. Pamputt (talk) 16:58, 19 February 2022 (UTC)
- Hi @Zblace, this skin's author is User:0x010C, and it is open source. It can be reused freely. Yug (talk) 22:45, 22 May 2022 (UTC)
New property: translation
Hello, I've created translation (P38) to be used when there is no written form in the recording language but instead a translation in the vehicular language. See for example what I did here and there. Do you agree with that? Any comments? Pamputt (talk) 16:33, 19 February 2022 (UTC)
- It's a good idea! Many users tend to add a translation as they find it important for other people to have. It will also be handy for cases like your second example, where we only have the translation but not the transcription of the source language: we will be able to query the base to see all audios of a language that have a translation. — WikiLucas (🖋️) 23:28, 19 February 2022 (UTC)
- I am thinking about a way to populate this property automatically via the Record Wizard. Currently, it seems that the Record Wizard populates qualifier (P18) when something is written between brackets (see fils (pluriel de fil) (Q1685) for example, but I have not checked recently). So, if we modify the Record Wizard code, it should be possible to recognize that this is a translation into another language and populate translation (P38) accordingly. But I would like to be sure we propose the best way to do it before asking for such development. The idea is for it to be managed automatically (or at least not completely manually). Pamputt (talk) 00:18, 20 February 2022 (UTC)
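To follow up on the querying point above, a minimal sketch (untested) of how recordings carrying the new translation (P38) property could be listed, using only properties already mentioned in this discussion:
# Untested sketch: recordings that carry a translation (P38)
SELECT ?record ?file ?language ?translation WHERE {
  ?record prop:P2 entity:Q2 ;      # recording items
          prop:P3 ?file ;          # audio file
          prop:P4 ?language ;      # language of the recording
          prop:P38 ?translation .  # translation in the vehicular language
}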
Lingua Libre Wishlist for 2022-2023
Hi everyone !
This week, Wikimedia France is preparing its budget for the fiscal year to come: July 2022 to June 2023. If there are things you would like to see done, or to do with our help, on Lingua Libre, please share them on this page: https://lingualibre.org/wiki/LinguaLibre:2022-2023_projection
Have a great weekend! --Adélaïde Calais WMFr (talk) 17:23, 11 March 2022 (UTC)
- marreromarco Thank you for your suggestions. However, I have some reservations about "Add function to "Request" a Pronunciation to Native Speakers" at this stage, for two reasons. First, it will require quite a bit of moderation to correct requests for grammar and spelling (e.g. HASBAND) as well as to remove terrible requests. This would place a large burden on a few users and could easily lead to questionable decisions by moderators. Second, Forvo is flooded with requests that are overly specific (e.g. "He came back from abyss and won the tie.") and therefore likely benefit only one user. IMHO, Rdrg109's proposal to focus on providing pronunciations for entries on the various Wiktionaries is a better approach to building up LL at this point. It will provide a solid foundation for users to find any word in LL. It might be a better time to open up LL to general requests once this project is completed and the community has grown. Languageseeker (talk) 15:49, 21 May 2022 (UTC)
How to get the city country label in SPARQL
- See also Help:SPARQL.
I'm working on an Anki extension for LL, but I'm having a little trouble writing the SPARQL query. In short, I want to be able to get the city and country for a recording in LL. However, when I query P14, I get a reference to the item instead of its name, e.g. 'residence': {'type': 'literal', 'value': 'Q142'}. Instead I hope to get city: "" and country: "France" for the first query, and city: "Paris" and country: "France" for the second one. Any ideas? Languageseeker (talk) 20:23, 19 May 2022 (UTC)
- Hi Languageseeker, thanks for your work on an Anki extension. Could you post the query you have now here? Pamputt (talk) 16:58, 20 May 2022 (UTC)
- Hi Pamputt . The query that I'm using is a very lightly modified version of the bot query.
ENDPOINT = "https://lingualibre.org/bigdata/namespace/wdq/sparql"
API = "https://lingualibre.org/api.php"
BASEQUERY = """
SELECT DISTINCT ?record ?file ?transcription ?recorded ?languageIso ?languageQid ?languageWMCode
                ?residence ?learningPlace ?languageLevel ?speaker ?linkeduser
WHERE {
  ?record prop:P2 entity:Q2 .
  ?record prop:P3 ?file .
  ?record prop:P4 ?language .
  ?record prop:P5 ?speaker .
  ?record prop:P6 ?recorded .
  ?record prop:P7 ?transcription .
  ?language prop:P13 ?languageIso .
  ?speakerLanguagesStatement llq:P16 ?languageLevel .
  ?speaker prop:P11 ?linkeduser .
  ?speaker prop:P14 ?residence .
  ?speaker llp:P4 ?speakerLanguagesStatement .
  ?speakerLanguagesStatement llv:P4 ?speakerLanguages .
  OPTIONAL { ?speakerLanguagesStatement llq:P16 ?languageLevel . }
  FILTER( ?speakerLanguages = ?language ) .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  #filters
}"""
- Currently, I'm running it with filters = "" because it seems that a query filtered to a single term takes around 70 s, while fetching a single transcription takes about 145 seconds. My plan is to group the results by transcription and then write them into a JSON file to avoid re-running the costly query. Basically, I need the speaker name, the term, their country, their city, the ISO code of the language, the date created, the filename, and the language level.
- For example, for the term un chien, the json would look like:
- {
    "term": {
      "un chien": {
        "speaker": "Julien Baley",
        "language": "fra",
        "city": "",
        "country": "France",
        "recorded": "2020-11-27",
        "filename": "LL-Q150_(fra)-Julien_Baley-un_chien.wav",
        "languageLevel": "Q15"
      }
    }
  } Languageseeker (talk) 23:17, 20 May 2022 (UTC)
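Since P14 apparently stores the place as a plain Qid string (as the values above show), one way to turn it into readable labels is a federated query against Wikidata. The following is an untested sketch; it assumes the Lingua Libre endpoint allows SERVICE calls to query.wikidata.org, and it uses the Wikidata properties rdfs:label and P17 (country):
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Untested sketch: resolve the Qid stored in P14 into place and country labels
SELECT ?speaker ?linkeduser ?placeLabel ?countryLabel WHERE {
  ?speaker prop:P11 ?linkeduser ;
           prop:P14 ?residence .                                   # e.g. the literal "Q142"
  BIND( IRI(CONCAT("http://www.wikidata.org/entity/", STR(?residence))) AS ?place )
  SERVICE <https://query.wikidata.org/sparql> {
    ?place rdfs:label ?placeLabel .
    FILTER( LANG(?placeLabel) = "en" )
    OPTIONAL {
      ?place wdt:P17 ?country .
      ?country rdfs:label ?countryLabel .
      FILTER( LANG(?countryLabel) = "en" )
    }
  }
}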
Contribution: Python program to download all files created by a specific user
- See also Help:Download datasets.
I wrote a Python program that downloads all the files created by one user. For video files, it downloads the full WebM. For audio files, the default is to download the WAV file, but you can optionally choose MP3 or OGG instead. Currently, the configuration requires a minor modification of lluad.py. If there is strong demand, I will write a command-line parser for it. Please report any bugs or errors on the GitHub page. Feature requests are welcome. Languageseeker (talk) 02:28, 20 May 2022 (UTC)
- @Languageseeker please add your tool to Help:Download datasets. It lists several tools with different specifics; your tool is welcome and may help some Python users as well. Yug (talk) 22:41, 22 May 2022 (UTC)
Garbage Values in prop:P14
- See also Help:SPARQL for maintenance and Help:SPARQL for maintenance § Speakers → Undefined place of residence.
As part of my Anki project, I queried the entire LL database and I'm trying to parse the output of ?speaker prop:P14 ?residence. I've noticed that there are a number of garbage values provided for P14, such as Q1, Q2, Q103962887, Q6099648, Strasbourg. There seem to be three cases.
- Users wishing to enter an extremely vague place such as Earth or the Universe. These should be set to None.
- Users accidentally linking to a disambiguation page. These require correction.
- Users not entering a Wikidata item at all, which requires manual correction.
To solve the root of the problem, I propose that P14 should be restricted to only Wikidata items that exist and have P17. Languageseeker (talk) 21:22, 25 May 2022 (UTC)
- @Languageseeker it's a good find. If you still have that SPARQL query at hand, please add it to Help:SPARQL for maintenance. Yes, it's something we should clean up, I think. There may be a few cases where the speaker doesn't want to share their location, but in 95% of cases I think we can go ahead and correct it or ask them to correct it. Yug (talk) 12:39, 26 May 2022 (UTC)
- I noticed that when creating a new speaker, place of learning is optional. Not cool. Yug (talk) 21:32, 27 May 2022 (UTC)
- @Yug For the life of me, I can't get the federated query to work, but I have a separate query to get the location and country labels from Wikidata. These are the problematic ones. Note that Q20 is on the list because Q20 "Norway" is missing P17.
- ['MichaelSchoenitzer', None]
- ['D.Muralidharan', None]
- ['Kaderousse', None]
- ['Krokus', None]
- ['विदुला टोकेकर', 'Q103962887']
- ['DoctorandusManhattan', 'Q2']
- ['Justforoc', 'Q2']
- ['Student16 de', None]
- ['Didierwiki', 'Q6099648']
- ['Sarah2149', None]
- ['DomesticFrog', 'Q1']
- ['Drkanchi', None]
- ['Satdeep Gill', None]
- ['Iwan.Aucamp', 'Q20']
- ['Skimel', 'Q2']
- ['Abeɣzan', None]
- ['Gibraltar Rocks', None]
- ['Bomdapatrick', None]
- ['Ibtissam RAHMOUNI', None]
- ['Trabelsiismail', None]
- ['Ziko', 'Q2']
- ['Youcefelallali', None]
- ['Foxxipeter7', None]
- ['Celevra089', None]
- ['Bodhisattwa', None]
- ['Atudu', None]
- ['KageyamaxNishinoya', 'Q30915818']
- ['Darkdadaah', None]
- ['JayashreeVI', None]
- ['रश्मीमहेश', 'Q103962887']
- ['गीता गोविंद नेने', 'Q103893785']
- ['Awangba Mangang', None]
- ['Abigaljo', None]
- ['FaelDaug', 'Q29423162']
Languageseeker (talk) 02:16, 30 May 2022 (UTC)
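As requested above, a maintenance query along these lines (an untested sketch) could go into Help:SPARQL for maintenance to flag the non-Qid literals; values like Q1 or Q2 that look like valid Qids would still need to be checked against Wikidata separately:
# Untested sketch: speakers whose residence (P14) is not shaped like a Wikidata Qid
SELECT DISTINCT ?speaker ?linkeduser ?residence WHERE {
  ?speaker prop:P11 ?linkeduser ;
           prop:P14 ?residence .
  FILTER( !REGEX(STR(?residence), "^Q[0-9]+$") )
}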
Anki Extension Release
I just released Lingua Libre and Forvo Addon. It has a number of advanced options to improve search results and can run either as a batch operation or on an individual note.
By default, it first checks Lingua Libre and, if there are no results on Lingua Libre, it then checks Forvo. To run as a pure Lingua Libre extension, you will need to set "disable_Forvo" to True in your configuration section.
Please report bugs, issues, and ideas on GitHub. I would love any feedback. Languageseeker (talk) 02:23, 31 May 2022 (UTC)