User talk
Difference between revisions of "Yug"
<query _pagination="10" wdata="Wikidata" language="Item" name="Language" nb="Number of records" next="next">
# Q4: language;
# Q2: record;
# P2: instance of;
# P4: language;
# P12: wikidata id;
# P13: ISO 639-3 code;
select ?language (if( ?language = entity:Q4, '???', ?languageLabel ) as ?name) (COUNT(?record) as ?nb)
where {
  ?record prop:P2 entity:Q2 .
  ?record prop:P4 ?lang .

  BIND( IF( isBLANK(?lang), entity:Q4, ?lang ) as ?language ).

  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
    ?language rdfs:label ?languageLabel.
  }
}
GROUP BY ?language ?languageLabel
ORDER BY DESC(?nb)
</query>
Revision as of 13:28, 31 December 2018
dev
Hi Yug,
Just a reminder that for testing there is the development instance https://v2.lingualibre.fr, where you can run all the experiments you like; only apply them here once they are fully ready. That will keep people from stumbling on odd things when they load the site at the wrong moment ;).
Kisses — 0x010C ~talk~ 15:45, 26 December 2018 (UTC)
PS: and don't leave useless pages lying around (MediaWiki:Recordwizard); in this case it notably blocks any future changes and translations pushed by translatewiki.
- Phew! You really caught it!!! *0* !!! I went as fast as I could!!!! Thanks for your vigilance; everything looked properly restored to me, unless I was blinded by the cache!... Ah, v2, that's right! Can you make me an admin??? I would like to test image support in https://v2.lingualibre.fr/wiki/MediaWiki:Sidebar ! Yug (talk) 20:09, 26 December 2018 (UTC)
- PS: I don't quite understand these translation questions / pages / tags... Yug (talk) 20:10, 26 December 2018 (UTC)
About
Your intro and the history are fine on LinguaLibre:About, but I removed what has no place on that page (so that it is at least somewhat professional / clean / effective). You can find the content in the page history. — 0x010C ~talk~ 21:59, 26 December 2018 (UTC)
- Ok, cool! I pulled that from GitHub and I'm cleaning it up there. I haven't decided yet where to put this stuff on LinguaLibre... I'll keep you posted.
- On GitHub, the repository listes can be deleted! Yug (talk) 18:08, 28 December 2018 (UTC)
Ongoing work
Get your data:
$ git clone git@github.com:hermitdave/FrequencyWords.git
Then save the following as dave-to-LL.mk in the root of your directory:
# RUN:
# make -f dave-to-LL.mk iso2=pl iso3=pol processing    # to do the work
# make -f dave-to-LL.mk iso2=pl iso3=pol all           # to do the work AND print few messages
iso2="pl"
iso3="pol"

all: processing messages

processing:
	sed -E 's/ [0-9]+$$//g' $(iso2)_50k.txt | sed -E 's/^/# /g' > $(iso2)-words-LL.txt
	split -d -l 2000 --additional-suffix=".txt" $(iso2)-words-LL.txt "$(iso3)-words-by-frequency-"

messages:
	head -n 5 $(iso2)_50k.txt
	head -n 5 $(iso2)-words-LL.txt
	head -n 5 "$(iso3)-words-by-frequency-00.txt"
	head -n 5 "$(iso3)-words-by-frequency-01.txt"
	wc -l $(iso2)-words-LL.txt
	wc -l "$(iso3)-words-by-frequency-01.txt"
Then find your {iso2}_50k.txt file, put it in the same folder as dave-to-LL.mk, and run the command below with the iso2 and iso3 values you need:
make -f dave-to-LL.mk iso2=pl iso3=pol processing
Yug (talk) 18:17, 28 December 2018 (UTC)
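To see what the processing target does to the data, here is a minimal shell sketch of the same two sed steps, wrapped in a function and run on three made-up lines. The function name to_ll_list and the sample words and counts are hypothetical; only the sed expressions come from the makefile above (with $$ written as $, since outside make the dollar sign is not doubled). It assumes the input lines have the form "word count".

```shell
#!/bin/sh
# Hypothetical helper: the same two sed steps as the `processing` target.
# Reads "word count" lines on stdin and writes "# word" lines on stdout.
to_ll_list() {
  sed -E 's/ [0-9]+$//g' | sed -E 's/^/# /g'
}

# Made-up sample in the assumed "word count" layout of a {iso2}_50k.txt file.
printf 'nie 1000\nto 900\nna 800\n' | to_ll_list
```

In the makefile, split then cuts the resulting list into files of 2000 lines each.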
Subtlex
iconv -f "GB18030" -t "UTF-8" SUBTLEX-CH-WF.csv -o $iso2-words.txt
sed -E 's/(,[0-9]+.?[0-9]*)+//g' $iso2-words.txt | tail -n+4 | head -n 20000 | sed -E 's/^/# /g' > $iso2-words-LL.txt
split -d -l 2000 --additional-suffix=".txt" $iso2-words-LL.txt "$iso3-words-by-frequency-"
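As an illustration of the middle step, the sketch below runs a slight variation of that sed expression on made-up rows. The function name strip_numeric_columns, the three header lines, and the data rows are all hypothetical. The one deliberate change is that the dot is escaped (\.) so that it matches only a literal decimal point rather than any character.

```shell
#!/bin/sh
# Hypothetical helper: drop every ",number" column from a CSV row,
# leaving only the word in the first column.
# Variation on the original expression: the dot is escaped (\.).
strip_numeric_columns() {
  sed -E 's/(,[0-9]+\.?[0-9]*)+//g'
}

# Invented input: three header lines, then "word,count,per-million" rows.
printf 'header 1\nheader 2\nheader 3\nde,1000,50.5\nshi,900,45\n' \
  | strip_numeric_columns \
  | tail -n +4 \
  | sed -E 's/^/# /g'
```

Here tail -n +4 skips the three header lines, as in the command above, and the final sed adds the "# " prefix to each remaining word.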
Feature idea: table tracking existing languages on LinguaLibre.fr
I have trouble keeping track of all the languages I helped add to LinguaLibre. Taiwan has 16 languages and 42 local variations. Maybe this already exists... If not, it would be useful to have a sortable table such as the one below:
Wikidata qid | LinguaLibre qid | English name | Language group | Active ? | Number of recordings |
---|---|---|---|---|---|
Q715766 | Atayal (Q51302) | Atayal | Taiwanese | Low | 4 |
Q718269 | Sakizaya (Q51871) | Sakizaya | Taiwanese | Low | 6 |
... | ... | .... | ... | ... | ... |
Yug (talk) 12:16, 31 December 2018 (UTC)
- I'm finding out how LinguaLibre:Stats is coded; maybe I will be able to produce something :D Yug (talk) 12:59, 31 December 2018 (UTC)
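One possible way to generate such a table, sketched under assumptions: suppose the per-language data has already been exported as tab-separated lines (Wikidata qid, LinguaLibre qid, English name, language group, activity, number of recordings). The export itself, the column order, and the function name tsv_to_wikitable are hypothetical; the output uses standard MediaWiki sortable table markup.

```shell
#!/bin/sh
# Hypothetical helper: convert tab-separated rows on stdin into
# MediaWiki markup for a sortable table.
# Assumed columns: Wikidata qid, LinguaLibre qid, English name,
# language group, activity, number of recordings.
tsv_to_wikitable() {
  awk -F '\t' '
    BEGIN {
      print "{| class=\"wikitable sortable\""
      print "! Wikidata qid || LinguaLibre qid || English name || Language group || Active ? || Number of recordings"
    }
    {
      print "|-"
      printf "| [[:wikidata:%s|%s]] || [[%s]] || [[:en:%s|%s]] || %s || %s || %s\n", $1, $1, $2, $3, $3, $4, $5, $6
    }
    END { print "|}" }
  '
}

# The two rows from the example table above.
printf 'Q715766\tQ51302\tAtayal\tTaiwanese\tLow\t4\nQ718269\tQ51871\tSakizaya\tTaiwanese\tLow\t6\n' \
  | tsv_to_wikitable
```

The printed markup can be pasted into a wiki page; the recording counts would still have to come from a query such as the one behind LinguaLibre:Stats.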