User talk

Difference between revisions of "Yug"

<query _pagination="10" wdata="Wikidata" language="Item" name="Language" nb="Number of records" next="next">
    # Q4: language;
    # Q2: record;
    # P2: instance of;
    # P4: language;
    # P12: wikidata id;
    # P13: ISO 639-3 code;
    select ?language (if( ?language = entity:Q4, '???', ?languageLabel ) as ?name) (COUNT(?record) as ?nb)
    where {
        ?record prop:P2 entity:Q2 .
        ?record prop:P4 ?lang .

        BIND( IF( isBLANK(?lang), entity:Q4, ?lang ) as ?language ).

        SERVICE wikibase:label {
            bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" .
            ?language  rdfs:label ?languageLabel.
        }
    }
    GROUP BY ?language ?languageLabel
    ORDER BY DESC(?nb)
</query>

Revision as of 13:28, 31 December 2018

dev

Hi Yug,
A quick reminder that for testing there is the development instance https://v2.lingualibre.fr, where you can try anything you like; apply changes here only once they are fully ready. That will keep people from running into odd things when they load the site at the wrong moment ;).
Cheers — 0x010C ~talk~ 15:45, 26 December 2018 (UTC)
PS: and don't leave unneeded pages lying around (MediaWiki:Recordwizard); in this case it blocks, among other things, any future changes and translations pushed by translatewiki.

Phew! You really did stumble on it!!! *0* !!! I went as fast as I could!!!! Thanks for keeping watch; everything looked properly restored to me, unless I was fooled by the cache!... Ah, v2, that's right! Can you make me an admin there??? I'd like to test image support in https://v2.lingualibre.fr/wiki/MediaWiki:Sidebar ! Yug (talk) 20:09, 26 December 2018 (UTC)
PS: I don't really understand these translation questions / pages / tags... Yug (talk) 20:10, 26 December 2018 (UTC)
I sent you an email with the password for the admin test account. — 0x010C ~talk~ 21:59, 26 December 2018 (UTC)

About

Your intro and the history are fine on LinguaLibre:About, but I removed what doesn't belong on that page (so that it stays reasonably professional / clean / effective). You can find the removed content in the page history. — 0x010C ~talk~ 21:59, 26 December 2018 (UTC)

OK, cool! I pulled that from GitHub, where I'm doing some cleanup. I haven't decided yet where to put that stuff on LinguaLibre... I'll keep you posted.
On GitHub, the listes repository can be deleted! Yug (talk) 18:08, 28 December 2018 (UTC)

Ongoing work

Get your data:

git clone git@github.com:hermitdave/FrequencyWords.git

Then save the following as dave-to-LL.mk in the root of your working directory:

# RUN:
# make -f dave-to-LL.mk iso2=pl iso3=pol processing    # do the work
# make -f dave-to-LL.mk iso2=pl iso3=pol all           # do the work AND print a few messages
# Defaults; override them on the command line as shown above.
iso2=pl
iso3=pol

.PHONY: all processing messages
all: processing messages

# Strip the trailing frequency count, prefix each word with "# ",
# then split the list into files of 2000 lines each.
processing:
	sed -E 's/ [0-9]+$$//g' $(iso2)_50k.txt | sed -E 's/^/# /g' > $(iso2)-words-LL.txt
	split -d -l 2000 --additional-suffix=".txt" $(iso2)-words-LL.txt "$(iso3)-words-by-frequency-"

# Print the first lines and line counts of the input and output files.
messages:
	head -n 5 $(iso2)_50k.txt
	head -n 5 $(iso2)-words-LL.txt
	head -n 5 "$(iso3)-words-by-frequency-00.txt"
	head -n 5 "$(iso3)-words-by-frequency-01.txt"
	wc -l $(iso2)-words-LL.txt
	wc -l "$(iso3)-words-by-frequency-01.txt"

Then find your {iso2}_50k.txt file, put it in the same folder as dave-to-LL.mk, and run the command below with the iso2 and iso3 values you need:

make -f dave-to-LL.mk iso2=pl iso3=pol processing

Yug (talk) 18:17, 28 December 2018 (UTC)
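
For anyone without GNU make or split at hand, the same processing can be sketched in Python. This is only an illustration of the steps in the Makefile above, not a replacement for it: the function names (to_wiki_lines, split_chunks, write_chunks) are made up for this example, and it assumes each input line has the form "word count".

```python
import re
from pathlib import Path


def to_wiki_lines(lines):
    """Drop the trailing ' <count>' and prefix each word with '# '."""
    return ["# " + re.sub(r" [0-9]+$", "", line.rstrip("\n")) for line in lines]


def split_chunks(items, size=2000):
    """Cut a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def write_chunks(chunks, iso3, out_dir="."):
    """Write each chunk to <iso3>-words-by-frequency-NN.txt and return the paths."""
    paths = []
    for n, chunk in enumerate(chunks):
        path = Path(out_dir) / f"{iso3}-words-by-frequency-{n:02d}.txt"
        path.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

A typical call would read pl_50k.txt, pass its lines through to_wiki_lines and split_chunks, and hand the result to write_chunks with iso3 set to "pol".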

Subtlex

iconv -f "GB18030" -t "UTF-8" SUBTLEX-CH-WF.csv -o $iso2-words.txt
sed -E 's/(,[0-9]+\.?[0-9]*)+//g' $iso2-words.txt | tail -n+4 | head -n 20000 | sed -E 's/^/# /g' > $iso2-words-LL.txt
split -d -l 2000  --additional-suffix=".txt" $iso2-words-LL.txt "$iso3-words-by-frequency-"
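
The Subtlex commands can also be sketched in Python, which decodes GB18030 directly and avoids the regular expression. This is a rough equivalent under two assumptions: the word sits in the first CSV column, and the first three lines are headers (which is what tail -n+4 skips). The function name subtlex_words is invented for this example.

```python
import csv
from itertools import islice


def subtlex_words(path, skip=3, limit=20000, encoding="gb18030"):
    """Return up to `limit` words, each prefixed with '# ', skipping `skip` header lines."""
    with open(path, encoding=encoding, newline="") as fh:
        rows = islice(csv.reader(fh), skip, skip + limit)
        # Keep only the first column (assumed to hold the word itself).
        return ["# " + row[0] for row in rows if row]
```

The returned list can then be cut into 2000-line files in the same way as the split command above.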

Feature idea: table tracking existing languages on LinguaLibre.fr

LinguaLibre:Sparql

I have difficulty keeping track of all the languages I helped add to LinguaLibre. Taiwan has 16 languages and 42 local variations. Maybe this already exists... If not, it would be helpful to have a sortable table such as the one below:

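{| class="wikitable sortable"
! Wikidata qid || LinguaLibre qid || English name || Language group || Active? || Number of recordings
|-
| Q715766 || Atayal (Q51302) || Atayal || Taiwanese || Low || 4
|-
| Q718269 || Sakizaya (Q51871) || Sakizaya || Taiwanese || Low || 6
|-
| ... || ... || ... || ... || ... || ...
|}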

Yug (talk) 12:16, 31 December 2018 (UTC)

I'm finding out how LinguaLibre:Stats is coded; maybe I will be able to produce something :D Yug (talk) 12:59, 31 December 2018 (UTC)
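
Until such a table exists on the site, the wikitext for it could be generated from a list of rows. The Python sketch below only illustrates the idea: render_wikitable is a made-up helper, and the rows would have to come from somewhere else, for example from the results of a query on LinguaLibre.

```python
def render_wikitable(headers, rows):
    """Build sortable wikitable markup from a header list and a list of rows."""
    lines = ['{| class="wikitable sortable"', "! " + " || ".join(headers)]
    for row in rows:
        lines.append("|-")  # row separator
        lines.append("| " + " || ".join(str(cell) for cell in row))
    lines.append("|}")
    return "\n".join(lines)
```

Calling it with the six column names from the proposal and one row per language gives markup that can be pasted into a wiki page.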