User talk
Difference between revisions of "Balyozxane"
(Created page with "{{susbt:Welcome|~~~~}}") |
Balyozxane (talk | contribs) |
||
(8 intermediate revisions by 3 users not shown) | |||
Line 1: | Line 1: | ||
− | {{ | + | {{Welcome/lang|user=Balyozxane|welcominguser=Yug|1=[[User:Yug|Yug]] ([[User talk:Yug|talk]]) 18:53, 21 February 2021 (UTC)}} |
+ | :Welcome here Balyozxane! Happy to see kurdish language is coming ! :) [[User:Yug|Yug]] ([[User talk:Yug|talk]]) 18:53, 21 February 2021 (UTC) | ||
+ | ::Thank you! --[[User:Balyozxane|Balyozxane]] ([[User talk:Balyozxane|talk]]) 04:11, 22 February 2021 (UTC) | ||
+ | |||
+ | |||
+ | == Kurdish lists to test ? == | ||
+ | hello, | ||
+ | <br>I noticed the emerging efforts of the Kurdish community, and noticed one of your community's bottleneck may be the lack of lists of words. I wonder if I could help, so I looked for and found a free frequency list [https://github.com/lingua-libre/unilex/tree/master/data/frequency online] with the language code <code>ku</code>, which I believe is the same as <code>kur</code> (Kurdish language). I coded a command and created the two following lists : | ||
+ | * [[List:Kur/words-by-frequency-00001-to-01000]] | ||
+ | * [[List:Kur/words-by-frequency-01001-to-05000]] | ||
+ | |||
+ | Since I don't read Kurdish could you help me by confirming to me: are these lists indeed in Kurdish language ? are these frequency list relevant (are the words are indeed frequent Kurdish words) ? There may be minor noise such as letters or frequent acronyms, but it believe these list should be +95% to +99% of highly used Kurdish words. [[User:Yug|Yug]] ([[User talk:Yug|talk]]) 12:10, 24 February 2021 (UTC)<br> | ||
+ | Note:I did not find <code>ckb</code> – Sorani, <code>kmr</code> – Kurmanji, <code>sdh</code> – Southern Kurdish, nor <code>lki</code> – Laki language resources. [[User:Yug|Yug]] ([[User talk:Yug|talk]]) 12:46, 24 February 2021 (UTC) | ||
+ | :@[[User:Yug|Yug]] These are all (Northern) Kurdish (Kurmanji) words. I believe you got the list from User:Şêr who prepared it from 37 issues of Le Monde diplomatique Kurdî. They are not manually edited to be in their dictionary form, still quite usefull tho, thank you so much! Btw I'm bothered by the lang codes used here. [http://unicode.org/reports/tr35/#Language_Locale_Field_Definitions unicode.org Language Identifier Field Definitions] says use "ku" for Northern Kurdish (Kurmanji), we use "ku" on ku.wikt as well but en.wikt recently switched to "kmr" (which I supported at the time). And right now all the kurdish words listed in [[c:Category:Lingua Libre pronunciation-kur]] are in Northern Kurdish. I'm worried once Sorani (Central Kurdish) "ckb" starts uploading pronunciations, we might get in a little trouble. Any idea what to do?--[[User:Balyozxane|Balyozxane]] ([[User talk:Balyozxane|talk]]) 13:39, 24 February 2021 (UTC) | ||
+ | ::'''Data:''' Hi there. My data actually comes from UNILEX, which is a Google-led Unicode Consortium project. So I suspect it uses Google's best scrappers and NLP. I then forked their github, and made a script to convert their format into Lili's <code>List:*</code> format. See: | ||
+ | ::* [https://github.com/lingua-libre/unilex/ github.com/lingua-libre/unilex]/[https://github.com/lingua-libre/unilex/tree/master/data/frequency-sorted-hash data/frequency-sorted-hash]/[https://github.com/lingua-libre/unilex/tree/master/data/frequency-sorted-hash/ig.txt /ig.txt] | ||
+ | ::* [https://github.com/lingua-libre/unilex/ github.com/lingua-libre/unilex]/[https://github.com/lingua-libre/unilex/tree/master/data/frequency-sorted-hash data/frequency-sorted-count]/[https://github.com/lingua-libre/unilex/tree/master/data/frequency-sorted-count/ig.txt /ig.txt] | ||
+ | ::For the Kurdish files, replace <code>ig</code> by <code>ku</code>. | ||
+ | ::'''Iso3 ?:''' This is typically a discussion to have with your kurdish community, if possible computer linguists. As of now the [[:Commons:Category:Lingua_Libre_pronunciation-kur]] has 577 files, or 30mins workload. We can change course. If you confirm me that: | ||
+ | ::# Kurdish subgroup are distinct languages (either different word forms, grammar, conjugaisons, syntaxes such as spanish, portuguese, and braziliand portuguese). | ||
+ | ::# the lists created today actually are <code>kmr</code> Northern Kurdish (Kurmanji) | ||
+ | ::We can do several things. First, move the lists to <code>List:Kmr/</code>. Second, confirm or add the 4 locals to Lingualibre. Third, notify the 3 Kurdish speakers that the all-inclusive <code>kur</code> code is depreciated. Then, help them to change their profiles to <code>kmr</code> or relevant form. Call them to continue working on <code>List:Kmr/</code>. As for the clean up, I'am not sure yet : rename or delete. Honestly, for efficiency sakes, we would go faster by removing and starting anew. Renaming would require hand work or bot work, and likely 2~3hours minimum. [[User:Yug|Yug]] ([[User talk:Yug|talk]]) 16:27, 24 February 2021 (UTC) | ||
+ | :Hi, let me comment about the codes and the Kurdish languages. | ||
+ | :* kur is the macrocode for '''all''' Kurdish languages ("ku" is the ISO 639-1 code, Lingua Libre uses ISO 639-3 code so far to identify languages); it is similar to the code "ara" for all Arabic languages. In principle this code, that tags "Kurdish language", can thus be used by a locutor if he/she does not which specific Kurdish language he/she speaks. | ||
+ | :If you know which Kurdish language you speak, you may use either "kur", or one of the following more precise codes (according to [https://iso639-3.sil.org/code/kur iso639-3.sil.org]: | ||
+ | :* "[[Q390278|kmr]]", tags for the "Kurmanji/Northern Kurdish", is already present on Lingua Libre so it is already possible to record in this language | ||
+ | :* "[[Q386444|ckb]]", tags for the "Sorani/Central Kurdish", is already present on Lingua Libre so it is already possible to record in this language | ||
+ | :* "sdh", tags for the Southern Kurdish. It is not yet present here. I can create it as Southern Kurdish. | ||
+ | : So you can either choose the Kurdish language or a more specific one. Last point, if you can attest that all the recordings in [[:c:Category:Lingua Libre pronunciation-kur|Lingua Libre pronunciation-kur]] are actually Kurmanji, we can rename the category so that we clearly identify the Kurdish language. [[User:Pamputt|Pamputt]] ([[User talk:Pamputt|talk]]) 16:32, 24 February 2021 (UTC) | ||
+ | ::Thank you both! I tried to bring the issue up on ku.wikt to see what they think but they (two most active users) are not inclined to use kmr for Northern Kurdish. I'll check with other people to see how we should proceed.--[[User:Balyozxane|Balyozxane]] ([[User talk:Balyozxane|talk]]) 19:15, 24 February 2021 (UTC) |
Latest revision as of 19:15, 24 February 2021
Welcome
Lingua Libre is a project which aims to build a collaborative multilingual audiovisual corpus under free licence in order to expand knowledge about languages and help online language communities to develop.
You can help us!
You can visit this page if you want to learn more about the project.
You can create your User page by clicking on it. We recommend that you integrate the babel template onto it. It is very useful to indicate to others the languages you speak, and to facilitate finding other persons speaking your languages. If you are not familiar with wikicode, you can go to this demo User page, read the instructions and copy the prepared babel template, and then paste it onto your user page before adapting it to your information. You can then publish your User page!
- Follow the steps of the Record Wizard
- Think of the words you want to record. You may enter them one by one (live list), use an existing category from a Wiktionary or Wikipedia project, or create your own list.
You can visit this help page where you will find advice for beginning your contributions on Lingua Libre. If you did not find the answer to your question, please ask it in the Chat room.
- Try to avoid background noises during the recording
- Please listen to the pronunciations before uploading them
- Consider using an external microphone
- Welcome, Balyozxane! Happy to see the Kurdish language is coming! :) Yug (talk) 18:53, 21 February 2021 (UTC)
- Thank you! --Balyozxane (talk) 04:11, 22 February 2021 (UTC)
Kurdish lists to test?
Hello,
I noticed the emerging efforts of the Kurdish community, and that one of your community's bottlenecks may be the lack of word lists. I wondered if I could help, so I looked for and found a free frequency list online with the language code "ku", which I believe is the same as "kur" (Kurdish language). I coded a command and created the following two lists:
- List:Kur/words-by-frequency-00001-to-01000
- List:Kur/words-by-frequency-01001-to-05000
Since I don't read Kurdish, could you help me by confirming two things: are these lists indeed in the Kurdish language? Are these frequency lists relevant (are the words indeed frequent Kurdish words)? There may be minor noise such as single letters or frequent acronyms, but I believe these lists should be 95% to 99% highly used Kurdish words. Yug (talk) 12:10, 24 February 2021 (UTC)
Note: I did not find resources for "ckb" (Sorani), "kmr" (Kurmanji), "sdh" (Southern Kurdish), or "lki" (Laki). Yug (talk) 12:46, 24 February 2021 (UTC)
- @Yug These are all (Northern) Kurdish (Kurmanji) words. I believe you got the list from User:Şêr, who prepared it from 37 issues of Le Monde diplomatique Kurdî. The words have not been manually edited into their dictionary form, but they are still quite useful, thank you so much! Btw, I'm bothered by the language codes used here. unicode.org Language Identifier Field Definitions says to use "ku" for Northern Kurdish (Kurmanji); we use "ku" on ku.wikt as well, but en.wikt recently switched to "kmr" (which I supported at the time). Right now, all the Kurdish words listed in c:Category:Lingua Libre pronunciation-kur are in Northern Kurdish. I'm worried that once Sorani (Central Kurdish, "ckb") speakers start uploading pronunciations, we might get into a little trouble. Any idea what to do?--Balyozxane (talk) 13:39, 24 February 2021 (UTC)
- Data: Hi there. My data actually comes from UNILEX, which is a Google-led Unicode Consortium project, so I suspect it uses Google's best scrapers and NLP. I then forked their GitHub repository and made a script to convert their format into Lili's List:* format. See:
  - https://github.com/lingua-libre/unilex/tree/master/data/frequency-sorted-hash/ig.txt
  - https://github.com/lingua-libre/unilex/tree/master/data/frequency-sorted-count/ig.txt
  For the Kurdish files, replace "ig" with "ku".
- Iso3 ?: This is typically a discussion to have with your Kurdish community, if possible with computational linguists. As of now, Commons:Category:Lingua_Libre_pronunciation-kur has 577 files, or about 30 minutes of workload. We can change course. If you confirm that:
  - the Kurdish subgroups are distinct languages (different word forms, grammar, conjugations, or syntax, as with Spanish, Portuguese, and Brazilian Portuguese),
  - the lists created today actually are "kmr", Northern Kurdish (Kurmanji),
  then we can do several things. First, move the lists to List:Kmr/. Second, confirm or add the 4 locales to Lingua Libre. Third, notify the 3 Kurdish speakers that the all-inclusive "kur" code is deprecated. Then, help them change their profiles to "kmr" or the relevant code, and invite them to continue working on List:Kmr/. As for the cleanup, I'm not sure yet: rename or delete. Honestly, for efficiency's sake, we would go faster by removing and starting anew. Renaming would require manual or bot work, and likely 2 to 3 hours minimum. Yug (talk) 16:27, 24 February 2021 (UTC)
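The conversion described above (frequency file in, List:* pages out) might look roughly like the following. This is a minimal sketch, not the script actually used: it assumes the frequency file has one tab-separated "form, count" pair per line with "#" comment lines, and that a list page holds one "# word" line per entry. The function names and the sample counts are invented for illustration.

```python
def parse_frequency_lines(lines):
    """Return word forms ordered from most to least frequent.

    Assumed input: one "form<TAB>count" pair per line; comment lines start
    with "#"; any row whose second column is not a number (for example a
    header row) is skipped.
    """
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        parts = line.split("\t")
        if len(parts) < 2 or not parts[1].strip().isdigit():
            continue
        entries.append((parts[0].strip(), int(parts[1])))
    # sorted() is stable, so ties keep their original file order.
    entries = sorted(entries, key=lambda entry: -entry[1])
    return [form for form, _count in entries]


def list_title(lang, first, last):
    """Build a title such as List:Kur/words-by-frequency-00001-to-01000."""
    return f"List:{lang.capitalize()}/words-by-frequency-{first:05d}-to-{last:05d}"


def build_lists(words, lang, boundaries):
    """Split ranked words into pages; boundaries are inclusive end ranks."""
    pages = {}
    start = 0
    for end in boundaries:
        chunk = words[start:end]
        if chunk:
            # Assumption: one "# word" line per entry in the list page.
            pages[list_title(lang, start + 1, end)] = "\n".join(f"# {w}" for w in chunk)
        start = end
    return pages


# Invented sample data; the counts are placeholders, not real frequencies.
sample = ["# comment", "Form\tFrequency", "û\t90", "de\t120", "", "ji\t80", "li\t70"]
pages = build_lists(parse_frequency_lines(sample), "kur", [2, 5])
for title, body in pages.items():
    print(title)
    print(body)
```

With boundaries of [1000, 5000], the same functions would produce the two title ranges used for the lists above. Moving the lists to another code would then only mean passing "kmr" instead of "kur".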
- Hi, let me comment on the codes and the Kurdish languages.
- "kur" is the macrocode for all Kurdish languages ("ku" is the ISO 639-1 code; Lingua Libre has so far used ISO 639-3 codes to identify languages); it is similar to the code "ara" for all Arabic languages. In principle this code, which tags the "Kurdish language", can thus be used by speakers who do not know which specific Kurdish language they speak.
- If you know which Kurdish language you speak, you may use either "kur" or one of the following more precise codes (according to iso639-3.sil.org):
- "kmr", the tag for Kurmanji/Northern Kurdish, is already present on Lingua Libre, so it is already possible to record in this language
- "ckb", the tag for Sorani/Central Kurdish, is already present on Lingua Libre, so it is already possible to record in this language
- "sdh", the tag for Southern Kurdish, is not yet present here. I can create it as Southern Kurdish.
- So you can choose either the generic Kurdish language or a more specific one. One last point: if you can attest that all the recordings in Lingua Libre pronunciation-kur are actually Kurmanji, we can rename the category so that it clearly identifies the specific Kurdish language. Pamputt (talk) 16:32, 24 February 2021 (UTC)
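The relationship between these codes can be written down as a small lookup. The sketch below is only an illustration of the explanation above, not Lingua Libre code: the table contents come from this thread (ISO 639-1 "ku", macrocode "kur", members "kmr", "ckb", "sdh"), and the function names are hypothetical.

```python
# ISO 639-1 code mapped to its ISO 639-3 equivalent (only the case discussed here).
ISO1_TO_ISO3 = {"ku": "kur"}

# Macrolanguage code mapped to the more precise codes listed in this thread.
MACRO_MEMBERS = {"kur": ("kmr", "ckb", "sdh")}


def normalize(code):
    """Lowercase a code and convert a known ISO 639-1 code to ISO 639-3."""
    code = code.strip().lower()
    return ISO1_TO_ISO3.get(code, code)


def macrolanguage_of(code):
    """Return the macrocode a specific code belongs to, or None."""
    code = normalize(code)
    for macro, members in MACRO_MEMBERS.items():
        if code in members:
            return macro
    return None


def same_family(a, b):
    """True when both codes are, or fall under, the same macrocode."""
    root_a = macrolanguage_of(a) or normalize(a)
    root_b = macrolanguage_of(b) or normalize(b)
    return root_a == root_b


print(normalize("ku"))            # kur
print(macrolanguage_of("kmr"))    # kur
print(same_family("ku", "ckb"))   # True
```

A check like same_family only says that two codes are related; it cannot tell whether a recording tagged "kur" is Kurmanji or Sorani, which is why the category still needs a human to confirm it.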
- Thank you both! I tried to bring the issue up on ku.wikt to see what they think, but they (the two most active users) are not inclined to use "kmr" for Northern Kurdish. I'll check with other people to see how we should proceed.--Balyozxane (talk) 19:15, 24 February 2021 (UTC)