User talk:KlaudiuMihaila
Latest revision as of 18:06, 24 May 2021
Welcome
Lingua Libre is a project which aims to build a collaborative multilingual audiovisual corpus under free licence in order to expand knowledge about languages and help online language communities to develop.
You can help us!
You can visit this page if you want to learn more about the project.
You can create your User page by clicking on it. We recommend that you add the babel template to it. It is very useful for showing others which languages you speak and for making it easier to find other people who speak your languages. If you are not familiar with wikicode, you can go to this demo User page, read the instructions, copy the prepared babel template, and paste it onto your user page before adapting it to your information. You can then publish your User page!
- Follow the steps of the Record Wizard
- Think of the words you want to record. You may enter them one by one (live list), use an existing category from a Wiktionary or Wikipedia project, or create your own list.
You can visit this help page where you will find advice for beginning your contributions on Lingua Libre. If you did not find the answer to your question, please ask it in the Chat room.
- Try to avoid background noises during the recording
- Please listen to the pronunciations before uploading them
- Consider using an external microphone
Best regards! — WikiLucas (🖋️) 13:41, 11 February 2021 (UTC)
Want to go larger?
Hello KlaudiuMihaila, your current user rights on Commons limit you to 380 recordings per 72 minutes. Today you did 340 recordings in 40 minutes. We can upgrade you quickly via a request on Commons:Requests for rights. Interested? Yug (talk) 20:42, 17 February 2021 (UTC)
- Sounds great! Thanks! Do I need to request it myself, or would someone here nominate me? KlaudiuMihaila (talk) 09:38, 22 February 2021 (UTC)
- It is best to request it yourself; you can then copy the link to the discussion here so that we can publicly support your request. Pamputt (talk) 09:46, 22 February 2021 (UTC)
- You got the autopatrolled user right. You now have unlimited uploads. :)
- Do you need further help? I noticed you recorded Gălățanu (Gălățanu (Q491460)), but I only see one Romanian list and I can't find Gălățanu in it. May I ask what your working process is? I am trying to understand how users contribute so that we can make things easier for you and for future contributors. Yug (talk) 22:49, 23 February 2021 (UTC)
- I am using Wikipedia categories for now. I just completed the surnames, might do first names, perhaps place names afterwards. KlaudiuMihaila (talk) 23:39, 23 February 2021 (UTC)
- I see, an approach by Wikipedia categories and names. Thanks for the info :)
- Just for your information, I created frequency lists for the Romanian language and community. If you can occasionally share the news on your local Romanian forums, then sooner or later some users may join in and tackle this side. Like the rest of the Wikimedia movement, we lack women contributors and regional pronunciations. There has been a recent push for recordings by children, which opens a joyful and interesting avenue :) Yug (talk) 10:04, 24 February 2021 (UTC)
- I did see the frequency lists, those can be useful. However, there are several types of errors in those lists. For instance, most entries in those lists use the wrong type of diacritics (cedilla instead of comma), which presumably stems from the (older) texts from which the words were extracted. Other entries do not use diacritics at all, and although I mostly understand what the word refers to, it is still written incorrectly - how should that be pronounced, as if it were written correctly, or as it is written in that list? Diacritics can make a big difference in meaning, e.g., "tată" (father), "țață" (old hag), "țâță" (tit, breast). Furthermore, some word compounds are only partially mentioned - e.g., the verb without the short pronoun particle attached to it; this would not be a valid word unless both components are present - e.g., "băga-mi-aș" has three components, the verb, the pronoun, the verb particle, or "citindu-i" has two components, the verb and pronoun. I have also seen words from other languages, and again this probably stems from the texts based on which the lists were compiled. How would one go about fixing those? KlaudiuMihaila (talk) 12:23, 25 February 2021 (UTC)
- Thank you for this important feedback. I suspected some languages would need human review and correction; both the need for it and the methodology are still to be clarified (knowing that other sources are available).
- Hermit Dave list: This Romanian list comes from Hermit Dave's lists. He is an amateur open-source contributor who used OpenSubtitles data. Both the source (OpenSubtitles) and the amateur's limited resources could be sources of noise and misspellings. H. Dave's lists are the frequency lists widely shared on Wiktionaries. See List:Ron/words-by-frequency-00001-to-01000 & List:Ron/words-by-frequency-01001-to-05000
- UNILEX list: There is another source, UNILEX, by the Unicode Consortium and Google. Could it be better? I don't see many diacritics in it either, though. See List:Ron/frequency-00001-to-05000-UNILEX.
- Could you take a look and assess the quality of each source? A rough estimate of the percentage of misspelled items would do. Yug (talk) 13:18, 25 February 2021 (UTC)
- I think I understand. You are talking about canonical spelling. Both Dave's and UNILEX's lists show common/popular spelling. Yug (talk) 19:18, 25 February 2021 (UTC)
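Of the error types raised in this thread, the cedilla-vs-comma one is mechanical and can be fixed automatically, unlike missing diacritics or partial compounds, which need human review. Below is a minimal Python sketch, assuming the list is available as plain Unicode strings; the function names are hypothetical and not part of any Lingua Libre tool. It maps Ş ş Ţ ţ (U+015E, U+015F, U+0162, U+0163) to Ș ș Ț ț (U+0218, U+0219, U+021A, U+021B) and reports the share of entries that used the cedilla forms. That gives a rough, partial answer to the "% of misspelled items" question for this one class of error only.

```python
import unicodedata

# Cedilla letters found in legacy text, and the comma-below letters
# required by standard Romanian spelling (same order in both strings).
CEDILLA = "\u015E\u015F\u0162\u0163"      # Ş ş Ţ ţ
COMMA_BELOW = "\u0218\u0219\u021A\u021B"  # Ș ș Ț ț
CEDILLA_TO_COMMA = str.maketrans(CEDILLA, COMMA_BELOW)


def normalize_romanian(word: str) -> str:
    """Compose the word (NFC), then swap cedilla s/t for comma-below s/t."""
    # NFC turns a decomposed "s" + U+0327 (combining cedilla) into U+015F first,
    # so both precomposed and decomposed cedilla forms get replaced.
    composed = unicodedata.normalize("NFC", word)
    return composed.translate(CEDILLA_TO_COMMA)


def has_cedilla(word: str) -> bool:
    """True if the word contains at least one cedilla s/t."""
    return any(ch in CEDILLA for ch in unicodedata.normalize("NFC", word))


def cedilla_share(words: list[str]) -> float:
    """Fraction of list entries using cedilla s/t (0.0 for an empty list)."""
    if not words:
        return 0.0
    return sum(has_cedilla(w) for w in words) / len(words)


if __name__ == "__main__":
    sample = ["\u0163ar\u0103", "\u015Fi", "tat\u0103", "\u0163\u00E2\u0163\u0103"]
    print([normalize_romanian(w) for w in sample])
    print(f"{cedilla_share(sample):.0%} of entries use cedilla forms")
```

Entries written with no diacritics at all (e.g. "tata" for "tată") cannot be repaired this way, since the correct form depends on the intended meaning, as the "tată" / "țață" / "țâță" example shows.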
Place of "residence"
Hello @KlaudiuMihaila, congratulations on your great work over these past months, and especially in May!
I don't know if you are aware of it, but on Lingua Libre speakers can specify their place of residence/learning (a continent, a country, a region or a city) so that bots can display this place alongside each recording, for example on Wiktionaries. Currently, there is no place associated with your speaker profile, so you could add one to improve the quality of the information displayed on projects that use your recordings.
If you live in a place that has no major influence on the way you speak (e.g. you were born in London and lived there for 20 years, but have been living in Paris for the past 6 months), I suggest that you indicate the place that was more important for learning your language (i.e. London rather than Paris in this example). — WikiLucas (🖋️) 13:08, 24 May 2021 (UTC)
- @WikiLucas00 Thank you very much! KlaudiuMihaila (talk) 18:06, 24 May 2021 (UTC)