Help: Why wordlists matter?
Latest revision as of 20:02, 15 September 2022
Some context
Priorities: With limited recording capacity, it is better to use frequency lists and record the most frequent words first. With unlimited recording capacity, the order matters little, since we assume that all the target words will eventually be recorded. Frequency lists are highly correlated across languages[1].
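As a minimal sketch of the prioritisation described above, the following Python snippet builds a frequency list from a plain-text corpus and keeps only as many words as can be recorded. The function names `build_frequency_list` and `words_to_record`, the naive tokenizer and the sample sentence are illustrative assumptions, not part of any existing tool.

```python
import re
from collections import Counter


def build_frequency_list(text: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs sorted from most to least frequent."""
    # Assumption: lowercase the text and treat runs of letters (optionally
    # joined by apostrophes) as words. This will not suit every language.
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties are deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def words_to_record(text: str, capacity: int) -> list[str]:
    """Pick the `capacity` most frequent words, in recording order."""
    return [word for word, _ in build_frequency_list(text)[:capacity]]


sample = "the cat sat on the mat and the dog sat too"
print(words_to_record(sample, 3))  # ['the', 'sat', 'and']
```

With a real corpus, `capacity` would be the number of words the speaker can realistically record in a session.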
Corpus purpose: For language learning, written transcripts of spoken language, such as film subtitles, are known to be better source material (see SUBTLEX studies, 2007). Other corpora can also serve as a good basis for audio recordings. For lexicographic purposes such as Wiktionary, rare words are as interesting as frequent words, and the aim is to provide audio for every item.
Consistency: It is best to provide consistent audio data, with the same neutral or enthusiastic tone and the same speaker.
Lexicon range for learners: For language learners, and assuming the most frequent words are learnt first, a minimum vocabulary of 2,000 to 2,500 base words is required to bring the learner to an autonomous level. Language-teaching academics call this the “threshold level”. The CEFR (Common European Framework of Reference for Languages: Learning, Teaching, Assessment)[2], the Chinese HSK levels and their pairing with CEFR levels, and some academic research[3] lead to the following relation between lexicon size, CEFR level and competence:
Lexicon size
Lexicon(*) | Level | CEFR descriptors |
---|---|---|
600 | A1 | “Basic user. Breakthrough or beginner”. Survival communication, expressing basic needs. |
1,200 | A2 | “Basic user. Waystage or elementary”. |
2,500 | B1 | “Independent user. Threshold or intermediate”. |
5,000 | B2 | “Independent user. Vantage or upper intermediate”. |
20,000+ | C2 | “Mastery or proficiency”. Native speaker after graduating from high school. |
(*) Assuming the most frequent word families are learnt first.
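The relation in the table above can be read as a simple threshold lookup. The sketch below is only an illustration: the name `estimate_level` is hypothetical, the figures are treated as lower bounds (an assumption), and no value is given for C1 because the table does not list one.

```python
from bisect import bisect_right
from typing import Optional

# Lower bounds in word families, copied from the table above.
# C1 is absent from the table, so it is deliberately left out here.
THRESHOLDS = [600, 1_200, 2_500, 5_000, 20_000]
LEVELS = ["A1", "A2", "B1", "B2", "C2"]


def estimate_level(known_word_families: int) -> Optional[str]:
    """Return the highest listed level whose threshold has been reached."""
    # bisect_right counts how many thresholds are <= the given size.
    index = bisect_right(THRESHOLDS, known_word_families)
    return LEVELS[index - 1] if index > 0 else None


print(estimate_level(2_500))  # B1
print(estimate_level(300))    # None (below the A1 figure)
```

This assumes, as the footnote does, that the most frequent word families are learnt first; a learner with the same count of rare words would not fit the table.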
See also CEFR 5.2.1.1 (image); the most relevant sections are quoted below:
Vocabulary range
Level | VOCABULARY RANGE |
---|---|
C2 | Has a good command of a very broad lexical repertoire including idiomatic expressions and colloquialisms; shows awareness of connotative levels of meaning. |
C1 | Has a good command of a broad lexical repertoire allowing gaps to be readily overcome with circumlocutions; little obvious searching for expressions or avoidance strategies. Good command of idiomatic expressions and colloquialisms. |
B2 | Has a good range of vocabulary for matters connected to his/her field and most general topics. Can vary formulation to avoid frequent repetition, but lexical gaps can still cause hesitation and circumlocution. |
B1 | Has a sufficient vocabulary to express him/herself with some circumlocutions on most topics pertinent to his/her everyday life such as family, hobbies and interests, work, travel, and current events. Has sufficient vocabulary to conduct routine, everyday transactions involving familiar situations and topics. |
A2 | Has a sufficient vocabulary for the expression of basic communicative needs. Has a sufficient vocabulary for coping with simple survival needs. |
A1 | Has a basic vocabulary repertoire of isolated words and phrases related to particular concrete situations. |
Vocabulary control
Level | VOCABULARY CONTROL |
---|---|
C2 | Consistently correct and appropriate use of vocabulary. |
C1 | Occasional minor slips, but no significant vocabulary errors. |
B2 | Lexical accuracy is generally high, though some confusion and incorrect word choice does occur without hindering communication. |
B1 | Shows good control of elementary vocabulary but major errors still occur when expressing more complex thoughts or handling unfamiliar topics and situations. |
A2 | Can control a narrow repertoire dealing with concrete everyday needs. |
A1 | No descriptor available |
Users of the Framework may wish to consider and where appropriate state:
• which lexical elements (fixed expressions and single word forms) the learner will need/be equipped/be required to recognise and/or use;
• how they are selected and ordered.
References
[1] Paul Nation and David Crabbe (1991), "A Survival Language Learning Syllabus for Foreign Travel", Victoria University of Wellington, New Zealand. Published in System, Vol. 19, No. 3, 1991, pp. 191-201.
[2] "Common European Framework of Reference for Languages: Learning, Teaching, Assessment" (2001), https://rm.coe.int/1680459f97 (pdf).
[3] Marc Brysbaert, Michaël Stevens, Paweł Mandera and Emmanuel Keuleers (2016), How Many Words Do We Know? Practical Estimates of Vocabulary Size Dependent on Word Definition, the Degree of Language Input and the Participant’s Age. https://www.frontiersin.org/articles/10.3389/fpsyg.2016.01116/full