Psubhashish/tools/Prepare words for Lingua Libre

Latest revision as of 09:45, 5 May 2023

The HTML code below is a converter that prepares word lists for Lingua Libre. It is designed for languages written in a script other than Latin. It has five major functions:

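  1. Remove punctuation and put each word on its own line
  2. Remove words typed in Latin script (more precisely, every character outside the target script's Unicode block)
  3. Remove duplicates
  4. Sort lexicographically (by Unicode code point)
  5. Replace line breaks with # signs, joining the words into one #-separated line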
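The five steps above can also be written as pure string-to-string functions, which makes their behaviour easy to check outside the browser, for example in Node.js. This is a sketch under stated assumptions rather than the tool's own code: the function names are hypothetical, the punctuation set is a simplified version of the one in removePunctuationAndWords(), and step 2 hard-codes the Odia block (U+0B00 to U+0B7F) that the tool uses.

```javascript
// Hypothetical pure-function sketch of the five buttons (not the page's code).
// Each helper takes and returns a plain string, like the two textareas.

// Step 1: punctuation becomes a space, then one word per line.
// Simplified punctuation set (assumption); hyphens and apostrophes are kept.
function splitWords(text) {
  return text
    .replace(/[।॥|“”"?()[\]{}<>.,\/#!$%^&*;:=_`~]+/g, " ")
    .trim()
    .split(/\s+/)
    .join("\n");
}

// Step 2: drop every character outside the Odia block (U+0B00 to U+0B7F),
// then collapse the blank lines that removed words leave behind.
function keepOdia(text) {
  return text
    .replace(/[^\u0B00-\u0B7F\s]+/g, "")
    .trim()
    .split(/\s+/)
    .join("\n");
}

// Step 3: a Set keeps the first occurrence of each word, in order.
function removeDuplicateLines(text) {
  return [...new Set(text.trim().split(/\s+/))].join("\n");
}

// Step 4: sort() without a comparator compares UTF-16 code units.
function sortLines(text) {
  return text.trim().split(/\s+/).sort().join("\n");
}

// Step 5: join all lines into one #-separated line.
function joinWithHash(text) {
  return text.trim().split(/\n+/).join("#");
}

// Run the steps in the same order as the buttons.
function prepareWords(text) {
  return joinWithHash(sortLines(removeDuplicateLines(keepOdia(splitWords(text)))));
}

console.log(prepareWords("ଗ, କ (ଖ) କ test.")); // କ#ଖ#ଗ
```

In the usage line, the Latin word "test" is dropped by step 2, the repeated କ by step 3, and the remaining letters are ordered by code point (କ U+0B15, ଖ U+0B16, ଗ U+0B17).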

How to use this

Copy the HTML code in the box below into any text editor and save the file with a .html extension (many editors save as .txt by default). Then open the .html file in any web browser.

Paste your source text into the "Input Text" box. Then press "1. Remove punctuation & words to new line", followed by the other buttons in numerical order: step 1 reads the "Input Text" box, and each later step reworks the contents of the "Output Text" box. After the fifth step, click "Copy text" below the "Output Text" box. Finally, open Lingua Libre's RecordWizard and paste the copied text to generate a word list.

<!DOCTYPE html>
<html>
<head>
    <title>Prepare words for Lingua Libre</title>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>
<body>
    <h1>Prepare words for Lingua Libre</h1>
    <label for="input-text">Input Text:</label><br>
    <textarea id="input-text" rows="10" cols="70"></textarea><br>
    <button onclick="removePunctuationAndWords()">1. Remove punctuation &amp; words to new line</button><br/><br/>
    <button onclick="removeLatinWords()">2. Remove Latin character words</button><br/><br/>
    <button onclick="removeDuplicates()">3. Remove duplicates</button><br/><br/>
    <button onclick="sortAlphabetically()">4. Sort lexicographically</button><br/><br/>
    <button onclick="replaceLinesWithHash()">5. Replace lines with #</button><br/><br/>
    <label for="output-text">Output Text:</label><br>
    <textarea id="output-text" rows="10" cols="70"></textarea><br>
    <button onclick="copyText()">Copy text</button><br><br/>

    <script>
        function removePunctuationAndWords() {
            var inputText = document.getElementById("input-text").value;

            // Replace punctuation with a space (hyphens and apostrophes are kept), then strip hyphens at word edges
            var textWithSpaces = inputText.replace(/[।॥|“”‘"?\u2013\u2014\(\)\[\]\{\}<>.,\/#!$%\^&\*;:=_`~]+/g, " ").replace(/(^|\s)-+|-+(?=\s|$)/g, "$1");

            // Convert spaces to new line breaks
            var textWithLineBreaks = textWithSpaces.trim().replace(/\s+/g, "\n");

            document.getElementById("output-text").value = textWithLineBreaks;
        }

        function removeLatinWords() {
            var inputText = document.getElementById("output-text").value;
            var outputText = inputText.replace(/[^\u0B00-\u0B7F\s]+/g, "");
            // Collapse the blank lines left behind by removed words
            document.getElementById("output-text").value = outputText.trim().replace(/\s+/g, "\n");
        }

        function removeDuplicates() {
            var inputText = document.getElementById("output-text").value;
            var words = inputText.trim().split(/\s+/);
            // A Set keeps the first occurrence of each word, in the original order
            var uniqueWords = Array.from(new Set(words));
            document.getElementById("output-text").value = uniqueWords.join("\n");
        }

        function sortAlphabetically() {
            var inputText = document.getElementById("output-text").value;
            var words = inputText.trim().split(/\s+/);
            // sort() without a comparator orders by UTF-16 code units (Unicode code point order)
            words.sort();
            var outputText = words.join("\n");
            document.getElementById("output-text").value = outputText.trim();
        }

        function replaceLinesWithHash() {
            var inputText = document.getElementById("output-text").value;
            // Join the lines into one #-separated line, ignoring blank lines
            var outputText = inputText.trim().replace(/\s*\n\s*/g, "#");
            document.getElementById("output-text").value = outputText;
        }

        function copyText() {
            var outputText = document.getElementById("output-text");
            outputText.select();
            document.execCommand("copy");
            alert("Text copied to clipboard!");
        }
    </script>
</body>
</html>
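Step 4 uses words.sort() with no comparator. In JavaScript this compares strings by UTF-16 code units, which for a script in the Basic Multilingual Plane such as Odia means Unicode code point order, not a language-specific dictionary order. The sketch below illustrates this; sortByCodeUnits and sortByLocale are hypothetical names, and the Intl.Collator variant is only a possible alternative whose order depends on the collation data available in the browser or runtime.

```javascript
// Step 4 of the tool calls words.sort() with no comparator: strings are
// compared by UTF-16 code units, i.e. code point order for BMP scripts.
function sortByCodeUnits(words) {
  return [...words].sort(); // copy first so the input array is not mutated
}

// Hypothetical alternative (not used by the tool): locale-aware ordering.
// The result depends on the collation data the JavaScript engine ships.
function sortByLocale(words, locale) {
  const collator = new Intl.Collator(locale);
  return [...words].sort(collator.compare);
}

console.log(sortByCodeUnits(["b", "B", "a"])); // → ["B", "a", "b"]
console.log(sortByCodeUnits(["ଗ", "କ", "ଖ"])); // → ["କ", "ଖ", "ଗ"]
console.log(sortByLocale(["ଗ", "କ", "ଖ"], "or"));
```

The first line shows why the page calls this step "lexicographic": uppercase Latin letters (U+0041 to U+005A) sort before all lowercase ones, regardless of dictionary conventions.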

Note

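  • This tool is set up for the Odia language, written in the Odia script (Unicode block U+0B00 to U+0B7F). For another non-Latin script, find the Unicode block for your script (see https://en.wikipedia.org/wiki/Category:Unicode_blocks) and edit line 36 of the file, in the removeLatinWords() function: var outputText = inputText.replace(/[^\u0B00-\u0B7F\s]+/g, ""); Replace \u0B00 and \u0B7F with the first and last code points of your script's block (for example \u0900 and \u097F for Devanagari). Note that step 2 removes every character outside that block, not only Latin letters, so Latin digits and words in other scripts are dropped as well.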
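Because the pattern on line 36 is just the first and last code point of a block, it can also be generated rather than edited by hand. The sketch below assumes a hypothetical helper, makeScriptPattern, that only handles blocks inside the Basic Multilingual Plane (code points up to U+FFFF), since that is all the four-digit \uXXXX escapes on line 36 can express; Devanagari (U+0900 to U+097F) is used purely as an example.

```javascript
// Hypothetical helper: builds the step-2 pattern for any Unicode block in the
// Basic Multilingual Plane, so the range does not have to be edited by hand.
function makeScriptPattern(start, end) {
  if (start > end || end > 0xffff) {
    throw new RangeError("expected a BMP block with start <= end");
  }
  // Format a code point as the four hex digits used in a \uXXXX escape
  const hex = (cp) => cp.toString(16).toUpperCase().padStart(4, "0");
  return new RegExp("[^\\u" + hex(start) + "-\\u" + hex(end) + "\\s]+", "g");
}

const odia = makeScriptPattern(0x0b00, 0x0b7f);
console.log(odia.source); // [^\u0B00-\u0B7F\s]+  (the pattern on line 36)

const devanagari = makeScriptPattern(0x0900, 0x097f);
const words = "नमस्ते hello दुनिया".replace(devanagari, "").trim().split(/\s+/);
console.log(words); // → ["नमस्ते", "दुनिया"]
```

Printing source gives text that can be pasted between the slashes on line 36; the Devanagari call shows the Latin word being removed while the two Devanagari words survive.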