I'm trying to put together a regular expression for a JavaScript command that accurately counts the number of words in a textarea.
One solution I had found is as follows:
document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\b\w+\b/).length -1;
But this doesn't count any non-Latin characters (e.g., Cyrillic, Hangul); it skips over them completely.
Another one I put together:
document.querySelector("#wordcount").innerHTML = document.querySelector("#editor").value.split(/\s+/g).length -1;
But this only counts accurately when the document ends in a space character. If I append a space character to the value before counting, it reports 1 word even for an empty document. Furthermore, if the document begins with a space character, an extra word is counted.
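To make those failure cases concrete, here is a small sketch. splitCount is a hypothetical helper name that only wraps the split expression above so it can run outside the browser:

```javascript
// Hypothetical helper: same logic as the whitespace-split one-liner above,
// taking a plain string instead of reading the textarea.
function splitCount(text) {
  return text.split(/\s+/g).length - 1;
}

console.log(splitCount("one two"));    // 1 (no trailing space, one word short)
console.log(splitCount("one two "));   // 2 (correct only because of the trailing space)
console.log(splitCount(" one two "));  // 3 (leading space adds an extra word)
console.log(splitCount(" "));          // 1 (a lone space counts as a word)
```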
Is there a regular expression I can put into this command that counts the words accurately, regardless of input method?
This should do what you're after:
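(value.match(/\S+/g) || []).length;
Rather than splitting the string, you're matching on any sequence of non-whitespace characters. Note that match returns null when nothing matches (an empty or whitespace-only document), so the || [] fallback makes the count 0 in that case instead of throwing a TypeError.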
There's the added bonus of being easily able to extract each word if needed ;)
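As a rough sketch of how this could be wrapped up, assuming the same #editor and #wordcount elements from the question (countWords and extractWords are hypothetical names, not part of any library):

```javascript
// Count words as runs of non-whitespace characters.
// String.prototype.match returns null when there are no matches,
// so that case is mapped to 0 explicitly.
function countWords(text) {
  const matches = text.match(/\S+/g);
  return matches === null ? 0 : matches.length;
}

// Return the words themselves; an empty array when there are none.
function extractWords(text) {
  return text.match(/\S+/g) || [];
}

console.log(countWords(""));             // 0
console.log(countWords(" one two "));    // 2
console.log(countWords("привет мир"));   // 2
console.log(extractWords(" one  two ")); // [ 'one', 'two' ]

// In the page, assuming the element ids from the question:
// document.querySelector("#wordcount").innerHTML =
//   countWords(document.querySelector("#editor").value);
```

Because \S only looks at whitespace, this treats any run of non-space characters as a word, so it works for Cyrillic or Hangul input, but it also counts standalone punctuation (for example a lone "-") as a word and does not segment scripts that are written without spaces.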