How to classify Japanese characters as either kanji or kana?

alex2k8 · Sep 30, 2010

Given the text below, how can I classify each character as kana or kanji?

誰か確認上記これらのフ

To get something like this:

誰 - kanji
か - kana
確 - kanji
認 - kanji 
上 - kanji 
記 - kanji 
こ - kana 
れ - kana
ら - kana
の - kana
フ - kana

(Sorry if I did it incorrectly.)

Answer

Josh Lee · Sep 30, 2010

This functionality is built into the Character.UnicodeBlock class. Some examples of the Unicode blocks related to the Japanese language:

Character.UnicodeBlock.of('誰') == CJK_UNIFIED_IDEOGRAPHS
Character.UnicodeBlock.of('か') == HIRAGANA
Character.UnicodeBlock.of('フ') == KATAKANA
Character.UnicodeBlock.of('ﾌ') == HALFWIDTH_AND_FULLWIDTH_FORMS
Character.UnicodeBlock.of('！') == HALFWIDTH_AND_FULLWIDTH_FORMS
Character.UnicodeBlock.of('。') == CJK_SYMBOLS_AND_PUNCTUATION

But, as always, the devil is in the details:

Character.UnicodeBlock.of('Ａ') == HALFWIDTH_AND_FULLWIDTH_FORMS

where Ａ is the full-width character. So it falls in the same block as the halfwidth katakana above. Note that the full-width Ａ is different from the normal (half-width) A:

Character.UnicodeBlock.of('A') == BASIC_LATIN
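
Putting the pieces together, a minimal sketch of a classifier for the question's sample text might look like the following. The class name KanaKanjiClassifier, the method classify, and the "other" label are hypothetical names chosen for illustration. The sketch assumes that halfwidth katakana occupy U+FF66 through U+FF9F inside HALFWIDTH_AND_FULLWIDTH_FORMS, and it only treats the main CJK_UNIFIED_IDEOGRAPHS block as kanji, so rarer ideographs in the extension or compatibility blocks would come out as "other".

```java
import java.lang.Character.UnicodeBlock;

public class KanaKanjiClassifier {

    // Hypothetical helper (not part of the JDK): label a single code point.
    public static String classify(int codePoint) {
        UnicodeBlock block = UnicodeBlock.of(codePoint);

        if (block == UnicodeBlock.HIRAGANA || block == UnicodeBlock.KATAKANA) {
            return "kana";
        }
        // Halfwidth katakana share a block with full-width Latin letters and
        // punctuation, so the block alone is not enough. Assumption: the
        // halfwidth katakana range is U+FF66..U+FF9F.
        if (block == UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS
                && codePoint >= 0xFF66 && codePoint <= 0xFF9F) {
            return "kana";
        }
        // Assumption: only the main ideograph block counts as kanji here.
        if (block == UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
            return "kanji";
        }
        return "other";
    }

    public static void main(String[] args) {
        String text = "誰か確認上記これらのフ";
        // Iterate by code point so supplementary characters are not split.
        text.codePoints().forEach(cp ->
            System.out.println(new String(Character.toChars(cp)) + " - " + classify(cp)));
    }
}
```

With this sketch, the full-width Ａ and the ordinary A both come out as "other", while the halfwidth ﾌ is still reported as kana.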