How to classify Japanese characters as either kanji or kana?

Question 1

How to classify Japanese characters as either kanji or kana?

java unicode cjk

alex2k8 · Sep 30, 2010 · Viewed 10.7k times · Source

Answer

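Java's Character.UnicodeBlock class reports which Unicode block a character belongs to, and that is usually enough to tell kanji from kana. Some examples of the blocks relevant to Japanese (the constants on the right are members of Character.UnicodeBlock):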

Character.UnicodeBlock.of('誰') == CJK_UNIFIED_IDEOGRAPHS
Character.UnicodeBlock.of('か') == HIRAGANA
Character.UnicodeBlock.of('フ') == KATAKANA
Character.UnicodeBlock.of('ﾌ') == HALFWIDTH_AND_FULLWIDTH_FORMS
Character.UnicodeBlock.of('！') == HALFWIDTH_AND_FULLWIDTH_FORMS
Character.UnicodeBlock.of('。') == CJK_SYMBOLS_AND_PUNCTUATION

But, as always, the devil is in the details:

Character.UnicodeBlock.of('Ａ') == HALFWIDTH_AND_FULLWIDTH_FORMS

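where Ａ is the full-width Latin letter. It lands in the same block as the half-width katakana ﾌ above, so the block alone does not tell you whether a character in HALFWIDTH_AND_FULLWIDTH_FORMS is kana. Note that the full-width Ａ is a different character from the ordinary (half-width) A: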

Character.UnicodeBlock.of('A') == BASIC_LATIN
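
Putting the pieces together, here is a minimal sketch of a classifier built on the block lookups above. The class name KanaKanjiClassifier, the method classify and the labels "kanji", "kana" and "other" are made up for this illustration. It assumes that only CJK_UNIFIED_IDEOGRAPHS counts as kanji (the extension and compatibility blocks are ignored), and that half-width katakana occupy U+FF66 to U+FF9F inside HALFWIDTH_AND_FULLWIDTH_FORMS. Marks and punctuation that live in the HIRAGANA and KATAKANA blocks will also come back as "kana", so adjust if that matters for your text.

```java
import java.lang.Character.UnicodeBlock;

// Hypothetical helper for illustration; not part of the JDK.
final class KanaKanjiClassifier {

    /** Returns "kanji", "kana" or "other" for a single Unicode code point. */
    static String classify(int codePoint) {
        UnicodeBlock block = UnicodeBlock.of(codePoint);

        // Assumption: only the main ideograph block is treated as kanji.
        if (block == UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
            return "kanji";
        }
        if (block == UnicodeBlock.HIRAGANA || block == UnicodeBlock.KATAKANA) {
            return "kana";
        }
        // Half-width katakana share a block with full-width Latin letters,
        // so the block is not enough here; check the code point range too.
        if (block == UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS
                && codePoint >= 0xFF66 && codePoint <= 0xFF9F) {
            return "kana";
        }
        return "other";
    }

    /** Prints one "character - class" line per code point of each argument. */
    public static void main(String[] args) {
        for (String text : args) {
            text.codePoints().forEach(cp -> System.out.println(
                    new String(Character.toChars(cp)) + " - " + classify(cp)));
        }
    }
}
```

Run it with the text to classify as a command-line argument. Iterating over code points rather than char values keeps the loop safe if the input ever contains characters outside the Basic Multilingual Plane.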

Question 2

Given the text below, how can I classify each character as kana or kanji?

誰か確認上記これらのフ

To get something like this:

誰 - kanji
か - kana
確 - kanji
認 - kanji
上 - kanji
記 - kanji
こ - kana
れ - kana
ら - kana
の - kana
フ - kana

(Sorry if I did it incorrectly.)
