Working setup for hunspell in Emacs

monotux · Oct 18, 2010

Does anyone have a working setup for hunspell and Emacs? Simply setting ispell-program-name to hunspell doesn't work; the output (when using flyspell, for example) looks like this:

-> UTF-8 encoding error. Missing continuation byte in 0. character position: - 9631: word not found

(my files are usually encoded in UTF-8)

I've seen a few different setups, but they've all failed in one way or another. When the encoding works as it should, the setup usually has problems finding the right dictionary.

Does anyone have a working solution? It would be nice to be able to switch between two dictionaries (Swedish as the default and English as the secondary), but having anything running at all would be even better.

Answer

Brandon Rhodes · Nov 6, 2010

If you are getting that UTF-8 encoding error, it means that the hunspell process is being run with an argument specifying some other encoding. When I check my process list, for example, I see this child process of Emacs once it has started up:

/usr/bin/hunspell -a  -B -i iso-8859-1
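
The same command line can probably be read from inside Emacs instead of the system process list. This is only a sketch: it assumes that ispell.el keeps its subprocess in the variable ispell-process (which is how the versions I am aware of store it, but the answer does not say so) and that a spell check has already been started in the session.

```elisp
;; Sketch: print the argument list of the running spell-checker subprocess.
;; Assumption: `ispell-process' holds the live process object once a check
;; (for example M-x ispell-buffer or flyspell-mode) has been started.
(require 'ispell)
(if (and (boundp 'ispell-process) (processp ispell-process))
    ;; `process-command' returns the program and its arguments as a list.
    (message "%S" (process-command ispell-process))
  (message "No spell-checker process yet; run M-x ispell-buffer first"))
```

If the list contains "-i" followed by something other than utf-8, that is the mismatch described above.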

The ispell-get-coding-system function is what decides which encoding to use, which it does by examining the big ispell-dictionary-alist variable that seems to list every language known to Emacs. The function normally grabs the last symbol off of the entry that matches the language you want to check. For some reason that I did not bother to figure out, this list has iso-8859-1 for English — instead of, you know, paying attention to the encoding in your actual buffer. I know, it seems to make no sense. But we carry on.
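
To check which encoding Emacs has picked, the relevant entry can be inspected directly. This is a sketch to evaluate in *scratch*, assuming the variable ispell-current-dictionary names the active dictionary (nil meaning the default entry) and that the encoding is the eighth element of the entry; depending on the Emacs version, the list may not be filled in until a spell check has run at least once.

```elisp
;; Sketch: look at the dictionary entry and the encoding Emacs derives from it.
(require 'ispell)

;; The full entry for the active dictionary (nil selects the default entry).
(assoc ispell-current-dictionary ispell-dictionary-alist)

;; The eighth element (index 7) is the character set, e.g. iso-8859-1.
(nth 7 (assoc ispell-current-dictionary ispell-dictionary-alist))

;; What the function discussed above actually returns.
(ispell-get-coding-system)
```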

You would think that you could override this by setting your own value for the variable ispell-dictionary-alist, with utf-8 as the last of the eight elements in the entry:

;; I could never get Emacs to pay attention to this
(setq ispell-dictionary-alist
  '((nil "[A-Za-z]" "[^A-Za-z]" "[']" t ("-d" "en_US") nil utf-8)))

But I could never get this setting to actually work, whether or not I did a (load-library "ispell") first in my .emacs, and whether or not I wrapped it in one of these:

;; Did not work for me either.
(eval-after-load "ispell" '(progn ...))

Either way, if I started up a fresh Emacs and entered *scratch* and typed ispell-dictionary-alist and pressed Control-J, then the huge original list that ispell creates would come up. Every time.
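
One approach that may avoid this, which I have not verified against the Emacs version used here, is to leave ispell-dictionary-alist alone and put the entries in ispell-local-dictionary-alist, a separate user option in ispell.el that is documented as overriding the built-in list. The sketch below also covers the Swedish/English switching asked about in the question. The dictionary names sv_SE and en_US, the [[:alpha:]] character classes, and the command name my-toggle-dictionary are assumptions for illustration; the names must match the hunspell dictionaries actually installed.

```elisp
;; Sketch under stated assumptions; not the method used in this answer.
(setq ispell-program-name "hunspell")

;; Each entry: NAME CASECHARS NOT-CASECHARS OTHERCHARS MANY-OTHERCHARS-P
;;             ISPELL-ARGS EXTENDED-CHARACTER-MODE CHARACTER-SET
;; "sv_SE" and "en_US" are assumed to be installed hunspell dictionaries.
(setq ispell-local-dictionary-alist
      '(("svenska" "[[:alpha:]]" "[^[:alpha:]]" "[']" nil
         ("-d" "sv_SE") nil utf-8)
        ("english" "[[:alpha:]]" "[^[:alpha:]]" "[']" t
         ("-d" "en_US") nil utf-8)))

;; Swedish as the default dictionary.
(setq ispell-dictionary "svenska")

;; Hypothetical helper: flip the current buffer between the two entries.
(defun my-toggle-dictionary ()
  "Switch the current buffer between the Swedish and English dictionaries."
  (interactive)
  (require 'ispell)
  (let ((current (or ispell-local-dictionary ispell-dictionary)))
    (ispell-change-dictionary
     (if (equal current "svenska") "english" "svenska"))))
```

With this in place, M-x my-toggle-dictionary switches the buffer it is called from, and M-x ispell-change-dictionary still works for picking an entry by name.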

So I decided to do an end-run around the entire problem of this huge list and simply rewrite the ispell-get-coding-system function to always return utf-8. Sure, this will bite me the next time that I open a file that is really in iso-8859-1, but I never do that anyway, right?

To implement this successfully in my .emacs file (well, ~/.emacs.d/init.el but that takes so much typing for a Stack Overflow answer) required this code:

;; It works!  It works!  After two hours of slogging, it works!
(if (file-exists-p "/usr/bin/hunspell")
    (progn
      (setq ispell-program-name "hunspell")
      (eval-after-load "ispell"
        '(progn (defun ispell-get-coding-system () 'utf-8)))))
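
A slightly more portable variant of the same workaround, as a sketch: it uses executable-find to search the PATH rather than testing the hard-coded /usr/bin/hunspell location, which is an assumption about what is wanted on systems that install hunspell elsewhere. The redefinition itself is unchanged, so it still forces utf-8 for every buffer.

```elisp
;; Sketch: same override as above, but locate hunspell through the PATH.
(when (executable-find "hunspell")
  (setq ispell-program-name "hunspell")
  ;; Redefine the function only after ispell.el has loaded, so that the
  ;; library's own definition does not replace this one.
  (eval-after-load "ispell"
    '(defun ispell-get-coding-system () 'utf-8)))
```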

I now have hunspell up and working like a champ! Unfortunately, the whole reason I went through getting it working was the hope that its dictionary would be vastly larger than aspell's, but I see that it highlights some of the same words. Oh well, I'll try another approach. I basically want a spell checker that can be loaded up with the /usr/share/dict/american-english-huge dictionary that is available on Ubuntu, but aspell died in many ways when I tried to expand its horizons. Maybe I will be luckier with hunspell; we will see.