I'm writing a program that works with documents in Perl and a lot of the documents have characters such as ä, ö, ü, é, etc
(both capital and lowercase). I'd like to replace them with ASCII counterparts a, o, u, e, etc
. How would I do it in Perl?
One of the solutions I thought of is to have a hash with keys being the umlaut and accent characters, and the values being ASCII counterparts, but that requires me to have a list of all umlaut and accent characters, which I don't have, and if I built a list, I'd certainly miss many as I'm unfamiliar with all the possible characters that could have umlauts, accents and other diacritics.
As usual, if you think of a problem which most certainly is not yours only, there's already a solution on CPAN. ) In this case it's called Text::Unidecode
use warnings;
use strict;
use utf8;
use Text::Unidecode;
print unidecode('ä, ö, ü, é'); # will print 'a, o, u, e'