How to match accented characters with a regex?

user502052 · Sep 3, 2011

I am running Ruby on Rails 3.0.10 and Ruby 1.9.2. I am using the following regex to validate names:

NAME_REGEX = /^[\w\s'"\-_&@!?()\[\]-]*$/u

validates :name,
  :presence   => true,
  :format     => {
    :with     => NAME_REGEX,
    :message  => "format is invalid"
  }

However, if I try to save words like the following:

Oilalà
Pì
Rùby
...

# In short, words with accented characters

I get the validation error "Name format is invalid".

How can I change the above regex so that it also matches accented characters like à, è, é, ì, ò, ù, ...?

Answer

Lars Haugseth · Sep 3, 2011

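Instead of \w, use the POSIX bracket expression [:alpha:] inside a character class. In Ruby 1.9, \w matches only ASCII letters, digits and the underscore, while [[:alpha:]] matches any Unicode letter: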

"blåbær dèjá vu".scan /[[:alpha:]]+/  # => ["blåbær", "dèjá", "vu"]

"blåbær dèjá vu".scan /\w+/  # => ["bl", "b", "r", "d", "j", "vu"]

In your particular case, change the regex to this:

NAME_REGEX = /^[[:alpha:]\s'"\-_&@!?()\[\]-]*$/u
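As a quick check, the sketch below runs the names from the question through this regex. It also shows a side effect worth knowing: unlike \w, [[:alpha:]] does not match digits, so a name that contains a digit is now rejected. NAME_REGEX_WITH_DIGITS is a hypothetical variant, not part of the original answer, that swaps in [[:alnum:]] in case digits should still be accepted.

```ruby
# encoding: utf-8

# Regex from the answer: [[:alpha:]] matches Unicode letters, not just a-z.
NAME_REGEX = /^[[:alpha:]\s'"\-_&@!?()\[\]-]*$/u

# Hypothetical variant (an assumption, not from the answer): [[:alnum:]]
# also accepts digits, which the original \w allowed.
NAME_REGEX_WITH_DIGITS = /^[[:alnum:]\s'"\-_&@!?()\[\]-]*$/u

# The accented names from the question.
accented = ["Oilalà", "Pì", "Rùby"]
accented.each do |name|
  puts "#{name}: #{name =~ NAME_REGEX ? 'valid' : 'invalid'}"
end

# A name with digits fails the [[:alpha:]] version but passes the
# [[:alnum:]] one.
digit_name = "Studio 54"
puts "#{digit_name} (alpha): #{digit_name =~ NAME_REGEX ? 'valid' : 'invalid'}"
puts "#{digit_name} (alnum): #{digit_name =~ NAME_REGEX_WITH_DIGITS ? 'valid' : 'invalid'}"
```

Which of the two fits depends on whether your application expects digits in names. The underscore is unaffected either way, because it is listed explicitly in the character class.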

This matches much more than just accented characters, though: [[:alpha:]] accepts letters from any script, which is a good thing. Make sure you read this blog entry about common misconceptions regarding names in software applications.
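To illustrate how much wider the match is, here is a minimal sketch that scans a string containing Cyrillic, Greek and Japanese names. The sample names are made up for this illustration, and LETTER_RUNS is a hypothetical constant name.

```ruby
# encoding: utf-8

# Made-up sample names in three non-Latin scripts, separated by spaces.
samples = "Анна Νικος 山田"

# [[:alpha:]]+ picks up each run of letters, whatever the script.
LETTER_RUNS = samples.scan(/[[:alpha:]]+/)

puts LETTER_RUNS.inspect  # => ["Анна", "Νικος", "山田"]
```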