How to remove all non-ASCII characters from a string in Ruby

Nick · Jul 8, 2010 · Viewed 11.3k times

It seems to be a very simple and much-needed method. I need to remove all non-ASCII characters from a string, e.g. ©. See the following example.

#coding: utf-8
s = " Hello this a mixed string © that I made."
puts s.encoding
puts s.encode

output:

UTF-8
Hello this a mixed string © that I made.

When I feed this to Watir, it produces the following error: incompatible character encodings: UTF-8 and ASCII-8BIT

So my problem is that I want to get rid of all non ASCII characters before using it. I will not know which encoding the source string "s" uses.

I have been searching and experimenting for quite some time now.

If I try to use

  puts s.encode('ASCII-8BIT')

It gives the error:

 : "\xC2\xA9" from UTF-8 to ASCII-8BIT (Encoding::UndefinedConversionError)

Answer

Jörg W Mittag · Jul 8, 2010

You can just literally translate what you asked into a Regexp. You wrote:

I want to get rid of all non ASCII characters

We can rephrase that a little bit:

I want to substitute all characters which don't have the ASCII property with nothing

And that's a statement that can be directly expressed in a Regexp:

s.gsub!(/\P{ASCII}/, '')
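A minimal sketch of how that behaves on the string from the question (the names `cleaned`, `already_clean` and `result` are only for illustration). One thing worth knowing: `gsub!` modifies the receiver and returns `nil` when nothing was replaced, so if you want to use the return value, the non-bang `gsub` is the safer choice.

```ruby
# coding: utf-8

# The sample string from the question; © is its only non-ASCII character.
s = " Hello this a mixed string © that I made."

# \P{ASCII} matches any character that lacks the ASCII property.
# gsub (no bang) leaves s untouched and returns a cleaned copy.
cleaned = s.gsub(/\P{ASCII}/, '')

puts cleaned              # the © is gone, the spaces around it remain
puts cleaned.ascii_only?  # true

# gsub! mutates in place and returns nil if there was nothing to replace.
already_clean = "plain ASCII".dup
result = already_clean.gsub!(/\P{ASCII}/, '')
puts result.inspect       # nil
```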

As an alternative, you could also use String#delete!:

s.delete!("^\u{0000}-\u{007F}")
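Here is a small sketch of the `delete` variant, plus one possible way to deal with the "I will not know which encoding" part of the question. The helper `strip_non_ascii` is hypothetical (it is not part of the answer above): it works on the raw bytes, so it assumes the input is in an ASCII-compatible encoding such as UTF-8 or ISO-8859-1, and it relies on `String#b`, which needs Ruby 2.0 or later.

```ruby
# coding: utf-8

s = " Hello this a mixed string © that I made."

# The leading ^ negates the set, so every character outside
# U+0000..U+007F is deleted. delete (no bang) returns a new string.
puts s.delete("^\u{0000}-\u{007F}")

# Hypothetical helper: strip on the byte level so the declared
# encoding of the input does not matter.
# Assumption: the input uses an ASCII-compatible encoding.
def strip_non_ascii(str)
  bytes = str.b                            # binary (ASCII-8BIT) copy
  ascii = bytes.delete("^\x00-\x7F")       # drop every byte above 0x7F
  ascii.force_encoding(Encoding::US_ASCII) # what is left is plain ASCII
end

puts strip_non_ascii(s)
puts strip_non_ascii("caf\xC3\xA9".b)  # binary-tagged input works too
```

Because the helper never asks Ruby to compare or convert encodings, it should not raise the "incompatible character encodings" or `UndefinedConversionError` errors from the question. It would not be suitable for UTF-16 or UTF-32 input, where ASCII characters are not single bytes.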