It seems to be a very simple and much-needed method. I need to remove all non-ASCII characters from a string, e.g. ©. See the following example.
#coding: utf-8
s = " Hello this a mixed string © that I made."
puts s.encoding
puts s.encode
output:
UTF-8
Hello this a mixed string © that I made.
When I feed this to Watir, it produces the following error: incompatible character encodings: UTF-8 and ASCII-8BIT
So my problem is that I want to get rid of all non-ASCII characters before using the string. I will not know which encoding the source string "s" uses.
I have been searching and experimenting for quite some time now.
If I try to use
puts s.encode('ASCII-8BIT')
It gives the error:
: "\xC2\xA9" from UTF-8 to ASCII-8BIT (Encoding::UndefinedConversionError)
You can just literally translate what you asked into a Regexp. You wrote:
I want to get rid of all non ASCII characters
We can rephrase that a little bit:
I want to substitute all characters which don't have the ASCII property with nothing
And that's a statement that can be directly expressed in a Regexp:
s.gsub!(/\P{ASCII}/, '')
As an alternative, you could also use String#delete!:
s.delete!("^\u{0000}-\u{007F}")
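One thing to watch with the bang versions: both gsub! and delete! return nil when nothing was changed, so it is safer not to chain on their return value. Here is a minimal sketch of the same two approaches using the non-destructive gsub and delete instead. The helper names strip_non_ascii_regexp and strip_non_ascii_delete are made up for illustration, and the sketch assumes the input is a validly encoded UTF-8 string like the one in the question.

```ruby
# coding: utf-8

# Hypothetical helper: returns a new string without non-ASCII characters,
# using the \P{ASCII} property ("everything that is NOT ASCII").
def strip_non_ascii_regexp(str)
  str.gsub(/\P{ASCII}/, '')
end

# Hypothetical helper: same idea with String#delete and a negated range,
# i.e. delete everything outside U+0000..U+007F.
def strip_non_ascii_delete(str)
  str.delete("^\u{0000}-\u{007F}")
end

s = " Hello this a mixed string © that I made."

clean = strip_non_ascii_regexp(s)
puts clean              # the © is gone; the spaces on either side of it remain
puts clean.ascii_only?  # true

# The bang variants return nil when there was nothing to remove:
already_ascii = "plain text"
p already_ascii.gsub!(/\P{ASCII}/, '')  # nil
```

Since the original string is left untouched, you can keep s around and pass only the cleaned copy to Watir.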