I'm reading data from a remote source, and occasionally get some characters in another encoding. They're not important.
I'd like to get a "best guess" UTF-8 string, and ignore the invalid data.
Main goal is to get a string I can use, and not run into errors such as:
I thought this was it:
string.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "?")
will replace all unknowns with '?'.
To ignore all unknowns, use :replace => '':
string.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => "")
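For reference, a minimal sketch of the two calls applied to a sample string containing one invalid byte. The sample string and the helper name to_utf8 are made up for illustration. It also assumes a Ruby version (2.1 or later, as far as I know) where encoding UTF-8 to UTF-8 with :invalid => :replace actually scrubs the string; on older versions the same call may be a no-op, which could explain inconsistent results.

```ruby
# Hypothetical helper wrapping the encode call shown above.
def to_utf8(string, replacement)
  string.encode("UTF-8", :invalid => :replace, :undef => :replace, :replace => replacement)
end

# Made-up sample input: the byte \xFF can never appear in valid UTF-8.
sample = "abc\xFFdef"

marked  = to_utf8(sample, "?")  # invalid byte is replaced with "?"
dropped = to_utf8(sample, "")   # invalid byte is removed

puts marked
puts dropped
```

If both results report valid_encoding? as true, the replacement took effect on that Ruby version.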
Edit:
I'm not sure this is reliable. I've gone into paranoid-mode, and have been using:
string.encode("UTF-8", ...).force_encoding('UTF-8')
The script seems to be running OK now. But I'm pretty sure I'd gotten errors with this earlier.
Edit 2:
Even with this, I continue to get intermittent errors. Not every time, mind you. Just sometimes.
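One way to narrow this down, sketched under assumptions: force_encoding only relabels the bytes and never changes them, so a string can still be invalid afterwards. Checking valid_encoding? before and after the cleanup shows whether it took effect. String#scrub (available since Ruby 2.1) is used here as a swapped-in alternative to encode, not the approach from the snippets above, and the sample bytes are made up.

```ruby
# Made-up sample: bytes tagged as binary, as they might arrive from a remote source.
raw = "caf\xC3\xA9 \xFF ok".b

# force_encoding relabels the same bytes; it does not validate or repair them.
labeled = raw.dup.force_encoding("UTF-8")
puts labeled.valid_encoding?   # false, because of the stray \xFF byte

# scrub replaces each invalid byte sequence; passing "" drops it entirely.
clean = labeled.scrub("")
puts clean.valid_encoding?
puts clean
```

Logging valid_encoding? at the point where the error is raised may show whether the intermittent failures come from strings that were only relabeled and never cleaned.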