I receive data from an external Microsoft SQL Server 2008 database (I run the queries with MyBatis). The data is encoded as "Windows-1252".
I have tried to re-encode it to UTF-8:
String textoFormado = ...; // value from MyBatis
String s = new String(textoFormado.getBytes("Windows-1252"), "UTF-8");
Almost the whole string is decoded correctly, but some accented letters are not.
For example:
�vila
�?vila
Ávila
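Obviously, textoFormado is a variable of type String. This means that the bytes have already been decoded: Java stores a String internally as a sequence of 16-bit UTF-16 code units, independent of whatever encoding the data originally had. What you did is encode your string with Windows-1252 and then read the resulting bytes back as UTF-8. That does not work. In Windows-1252, "Á" is the single byte 0xC1, which is never valid in UTF-8, so the UTF-8 decoder substitutes the replacement character U+FFFD (displayed as �). Plain ASCII letters are encoded identically in both charsets, which is why most of the string survives.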
What you need is to specify the correct encoding at the point where the bytes are decoded:
byte[] sourceBytes = getRawBytes(); // the raw, still-encoded bytes from the source
String data = new String(sourceBytes, "Windows-1252");
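As a runnable sketch, assuming you really do have the raw Windows-1252 bytes in hand (a hard-coded array stands in for the hypothetical `getRawBytes()`, and the class name is illustrative), decoding them with the right charset gives the expected text:

```java
import java.nio.charset.Charset;

public class DecodeDemo {

    // Decodes raw bytes that are known to be Windows-1252 encoded.
    static String decodeWindows1252(byte[] sourceBytes) {
        return new String(sourceBytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for getRawBytes(): "Ávila" as Windows-1252 bytes.
        byte[] sourceBytes = {(byte) 0xC1, 'v', 'i', 'l', 'a'};
        String data = decodeWindows1252(sourceBytes);
        System.out.println(data.equals("\u00C1vila")); // prints: true
    }
}
```

Passing a `Charset` object instead of the charset name avoids the checked `UnsupportedEncodingException` that the `new String(byte[], String)` constructor declares.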
To use this string inside your program, you do not need to do anything else; simply use it. If, however, you want to write the data back out, to a file for example, you need to encode it again:
byte[] destinationBytes = data.getBytes("UTF-8");
// write bytes to destination file here
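A minimal end-to-end sketch of that last step, assuming the output goes to a temporary file (the class name and file prefix are illustrative only). It encodes the string as UTF-8, writes it with `Files.write`, and reads the bytes back to show that "Á" now occupies two bytes:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncodeDemo {

    // Encodes the string as UTF-8 and writes the bytes to the given file.
    static void writeUtf8(String data, Path destination) throws IOException {
        byte[] destinationBytes = data.getBytes(StandardCharsets.UTF_8);
        Files.write(destination, destinationBytes);
    }

    public static void main(String[] args) throws IOException {
        String data = "\u00C1vila"; // "Ávila", already decoded correctly
        Path destination = Files.createTempFile("encoding-demo", ".txt");
        writeUtf8(data, destination);

        byte[] written = Files.readAllBytes(destination);
        // U+00C1 is encoded in UTF-8 as 0xC3 0x81, so the file holds 6 bytes.
        System.out.printf("%d bytes, first two: 0x%02X 0x%02X%n",
                written.length, written[0] & 0xFF, written[1] & 0xFF);
        // prints: 6 bytes, first two: 0xC3 0x81

        Files.delete(destination);
    }
}
```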