Problem: Arabic words in text files that I read with Java show up as a series of question marks: ??????
Here is the code:
File[] fileList = mainFolder.listFiles();
BufferedReader bufferReader = null;
Reader reader = null;
try {
    for (File f : fileList) {
        reader = new InputStreamReader(new FileInputStream(f.getPath()), "UTF8");
        bufferReader = new BufferedReader(reader);
        String line = null;
        while ((line = bufferReader.readLine()) != null) {
            System.out.println(new String(line.getBytes(), "UTF-8"));
        }
    }
}
catch (Exception exc) {
    exc.printStackTrace();
}
finally {
    // Close the BufferedReader
    try {
        if (bufferReader != null)
            bufferReader.close();
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
As you can see, I have specified the UTF-8 encoding in several places and I still get question marks. Do you have any idea how I can fix this?
Thanks
Instead of trying to print out the line directly, print out the Unicode values of each character. For example:
char[] chars = line.toCharArray();
for (int i = 0; i < chars.length; i++)
{
    System.out.println(i + ": " + chars[i] + " - " + (int) chars[i]);
}
Then look up the relevant characters in the Unicode code charts.
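The code charts are indexed by hexadecimal code point, so a variant that prints values in U+XXXX form may be easier to look up. This is only a minimal sketch: the class name `CodePointDump` and the helper `describe` are made up for illustration, and the sample string is written with Unicode escapes so that the source file's own encoding does not matter.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {
    // Hypothetical helper: returns one entry per code point in the form
    // "index: character - U+XXXX (decimal)".
    static List<String> describe(String line) {
        List<String> entries = new ArrayList<>();
        int index = 0;
        for (int i = 0; i < line.length(); ) {
            int codePoint = line.codePointAt(i);
            String character = new String(Character.toChars(codePoint));
            entries.add(index + ": " + character + " - "
                    + String.format("U+%04X", codePoint) + " (" + codePoint + ")");
            // Advance by one or two chars so surrogate pairs stay together.
            i += Character.charCount(codePoint);
            index++;
        }
        return entries;
    }

    public static void main(String[] args) {
        // Sample input: five Arabic letters given as escapes (an assumption
        // for illustration; in practice pass each line read from the file).
        for (String entry : describe("\u0645\u0631\u062D\u0628\u0627")) {
            System.out.println(entry);
        }
    }
}
```

Arabic letters should show up in the U+0600 to U+06FF range, while a genuine question mark is U+003F (63).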
If you find it's printing 63, then those really are question marks... which would suggest that your text file isn't truly UTF-8 to start with.
If, on the other hand, it's printing out "?" for some characters but then a value other than 63, that would suggest it's a console display issue and you're reading the data correctly.
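To make both cases concrete, here is a hedged sketch rather than a definitive fix. It assumes the console itself can render UTF-8, and the names `Utf8ConsoleSketch`, `roundTrip` and `printFile` are invented for this example. `roundTrip` imitates `new String(line.getBytes(), "UTF-8")` from the question, with an explicit charset standing in for the platform default, to show one way real question marks (63) can appear even when the file was read correctly. `printFile` prints each line as it was read, through a `PrintStream` that encodes as UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8ConsoleSketch {
    // Imitates new String(line.getBytes(), "UTF-8"); assumedDefault stands in
    // for whatever the platform default charset happens to be (an assumption).
    static String roundTrip(String text, Charset assumedDefault) {
        return new String(text.getBytes(assumedDefault), StandardCharsets.UTF_8);
    }

    // Reads a file as UTF-8 and prints each line unchanged to the given stream.
    static void printFile(Path file, PrintStream out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.println(line); // no getBytes() round trip
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Wrap System.out so that characters are encoded as UTF-8 on output.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");

        String meem = "\u0645"; // one Arabic letter, written as an escape
        // A charset that cannot encode Arabic replaces the letter with '?'.
        utf8Out.println((int) roundTrip(meem, StandardCharsets.US_ASCII).charAt(0)); // 63
        // A charset that can encode it leaves the letter intact.
        utf8Out.println((int) roundTrip(meem, StandardCharsets.UTF_8).charAt(0)); // 1605

        if (args.length > 0) {
            printFile(Paths.get(args[0]), utf8Out);
        }
    }
}
```

Note that `Files.newBufferedReader` reports bytes that are not valid UTF-8 as an `IOException` instead of silently substituting characters, which can also help confirm whether the file really is UTF-8. If the output still shows question marks while the numeric values are correct, the terminal's own encoding or font is the more likely culprit.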