I'm having problems reading from a file, processing its string and saving to an UTF-8 File.
Here is the code:
try:
filehandle = open(filename,"r")
except:
print("Could not open file " + filename)
quit()
text = filehandle.read()
filehandle.close()
I then do some processing on the variable text.
And then
try:
writer = open(output,"w")
except:
print("Could not open file " + output)
quit()
#data = text.decode("iso 8859-15")
#writer.write(data.encode("UTF-8"))
writer.write(text)
writer.close()
This output the file perfectly but it does so in iso 8859-15 according to my editor. Since the same editor recognizes the input file (in the variable filename) as UTF-8 I don't know why this happened. As far as my reasearch has shown the commented lines should solve the problem. However when I use those lines the resulting file has gibberish in special character mainly, words with tilde as the text is in spanish. I would really appreciate any help as I am stumped....
Process text to and from Unicode at the I/O boundaries of your program using the codecs
module:
import codecs
with codecs.open(filename, 'r', encoding='utf8') as f:
text = f.read()
# process Unicode text
with codecs.open(filename, 'w', encoding='utf8') as f:
f.write(text)
Edit: The io
module is now recommended instead of codecs and is compatible with Python 3's open
syntax, and if using Python 3, you can just use open
if you don't require Python 2 compatibility.
import io
with io.open(filename, 'r', encoding='utf8') as f:
text = f.read()
# process Unicode text
with io.open(filename, 'w', encoding='utf8') as f:
f.write(text)