There are two ways to open a text file in Python:
f = open(filename)
And
import codecs
f = codecs.open(filename, encoding="utf-8")
When is codecs.open preferable to open?
Since Python 2.6, a good practice is to use io.open(), which also takes an encoding argument, like the now obsolete codecs.open(). In Python 3, io.open is an alias for the open() built-in, so io.open() works in Python 2.6 and all later versions, including Python 3.4. See the docs: http://docs.python.org/3.4/library/io.html
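As a minimal sketch of that practice (not the only way to do it), the snippet below writes a small sample file and reads it back through io.open() with an explicit encoding. The helper name read_text and the temporary sample file are illustrative assumptions, not part of any library.

```python
import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Hypothetical helper: return the whole file as decoded Unicode text."""
    with io.open(path, encoding=encoding) as f:
        return f.read()


# Create a throwaway sample file so the example is self-contained.
fd, sample_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(sample_path, "w", encoding="utf-8") as f:
    f.write(u"na\u00efve fa\u00e7ade \u20ac")

text = read_text(sample_path)
os.remove(sample_path)
```

Because the call goes through io.open(), the same code should behave the same way on Python 2.6+ and Python 3; on Python 3 it is simply the built-in open().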
Now, for the original question: when reading text (including "plain text", HTML, XML and JSON), you should always use io.open() with an explicit encoding in Python 2, or open() with an explicit encoding in Python 3. Doing so means you either get correctly decoded Unicode or get an error right off the bat, which makes debugging much easier.
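To illustrate the "error right off the bat" behaviour, here is a small Python 3 sketch. It assumes a file that was actually saved as Latin-1; the helper try_read and the sample content are made up for the example.

```python
import os
import tempfile

# Create a file containing Latin-1 bytes: "café" becomes b"caf\xe9".
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write("café".encode("latin-1"))


def try_read(path, encoding):
    """Hypothetical helper: read text, or report the decoding failure."""
    try:
        with open(path, encoding=encoding) as f:
            return f.read()
    except UnicodeDecodeError as exc:
        return "decode failed: " + exc.reason


wrong = try_read(path, "utf-8")    # the lone 0xE9 byte is not valid UTF-8
right = try_read(path, "latin-1")  # the encoding the file was written in
os.remove(path)
```

With the wrong encoding, the failure surfaces at the read() call instead of later as garbled text somewhere downstream.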
Pure ASCII "plain text" is a myth from the distant past. Proper English text uses curly quotes, em-dashes, bullets, € (euro signs) and even diaeresis (¨). Don't be naïve! (And let's not forget the Façade design pattern!)
Because pure ASCII is not a real option, open() without an explicit encoding is only useful for reading binary files (opened in a binary mode such as "rb").