I'd like to distinguish between None and empty strings ('') when going back and forth between a Python data structure and its CSV representation using Python's csv module.
My issue is that when I run:
import csv, cStringIO
data = [['NULL/None value',None],
['empty string','']]
f = cStringIO.StringIO()
csv.writer(f).writerows(data)
f = cStringIO.StringIO(f.getvalue())
data2 = [e for e in csv.reader(f)]
print "input : ", data
print "output: ", data2
I get the following output:
input : [['NULL/None value', None], ['empty string', '']]
output: [['NULL/None value', ''], ['empty string', '']]
Of course, I could post-process data and data2 to distinguish None from empty strings with something like:
data = [[d if d is not None else 'None' for d in row] for row in data]
data2 = [[d if d != 'None' else None for d in row] for row in data2]
But that would partly defeat the point of using the csv module (fast serialization/deserialization implemented in C, especially when dealing with large lists).
Is there a csv.Dialect, or parameters to csv.writer and csv.reader, that would enable them to distinguish between '' and None in this use case?
If not, would there be interest in a patch to csv.writer to enable this kind of round trip? (Possibly a Dialect.None_translate_to parameter defaulting to '' to ensure backward compatibility.)
You could at least partially side-step what the csv module does by creating your own version of a singleton None-like class/value:
from __future__ import print_function
import csv

class NONE(object):
    ''' None-like class. '''
    def __repr__(self):  # csv.writer calls str() on the value, which falls back to this.
        return 'NONE'  # Unique string value to represent None.
    def __len__(self):  # Method called to determine length and truthiness.
        return 0

NONE = NONE()  # Singleton instance of the class.

if __name__ == '__main__':
    try:
        from cStringIO import StringIO  # Python 2.
    except ImportError:
        from io import StringIO  # Python 3.

    data = [['None value', None], ['NONE value', NONE], ['empty string', '']]
    f = StringIO()
    csv.writer(f).writerows(data)
    f = StringIO(f.getvalue())
    print(" input:", data)
    print("output:", [e for e in csv.reader(f)])
Results:
input: [['None value', None], ['NONE value', NONE], ['empty string', '']]
output: [['None value', ''], ['NONE value', 'NONE'], ['empty string', '']]
Using NONE instead of None would preserve enough information for you to be able to differentiate between it and any actual empty-string data values.
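To complete the round trip, the marker text still has to be mapped back to None after reading. The following is only a minimal sketch of that step for Python 3; the class name _NoneMarker and the variable restored are illustrative names that do not appear above, and it assumes the literal text 'NONE' never occurs as real data:

```python
import csv
import io


class _NoneMarker(object):
    """None-like placeholder that csv.writer serializes as the text 'NONE'."""

    def __repr__(self):
        return 'NONE'

    def __len__(self):
        return 0


NONE = _NoneMarker()  # Singleton instance used in place of None.

data = [['None value', NONE], ['empty string', '']]

buf = io.StringIO()
csv.writer(buf).writerows(data)

buf.seek(0)
# Map the marker text back to a real None while reading.
restored = [[None if field == 'NONE' else field for field in row]
            for row in csv.reader(buf)]
print(restored)  # [['None value', None], ['empty string', '']]
```

The list comprehension on the reading side runs in Python rather than C, so it gives back some of the speed advantage mentioned in the question.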
You could use the same approach to implement a pair of relatively lightweight csv.reader and csv.writer "proxy" classes (necessary since you can't actually subclass the built-in csv classes, which are written in C) without introducing a lot of overhead, since the majority of the processing would still be performed by the underlying built-ins. This would make what goes on completely transparent, since it's all encapsulated within the proxies.
from __future__ import print_function
import csv

class csvProxyBase(object):
    _NONE = '<None>'  # Unique value representing None.

class csvWriter(csvProxyBase):
    def __init__(self, csvfile, *args, **kwargs):
        self.writer = csv.writer(csvfile, *args, **kwargs)
    def writerow(self, row):
        # Substitute the marker for None before handing the row to the real writer.
        self.writer.writerow([self._NONE if val is None else val for val in row])
    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

class csvReader(csvProxyBase):
    def __init__(self, csvfile, *args, **kwargs):
        self.reader = csv.reader(csvfile, *args, **kwargs)
    def __iter__(self):
        return self
    def __next__(self):
        # Turn the marker back into None; StopIteration propagates from the real reader.
        return [None if val == self._NONE else val for val in next(self.reader)]
    next = __next__  # Python 2.x compatibility.

if __name__ == '__main__':
    try:
        from cStringIO import StringIO  # Python 2.
    except ImportError:
        from io import StringIO  # Python 3.

    data = [['None value', None], ['empty string', '']]
    f = StringIO()
    csvWriter(f).writerows(data)
    f = StringIO(f.getvalue())
    print(" input:", data)
    print("output:", [e for e in csvReader(f)])
Results:
input: [['None value', None], ['empty string', '']]
output: [['None value', None], ['empty string', '']]
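If classes feel heavier than needed, the same substitution could also be written as two small functions. This is only a sketch for Python 3: write_rows, read_rows and NONE_MARKER are hypothetical names, not part of the csv module, and extra keyword arguments are simply forwarded to csv.writer and csv.reader as formatting parameters:

```python
import csv
import io

NONE_MARKER = '<None>'  # Assumed sentinel; it must never occur as real data.


def write_rows(csvfile, rows, **fmtparams):
    """Write rows, replacing None with NONE_MARKER (hypothetical helper)."""
    writer = csv.writer(csvfile, **fmtparams)
    writer.writerows(
        [NONE_MARKER if val is None else val for val in row] for row in rows
    )


def read_rows(csvfile, **fmtparams):
    """Yield rows, turning NONE_MARKER back into None (hypothetical helper)."""
    for row in csv.reader(csvfile, **fmtparams):
        yield [None if val == NONE_MARKER else val for val in row]


data = [['None value', None], ['empty string', '']]
buf = io.StringIO()
write_rows(buf, data, delimiter=';')

buf.seek(0)
roundtrip = list(read_rows(buf, delimiter=';'))
print(roundtrip)  # [['None value', None], ['empty string', '']]
```

As with the proxy classes, a real field whose text happens to be exactly '<None>' would be read back as None, so the marker should be chosen to be something that cannot appear in the data.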