How to make json.dumps in Python ignore a non-serializable field

mz8i · Aug 3, 2018

I am trying to serialize the output of parsing some binary data with the Construct 2.9 library. I want to serialize the result to JSON.

packet is an instance of a Construct class Container.

Apparently it contains a hidden _io of type BytesIO - see output of dict(packet) below:

{
'packet_length': 76, 'uart_sent_time': 1, 'frame_number': 42958, 
'subframe_number': 0, 'checksum': 33157, '_io': <_io.BytesIO object at 0x7f81c3153728>, 
'platform':661058, 'sync': 506660481457717506, 'frame_margin': 20642,
'num_tlvs': 1, 'track_process_time': 593, 'chirp_margin': 78,
'timestamp': 2586231182, 'version': 16908293
}

Now, calling json.dumps(packet) obviously leads to a TypeError:

...

File "/usr/lib/python3.5/json/__init__.py", line 237, in dumps
    **kw).encode(obj)
File "/usr/lib/python3.5/json/encoder.py", line 198, in encode
    chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python3.5/json/encoder.py", line 256, in iterencode
    return _iterencode(o, 0)
File "/usr/lib/python3.5/json/encoder.py", line 179, in default
    raise TypeError(repr(o) + " is not JSON serializable")
TypeError: <_io.BytesIO object at 0x7f81c3153728> is not JSON serializable

However, what confuses me is that running json.dumps(packet, skipkeys=True) results in the exact same error, while I would expect it to skip the _io field. What is the problem here? Why is skipkeys not allowing me to skip the _io field?

I got the code to work by overriding JSONEncoder and returning None for fields of BytesIO type, but that means my serialized string contains loads of "_io": null elements, which I would prefer not to have at all...

Answer

Martijn Pieters · Aug 3, 2018

Keys with a leading _ underscore are not really 'hidden'; to JSON they are just more strings. The Construct Container class is just a dictionary with ordering, and the _io key is not anything special to that class.
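
As for skipkeys: per the json module documentation, it only tells the encoder to skip dictionary keys that are not of a basic type (str, int, float, bool, None) instead of raising a TypeError. It never looks at the values. The '_io' key is a perfectly good string, so it is kept, and its BytesIO value still fails. A small sketch of the difference, using plain dicts as stand-ins for the packet (the variable names are made up for illustration):

```python
import io
import json

# A tuple is not a valid JSON key; skipkeys=True silently drops that pair.
with_bad_key = {("a", "b"): 1, "ok": 2}
print(json.dumps(with_bad_key, skipkeys=True))

# Here the key is a normal string but the value is not serializable.
# skipkeys has no effect on values, so the TypeError is still raised.
with_bad_value = {"_io": io.BytesIO(), "ok": 2}
try:
    json.dumps(with_bad_value, skipkeys=True)
except TypeError as exc:
    print("still fails:", exc)
```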

You have two options:

  • Implement a default hook that just returns a replacement value.
  • Filter out the key-value pairs that you know can't work before serialising.

and perhaps a third, but a casual scan of the Construct project pages doesn't tell me if it is available: have Construct output JSON or at least a JSON-compatible dictionary, perhaps by using adapters.
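
For a flat mapping like the packet in the question, the filtering option does not need much code. A minimal sketch, with a plain dict standing in for the Container and assuming that every key you want to drop starts with an underscore:

```python
import io
import json

# Plain dict standing in for the Construct packet (an assumption for illustration).
packet = {"packet_length": 76, "frame_number": 42958, "_io": io.BytesIO()}

# Keep only the pairs whose key does not start with an underscore.
filtered = {k: v for k, v in packet.items() if not str(k).startswith("_")}

print(json.dumps(filtered))
```

This only looks at the top level; nested containers would need the same treatment applied recursively.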

The default hook can't prevent the _io key from being added to the output, but would let you at least avoid the error:

json.dumps(packet, default=lambda o: '<not serializable>')
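
To see what that looks like in practice, here is a short sketch with a plain dict standing in for the packet (an assumption; a real Container should behave the same way since it is a dict subclass). The default hook is only called for values the encoder cannot handle itself, and whatever it returns is encoded in place of the original value:

```python
import io
import json

# Plain dict standing in for the Construct packet (an assumption for illustration).
packet = {"packet_length": 76, "_io": io.BytesIO()}

# The key stays in the output; only the value is replaced.
with_placeholder = json.dumps(packet, default=lambda o: "<not serializable>")
print(with_placeholder)

# Returning None instead gives the "_io": null entries described in the question.
with_null = json.dumps(packet, default=lambda o: None)
print(with_null)
```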

Filtering can be done recursively; the @functools.singledispatch() decorator can help keep such code clean:

from functools import singledispatch

_cant_serialize = object()

@singledispatch
def json_serializable(obj, skip_underscore=False):
    """Filter a Python object to only include serializable object types

    In dictionaries, keys are converted to strings; if skip_underscore is true
    then keys starting with an underscore ("_") are skipped.

    """
    # default handler, called for anything without a specific
    # type registration.
    return _cant_serialize

@json_serializable.register(dict)
def _handle_dict(d, skip_underscore=False):
    converted = ((str(k), json_serializable(v, skip_underscore))
                 for k, v in d.items())
    if skip_underscore:
        converted = ((k, v) for k, v in converted if k[:1] != '_')
    return {k: v for k, v in converted if v is not _cant_serialize}

@json_serializable.register(list)
@json_serializable.register(tuple)
def _handle_sequence(seq, skip_underscore=False):
    converted = (json_serializable(v, skip_underscore) for v in seq)
    return [v for v in converted if v is not _cant_serialize]

@json_serializable.register(int)
@json_serializable.register(float)
@json_serializable.register(str)
@json_serializable.register(bool)  # redundant, supported as int subclass
@json_serializable.register(type(None))
def _handle_default_scalar_types(value, skip_underscore=False):
    return value

I gave the above implementation an additional skip_underscore argument too, to explicitly skip keys that have a _ character at the start. This would help skip all additional 'hidden' attributes the Construct library is using.

Since Container is a dict subclass, the above code will automatically handle instances such as packet.
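
The reason this works is that singledispatch picks the handler registered for the closest matching class in the argument's MRO, so subclasses fall through to their parent's handler. A stripped-down sketch of just that behaviour; FakeContainer and describe are hypothetical names for illustration, not part of Construct or of the code above:

```python
from functools import singledispatch


class FakeContainer(dict):
    """Hypothetical stand-in for a dict subclass such as Construct's Container."""


@singledispatch
def describe(obj):
    # Fallback for any type without a registration.
    return "unregistered"


@describe.register(dict)
def _describe_dict(d):
    # Also chosen for dict subclasses, because dispatch follows the MRO.
    return "dict handler"


print(describe(FakeContainer(a=1)))
print(describe(object()))
```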