Fastest way to convert a Python dict to a JSON binary string

tensor · Apr 29, 2018 · Viewed 12.2k times

I need to convert this Python dict into a binary JSON string:

   d = {'1': 'myval', '2': 'myval2'}

   json_binary_str = b'{"1": "myval", "2": "myval2"}'

In Python 3, I have this:

   import ujson
   ujson.dumps(d)

But this does not create a binary string. How can I do this?

Answer

Keeely · Jan 12, 2021

RFC 7159 (https://www.rfc-editor.org/rfc/rfc7159) says:

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32

At first glance it seems that Python isn't really following the spec: what does it mean to encode something when the result is still a Python 3 str? Python is doing some encoding for you nonetheless. Try this:

>>> json.dumps({"Japan":"日本"})
'{"Japan": "\\u65e5\\u672c"}'

You can see that the Japanese has been converted to Unicode escapes, and the resulting string is pure ASCII, even though it's still a Python str. If you wanted real UTF-8 sequences for interoperability purposes, json.dumps() will keep the characters unescaped when called with ensure_ascii=False, and the result can then be encoded as UTF-8. For most practical purposes, though, the escaped form is good enough: the characters are there and will be interpreted correctly. It's easy to get binary with:

>>> json.dumps({"Japan":"日本"}).encode("ascii")
b'{"Japan": "\\u65e5\\u672c"}'
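
If real UTF-8 byte sequences are wanted instead of \u escapes, one option is to combine ensure_ascii=False with an explicit encode. The following is a minimal sketch using only the standard json module; the helper name to_json_bytes is made up for illustration and is not part of any library.

```python
import json


def to_json_bytes(obj):
    """Hypothetical helper: serialize obj to JSON and return UTF-8 bytes."""
    # ensure_ascii=False leaves non-ASCII characters as they are
    # instead of replacing them with \uXXXX escapes.
    text = json.dumps(obj, ensure_ascii=False)
    return text.encode("utf-8")


escaped = json.dumps({"Japan": "日本"}).encode("ascii")
utf8 = to_json_bytes({"Japan": "日本"})

print(escaped)
print(utf8)
# Both forms decode to the same Python object.
print(json.loads(escaped) == json.loads(utf8))
```

Both byte strings are valid JSON; the UTF-8 form is shorter for non-ASCII text, while the escaped form survives any ASCII-only channel.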

And Python does the right thing when loading it back in:

>>> json.loads(json.dumps({"Japan":"日本"}).encode("ascii"))
{'Japan': '日本'}

But if you don't bother encoding at all, loads() still figures out what to do when given a str:

>>> json.loads(json.dumps({"Japan":"日本"}))
{'Japan': '日本'}
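
The bytes case is not limited to ASCII input. Here is a short sketch, assuming Python 3.6 or later, where json.loads() accepts bytes encoded as UTF-8, UTF-16 or UTF-32 and detects which one it was given. The variable names are only illustrative.

```python
import json

text = '{"Japan": "日本"}'

# Encode the same JSON text three ways and let json.loads()
# work out the encoding from the bytes it receives.
decoded = {
    enc: json.loads(text.encode(enc))
    for enc in ("utf-8", "utf-16", "utf-32")
}

for enc, value in decoded.items():
    print(enc, value)
```

Any other encoding (Latin-1, Shift JIS and so on) would have to be decoded to str by hand before calling loads().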

Python is - as ever - trying to be as helpful as possible in figuring out what you want and doing it, but this is perplexing to people who dig a little deeper, and in spite of loving Python to bits I sympathise with the OP. Whether this kind of 'helpful' behaviour is worth the confusion is a debate that will rage on.

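Worth noting that if the next thing to be done with the output is writing it to a file (here json_data is the str returned by json.dumps()), then you can just do:

pathlib.Path("myfile.json").write_text(json_data, encoding="utf-8")

Then you don't need it as bytes, because the file is written in text mode and the encoding is done for you. Unlike a bare open("w").write(...), write_text() also closes the file when it is done.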
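
A sketch of that file round trip, with the encoding passed explicitly rather than left to the platform default. The temporary directory and the json_data variable are placeholders for illustration; the dict is the one from the question.

```python
import json
import pathlib
import tempfile

d = {"1": "myval", "2": "myval2"}
json_data = json.dumps(d)

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "myfile.json"

    # Text mode: hand over a str, the encoding happens inside write_text().
    path.write_text(json_data, encoding="utf-8")

    # Binary mode: encode first, then write the bytes unchanged.
    path.write_bytes(json_data.encode("utf-8"))

    # Reading the file back as bytes gives the binary JSON string.
    raw = path.read_bytes()
    restored = json.loads(raw)

print(raw)
print(restored == d)
```

Either write produces the same file contents here, so the choice mostly depends on whether the rest of the program already holds str or bytes.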