I need to convert this Python dict (d) into binary JSON (json_binary_str):
d = {'1': 'myval', '2': 'myval2'}
json_binary_str = b'{"1": "myval", "2": "myval2"}'
In Python 3, I have this:
import ujson
ujson.dumps(d)
But this does not create a binary string. How can I do this?
In the RFC https://www.rfc-editor.org/rfc/rfc7159, it says:
JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32
At first glance it seems that Python isn't really following the spec. After all, what does it mean to encode something when the result remains a Python 3 str? However, Python is doing some encoding for you nonetheless. Try this (after import json):
>>> json.dumps({"Japan":"日本"})
'{"Japan": "\\u65e5\\u672c"}'
You can see that the Japanese has been converted to Unicode escapes, and the resulting string is actually pure ASCII, even though it's still a Python str. By default json.dumps() does not give you raw UTF-8 sequences, which you might want for interoperability purposes, but for all practical purposes this is good enough for most people. The characters are there and will be interpreted correctly. It's easy to get binary with:
>>> json.dumps({"Japan":"日本"}).encode("ascii")
b'{"Japan": "\\u65e5\\u672c"}'
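If you do want real UTF-8 byte sequences rather than \u escapes, the standard library's json.dumps() takes an ensure_ascii parameter; with ensure_ascii=False the non-ASCII characters are left as they are, and encoding the result as UTF-8 gives the bytes. A minimal sketch using the stdlib json module (the helper name to_json_bytes and its escape_non_ascii flag are made up for illustration; ujson may format its output differently):

```python
import json


def to_json_bytes(obj, escape_non_ascii=False):
    """Serialize obj to JSON text and return it as UTF-8 encoded bytes.

    Hypothetical helper for illustration, not part of any library.
    """
    # ensure_ascii=False keeps characters such as 日本 unescaped in the str.
    text = json.dumps(obj, ensure_ascii=escape_non_ascii)
    return text.encode("utf-8")


d = {'1': 'myval', '2': 'myval2'}
print(to_json_bytes(d))    # b'{"1": "myval", "2": "myval2"}'

raw = to_json_bytes({"Japan": "日本"})
print(raw)                 # b'{"Japan": "\xe6\x97\xa5\xe6\x9c\xac"}'
print(json.loads(raw))     # {'Japan': '日本'}
```

For plain ASCII data such as the dict in the question, both settings produce the same bytes, so the flag only matters once non-ASCII text is involved.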
And Python does the right thing when loading it back in:
>>> json.loads(json.dumps({"Japan":"日本"}).encode("ascii"))
{'Japan': '日本'}
But if you don't bother encoding at all, loads() still figures out what to do when given a str:
>>> json.loads(json.dumps({"Japan":"日本"}))
{'Japan': '日本'}
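As far as I know, since Python 3.6 json.loads() accepts bytes or bytearray as well as str, and expects the bytes to be UTF-8, UTF-16 or UTF-32, which matches the encodings listed in the RFC. A small sketch to check that on your own interpreter (the variable names are only illustrative):

```python
import json

# Keep the Japanese characters unescaped so the encodings really differ.
text = json.dumps({"Japan": "日本"}, ensure_ascii=False)

results = {}
for encoding in ("utf-8", "utf-16", "utf-32"):
    payload = text.encode(encoding)          # bytes in the given encoding
    results[encoding] = json.loads(payload)  # loads() detects the encoding
    print(encoding, len(payload), results[encoding])
```

The byte lengths differ per encoding, but each payload should decode back to the same dict.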
Python is - as ever - trying to be as helpful as possible in figuring out what you want and doing it, but this is perplexing to people who dig a little deeper and, in spite of loving Python to bits, I sympathise with the OP. Whether this kind of 'helpful' behaviour is worth the confusion is a debate that will rage on.
It's worth noting that if the next step is to write the output to a file, you can just do this (where json_data is the str returned by json.dumps()):
pathlib.Path("myfile.json").write_text(json_data)
Then you don't need it as binary, because the file is opened in text mode and the encoding is done for you.
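One caveat: in text mode the encoding can default to a platform-dependent value unless you pass one explicitly. That makes no difference for the default all-ASCII output, but it can if you use ensure_ascii=False. A minimal sketch, assuming the stdlib json module and a throwaway temporary directory (the file name is only an example):

```python
import json
import pathlib
import tempfile

data = {"Japan": "日本"}

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "myfile.json"

    # Text mode: json.dump() writes str and the file object encodes it.
    with path.open("w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False)

    # Reading the raw bytes back shows what actually landed on disk.
    on_disk = path.read_bytes()
    restored = json.loads(on_disk)

print(on_disk)
print(restored)
```

Using a with block also closes the file promptly, which the chained open("w").write(...) form leaves to the garbage collector.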