Python how convert single quotes to double quotes to format as json string

revy picture revy · Dec 5, 2017 · Viewed 14.8k times · Source

I have a file where on each line I have text like this (representing cast of a film):

[{'cast_id': 23, 'character': "Roger 'Verbal' Kint", 'credit_id': '52fe4260c3a36847f8019af7', 'gender': 2, 'id': 1979, 'name': 'Kevin Spacey', 'order': 5, 'profile_path': '/x7wF050iuCASefLLG75s2uDPFUu.jpg'}, {'cast_id': 27, 'character': 'Edie's Finneran', 'credit_id': '52fe4260c3a36847f8019b07', 'gender': 1, 'id': 2179, 'name': 'Suzy Amis', 'order': 6, 'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]

I need to convert it in a valid json string, thus converting only the necessary single quotes to double quotes (e.g. the single quotes around word Verbal must not be converted, eventual apostrophes in the text also should not be converted).

I am using python 3.x. I need to find a regular expression which will convert only the right single quotes to double quotes, thus the whole text resulting in a valid json string. Any idea?

Answer

user3850 picture user3850 · Dec 5, 2017

First of all, the line you gave as example is not parsable! … 'Edie's Finneran' … contains a syntax error, not matter what.

Assuming that you have control over the input, you could simply use eval() to read in the file. (Although, in that case one would wonder why you can't produce valid JSON in the first place…)

>>> f = open('list.txt', 'r')
>>> s = f.read().strip()
>>> l = eval(s)

>>> import pprint
>>> pprint.pprint(l)
[{'cast_id': 23,
  'character': "Roger 'Verbal' Kint",
  ...
  'profile_path': '/b1pjkncyLuBtMUmqD1MztD2SG80.jpg'}]

>>> import json
>>> json.dumps(l)
'[{"cast_id": 23, "character": "Roger \'Verbal\' Kint", "credit_id": "52fe4260ca36847f8019af7", "gender": 2, "id": 1979, "name": "Kevin Spacey", "order": 5, "rofile_path": "/x7wF050iuCASefLLG75s2uDPFUu.jpg"}, {"cast_id": 27, "character":"Edie\'s Finneran", "credit_id": "52fe4260c3a36847f8019b07", "gender": 1, "id":2179, "name": "Suzy Amis", "order": 6, "profile_path": "/b1pjkncyLuBtMUmqD1MztDSG80.jpg"}]'

If you don't have control over the input, this is very dangerous, as it opens you up to code injection attacks.

I cannot emphasize enough that the best solution would be to produce valid JSON in the first place.