python requests randomly breaks with JSONDecodeError

mateocam picture mateocam · Sep 25, 2018 · Viewed 14.5k times · Source

I have been debugging for hours why my code randomly breaks with this error: JSONDecodeError: Expecting value: line 1 column 1 (char 0)

This is the code I have:

while True:
    try:
        submissions = requests.get('http://reymisterio.net/data-dump/api.php/submission?filter[]=form,cs,'+client+'&filter[]=date,cs,'+since).json()['submission']['records']
        break
    except requests.exceptions.ConnectionError:
        time.sleep(100)

And I've been debugging by printing requests.get(url) and requests.get(url).text and I have encountered the following "special "cases:

  1. requests.get(url) returns a successful 200 response and requests.get(url).text returns html. I have read online that this should fail when using requests.get(url).json(), because it won't be able to read the html, but somehow it doesn't break. Why is this?

  2. requests.get(url) returns a successful 200 response and requests.get(url).text is in json format. I don't understand why when it goes to the requests.get(url).json() line it breaks with the JSONDecodeError?

The exact value of requests.get(url).text for case 2 is:

{
  "submission": {
    "columns": [
      "pk",
      "form",
      "date",
      "ip"
    ],
    "records": [
      [
        "21197",
        "mistico-form-contacto-form",
        "2018-09-21 09:04:41",
        "186.179.71.106"
      ]
    ]
  }
}

Answer

Henry Woody picture Henry Woody · Sep 25, 2018

Looking at the documentation for this API it seems the only responses are in JSON format, so receiving HTML is strange. To increase the likelihood of receiving a JSON response, you can set the 'Accept' header to 'application/json'.

I tried querying this API many times with parameters and did not encounter a JSONDecodeError. This error is likely the result of another error on the server side. To handle it, except a json.decoder.JSONDecodeError in addition to the ConnectionError error you currently except and handle this error in the same way as the ConnectionError.

Here is an example with all that in mind:

import requests, json, time, random

def get_submission_records(client, since, try_number=1):
    url = 'http://reymisterio.net/data-dump/api.php/submission?filter[]=form,cs,'+client+'&filter[]=date,cs,'+since
    headers = {'Accept': 'application/json'}
    try:
        response = requests.get(url, headers=headers).json()
    except (requests.exceptions.ConnectionError, json.decoder.JSONDecodeError):
        time.sleep(2**try_number + random.random()*0.01) #exponential backoff
        return get_submission_records(client, since, try_number=try_number+1)
    else:
        return response['submission']['records']

I've also wrapped this logic in a recursive function, rather than using while loop because I think it is semantically clearer. This function also waits before trying again using exponential backoff (waiting twice as long after each failure).

Edit: For Python 2.7, the error from trying to parse bad json is a ValueError, not a JSONDecodeError

import requests, time, random

def get_submission_records(client, since, try_number=1):
    url = 'http://reymisterio.net/data-dump/api.php/submission?filter[]=form,cs,'+client+'&filter[]=date,cs,'+since
    headers = {'Accept': 'application/json'}
    try:
        response = requests.get(url, headers=headers).json()
    except (requests.exceptions.ConnectionError, ValueError):
        time.sleep(2**try_number + random.random()*0.01) #exponential backoff
        return get_submission_records(client, since, try_number=try_number+1)
    else:
        return response['submission']['records']

so just change that except line to include a ValueError instead of json.decoder.JSONDecodeError.