Parsing large JSON file in .NET

Yavar Hasanov · Aug 26, 2015

So far I have used Json.NET's JsonConvert.DeserializeObject<T>(json) method, which has worked quite well; to be honest, I haven't needed anything more than that.

I am working on a background (console) application that continuously downloads JSON content from different URLs and deserializes the result into a list of .NET objects.

    using (WebClient client = new WebClient())
    {
        string json = client.DownloadString(stringUrl);

        var result = JsonConvert.DeserializeObject<List<Contact>>(json);
    }

The simple snippet above is probably not perfect, but it does the job. However, when the file is large (15,000 contacts, a 48 MB file), JsonConvert.DeserializeObject fails: that line throws a JsonReaderException.

The downloaded JSON content is an array; here is what a sample looks like. Contact is a container class for each deserialized JSON object.

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
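The post does not show the Contact class, so here is a minimal sketch of what it might look like, assuming only the two fields in the sample, together with the one-call deserialization the question uses on a small, well-formed array. The class shape, the ContactParsing helper and the sample values are hypothetical. The [JsonProperty] attributes are optional, since Json.NET already falls back to case-insensitive name matching when deserializing.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

// Hypothetical shape of the question's Contact class (not shown in the original post).
public class Contact
{
    // Explicitly map the lowercase JSON names; mainly documents intent,
    // because Json.NET matches property names case-insensitively anyway.
    [JsonProperty("firstname")]
    public string FirstName { get; set; }

    [JsonProperty("lastname")]
    public string LastName { get; set; }
}

public static class ContactParsing
{
    // Deserializes one well-formed JSON array in a single call,
    // the same approach as the question's code.
    public static List<Contact> ParseArray(string json)
    {
        return JsonConvert.DeserializeObject<List<Contact>>(json);
    }

    public static void Main()
    {
        string json = @"[
  { ""firstname"": ""first1"", ""lastname"": ""last1"" },
  { ""firstname"": ""first2"", ""lastname"": ""last2"" }
]";
        foreach (Contact c in ParseArray(json))
        {
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
```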

My initial guess was that it runs out of memory. Out of curiosity, I also tried parsing it as a JArray, which threw the same exception.

I have started digging into the Json.NET documentation and reading similar threads, but as I haven't managed to produce a working solution yet, I decided to post a question here.

UPDATE: While deserializing line by line, I got the same error: "[. Path '', line 600003, position 1." So I downloaded two of the files and inspected them in Notepad++. I noticed that if the array has more than 12,000 elements, the array is closed with "]" right after the 12,000th element and another array starts. In other words, the JSON looks exactly like this:

[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
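A small reproduction of the failure described in the update might look like the sketch below. It assumes a hypothetical Contact class with FirstName/LastName properties and a hypothetical ConcatenatedArrayDemo helper. With two arrays back to back, both JsonConvert.DeserializeObject and JArray.Parse should read the first array and then throw a JsonReaderException when they hit the second "[", which is consistent with the exception reported above even on tiny input.

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Hypothetical container class; names match the JSON case-insensitively.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class ConcatenatedArrayDemo
{
    // Two top-level arrays back to back, mimicking the downloaded files.
    public const string Json = @"[ { ""firstname"": ""a"", ""lastname"": ""b"" } ]
[ { ""firstname"": ""c"", ""lastname"": ""d"" } ]";

    // Returns true if the parse attempt throws a JsonReaderException.
    public static bool ThrowsReaderException(Action parse)
    {
        try
        {
            parse();
            return false;
        }
        catch (JsonReaderException)
        {
            return true;
        }
    }

    public static void Main()
    {
        bool viaConvert = ThrowsReaderException(
            () => JsonConvert.DeserializeObject<List<Contact>>(Json));
        bool viaJArray = ThrowsReaderException(() => JArray.Parse(Json));
        Console.WriteLine(viaConvert + " " + viaJArray);
    }
}
```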

Answer

Brian Rogers · Aug 27, 2015

As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ to start the next set. This format makes the JSON invalid when taken as a whole, and that is why Json.NET throws an error.

Fortunately, this problem seems to come up often enough that Json.NET actually has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set the SupportMultipleContent flag to true, and then use a loop to deserialize each item individually.

This should allow you to process the non-standard JSON successfully and in a memory efficient manner, regardless of how many arrays there are or how many items in each array.

    // Stream the response instead of downloading it into one big string.
    using (WebClient client = new WebClient())
    using (Stream stream = client.OpenRead(stringUrl))
    using (StreamReader streamReader = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(streamReader))
    {
        // Accept several top-level JSON values (here: arrays) in one input.
        reader.SupportMultipleContent = true;

        var serializer = new JsonSerializer();
        while (reader.Read())
        {
            // Each StartObject is one contact; Deserialize consumes the whole object.
            if (reader.TokenType == JsonToken.StartObject)
            {
                Contact c = serializer.Deserialize<Contact>(reader);
                Console.WriteLine(c.FirstName + " " + c.LastName);
            }
        }
    }
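The same loop can also be wrapped in a lazily evaluated iterator, so a caller can process contacts one at a time without ever building the full list. This is only a sketch: ContactStream and ReadContacts are hypothetical names, Contact is assumed to have FirstName/LastName properties, and a StringReader stands in for the StreamReader over the download.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical container class; names match the JSON case-insensitively.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class ContactStream
{
    // Hypothetical helper: yields contacts one at a time from input that may
    // contain several top-level JSON arrays back to back.
    public static IEnumerable<Contact> ReadContacts(TextReader textReader)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(textReader))
        {
            // Keep reading after the first top-level array instead of failing.
            reader.SupportMultipleContent = true;
            while (reader.Read())
            {
                // Each StartObject seen here is one array element; Deserialize
                // consumes the whole object, including any nested objects.
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<Contact>(reader);
                }
            }
        }
    }

    public static void Main()
    {
        string json = "[{\"firstname\":\"a\",\"lastname\":\"b\"},{\"firstname\":\"c\",\"lastname\":\"d\"}]\n" +
                      "[{\"firstname\":\"e\",\"lastname\":\"f\"}]";
        foreach (Contact c in ReadContacts(new StringReader(json)))
        {
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
```

With the download, the StreamReader from the answer can be passed in place of the StringReader. Note that JsonTextReader closes the underlying reader when it is disposed, because its CloseInput property defaults to true.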

Full demo here: https://dotnetfiddle.net/2TQa8p
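If the downstream code prefers batches, a possible variation is to react to StartArray instead of StartObject and deserialize each top-level array as a whole. This is an alternative sketch, not the answer's method; ContactBatches and ReadBatches are hypothetical names, and Contact is assumed to have FirstName/LastName properties. The trade-off is that each batch (up to roughly 12,000 contacts, per the question) is held in memory at once, whereas the per-object loop materializes only one contact at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical container class; names match the JSON case-insensitively.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class ContactBatches
{
    // Hypothetical helper: yields each top-level JSON array as its own list.
    public static IEnumerable<List<Contact>> ReadBatches(TextReader textReader)
    {
        var serializer = new JsonSerializer();
        using (var reader = new JsonTextReader(textReader))
        {
            reader.SupportMultipleContent = true;
            while (reader.Read())
            {
                // Deserialize consumes the whole array, so only top-level
                // arrays are ever seen as StartArray by this loop.
                if (reader.TokenType == JsonToken.StartArray)
                {
                    yield return serializer.Deserialize<List<Contact>>(reader);
                }
            }
        }
    }

    public static void Main()
    {
        string json = "[{\"firstname\":\"a\",\"lastname\":\"b\"},{\"firstname\":\"c\",\"lastname\":\"d\"}]\n" +
                      "[{\"firstname\":\"e\",\"lastname\":\"f\"}]";
        foreach (List<Contact> batch in ReadBatches(new StringReader(json)))
        {
            Console.WriteLine("Batch of " + batch.Count + " contacts");
        }
    }
}
```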