A quick helper to work on JSON streams

Newtonsoft’s JSON library provides many ways to read JSON documents, including a SAX-like, forward-only reader: JsonTextReader. This is a real advantage when you’re working with very large documents, especially if they are being streamed from a remote source, because the whole document never has to be loaded in memory at once. The downside is that such APIs are usually quite unfriendly and low-level: you have to move a cursor and read values manually. This is not just tedious, it’s also brittle, and it typically couples your code tightly to the structure of the JSON document.

Last week, I had to read a large JSON document, and I wanted to explore better ways to perform such tasks. That led me to write a small set of helper methods that make it a lot easier to walk JSON documents using Newtonsoft’s JsonTextReader.
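To make the low-level style concrete, here is a minimal sketch of finding a single value with the raw JsonTextReader cursor and no helpers. The class name, the FindString method, and the sample JSON are made up for illustration; only the Newtonsoft members used (JsonTextReader, Read, TokenType, Value) are real.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

public static class RawReaderDemo
{
    // Hypothetical helper: returns the value of the first property called
    // `name` (at any depth) if that value is a string, or null otherwise.
    public static string FindString(string json, string name)
    {
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            // Move the cursor one token at a time until the name shows up.
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.PropertyName
                    && (string)reader.Value == name)
                {
                    reader.Read(); // step from the property name to its value
                    return reader.Value as string;
                }
            }
        }
        return null;
    }

    public static void Main()
    {
        const string json = "{ \"foo\": \"fou\", \"obj\": { \"baz\": \"base\" } }";
        Console.WriteLine(FindString(json, "baz")); // prints "base"
    }
}
```

Every lookup needs this kind of loop, which is exactly the repetition the helpers are meant to hide.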

Given the following document:

{
    foo: "fou",
    bar: "barre",
    obj: {
        baz: "base",
        arr: [1, 4, 8]
    },
    after: "apres",
    number: 42,
    end: "fin"
}

We can advance to a child property by name:

var found = reader.AdvanceTo("baz");

Assert.IsTrue(found);
Assert.AreEqual("base", reader.Value);

We can also advance to the next value of a specific token type:

var found = reader.AdvanceTo(JsonToken.String);

Assert.IsTrue(found);
Assert.AreEqual("fou", reader.Value);
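For readers curious how the two AdvanceTo calls above might work, here is one possible sketch. It is an assumption based only on the behavior shown in the assertions (scan forward, stop on the value, return whether anything was found), not the code from the gist; the class name JsonReaderSketch is hypothetical.

```csharp
using System;
using System.IO;
using Newtonsoft.Json;

public static class JsonReaderSketch
{
    // Assumed behavior: scan forward to the next property called
    // `propertyName`, at any depth, and leave the reader on its value.
    public static bool AdvanceTo(this JsonReader reader, string propertyName)
    {
        while (reader.Read())
        {
            if (reader.TokenType == JsonToken.PropertyName
                && (string)reader.Value == propertyName)
            {
                return reader.Read(); // step onto the property's value
            }
        }
        return false; // reached the end of the stream without a match
    }

    // Assumed behavior: scan forward to the next token of the given type.
    public static bool AdvanceTo(this JsonReader reader, JsonToken tokenType)
    {
        while (reader.Read())
        {
            if (reader.TokenType == tokenType) return true;
        }
        return false;
    }
}

public static class Program
{
    public static void Main()
    {
        const string json = "{ \"foo\": \"fou\", \"obj\": { \"baz\": \"base\" } }";
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            Console.WriteLine(reader.AdvanceTo(JsonToken.String)); // True
            Console.WriteLine(reader.Value);                       // fou
            Console.WriteLine(reader.AdvanceTo("baz"));            // True
            Console.WriteLine(reader.Value);                       // base
            Console.WriteLine(reader.AdvanceTo("missing"));        // False
        }
    }
}
```

Because the reader is forward-only, a failed search consumes the rest of the stream, so in this sketch the order of the calls matters.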

And we can enumerate the properties of an object:

reader.AdvanceTo("obj");
var results = new List<Tuple<string, JsonToken>>();

foreach(var propertyName in reader.Children())
{
    results.Add(new Tuple<string, JsonToken>(propertyName, reader.TokenType));
}
Assert.AreEqual(JsonToken.EndObject, reader.TokenType);
Assert.AreEqual(2, results.Count);
Assert.AreEqual("baz", results[0].Item1);
Assert.AreEqual(JsonToken.String, results[0].Item2);
Assert.AreEqual("arr", results[1].Item1);
Assert.AreEqual(JsonToken.StartArray, results[1].Item2);

Or the elements of an array:

reader.AdvanceTo("arr");
var results = new List<long>();

foreach (var propertyName in reader.Children())
{
    results.Add((long)reader.Value);
}
Assert.AreEqual(JsonToken.EndArray, reader.TokenType);
Assert.AreEqual(3, results.Count);
Assert.AreEqual(1L, results[0]);
Assert.AreEqual(4L, results[1]);
Assert.AreEqual(8L, results[2]);
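The two enumeration examples above suggest how a Children method could behave: it yields property names for an object and null for each array element, leaves the reader on the current value, and finishes on the closing token. The following is a sketch under those assumptions, not the gist's implementation; JsonChildrenSketch and SkipIfUnread are hypothetical names, while JsonReader.Skip is a real Newtonsoft method.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public static class JsonChildrenSketch
{
    public static IEnumerable<string> Children(this JsonReader reader)
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            // Each iteration reads a property name, then steps onto its value.
            while (reader.Read() && reader.TokenType == JsonToken.PropertyName)
            {
                var name = (string)reader.Value;
                reader.Read();
                yield return name;
                SkipIfUnread(reader);
            }
        }
        else if (reader.TokenType == JsonToken.StartArray)
        {
            // Array elements have no name, so null is yielded for each one.
            while (reader.Read() && reader.TokenType != JsonToken.EndArray)
            {
                yield return null;
                SkipIfUnread(reader);
            }
        }
    }

    // If the caller left the reader on the start of a nested object or
    // array, jump to its closing token so the outer loop stays in sync.
    private static void SkipIfUnread(JsonReader reader)
    {
        if (reader.TokenType == JsonToken.StartObject
            || reader.TokenType == JsonToken.StartArray)
        {
            reader.Skip();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        const string json = "{ \"baz\": \"base\", \"arr\": [1, 4, 8] }";
        using (var reader = new JsonTextReader(new StringReader(json)))
        {
            reader.Read(); // position the reader on StartObject
            var names = new List<string>();
            long total = 0;
            foreach (var name in reader.Children())
            {
                names.Add(name);
                if (reader.TokenType == JsonToken.StartArray)
                {
                    foreach (var unused in reader.Children())
                    {
                        total += (long)reader.Value;
                    }
                }
            }
            Console.WriteLine(string.Join(",", names)); // baz,arr
            Console.WriteLine(total);                   // 13
            Console.WriteLine(reader.TokenType);        // EndObject
        }
    }
}
```

One limitation of this sketch: if the caller reads only part of a nested object or array, the outer enumeration loses its place. A sturdier version would track reader.Depth.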

All this makes it a lot easier to walk a whole document:

reader.AdvanceTo("foo");
Assert.AreEqual("fou", reader.Value);
reader.AdvanceTo("bar");
Assert.AreEqual("barre", reader.Value);
reader.AdvanceTo("obj");
var enumerator = reader.Children().GetEnumerator();
enumerator.MoveNext();
Assert.AreEqual("baz", enumerator.Current);
Assert.AreEqual("base", reader.Value);
enumerator.MoveNext();
Assert.AreEqual("arr", enumerator.Current);
var arrayValues = new List<long>();
foreach(var shouldBeNull in reader.Children())
{
    Assert.IsNull(shouldBeNull);
    arrayValues.Add((long)reader.Value);
}
Assert.AreEqual(1, arrayValues[0]);
Assert.AreEqual(4, arrayValues[1]);
Assert.AreEqual(8, arrayValues[2]);
Assert.IsFalse(enumerator.MoveNext());
reader.AdvanceTo("after");
Assert.AreEqual("apres", reader.Value);
reader.AdvanceTo("number");
Assert.AreEqual(42L, reader.Value);
reader.AdvanceTo("end");
Assert.AreEqual("fin", reader.Value);

Of course, this is overkill for small documents: in those cases, you’ll be better off parsing the whole thing in memory and exploring it with a higher-level API such as Newtonsoft’s LINQ to JSON (JObject). For huge documents, however, I hope this little helper library can be helpful to others.
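For comparison, here is roughly what the small-document case looks like when the whole thing is parsed in memory with Newtonsoft’s JObject. The class name and the trimmed-down sample document are only for illustration.

```csharp
using System;
using Newtonsoft.Json.Linq;

public static class SmallDocumentDemo
{
    public static void Main()
    {
        const string json =
            "{ \"foo\": \"fou\", \"obj\": { \"baz\": \"base\", \"arr\": [1, 4, 8] }, \"number\": 42 }";

        // The whole document is loaded at once, so any value can be
        // reached in any order, at the cost of holding it all in memory.
        JObject doc = JObject.Parse(json);

        string baz = (string)doc["obj"]["baz"];
        long last = (long)doc.SelectToken("obj.arr[2]");

        Console.WriteLine(baz);  // base
        Console.WriteLine(last); // 8
    }
}
```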

Find the extension methods here: https://gist.github.com/bleroy/a5f48372320c3bdb84ad, and the tests here: https://gist.github.com/bleroy/1f3a249be06ce16dba86.
