Some notes on representing JSON in Prolog

In JSON and many other languages, structured data is represented as a set of key/value pairs. This can either be static (compiled languages) or dynamic (Python, JSON, JavaScript, …).

For a low number of keys, Prolog uses compound terms with positional arguments. My favorite example is a point in a 2D environment. The vast majority of the Prolog programmers will represent a point as point(X,Y). The vast majority of languages with key/value pairs will use {x:X, y:Y} or typedef struct { int x; int y; } point;, etc.

If the number of keys gets too large, the compound with positional arguments becomes impractical or impossible. This may be resolved using e.g. SWI-Prolog’s library(record) that allows for e.g.

:- record point(x:integer, y:integer).

which is term expanded to represent the point as point(X,Y), providing access predicates for the members by name.

If the keys are dynamic, we need some sort of dynamic structure in Prolog. Options are [key(Value)|...], [key-Value|...], [key=Value|...], [key:Value|...], (since Janus) {key:Value, ...} and AVL trees. The first series has the advantage that they are easy to type and process. The main disadvantage is that they need to be wrapped in some compound to make the thing known as key/value set rather than just a list and accessing them is O(n). The {key:Value, ...} looks naturally, is easy to type and doesn’t need wrapping. It is also O(n), and requires a dedicated set of predicates to examine, compose, etc. AVL trees are unreadable, fairly large and cannot simply be entered in a source file. They have a good O(log(N)) library to manage them.

This, as a side note, is the motivation behind SWI-Prolog’s dicts: they have a good syntax, compact representation, can deal with dynamic keys and have O(log(N)) access. Like compounds has =../2, dicts have dict_pairs(?Dict,?Tag,?Pairs) to (dis)assemble them. The representation is a compound with a reserved symbol as functor:

C'dict(Tag, V1, K1, V2, K2, ....)

where Kn are atoms or small (inline) integers sorted by handle. So, lookup is O(log(n)) and merging is O(n), all with a very low constant factor. The break-even for inserting into a dict vs. AVL tree is around 1,000 keys.

As dicts have their keys in canonical order, ==/2 has its normal meaning.

Finally, Prolog many choose to represent the data as unit clauses. That can be interesting, but is outside the scope of this note.

Use cases

Generate a JSON reply from Prolog

Typically, the Prolog data is a mixture of data that is preferably simply encoded in the source and generated by the program. For example, we want Prolog to compute a set of polygons and some meta-data. We would like to write something like this

   setof(Poly, polygon(Poly), Polies),
   get_time(Now),
   reply_json({time:Now, polygons:Polies}).

But, each Poly is naturally a term point(X,Y). SWI-Prolog has a library(json_convert) that allows defining the mapping from point(X,Y) to {x:X, y:Y} or {type:point, x:X, y:Y}, etc. Given that, we can call prolog_to_json/2 that recursively processes a Prolog term and applies the defined mapping to convert compounds into JSON key/value terms. Similarly, json_to_prolog/2 does the inverse.

See https://www.swi-prolog.org/pldoc/man?section=jsonconvert

Consuming JSON

Here, we do not have to type anything. In many cases the number of keys per object is low enough for a linear scan. If not, we can explicitly create an AVL tree. Typically, we want

Extracts a selection of the keys often nested inside multiple layers of objects.
Translate sets of objects to compounds

SWI-Prolog dicts are a good match, offering a compact and fast representation. The exact representation doesn’t matter too much, while the access and possible conversion utilities do.