Some notes on representing JSON in Prolog
In JSON and many other languages, structured data is represented as a set of key/value pairs. This can either be static (compiled languages) or dynamic (Python, JSON, JavaScript, …).
For a low number of keys, Prolog uses compound terms
with positional arguments. My favorite example is
a point in a 2D environment. The vast majority of the
Prolog programmers will represent a point as
point(X,Y)
. The vast majority of languages
with key/value pairs will use {x:X, y:Y}
or
typedef struct { int x; int y; } point;
,
etc.
If the number of keys gets too large,
the compound with positional arguments becomes impractical
or impossible. This may be resolved using
e.g. SWI-Prolog’s library(record)
that allows
for e.g.
:- record point(x:integer, y:integer).
which is term expanded to represent the point as point(X,Y), providing access predicates for the members by name.
If the keys are dynamic, we need some
sort of dynamic structure in Prolog. Options are
[key(Value)|...]
,
[key-Value|...]
,
[key=Value|...]
,
[key:Value|...]
, (since Janus)
{key:Value, ...}
and AVL trees. The first
series has the advantage that they are easy to type and
process. The main disadvantage is that they need to be
wrapped in some compound to make the thing known as
key/value set rather than just a list and accessing them
is O(n). The {key:Value, ...}
looks
naturally, is easy to type and doesn’t need wrapping. It
is also O(n), and requires a dedicated set of predicates
to examine, compose, etc. AVL trees are unreadable, fairly
large and cannot simply be entered in a source file. They
have a good O(log(N)) library to manage them.
This, as a side note, is the motivation behind SWI-Prolog’s dicts: they have a good syntax, compact representation, can deal with dynamic keys and have O(log(N)) access. Like compounds has =../2, dicts have dict_pairs(?Dict,?Tag,?Pairs) to (dis)assemble them. The representation is a compound with a reserved symbol as functor:
C'dict(Tag, V1, K1, V2, K2, ....)
where Kn are atoms or small (inline) integers sorted by handle. So, lookup is O(log(n)) and merging is O(n), all with a very low constant factor. The break-even for inserting into a dict vs. AVL tree is around 1,000 keys.
As dicts have their keys in canonical order, ==/2 has its normal meaning.
Finally, Prolog many choose to represent the data as unit clauses. That can be interesting, but is outside the scope of this note.
Use cases
Generate a JSON reply from Prolog
Typically, the Prolog data is a mixture of data that is preferably simply encoded in the source and generated by the program. For example, we want Prolog to compute a set of polygons and some meta-data. We would like to write something like this
setof(Poly, polygon(Poly), Polies),
get_time(Now),
reply_json({time:Now, polygons:Polies}).
But, each Poly is naturally a term point(X,Y).
SWI-Prolog has a library(json_convert)
that
allows defining the mapping from point(X,Y)
to {x:X, y:Y}
or
{type:point, x:X, y:Y}
, etc. Given that, we
can call prolog_to_json/2
that recursively
processes a Prolog term and applies the defined mapping to
convert compounds into JSON key/value terms. Similarly,
json_to_prolog/2 does the inverse.
See https://www.swi-prolog.org/pldoc/man?section=jsonconvert
Consuming JSON
Here, we do not have to type anything. In many cases the number of keys per object is low enough for a linear scan. If not, we can explicitly create an AVL tree. Typically, we want
- Extracts a selection of the keys often nested inside multiple layers of objects.
- Translate sets of objects to compounds
SWI-Prolog dicts are a good match, offering a compact and fast representation. The exact representation doesn’t matter too much, while the access and possible conversion utilities do.