A syntax or labelled sub terms

Motivation

Prolog defines a syntax that can only express trees of sub terms. At runtime however, a Prolog term can be any directed graph where the nodes are compound terms and the leaves are atomic terms. Such terms cannot be written and read while preserving the structure. There are two cases where this is problematic.

  1. We cannot represent rational trees, also called cyclic terms. Such terms can be created by unification. The typical example is ?- X = f(X)..
  2. We cannot transmit terms with internal sharing. Using write_term/2 followed by read_term/2 creates a copy that is a clean tree. The original term with sharing and the tree are equivalent under ==/2, but they have a different layout in memory.

While (1) is a clear omission, this is less clear for (2). Some notes on this:

  • Internal sharing is used in practice. Consider library(rbtrees) from YAP and ported to various systems. This library uses a nil node with the shape black('',_,_,''). This node is stored in the toplevel term and reused for creating new nodes. Using this nil-node simplifies the algorithms with only a minor space overhead. If, however, we write and read such a tree, a copy of the nil node ends in each node.
  • We could argue that read_term/2 could minimize the term by finding common sub terms and using only a single copy thereof.
  • With labelled sub terms in the syntax we can reduce the amount of data transferred and improve the readability of large terms with a lot of sharing.
  • Whether a term is a copy or not matters when using setarg/3. This could be resolved using SISCtus support for mutable terms or by adding a fresh variable to a compound on which you want to perform setarg/3. A fresh variable avoids unwanted sharing by smart implementations of copy_term/2 as well as smart versions of read_term/2, garbage collection, etc.

Solutions

SICStus introduced write_term/2 using the option cycles(true) which, if a term is cyclic, writes @(Skeleton, Substitutions), e.g. a cyclic term X = f(X) is written as @(X, [X=f(X)]). The predicate read_term/2 has the same option and on reading such a term the Substitutions are unified and the resulting Skeleton is returned. This approach has been copied by SWI-Prolog. It has two disadvantages:

  • It is ambiguous.
  • Cyclic terms cannot be embedded into larger terms as @(_,_) only has a special meaning if it is the top-term. We can of course generate cycles at any level, e.g. @(t(X), [X=f(X)]).

ECLiPSe introduced an extension to its attributed variable syntax. ECLiPSe supports Var{att1: Val1, attr2: Val2, ...} to denote Var with the given attributes. ECLiPSe introduces a reserved attribute =. When present, the variable is unified to the corresponding value. E.g., X = f(X) becomes X{= : f(X)}. Note the space between the = and : is required to satisfy Prolog tokenization.

Note that SICStus does not write attributes, but supports copy_term/3 to obtain a copy of a term holding attributed variables. The copy has no attributes and the 3rd argument is unified to a list of goals that reinstantiate the constraints when called.

Danielle Church has proposed on the SWI-Prolog Discourse group to allow for (Var) Term. The equivalent ECLiPSe notation is Var{= : Term}. The main disadvantage of this is the ambiguity. It was proposed to allow for

p((X) hello(Y)) :- q(X).

Eventually it was rejected in favour of compiler support to make the code below having all benefits of hello(Y) appearing in the head.

p(X) :- X = hello(Y), q(X).

Alternative proposals

A few variations on the ECLiPSe syntax proposed are

  • X{_: f(X)}. The advantage is that it requires no reserved attribute. Otherwise it is considered less clean.
  • X{X: f(X)}. Similar issues

Issues

How to handle a term contains multiple attribute constructs for the same variable?

Two proposals are

  • One option is to make each Var{...} create an attributed term and use unification if we find multiple instances of Var. This implies we need to call the constraint solver inside read/1. Constraint solvers may fail, raise exceptions or be non-deterministic. It is undesirable to propagate this to read/1.
  • The current ECLiPSe and SWI-Prolog implementations allow for multiple Var{...} on the same variable, but:
    • If the same attribute is provided, the value must be the same under ==/2. In that case the second instance is ignored. Otherwise a syntax error is raised.

Can the = attribute be combined with other attributes?

This is not meaningful and problematic as it would require constraint solving.

What about Var{}?

This can be

  • A syntax error (current ECLiPSe)
  • The same as Var (current SWI-Prolog)
  • Create an attributed variable without attributes (if the system supports this notion).

Note that <atom>{} is already valid syntax, both for struct/argnames and for dynamic dicts (currently SWI-Prolog only).

Implications

Providing this syntax poses some new challenges to the Prolog system. Consider the query below. Correctly executed it should print an infinite number of lines.

?- G{= : (writeln('Hello world'), G)}.

However, current SWI-Prolog’s toplevel goal analysis as well as term and goal expansion loop on body terms that contain cyclic control structures.



The Prolog Implementers Forum is a part of the "All Things Prolog" online Prolog community, an initiative of the Association for Logic Programming stemming from the Year of Prolog activities.