Dicts in Prolog

[Note: If you have comments please post them at the Prolog Community Discourse for this PIP]

META-DISCUSSION FOR THIS PIP

The PIP covers several aspects. What should be covered in this PIP?

  1. An API for representing and manipulating key/value structures? (e.g., dict_new(point, [x:10,y:20], P))
  2. Nice syntax to represent key/value structures? (e.g., P = point{x:10,y:20})
  3. Nice syntax to manipulate key/value structures? (e.g., X = P.x as dict_get(x,P,X))

Some thoughts about the points above:

  1. If this the the only goal of the PIP, it only needs to provide a standardized key/value data type. It can be: 1.1. Abstract data type with no fixed representation 1.2. (partially) fix the internal representation to ensure some properties (serialization, unification, interaction with builtins).
  2. Depending on the internal representation, syntax may have a: 2.1. cheap translation (statically at compile time?) or 2.2. complex translation (dynamically at runtime?)
  3. Keep this outside the PIP? (this is functional notation).

One of the most interesting aspects of SWI’s dictionaries is that they seem to cover 1.2 and 2.1 (and 3, which Ciao can do with generic fsyntax). It’d be good to emphasize those points:

A proposal:

  • P = point{x:10,y:20} has a fixed part (tag and keys) and a variable part (arguments). We encode all the information in the functor as '{}point.x.y'(10,20).

    Idea based on https://cliplab.org/papers/indexed-terms-lopstr-2014-postproc-2015.pdf (“pre-indexing terms”).

  • Tried in a prototype implementation (that should be easily ported to other systems).

  • Similar to argnames, but dynamic. API for pretty printing, getting/setting keys, etc. detects the special '{}point.x.y' encoding, fills a key/index table for the tag, and continues.

  • Compatible with static dictionary versions, as declared functors can store key/index table.

  • Good properties with minimal/no changes:

    • Notation can be implemented in read/write with prolog flags.
    • point{x:11,y:21} = point{y:Y,x:X} is really cheap
    • No semantic changes (just structures with additional information to access arguments by name). Interoperability with standard Prolog predicates ensured:
      • =../2, ground/1, functor/3, arg/3, …
    • Portable term serialization.

PIP - Representing dicts in Prolog

Many programming languages have a data structure to represent a set of key-value pairs. Think of records, structs, property lists, objects, maps, etc. Prolog has a wide variety of ways to deal with this. To name a few:

  • Lists of Key-Value
  • Lists of Key=Value
  • Lists of Key:Value
  • Lists of Key(Value)
  • library(assoc) and/or library(rbtrees)

Some drawbacks of these data structure implementations:

  • Often complexity of operations is O(log(n)) or worse (not O(1)).
  • Unordered list based versions are comfortable to read and write, but any more sophisticated version requires inspection through the API.
  • A further complication occurs if we use any of this to represent nested data structures. Now it may become ambiguous whether the list is a key-value set or just a list. In any case, the empty list [] is ambiguous. This implies we must wrap the key-value list in a functor to disambiguate, making it harder to use the Prolog list manipulation functionality directly.

However, Prolog has O(1) access for compound terms (using the arg/3 and functor/3 predicates) that use positional arguments, i.e., a compound is what could be known elsewhere as a named array, where the functor is the name.

Prolog needs an easy to read and write as well as standard representation for key-value sets, which is useful for:

  • As efficient dynamic named records (essential for promoting abstraction, encapsulation, and reusability, ensuring type safety, easing maintenance, and facilitating communication among developers).

  • Facilitate portable interaction with other languages and data formats such as JSON.

Contributions of this PIP

  • A dictionary data type whose semantics are compatible with representation as standard Prolog terms.
  • Ideally similar to functor/3 and arg/3 but using atomic keys to access arguments rather than numeric indices.
  • (Optional) syntax for dictionaries.

Related

There are some Prolog systems with built-in support for sets of key-value pairs.

  • In ECLiPSe we can define named arguments for a compound, after which name{k1:v1, k2:v2} is mapped to name(…, v1, …, v2, …) where unspecified arguments are left unbound. This does not allow for dynamic key-value sets.

  • In Ciao Prolog, name${k1=>v1, ..., kn=>vn} using the argnames package. Keys must be declared. Unspecified arguments are left unbound. This does not allow for dynamic key-value sets.

  • In SWI-Prolog, the same name{k1:v1, k2:v2} maps to a dynamic data type, represented as a compound with reserved functor:

    (name, v1, k1, v2, k2).

    Where keys are ordered such that many operations can be completed in O(log(N)).

Semantics and operations

Requirements:

  • Idea: like functor/3+arg/3 but using atomic keys to access arguments rather than numeric indices.
  • Semantics:
    • A new term tag{k1:v1, ..., kn:vn}, with ki sorted. (syntax is optional, dict_pairs/3)
    • Equivalent to a term 'tag+k1+...+kn'(v1,...,vn) where 'tag+k1+...+kn' encodes information for tags the key/index mapping.
    • Implementations may chose other representations.
  • Internally representable as Prolog terms, do not need to modify the unification algorithm.
  • Order of keys is irrelevant.

Operations on dicts

  • [Jose] tag missing from operations, is that a special key?

  • is_dict(@Term) is semidet.

    Type test.

  • [Jose] dict_new/2, dict_empty/1? or using dict_pairs/2?

  • dict_get(+Path, +Dict, -Value) is semidet.

    Get a Value from Dict. Path is either an atom or a term Key1/Key2/... to access values in tested dicts.

  • dict_set(+Path, +Dict, ?Value, -NewDict, +NewValue) is semidet.

    Replaces Value inside Dict by NewValue, returning a new dict.

  • dict_get_ex(+Path, +Dict, ?Value) is det.

    As dict_get/3, but raise an exception of a key does not exist.

  • put_dict(+Key, +DictIn, +Value, -DictOut) is det.

    Add or replace the value associated to Key

  • put_dict(+New, +DictIn, -DictOut) is det.

    Merge New into dict. DicttOut contains the union of the keys, with all keys that appear in both dicts with the associated the the value as it appears in New.

  • del_dict(+Key, +DictIn, ?Value, -DictOut) is semidet.

    Remove Key:Value from DictIn

  • dict_pairs(?Dict, ?Pairs)

    Similar to Univ (=..), translating between a dict and a list of Key-Value Pairs.

SWI-Prolog also defines

  • [Jose] API predicates without those operators?

  • Dict1 :< Dict2 is semidet.

    True when all keys in Dict1 also appear in Dict2 and the values unify. This is useful for fetching multiple values efficiently.

  • Dict1 >:< Dict2 is semidet.

    True when the values of the shared keys can be unified. This implies the dicts are “compatible”.

(TODO) Challanges, comments, etc.

  • [Jose] SWI allow tree variables in dict tag name (anonymous dicts). This makes the term<->dict equivalence presented above impossible (like _(A) = f(3) in plain Prolog).

  • [Jose] Although syntax may not be mandatory for this PIP, it is convenient to use a syntactic extensions for normalization purposes, so that two different implementations or representations of dictionaries can interchange data without exposing the internal representation.

  • [Jose] Writing/reading/serializiing static dictionaries is trivial when they are seen as syntactic sugar for terms (like DCGs for heads), but a bit complicated if this is hidden (since the static declarations are erased).

Appendix: Syntax (Jan’s first proposal, Janus-compatible syntax)

[Jose] f({a:1, b:2}) = f({b:2, a:1}) will not unify without a term translation.

With the Janus Python interface, we settled using

{k1:v1, k2:v2, ...}

optionally wrapped in a functor py/1. Note that wrapping is necessary because {} can both be an empty key/value set and an atom, which may be mapped to a string in the target languages. Note that this notation is syntactically compatible with the SWI-Prolog and ECLiPSe notation if the py is defined as a prefix operator.

We propose to use the f({k1:v1, k2:v2, ...}) as the core of the key-value set representation, where we need to decide on f. Options:

  • Leave it to the user, optionally by letting the user declare admissible names.
  • Specify per source/destination, e.g., py{...} or json{...} The advantage of this is we deal with small differences. For example, Python has None where JSON has null.
  • Settle on a single, language neutral name. (In SWI-Prolog I tend to use #{…} for “anonymous” dicts).

Appendix: Syntax (Jan’s second email)

To recall our discussion on this topic, I think this is the status about which we wanted to think for some time.

  • The syntax is tag{k1:v1, …}
  • Tags are atoms that need a declaration. In ISO Prolog, a side effect of this declaration is that Tag becomes a low-priority prefix operator. As is, only ECLiPSe and SWI-Prolog add a special meaning to tag{...} when there is no space between the tag and the ‘{’ (similar to compound terms).
  • All keys are atoms. Values can be any Prolog term.
  • Dicts shall allow for a dynamic version, i.e., where the set of keys is not known at compile time. BUT
    • Systems can define that specific tags refer to a dict with a fixed set of keys.

Note that libraries may accept {…} as a key-value set in addition to the above.



The Prolog Implementers Forum is a part of the "All Things Prolog" online Prolog community, an initiative of the Association for Logic Programming stemming from the Year of Prolog activities.