Implementation of CML2

Deduction algorithm

CML2 is not built around an algorithm for the propositional satisfiability (or SAT) problem , such as GSAT or SATO. Given that CML2 is a constraint-based language, this might at first seem curious. But there are special features of CML2's domain that would make these relatively poor choices and difficult to apply.

The Linux kernel configuration problem for which CML2 was originally invented and tuned involves approximately 1750 symbols. Of these, however, fewer than 400 participate in about the same number of constraints, mostly implied constraints of simple (a <= b) form. However, many of the variables are not booleans but tristates, blowing the equivalent problem back up to about 1600 boolean shadow variables over 400 constraints, and making the translation of the problem into pure boolean-propositional form a significant exercise in itself.

Complete SAT algorithms like SATO are very time-consuming (the problem is NP-complete), enough to cause unacceptable lag in an interactive configurator on a problem this size. Incomplete SAT algorithms like the stochastic GSAT technique are not guaranteed to yield an answer even though one may exist. But there is a more fundamental problem: we do not actually want a "model" in the SAT sense; where variables are underconstrained we want to leave them as don't-cares (e.g. not set them.)

(Good references on the SAT problem are available on the Web. SATO is the best-of-breed among complete SAT algorithms and the GSAT Page includes a Python implementation, and some good discussion of the algorithm's limitations. Michael Littman has also assembled a mini-survey of the literature.)

CML2 instead uses a relatively simple custom algorithm independently invented by the author, and more closely related to the resolution method of the original Davis-Putnam elimination algorithm than to the splitting-rule approach of SATO and other modern SAT techniques. The CML2 algorithm exploits facts about the domain. One is the fact that we don't actually need to find a full model. Another is the fact that many of the constraints (about 3/4 in the Linux-kernel problem) are simple chains of the form x1 <= x2 <= x3...<= xn created by suppress dependent declarations or sub-menu bracketing with {}.

When the value of a symbol is changed, CML2 tries to find variables it can force. It does this by simplifying the value of all frozen, "chilled", and tentatively set variables out of constraints. In any conjunction that this simplification leaves, the code looks for relationals with a constant on one side and a mutable symbol on the other. When these relationals constrain the symbol to a single value, that value is forced and the symbol is marked "chilled".

The assignment of the forced symbol is itself done using the same algorithm. Redundant assignments are ignored; an attempt to set a chilled symbol means the ruleset has inconsistent constraints and raises a fatal error.

Implementation

The reference implementation of CML2 is written in Python. This choice has an obvious virtue and a couple of non-obvious ones.

The obvious virtue is that Python is a a truly high-level language that does memory management automatically, eliminating the single most common source of bugs in languages without this property. However, several other non-obvious virtues are equally important. Here are some of them:

  • "Batteries are included." While Python does not have as rich an extension-module base as its main competitor Perl, rather more capability is bundled with the stock Python interpreter. One built-in facility that is particularly important for CML2 is Python's "pickle" or object-serialization support. A CML2 rulebase is a pickled object.

  • Python, unlike other scripting languages, can be (effectively) compiled to pure C using the freeze facility. The translation is not pretty, and produces rather large C programs from even small Python sources, but it does meet the problem of portability head-on. Kernels could be shipped with a precompiled rulebase and a frozen C version of the CML2 interpreter to avoid the requirement for Python.

  • Another non-obvious virtue is the way that Python supports conditional runtime loading of support modules. We can use this to detect and recover from situations in which the library support does not exist to provide Tcl-based or curses-based front end.

Note: the CML2 implementation uses two modules that are not part of stock Python. One is an enhanced version of shlex.py slated to enter the stock environment in Python 1.6. The other is John Aycock's SPARK toolkit, used for expression parsing. Copies of shlex.py and spark.py are shipped with the CML2 implementation.