CML2 Language Elements

Syntax

Lexically, a CML2 program consists of tokens: barewords, whitespace, strings, and punctuation (there is one exception, associated with the icon declaration). A bareword is a token composed of alphanumeric characters and _ (underscore). Whitespace is any mix of spaces, tabs, and linefeeds. A string is delimited by either single or double quotes and may contain whitespace. Everything else is punctuation. Some pairs of punctuation characters stick together: ==, !=, <=, >=. All other punctuation is treated as single-character tokens.

Here are lexical rules, regular expressions describing valid tokens:
 
<symbol>      ::= [A-Z][A-Za-z0-9_]*
<menu-id>     ::= [a-z][a-z0-9_]*
<string>      ::= '[^']*'|"[^"]*"
<decimal>     ::= [0-9]+
<hexadecimal> ::= 0x[A-Fa-f0-9]+
 
base64-data is any number of lines in RFC2045 base64 format, terminated by a newline or comment.

Also, note that there is a lexical-level inclusion facility. The token "source" is interpreted as a request to treat the immediately following token as a filename (string quotes are first stripped if it is a string). Upon encountering this directive, the compiler opens the file (which may be a relative or absolute pathname) and reads input from that file until EOF. On that EOF, the current input source is immediately popped back to the including file.

Comments are supported, and run from a # to end-of-line.
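The lexical rules above can be illustrated with a small tokenizer. This is a minimal sketch in Python, not the lexer of the reference implementation; the names TOKEN_RE and tokenize are hypothetical, and neither the base64 exception for the icon declaration nor the source directive is handled here.

```python
import re

# Hypothetical token pattern; alternatives are tried in order.
TOKEN_RE = re.compile(r"""
    (?P<comment>\#[^\n]*)          # comment: from '#' to end of line
  | (?P<space>\s+)                 # whitespace: spaces, tabs, linefeeds
  | (?P<string>'[^']*'|"[^"]*")    # single- or double-quoted string
  | (?P<word>[A-Za-z0-9_]+)        # bareword: alphanumerics and underscore
  | (?P<pair>==|!=|<=|>=)          # punctuation pairs that stick together
  | (?P<punct>.)                   # any other single punctuation character
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, text) pairs, dropping whitespace and comments."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        if kind in ("comment", "space"):
            continue
        tokens.append((kind, match.group()))
    return tokens
```

Classifying a bareword further as a symbol, a menu-id, or an integer literal would be a second step applied to the "word" tokens.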

Here is a BNF of the grammar. Following it, each language element will be described in detail.

 
;; A few things we need to define up front...
;;
<tritval> ::= 'y' | 'm' | 'n'		;; Yes, module, or no
<name>    ::= <menu-id>|<symbol>	;; A bareword composed of alphanumerics
<int>     ::= <decimal> | <hexadecimal> ;; Integer literal

;; A CML system consists of a sequence of declarations.
;;
<system> ::= <declaration>*

;; A declaration may be of one of the following types:
;;
<declaration> ::= <source-statement>
              |   <symbols-declaration>
	      |   <menus-declaration>
	      |   <helpfile-declaration>

	      |   <private-declaration>
	      |   <visibility-rule>
	      |   <menu-definition>
	      |   <radio-definition>
	      |   <derive-definition>
	      |   <default-definition>
	      |   <requirement-definition>

	      |   <start-definition>
	      |   <prefix-definition>
	      |   <banner-definition>

	      |   <options-definition>
	      |   <condition-declaration>
	      |   <warndepend-declaration>
	      |   <icon-definition>
	      |   <debug-definition>

;; A source statement declares a file to be inserted in place of
;; the source statement in the ruleset.
;;
<source-statement> ::= 'source' <string>

;; A symbols definition creates configuration symbols, 
;; and associates prompt strings with them.
;;
<symbols-declaration> ::= 'symbols' {<symbol> <string>}*

;; A menus definition creates menu symbols, 
;; and associates banner strings with them.
;;
<menus-declaration> ::= 'menus' {<menu-id> <string>}*

;; A helpfile declaration declares a file to be scanned for help text
;; 
<helpfile-declaration> ::= 'helpfile' <word>

;; A private declaration declares that the associated symbols should
;; not be exported in the end-result configuration
;;
<private-declaration> ::= 'private' <symbol>*

;; A visibility rule associates a visibility predicate with symbols.
;; Optionally, it may declare that the suppressed symbols are constrained
;; in value by the predicate symbols.
;;
<visibility-rule> ::=  'unless' <logical> 'suppress' ['dependent'] <name>*;

;; A menu definition links a sequence or subtree of symbols with a 
;; menu identifier.  Subtrees generate implied dependency statements.
;;
<menu-definition> ::= 'menu' <menu-id> <name-or-subtree>*
<name-or-subtree> ::= <name> <suffix>
		  |   <name> <suffix> '{' <name-or-subtree>* '}'
 
<suffix> ::=		;; Empty suffix declares boolean type
	 | '?'		;; declares trit type
	 | '%'		;; declares decimal type
	 | '@'		;; declares hexadecimal type
	 | '$'		;; declares string type

;; A radio-menu definition links a choice of symbols with a menu identifier.
;;
<radio-definition> ::= 'choices' <menu-id> <symbol>* 'default' <symbol>

;; A derivation binds a symbol to a formula, so the value of that
;; symbol is always the current value of the formula.
;;
<derive-definition> ::= 'derive' <symbol> 'from' <expr>

;; A default definition sets the value a symbol will have unless it is
;; explicitly set by the user's answer to a question. It may have a 
;; range specification attached.
;;
<default-definition> ::= 'default' <symbol> 'from' <expr> 
                                   ['range' {<int> | {<int> '-' <int>}}+]

;; A requirement definition constrains the value of one or more symbols 
;; by requiring that the given expression be true in any valid configuration.
;;
<requirement-definition> ::=  {'require'|'prohibit'} <logical>

;; We have to declare a start menu, for the beginning of execution
;;
<start-definition> ::= 'start' <menu-id>

;; A prefix definition sets a string to be prepended to all symbols
;; when they are named in a configuration file.
;;
<prefix-definition> ::= 'prefix' <string>

;; A banner definition sets a string to be used in the configurator
;; greeting line.
;;
<banner-definition> ::= 'banner' <menu-id>

;; An option definition sets command-line options for the configurator
;;
<options-definition> ::= 'options' <word>*

;; A condition statement ties a CML2 control flag to a symbol
;;
<condition-declaration> ::= 'condition' <word> 'on' <configsymbol>

;; A warndepend flags symbols that make dependents dangerous
;;
<warndepend-declaration> ::= 'warndepend' <name>*

;; An icon definition associates data for a graphic icon with the
;; rulebase.
;;
<icon-definition> ::= 'icon' <base64-data>

;; A debug definition enables debugging output.
;;
<debug-definition> ::= 'debug' <decimal>

;; An expression is a formula
;;
<expr> ::= <expr> '+' <expr>
        | <expr> '-' <expr>
	| <expr> '*' <expr>
	| <logical>

<logical> ::= <logical> 'or' <logical>
	  | <logical> 'and' <logical>
	  | <logical> 'implies' <logical>
	  | <relational>

<relational> ::= <term> '==' <term>
		 | <term> '!=' <term>
		 | <term> '<=' <term>
		 | <term> '>=' <term>
		 | <term>
		 | 'not' <relational>

<term> ::= <term> '|' <term>	;; maximum or sum or union value 
       | <term> '&' <term>	;; minimum or multiple or intersection value
       | <term> '$' <term>	;; similarity value
       | <atom>

<atom> ::= <symbol>
        | <tritval>
        | <string>
        | <decimal>
        | <hexadecimal>
	| '(' <expr> ')'
 

Operators have the precedence implied by the above productions. From least to most binding, precedence classes are:

 
      1: + -
      2: *
      3: implies
      3: or
      4: and
      5: not
      6: ==, !=, >=, <=, >, <
      7: &, |, $
 

Data types and classes

CML2 supports the following data types:

  • Booleans. These may have the values `y' or `n'

  • Tristates or trits. These may have the values `y', `m', or `n'

  • Decimal integers. 32-bit signed integers with decimal I/O formatting.

  • Hexadecimal integers. 32-bit signed integers with hexadecimal I/O formatting.

  • Strings. Strings are literal data in the ASCII character set encoding.

Support for trits may be disabled at runtime. See the section called Condition statement for discussion of the condition/on declaration.

There are four classes of symbols; constant symbols, query symbols, derivation symbols, and frozen symbols.

A constant is one of the boolean/tristate literals y or m or n, or an integer literal, or a string literal.

A query symbol is an ordinary, mutable symbol with a prompt string. Each query must occur exactly once in the menu tree. Query symbols may be set by the user.

A derivation is a symbol bound to an expression. Derivation symbols are immutable, but may vary as the symbols in their formula change value. Derived symbols have no associated prompt string and may not appear in the menu tree.

A frozen symbol is a query symbol which has been immutably bound to a particular value. Once frozen, the value of a symbol may not be changed.

Meaning of the language elements

Source statements

A source statement declares a file to be inserted in place of the source statement. The entire contents of that file are treated as if they were present in the current file at the point of the source statement.

Any implementation of CML2 must allow source statements to be nested to a depth of at least 15 levels. The reference implementation has no hard limit.
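The inclusion behavior can be sketched as a recursive expansion over a token stream. The following Python sketch is an illustration under simplifying assumptions, not the method of the reference implementation: files are supplied as an in-memory dictionary, tokens are split on whitespace (so strings containing whitespace are not handled), and the function name expand_sources and the check for recursive inclusion are hypothetical additions.

```python
def expand_sources(filename, files, stack=()):
    """Return the token list of `filename` with source directives expanded.

    `files` maps a filename to its text (a stand-in for the filesystem).
    `stack` records the chain of including files, used only to catch a
    file that includes itself, directly or indirectly.
    """
    if filename in stack:
        raise ValueError("recursive source of " + filename)
    tokens = files[filename].split()
    expanded = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "source" and i + 1 < len(tokens):
            # The next token is a filename; strip string quotes if present.
            target = tokens[i + 1].strip("'\"")
            expanded.extend(expand_sources(target, files, stack + (filename,)))
            i += 2
        else:
            expanded.append(tokens[i])
            i += 1
    return expanded
```

Because the recursion depth is bounded only by the interpreter, a sketch like this comfortably exceeds the required nesting depth of 15.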

Symbol declarations

The body of a symbols section consists of pairs of tokens; a configuration symbol and a prompt string.

Rationale: Having these name-to-prompt associations be separate from the dependency rules will help make the text parts of the system easier to localize for different languages. Declaring all query symbols up front means we can do better and faster sanity checks. Some symbols (derivations) are not pre-declared.

Menu declarations

The body of a menus section consists of pairs of tokens; a menu name and a banner string. The effect of each declaration is to declare an empty menu (to be filled in later by a menu definition) and associate a banner string with it.

Any implementation of CML2 must allow menus to be nested to a depth of at least 15 levels. The reference implementation has no hard limit.

Rationale: Having these menu-to-banner associations be separate from the dependency rules will help make the text parts of the system easier to localize for different languages. Declaring all menu names up front means we can do better and faster sanity checks.

Helpfile declarations

A helpfile declaration tells the compiler to mine a given file for help texts. The compiler's assumption is that the file is in the format of a CML1 help file: entries are begun by two lines, the first containing a prompt string and the second beginning with the string CONFIG_.

The format of helpfiles may be changed in future releases.
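As an illustration of the entry format described above, here is a minimal Python sketch of a help-text scanner. It assumes that an entry's help text runs from the line after the CONFIG_ line up to the prompt line of the next entry; the text above does not state that, so treat it as an assumption. The function name parse_helpfile is hypothetical.

```python
def parse_helpfile(text):
    """Map each CONFIG_ name to a (prompt, help text) pair.

    An entry starts at a non-blank line whose following line begins
    with the string CONFIG_.
    """
    lines = text.splitlines()
    starts = [
        i for i in range(len(lines) - 1)
        if lines[i].strip() and lines[i + 1].startswith("CONFIG_")
    ]
    entries = {}
    for n, i in enumerate(starts):
        # Assumed: the body extends to the next entry or to end of file.
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        symbol = lines[i + 1].strip()
        body = "\n".join(line.strip() for line in lines[i + 2:end]).strip()
        entries[symbol] = (lines[i].strip(), body)
    return entries
```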

Private declarations

A private declaration sets the private bit on each symbol of a list of symbol names. Symbols on which this bit is set are not written to the final configuration file.

Rationale: Sometimes you may want to make multiple queries, the results of which are not used directly in the configuration file but become independent variables in the derivation of a symbol that is used. In this kind of case, it is good practice to make the query symbols private.

Visibility rules

A visibility declaration associates a visibility predicate with a set of configuration symbols. The fact that several symbols may occur on the right side of such a rule is just a notational convenience; the rule

      unless GUARD suppress SYMBOL1 SYMBOL2 SYMBOL3

is exactly equivalent to

      unless GUARD suppress SYMBOL1
      unless GUARD suppress SYMBOL2
      unless GUARD suppress SYMBOL3

Putting a menu on the right side of a visibility rule suppresses that menu and all its children.

Dependence

Optionally, a rule may declare that the suppressed symbols are constrained in value by the predicate symbols. That is, if there is a rule

      unless GUARD suppress dependent SYMBOL

then the value of SYMBOL is constrained by the value of GUARD in the following way:

guard   trit    bool
-----   -----   ----
  y     y,m,n   y,n
  m     m,n     y,n
  n     n       n

The reason for this odd, type-dependent logic table is that we want to be able to have boolean option symbols that configure options for modular ancestors. This is why the guard symbol value m permits a dependent boolean symbol (but not a dependent modular symbol) to be y.
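The dependence table can be written as a small function. This Python sketch is only an illustration of the table; the names ORDER and allowed_values are hypothetical.

```python
# Trit ordering used for comparison: n < m < y.
ORDER = {"n": 0, "m": 1, "y": 2}


def allowed_values(guard, symtype):
    """Return the values a dependent symbol may take for a guard value.

    `symtype` is "trit" or "bool". A guard of n forces n; a boolean
    dependent may otherwise be y or n; a trit dependent may not exceed
    the guard.
    """
    if guard == "n":
        return {"n"}
    if symtype == "bool":
        return {"y", "n"}
    return {v for v in ORDER if ORDER[v] <= ORDER[guard]}
```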

If the guard part is an expression, SYMBOL is made dependent on each symbol that occurs in the guard. Such guards may not contain alternations or `implies'. Thus, if FOO and BAR and BAZ are trit symbols,

    unless FOO!=n and BAR==m suppress dependent BAZ

is equivalent to the following rules:

     unless FOO!=n and BAR==m suppress BAZ
     require BAZ <= FOO and BAZ <= BAR 

Putting a menu on the right side of a visibility rule with `dependent' puts the constraint on all the configuration symbols in that menu. Any submenus will inherit the constraint and pass it downward to their submenus.

Dependency works both ways. If a dependent symbol is set y or m, the value of the ancestor symbol may be forced; see the section called Symbol Assignment and Side Effects for discussion.

Rationale: The syntax is unless...suppress rather than if...query because the normal state of a symbol or menu is visible. The dependent construction replaces the dep_tristate and dep_bool constructs in CML1.

Menu definitions

A menu definition associates a sequence of configuration symbols and (sub)menu identifiers with a menu identifier (and its banner string). It is an error for any symbol or menu name to be referenced in more than one menu.

Symbol references in menus may have suffixes which change the default boolean type of the symbol. The suffixes are as follows:

	?      trit type
	%      decimal type
	@      hexadecimal type
	$      string type

A choices definition associates a choice of boolean configuration symbols with a menu identifier (and its banner string). It declares a default symbol to be set to y at the time the menu is instantiated.

In a complete CML2 system, these definitions link all menus together into a single big tree, which is normally traversed depth-first (except that visibility predicates may suppress parts of it).

If the list of symbols has subtrees in it (indicated by curly braces) then the symbol immediately before the opening curly brace is declared a visibility and dependency guard for all symbols within the braces. That is, the menu declaration

	menu foo 
	     SYM1 SYM2 {SYM3 SYM4} SYM5

not only associates SYM[12345] with foo, it also registers rules equivalent to

	   unless SYM2 suppress dependent SYM3 SYM4

Such subtree declarations may be nested to any depth.
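The way subtrees generate implied rules can be sketched as a walk over the token list of a menu body. This Python sketch is an illustration, not the compiler's algorithm: it assumes the body has already been tokenized and that type suffixes have been removed, and the name parse_menu_body is hypothetical.

```python
def parse_menu_body(tokens):
    """Return (members, rules) for a menu body given as a token list.

    `members` lists every name in order of appearance. `rules` lists
    (guard, dependent) pairs, each standing for an implied
    "unless guard suppress dependent <dependent>" rule.
    """
    members = []
    rules = []

    def walk(pos, guard, depth):
        previous = None
        while pos < len(tokens):
            token = tokens[pos]
            if token == "{":
                if previous is None:
                    raise ValueError("subtree has no guard symbol")
                # The name just before the brace guards the subtree.
                pos = walk(pos + 1, previous, depth + 1)
            elif token == "}":
                if depth == 0:
                    raise ValueError("unbalanced closing brace")
                return pos + 1
            else:
                members.append(token)
                if guard is not None:
                    rules.append((guard, token))
                previous = token
                pos += 1
        if depth > 0:
            raise ValueError("unclosed subtree")
        return pos

    walk(0, None, 0)
    return members, rules
```

For the menu foo above, the rules returned pair SYM2 with SYM3 and with SYM4, matching the equivalent unless/suppress rule.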

It is perfectly legal for a menu-ID to have no child nodes. In CML2, this is how you embed text in menus, by making it the banner of a symbol with no children.

Derivations

A derivation binds a symbol to a formula, so the value of that symbol is always the current value of the formula. Symbols may be evaluated either when a menu containing them is instantiated or at the time the final configuration file is written.

The compiler performs type inference to deduce the type of a derived symbol. In particular, derived symbols for which the top-level expression is an arithmetic operator are deduced to be decimal. Derived symbols for which the top level of the expression is a boolean operator are deduced to be bool. Derived symbols for which the top level of the expression is a trit operator are deduced to be trit.
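The inference rule can be pictured as a lookup on the top-level operator. The Python sketch below is an illustration only. It assumes that the relational operators count as boolean operators (guards such as FOO!=n are required to be boolean, which suggests this), and it does not cover a derivation whose formula is a bare atom. The names are hypothetical.

```python
ARITHMETIC_OPS = {"+", "-", "*"}
# Assumption: relationals are grouped with the boolean operators.
BOOLEAN_OPS = {"and", "or", "implies", "not", "==", "!=", "<=", ">=", "<", ">"}
TRIT_OPS = {"|", "&", "$"}


def infer_derived_type(top_operator):
    """Return the deduced type for a derivation's top-level operator."""
    if top_operator in ARITHMETIC_OPS:
        return "decimal"
    if top_operator in BOOLEAN_OPS:
        return "bool"
    if top_operator in TRIT_OPS:
        return "trit"
    raise ValueError("unknown operator: " + top_operator)
```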

Derived symbols are never set directly by the user and have no associated prompt string.

Defaults

A default definition sets the value a symbol will have until it is explicitly set by the user's answer to a question. The right-hand side of a default is not limited to being a constant value; it may be any valid expression.

Defaults may be evaluated either when a menu containing them is instantiated or at the time the final configuration file is written.

If a symbol is not explicitly defaulted, it gets the zero value of its type; n for bools and trits, 0 for decimal and hexadecimal symbols, and the empty string for strings.

The optional range part may be used to constrain the legal values of a decimal- or hexadecimal-valued symbol. A range specification consists of any number of either single values or paired lower and upper bounds separated by a dash, interpreted as inclusive ranges. The symbol has a legal value if it either matches a specified single value or is contained in one of the intervals.
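A range check along these lines can be sketched in a few lines of Python. The representation of a range specification as a list of integers and (low, high) tuples is an assumption made for the example, as is the name in_range.

```python
def in_range(value, spec):
    """Return True if `value` is legal under the range specification.

    `spec` is a list whose items are either single integers or
    (low, high) tuples; both bounds of a tuple are inclusive.
    """
    for item in spec:
        if isinstance(item, tuple):
            low, high = item
            if low <= value <= high:
                return True
        elif value == item:
            return True
    return False
```

A specification written as "range 1 4-8 16" would correspond to the list [1, (4, 8), 16] in this representation.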

Requirements

Requirements as sanity checks

A requirement definition constrains the value of one or more symbols by requiring that the given expression be true in any valid configuration. All constraints involving a given configuration symbol are checked each time that symbol is modified. Every constraint is checked just before the configuration file is written.

It is up to individual CML2 front ends to decide how to handle constraint violations. Here are some possible policies:

  • Complain and die. Not recommended, but perhaps appropriate for a batch-mode front end.

  • Conservative recovery: Disallow the modification that would violate the constraint. (Thus, earlier answers have priority over later ones.)

  • Flag-and-continue: visibly flag all symbols involved in a constraint violation (and unflag them whenever a constraint violation is fixed). Require the user to resolve all constraint violations before the configuration file is saved.

  • Backtracking: Present all the menus involved in the constraint. Accept modifications of any of them, but do not allow the modifications to be committed until all constraints are satisfied.

A prohibit definition requires that the attached predicate not be true. This is syntactic sugar, added to accommodate the fact that human beings have trouble reasoning about the distribution of negation in complex predicates.

Using requirements to force variables

Requirements have a second role. Certain kinds of requirements can be used to deduce values for variables the user has not yet set; the CML2 interpreter does this automatically.

Every time a symbol is changed, the change is tried on each declared constraint. The constraint is algebraically simplified by substituting in constant, derived and frozen symbols. If the simplified constraint forces an expression of the form A == B to be true, and either A is a query symbol and B is a constant or the reverse, then the assignment needed to make A == B true is forced.

Thus, given the rules

derive SPARC from SPARC32 or SPARC64
require SPARC implies ISA==n and PCMCIA==n and VT==y and VT_CONSOLE==y
	and BUSMOUSE==y and SUN_MOUSE==y and SERIAL==y and SERIAL_CONSOLE==y
	and SUN_KEYBOARD==y

when either SPARC32 or SPARC64 changes to y, the nine assignments implied by the right side of the second rule will be performed automatically. If this kind of requirement is triggered by a guard consisting entirely of frozen symbols, all the assigned symbols become frozen.

If A is a boolean or trit symbol and B simplifies to a boolean or trit constant (or vice-versa), assignments may be similarly forced by other relationals (notably A != B, A < B, A > B, A <= B, and A >= B). If forcing the relational to be true implies only one possible value for the symbol involved, then that assignment is forced.

Note that whether a relational forces a unique value may depend on whether trits are enabled or not.
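To make the uniqueness test concrete, here is a minimal Python sketch for a trit symbol A compared against a constant. It is an illustration of the rule as stated, not the deduction code of the reference implementation; the names ORDER, OPS, and forced_value are hypothetical.

```python
import operator

# Trit ordering: y > m > n.
ORDER = {"n": 0, "m": 1, "y": 2}
OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, ">": operator.gt,
    "<=": operator.le, ">=": operator.ge,
}


def forced_value(op, constant, trits=True):
    """Return the single value of A that makes `A op constant` true.

    Returns None when zero or several values satisfy the relational,
    in which case nothing is forced. With trits disabled, a trit
    symbol may only be y or n.
    """
    domain = "nmy" if trits else "ny"
    satisfying = [v for v in domain if OPS[op](ORDER[v], ORDER[constant])]
    return satisfying[0] if len(satisfying) == 1 else None
```

With trits enabled, A >= m leaves both m and y open and forces nothing; with trits disabled, the same relational leaves only y.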

Start declaration

The start definition specifies the name of the root menu of the hierarchy. One such declaration is required per CML2 ruleset.

Prefix declaration

A prefix declaration sets a string to be prepended to each symbol name whenever it is written out to a result configuration file. This prefix is also stripped from symbol names read in from a defconfig file.
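The effect of the prefix, together with the private bit described earlier, can be sketched as follows. This Python sketch assumes a simple NAME=value line format purely for illustration; the actual output format of a configurator is not specified here, and the names write_config and read_config are hypothetical.

```python
def write_config(values, prefix, private=()):
    """Return output lines, prepending `prefix` and skipping private symbols."""
    lines = []
    for name, value in values.items():
        if name in private:
            continue  # private symbols are not written out
        lines.append(f"{prefix}{name}={value}")
    return lines


def read_config(lines, prefix):
    """Parse NAME=value lines, stripping `prefix` from each symbol name."""
    values = {}
    for line in lines:
        name, _, value = line.partition("=")
        if name.startswith(prefix):
            name = name[len(prefix):]
        values[name] = value
    return values
```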

Rationale: This was added so the CML2 rule system for the Linux kernel would not have to include the common CONFIG_ prefix. The alternative of wiring that prefix into the code would compromise CML2's potential usefulness for other applications.

Banner declaration

A banner definition selects the menu id whose banner string is to be used in the configurator greeting line. The string attached to the specified menu id should identify the system being configured.

Rationale: As for the prefix string.

Options

Note

The options statement sets options which will be passed to the configurator instance as though they had been specified on its command line. These options are processed before actual command-line options.

Note that switch indicators beginning with - will need to be string-quoted to avoid being broken up by the lexical analyzer.

Rationale: this will typically be used to pre-set the locations of configurator output files.

Condition statement

The condition statement ties a CML2 feature flag to a query symbol; that is, the value of the feature flag is the value of the symbol. The initial value of the flag when a rulebase is read in is simply the associated symbol's default. If there is no symbol associated with the flag, the flag's value is n.

At present only one flag, named "trits", is supported. When this flag is n, trit-valued symbols are treated as booleans and may only assume the values y and n.

This flag may affect the front end's presentation of alternatives for modular symbols. It also affects forcing of ancestor symbols. When the trits flag is on, setting a boolean symbol only forces its trit ancestors to the value m; when trits is off, they are forced to y. See the section called Symbol Assignment and Side Effects for discussion.

Warndepend declaration

The warndepend declaration takes a list of symbol names. All dependents of each symbol have their prompts suffixed with the name of the symbol in parentheses to indicate that they are dependent on it.

Rationale: Added to support the EXPERIMENTAL symbol in the Linux kernel configuration. This declaration is better than tracking experimental status by hand because it guarantees that subsidiary symbols dependent on an experimental feature will always be flagged for the user.

Icon declaration

An icon declaration associates graphic data (encoded in RFC2045 base64) with the rulebase. Front ends may use this data as an identification icon. All front ends are required to accept XPM data here.

The reference front-end implementation uses the image to iconify the configurator when it is minimized while running in X mode. The reference front-end also accepts GIF data.

Debug

This declaration enables debugging output from the compiler (it has no effect on front-end behavior). It takes an integer value and uses it to set the current debug level. It may change or be removed in future releases.

Expressions

All arithmetic is integer. The compiler permits some kinds of type promotion, described below.

For purposes of the relational operators, trit values are strictly ordered with y > m > n.

Boolean logical expressions may be used as parts of integer-valued formulas (e.g. in derivations and constraints). The value of true is 1, and of false is zero.

It is a compile-time error to apply the logical operators or/and/implies to trit or numeric values. Also, expressions occurring in guards (in unless/suppress, or require/prohibit declarations) must yield a value of boolean type. The compiler does type propagation in expressions to check these constraints.

The purpose of these restrictions is to enable compile-time detection of situations where confusion of trit or numeric with boolean values might induce subtle errors. For the same reason, if the symbol FOO is trit-valued it is a compile-time error to say just "FOO" in an expression, as opposed to "FOO!=n" or some other more explicit relational.

Thus, because the symbol SCSI is trit-valued:

    unless SCSI suppress A2091_SCSI

is illegal and will raise an error. Write an unambiguous test instead:

    unless SCSI>=m suppress A2091_SCSI

The obvious boolean operations (and, or) are supported; they are commutative and associative. An 'implies' operation is also supported:

    FOO implies BAR  <=>  not (FOO and (not BAR))

It is neither commutative nor associative.
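The definition of implies, and the two properties just stated, can be checked mechanically. This short Python sketch uses Python booleans as a stand-in for CML2 boolean values; the function name implies is hypothetical.

```python
def implies(foo, bar):
    """FOO implies BAR, defined as not (FOO and (not BAR))."""
    return not (foo and (not bar))


# Not commutative: swapping the operands changes the result.
assert implies(False, True) != implies(True, False)

# Not associative: the grouping changes the result.
left_grouped = implies(implies(False, False), False)
right_grouped = implies(False, implies(False, False))
assert left_grouped != right_grouped
```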

The usual relational tests (==, !=, >=, <=, >, <) are supported. Relationals bind more tightly than boolean operators, so FOO!=n and BAR==m behaves as expected. Additionally, and binds more tightly than or, so that FOO or BAR and BAZ is FOO or (BAR and BAZ).

The following additional ternary-logic operations are available. It is an error to apply these to operands with types other than bool or trit.

Max or union; notation |. Here is a truth table:

    y m n
  +------
y | y y y
m | y m m
n | y m n

Min or intersection; notation &. Here is a truth table:

    y m n
  +------
y | y m n
m | m m n
n | n n n

Similarity; notation $. Here is a truth table:

    y m n
  +------
y | y n n
m | n m n
n | n n n
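The three truth tables can be reproduced from the trit ordering y > m > n. This is a minimal Python sketch for illustration; the function names are hypothetical.

```python
# Trit ordering: y > m > n.
ORDER = {"n": 0, "m": 1, "y": 2}


def trit_max(a, b):
    """Max or union (notation |): the greater of the two trits."""
    return max(a, b, key=ORDER.get)


def trit_min(a, b):
    """Min or intersection (notation &): the lesser of the two trits."""
    return min(a, b, key=ORDER.get)


def trit_similarity(a, b):
    """Similarity (notation $): the common value if equal, else n."""
    return a if a == b else "n"
```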

The operator precedence implied by the BNF above is implemented in the parser.

Symbol Assignment and Side Effects

Setting a symbol's value may have side effects on other symbols in two ways.

First, it may trigger a change in other variables through explicit requirements. See the section called Using requirements to force variables for discussion.

Second, each symbol has two implicit lists associated with it: of symbols it depends on (ancestors) and symbols that depend on it (dependents). Whenever a symbol is changed, any side effects are propagated through those lists. Changing the value of the symbol upward (n↝m, m↝y) may change the value of ancestors; changing it downward (y↝m, m↝n) may affect the value of dependents.

See also the section called Dependence for discussion of the two syntactically different ways dependencies can be created, and section V for discussion of the deduction algorithm.

CML2 interpreters are required to implement all-or-nothing side effects; that is, after an assignment, either the assignment and all its side effects have been performed, or (in the event the new values would violate a requirement) none of them are.

The reference implementation achieves this by implementing two-phase commit; the assignment and its side effects can be made tentatively, constraints checked, and then either committed or rolled back.
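The all-or-nothing behavior can be sketched as a tentative update followed by a check. This Python sketch illustrates the general two-phase idea only and makes no claim about how the reference implementation is structured; the configuration is assumed to be a plain dictionary, constraints are assumed to be predicates over it, and the name try_assign is hypothetical.

```python
def try_assign(config, changes, constraints):
    """Apply `changes` to `config` only if every constraint still holds.

    `changes` holds the assignment together with all its side effects.
    Returns True if the changes were committed, False if rolled back.
    """
    # Phase one: make the changes tentatively on a copy.
    trial = dict(config)
    trial.update(changes)
    if not all(check(trial) for check in constraints):
        return False  # roll back: the real configuration is untouched
    # Phase two: commit.
    config.update(changes)
    return True
```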

Side-effect bindings remain linked to the symbol whose value change triggered them, and are backed out whenever that symbol is changed again. Backing out a side effect may expose previous side effects on a symbol. To see how this works, consider the following sequence of actions given the constraints (FOO==y implies BAR==y) and (BAZ==y implies BAR==n):

   1. User sets FOO=y.  As a side effect, this sets BAR=y
   2. User sets BAZ=y.  As a side-effect, this sets BAR=n
   3. User sets BAZ=n.  This does not have a direct side-effect on BAR.
      However, since the value BAZ has changed, its side effect BAR=n
      is backed out.  The value of BAR is again y.

The reference implementation journals all side-effects and always looks for the most recent binding of a symbol when evaluating it.
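The journaling idea can be sketched with a chronological list of bindings. In this Python sketch, which is an illustration and not the code of the reference implementation, the side effects of each assignment are supplied by the caller rather than deduced from constraints, a direct assignment is journaled as a binding triggered by the symbol itself, and the class name Journal is hypothetical.

```python
class Journal:
    """Chronological record of (trigger, symbol, value) bindings."""

    def __init__(self):
        self.entries = []

    def set(self, symbol, value, side_effects=()):
        """Assign `symbol`, replacing everything it triggered before.

        `side_effects` is a sequence of (symbol, value) pairs forced
        by this assignment.
        """
        # Back out all bindings previously triggered by this symbol.
        self.entries = [e for e in self.entries if e[0] != symbol]
        self.entries.append((symbol, symbol, value))
        for target, forced in side_effects:
            self.entries.append((symbol, target, forced))

    def value(self, symbol, default="n"):
        """Return the most recent binding of `symbol`, or `default`."""
        for _trigger, target, bound in reversed(self.entries):
            if target == symbol:
                return bound
        return default
```

Replaying the three steps above (FOO=y forcing BAR=y, BAZ=y forcing BAR=n, then BAZ=n with no side effects) leaves BAR at y, because backing out the BAZ binding exposes the earlier one from FOO.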