Syntax of Symbolic Expressions

Syntax of lists

A list always starts with a left parenthesis (“(”, U+0028) and ends with a right parenthesis (“)”, U+0029). Between the parentheses, a list contains a possibly empty sequence of elements, i.e. lists and/or atoms.

Internally, lists are composed of cells. A cell stores two values. The first value references the first element of a list. The second value references the rest of the list, or is the special value nil if there is no rest of the list.
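The cell structure described above can be sketched in Python as follows. This is only an illustration: the names Cell, first, rest, from_items, and to_items are hypothetical, None stands in for nil, and mapping an empty sequence to None is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Cell:
    """One list cell: a value plus a reference to the rest of the list."""
    first: Any         # references the first element of the list
    rest: Any = None   # references the rest of the list; None plays the role of nil


def from_items(items):
    """Build a chain of cells from a Python sequence (hypothetical helper)."""
    head = None
    # Build back to front so each new cell points at the already built rest.
    for item in reversed(items):
        head = Cell(item, head)
    return head


def to_items(cell):
    """Walk the chain and collect the first value of every cell."""
    items = []
    while cell is not None:
        items.append(cell.first)
        cell = cell.rest
    return items


lst = from_items([1, 2, 3])
```

Here lst.first is the first element, and following rest three times reaches None, the stand-in for nil.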

However, it is possible to store an atom as the second value of the last cell. In this case, a full stop character (“.”, U+002E) is written before the last element of a list of at least two elements, signalling that the last two elements are stored in a single cell. This allows more space-efficient storage of data.

A proper list, which contains nil as the second value of the last cell, might be pictured as follows:

[V: Elem1 | N] --> [V: Elem2 | N] --> ... --> [V: ElemN | nil]

V is a placeholder for a value; N is the reference to the next cell (also known as the rest or tail of the list). The list above is represented as a symbolic expression in the form (Elem1 Elem2 ... ElemN).

An improper list has a non-nil reference to an atom as the second value of its last cell:

[V: Elem1 | N] --> [V: Elem2 | N] --> ... --> [V: ElemN | V: Atom]

The improper list above is represented as the symbolic expression (Elem1 Elem2 ... ElemN . Atom).
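How proper and improper lists map to their textual form can be sketched as follows. The Cell class and the render function are hypothetical names for illustration. Atoms are represented by plain Python values and printed with str, which is an assumption that holds only for numbers and symbols, not for strings.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Cell:
    first: Any         # V: the element stored in this cell
    rest: Any = None   # N: next cell, None (nil), or an atom in an improper list


def render(value):
    """Render a chain of cells, or a single atom, as symbolic expression text."""
    if not isinstance(value, Cell):
        return str(value)  # atoms are printed as-is in this sketch
    parts = []
    cell = value
    while isinstance(cell, Cell):
        parts.append(render(cell.first))
        cell = cell.rest
    if cell is not None:
        # Non-nil second value in the last cell: improper list, so the
        # full stop is written before the last element.
        parts.append(".")
        parts.append(render(cell))
    return "(" + " ".join(parts) + ")"


proper = Cell("a", Cell("b", Cell("c")))
improper = Cell("a", Cell("b", Cell("c", "d")))
```

With these definitions, render(proper) gives (a b c) and render(improper) gives (a b c . d). The improper list needs three cells for four elements, where a proper list would need four.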

Syntax of numbers (atom)

A number is a non-empty sequence of digits (“0” ... “9”). The smallest number is 0; there are no negative numbers.
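A minimal check for this rule might look like the sketch below. The names NUMBER and is_number are hypothetical. The pattern spells out the ASCII digits because \d in Python also matches digits of other scripts.

```python
import re

# One or more of the digits 0-9: no sign, no decimal point, no exponent.
NUMBER = re.compile(r"[0-9]+")


def is_number(token: str) -> bool:
    """Return True if the whole token is a non-empty sequence of digits."""
    return NUMBER.fullmatch(token) is not None
```

Under this reading, "0" and "42" are numbers, while "-1", "1.5", and the empty string are not.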

Syntax of symbols (atom)

A symbol is a non-empty sequence of printable characters, except left or right parenthesis. Printable characters in this sense are the Unicode characters of the following categories: letter (L), number (N), punctuation (P), symbol (S). Symbols are case-sensitive, i.e. “ZETTEL” and “zettel” denote different symbols.
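The character rule for symbols can be sketched with Python's unicodedata module. The names PRINTABLE_MAJOR_CATEGORIES and is_symbol_text are hypothetical. The sketch only tests the character rule stated above; it does not decide whether a token such as 123 is read as a number or as a symbol.

```python
import unicodedata

# Major Unicode categories counted as printable: letter, number, punctuation, symbol.
PRINTABLE_MAJOR_CATEGORIES = {"L", "N", "P", "S"}


def is_symbol_text(token: str) -> bool:
    """Check that token is non-empty and uses only characters allowed in a symbol."""
    if not token:
        return False
    for ch in token:
        # Parentheses are punctuation (Ps, Pe) but are reserved for lists.
        if ch in "()":
            return False
        # unicodedata.category returns a two-letter code such as "Lu" or "Sm";
        # its first letter is the major category.
        if unicodedata.category(ch)[0] not in PRINTABLE_MAJOR_CATEGORIES:
            return False
    return True
```

A space belongs to the separator category (Z), so "a b" is rejected, which matches the idea that whitespace separates elements. Because Python string comparison is case-sensitive, "ZETTEL" and "zettel" stay distinct without further work.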

Syntax of strings (atom)

A string starts with a quotation mark (“"”, U+0022), contains a possibly empty sequence of Unicode characters, and ends with a quotation mark. A quotation mark inside a string must be prefixed by one backslash (“\”, U+005C). A backslash inside a string must also be prefixed by one backslash.

Unicode characters with a code point less than or equal to U+FF are encoded using the sequence “\xNM”, where NM is the hexadecimal encoding of the character. Unicode characters with a code point up to and including U+FFFF are encoded using the sequence “\uNMOP”, where NMOP is the hexadecimal encoding of the character. Unicode characters with a code point less than or equal to U+FFFFFF are encoded using the sequence “\UNMOPQR”, where NMOPQR is the hexadecimal encoding of the character.

In addition, the sequence “\t” encodes a horizontal tab (U+0009), and the sequence “\n” encodes a line feed (U+000A).
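The escape rules above can be illustrated with a small decoder. This is a sketch under assumptions, not a reference implementation: the name decode_string is hypothetical, and rejecting unknown escape sequences with an error is a choice made here that the text above does not specify.

```python
def decode_string(text: str) -> str:
    """Decode a quoted string literal according to the escape rules above."""
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("string must start and end with a quotation mark")
    body = text[1:-1]
    hex_lengths = {"x": 2, "u": 4, "U": 6}   # number of hex digits per escape
    simple = {'"': '"', "\\": "\\", "t": "\t", "n": "\n"}
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            raise ValueError("unescaped quotation mark inside string")
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("backslash at end of string")
        esc = body[i + 1]
        if esc in simple:
            out.append(simple[esc])
            i += 2
        elif esc in hex_lengths:
            n = hex_lengths[esc]
            digits = body[i + 2 : i + 2 + n]
            if len(digits) != n or any(d not in "0123456789abcdefABCDEF" for d in digits):
                raise ValueError("invalid hexadecimal escape")
            # chr raises ValueError for values above U+10FFFF.
            out.append(chr(int(digits, 16)))
            i += 2 + n
        else:
            raise ValueError("unknown escape sequence")
    return "".join(out)
```

For example, the literal "\x41" decodes to the single character A, and "a\tb" decodes to an a, a horizontal tab, and a b.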

See also