Sexpr Encoding
A zettel representation that is an s-expression (also known as a symbolic expression).
It is (relatively) easy to parse and contains all relevant information of a zettel: metadata and content. For example, take a look at the Sexpr encoding of this page, which is available via the Info sub-page of this zettel:
- /z/00001012920516?enc=sexpr&part=zettel,
- /z/00001012920516?enc=sexpr&part=meta,
- /z/00001012920516?enc=sexpr&part=content.
If transferred via HTTP, the content type will be text/plain.
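The three URLs above differ only in the part query parameter. A minimal sketch of building them is shown below; the helper name sexpr_path is hypothetical and not part of any official client library.

```python
from urllib.parse import urlencode

# Parts named in the list above.
VALID_PARTS = ("zettel", "meta", "content")

def sexpr_path(zid: str, part: str = "zettel") -> str:
    """Build the relative URL for the Sexpr encoding of one zettel part.

    Hypothetical helper: the function name and the default part are
    assumptions made for this illustration.
    """
    if part not in VALID_PARTS:
        raise ValueError(f"unknown part: {part!r}")
    return f"/z/{zid}?" + urlencode({"enc": "sexpr", "part": part})

print(sexpr_path("00001012920516", "meta"))
```

The resulting path would be appended to the base URL of the running server, which is not specified here.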
Syntax of s-expressions
There are only two types of elements: atoms and lists.
A list always starts with a left parenthesis ((, U+0028) and ends with a right parenthesis (), U+0029). A list may contain a possibly empty sequence of elements, i.e. lists and/or atoms. In a list of at least two elements, a full stop character (., U+002E) before the last element signals that the last two elements form a pair. This allows more space-efficient storage of data.
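One way to see why the pair notation saves space is the classic cons-cell model, in which a list is a chain of two-slot cells. The sketch below assumes that model; the names Pair, from_items and count_cells are hypothetical, and the actual in-memory representation used by an implementation may differ.

```python
from typing import Any, NamedTuple

class Pair(NamedTuple):
    """A two-slot cell: 'first' holds an element, 'rest' holds what follows."""
    first: Any
    rest: Any  # another Pair, None for the empty list, or an atom (dotted pair)

def from_items(items, tail=None):
    """Chain items into cells; 'tail' is what the last cell points to."""
    result = tail
    for item in reversed(items):
        result = Pair(item, result)
    return result

def count_cells(value) -> int:
    """Count the cells that make up a chain."""
    n = 0
    while isinstance(value, Pair):
        n += 1
        value = value.rest
    return n

proper = from_items(["A", "B"])        # models (A B): two cells, ends in empty list
dotted = from_items(["A"], tail="B")   # models (A . B): a single cell
print(count_cells(proper), count_cells(dotted))
```

Under this assumption, (A B) needs two cells while (A . B) needs only one, because the second element occupies the slot that would otherwise point to the rest of the list.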
There are three syntactic forms for an atom: numbers, symbols and strings.
A number is a non-empty sequence of digits (0 ... 9). The smallest number is 0; there are no negative numbers.
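The number rule can be checked with a short regular expression. This is a minimal sketch with a hypothetical helper name, parse_number. It matches only the ASCII digits 0 to 9, since Python's str.isdigit would also accept other Unicode digits. Accepting leading zeros is an assumption; the text above does not address them.

```python
import re

# Only the ASCII digits 0-9; no sign, no decimal point.
_NUMBER = re.compile(r"[0-9]+")

def parse_number(token: str) -> int:
    """Convert a number atom to an int, rejecting anything else."""
    if not _NUMBER.fullmatch(token):
        raise ValueError(f"not a number atom: {token!r}")
    return int(token)

print(parse_number("0"), parse_number("2024"))
```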
A symbol is a non-empty sequence of printable characters, except left or right parenthesis. Unicode characters of the following categories count as printable characters in the above sense: letter (L), number (N), punctuation (P), symbol (S). Symbols are case-insensitive, i.e. ZETTEL and zettel denote the same symbol.
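The symbol rule maps directly onto Unicode general categories, which Python exposes through unicodedata.category. The sketch below uses hypothetical helper names (is_symbol_text, same_symbol). Comparing symbols with str.casefold is an assumption, since the exact case-folding rule is not stated above. Parentheses have to be excluded explicitly because they belong to the punctuation categories themselves.

```python
import unicodedata

PRINTABLE_MAJOR_CATEGORIES = "LNPS"  # letter, number, punctuation, symbol

def is_symbol_text(text: str) -> bool:
    """Check that text is non-empty and uses only characters allowed in a symbol."""
    if not text:
        return False
    return all(
        ch not in "()"
        and unicodedata.category(ch)[0] in PRINTABLE_MAJOR_CATEGORIES
        for ch in text
    )

def same_symbol(a: str, b: str) -> bool:
    """Compare two symbols case-insensitively (casefold is an assumption)."""
    return a.casefold() == b.casefold()

print(is_symbol_text("ZETTEL"), is_symbol_text("a(b"), same_symbol("ZETTEL", "zettel"))
```

A sequence consisting only of digits also passes this check, so a reader would need to test for numbers before treating an atom as a symbol.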
A string starts with a quotation mark (", U+0022), contains a possibly empty sequence of Unicode characters, and ends with a quotation mark. To allow a string to contain a quotation mark, it must be prefixed by one backslash (\, U+005C). To allow a string to contain a backslash, it also must be prefixed by one backslash. Unicode characters with a code less than U+FF are encoded by the sequence \xNM, where NM is the hex encoding of the character. Unicode characters with a code less than U+FFFF are encoded by the sequence \uNMOP, where NMOP is the hex encoding of the character. Unicode characters with a code less than U+FFFFFF are encoded by the sequence \UNMOPQR, where NMOPQR is the hex encoding of the character. In addition, the sequence \t encodes a horizontal tab (U+0009), and the sequence \n encodes a line feed (U+000A).
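The escape rules for strings can be sketched as a small decoder. The function name decode_string is hypothetical. Rejecting unknown escape sequences and unescaped inner quotation marks is an assumption, because the text above does not say how malformed input is handled.

```python
import string

SIMPLE_ESCAPES = {'"': '"', "\\": "\\", "t": "\t", "n": "\n"}
HEX_ESCAPES = {"x": 2, "u": 4, "U": 6}  # escape letter -> number of hex digits

def decode_string(literal: str) -> str:
    """Decode a quoted string atom, including its surrounding quotation marks."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("string must start and end with a quotation mark")
    body, out, i = literal[1:-1], [], 0
    while i < len(body):
        ch = body[i]
        if ch == '"':
            raise ValueError("unescaped quotation mark inside string")
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("backslash at end of string")
        esc = body[i + 1]
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc in HEX_ESCAPES:
            width = HEX_ESCAPES[esc]
            digits = body[i + 2 : i + 2 + width]
            if len(digits) != width or any(d not in string.hexdigits for d in digits):
                raise ValueError(f"bad hex escape: \\{esc}{digits}")
            out.append(chr(int(digits, 16)))  # chr raises ValueError above U+10FFFF
            i += 2 + width
        else:
            raise ValueError(f"unknown escape: \\{esc}")  # assumption: reject
    return "".join(out)

print(decode_string(r'"say \"hi\"\n"') == 'say "hi"\n')
```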
Atoms are separated by Unicode characters of category separator (Z).
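The rules above are enough to split s-expression text into tokens. The following is a minimal sketch, not a reference implementation; the function name tokenize is hypothetical. It treats every character outside the categories L, N, P and S as a separator. That covers category Z as described, and also, as an assumption, control characters such as line feeds.

```python
import unicodedata

def _is_atom_char(ch: str) -> bool:
    """Printable in the sense above (L, N, P, S), but not a parenthesis."""
    return ch not in "()" and unicodedata.category(ch)[0] in "LNPS"

def tokenize(text: str) -> list:
    """Split text into '(', ')', quoted string literals, and bare atoms."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch in "()":
            tokens.append(ch)
            i += 1
        elif ch == '"':
            # Scan to the closing quotation mark, skipping escaped characters.
            j = i + 1
            while j < len(text) and text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            if j >= len(text):
                raise ValueError("unterminated string")
            tokens.append(text[i : j + 1])
            i = j + 1
        elif _is_atom_char(ch):
            # A number or symbol runs until a separator or a parenthesis.
            j = i
            while j < len(text) and _is_atom_char(text[j]):
                j += 1
            tokens.append(text[i:j])
            i = j
        else:
            i += 1  # separator (category Z; other categories by assumption)
    return tokens

print(tokenize('(ZETTEL (META "a b") 42)'))
```

With this approach a full stop surrounded by separators arrives as its own token, so a later parsing step can recognise the pair notation.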