SxHTML - Generate HTML from S-Expressions

HTML can be represented as a symbolic expression, also called an s-expression or sexpr for short. The approach is similar to SXML, which encodes XML as s-expressions.

For example, the following simple HTML text:

<html>
  <head><title>Example</title></head>
  <body>
    <h1 id="main">Title</h1>
    <p>This is some example text.</p>
    <hr>
    <div class="small" id="footnote">Small text.</div>
  </body>
</html>

An s-expression representation could be:

(html
  (head (title "Example"))
  (body
    (h1 (@ (id . "main")) "Title")
    (p "This is some example text.")
    (hr)
    (div (@ (class . "small") (id . "footnote")) "Small text.")
  )
)

The s-expression representation is easier to parse than the HTML text. In addition, an s-expression can be analysed, and possibly optimized, more easily than a string representation. For example, ((p) (p)) can be simplified to ((p)). Similarly, there are circumstances where (li (p "text")) should be transformed to (li "text").
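The (li (p "text")) rewrite mentioned above can be sketched in a few lines of Python. This is only an illustration, not SxHTML's API: it assumes elements are modelled as Python lists with the tag name as a string in first position, it ignores attribute lists, and the function name unwrap_single_paragraph is made up for this example.

```python
def unwrap_single_paragraph(node):
    """Rewrite ["li", ["p", ...]] to ["li", ...]; return other nodes unchanged."""
    if (
        isinstance(node, list)
        and len(node) == 2            # the tag plus exactly one child
        and node[0] == "li"
        and isinstance(node[1], list)
        and node[1][:1] == ["p"]      # that child is a paragraph element
    ):
        # Lift the paragraph's content directly into the list item.
        return ["li", *node[1][1:]]
    return node


print(unwrap_single_paragraph(["li", ["p", "text"]]))
```

Such a rule is awkward to express on HTML text, because the string would have to be parsed first; on the list structure it is a simple pattern match.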

This library generates HTML from s-expressions.

Often, HTML is generated by using string template libraries, like Mustache (many programming languages), Jinja (Python), or html/template (Go).

One problem area is escaping characters that have a special meaning in various parts of the HTML text. Obviously, the less-than character "<" signals the beginning of a tag and cannot be used literally in normal text; it must be replaced by "&lt;". The ampersand character "&" then has a special meaning too and must be replaced with "&amp;". But these rules only cover ordinary HTML content. Within HTML attributes (for example "href" in "<a href="...">...</a>"), further characters must not occur. If you embed JavaScript in your HTML text, yet another set of rules applies.
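To make the difference between content and attribute escaping concrete, here is a small sketch that uses only html.escape from the Python standard library. The input string is invented for the example; the point is merely that a rule that is sufficient between tags is not sufficient inside an attribute value.

```python
import html

user_input = 'x" onclick="alert(1)'

# Content escaping replaces &, < and >, which is enough between tags.
as_content = "<p>" + html.escape(user_input, quote=False) + "</p>"

# The same rule leaves the double quote intact, so inside an attribute
# the value ends early and a second attribute is injected.
broken_attr = '<a title="' + html.escape(user_input, quote=False) + '">link</a>'

# Attribute escaping additionally replaces quotation marks.
safe_attr = '<a title="' + html.escape(user_input, quote=True) + '">link</a>'

print(as_content)
print(broken_attr)
print(safe_attr)
```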

Most string template libraries fail in certain scenarios. Mustache provides replacement characters only for HTML content, but not for HTML attributes. The same holds for Jinja. The html/template library for Go requires the developer to correctly specify the adequate escaping mode.

This is because string template libraries operate only at the string level. All structure of the HTML text is lost.

By using a structured representation of HTML, the HTML generator knows about the specific context and can automatically select the appropriate escape mode.
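The idea can be illustrated with a deliberately tiny renderer. It is a sketch under assumptions, not how SxHTML is implemented: elements are modelled as hypothetical (tag, attributes, children) tuples, and only two contexts (text content and attribute values) are distinguished.

```python
import html


def render(node):
    """Render a (tag, attributes, children) tuple; plain strings are text."""
    if isinstance(node, str):
        # Text between tags: only &, < and > need replacing.
        return html.escape(node, quote=False)
    tag, attributes, children = node
    attr_text = "".join(
        # Attribute values: quotation marks must be replaced as well.
        f' {name}="{html.escape(value, quote=True)}"'
        for name, value in attributes.items()
    )
    inner = "".join(render(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


print(render(("p", {"title": 'say "hi"'}, ['1 < 2 & "ok"'])))
```

The caller never chooses an escape function. The position of a string in the structure decides which rule applies.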

Language

SxHTML is relatively lenient about the supported HTML language. When in doubt, it targets HTML5. All tag and attribute names must be lowercase symbols. Do not use strings to specify a tag or an attribute. SxHTML does not check whether a symbol specifies a valid HTML tag or attribute. Some tag and attribute symbols have a special meaning.

https://html.spec.whatwg.org/multipage/syntax.html#void-elements specifies the list of void elements, which do not have an end tag. All other tags will have an end tag.

https://html.spec.whatwg.org/multipage/indices.html#attributes-1 associates attribute names with expected content. This results in an additional escaping mechanism for specific content types. Currently, only URL content is recognized and escaped.

In addition to the list above, there are some heuristics for detecting the content type based on the attribute name.
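A name-based choice of escaping could look roughly like the following sketch. Everything in it is an assumption made for illustration: URL_ATTRIBUTES is a hand-picked subset of the URL-valued attributes in the HTML standard, escape_attribute is a made-up name, and the percent-encoding rule shown here is not necessarily the one SxHTML applies. The heuristics mentioned above are not modelled.

```python
import html
from urllib.parse import quote

# Hand-picked subset of attributes whose value is a URL (assumption).
URL_ATTRIBUTES = frozenset({"action", "cite", "formaction", "href", "poster", "src"})


def escape_attribute(name, value):
    """Escape an attribute value according to the attribute's name."""
    if name in URL_ATTRIBUTES:
        # Percent-encode characters that are not allowed in a URL,
        # keeping reserved characters and existing %-escapes intact.
        value = quote(value, safe=":/?#[]@!$&'()*+,;=%")
    # Every attribute value still needs HTML attribute escaping.
    return html.escape(value, quote=True)


print(escape_attribute("href", "https://example.com/a b?x=1&y=<2>"))
print(escape_attribute("title", "a b <c>"))
```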

SxHTML defines some additional symbols, all starting with "@":

Tags

HTML defines some tags as void elements. A void element has no content and only a start tag. End tags must not be specified, and SxHTML will not generate them. Any content except attributes is ignored. Void elements are: area, base, br, col, embed, hr, img, input, link, meta, source, track, and wbr.
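The rule for void elements can be sketched as follows. The function name render_element and its calling convention are hypothetical, and attributes are left out to keep the example short; only the list of void elements is taken from the text above.

```python
import html

VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


def render_element(tag, *content):
    """Render a tag with text content; void elements get a start tag only."""
    if tag in VOID_ELEMENTS:
        # No end tag is generated, and any content is ignored.
        return f"<{tag}>"
    inner = "".join(html.escape(text, quote=False) for text in content)
    return f"<{tag}>{inner}</{tag}>"


print(render_element("hr"))
print(render_element("br", "ignored"))
print(render_element("p", "a < b"))
```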

Attributes

Attributes are always in the second position of a list containing a tag symbol. For example, (a (@ (href . "https://zettelstore.de/sx/")) "SxHTML") specifies a link to the page of this library. It will be transformed to <a href="https://zettelstore.de/sx/">SxHTML</a>.

The syntax for attributes is as follows:

Since the attribute list is just a list, a symbol may occur more than once as an attribute name. Only the first occurrence of the symbol will create an attribute. For example, (input (@ (disabled . "no") (disabled . "yes"))) will be transformed to <input disabled="no">. This allows you to extend the list of attributes at the front if you later want to overwrite the value of an attribute.

If you want to suppress the generation of some attribute while still extending the list of attributes at the front, use the nil value () as the value of the attribute. For example, (input (@ (disabled ()) (disabled . "yes"))) will be transformed to <input>.
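Both rules, "first occurrence wins" and "nil suppresses the attribute", can be sketched together. The sketch assumes the attribute list is modelled as a Python list of (name, value) pairs with None standing in for the nil value; render_attributes is a made-up name, not part of SxHTML.

```python
import html


def render_attributes(pairs):
    """Render (name, value) pairs; the first occurrence of a name wins."""
    seen = set()
    parts = []
    for name, value in pairs:
        if name in seen:
            continue  # a later duplicate never overrides an earlier entry
        seen.add(name)
        if value is None:
            continue  # nil value: remember the name, but emit nothing
        parts.append(f' {name}="{html.escape(value, quote=True)}"')
    return "".join(parts)


print("<input" + render_attributes([("disabled", "no"), ("disabled", "yes")]) + ">")
print("<input" + render_attributes([("disabled", None), ("disabled", "yes")]) + ">")
```

Recording the name before checking for None is what makes the suppression work: the later ("disabled", "yes") entry is then treated as a duplicate.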

Content

HTML is not just about tags and attributes; these are there to structure the content. To specify content, preferably use strings. Numbers are allowed too, so you don't have to convert them into strings. Symbols are ignored, because their string representation may change (for example if symbols are case-insensitive).
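The treatment of the three kinds of content can be sketched like this. The Symbol class is a hypothetical stand-in for a symbol type, and render_content is a made-up name; the sketch only shows the rule that strings and numbers are emitted while symbols are skipped.

```python
import html


class Symbol(str):
    """Hypothetical marker type that distinguishes symbols from strings."""


def render_content(items):
    """Render content items: strings and numbers are emitted, symbols skipped."""
    parts = []
    for item in items:
        if isinstance(item, Symbol):
            continue  # symbols have no stable string representation
        if isinstance(item, (str, int, float)):
            # Numbers are converted automatically, then escaped like text.
            parts.append(html.escape(str(item), quote=False))
    return "".join(parts)


print(render_content(["Total: ", 42, Symbol("ignored"), " < ", 3.5]))
```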