Syntax Definition

Everything hsc knows about HTML, it retrieves from a file named hsc.prefs at startup. This file contains information about all tags, entities and icon entities. Some special attributes are also set up there.

The main advantage of this concept is that it's rather easy to add new syntax elements. For this purpose the hsc tags <$deftag>, <$defent>, <$defstyle> and <$deficon> can be used.

Default Preferences

A serious problem with HTML is that no one can give you a competent answer to the question ``Now which tags are part of HTML?''. On the one hand, there is w3c, which you can meanwhile ignore; on the other hand, there are the developers of popular browsers, who implement whatever they like.

The hsc.prefs coming with this distribution should support most elements needed for everyday use. With the hsc V0.923 release, the prefs were updated to HTML 4.01; since V0.925 there has also been support for automatic distinction between ``classic'' HTML and XHTML. If you run hsc in XHTML mode, some obsolete attributes will no longer be known, and some new ones will be added.

Searching For The Preferences

If you do not explicitly specify certain preferences by means of the commandline option PrefsFile, hsc will look in several places when trying to open hsc.prefs:

If it is unable to find hsc.prefs anywhere, it will abort with an error message.

If you want to find out where hsc has read hsc.prefs from, you can use STATUS=VERBOSE when invoking hsc. This will display the preferences used.

Special Tags To Modify Syntax Definition

All the tags below should be used within hsc.prefs only.

defent: Define an entity

This tag defines a new entity. The (required) attribute NAME declares the name of the entity, RPLC the character that should be replaced by this entity if found in the hsc-source and NUM is the numeric representation of this entity. NUM may be in the range 128-65535, allowing for any Unicode (UCS-2 to be exact) character to be assigned a corresponding entity. Definitions in the range 128-255 are done in the prefs-file to allow users with character sets other than ISO-8859-1 (Latin-1) to change the replacement characters; some other characters such as mathematical symbols or typographical entities are predefined internally by hsc. They reside at fixed positions in the Unicode charset and are unlikely to ever change.

Example: <$defent NAME="uuml" RPLC="ü" NUM="252">

The ENTITYSTYLE commandline option affects the way hsc will render entities in the resulting HTML file. Setting the PREFNUM attribute for an entity will make it use the numeric representation if ENTITYSTYLE=replace, no matter what representation was used in the source text.

Unlike previous versions, hsc 0.931 and later allow redefinition of entities. In this case, symbolic and numeric representation must match the previous definition; only the PREFNUM flag and the RPLC character will be updated. This lets you change the default rendering/replacement of internally defined entities. Warning #92 will be issued and should be ignored if you really want to do this.
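
The redefinition rule can be pictured with a small Python sketch. This is not hsc's implementation: the Entity class and the define_entity function are hypothetical names, and the sketch models only what is stated above, namely that NUM must lie in 128-65535 and that a redefinition must repeat the existing name/number pair while updating RPLC and PREFNUM. Rejecting a number that is already bound to a different name is one reading of ``must match'' and is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Entity:
    name: str              # symbolic name, e.g. "uuml"
    num: int               # numeric representation, e.g. 252
    rplc: Optional[str]    # character to replace in the source, if any
    prefnum: bool = False  # prefer the numeric form when rendering


def define_entity(table: Dict[str, Entity], name: str, num: int,
                  rplc: Optional[str] = None, prefnum: bool = False) -> bool:
    """Add or redefine an entity; return True if it was a redefinition."""
    if not 128 <= num <= 65535:
        raise ValueError(f"NUM {num} is outside 128-65535")
    old = table.get(name)
    if old is None:
        # Assumption: a number bound to another name also counts as a mismatch.
        if any(e.num == num for e in table.values()):
            raise ValueError(f"NUM {num} already belongs to another entity")
        table[name] = Entity(name, num, rplc, prefnum)
        return False
    if old.num != num:
        raise ValueError(f"{name} is already defined as {old.num}")
    # Only the replacement character and the PREFNUM flag are updated.
    old.rplc, old.prefnum = rplc, prefnum
    return True
```

A return value of True corresponds to the situation in which hsc would issue warning #92.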

deficon: Define icon-entity

This tag defines a new icon-entity. The only (required) attribute is NAME which declares the name of the icon.

Example: <$deficon NAME="mail">

deftag: Define a tag

This tag defines a new tag and is used much like <$macro>, except that a tag definition requires no macro text and no end tag to follow.

Example: <$deftag IMG SRC:uri/x/z/r ALT:string ALIGN:enum("top|bottom|middle") ISMAP:bool WIDTH:string HEIGHT:string>

To fully understand the above line, you might also want to read the sections about attributes and options for tags and macros.

For those who are not smart enough or simply too lazy, here are some simple examples, which should also work somehow, though some features of hsc might not work:

<$deftag BODY /CLOSE BGCOLOR:string>
<$deftag IMG SRC:uri ALT:string ALIGN:string ISMAP:bool>

defstyle: Define a CSS property

This tag lets you define a new CSS property and optionally a list of values that are allowed for it. If you omit the VAL attribute, any value will be permitted. Otherwise it should be a list in pretty much the same style as for enum parameters: words (which may include spaces) separated by vertical bars.

<$defstyle name="text-align" val="left|center|right|justify">
<$defstyle name="text-indent" val="%P">
<$defstyle name="clip" val="%r|auto">

The text-align property has a short list of four possible values, so they are simply listed as an enumeration. text-indent on the other hand is numeric, so its values cannot be listed exhaustively. Therefore, a special code resembling C-style format strings is used. The following are supported:

Decimal digits only, e.g. for the z-index property.
%n: Positive numeric value with a unit (one of pt, pc, in, mm, cm, px, em or ex), e.g. for word-spacing.
%N: Numeric value as above, but possibly negative. Currently just for completeness and not used in CSS 2.0.
%p: Numeric value as for %n, but also allows percentages, e.g. for font-size.
%P: Numeric value as above, but possibly negative, e.g. for text-indent.
A color specification as for background-color. One of
  • Color name as defined in HSC.COLOR-NAMES.
  • Hexadecimal colorspec of the form ``#rgb'' or ``#rrggbb''.
  • RGB-style spec of the form ``rgb(r,g,b)'', where each of r, g and b may be a decimal value between 0 and 255 or a percentage between 0 and 100.
A URI of the form ``uri(...)'', e.g. for background-image.
%r: A rectangle of the form ``rect(a,b,c,d)'' with a, b, c and d being numeric specs with a dimension, e.g. for clip.

Note: If both the above placeholders and an enumeration of values are used, as for ``clip'', the placeholder must be the first element!
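
How a VAL list mixing placeholders and plain words might be checked can be sketched in Python. This is an illustration only, not hsc's code: the names PATTERNS and value_allowed are hypothetical, only the two placeholders used in the examples above (%P and %r) are modelled, and the number syntax (digits with an optional fraction, a mandatory unit, no sign inside rect()) is an assumption. The sketch also does not enforce the rule that a placeholder must come first.

```python
import re

UNITS = "pt|pc|in|mm|cm|px|em|ex"
# Assumption: a number is a run of digits with an optional fraction.
NUMBER = r"\d+(?:\.\d+)?"

PATTERNS = {
    # %P: number with a unit or a percent sign, possibly negative
    "%P": re.compile(rf"-?{NUMBER}(?:{UNITS}|%)"),
    # %r: rect(a,b,c,d), each part a number with a unit
    "%r": re.compile(
        rf"rect\(\s*{NUMBER}(?:{UNITS})(?:\s*,\s*{NUMBER}(?:{UNITS})){{3}}\s*\)"
    ),
}


def value_allowed(value: str, val_spec: str) -> bool:
    """Return True if value matches any alternative of a VAL list."""
    value = value.strip()
    for alternative in val_spec.split("|"):
        pattern = PATTERNS.get(alternative)
        if pattern is not None:
            # Placeholder: the whole value must fit the pattern.
            if pattern.fullmatch(value):
                return True
        elif value == alternative:
            # Plain word: exact comparison (case handling is an assumption).
            return True
    return False
```

With these definitions, value_allowed("auto", "%r|auto") and value_allowed("-2em", "%P") both succeed, while value_allowed("middle", "left|center|right|justify") does not.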

varlist: Define an attribute list shortcut

This tag defines an attribute list shortcut to support your laziness when editing the prefs file. It lets you collect an arbitrary number of attribute declarations under a single name that you can use later in <$deftag> or <$macro> tags by putting the shortcut name in square brackets.

<$varlist HVALIGN ALIGN:enum("left|center|right|justify|char") VALIGN:enum("top|middle|bottom|baseline")>
<$deftag THEAD /AUTOCLOSE /LAZY=(__attrs) /MBI="table" [HVALIGN]>

This is the same as:

<$deftag THEAD /AUTOCLOSE /LAZY=(__attrs) /MBI="table" ALIGN:enum("left|center|right|justify|char") VALIGN:enum("top|middle|bottom|baseline")>
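
The equivalence can be pictured as plain text substitution. The following Python sketch is an assumption about the visible effect only, not a description of how hsc handles shortcuts internally; expand_shortcuts and SHORTCUT_RE are hypothetical names, and shortcut names are matched exactly as written.

```python
import re
from typing import Dict

# A shortcut reference is a name in square brackets, e.g. [HVALIGN].
SHORTCUT_RE = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def expand_shortcuts(declaration: str, varlists: Dict[str, str]) -> str:
    """Replace each [NAME] with the attribute declarations stored under NAME."""
    def substitute(match):
        name = match.group(1)
        if name not in varlists:
            raise KeyError(f"unknown attribute list shortcut: {name}")
        return varlists[name]

    return SHORTCUT_RE.sub(substitute, declaration)
```

Feeding the [HVALIGN] form of the THEAD definition through expand_shortcuts, with HVALIGN mapped to its two attribute declarations, yields the long form shown above.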

Why It Can Not Read DTDs

DTD is short for Document Type Definition. One of the early concepts of HTML was that the first line of a document should tell which DTD has been used to create this document. This could look like

Browsers should read that line, obtain the DTD and parse the source according to it. The problem about DTDs: they are written in SGML. And the problem about SGML: It's awful. It's unreadable. It's a pure brain-wanking concept born by some wireheads probably never seriously thinking about using it themselves. Even when there is free code available to SGML-parse text.

As a result, only a few browsers supported this. Because it was so easy to write a browser that spat on the SGML-trash and simply parsed the code ``tag-by-tag'', developers decided to spend more time on making their product more user-friendly than computer-friendly (which is really understandable).

These browsers became even more popular when they supported tags that certain people liked but that were not part of DTDs. As DTDs were published by w3c, and w3c did not like those tags, they did not make it into DTDs for a long time, or even not at all (which is really understandable, too).

This worked to a certain degree until HTML-2.0. Several people (at least most of the serious w3-authoring people) preferred to conform to w3c rather than use the funky-crazy-cool tags of some special browsers, and the funky-crazy-cool people did not care about DTDs or HTML-validators anyway.

However, after HTML-2.0, w3c fucked up. They proposed the infamous HTML-3.0 standard, which was never officially released, and tried to ignore things most browsers had already implemented (not all of which were useless crap, I daresay). After more than a year without any remarkable news from w3c, they finally canceled HTML-3.0, and instead came out with the pathetic HTML-0.32.

Nevertheless, many people were very happy about HTML-0.32, as it finally was a statement after which many things became clear. It became clear that you should not expect anything useful from w3c anymore. It became clear that the browser developers rule. It became clear that no one is going to provide useful DTDs in future, as browser developers are too lazy and incompetent to do so. It became clear that anarchy has broken out for HTML-specifications.

So, in conclusion, the reasons for using a format of its own rather than DTDs are:

Quite unexpectedly, with HTML-4.0 this has changed to some extent, as the DTDs are quite readable and well documented. The general syntax of course still sucks, error handling is unbearable for ``normal'' users and so on. Although it will take them more than this to get back the trust they abused in recent years, at least it is a little signal suggesting there are some small pieces of brain intact somewhere in this consortium.


There is also a disadvantage to this concept: reading hsc.prefs on every startup takes an awful lot of time. Usually, processing your main data takes less time than reading the preferences. You can reduce this time if you create your own hsc.prefs with all the tags and entities you don't need removed. But I recommend against this, because you might have to edit your preferences again with the next update of hsc if any new features have been added.