<img src="hugo.gif" alt="hugo> <english>This is Hugo.</english> <suomi>Tämä on Hugo.</suomi>and as result, you want two documents: an English one
<img src="hugo.gif" alt="hugo> This is Hugo.and a Finnish one
<img src="hugo.gif" alt="hugo> Tämä on Hugo.This can easily be achieved by defined two macro sets, one being stored as english.hsc
<$macro english /close><$content></$english> <$macro suomi /close></$suomi>and another one stored as suomi.hsc
<$macro english /close></$english> <$macro suomi /close><$content></$suomi>
The first one defines two container macros: <english> simply inserts the whole content passed to it every time, while <suomi> always removes any content enclosed in it.
If you now process the document with

hsc english.hsc hugo.hsc to en-hugo.html

it will look like the first output document described above. To gain a result looking like the second one, you only have to use

hsc suomi.hsc hugo.hsc to fi-hugo.html
This is simply because the macros declared in suomi.hsc work just the other way round from those in english.hsc: everything enclosed in <english> is ignored, and everything inside <suomi> remains.
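If you rebuild both versions often, the two invocations can of course be automated. Here is a minimal sketch of a Makefile fragment, assuming the file names used above; your paths and make-tool may differ:

all : en-hugo.html fi-hugo.html

en-hugo.html : hugo.hsc english.hsc
	hsc english.hsc hugo.hsc to en-hugo.html

fi-hugo.html : hugo.hsc suomi.hsc
	hsc suomi.hsc hugo.hsc to fi-hugo.html

Each output document depends on both the source text and the macro set it is built with, so changing either one only triggers a rebuild of the affected version.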
This version of hsc officially only supports Latin-1 as its input character set. The exact definition of that is a bit messy, but basically it refers to most of those 255 characters you can input on your keyboard.
For this character set, all functions described herein should work, especially the commandline option RplcEnt.
Although Latin-1 is widely used within most decadent western countries, it does not provide all characters some people might need, for instance those from China and Japan, as their writing systems work completely differently.
As the trivial idea of Latin-1 was to use 8 bits instead of the rotten 7 bits of ASCII (note that the ``A'' in ASCII stands for American), the trivial idea of popular encodings like JIS, Shift-JIS or EUC is to use 8 to 24 bits to encode one character.
Now what does hsc say if you feed such a document to it?
As long as you do not specify RPLCENT, it should work without much bothering about it. However, you will need a w3-browser that can also display these encodings, and some fiddling with <META> and related tags.
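For example, one common way of telling the browser which encoding a page uses is a line like the following in the document's <HEAD>; the charset value here is only an example, replace it with whatever encoding your file really uses:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Shift_JIS">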
If you think you are funny and enable RPLCENT, hsc will still not mind your input. But with great pleasure it will cut all your nice multi-byte characters into decadent western 8-bit ``cripplets'' (note the pun). And your browser will display loads of funny western characters - but not a single funny Japanese one.
Recently an old western approach to these encoding problems has gained popularity: Unicode - that's the name of the beast - was created as some waste product of the Taligent project around 1988 or so, as far as I recall.
Initially created as an unpopular gadget not supported by anything, it is now on everyone's lips, because Java, the language-of-hype, several MS-DOS based operating systems and now - finally - the rotten hypertext language-of-hype support it. At least to some limited extent. (Technical note: usually you only read of UCS-2 instead of UCS-4 in all those specifications, and maybe some blurred proposals to use UTF-16 later.)
As hsc is written in the rotten C-language (an American product, by the way), it cannot cope with zero-bytes in its input data, and therefore is unable to read data encoded in UCS-4, UTF-16 or (gag, puke, hurl) UCS-2; it simply will stop after the first zero in the input.
Because the rotten C-language is so widely used, there are some zero-byte work-around formats for Unicode, most remarkably UTF-8 and UTF-7. These work together with hsc, although with the same limitations you have to care for when using the eastern encodings mentioned earlier. Read: don't use the option RplcEnt.
Note that it needs at least five encodings to make Unicode work with most software - again in alphabetical order: UCS-2, UCS-4, UTF-16, UTF-7 and UTF-8. I wonder what the ``Uni'' stands for...
Anyway, as a conclusion: you can use several extended character sets, but you must not enable RPLCENT.

Once upon a time, HTML-4.0 was released, and it sucked surprisingly less (as far as ``sucks less'' is applicable at all to HTML). Of course there still is no browser capable of displaying all these things, but nevertheless you can use hsc to author for it - with some limitations. This section will shortly outline how.
As already mentioned, HTML now supports those extended character encodings. See above how to deal with input files using such an encoding, and which to avoid.
If your system does not allow you to input funny characters (for instance one can easily spend ATS 500.000 on a workstation just to be absolutely unable to enter a simple ``ä''), you can use numeric entities, both in their decimal and hexadecimal representation: for example, to insert a Greek Alpha, you can use &#913; or &#x391;; hsc will accept both. However, you still can not define entities beyond the 8-bit range using <$defent>.
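As a tiny illustration (the surrounding sentence is made up), both notations may appear side by side in a source document and produce the same character:

<P>The first letter of the Greek alphabet is &#913; (decimal) or &#x391; (hexadecimal).</P>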
Some highlights are that the ALT attribute of <IMG> is now required and that there are now loads of ``URIs'' instead of ``URLs'' around. Nothing new for old hsc-users... he he he.
Another interesting thing is that the DTD now contains some meta-information that was not part of earlier DTDs, so it may make sense to use the DTD as a base for an hsc.prefs converter.
2002 update: it seems the W3 committee has learned a thing or two. XHTML has been out for a while now, and they are working on the 2.0 specification. While the chances of turning the official DTD into an HSC prefs file using a dumb ARexx script have gotten even slimmer (anyone for a real parser using Expat or something?), XHTML seems like a move in the right direction, regarding the separation of content and presentation and putting an end to the ``tag soup'' that much of the Web is today. It remains to be seen how successful it will be. HSC now has some rudimentary support for authoring XHTML documents, mainly regarding lowercase tag and attribute names and the new empty-tag syntax with a trailing slash, as in ``<br />''. CSS support should be better though, perhaps some automatic rewriting of obsolete presentation attributes to CSS <style> tags...
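To illustrate what that rudimentary support amounts to, here is the same (made-up) image tag from the beginning of this chapter in both spellings:

<!-- HTML 4.0 style -->
<IMG SRC="hugo.gif" ALT="hugo">

<!-- XHTML style: lowercase names, trailing slash on the empty element -->
<img src="hugo.gif" alt="hugo" />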
As you can now optionally read this manual in a Postscript version, there might be some interest in how it was done.
The rudimentarily bearable application used for conversion is (very originally) called html2ps and can be obtained from http://www.tdb.uu.se/~jan/html2ps.html. As common with such tools, ``it started out as a small hack'' and ``what really needs to be done is a complete rewriting of the code'', but ``it is quite unlikely that this [...] will take place''. The usual standard disclaimer of every public Perl-script. All quotes taken from the manual to html2ps.
Basically the HTML and the Postscript-version contain the same words. However, there are still some differences, for example the printed version does not need the toolbar for navigation provided at the top of every HTML page.
Therefore, I wrote two macros, <html-only> and <postscript-only>. The principle works exactly like the one described for <english> and <suomi> earlier in this chapter, and you can find them in docs-source/inc/html.hsc and docs-source/inc/ps.hsc.
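A minimal sketch following the <english>/<suomi> pattern from above; the actual files in docs-source/inc/ may contain more than this. For inc/html.hsc:

<$macro html-only /close><$content></$html-only>
<$macro postscript-only /close></$postscript-only>

and for inc/ps.hsc the roles are simply swapped:

<$macro html-only /close></$html-only>
<$macro postscript-only /close><$content></$postscript-only>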
However, there is a small difference to the multi-lingual examples, as I do not really want to create two versions all the time. Instead, I prefer to create either a fully hypertext featured version or a crippled Postscript-prepared HTML document in the same location.
You can inspect docs-source/Makefile to see how this is done: if make is invoked without any special options, the hypertext version is created. But if you instead use make PS=1 and thereby define a symbol named PS, the pattern rule responsible for creating the HTML documents acts differently and produces a reduced, Postscript-prepared document without the toolbar.
$(DESTDIR)%.html : %.hsc
ifdef PS
	@$(HSC) inc/ps.hsc $(HSCFLAGS) $<
else
	@$(HSC) inc/html.hsc $(HSCFLAGS) $<
endif
Needless to say, the conditional in the Makefile does not work with every make - I used GNUmake for that; your make-tool may have a slightly different syntax.
For my convenience, there are two rules called rebuild and rebuild_ps with their meanings being obvious: they rebuild the whole manual in the desired flavour.
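For orientation, such convenience rules could look roughly like this - a sketch only, assuming the Makefile also has the usual all and clean targets; the real docs-source/Makefile may differ:

rebuild :
	$(MAKE) clean
	$(MAKE) all

rebuild_ps :
	$(MAKE) clean
	$(MAKE) all PS=1

Passing PS=1 on the command line is what makes the ifdef PS branch of the pattern rule above take effect.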
So after a successful make rebuild_ps, everything only waits for html2ps. Maybe you want to have a look at the docs-source/html2ps.config used, although it is straightforward and does not contain anything special. This should not need any further comments, as there is a quite useful manual supplied with it.
However, making html2ps work with an Amiga deserves some remarks. As you might already have guessed, you will need the Perl-archives of GG/ADE - no comments on that, everybody interested should know what and where GG is.
I suppose you can try the full Unix-alike approach with hsc compiled for AmigaOS/ixemul and GG more or less taking over your machine, and thereby invoke perl directly. This will require a rule like

ps :
	html2ps -W l -f html2ps.config -o ../../hsc.ps ../docs/index.html
As I am a dedicated hater of this, I used the AmigaOS-binary, a SAS-compiled GNUmake and the standard CLI. A usually quite successful way to make such things work is with the help of ksh, which, for your confusion, is in an archive at GG called something like pdksh-xxx.tgz (for ``Public Domain ksh''). Invoking ksh with no arguments will start a whole shell session (gag!), but you can use the switch -c to pass a single command to be executed. After that, ksh will automatically exit, and you are back in your cosy CLI, just as if nothing evil had happened seconds before.
So finally the rule to convert all those HTML files into one huge Postscript file on my machine is:
ps :
	ksh -c "perl /bin/html2ps -W l -f html2ps.config -o ../../hsc.ps ../docs/index.html"
Note that html2ps is smart enough to follow those (normally invisible) <LINK REL="next" ..> tags being part of the HTML documents, so only the first file is provided as an argument, and it will automatically convert the other ones.
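Such a link lives in the <HEAD> of each generated page and looks roughly like this (the file name is made up):

<HEAD>
<LINK REL="next" HREF="features.html">
</HEAD>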
Well, at least you see it can be done.