
Chapter 10  Overview of HeVeA

10.1  Introduction

Canthology uses the HeVeA application [6] to convert LaTeX documents into HTML format.1 In this chapter, I provide background information about HeVeA that is relevant to users of Canthology. This chapter is not intended to be a substitute for reading the HeVeA manual, but rather complements it. My advice is to read this chapter to get an overview of HeVeA and how to use it with Canthology. Afterwards, you should read the HeVeA manual to learn how to use HeVeA properly.

Currently, the use of HeVeA with Canthology is not supported on Windows.

10.2  LaTeX-to-HTML Converters

There are several (competing) tools for converting LaTeX-based documents into HTML, including LaTeX2HTML, TtH, TeX4ht, PlasTeX and HeVeA. None of these tools has a 100% success rate in such conversions. This is because there are some LaTeX (and TeX) constructs that have no counterparts in HTML. Nevertheless, a large subset of commonly-used LaTeX commands do have counterparts in HTML. LaTeX-to-HTML converters tend to work well with LaTeX documents that restrict themselves to such commands.

Some converters provide a way for users to extend the converter with rules for processing LaTeX commands for which the converter does not provide built-in support. Thus, use of a LaTeX-to-HTML converter tends to involve an iterative approach:

  1. Run the converter on a LaTeX document.
  2. Examine the generated HTML to see if any LaTeX commands were not translated properly. If so, then:
  3. Extend the converter with a new rule that enables a troublesome LaTeX command to be correctly converted to HTML. (If that is not possible, then rewrite the LaTeX document to avoid the need to use the troublesome LaTeX command.) Then go back to step 1.

Use of a LaTeX-to-HTML converter can be frustrating, at least initially, because you are likely to spend a lot of time working on step 3 in the above list. However, you will eventually learn which LaTeX commands can be used without causing difficulty for a HTML conversion. This knowledge will make it far easier to write future LaTeX documents that are compatible with the converter.

My favourite LaTeX-to-HTML converter is called HeVeA. In this chapter, I explain how to use Canthology with HeVeA.

10.3  Obtaining and Installing HeVeA

HeVeA is bundled with some LaTeX distributions and available in the application repositories for some Linux distributions. If you cannot get HeVeA from those sources, then you can download it from the HeVeA website. HeVeA was designed for use on Linux and other UNIX-like operating systems. However, a Windows port of HeVeA is also available.

10.4  Running HeVeA

The HeVeA distribution contains several applications that are intended to be used together.

hevea
This application translates a ".tex" document into a monolithic HTML file.
hacha
This application splits a monolithic HTML file produced by hevea into a collection of HTML files. By default, hacha creates a separate HTML file for each chapter of a book, or a separate HTML file for each section of an article. However, commands (defined in the hevea package) that you can embed in your ".tex" document enable you to choose a different granularity of division.
imagen
The hevea application detects when a ".tex" document uses image files, for example, in \includegraphics commands. Details of these image-using commands are written to a new temporary ".tex" file. The imagen application is then run on this temporary file to convert all the images into a HTML-friendly format, such as GIF or PNG.
esponja
A HTML file generated by hevea contains HTML markup that can sometimes be needlessly verbose. The esponja utility can be run to optimise the HTML markup, thus decreasing the size of the HTML file.
bibhva
LaTeX has a companion application called bibtex for managing bibliography information. Unfortunately, subtle differences between the behaviour of LaTeX and HeVeA mean that bibtex does not work unaided with HeVeA. The bibhva application acts as a wrapper around bibtex to resolve this incompatibility.

All but one of the commands described above are compiled applications. The exception is imagen, which is a UNIX shell script, and its correct working relies on the presence of many other commands (such as gs and pnmtopng) that are commonly available on UNIX systems. Such commands are not available by default on Windows. For this reason, HeVeA can be used with ".tex" documents that may contain graphics on UNIX, but can be used only with image-less ".tex" documents on Windows. Canthology makes this situation even worse: its use of HeVeA relies upon the use of some commands (such as GNU make and a Tcl interpreter) that are widely available on UNIX machines, but are not installed by default on Windows. For this reason, using Canthology with HeVeA is not currently supported on Windows.

10.5  How HeVeA Handles Unrecognised Commands

HeVeA implements a large subset of commonly-used LaTeX commands and a small subset of lower-level TeX commands. When HeVeA encounters a command it does not recognise, it prints a warning diagnostic message to the console and ignores the command’s name, but it copies the command’s arguments, if any, to the output file. For example, assume HeVeA encounters the following in a ".tex" document:


I am \emp{very} happy to see you.

In this case, HeVeA does not recognise the \emp command (it is a typo, and should be \emph), so it prints a warning message on the console and writes the following to the output HTML file:

I am very happy to see you.

(When using HeVeA with Canthology, you will need to run canthology with the "-d 3" command-line option to see the warning messages reported by HeVeA.)

In practice, many unrecognised commands are due to HeVeA implementing only a subset of LaTeX and TeX commands (rather than typos in the input document).

The “print a warning message and carry on” behaviour can be useful, especially if you are new to HeVeA. This is because it enables you to see that HeVeA correctly converts 90% of your document to HTML, and most of the fouled-up 10% is due to just a small number of unrecognised commands that are used frequently. Thus, if you can (somehow) extend HeVeA to handle those troublesome commands, then HeVeA will be able to process your entire document.
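As a minimal sketch of what such an extension can look like, suppose a document uses a command called \booktitle that HeVeA reports as unrecognised. Both the command name and the file name below are hypothetical, chosen only for illustration. Because HeVeA implements \newcommand, it is often enough to define the missing command in terms of commands that HeVeA already knows, and to put that definition in a ".hva" file (discussed in Section 10.6):

```latex
% booktitle.hva (hypothetical file name): read by HeVeA in place of booktitle.sty.
% Define the otherwise unrecognised command using a command HeVeA implements.
\newcommand{\booktitle}[1]{\emph{#1}}
```

With a definition like this in place, a sentence such as "Have you read \booktitle{Moby Dick}?" should no longer trigger a warning, and the title should appear emphasised in the HTML output rather than as plain text.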

10.6  Extending HeVeA

Extending HeVeA with new commands usually revolves around implementing packages. You may recall from Section 9.5 that LaTeX packages are implemented with ".sty" files. However, HeVeA ignores ".sty" files because they are likely to make use of low-level LaTeX or TeX commands that are not supported by HeVeA. Instead, HeVeA packages are implemented in ".hva" files, which may make use of the subset of LaTeX commands implemented by HeVeA plus some additional CSS- and HTML-centric commands provided by HeVeA. Thus, if you want to write a ".hva" file, you are likely to need a working knowledge of HTML and CSS (cascading style sheets). The discussion in this section assumes such a working knowledge.

Among HeVeA’s CSS- and HTML-centric commands are the following:


\newstyle{name}{settings}
\@open{BLOCK}{attributes}
\@close{BLOCK}

The \newstyle command generates a new CSS style called name that has the specified settings. For example, the following code:


\newstyle{.example}{margin-left: 4ex; margin-right: 4ex;}

results in the following being added to the style sheet for the generated HTML document:


.example {margin-left: 4ex; margin-right: 4ex;}

The \@open command creates an opening tag, containing the specified attributes, for the specified BLOCK. Conversely, \@close creates a closing tag for the specified BLOCK. For example, the following code:


\@open{DIV}{class="example"}
...
\@close{DIV}

adds the following to the output HTML file:


<DIV class="example">
...
</DIV>

I will now provide a complete worked example to illustrate typical usage of the above HTML-centric commands. I will start by explaining the functionality offered by the framed package. Then, I will discuss how a HeVeA version of this package (that is, a file called framed.hva) can be implemented.

10.6.1  The framed package

The framed package [1] defines three environments called framed, shaded and leftbar. For example, the following code:


\setlength{\FrameSep}{3pt}
\begin{framed}
    \noindent
    This sentence is boring; it just contains some
    example text. This sentence is boring; it just
    contains some example text.  This sentence is
    boring; it just contains some example text. This
    sentence is boring; it just contains some example
    text.
\end{framed}

produces the following output:

This sentence is boring; it just contains some example text. This sentence is boring; it just contains some example text. This sentence is boring; it just contains some example text. This sentence is boring; it just contains some example text.

As you can see, the framed environment draws a frame around the contents of the environment. By default, there is a lot of space between the frame and the text it contains. However, in the above example I used the command \setlength{\FrameSep}{3pt} to reduce that space, so the frame fits more snugly. In addition, I used \noindent to prevent the first (and, in this case, only) paragraph within the frame from starting with an indented line.

The shaded environment does not draw a frame around the contents of its environment. Instead, it paints the background with the color specified by shadecolor, which the author of a document must previously have defined via the \definecolor command [3]. For example, the following code:


\definecolor{shadecolor}{rgb}{0.9,0.9,1.0}
\begin{shaded}
    \noindent
    This sentence is boring; it just contains some
    example text. This sentence is boring; it just
    contains some example text.  This sentence is
    boring; it just contains some example text. This
    sentence is boring; it just contains some example
    text.
\end{shaded}

produces the following output:

This sentence is boring; it just contains some example text. This sentence is boring; it just contains some example text. This sentence is boring; it just contains some example text. This sentence is boring; it just contains some example text.

The leftbar environment draws a thick black line on the left side of the contents of the environment. For example, the following code:


\begin{leftbar}
    \noindent
    This sentence is boring; it just contains some
    example text. This sentence is boring; it just
    contains some example text.  This sentence is
    boring; it just contains some example text. This
    sentence is boring; it just contains some example
    text.
\end{leftbar}

produces the following output:

This sentence is boring; it just contains some example text. This sentence is boring; it just contains some example text. This sentence is boring; it just contains some example text. This sentence is boring; it just contains some example text.

In addition to the framed, shaded and leftbar environments, the package defines commands that can be used as building blocks for users to define other environments. However, a discussion of those additional commands is outside the scope of this manual.

10.6.2  Implementing framed.hva

Although the HeVeA distribution provides ".hva" implementations of more than twenty popular packages, it does not provide an implementation of the framed package. To work around this, I wrote the HeVeA implementation shown in Figure 10.1.

Figure 10.1: The framed.hva file
 1  \ProvidesPackage{framed}
 2  \RequirePackage{color}
 3  
 4  \newenvironment{framed}{%
 5      \@open{div}{class="framed"}%
 6  }{%
 7      \@close{div}%
 8  }
 9  \newstyle{.framed}{
10      border: 1px solid black; 
11      padding-left: 8pt;
12      padding-right: 8pt;
13      padding-top: 0pt;
14      padding-bottom: 0pt;
15  }
16  
17  \newenvironment{leftbar}{%
18      \@open{div}{class="leftbar"}
19  }{%
20      \@close{div}
21  }
22  \newstyle{.leftbar}{
23      border-left: 4px solid black; 
24      padding-left: 6pt;
25      padding-right: 6pt;
26      padding-top: 0pt;
27      padding-bottom: 0pt;
28  }
29  
30  \newenvironment{shaded}{%
31      \@open{TABLE}{BORDER="0" CELLPADDING="8" WIDTH="100\%"
32                    BGCOLOR=\@getcolor{shadecolor}}
33      \@open{TR}{}
34      \@open{TD}{}
35  }{
36      \@close{TD}
37      \@close{TR}
38      \@close{TABLE}
39  }

The \newenvironment command takes three parameters. The first is the name of the environment being defined. The second and third parameters specify commands to be executed at the beginning and ending of the environment.

The definition of the framed environment (lines 4–8) uses the \@open and \@close commands to open and close an HTML DIV element whose class attribute has the value "framed". The \newstyle command on lines 9–15 defines the framed CSS class, which specifies that a one-pixel-wide, solid black border is drawn around the DIV element, with 8 points of padding between the border and the left and right margins of the text.

The leftbar environment (lines 17–21) and its accompanying CSS style (lines 22–28) are defined in a similar manner.

The definition of the shaded environment (lines 30–39) is more verbose. It creates a TABLE element, containing a single cell, in which the background color (that is, the BGCOLOR attribute) is set to the value of \@getcolor{shadecolor}. The low-level \@getcolor command converts the specified color (passed as a parameter) from the specification syntax used by the color package into the syntax used in HTML files.

HeVeA does not implement the LaTeX concept of a “length”. Instead, it ignores all uses of the \newlength, \setlength and \addtolength commands. For this reason, the framed.hva file does not (and cannot) implement support for the \FrameSep length. Thus, there is no way for the author of a document to alter the spacing between the box of the framed environment and the text inside it. Instead, this spacing is fixed by the CSS framed class.

The above worked example illustrates several issues that commonly arise when implementing ".hva" versions of packages.

First, a person implementing a ".hva" file may need to have a working knowledge of HTML and CSS.

Second, the implementation of commands in a ".hva" file is often simpler than their corresponding implementation in a ".sty" file. This is because HTML and CSS are much simpler (and less powerful) markup languages than LaTeX. In the case of the framed package, the LaTeX version is complex because it needs to deal with the possibility of an environment spanning a page break. The HTML implementation of the package is simpler because it does not have to worry about such page breaks.

Finally, it is common for a ".hva" file to implement a subset of the functionality provided in a package. For example, framed.hva ignores attempts to set the \FrameSep length, and it does not provide building-block commands for users to define other framed-like environments.

10.7  The hevea Package

A package called hevea is distributed with HeVeA. The HeVeA implementation of the hevea package defines about 30 commands that enable you to fine-tune how a document is converted into HTML. The LaTeX implementation of the package provides empty implementations of those same commands. For example, the LaTeX implementation of the \newstyle command (discussed in Section 10.6) does nothing.
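The following minimal sketch shows how this arrangement lets one source file be processed by both tools. The document content and the .example style are illustrative assumptions (the style is the one used as an example in Section 10.6); the behaviour described in the comments is as stated in the paragraph above.

```latex
\documentclass{article}
\usepackage{hevea}  % the LaTeX implementation when run through latex or pdflatex

% Under HeVeA, this adds a rule to the style sheet of the generated HTML.
% Under LaTeX, the command is defined but does nothing.
\newstyle{.example}{margin-left: 4ex; margin-right: 4ex;}

\begin{document}
The same source file can be processed by both tools.
\end{document}
```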

10.8  The latexonly and htmlonly Environments

The hevea package defines environments called latexonly and htmlonly. Any text you place in a latexonly environment is used in the document only if you process the document with a LaTeX-related command, such as latex or pdflatex. Conversely, any text you place in a htmlonly environment is used in the document only if you process the document with hevea. The following example illustrates the use of those environments:


Sentence~1 is in all versions of this manual.
\begin{latexonly}
Sentence~2 is in only the PDF version of this manual.
\end{latexonly}
\begin{htmlonly}
Sentence~3 is in only the HTML version of this manual.
\end{htmlonly}

That example results in the following output in the version of this manual you are currently reading:

Sentence 1 is in all versions of this manual. Sentence 3 is in only the HTML version of this manual.

Although it is good to know that the latexonly and htmlonly environments exist, in practice, you are likely to use them very infrequently, if at all. For example, this manual is over 200 pages long, yet aside from the above example, I have used the latexonly and htmlonly environments just three times. In each case, it was to make a minor adjustment to the visual appearance of the formatted document.2

It is important to note that these environments work only in the main part of a document, that is, between \begin{document} and \end{document}. In particular, you cannot use the latexonly and htmlonly environments in the preamble of a document to help you define LaTeX- and HeVeA-specific versions of commands. Instead, if you need to define LaTeX- and HeVeA-specific versions of some commands, then you should write ".sty" and ".hva" versions of a package. You can find a discussion of how to write packages in Section 9.5 and also Section 10.6.
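To make the ".sty" plus ".hva" approach concrete, here is a minimal sketch of a package that provides a note environment in both forms. The package name mynote, the environment name and the CSS settings are all hypothetical. The ".hva" half follows the same pattern as the framed.hva file in Section 10.6.2.

```latex
% ---- File: mynote.sty (hypothetical; read by latex and pdflatex) ----
\ProvidesPackage{mynote}
\newenvironment{note}{%
    \begin{quote}\textbf{Note:}
}{%
    \end{quote}%
}

% ---- File: mynote.hva (hypothetical; read by hevea instead of mynote.sty) ----
\ProvidesPackage{mynote}
\newenvironment{note}{%
    \@open{div}{class="note"}%
    \textbf{Note:}
}{%
    \@close{div}%
}
\newstyle{.note}{border-left: 4px solid gray; padding-left: 6pt;}
```

A document then needs only \usepackage{mynote} in its preamble, and each tool picks up its own version of the environment. The document itself contains no latexonly or htmlonly environments.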

10.9  Specifying the Names of HTML Files

Let’s assume you run hevea to convert a document into a monolithic HTML file, and afterwards you run hacha on that file to split it into multiple HTML files. If your document uses the book class, then hacha creates a separate HTML file for each chapter. Conversely, if your document uses the article class, then hacha creates a separate HTML file for each section.

By default, the names of the HTML files will be the name of the root file of your document (without the ".tex" extension) followed by a three-digit number and then ".html". For example, if the root file of your document is short-stories.tex, then the HTML files created by hacha will have names like short-stories001.html, short-stories002.html, short-stories003.html, and so on.

You can override the default naming scheme by using the \cutname command, which is defined in the hevea package. This command takes one parameter that specifies the file name that hacha should use when splitting the current part of the document into a separate HTML file. You should put a \cutname command after each \chapter command in a book, or after each \section command in an article. For example:


\chapter{The Adventure Starts}
\cutname{start.html}
...
\chapter{Disaster Strikes}
\cutname{disaster-strikes.html}
...
\chapter{A Happy Ending}
\cutname{happy-ending.html}
...

10.10  Misfeatures of HeVeA

Although HeVeA is very useful, it does have some misfeatures. In this section, I discuss the workarounds that I have developed for some of these.

10.10.1  The hevea-fix package

I have implemented a package called hevea-fix, which I distribute with Canthology. The ".hva" implementation of the package redefines some HeVeA commands so they mimic their LaTeX counterparts more closely. The ".sty" implementation of this package is empty because it does not need to modify LaTeX. What follows is a brief summary of the capabilities provided by hevea-fix.hva.

Unfortunately, HeVeA does not implement \frontmatter, \mainmatter or \backmatter. Because of this, front and back matter such as a preface or glossary appear as numbered (rather than as unnumbered) chapters or sections. The hevea-fix.hva file fixes this problem.

The HeVeA implementation of the hyperref package fails to implement the \phantomsection command; hevea-fix.hva fixes this problem.

The hevea application inserts comments into the generated HTML file. These comments instruct hacha how to divide the monolithic HTML file into a collection of HTML files—by default, a separate file for each chapter. A bug or misfeature in the hacha application causes it to process comments relating to \part commands in a strange and unsatisfactory manner. The hevea-fix.hva file resolves this by redefining the HeVeA command that inserts comments into the generated HTML file. The result is that \part-related comments are now identical to \chapter-related comments, so hacha processes them in a satisfactory manner.

LaTeX uses (only) vertical spacing to separate a floating figure or table from the main body of text. Some authors prefer to have a more visible boundary between a figure or table and the main body of text. For example, the visible boundary might be a box (such as that provided by the framed environment) surrounding the contents of the figure, or perhaps horizontal lines drawn above and below the figure or table. Such authors have to explicitly use LaTeX commands to draw such boundaries. In contrast, HeVeA automatically puts a visible boundary around a figure or table. This boundary takes the form of horizontal lines above and below the figure or table. This is a problem because the presence of the HeVeA-provided boundary often visually clashes with whatever boundary has been provided by the author using LaTeX commands. The hevea-fix.hva file fixes this problem by redefining a low-level HeVeA command so the horizontal lines are not drawn.

LaTeX has commands that can be used to control the co-existence of text and floating figures or tables on a page: \topfraction, \bottomfraction, \textfraction and \floatpagefraction. HeVeA neglects to define those commands, which results in warning messages being printed if those commands are used in a document. The hevea-fix.hva file fixes this problem by defining dummy versions of those commands.
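I have not reproduced the contents of hevea-fix.hva here, but the following sketch illustrates the general idea of a dummy definition. It assumes that giving each command an empty definition is enough for a later \renewcommand in the document to be accepted without a warning.

```latex
% Sketch only: not the actual contents of hevea-fix.hva.
% Float placement has no meaning in HTML, so each command is given an
% empty definition whose only purpose is to make the command known to HeVeA.
\newcommand{\topfraction}{}
\newcommand{\bottomfraction}{}
\newcommand{\textfraction}{}
\newcommand{\floatpagefraction}{}
```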

In LaTeX, there are subtle differences in how the quote, quotation and verse environments are typeset. HTML is not flexible enough to reproduce those subtle differences. For this reason, HeVeA maps both the quote and quotation environments into a BLOCKQUOTE HTML element. That is perfectly reasonable. What I find strange is that HeVeA chooses to not support the verse environment. The hevea-fix.hva file rectifies this by also mapping the verse environment into a BLOCKQUOTE HTML element.
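A mapping of this kind can be written with the \@open and \@close commands described in Section 10.6. The sketch below is an illustration of the technique rather than the actual definition in hevea-fix.hva, and it assumes that no verse environment has already been defined.

```latex
% Sketch only: not the actual contents of hevea-fix.hva.
% Map the verse environment onto an HTML BLOCKQUOTE element.
\newenvironment{verse}{%
    \@open{blockquote}{}%
}{%
    \@close{blockquote}%
}
```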

10.10.2  Removing the Pseudo Table of Contents

HeVeA implements the \tableofcontents command. Figure 10.2 shows a table of contents generated for a document that contains two parts, each with three chapters.

Figure 10.2: Table of contents produced by hevea

I think you will agree that the table of contents looks nice. Unfortunately, something bizarre happens when you run hacha to split a HTML file created by hevea into a collection of smaller files. The hacha application adds HTML like that shown in Figure 10.3 to the first HTML file, which is typically the title page of your document.

Figure 10.3: Pseudo table of contents produced by hacha

The result is that your document contains a badly-formatted, pseudo table of contents on the title page, plus a real table of contents on another HTML page. By the way, the first entry in the pseudo table of contents is a link to the real table of contents!

Unfortunately, hacha does not have a command-line option to disable the generation of the pseudo table of contents. However, I have written a script called remove_pseudo_toc.tcl that can remove the pseudo table of contents and (optionally) replace it with a link to the real table of contents. This script is located in the etc/html-common/scripts directory of Canthology. If you copy the script into a directory containing HTML files generated by hacha, then you can use it as follows (assuming index.html contains the pseudo table of contents):


tclsh remove_pseudo_toc.tcl -contentsname "Contents" \
                 index.html -link "Table of Contents"

The -contentsname command-line option instructs the script to look in the pseudo table of contents in the specified file for a link called Contents. The script will remove the pseudo table of contents and replace it with a link to the real table of contents. The name of this new link is specified by the -link command-line option.

If you want to remove the pseudo table of contents and not provide a link to the real table of contents, then you run the script as follows:


tclsh remove_pseudo_toc.tcl -contentsname "Contents" \
                 index.html -nolink

10.10.3  Automating the Workarounds

As I will discuss in Chapter 11, when you use Canthology with HeVeA, Canthology adds hevea-fix to the list of packages used by your document, and also arranges for remove_pseudo_toc.tcl to be run after hacha. In these ways, Canthology automates use of the workarounds.


1
The name HeVeA is a pun on LaTeX. Hevea brasiliensis, also known as the Pará rubber tree or simply the rubber tree, produces a sap-like substance called latex, which is a source of natural rubber.
2
In one case, I used the latexonly environment to introduce some extra vertical spacing at the top of the dedications page in the PDF version of this manual. In the other two cases, I wanted to force a line break in the PDF version of this manual, but not have such a line break in the HTML version.
