Chapter 4  LaTeX History and Variants

4.1  Introduction

Although Canthology provides a simplification wrapper around LaTeX, Canthology does not hide LaTeX completely. Because of this, you will need to be familiar with some LaTeX concepts and terminology. This chapter explains some LaTeX issues that will make it easier to understand how Canthology works.

4.2  A Short History of TeX and LaTeX

Within the printing industry, type means a printed character, and to typeset means to combine individual types to form words, lines and (eventually) a complete page.

In 1977, a mathematician and computer scientist called Donald Knuth was frustrated at the poor-quality typesetting of a book he had written. He decided to design his own typesetting system so his future books could look better. He spent more than a decade designing and implementing a computer program called TeX (usually pronounced “tech” by English speakers) for typesetting documents.

TeX was developed at a time when there was no standardisation of how to control different types of printers. Knuth worked around this problem with a two-step approach.

• A person writes a TeX-based document in a file called, for example, my-document.tex, and processes this with TeX by executing the command:

tex my-document.tex


Doing that produces a file called my-document.dvi. The ".dvi" extension stands for “device independent”.
• Another program is executed to convert my-document.dvi into the format required to drive a particular type of printer.

This two-step approach has enabled TeX to survive technological changes. For example, when computers with high-resolution displays were introduced, on-screen previewers for DVI files were developed. When the PostScript printer-control language was invented, DVI-to-PostScript converters were developed. And when PDF was invented, DVI-to-PDF converters were developed.

Although TeX is powerful and can produce beautiful-looking documents, many people find it difficult to use because the TeX markup commands are low level. Over the years, this has resulted in attempts by different people to implement simplification wrappers around TeX. The most popular of these is called LaTeX, which was initially developed during the 1980s. LaTeX provides high-level markup commands—such as \chapter and \section—that are easier to use than the more primitive markup commands provided by TeX.

Processing a LaTeX-based document is similar to processing a TeX-based document, except that you use the latex command instead of tex. For example, the following commands convert my-document.tex into my-document.dvi and then into my-document.ps (that is, a PostScript file):


latex my-document.tex
dvips my-document.dvi



4.3  Structure of a LaTeX Document

Figure 4.1 shows the high-level structure of a book written in LaTeX.

 Figure 4.1: Structure of a LaTeX document
\documentclass[12pt,oneside]{book}

\usepackage{color}
\usepackage[english]{babel}
... % Other preamble commands go here

\begin{document}
\frontmatter
\input{titlepage.tex}
\tableofcontents
\input{preface.tex}

\mainmatter
\input{chapter-1.tex}
\input{chapter-2.tex}
\input{chapter-3.tex}
\input{chapter-4.tex}
\appendix
\input{appendix-a.tex}
\input{appendix-b.tex}

\backmatter
\input{glossary.tex}
\end{document}


The file starts with a \documentclass command that specifies that the document’s class (that is, its type) is book. This causes LaTeX to load a file called book.cls and execute its contents. Doing that causes commands such as \part, \chapter and \section to be defined. The book.cls file also defines typographical rules for typesetting a book. By default, those rules specify, among other things, that the document is typeset in a 10 pt font for two-sided printing on US Letter-size paper. The optional arguments (indicated between [ and ]) override some of those defaults by specifying use of a 12 pt font and one-sided printing.
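
As a small illustration of how the optional arguments work, the line below is a hypothetical variation on the first line of Figure 4.1. It assumes you want an 11 pt font, A4 paper and two-sided printing; all three are standard options of the book class.

```latex
% Hypothetical variation on the first line of Figure 4.1:
% 11 pt font, A4 paper, two-sided printing.
\documentclass[11pt,a4paper,twoside]{book}
```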

The standard distribution of LaTeX contains several class files, including book.cls, report.cls, article.cls and letter.cls. This makes it possible to use LaTeX to create several types of document easily.

The term package refers to a file that defines add-on commands for LaTeX. Package files have a ".sty" extension (because package files were originally known as style files). The \usepackage command instructs LaTeX to load the ".sty" file specified as a parameter and execute its contents. For example, \usepackage{color} instructs LaTeX to load the file color.sty. Options, if any, to a package are specified between [ and ].

Preamble refers to the part of a LaTeX file between the \documentclass and \begin{document} commands. As shown in Figure 4.1, the preamble can contain \usepackage commands. The preamble can contain other types of commands too. Some people define a few additional commands in the preamble; but if a lot of new commands are to be defined, then it is best to put them in a package.
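
The following is a minimal sketch of a preamble that defines a couple of additional commands. The command names \product and \term are invented for this illustration; they are not part of LaTeX or Canthology.

```latex
\documentclass{article}

% \product is a hypothetical command, defined here for illustration only.
\newcommand{\product}{Canthology}
% \term (also hypothetical) takes one argument and typesets it in italics.
\newcommand{\term}[1]{\textit{#1}}

\begin{document}
\product{} relies on the \term{preamble} to load packages.
\end{document}
```

If the list of such definitions grows long, moving them into a ".sty" file and loading it with \usepackage keeps the preamble short.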

The document environment (everything between \begin{document} and \end{document}) contains the actual text of the document. If the document is just a few pages long, then its text might be written directly between \begin{document} and \end{document}. But, as shown in Figure 4.1, multi-chapter documents are often written using a separate file for each chapter, in which case a series of \input commands is used to read and process those separate files.
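
A file read by \input contains only body text; it has no \documentclass, \begin{document} or \end{document} commands of its own. The sketch below shows what a file such as chapter-1.tex might contain. The chapter and section titles are placeholders.

```latex
% Hypothetical contents of chapter-1.tex.
% No \documentclass or \begin{document} here; those live in the main file.
\chapter{Getting Started}

Opening paragraph of the chapter.

\section{First Steps}

Text of the first section.
```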

If the document is a book, then its contents can be grouped into front matter, main matter and back matter. Only chapters and sections in the main matter are numbered. Unnumbered chapters sometimes found in the front matter include Acknowledgements, Preface or Foreword. Those found in the back matter might include Glossary, Index and Bibliography.

Any main-matter chapter or section appearing after \appendix has its title typeset as an appendix rather than as a chapter or section.

4.4  The memoir Class

LaTeX offers the benefit of being significantly easier to use than TeX. However, it also suffers from a drawback: the predefined document classes (book, article, and so on) adopt a “take it or leave it” attitude to typesetting. For example, users do not have an easy way to adjust the typesetting of chapter or section headings.

Partially to overcome this drawback, Peter Wilson implemented the memoir class. This class provides users with commands to adjust many aspects of document typesetting. Another benefit of the memoir class is that it incorporates the functionality of dozens of popular packages. This means a memoir-based document tends to have fewer \usepackage commands cluttering up its preamble than a similar book-based document. More importantly, it means that documentation for the functionality of many packages can be centralised in the manual for the memoir class [14] rather than be scattered over dozens of short documents (a separate document for each package).

The functionality of the memoir class is a superset of that of the book class. Thus, if you start writing a document using the book class and eventually discover it is too inflexible to suit your needs, then you should be able to switch to using the memoir class without having to make anything more than (at most) a few trivial changes in your existing document.
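
As a rough sketch of such a switch, the example below uses the same class options as Figure 4.1 but names memoir instead of book, and then calls two of memoir's adjustment commands. The particular chapter style and heading font chosen here are arbitrary illustrations, not recommendations.

```latex
% Same options as Figure 4.1, but with memoir in place of book.
\documentclass[12pt,oneside]{memoir}

% Two of memoir's adjustment commands (the choices are illustrative):
\chapterstyle{section}                      % one of memoir's predefined chapter styles
\setsecheadstyle{\Large\sffamily\bfseries}  % font used for \section headings

\begin{document}
\chapter{Introduction}
\section{Background}
Some text.
\end{document}
```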

4.5  Support for Colour and Graphics

The initial development of TeX predated widespread support for colour in computers and printers. It also predated most graphic-file formats, such as EPS, TIFF, GIF, JPEG, PDF and PNG. Despite this, TeX and LaTeX provide good support for the use of colour and graphics in documents. This is because Donald Knuth had the foresight to provide a \special command in TeX. This command writes its argument directly into the DVI file being generated, and it is then up to a DVI-to-whatever converter to interpret that argument. This provides an open-ended extension mechanism that permits TeX (and LaTeX) to support, among other things, the use of colour and graphics in documents. The \special command is also used to implement hypertext links when a document is converted into PDF format.
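
To make the mechanism concrete, here is a minimal sketch that uses raw \special commands to change the text colour. It assumes the DVI file will be processed by dvips, since the arguments are written in the syntax that dvips understands; another DVI converter might ignore them or expect a different syntax.

```latex
\documentclass{article}
\begin{document}
% Raw \special commands in the syntax understood by dvips (an assumption
% of this sketch); other DVI converters may not interpret them.
Black text,
\special{color push rgb 1 0 0}% switch to red
red text,
\special{color pop}% restore the previous colour
black again.
\end{document}
```

In practice, documents rarely contain raw \special commands; packages normally generate them on the author's behalf.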

Over the years, individuals have developed DVI-to-whatever converters independently of each other. The independent nature of their work has resulted in some DVI converters expecting the argument to a \special command to be formatted one way, and other DVI converters expecting the argument to be formatted a different way. Obviously, this has the potential to result in widespread incompatibilities. However, a combination of two things saves the day.

First, DVI converters ignore any \special commands that they do not understand. This allows the conversion of documents containing unsupported \special commands to degrade gracefully.

Second, packages that provide a simplification wrapper around the low-level \special command also protect document authors from the \special command’s potential to introduce incompatibilities. This can be illustrated with the graphicx package [3], which provides commands to scale and rotate text, and to include graphic files. An option to this package specifies which DVI converter you plan to use. For example:


\usepackage[dvips]{graphicx}



One of the commands provided by this package is \includegraphics, which you can use to include (and, optionally, scale) a graphics file:


\includegraphics[scale=0.7]{my-diagram}



That command executes the \special command appropriate for the DVI converter specified in the \usepackage command. If the document’s author switches from dvips to using another DVI converter, then changing the option to the \usepackage command will be sufficient to change the \special commands executed by \includegraphics.
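
For instance, a switch of converter might look like the sketch below. The choice of dvipdfmx as the new converter is only an example; use the option that matches whichever converter you actually run.

```latex
% Before the switch:
%   \usepackage[dvips]{graphicx}
% After the switch (dvipdfmx is used here purely as an example):
\usepackage[dvipdfmx]{graphicx}

% The command in the document body is unchanged:
\includegraphics[scale=0.7]{my-diagram}
```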

The main irritation I have when using graphics in a LaTeX document is that each DVI converter supports only a subset of graphic-file formats. For example, I used to use an on-screen DVI previewer that could process only ".eps" files, while the DVI converter I used to produce a PDF file could process ".jpg", ".png" and ".pdf" files. The lack of overlap in graphic-file formats supported by the two converters meant I had to provide the same graphic in several file formats, for example, my-diagram.eps and my-diagram.jpg. If you encounter this issue, then there are several ways to deal with it.

First, some drawing editor applications can save a graphic in a variety of file formats.

Second, ImageMagick is a collection of command-line utilities for manipulating graphic files. Its convert utility can read a graphic file in one format and convert it to another file format. For example:


convert my-diagram.eps my-diagram.jpg



Finally, you could decide to forgo use of a DVI previewer and instead use PDF documents for both on-screen previewing and printing.

4.6  Variations of TeX and LaTeX

Over the years, alternative implementations of TeX (and LaTeX) have been developed to adapt to changes in technology.

4.6.1  pdfTeX and pdfLaTeX

pdfTeX (and pdfLaTeX) can generate a PDF file directly from a ".tex" file. For example:


pdflatex my-document.tex



Many people find this to be more convenient than the two-step process of generating a ".dvi" file, and then converting this into a ".pdf" file. The out-of-the-box configuration for Canthology uses pdflatex.

4.6.2  XeTeX and XeLaTeX

The nice thing about standards is that there are so many of them to choose from. Furthermore, if you do not like any of them, you can just wait for next year’s model.
— Andrew S. Tanenbaum

This has certainly been true for character sets. In brief, a (coded) character set is a convention in which numbers are used to denote characters (that is, letters, digits, punctuation, and so on). For decades, there were almost as many character sets as there were countries in the world, and this made it difficult to write, say, a Greek document on a French computer, or vice versa. The lack of an international, standardised character set posed difficulties for writing TeX- or LaTeX-based documents that contained, say, accented characters such as á and ö. A two-part approach was used:

1. Commands such as \'{a} and \"{o} were used in a ".tex" file to represent accented characters like á and ö. This made it difficult for a person to write non-English text, and also made it difficult for a spell-checking program to verify the spelling of words in a ".tex" file.
2. The output file (in, say, DVI, PostScript or PDF format) would “fake” an accented letter by drawing an unaccented letter, and then drawing an accent character over it. Such “fake” accented characters made it difficult, if not impossible, to search through a document for text containing an accented character.
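
The first part of that approach can be sketched as follows. The sentence is an invented example; the point is that the source file stays pure ASCII, with each accented letter spelled out as a command.

```latex
\documentclass{article}
\begin{document}
% Traditional accent commands: the output shows "Señor Gödel visited Málaga."
Se\~{n}or G\"{o}del visited M\'{a}laga.
\end{document}
```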

Unicode is a standardised, universal (coded) character set that has been gaining popularity since its introduction in the early 1990s. XeTeX (and XeLaTeX) is a redesigned implementation of TeX (and LaTeX) that provides built-in support for Unicode. This support eliminates the problems associated with TeX’s traditional two-part approach for dealing with accented characters. The ".dvi" file format does not support the use of Unicode characters, so the designers of XeTeX have defined their own output file format, which has a ".xdv" extension. An XDV-to-PDF converter is available, but on-screen previewers for XDV files are not yet widely available.
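
By way of contrast with the traditional accent commands, the sketch below shows accented characters typed directly into the source file. It assumes the file is saved in UTF-8 encoding and is processed with the xelatex command. The fontspec package is an addition made for this example; it is commonly loaded in XeLaTeX documents to select fonts that contain the required characters.

```latex
% Process with: xelatex my-document.tex
% Assumes this file is saved in UTF-8 encoding.
\documentclass{article}
\usepackage{fontspec}  % font selection package commonly used with XeLaTeX
\begin{document}
Señor Gödel visited Málaga.
\end{document}
```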

4.6.3  Generating HTML from LaTeX

The author of a document might want to provide the document as a PDF file for printing, and also as a collection of HTML pages that can be browsed casually on a website. To satisfy this need, several people have (independently) developed LaTeX-to-HTML translators. Unfortunately, all LaTeX-to-HTML translators have limitations. This is unavoidable because HTML is not as powerful as LaTeX, particularly when it comes to typesetting mathematical formulas.

For example, consider the following formula:

\[ f = \pi \, \frac{x^2 y + 5}{z} \]

That formula is simple enough that a LaTeX-to-HTML translator should be able to translate it into HTML that displays an accurate representation. However, if you write increasingly complex formulas in LaTeX, then there will come a point where a formula cannot be translated into HTML that displays an accurate representation, simply because of limitations in HTML. At this point, rather than generate HTML that displays an inaccurate representation of the formula, a LaTeX-to-HTML converter might use LaTeX to typeset the formula, and then use a utility to render the output as a graphic image. This graphic image is then displayed in an HTML page. The resulting HTML page may look nice. But, if a user “zooms in” to display the web page’s text in a larger font size, then the graphic image of the mathematical formula will also be magnified, which will result in it appearing pixelated, as shown in Figure 4.2.

Another common limitation is that a LaTeX-to-HTML converter will know how to process built-in LaTeX commands plus the commands defined in a small number of well-known packages, but will not be able to process LaTeX documents that make use of arbitrary packages. Some LaTeX-to-HTML converters permit users to write code to “teach” the converter about commands defined in other packages. Unfortunately, there is no standardisation across LaTeX-to-HTML converters of how to write such teaching code.