# Chapter 13  Architecture of Canthology

## 13.1  Introduction

This chapter is aimed at people who want to modify Canthology, for example, to fix a bug or extend its functionality. To be able to modify Canthology, you will first need to be familiar with its architecture. Explaining that architecture is the focus of this chapter.

Canthology consists of three interacting parts: (1) a Java-based application; (2) supporting files; and (3) the defaults.cfg file, which (among other things) configures Canthology to use the subset of supporting files appropriate for the chosen output format (PDF or HTML).

I have already discussed the defaults.cfg file in Chapters 8 and 11, so this chapter focusses on the Java-based application and supporting files.

## 13.2  The Java Application

The canthology application contains approximately 1200 lines of Java code (excluding comments and blank lines). Some readers may consider this to be surprisingly concise, considering the functionality provided by the application. This conciseness is due to a combination of two factors.

First, some of Canthology’s functionality is not implemented in Java code, but rather is provided by support files, as I will discuss in Section 13.3.

Second, and more significantly, the parsing and semantic checking of the configuration file is performed by a separate library (which contains over 8000 lines of code). Thus, Canthology gains the rich functionality of that configuration-file parser “for free”.

The configuration parser library is called Config4J; this is the Java implementation of Config4* (pronounced “config for star”). If you want to modify Canthology, then it is useful to first gain an overview of the API of Config4J. You can find such an overview in Chapter 3 of the Config4* Getting Started Guide, which is available on the Config4* website.

### 13.2.1  Packages and Source-code Files

The Java source code of the canthology application resides in a package called org.canthology.canthology. The repetition of “canthology” in the package name might appear redundant to some readers. However, it is there to make it easy to create future utility applications that will complement Canthology. For example, if a future version of Canthology provides utility applications called foo and bar, then the package hierarchy might be as follows:

org.canthology.canthology    (source code of canthology)
org.canthology.foo           (source code of foo)
org.canthology.bar           (source code of bar)

### 13.2.2  Limited Use of Java Language Features

During my career as a software consultant, I have worked with numerous companies and have noticed that although some companies are quick to upgrade to new versions of development tools, other companies are much slower to do so. It is not unusual for a company to be using a compiler that is five or even ten years old. For this reason, when I develop open-source software, I like to avoid use of relatively new features in a programming language. This approach has the drawback that the source code of my applications may be slightly more verbose than necessary, but it offers the benefit that my applications can be compiled and used by a wide range of companies and individuals, regardless of whether they use new or older compilers.

In practice, I try to limit myself to features available in Java 1.3. One consequence of this is that I avoid using the Java assert statement (it was introduced in Java 1.4). Instead, I have written an assertion() method that serves a similar purpose.
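As a rough illustration, a replacement for the assert statement that compiles under Java 1.3 might look like the sketch below. The class name AssertionSketch, the nested exception AssertionFailure and the two-parameter signature are assumptions made for this example; the actual Util.assertion() method and AssertionError class in Canthology may differ.

```java
// Illustrative sketch only: the names and signatures here are assumptions,
// not copied from the Canthology source code.
class AssertionSketch {

    // Hypothetical stand-in for the application's own AssertionError class.
    static class AssertionFailure extends RuntimeException {
        AssertionFailure(String message) {
            super(message);
        }
    }

    // Throws an unchecked exception if the condition is false. Unlike the
    // built-in assert statement, this check is always enabled.
    static void assertion(boolean condition, String message) {
        if (!condition) {
            throw new AssertionFailure("Assertion failed: " + message);
        }
    }

    public static void main(String[] args) {
        int[] values = { 1, 2, 3 };
        assertion(values.length == 3, "expected three values");
        try {
            assertion(values.length == 0, "expected an empty array");
        } catch (AssertionFailure ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```

One design consequence of this approach is that the check cannot be switched off with a JVM flag, so it is best reserved for conditions that are cheap to evaluate.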

Another consequence of restricting myself to language features available in Java 1.3 is that I avoid use of generics (they were introduced in Java 5). This results in verbosity when retrieving items from a collection, due to the need for typecasts.
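The verbosity mentioned above can be seen in the following sketch, which contrasts retrieval from a raw collection with the Java 5 equivalent. The class and method names are hypothetical, and the @SuppressWarnings annotations are present only to quieten a modern compiler; they would not appear in Java 1.3 code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: class and method names are hypothetical.
class RawCollectionSketch {

    // Pre-generics style: elements come back as Object, so each
    // retrieval needs an explicit typecast.
    @SuppressWarnings("rawtypes")
    static String joinRaw(List names) {
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < names.size(); i++) {
            String name = (String) names.get(i); // typecast required
            if (i > 0) {
                buf.append(", ");
            }
            buf.append(name);
        }
        return buf.toString();
    }

    // Java 5 style, shown for comparison: no typecast is needed.
    static String joinGeneric(List<String> names) {
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < names.size(); i++) {
            String name = names.get(i);
            if (i > 0) {
                buf.append(", ");
            }
            buf.append(name);
        }
        return buf.toString();
    }

    @SuppressWarnings({ "rawtypes", "unchecked" })
    public static void main(String[] args) {
        List names = new ArrayList();
        names.add("Util.java");
        names.add("Canthology.java");
        System.out.println(joinRaw(names));
    }
}
```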

### 13.2.3  Algorithms and Source-code Files

The source code of the canthology application consists of the following five files:


Util.java
AssertionError.java
Canthology.java
StartingPointConfig.java    (generated from StartingPointConfig.cfg)
DocumentInfo.java



I will now discuss each of those briefly.

#### Util.java and AssertionError.java

The Util class defines some static utility methods. One of these is Util.assertion(), which, as I already discussed, I use instead of the Java assert statement. This method throws an AssertionError if the assertion check fails.

#### Canthology.java

The Canthology class defines the main() method for the canthology application. The code in this class is straightforward. First, it parses command-line arguments. If a -create command-line option is encountered, then it generates a starting-point configuration file and terminates. Otherwise, it creates an empty Configuration object (this type is defined by the Config4J library), populates it with name=value pairs obtained from "-set name value" command-line options, and then parses the configuration file. Finally, for each configuration scope that defines a document, it creates a DocumentInfo object (whose constructor validates the configuration information within a configuration scope) and calls DocumentInfo.process() to perform the “real work” of Canthology.
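The control flow described above can be sketched as follows. This is not the code of the Canthology class: it uses a plain Map where the real application populates a Config4J Configuration object, and the class name CommandLineSketch, its fields and its error handling are assumptions made for illustration. Only the -create and -set options mentioned in the text are handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: a Map stands in for the Config4J Configuration
// object, and all names and error messages here are hypothetical.
class CommandLineSketch {
    boolean createRequested = false;
    final Map<String, String> presetVariables = new LinkedHashMap<String, String>();

    // Parses "-create" and "-set name value" options.
    static CommandLineSketch parse(String[] args) {
        CommandLineSketch result = new CommandLineSketch();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-create")) {
                result.createRequested = true;
            } else if (args[i].equals("-set")) {
                if (i + 2 >= args.length) {
                    throw new IllegalArgumentException("-set requires a name and a value");
                }
                result.presetVariables.put(args[i + 1], args[i + 2]);
                i += 2; // skip over the name and the value
            } else {
                throw new IllegalArgumentException("unrecognised option: " + args[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CommandLineSketch cmd = parse(args);
        if (cmd.createRequested) {
            System.out.println("would write a starting-point configuration file");
            return;
        }
        System.out.println("preset variables: " + cmd.presetVariables);
        System.out.println("would now parse the configuration file and process each document scope");
    }
}
```

Populating the name=value pairs before parsing the configuration file means that the file can refer to those variables, which is presumably why the steps occur in that order.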

#### StartingPointConfig.cfg

A simple, but tedious, way to create a starting-point configuration file is to open the file for writing, use lots of print statements to generate the contents of the file, and then close the file. The problem with this technique is that writing the print statements is tedious and error-prone. Config4J alleviates this tedious work as follows. The config2j utility (documented in Chapter 6 of the Config4* Getting Started Guide) can read a text file and generate a Java class that provides the contents of the file as an embedded string (accessible through a getString() method). The build system uses config2j to convert StartingPointConfig.cfg into StartingPointConfig.java, and that generated file is compiled into the application. Thus, the application code to create a starting-point configuration file becomes trivial:


out = new FileWriter(cfgFileName);
out.write(StartingPointConfig.getString());
out.close();



If you want canthology to generate a starting-point configuration file with different contents, then you should modify StartingPointConfig.cfg, and run ant to rebuild the application.
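To make the mechanism more concrete, the sketch below shows roughly what a class that embeds a text file as a string could look like, together with code that writes the string to a file. The class EmbeddedTextSketch, its embedded contents and the writeTo() helper are hypothetical; the real StartingPointConfig.java is produced by config2j, and its internal layout is not shown in this chapter.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;

// Illustrative sketch only: a hand-written stand-in for a generated class.
class EmbeddedTextSketch {
    // The lines of the original text file, held as string literals.
    private static final String[] LINES = {
        "# hypothetical starting-point file",
        "example_scope {",
        "    title = \"Example\";",
        "}"
    };

    // Returns the embedded file contents as a single string.
    static String getString() {
        StringBuffer buf = new StringBuffer();
        for (int i = 0; i < LINES.length; i++) {
            buf.append(LINES[i]).append('\n');
        }
        return buf.toString();
    }

    // Writes the embedded contents to the named file.
    static void writeTo(String fileName) throws IOException {
        Writer out = new FileWriter(fileName);
        try {
            out.write(getString());
        } finally {
            out.close(); // close the file even if write() fails
        }
    }

    public static void main(String[] args) throws IOException {
        writeTo(args.length > 0 ? args[0] : "example.cfg");
    }
}
```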

#### DocumentInfo.java

The DocumentInfo class performs the “real work” of Canthology. Its public API consists of a constructor and the process() method.

The constructor creates a schema that specifies what contents are allowed in a document scope, and uses the SchemaValidator class (provided by the Config4J library) to validate the document scope against this schema. Finally, the constructor copies configuration variables into instance variables for more convenient access when the process() method is invoked.

The process() method uses configuration information to perform the following steps:

• Create a root ".tex" file in the working directory. Search this file for commands such as \input that specify the names of other required files, and copy those files into the working directory. Recursively search the copied files for further commands such as \input.
• Copy the files listed in the copy.extra_files_to_copy configuration variable into the working directory. Recursively search these copied files for commands such as \input to find other files that need to be copied.
• While creating the root ".tex" file and copying other files, use the substitutions.search_replace_pairs configuration variable to perform a global search-and-replace on the files’ contents.
• Finally, execute each command listed in the build_commands configuration variable.

The above steps are straightforward. The only complication is that some steps in the algorithm are inherently recursive.
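The recursive part of the algorithm can be sketched as follows. This is a simplified illustration rather than the code in DocumentInfo.java: an in-memory map stands in for the file system, only the \input command is recognised, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch only: a map of file name to contents stands in for
// the file system, and only \input{...} is recognised.
class RecursiveCopySketch {
    private static final Pattern INPUT = Pattern.compile("\\\\input\\{([^}]+)\\}");

    // Returns the names of all known files reachable from 'fileName'.
    static Set<String> collect(String fileName, Map<String, String> files) {
        Set<String> visited = new LinkedHashSet<String>();
        visit(fileName, files, visited);
        return visited;
    }

    private static void visit(String fileName, Map<String, String> files,
                              Set<String> visited) {
        // Skip unknown files, and files already handled (guards against cycles).
        if (!files.containsKey(fileName) || !visited.add(fileName)) {
            return;
        }
        Matcher m = INPUT.matcher(files.get(fileName));
        while (m.find()) {
            visit(m.group(1), files, visited); // recurse into each \input target
        }
    }

    public static void main(String[] args) {
        Map<String, String> files = new HashMap<String, String>();
        files.put("book.tex", "\\input{titlepage.tex} \\input{chapter1.tex}");
        files.put("titlepage.tex", "\\input{titlepage-hevea.tex}");
        files.put("titlepage-hevea.tex", "plain text");
        files.put("chapter1.tex", "plain text");
        System.out.println(collect("book.tex", files));
    }
}
```

Recording each visited file before recursing into it is what prevents an infinite loop if two files happen to \input each other.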

## 13.3  Support Files

Support files used by Canthology are stored in the following subdirectories of a Canthology installation:


etc/latex
etc/html-common
etc/html-many-pages
etc/html-one-page



I now discuss each of these briefly.

### 13.3.1  The etc/latex Subdirectory

The etc/latex directory contains some ".sty" files (that is, packages) plus some ".tex" files. The ".sty" files are:


canthology.sty
hevea.sty
hevea-fix.sty



The canthology package defines commands such as \chapterAuthorInfo for specifying information about the author of a contribution in an anthology, and \thisPageBackgroundColor for specifying a background colour to be used on, say, the title page of a document. Appendix A provides a complete list of the commands defined in the canthology package.

The hevea package defines dummy versions of HeVeA commands, so a ".tex" document that makes use of those commands can be processed by either LaTeX or HeVeA. The hevea package is distributed as part of the HeVeA distribution. However, a copy of hevea.sty is also distributed with Canthology. Doing this enables Canthology to provide an out-of-the-box integration with HeVeA without having to worry about whether HeVeA is installed.

As I discussed in Section 10.10.1, I wrote hevea-fix.hva to overcome some misfeatures of HeVeA. The hevea-fix.sty file is a dummy version of the package for compatibility with LaTeX.

The ".tex" files in the directory are as follows:


titlepage-template-1.tex
titlepage-template-2.tex
titlepage-template-3.tex
titlepage-hevea-template.tex
example-copyright-template.tex
copyright-GNU-FDL-1.3.tex
copyright-GPL-2.0.tex
copyright-GPL-3.0.tex
copyright-LPPL-1-3c.tex
dedication-template.tex
example-praise-template.tex



The files with names of the form "titlepage-template-*.tex" provide various layouts for the title page of a book. These are called template files because they contain placeholder text that can be replaced with text specified in the substitutions.search_replace_pairs configuration variable.
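The placeholder replacement can be pictured with the following minimal sketch. It assumes that the search-and-replace pairs are held in a flat array of the form (search, replacement, search, replacement, ...) and that matching is literal rather than pattern-based; both are assumptions made for this example, as are the class and method names.

```java
// Illustrative sketch only: the flat array layout and literal matching
// are assumptions, not a description of Canthology's implementation.
class SearchReplaceSketch {

    // Applies each (search, replacement) pair in turn to the whole text.
    static String apply(String text, String[] pairs) {
        if (pairs.length % 2 != 0) {
            throw new IllegalArgumentException("pairs must have an even length");
        }
        String result = text;
        for (int i = 0; i < pairs.length; i += 2) {
            // String.replace() treats both arguments literally (no regex).
            result = result.replace(pairs[i], pairs[i + 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "\\Huge \\textbf{(TITLE-PLACEHOLDER)}";
        String[] pairs = { "(TITLE-PLACEHOLDER)", "An Example Anthology" };
        System.out.println(apply(template, pairs));
    }
}
```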

You can see an example of a template file in Figure 13.1, which shows the contents of the titlepage-template-1.tex file. Placeholder text is written as an upper-case name in parentheses, for example, (TITLE-PLACEHOLDER).

Figure 13.1: The titlepage-template-1.tex file

 1  \ifthenelse{\boolean{hevea}}{
 2      %--------
 3      % Hevea version
 4      %--------
 5      \input{titlepage-hevea-template.tex}
 6  }{
 7      %--------
 8      % LaTeX version
 9      %--------
10      \thispagestyle{empty}
11      \begin{center}
12          \vspace*{0.2\textheight}
13
14          \Huge \textbf{(TITLE-PLACEHOLDER)}
15
16          \ifthenelse{\equal{}{(SUBTITLE-PLACEHOLDER)}}{}{
17              \vspace{1cm}
18              \Large \textbf{(SUBTITLE-PLACEHOLDER)}
19          }
20
21          \vspace{1cm}
22          \Large \textbf{(AUTHOR-PLACEHOLDER)}
23      \end{center}
24
25      \vfill
26
27      \noindent
28      {\large (DESCRIPTION-PLACEHOLDER)}
29
30      \ifthenelse{\equal{}{(PUBLISHER-PLACEHOLDER)}}{}{
31          \noindent\rule{\textwidth}{0.2mm}\\
32          \noindent{\rule{0mm}{1.1em}(PUBLISHER-PLACEHOLDER)}
33      }
34      \newpage
35  }


The title page of a document is often typeset with commands that specify rules and vertical spaces. Such commands work fine for a page of fixed dimensions, but often do not make sense for an HTML document. For this reason, it is useful to typeset the title page one way if the document is being processed with LaTeX, but typeset the title page another way if the document is being processed with HeVeA. This is the reason for the if-then-else statement on line 1 of Figure 13.1. If HeVeA is being used, then the simpler, HTML-friendly formatting commands in titlepage-hevea-template.tex are used.

The example-copyright-template.tex file provides example text for several copyright licenses. The intention is that the editor of an anthology will make a local copy of this file and edit its contents to suit his or her needs.

One of the example paragraphs in example-copyright-template.tex is for version 1.3 of the GNU Free Documentation License (FDL). If the editor of an anthology wishes to use this license, then it is a legal requirement to provide the full text of the FDL as, say, an appendix in the document. The copyright-GNU-FDL-1.3.tex file provides the full text of that license and can be used for that purpose. For a similar reason, the copyright-GPL-2.0.tex and copyright-GPL-3.0.tex files provide the full text of versions 2 and 3 of the GNU General Public License (GPL), and copyright-LPPL-1-3c.tex provides the full text of the LaTeX Project Public License.

The dedication-template.tex file can be used to typeset a dedications page near the start of a book.

The example-praise-template.tex file provides examples of how to use the \praise command to typeset a page of praise for a book.

### 13.3.2  The etc/html-* Subdirectories

Support files used with HeVeA are spread across the following subdirectories:


etc/html-common
etc/html-many-pages
etc/html-one-page



The html-common directory contains the ".hva" implementations of several packages, including: canthology, framed, hevea-fix and verse. This directory also contains: canthology.css, which is used for customising the look-and-feel of generated web pages; tidy.conf, which is used to configure the tidy program when converting generated HTML files into XHTML format; and some Tcl scripts that were discussed in Section 11.3.

The html-one-page directory contains a Makefile that runs hevea to create a single-page HTML document. This directory also contains supporting html-header.txt and html-footer.txt files suitable for use in a single-page HTML document.

The html-many-pages directory contains a Makefile that runs hevea to create a single-page HTML document, and then runs hacha to split it into multiple HTML pages. This directory also contains supporting html-header.txt and html-footer.txt files suitable for use in a multi-page HTML document.

## 13.4  How Canthology Searches for Support Files

The copy.search_path configuration variable specifies which directories will be searched to find supporting files.

Different scopes in the etc/defaults.cfg file define different values for the copy.search_path variable. For example, scopes used to generate PDF files set copy.search_path as follows:


copy.search_path = [
".",
getenv("CANTHOLOGY_HOME") + "/etc/latex",
];



In contrast, scopes used to generate a single-page HTML document set copy.search_path as follows:


copy.search_path = [
".",
getenv("CANTHOLOGY_HOME") + "/etc/html-one-page",
getenv("CANTHOLOGY_HOME") + "/etc/html-common",
getenv("CANTHOLOGY_HOME") + "/etc/latex",
];



Scopes that generate a multi-page HTML document use the following setting for copy.search_path:


copy.search_path = [
".",
getenv("CANTHOLOGY_HOME") + "/etc/html-many-pages",
getenv("CANTHOLOGY_HOME") + "/etc/html-common",
getenv("CANTHOLOGY_HOME") + "/etc/latex",
];
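
The way such a search path might be consulted is sketched below. The sketch assumes that directories are tried in the order listed and that the first directory containing the file wins, which would explain why "." appears first in each setting; that rule, along with the class and method names, is an assumption rather than a statement about Canthology's code.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: names are hypothetical, and the
// "first match wins" rule is an assumption.
class SearchPathSketch {

    // Returns the first existing regular file called 'fileName' found in
    // the directories of 'searchPath', or null if there is none.
    static File find(String fileName, List<String> searchPath) {
        for (String dir : searchPath) {
            File candidate = new File(dir, fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String home = System.getenv("CANTHOLOGY_HOME");
        if (home == null) {
            System.out.println("CANTHOLOGY_HOME is not set");
            return;
        }
        List<String> searchPath = Arrays.asList(".", home + "/etc/latex");
        File found = find("canthology.sty", searchPath);
        System.out.println(found == null ? "not found" : found.getPath());
    }
}
```

Under this assumption, a file placed in the current directory would take precedence over a file of the same name in the Canthology installation.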