Chapter 8  How Canthology Operates

8.1  Introduction

Chapter 7 used a demonstration-driven approach to illustrate (a subset of) what Canthology can do. This chapter explains how Canthology does those things.

8.2  Configuration File and Scopes

Consider the following command:


canthology -f my-file.cfg -scope doc-1 -scope doc-2



The -f option instructs Canthology to use my-file.cfg as its configuration file. If you omit this option, then Canthology defaults to using Canthology.cfg as its configuration file.

The -scope option instructs Canthology to generate a document from the configuration variables in the specified scope. You can specify the -scope option multiple times; this causes Canthology to generate a document for each of the specified scopes. If you do not specify any scopes, then Canthology generates documents from all the top-level scopes that contain a variable called root_file.base_name.
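The scope-selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not Canthology's own (Java) code. It assumes, hypothetically, that the parsed configuration is available as a dict that maps each top-level scope name to a dict of that scope's fully-scoped variable names (such as "root_file.base_name").

```python
def select_scopes(config, requested=None):
    """Return the names of the scopes to generate documents from.

    Assumption (hypothetical representation): `config` maps each
    top-level scope name to a dict of fully-scoped variable names.
    """
    if requested:
        # One document per -scope option, in the order given.
        return list(requested)
    # No -scope options: every top-level scope defining root_file.base_name.
    return [name for name, variables in config.items()
            if "root_file.base_name" in variables]
```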

8.3  Configuration Variables

Figure 8.1 shows the starting-point configuration file you can obtain by running the command:


canthology -create Canthology.cfg


 Figure 8.1: Starting-point configuration file
@include getenv("CANTHOLOGY_HOME") + "/etc/defaults.cfg";
anthology1 {
@copyFrom "book:a4";
root_file {
base_name = "my-anthology" + macro.paperSizeSuffix;
preamble = [
] + preamble;
front_matter = [
"\input{titlepage-template-1.tex}",
macro.tableofcontents,
];
main_matter = [
];
back_matter = [
];
}
substitutions.search_replace_pairs = [
# search string            replace string
#---------------------------------------------
] + substitutions.search_replace_pairs;
}


The starting-point configuration file is short because the @copyFrom statement copies a lot of configuration variables from a scope defined in the @include-d file. Figure 8.2 shows a configuration file that explicitly sets all configuration variables used by Canthology, although not all the values shown for those variables are the actual default values. Instead, the variables’ values have been chosen to facilitate the discussion that follows.

 Figure 8.2: Configuration variables
anthology1 {
working_dir = "output-pdf-a4";
root_file {
base_name = "my-anthology-a4";
macro {
tableofcontents = "\clearforchapter %n"
+ "\tableofcontents*";
appendix = "\clearforchapter %n"
+ "\appendix %n"
+ "\phantomsection %n"
+ "\appendixpage";
}
documentclass {
name = "book";
options = ["12pt", "twoside"];
}
package {
names = ["appendix", "canthology", "geometry"];
geometry.options = ["paper=a4paper"];
}
preamble = [
"\setcounter{tocdepth}{4}",
];
front_matter = [
"\input{titlepage.tex}",
macro.tableofcontents,
];
main_matter = [
"\input{chapter-1.tex}",
"\input{chapter-2.tex}",
macro.appendix,
"\input{appendix-a.tex}",
"\input{appendix-b.tex}",
];
back_matter = [
"\input{glossary.tex}"
];
}

copy {
commands = ["\input", "\usepackage", "\includegraphics"];
file_extensions = [".tex", ".sty", ".png", ".jpg"];
search_path = [
".",
getenv("CANTHOLOGY_HOME") + "/etc/latex"
];
extra_files_to_copy = [];
look_for_copy_commands {
in_matching_files = ["*.tex", "*.sty"];
not_in_matching_files = [];
}
}
substitutions {
in_matching_files = ["*.tex"];
not_in_matching_files = [];
search_replace_pairs = [
"(AUTHOR-PLACEHOLDER)", "Jane Doe",
"(TITLE-PLACEHOLDER)",  "Modern Fairy Tales",
];
}
build_commands = [
"pdflatex -interaction=errorstopmode (ROOT_FILE_BASE_NAME).tex",
"pdflatex -interaction=errorstopmode (ROOT_FILE_BASE_NAME).tex"
];
}


At a high level, Canthology operates as follows:

1. Canthology creates a “working directory” in which it will create some LaTeX files. In particular, Canthology uses information in its configuration file to build a root ".tex" file in the working directory. This root file will contain \input commands to add the contributions of the anthology.
2. Canthology copies required support files into the working directory, performing text substitutions on some of them. These support files include all the contributions, graphic files (if any), and some package (".sty") files.
3. Canthology runs some LaTeX-related commands (such as latex or pdflatex) on the root ".tex" file to produce the ready-to-print anthology.

Those steps are discussed in the following subsections.

8.3.1  Creating the Root ".tex" File

The working_dir configuration variable specifies the name of the working directory that Canthology should create. By convention, the value of this variable starts with "output-" and the remainder of the value indicates how the document is formatted. For example, the value might be "output-pdf-a4" if you are creating an A4-formatted PDF version of the anthology. This convention makes it easy to create multiple versions of the document, each one formatted for a different paper size.

The root_file scope contains variables that are used to create a root ".tex" file inside the working directory. The remaining discussion in this subsection is for variables within the root_file scope.

The base_name configuration variable specifies the “base” file name (that is, the file name without any extension) for the root ".tex" file. For example, if base_name has the value "my-anthology-a4", then the root file name is my-anthology-a4.tex.

The configuration variables within the documentclass sub-scope are used to create the \documentclass command at the start of the root ".tex" file. For example, consider the following documentclass scope:


documentclass {
name = "book";
options = ["12pt", "twoside"];
}



Those settings result in the following being generated:


\documentclass[12pt, twoside]{book}
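As an illustration, here is a minimal Python sketch of how the name and options variables might be turned into that command. It is not Canthology's implementation; the function name is hypothetical, and omitting the square brackets when options is empty is an assumption.

```python
def documentclass_line(name, options):
    """Build the documentclass command from documentclass.name/options."""
    if options:
        # Options are joined with ", " as in the example above.
        return "\\documentclass[" + ", ".join(options) + "]{" + name + "}"
    # Assumption: no options means no square brackets at all.
    return "\\documentclass{" + name + "}"
```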



The configuration variables within the package sub-scope are used to generate \usepackage commands. For example, consider the following package scope:


package {
names = ["appendix", "canthology", "geometry"];
geometry.options = ["paper=a4paper"];
}



The names variable specifies that three packages are being used: appendix [15], canthology and geometry [12]. For each of these specified packages, you can optionally specify a list of package options. The above example shows a list of options being specified for the geometry package, but no options for the appendix or canthology packages. Those configuration settings result in the following being generated:


\usepackage{appendix}
\usepackage{canthology}
\usepackage[paper=a4paper]{geometry}
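The same idea extends to the package scope. In this hypothetical Python sketch, the per-package options (such as package.geometry.options) are assumed to have been collected into a dict keyed by package name; packages without an entry get a plain \usepackage command.

```python
def usepackage_lines(names, options_by_package):
    """Build one usepackage command per name, in the order listed."""
    lines = []
    for name in names:
        options = options_by_package.get(name, [])
        if options:
            lines.append("\\usepackage[" + ", ".join(options) + "]{" + name + "}")
        else:
            lines.append("\\usepackage{" + name + "}")
    return lines
```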



The value of the preamble variable is written to the root ".tex" file immediately after the \usepackage commands.

Canthology ignores everything in the macro sub-scope. The intention is that you can use this sub-scope to define variables whose values are a sequence of LaTeX commands. Then, you can use those variables (as a form of shorthand) in the preamble variable or (more commonly) in the front_matter, main_matter or back_matter variables. For example, Figure 8.2 uses macro.tableofcontents in front_matter and macro.appendix in main_matter.

Having written the value of the preamble variable to the ".tex" file, Canthology then writes "\begin{document}" and "\frontmatter" to the file and follows them with all the strings contained in the front_matter configuration variable. For example, the configuration shown in Figure 8.2 would result in the following being generated:


\begin{document}
\frontmatter
\input{titlepage.tex}
\clearforchapter
\tableofcontents*



Then "\mainmatter" and all the strings in the main_matter configuration variable are written to the ".tex" file. After that, "\backmatter" and all the strings in the back_matter configuration variable are written to the ".tex" file.

Canthology finishes the ".tex" file by writing "\end{document}" to it. Figure 8.3 shows the complete ".tex" file that is generated.

 Figure 8.3: The generated my-anthology-a4.tex file
\documentclass[12pt, twoside]{book}

\usepackage{appendix}
\usepackage{canthology}
\usepackage[paper=a4paper]{geometry}
\setcounter{tocdepth}{4}

\begin{document}

\frontmatter
\input{titlepage.tex}
\clearforchapter
\tableofcontents*

\mainmatter
\input{chapter-1.tex}
\input{chapter-2.tex}
\clearforchapter
\appendix
\phantomsection
\appendixpage
\input{appendix-a.tex}
\input{appendix-b.tex}

\backmatter
\input{glossary.tex}
\end{document}
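The order in which the root file is assembled can be summarised with a short Python sketch. It is a hypothetical illustration rather than Canthology's implementation. It assumes the \documentclass and \usepackage lines have already been built, and that the configuration parser has already expanded each %n in a macro to a newline. It does not reproduce the blank lines shown in Figure 8.3.

```python
def build_root_file(documentclass, packages, preamble,
                    front_matter, main_matter, back_matter):
    """Return the text of the root file, in the order described above."""
    parts = [
        documentclass,
        *packages,          # the usepackage commands
        *preamble,          # written immediately after the packages
        "\\begin{document}",
        "\\frontmatter", *front_matter,
        "\\mainmatter", *main_matter,
        "\\backmatter", *back_matter,
        "\\end{document}",
    ]
    # An entry may itself span several lines (e.g. an expanded macro).
    return "\n".join(parts) + "\n"
```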


8.3.2  Copying Support Files

It is not enough for Canthology to create just the root ".tex" file in the working directory. Canthology must copy some other files into the working directory too. Here are some examples of other files that may need to be copied:

• Any ".tex" files that are \input-ed by the root file. (And if those \input-ed files themselves contain \input statements, then Canthology must recursively follow the chain of \input commands.)
• Any graphic files, such as diagrams or digital photographs, that are used in an \includegraphics command inside a ".tex" file.
• BibTeX-based bibliographies, if any, that are used by the document.

LaTeX is flexible enough that it is impossible for Canthology to accurately predict the entire set of files that must be copied into the working directory. For this reason, Canthology does not have a hard-coded set of rules for deciding which files need to be copied. Instead, the copying of files is driven by configuration variables in the copy scope, as shown in Figure 8.2.

The copy.commands variable specifies a list of LaTeX commands whose first parameter between braces (that is, between { and }) specifies a file to be copied. For example, consider the following setting of this variable:


commands = ["\input", "\usepackage", "\includegraphics"];



If Canthology finds the statement \input{chapter-1.tex} in a file, then it will copy the chapter-1.tex file to the working directory (and recursively search the copied file for other files to be copied). Now let’s assume Canthology finds the following statement:


\includegraphics[scale=0.7]{my-photograph}



Canthology ignores optional parameters (enclosed between [ and ]) and instead looks at the first parameter enclosed in braces: my-photograph. This parameter to the \includegraphics command is a file name that may or may not contain a file-name extension (such as ".jpg" or ".png"). The copy.file_extensions configuration variable enables Canthology to cater for both possibilities:


file_extensions = [".tex", ".sty", ".png", ".jpg"];



When Canthology encounters the name of a file (such as chapter-1.tex or my-photograph) to be copied, Canthology first tries to copy the file whose name is specified. If that fails, then Canthology suffixes the file name with each of the extensions specified in copy.file_extensions and tries to copy the resulting file.
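The following Python sketch illustrates the two rules just described: finding the first brace-delimited parameter of each command listed in copy.commands, and then building the list of candidate file names from copy.file_extensions. It is a simplified, hypothetical illustration, not Canthology's actual parsing code. For example, it returns a \usepackage argument that lists several comma-separated packages as a single name.

```python
import re


def find_copy_targets(text, commands):
    """Return the first brace-delimited argument of each listed command."""
    targets = []
    for command in commands:
        # The command name, any optional [...] arguments, then the first {...}.
        pattern = re.escape(command) + r"\s*(?:\[[^\]]*\]\s*)*\{([^}]*)\}"
        for match in re.finditer(pattern, text):
            targets.append(match.group(1).strip())
    return targets


def candidate_names(name, file_extensions):
    """The name as written first, then the name with each extension appended."""
    return [name] + [name + ext for ext in file_extensions]
```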

The approach discussed above works if the file to be copied is specified as the first non-optional parameter to a command. But what if the file is specified in, say, the second or third parameter to a command? As a hypothetical example:


\exampleCommand{11}{42}{another-photograph}



Or perhaps the file to be copied is a non-LaTeX file required to control the build process, such as a Makefile or an Ant build.xml file. In cases like these, you should list the files in the copy.extra_files_to_copy configuration variable. For example:


extra_files_to_copy = ["another-photograph", "Makefile"];



However, you are unlikely to need to do this often, because the copy.commands variable is sufficient most of the time.

When Canthology is looking for a file that it must copy, it looks for that file in the list of directories specified in the search_path configuration variable:


search_path = [
".",
getenv("CANTHOLOGY_HOME") + "/etc/latex"
];



The first entry in the list instructs Canthology to look in the current directory. The second entry instructs Canthology to look in the etc/latex subdirectory of the Canthology installation. Among other things, that directory contains the template title pages that I discussed in Section 7.4.

If Canthology is unable to copy a file because the file is not in any of the directories listed in copy.search_path, then Canthology does not consider this to be an error. Instead, Canthology assumes that the “missing” file is bundled with a LaTeX distribution, so LaTeX will be able to find it. This is commonly the case with package (".sty") files.
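Here is a Python sketch of how a file might be located using copy.search_path and copy.file_extensions. It assumes (this is not stated above) that the exact name is tried in every directory before any extension is tried. It returns None, rather than raising an error, when nothing is found. The function name locate is hypothetical.

```python
import os


def locate(name, search_path, file_extensions):
    """Return the path of the first matching file, or None if not found.

    None is not an error: the caller assumes the file is bundled with
    the LaTeX distribution (as is common for .sty files).
    """
    candidates = [name] + [name + ext for ext in file_extensions]
    # Assumption: candidate-major order (exact name everywhere first).
    for candidate in candidates:
        for directory in search_path:
            path = os.path.join(directory, candidate)
            if os.path.isfile(path):
                return path
    return None
```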

If you are editing an anthology that contains, say, 50 contributions, then you might find it awkward to store all the contributions in a single directory. You might prefer to spread the contributions over several directories, so that each directory contains a subset of the ".tex" files. (You might have one directory for poems, another for short stories, and so on.) You can do this by listing each of those directories in the search_path configuration variable. For example:


search_path = [
"poems", "short-stories", "plays",
getenv("CANTHOLOGY_HOME") + "/etc/latex"
];



When Canthology is copying a file, it needs to decide whether it should search inside the file for nested copy commands. Canthology uses the configuration variables in the look_for_copy_commands scope to make this decision:


look_for_copy_commands {
in_matching_files = ["*.tex", "*.sty"];
not_in_matching_files = [];
}



Canthology will search inside a copied file for nested copy commands if: (1) the file name matches at least one pattern in in_matching_files, and (2) it does not match any pattern in not_in_matching_files. In a pattern, * acts as a wildcard that can match zero or more characters. For example, "*.tex" matches file names that end in ".tex". These configuration variables provide a simple, yet effective, way to instruct Canthology to look for nested copy commands in ".tex" and ".sty" files but not in, say, ".jpg" or ".png" files.
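Because only * is special in these patterns, the matching rule can be sketched in Python by escaping everything else in the pattern. This is an illustration under stated assumptions: matching is case-sensitive and is applied to the file name rather than its full path. Canthology's real behaviour on those two points is not described here.

```python
import re


def matches(pattern, file_name):
    """True if file_name matches pattern, where only * is a wildcard."""
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, file_name, flags=re.DOTALL) is not None


def should_process(file_name, in_matching_files, not_in_matching_files):
    """Apply the two-part rule: match an include, match no exclude."""
    return (any(matches(p, file_name) for p in in_matching_files)
            and not any(matches(p, file_name) for p in not_in_matching_files))
```

The same two-part rule is used by the substitutions scope described in the next subsection.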

8.3.3  Performing Text Substitutions

Variables in the nested substitutions scope control how Canthology performs search-and-replace on files. For example:


substitutions {
in_matching_files = ["*.tex"];
not_in_matching_files = [];
search_replace_pairs = [
"(AUTHOR-PLACEHOLDER)", "Jane Doe",
"(TITLE-PLACEHOLDER)",  "Modern Fairy Tales",
];
}



Canthology performs substitutions on a copied file if: (1) the file name matches at least one pattern in in_matching_files, and (2) it does not match any pattern in not_in_matching_files.

The search_replace_pairs variable is used to specify pairs of search and replace strings. Canthology automatically extends search_replace_pairs so that occurrences of "(ROOT_FILE_BASE_NAME)" are replaced with the value of the root_file.base_name configuration variable.
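Here is a minimal Python sketch of applying search_replace_pairs. It assumes that the pairs are applied in order using plain (non-regular-expression) string replacement, and that the automatic "(ROOT_FILE_BASE_NAME)" pair is added at the end of the list. Both assumptions are illustrative and are not taken from Canthology's source.

```python
def apply_substitutions(text, search_replace_pairs, root_file_base_name):
    """Apply each (search, replace) pair to text, in list order."""
    if len(search_replace_pairs) % 2 != 0:
        raise ValueError("search_replace_pairs must have an even length")
    # Assumption: the automatic pair is appended after the user's pairs.
    pairs = list(search_replace_pairs) + [
        "(ROOT_FILE_BASE_NAME)", root_file_base_name]
    for search, replace in zip(pairs[0::2], pairs[1::2]):
        text = text.replace(search, replace)
    return text
```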

8.3.4  Running LaTeX-Related Commands

After Canthology has generated the root ".tex" file and copied supporting files to the working directory, the only task remaining for Canthology is to run one or more LaTeX-related commands to turn the files into a nicely formatted document in, for example, PDF format. To do this, Canthology executes each command specified by the build_commands configuration variable. For example:


build_commands = [
"pdflatex -interaction=errorstopmode "
+ "(ROOT_FILE_BASE_NAME).tex",
"pdflatex -interaction=errorstopmode "
+ "(ROOT_FILE_BASE_NAME).tex"
];



Because of the way LaTeX works, some commands (such as latex or pdflatex) have to be run twice to correctly resolve cross references and to produce a table of contents. This is why the above example runs pdflatex twice.

Canthology uses substitutions.search_replace_pairs to perform substitutions on each command that is about to be executed. By doing this, the "(ROOT_FILE_BASE_NAME)" text within a command will be replaced with the value of the root_file.base_name configuration variable.

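If latex or pdflatex encounters an error in an input file, then its behaviour depends on its interaction mode. In errorstopmode, which is the default, it stops and asks the user what it should do; the "-interaction=errorstopmode" option selects this mode explicitly. If you would prefer latex and pdflatex not to go into an interactive mode, then you can use "-interaction=nonstopmode" instead, which makes them print error messages and continue without prompting. Adding the "-halt-on-error" option makes them exit at the first error.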
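The build step can be sketched in Python as follows. This is a hypothetical illustration, not Canthology's code. It assumes that each command is split into arguments using shell-like quoting rules and run inside the working directory, and that the remaining commands are skipped as soon as one command fails. The substitute parameter stands for the search-and-replace step described above, and run defaults to subprocess.run so that a test can pass in a stand-in.

```python
import shlex
import subprocess


def run_build_commands(build_commands, substitute, working_dir,
                       run=subprocess.run):
    """Run each build command in working_dir; return the first non-zero code."""
    for command in build_commands:
        # Replace (ROOT_FILE_BASE_NAME) etc. before splitting into argv.
        argv = shlex.split(substitute(command))
        result = run(argv, cwd=working_dir)
        if result.returncode != 0:
            # Assumption: stop at the first failing command.
            return result.returncode
    return 0
```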

8.4  The etc/defaults.cfg File

The first line in a starting-point configuration file is:


@include getenv("CANTHOLOGY_HOME") + "/etc/defaults.cfg";



An outline of the included etc/defaults.cfg file is shown in Figure 8.4. The file contains many scopes; these enable it to provide default values suitable for many combinations of document classes (book, report, article and memoir), paper sizes and output formats (PDF and HTML).

 Figure 8.4: Outline of the etc/defaults.cfg file
default.common { ... } # details omitted for brevity
default.html {
@copyFrom "default.common";
... # details omitted for brevity
}

default.a4-geometry-options { root_file.package.geometry.options = [...]; }
default.a5-geometry-options { ... }
default.a5-trimmed-geometry-options { ... }
default.letter-geometry-options { ... }

book:a4 {
@copyFrom "default.common";
@copyFrom "default.a4-geometry-options";
working_dir = "output-pdf-a4";
root_file {
documentclass.name = "book";
macro { ... } # details omitted for brevity
package.names = [...];
}
}
book:a5 {
@copyFrom "book:a4";
@copyFrom "default.a5-geometry-options";
working_dir = "output-pdf-a5";
root_file.macro.paperSizeSuffix = "-a5";
root_file.documentclass.options = ["10pt", "twoside"];
}
book:a5-trimmed {
@copyFrom "book:a4";
@copyFrom "default.a5-trimmed-geometry-options";
working_dir = "output-pdf-a5-trimmed";
root_file.macro.paperSizeSuffix = "-a5-trimmed";
root_file.documentclass.options = ["10pt", "twoside"];
}
book:letter {
@copyFrom "book:a4";
@copyFrom "default.letter-geometry-options";
}

book:html-one-page {
@copyFrom "default.html";
working_dir = "output-html";
... # details omitted for brevity
}
book:html-many-pages {
@copyFrom "default.html";
... # details omitted for brevity
}

report:a4 { ... }
report:a5 { ... }
report:a5-trimmed { ... }
report:letter { ... }
report:html-one-page { ... }
report:html-many-pages { ... }

article:a4 { ... }
article:a5 { ... }
article:a5-trimmed { ... }
article:letter { ... }
article:html-one-page { ... }
article:html-many-pages { ... }

memoir:a4 { ... }
memoir:a5 { ... }
memoir:a5-trimmed { ... }
memoir:letter { ... }
memoir:html-one-page { ... }
memoir:html-many-pages { ... }

memoir-article:a4 { ... }
memoir-article:a5 { ... }
memoir-article:a5-trimmed { ... }
memoir-article:letter { ... }
memoir-article:html-one-page { ... }
memoir-article:html-many-pages { ... }


The etc/defaults.cfg file serves several important purposes.

First, Canthology makes use of over twenty configuration variables. The defaults.cfg file provides sensible default values for most of those variables. This makes Canthology much simpler for new users.

Second, some people like to write LaTeX-based documents with, say, the book class, while other people prefer to use the newer and more flexible memoir class. Although the memoir class is mostly backwards compatible with the book class, there are a few incompatibilities, which often manifest themselves as slightly differing contents of the root ".tex" file. For example:

• A book-based document might need more \usepackage commands than a memoir-based document, because the memoir class has the functionality of many popular packages built into it.
• In a memoir-based document, the sequence of commands used to achieve a particular result (such as guaranteeing that the table of contents appears on a right-hand page, or ensuring that the table of contents lists the start of the appendices) is sometimes different from the sequence of commands required in a book-based document.

The etc/defaults.cfg file can shield users from these subtle differences between book- and memoir-based documents. This is because the memoir-related scopes specify the packages required by memoir and also define macro.tableofcontents and macro.startAppendices in a memoir-compatible way. Likewise, the book-related scopes specify book-required packages and define the macro variables in a book-compatible way.

Of course, many of the configuration settings for memoir are identical to those required for book. This is why the etc/defaults.cfg file has a default.common scope that defines variables with common values. The memoir- and book-based scopes use the @copyFrom statement to copy those variables.
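To see why this layering works, it can help to model @copyFrom in Python. This is a simplified, hypothetical model rather than a description of how the configuration-file parser is implemented. Each scope is a flat dict of fully-scoped variable names. Each @copyFrom copies every variable from the named scope, and later copies and explicit assignments override earlier values.

```python
def copy_from(sources, assignments):
    """Build a scope from its @copyFrom sources plus its own assignments."""
    scope = {}
    for source in sources:
        # Each @copyFrom copies every variable, overwriting earlier values.
        scope.update(source)
    # Assignments written after the @copyFrom statements take precedence.
    scope.update(assignments)
    return scope
```

With this model, a scope such as book:a5 in Figure 8.4 inherits everything from book:a4, takes its geometry options from default.a5-geometry-options, and then overrides working_dir and the document-class options with its own assignments.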

Subtle differences are not limited to the choice between the book and memoir classes. They also arise in how you write a LaTeX document that will be converted to PDF rather than HTML, or vice versa. The etc/defaults.cfg file does its best to hide many of those output-format differences too.

The third and final purpose of the etc/defaults.cfg file is to make it possible for people to extend Canthology without having to modify its Java source code. For example:

• There are other document classes that some users might wish to use to write a document.
• The etc/defaults.cfg file assumes users want to use pdflatex to convert a LaTeX document into a PDF file. Some users might prefer to use, say, xelatex or latex and dvipdfmx to produce a PDF file. Or perhaps a user will want to use latex and dvips to produce a PostScript file.
• The etc/defaults.cfg file uses HeVeA to generate HTML (this is discussed in Part IV). However, some people may prefer to use another LaTeX-to-HTML converter.

It should be possible to customise Canthology to support the above by modifying etc/defaults.cfg.

8.5  Extending Configuration Variables

As discussed in Section 8.4, one purpose of etc/defaults.cfg is to encapsulate subtle differences in the use of particular document classes (such as book and memoir) or in creating documents for different output formats (such as PDF and HTML).

One way this encapsulation occurs is by assigning suitable values to the following variables:


root_file.package.names
root_file.preamble
substitutions.search_replace_pairs



Because of this, it is important to extend (rather than replace) the values of these variables in the configuration scope for a document. For example, a configuration like that in Figure 8.5, which replaces the values of package.names, preamble and search_replace_pairs, is likely to result in LaTeX error messages or badly-formatted output when you run canthology.

 Figure 8.5: Some configuration variables should not be replaced
@include getenv("CANTHOLOGY_HOME") + "/etc/defaults.cfg";
anthology1 {
@copyFrom "book:a4";
root_file {
base_name = "my-anthology" + macro.paperSizeSuffix;
package {
names = [...];                                    # bad
}
preamble = [...];                                     # bad
front_matter = [...];
main_matter = [...];
back_matter = [...];
}
substitutions {
search_replace_pairs = [...];                         # bad
}
}


Instead, it is better to extend the values of those variables by using the list concatenation operator (+) to merge a new value with the existing value of a variable. This is illustrated in Figure 8.6.

 Figure 8.6: Some configuration variables should be extended
@include getenv("CANTHOLOGY_HOME") + "/etc/defaults.cfg";
anthology1 {
@copyFrom "book:a4";
root_file {
base_name = "my-anthology" + macro.paperSizeSuffix;
package {
names = [...] + names;                            # good
}
preamble = [...] + preamble;                          # good
front_matter = [...];
main_matter = [...];
back_matter = [...];
}
substitutions {
search_replace_pairs = [...] + search_replace_pairs;  # good
}
}