The Epos Speech System: Source Code Documentation

8. Source Code Documentation

This section is of little use to anyone except programmers who want to contribute to the development of Epos or to modify its source code. It is not meant to be a beginner's guide to Epos either. If you are missing something here or elsewhere, tell me and I may add it; such requests are likely to be almost the only source of progress in this section of the documentation. The section may also slowly become outdated due to lack of interest, except for the subsection on portability.

8.1 Design Goals

Overall coding priorities, approximately in order of decreasing precedence:

language independence and generality
no undocumented or implicit "features" (except for error handling)
portability
maintainability, clean decomposition
clean (intuitive) protocols and programming interfaces
scalability
intuitive configuration
fault tolerance
simple algorithms
code readability
speed
space
possible parallelizability

8.2 Isolated Classes

Classes parser, unit, rules, text and maybe a few others are isolated classes that gain nothing from inheritance. In this case, the class-oriented design is used only for the sake of code readability and decomposition.

Class `simpleparser`

This class takes some input (such as plain ASCII or STML text) and can then be used in conjunction with the class unit constructor to build the text structure representation (TSR). Its purpose is to identify the Latin text tokens. These are usually single ASCII characters, but some traditional tokens like "..." would be difficult to identify later, as would numerous other context dependent uses of ".". The parser also identifies the level of description which corresponds to each token; this is the information the class unit constructor needs to build the TSR correctly. In this process, the parser skips over any empty units, that is, units that contain no phones (simple letters) at all.

Note that it is unnecessary and counterproductive to distinguish between homographic tokens used at the same level of description here; such intelligence can be handled more flexibly by the language dependent rules. In fact, such distinctions usually tend to be language dependent. The parser performs only the minimum tokenization necessary to avoid losing information through empty unit deletion.

The STML parser is still unimplemented.

Class `unit`

This class is the fundamental element of the text structure representation. Its methods are listed in unit.h. Every object of this type represents a single text unit. Every unit includes pointers to its immediate container and to its contents. The contents are organized in a bidirectional linked list; pointers to the head and tail units of this list are stored in the unit. The list links, i.e. prev and next, also serve to locate the neighboring units; they may be NULL, indicating that this is the first or last unit in the immediate container. For most uses, these pointers are not suitable to be used directly; the Prev and Next methods find the neighbor even if a higher level boundary lies in between. It is also possible to mark a unit as a scope. In this case, the Next and Prev methods will be unable to cross its boundary from inside out (they will return NULL if this is attempted). If you need to modify the TSR directly, you will benefit from calling unit::sanity occasionally. This method checks the TSR structure near the unit it was called on and reports a severe error if an invariant is violated, thus saving you from misleading error messages or crashes later.
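The neighbor lookup described above can be illustrated with a minimal sketch. This is not the real class unit from unit.h: the `node` type, its members and the `next_across` and `leftmost` helpers are hypothetical names chosen for illustration, and only the forward direction is shown.

```cpp
#include <cstddef>

// Hypothetical stand-in for a TSR unit; not the real class unit.
struct node {
    node *container = nullptr;   // immediate container
    node *first = nullptr;       // head of the contents list
    node *last = nullptr;        // tail of the contents list
    node *prev = nullptr;        // left neighbor inside the same container
    node *next = nullptr;        // right neighbor inside the same container
    bool scope = false;          // if set, lookups may not leave this node

    // Append a child at the tail of the contents list.
    void append(node *child) {
        child->container = this;
        child->prev = last;
        child->next = nullptr;
        if (last) last->next = child; else first = child;
        last = child;
    }
};

// First descendant of n found exactly `depth` levels below it, or nullptr.
node *leftmost(node *n, int depth) {
    if (depth == 0) return n;
    for (node *c = n->first; c; c = c->next)
        if (node *hit = leftmost(c, depth - 1)) return hit;
    return nullptr;
}

// Next node at the same level as u. Container boundaries are crossed
// when necessary, but never from inside a container marked as scope.
node *next_across(node *u) {
    int depth = 0;                        // levels climbed so far
    for (node *p = u; p; p = p->container, ++depth) {
        // Try every right sibling of the current ancestor.
        for (node *s = p->next; s; s = s->next)
            if (node *hit = leftmost(s, depth)) return hit;
        // About to climb out of p's container: refuse if it is a scope.
        if (p->container && p->container->scope) return nullptr;
    }
    return nullptr;
}
```

With a sentence holding two words, the last letter of the first word finds the first letter of the second word as its neighbor, unless the first word is marked as a scope.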

To extract the prosodic information from a TSR, call the effective method. It will combine the prosodic adjustments present at all the levels of description above the current unit.
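As a rough sketch of the idea, the following code walks from a unit up through its containers and accumulates one prosodic quantity. The `prosodic_unit` type and `effective_pitch` function are hypothetical, and both the additive combination and the inclusion of the unit's own adjustment are assumptions made for this example; the real effective method may combine the adjustments differently.

```cpp
// Hypothetical prosodic record; the real unit class stores more than this.
struct prosodic_unit {
    const prosodic_unit *container;  // immediate container, nullptr at the top
    int pitch_adjust;                // adjustment attached at this level
};

// Sum the adjustments of the unit and of every container above it.
// Addition is an assumption made for this sketch.
int effective_pitch(const prosodic_unit *u) {
    int total = 0;
    for (; u; u = u->container) total += u->pitch_adjust;
    return total;
}
```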

Class `text`

This class represents a logical line-oriented text file. It handles things like the @include directive, backslash-escaped special characters, initial whitespace and comment stripping. It is used for rule files, configuration files, and also for the dictionaries.
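The kind of preprocessing involved might look like the sketch below. It is not the real class text: the `#` comment character, the exact `@include name` syntax, the include depth limit and the in-memory `file_map` standing in for the file system are all assumptions made to keep the example self-contained.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory "file system": file name -> file contents.
using file_map = std::map<std::string, std::string>;

// Strip leading whitespace, cut an unescaped '#' comment and
// replace every backslash-escaped character by the character itself.
std::string clean_line(const std::string &raw) {
    std::string out;
    std::size_t i = raw.find_first_not_of(" \t");
    if (i == std::string::npos) return out;
    for (; i < raw.size(); ++i) {
        if (raw[i] == '\\' && i + 1 < raw.size()) out += raw[++i];
        else if (raw[i] == '#') break;
        else out += raw[i];
    }
    return out;
}

// Append the logical lines of `name` to `out`, following @include lines.
void read_logical(const file_map &fs, const std::string &name,
                  std::vector<std::string> &out, int depth = 0) {
    if (depth > 16) return;               // guard against circular includes
    file_map::const_iterator it = fs.find(name);
    if (it == fs.end()) return;           // missing files are silently skipped
    std::istringstream in(it->second);
    const std::string kw = "@include ";
    std::string raw;
    while (std::getline(in, raw)) {
        std::string line = clean_line(raw);
        if (line.compare(0, kw.size(), kw) == 0)
            read_logical(fs, line.substr(kw.size()), out, depth + 1);
        else if (!line.empty())
            out.push_back(line);
    }
}
```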

Class `file`

This class represents a physical data file. Its main purpose is to cache and share files repeatedly needed by Epos. The claim function (to be found in interf.cc) should be used for opening the file (or only sharing an existing copy if the file is already open) and reading the data out of the file. The unclaim function must be called once for every claim call, as soon as the file is no longer needed.

Any code which uses this class should never extract the data member out of it and use it independently, even while the object itself remains claimed. This is because if the content of the file has changed, the data in memory will be reallocated and re-read upon the next call to claim or possibly even sooner. This may invalidate the original data member whenever control switches to another Epos agent. It is possible to call reclaim at any time to force re-reading of any file whose time stamp has changed.
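The sharing scheme can be sketched as a reference-counted cache. The `file_cache` class below is a hypothetical illustration, not the interface found in interf.cc: the real claim and unclaim functions take different arguments, and the loader callback exists only to keep the example independent of the disk.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical cached file: contents plus the number of outstanding claims.
struct cached_file {
    std::string data;
    int refs;
};

class file_cache {
    std::map<std::string, cached_file> open_;
    std::function<std::string(const std::string &)> load_;
public:
    explicit file_cache(std::function<std::string(const std::string &)> loader)
        : load_(std::move(loader)) {}

    // Open the file, or share the existing copy if it is already open.
    cached_file *claim(const std::string &name) {
        std::map<std::string, cached_file>::iterator it = open_.find(name);
        if (it == open_.end())
            it = open_.insert(std::make_pair(name, cached_file{load_(name), 0})).first;
        ++it->second.refs;
        return &it->second;
    }

    // Give up one claim; the file is dropped when the last claim is gone.
    void unclaim(const std::string &name) {
        std::map<std::string, cached_file>::iterator it = open_.find(name);
        if (it != open_.end() && --it->second.refs == 0) open_.erase(it);
    }

    // Re-read an open file. The data member is replaced, which is why
    // callers must not keep their own copies of, or pointers into, it.
    void reclaim(const std::string &name) {
        std::map<std::string, cached_file>::iterator it = open_.find(name);
        if (it != open_.end()) it->second.data = load_(name);
    }

    std::size_t open_count() const { return open_.size(); }
};
```

Two claims of the same name trigger a single load and return the same object; the entry disappears only after the second unclaim.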

Class `hash`

class hash is derived from class hash_table<char,char>. The hash_table template is a generic hash table, keys and associated data items being its class parameters. This implementation uses balanced (AVL) trees to resolve collisions and is able to adjust (rehash) itself when it gets too full or too sparse. It is a very robust and fast implementation and it is independent of the rest of Epos, so you may use it in other projects if you want to (subject to GPL). If you want to have the hash table keep a copy of its contents, the key and/or data may only be of a fixed size type, or a C-style string. Alternatively, the hash table will only store pointers to these items. These approaches can be mixed in any reasonable sense of "mixing".

The hash tables are used frequently in Epos in various type combinations (see hash.cc for a list). They're also used for parsing the dictionary files.

Class `rules`

Note the difference between class rules and class rule. Every set of rules in Epos (there is one per language) is a class rules, which contains a single r_block, which in turn contains the individual rules. The class rules serves as the only communication interface between the rule hierarchy and the rest of Epos, but there is no inheritance relation between them.

8.3 Class Hierarchies

Class `rule`

Each rule object represents a rule to be applied to a structure of units. The class hierarchy:

rule

r_regress
- r_progress
r_raise
r_syll
r_contour
r_smooth
r_regex
r_debug
hashing_rule
- r_subst
  - r_prep
  - r_postp
- r_diph
- r_prosody
cond_rule
- r_inside
- r_if
- r_with
block_rule
- r_block
- r_choice
- r_switch

Classes not beginning in r_ can be considered abstract.

Class `agent`

Epos can be configured to support multiple simultaneous TTSCP connections and, except for bugs, no single unauthorized connection should be able to create a Denial of Service situation, such as long delays in processing other connections. To achieve this, Epos uses a simple cooperative multitasking facility called agents. An agent (process) is an entity which is responsible for carrying out some task, such as reading a few bytes from a file descriptor. At any moment (except for the startup and the very moments of a transfer of control), exactly one agent is active (Epos doesn't support SMP to avoid the unnecessary overhead and complexity in the typical case). If an agent has to wait for some event before its job is finished, for example, when the sound card buffers are full or not enough data has arrived through a network connection, the agent calls the block method (reading) or push method (writing) with the offending file descriptor. It is also possible for an agent to wait until some other agent executes; see the block and push methods' implementation for an example. If an agent wants to have another agent running, it can call the schedule method to add it to the queue of runnable processes. The scheduled agents always acquire control through the run method; when this method returns, another agent is chosen. If there are no more runnable agents, Epos will wait until an agent becomes runnable through a status change of the file descriptor the agent is blocking for.
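The run queue part of this scheme can be sketched in a few lines. The `task`, `scheduler` and `counter` types are hypothetical and much simpler than the real agent class; in particular, blocking on file descriptors (the block and push methods) is left out, so the loop simply ends when nothing is runnable.

```cpp
#include <deque>
#include <string>
#include <vector>

class scheduler;

// Hypothetical stand-in for an agent: one unit of cooperative work.
struct task {
    virtual ~task() {}
    virtual void run(scheduler &s) = 0;   // must return promptly
};

class scheduler {
    std::deque<task *> runnable_;         // queue of runnable tasks
public:
    void schedule(task *t) { runnable_.push_back(t); }

    // Exactly one task is active at any moment; when its run method
    // returns, the next runnable task is chosen.
    void loop() {
        while (!runnable_.empty()) {
            task *t = runnable_.front();
            runnable_.pop_front();
            t->run(*this);
        }
    }
};

// Example task that needs several turns to finish its job.
struct counter : task {
    std::string name;
    int left;
    std::vector<std::string> *log;

    counter(const std::string &n, int turns, std::vector<std::string> *l)
        : name(n), left(turns), log(l) {}

    void run(scheduler &s) override {
        log->push_back(name);
        if (--left > 0) s.schedule(this); // not done yet: queue up again
    }
};
```

Two such tasks scheduled in turn interleave their work, because each one gives up control after a small step instead of running to completion.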

Most agents get their data input through the inb data member and place their output into the outb data member. Whenever the agent has completed a stand-alone chunk of output, the agent calls the pass method to pass it to its successor and to schedule the successor for processing. The output agent never calls pass (it actually has no successor and it is responsible for writing the data somewhere outside Epos), but it calls finis when the data has been successfully written.

Most agents are organized into streams of interconnected agents. See the strm command for the semantics of that. Other agents are responsible for individual TTSCP connections, for accepting new connections and other tasks. A special agent is used for deleting other agents when they need to delete themselves.

The chunk agent may perform utterance chunking, that is, splitting the text being processed at convenient points, such as after a period or at end of paragraph. Such chunks travel through the rest of the stream independently and they are queued between consecutive agents if necessary. Such a queue (if non-empty) is a linked list starting with pendin of the latter agent while the end is pointed to by pendout of the former agent. The pendcount member of the latter agent stores the current number of data chunks in the queue, which is used for sanity checking and flow control.
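One possible reading of this queue layout is sketched below. The member names pendin, pendout and pendcount come from the text above, but the `stage` and `chunk` types and the `pass` and `take` functions are hypothetical simplifications; the real agents carry typed data and also schedule the successor when passing.

```cpp
#include <string>

// Hypothetical queue element: one chunk of data waiting between two stages.
struct chunk {
    std::string data;
    chunk *next;
};

// Hypothetical stand-in for an agent in a stream.
struct stage {
    stage *successor = nullptr;
    chunk *pendin = nullptr;    // head of the queue feeding this stage
    chunk *pendout = nullptr;   // tail of the queue feeding the successor
    int pendcount = 0;          // chunks waiting in this stage's input queue

    // Former agent: append a finished chunk to the successor's queue.
    void pass(const std::string &data) {
        chunk *c = new chunk{data, nullptr};
        if (successor->pendin) pendout->next = c;  // queue non-empty: link at tail
        else successor->pendin = c;                // queue empty: new head
        pendout = c;
        ++successor->pendcount;
    }

    // Latter agent: remove the oldest waiting chunk, if there is one.
    bool take(std::string &data) {
        if (!pendin) return false;
        chunk *c = pendin;
        pendin = c->next;
        --pendcount;
        data = c->data;
        delete c;
        return true;
    }
};
```

Note that in this sketch pendout is only meaningful while the successor's queue is non-empty, which matches the "if non-empty" qualification in the description.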

The current agents are:

agent

stream
a_accept
a_protocol
- a_ttscp
a_disconnector
a_ascii
a_stml
a_rules
a_print
a_diphs
a_synth
a_io
- a_input
- a_output
  - oa_ascii
  - oa_stml
  - oa_diph
  - oa_wavefm

8.4 Testing

The Epos package contains three TTSCP clients. One of them is the standalone say-epos utility, which is provided as a good and simple example of a TTSCP client. We suggest using it as a starting point for developing specialized TTSCP clients, even though it is already somewhat crufty.

The tcpsyn virtual synthesizer also embeds a TTSCP client; it is wise to check its proper functioning after making changes to the TTSCP server implementation.

A standalone test suite is compiled under the name vrfy. It currently only tries a few standard tricks to crash the server and is far from being a rigorous test suite. However, it manages to catch many more programming errors than say-epos and we recommend running it after making any changes to the source code of Epos. This test suite assumes Epos has been configured correctly and is listening at the standard TTSCP port. Don't be surprised if a bug found by vrfy turns out to be a false alarm because of a bug in vrfy itself.

No part of the vrfy TTSCP client should be mimicked by other software or be used as study material. This client tries to be as ugly as possible and to crash any crashable server. Adding some ugly tests to this piece of code might raise the average code quality of Epos significantly.

8.5 Portability

Epos is written to be as portable as possible. It is, however, also written to a UNIX developer's tastes, and the same is partly true of this documentation. The following should give you an approximate idea of the degree of support for some of the most common operating systems.

Linux

The primary development OS. Most of the testing is done under the Debian and Red Hat distributions. Please report to us any compilation issues which may be distribution related; these will be the easiest ones to solve.

Other UNIX Clones

Epos is ported to other unices from time to time as well, but there may be minor incompatibilities in recent code. In this documentation, references to UNIX should be read as "tested on Linux, implemented using POSIX compliant interfaces and expected to be easy to get working on any other UNIX clone".

Epos uses the autoconf package to avoid portability pitfalls within the UNIX world. Features like syslog are welcomed and used, but only if the corresponding system header file is detected by autoconf.

For sound output, OSS is preferred (if detected); otherwise, the Portable Audio library conveniently provided with Epos is used.

QNX

On the QNX operating system, Epos can be controlled not only over a TCP-based TTSCP implementation, but also using a QNX specific interprocess communication interface. See src/qnxipc.cc for details; be aware, however, that this code has never been completely debugged because of a drop in our motivation. If you really need this, you could easily help us debug it by providing us with a QNX machine.

Windows NT, 2000, XP

See the arch/win directory for architectural differences from UNIX. Be aware of the following three differences in Epos's behavior on these operating systems: the mmsystem (Microsoft Multimedia System) library is used instead of /dev/dsp (Open Sound System) for speech output; Epos compiles and runs as an NT service named ttscp, instead of a UNIX-style daemon; you can use the registry to locate the configuration files.

In order to make service installation and registry access available, it is necessary to build and run the instserv utility before running Epos. That utility, if run with the letter u on its command line, can also uninstall the ttscp service, but it doesn't remove any registry values.

You should use the Visual C++ compiler for compiling Epos, but you don't need it for running Epos. The Borland C++ Builder and Watcom C++ used to work a long time ago, too. Ask us for help with these compilers if necessary. Please refer to the WELCOME file on how to proceed step by step with Visual C++.

Advanced Notes for Windows NT

File input and output modules are not going to work with Windows sockets (whose incompatible implementation of the select call doesn't allow file descriptors at all). If you do enable the writefs option, Epos will crash after the first writing error such as disk full. Don't try to enable the readfs option.

Windows CE

The port was roughly done and found possible, but it is not maintained. Ask us if you need it. Files specific for this port can be currently found at arch/win-ce.

An experienced Windows user can get a good estimate of this port's behavior from reading the sections on other versions of Windows. The same holds for Windows XP embedded.

Windows 95, 98 etc.

We don't support these DOS successors very strongly now, but these ports used to work. If you want to try them out, you should probably comment out the HAVE_WINSVC_H line in src/config.h after running arch/win/configure.bat. This will force Epos to compile not as a Windows NT service, but as an ordinary UNIX-style daemon. In fact, the way Epos is written, it will decide to run as a daemon anyway if it can't connect to the service controller.

The same holds for MS DOS, but as MS DOS offers no sound playback interface, you'll have to comment out portions of source code here and there to make Epos e.g. produce wave files. Good luck and don't even try to use 16-bit compilers, please.

Other OSes

Please contact the authors for advice with any OS significantly different from the UNIX and Windows families. However, the approximate requirements are:

Architecture: 32bit (big endian is OK)
Reliable C++ compiler (no libraries needed)
Standard C library
8-bit ASCII based character set
TCP/IP networking

Note that the architectural requirements are only a guideline and are enforced rather for lack of energy for debugging Epos on every perverse 36-bit machine with PDP byte ordering. Epos supports big endian architectures, but the corresponding code still needs to be tested. The integers and pointers can be any size not less than 32 bits as long as the integers are not longer than pointers. If they were, a single code change would do the port.
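When evaluating a new platform, these size constraints can be checked at compile time. The following is a sketch and not part of the Epos sources: the assertions restate the requirements above, and `host_is_big_endian` is a hypothetical helper for finding out which byte order a target actually uses.

```cpp
#include <climits>
#include <cstring>

// Integers and pointers must be at least 32 bits wide...
static_assert(sizeof(int) * CHAR_BIT >= 32,
              "integers must be at least 32 bits wide");
static_assert(sizeof(void *) * CHAR_BIT >= 32,
              "pointers must be at least 32 bits wide");
// ...and integers must not be longer than pointers.
static_assert(sizeof(int) <= sizeof(void *),
              "integers must not be longer than pointers");

// Run-time probe: true if the least significant byte of an integer
// is stored last, i.e. the host is big endian.
bool host_is_big_endian() {
    const unsigned int probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 0;
}
```

If one of the assertions fails, compilation stops with the corresponding message instead of producing a binary that misbehaves later.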

TCP/IP networking is not strictly necessary, but if you don't have it, you can either try to adapt the QNX IPC proxy for your favourite IPC interface, or you can build the monolithic binary of Epos.

A Bourne-compatible shell is helpful, as it allows you to run a configure script. Otherwise you have to write a src/config.h file by hand as we have done with the Windows ports. A plain old make utility helps the compilation process if your OS can emulate a UNIX development environment a little bit.

8.6 More Information

The header files mostly define basic interfaces for individual Epos components. Reading the ones related to a specific piece of code may often clarify things. Lots of global data declarations live in common.h; others (especially small, library-like functions) can be found in interf.h.

If you have any code or development related comment or question about Epos, send it to the Epos development mailing list epos@braille.mff.cuni.cz. You are also encouraged to subscribe to the list first by sending a mail containing only the text subscribe epos to listserv@braille.mff.cuni.cz. Please spend a few seconds trying to look up the answer in the documentation first.
