
4. Options

Most aspects of the operation can be customized by changing options. Options can be set through TTSCP commands, in configuration files and on the command line. Basically, an option is a name/value pair. There are a few possible option types: number, string, yes/no, character, and a few enumerated types.

Almost all individual options are documented in this document. If you strongly suspect this section of the documentation to be out of date, the reliable list of all built-in long options can be found in src/options.lst. (See approximately line 320 and below. Every line represents at most one option for every option class as explained below. The option name is quoted; its semantics are usually explained following the option.) You can also list the option names and types using eposd -H.

4.1 Option Classes

There are four classes of options: static options, global options, language options and voice options. For instance, switching a voice in fact means switching to another set of voice options, while the language options, global options and static options stay the same.

Every voice is language specific (it implies a specific language). That's why switching the language automatically switches the voice to the voice defined as default or the last active one for this particular language. (It is of course possible to use the same configuration file to set up a similarly sounding voice for every language if desired.) It is also important to distinguish between a voice and an inventory. An inventory is a set of speaker-dependent files used as the basis for a voice, that is, for a mode of speech. Multiple different voices may use the same inventory, and may even sound quite different depending on various configuration options. A user of moderate expertise will be able to modify the voice dependent configuration files, but not the inventory files. Also, the inventories may often be distributed separately from Epos, while sample voices based on them may be included either with Epos or with the segment inventory, or their creation may be left entirely to the user.

A set of voice options should completely describe a voice. A set of language options should completely describe a language, except for voice-specific behavior. A set of global options should cover language independent aspects of operation. It should be understood that a set of options may employ various references to other information, especially filenames (of language dependent transcription rules, voice dependent segment inventories, etc.).

Some language and/or voice specific options may have suitable defaults for all but a few specific languages or voices. That's why there is a corresponding language option for every voice option, used as the default if the option is unspecified for a given voice. Likewise, there is a global option for every language option to default to. Therefore, adding a new language or voice option doesn't necessarily imply adding it to each language or voice configuration file, if a reasonable default can be suggested.

The difference between global and static options is subtler and not that important for most users. All voice specific, language specific and also global options are implemented as user specific, that is, each TTSCP control connection may assign a different value to them at the same time. For some options of even more global impact, however, this makes little sense. These are termed static. Assigning a value to a static option changes the value for other concurrent users as well.

Whenever an option name is given, it is first understood as a voice option (of the current voice); if there is no such voice option, the name is treated as a language option, then as a global/static option. To override this order, you can prefix an option with "S:", "C:", "L:" or "V:". This will restrict the search to static, global, language or voice options, respectively. (This is automatically done with configuration files, because every configuration file describes either a language or a voice, or it is unrelated to the current language and voice altogether.)
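The lookup order and the prefixes can be modelled in a few lines. The following Python sketch is only an illustration: it represents each option class as a plain dictionary with made-up values, and it assumes that global options are searched before static ones (the text above only says "global/static").

```python
# Minimal model of option name lookup; each option class is a plain dict.
# Not Epos source code: it only mirrors the documented search order.
PREFIXES = {"S:": "static", "C:": "global", "L:": "language", "V:": "voice"}
# Assumption: global options are searched before static ones.
DEFAULT_ORDER = ("voice", "language", "global", "static")


def resolve_option(name, classes):
    """Return (class_name, value) for name, honouring an S:/C:/L:/V: prefix."""
    order = DEFAULT_ORDER
    if name[:2] in PREFIXES:
        order = (PREFIXES[name[:2]],)   # the prefix restricts the search
        name = name[2:]
    for cls in order:
        if name in classes[cls]:
            return cls, classes[cls][name]
    raise KeyError(name)


# Illustrative values only.
classes = {
    "static": {"show_phones": "off"},
    "global": {"init_f": "100"},
    "language": {"init_f": "95"},
    "voice": {"init_f": "80"},
}
print(resolve_option("init_f", classes))    # ('voice', '80')
print(resolve_option("L:init_f", classes))  # ('language', '95')
```

The unprefixed lookup finds the voice option first, which is also why a language option acts as the default for a voice that does not set the option itself.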

4.2 Option Types

Each option has an associated type, which is indicated with each option in the documentation. Mostly these are traditional types like booleans, strings, and non-negative numbers. Others are limited to a fixed choice of values (e.g. "mono" or "stereo"), and yet others to a choice of values which depends on other configuration (e.g. on the levels of description, on the available encoding mapping files, or on the current list of language configurations). For all these options it is impossible to set a syntactically invalid value, although it is easy to set a value which makes little sense.

Autoconcatenating Options

Several string-typed options (especially the language, voice and soft option lists) behave specially if they appear multiple times within the configuration (not limited to the configuration files!). Unlike almost all other options, which take the last value offered, these options concatenate all the supplied values and separate them by colons. This makes sense, as all these option values are syntactically colon separated lists of strings.

If you need to set an autoconcatenating option to a different value, you must first reset it by supplying an empty string to it. This sets the option to the empty string, rather than appending a colon and an empty string to the previous value.
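The concatenate-or-reset behaviour can be sketched as follows. This is a hypothetical model, not Epos code: it lists only the two option names this document confirms as autoconcatenating (languages and soft_options), and it assumes that no leading colon is produced when the previous value is empty.

```python
# Sketch of autoconcatenation, assuming option values are plain strings.
# Only option names confirmed as autoconcatenating in this document are listed;
# the name of the voice list option is not given here, so it is left out.
AUTOCONCAT = {"languages", "soft_options"}


def assign(options, name, value):
    """Apply one 'name value' setting the way the configuration parser would."""
    previous = options.get(name, "")
    if name in AUTOCONCAT and value != "" and previous != "":
        options[name] = previous + ":" + value   # append to the list
    else:
        options[name] = value                    # plain overwrite, or reset to ""


opts = {}
for name, value in [("languages", "czech"), ("languages", "slovak"),
                    ("init_f", "80"), ("init_f", "90")]:
    assign(opts, name, value)
print(opts)                          # {'languages': 'czech:slovak', 'init_f': '90'}
assign(opts, "languages", "")        # reset with an empty string
assign(opts, "languages", "german")
print(opts["languages"])             # german
```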

4.3 Configuration Files

Whenever the program starts up, it reads a number of configuration files, setting up the initial (default) values of the options. These are located under /usr/local/share/epos (unless overridden at configure/compile time or with the --base_dir option on the command line). By convention, configuration file names have the ".ini" suffix.

(On a Windows NT-like operating system, you can use the HKEY_LOCAL_MACHINE\SOFTWARE\Epos\Setup registry key to set the value of Path to a path leading to the configuration files. This value can still be overridden by the command line.)

Unless overridden, three files in /usr/local/share/epos/cfg will be processed to set up the global configuration: fixed.ini, epos.ini and either ansi.ini or rtf.ini (depending on the preferred output format if colored output is turned on; either ANSI escape sequences or RTF are supported at the moment, and other markup formats can be added easily). By convention, fixed.ini contains the standard global and static configuration values used by a given installation and changes rarely or never, while epos.ini contains less permanent parameters and temporary values. The global parameters include a list of languages, that is, of language configurations which will be parsed when the global configuration is set up. Every language .ini file in turn contains a list of voices to be configured for this particular language. Language configuration files are located under /usr/local/share/epos/lng/*/*.ini, voice configuration files under /usr/local/share/epos/inv/*/*.ini, where * represents any string listed in the list of languages (or voices for a language). See the actual files under /usr/local/share/epos/cfg/*.ini for examples.

Configuration File Format Overview

A configuration file contains one option per line (empty lines are ignored). Each option is a name-value pair, separated with whitespace. A string value may be (and sometimes must be, e.g. if it begins with whitespace) enclosed in double quotes. Every configuration file is associated with a certain option set, that is, it contains either just global options, or options related to some language or voice. (In the latter two cases, the name option identifies the language or voice properly.)

Character Encoding

Sometimes it is not convenient or possible to encode all configuration files in the same character encoding. For example, a character encoding may be language dependent. It is even possible for different lines of the same file to use different encodings. You can switch character encodings within a file using the @charset directive. The charset name must be given as a parameter, enclosed in parentheses. Epos attempts to load the corresponding Unicode character mapping file and immediately switches the charset number for the current file. Please note two differences from the charset option: the directive only affects the current "logical" file (including included files as well as any files it has been included from using the @include directive), whereas the option doesn't affect the current file at all, but affects all subsequent files (e.g. for the given language), such as the rules file.

You have the following choice of charsets: arbitrary 8-bit encodings (provided a Unicode mapping file is present in cfg/mappings), utf-8, sampa-std (standard SAMPA) and sampa-alt-NAME where NAME is a pre-configured SAMPA alternate.

Escaping Special Characters

It is possible to enter some special characters using escape sequences, such as "\ " for a hard space, "\n" for a newline, or "\~" for "..." or "dots" (treated as a single character in Epos). The available escape sequences are listed in the following table.


escape sequence   interpreted as   ASCII code   example
\n                newline          10
\t                tab              9
\e                escape           27
\E                escape           27
\[                escape           27
\\                backslash        92
\ (space)         hard space       255
\#                hash             35
\;                semicolon        59
\@                at sign          34
\~                dots             (1)          No...
\.                decimal point    (2)          1.2 kg
\-                range            (3)          2-3 people
\m                minus            (4)          -1
\X                temporary        (31)
\Y                temporary        (30)
\Z                temporary        (29)
\W                temporary        (28)
\V                temporary        (27)
\U                temporary        (26)
Escape sequences used in configuration files

(If you suspect this table to be out of date, you can consult the token_esc and value_esc constants in src/options.lst, or the table in parser.h.) ASCII codes in parentheses are codes which Epos uses internally for other purposes.
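The following Python sketch decodes escape sequences according to the table. It is an illustration, not the Epos parser: it covers the table except the \@ row, and keeping unknown escapes verbatim is an assumption.

```python
# Subset of the escape table; the \@ row is left out of this sketch.
ESCAPES = {
    "n": "\n", "t": "\t", "e": "\x1b", "E": "\x1b", "[": "\x1b",
    "\\": "\\", " ": "\xff", "#": "#", ";": ";",
    # Codes Epos reuses internally (values in parentheses in the table).
    "~": "\x01", ".": "\x02", "-": "\x03", "m": "\x04",
    "X": "\x1f", "Y": "\x1e", "Z": "\x1d", "W": "\x1c", "V": "\x1b", "U": "\x1a",
}


def unescape(text):
    """Replace backslash escapes; unknown escapes are kept verbatim (assumption)."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            out.append(ESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print([ord(c) for c in unescape(r"1\.2")])  # [49, 2, 50]
```

The decimal point escape becomes the internal code 2, so "1\.2" is no longer two numbers separated by a sentence-final period.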

Comments and the @include Directive

A configuration file usually contains various comments. A comment starts with a semicolon or hash mark and lasts to the end of the line. The semicolon (or hash mark) must be located at the beginning of the line or just after some whitespace; semicolons in the middle of a word don't start a comment.
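A Python sketch of how a single configuration line might be split into a comment, an option name and a value, following the rules given above and in the format overview. It is not the Epos parser; in particular, the assumption that a double-quoted value protects ";" and "#" from starting a comment is labelled in the code.

```python
def parse_line(line):
    """Return (name, value) for an option line, or None for blank/comment lines."""
    cut, in_quotes = len(line), False
    for i, ch in enumerate(line):
        if ch == '"':
            in_quotes = not in_quotes  # assumption: quotes protect ';' and '#'
        elif ch in ";#" and not in_quotes and (i == 0 or line[i - 1].isspace()):
            cut = i                    # a comment starts here
            break
    line = line[:cut].strip()
    if not line:
        return None
    parts = line.split(None, 1)        # name and value are separated by whitespace
    name = parts[0]
    value = parts[1] if len(parts) > 1 else ""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]            # quotes may preserve leading whitespace
    return name, value


print(parse_line("init_f 80 ; neutral pitch"))  # ('init_f', '80')
print(parse_line('perm@colon ":,"'))           # ('perm@colon', ':,')
```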

If a line begins with @include (possibly preceded by whitespace), it is treated as the include directive. The format of the line should be @include "filename"; it causes the contents of the file filename to be inserted at this point in the main .ini file. Includes can be nested to any depth up to the value of the max_nest option; if no path or a relative path is specified, the directory which holds the topmost file is used for the lookup.
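The include rules can be sketched in Python as below. A dictionary stands in for the file system (a hypothetical convenience for the example), and the exact off-by-one meaning of max_nest is an assumption: here the topmost file has depth 0 and a file at a depth greater than max_nest is rejected.

```python
import posixpath


def expand_includes(path, files, max_nest, top_dir=None, depth=0):
    """Return the lines of path with @include directives expanded recursively.

    files is a dict standing in for the file system (hypothetical).
    """
    if depth > max_nest:
        raise ValueError("@include nested deeper than max_nest")
    if top_dir is None:
        top_dir = posixpath.dirname(path)  # directory of the topmost file
    lines = []
    for line in files[path].splitlines():
        stripped = line.strip()
        if stripped.startswith("@include"):
            target = stripped[len("@include"):].strip().strip('"')
            if not posixpath.isabs(target):
                target = posixpath.join(top_dir, target)
            lines.extend(expand_includes(target, files, max_nest, top_dir, depth + 1))
        else:
            lines.append(line)
    return lines


files = {
    "/cfg/main.ini": 'a 1\n@include "x/inner.ini"\nb 2',
    "/cfg/x/inner.ini": '@include "more.ini"',   # resolved against /cfg, not /cfg/x
    "/cfg/more.ini": "c 3",
}
print(expand_includes("/cfg/main.ini", files, max_nest=4))  # ['a 1', 'c 3', 'b 2']
```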

Other Directives

Two more directives, @warn and @error, with the obvious semantics, are available for diagnostic purposes.

4.4 Command Line

Command line option values can be passed to Epos at server startup, or to the say-epos utility. The behavior of these two command lines is very similar, though not identical. The monolithic Epos executable should not deviate from the server executable, unless specified otherwise.

The options can be specified anywhere on the command line, and are processed from left to right before any other text (which is treated as the text to be synthesized) on the command line. Long options, which correspond directly to the options specified in configuration files, are preceded by a double dash and take a value, which can be separated from the option name by an equals sign (=) or by whitespace. The value (true or the empty string) can be omitted and understood implicitly if this is applicable to the option type and the option is followed by another option or comes last on the command line. Short options are denoted by single letters, preceded by a single dash, and they never take a value.

Input Text

Technically speaking, this subsection doesn't really concern the options, but it is nevertheless included.

The main purpose of the say-epos utility is to convert a specified text to speech. Therefore, any command line text which is not part of an option name or option value is concatenated together with spaces, and sent to the server for the usual processing (TTS and phonetic transcription), after all preceding, intervening or following options have been sent to the server.

The same is true for the monolithic Epos executable, but in this case, only a single argument of this sort is accepted. (Just enclose the text in quotes to have it treated as a single argument. The sole reason for this limitation is the less-than-maintained status of the monolithic executable.)

In both cases, quite random defaults are supplied if no input text is specified.

It is neither possible nor desirable to specify the input text for the TTSCP server executable; such text is accepted and silently ignored.

Long Options

The long options are also available through the say-epos utility; in such a case, the long option is passed to the TTSCP server without any effect on the say-epos client.

The ordering of the options on the command line is usually not significant, unless the current language or voice is switched during the processing. An example:

say-epos --language german --pausing "Wie geht es?" --show_segments

This sets the default language to German, enables pausing after each transformation rule is applied to the text (sets the pausing option to true, in other words) and prints the segment string generated in the process (sets the show_segments option to true). The string to be transcribed and synthesized is given on the command line as well (it must be quoted unless it consists of a single word).

The list of all available long options together with their types can be obtained with eposd -H of the main program (not the client stub, sorry). The semantics of nearly all individual long options are described below.

To turn a long boolean option off, you can give its name with three dashes. Therefore,

say-epos ---show_transcript "Say this, do not show the transcription"

is equivalent to

say-epos --show_transcript off "Say this, do not show the transcription"
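The long option rules above can be approximated by the following Python sketch. It is not the say-epos implementation: the set of boolean option names is a hypothetical subset (the real types come from src/options.lst), the set of words accepted as an explicit boolean value is an assumption, and short options are not handled.

```python
# Hypothetical set of boolean option names; the real types come from src/options.lst.
BOOLEAN = {"pausing", "show_segments", "show_transcript"}
# Assumption: words accepted as an explicit boolean value.
TRUTH_WORDS = {"on", "off", "true", "false", "yes", "no"}


def parse_long_options(argv, boolean=BOOLEAN):
    """Split argv into ordered (name, value) settings and the input text.

    Short (single-dash) options are not handled in this sketch.
    """
    settings, text, i = [], [], 0
    while i < len(argv):
        arg = argv[i]
        nxt = argv[i + 1] if i + 1 < len(argv) else None
        if arg.startswith("---"):                 # ---name turns a boolean off
            settings.append((arg[3:], "off"))
        elif arg.startswith("--"):
            name, sep, value = arg[2:].partition("=")
            if not sep:                           # value not given with '='
                if name in boolean:
                    if nxt in TRUTH_WORDS:
                        value, i = nxt, i + 1
                    else:
                        value = "on"              # implicit true
                elif nxt is None or nxt.startswith("-"):
                    value = ""                    # implicit empty string
                else:
                    value, i = nxt, i + 1
            settings.append((name, value))
        else:
            text.append(arg)                      # part of the text to synthesize
        i += 1
    return settings, " ".join(text)


print(parse_long_options(["---show_transcript", "Say this"]))
# ([('show_transcript', 'off')], 'Say this')
```

The settings are returned as an ordered list rather than a dictionary because, as explained below, their order matters once --language or --voice is involved.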

Refer to the individual options section for documentation on available long options.

Language and Voice Switching

Two pseudo-options, --language and --voice, can be used to switch the current language or voice, or to specify options for a language or voice other than the default one. There may be multiple options of this kind given to the say-epos utility or the monolithic Epos executable, and their ordering is important. For example,

say-epos Something. --init_f 80 --voice vichova

doesn't do what might be expected, i.e. use the voice named "vichova" with its neutral pitch set to 80 for this command. Instead, the neutral pitch is set for the default voice, and then the voice is switched to the specified one. To get the intended behavior, reorder the command line:

say-epos Something. --voice vichova --init_f 80

Short Options

The most frequently used options, and occasionally even collections of options and/or other stuff, are given a shortcut called a short option. A short option is a single letter preceded by a single dash. The usual conventions for merging short options into a single argument apply, and the example above may thus be abbreviated as

say-epos --language german -p "Wie geht es?" -d

or even

say-epos --language german -pd "Wie geht es?"

(There is no short option for --language.)
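The merging convention simply means that a cluster such as -pd stands for -p -d. A small Python sketch of the expansion step (an illustration, not the say-epos code; it assumes that no input text starts with a single dash):

```python
def expand_short_options(argv):
    """Split clusters such as -pd into -p -d; leave other arguments alone."""
    out = []
    for arg in argv:
        if arg.startswith("-") and not arg.startswith("--") and len(arg) > 2:
            out.extend("-" + letter for letter in arg[1:])
        else:
            out.append(arg)
    return out


print(expand_short_options(["--language", "german", "-pd", "Wie geht es?"]))
# ['--language', 'german', '-p', '-d', 'Wie geht es?']
```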

The short options are interpreted directly by the say-epos utility; that's why the list of short options available with this utility differs from the list of short options available with the server or the monolithic Epos executable. A list of most of the server short options can be obtained with eposd --help or eposd -h. A list of most of the short options understood by the say-epos utility can be obtained with say-epos -h. Such a list, however, includes undocumented options which may disappear in future releases.

Please keep in mind that the say-epos utility is only an example of a TTSCP client and not a full-fledged controlling interface to Epos.

Server Short Options

-f

Do not fork at startup. Same as ---forking.

-h

Show a summary of available short options.

-p

Same as --pausing.

-v

Show Epos version.

-D

Turn on debugging output. Multiple -D options make the output more verbose: this is equivalent to lowering the debug_level option to four minus the number of occurrences.

-H

In addition to a summary of short options, show also a list of all available long options.

Client Short Options

-k

Shut down the server.

-l

List all available languages and available voices for the current language.

-m

Write the waveform to a file (said.vox in the client's current directory) instead of writing it to the local sound card; the output doesn't include any header and is in the mu law format.

-o

Write the waveform to the standard output. In this case, no transcription is performed.

-u

Turn on utterance chunking. This option has unintuitive consequences when combined with -w.

-w

Write the waveform to a file (said.wav in the client's directory) instead of writing it to the local sound card.

4.5 Setting Options in TTSCP

See the TTSCP specification for general overview of TTSCP and the set command.

restr.ini File

Setting options in TTSCP can be a security problem, as some options can cause the server to access unrelated files. It is therefore strongly recommended not to run the Epos daemon with superuser privileges, but sometimes a more fine-grained access control mechanism is needed. Such a mechanism consists of authentication and of limiting access to specified options for some or all users.

By default, all settable parameters can be changed by any outside connection (this doesn't affect the value in use for any other connection in any way). You can control this privilege by restricting it in cfg/restr.ini.

Every line of that file is in the form

option_name       access_rights

The access rights must be in lowercase and must not contain spaces. They are a sequence of r, w, $ and #, none of which may be repeated. Their order is significant; the interpretation is illustrated by the following examples:


access rights   meaning
r               read only
w               write only
rw              no restriction
#w$r            root can write, authenticated users can read
r#w             anyone can read, root can also write
#rw             root can read/write
Examples of option access rights

Unknown parameters are considered just not to have been implemented in this particular version of Epos and are not reported. Parameters not mentioned in restr.ini are allowed unlimited access by any connection.
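One reading of the examples is that "$" and "#" set the minimum user class (authenticated user or root) for the r and w letters that follow them, and that an operation whose letter is missing is denied to everyone. The Python sketch below implements that reading; it is inferred from the example table, not taken from the Epos sources, and the option names in the usage example are arbitrary.

```python
ANYONE, AUTHENTICATED, ROOT = 0, 1, 2


def parse_rights(spec):
    """Map 'r'/'w' to the least privileged class allowed; a missing letter means nobody."""
    if len(set(spec)) != len(spec) or not set(spec) <= set("rw$#"):
        raise ValueError("bad access rights: %r" % spec)
    level, required = ANYONE, {}
    for ch in spec:
        if ch == "$":
            level = AUTHENTICATED   # applies to the letters that follow
        elif ch == "#":
            level = ROOT
        else:
            required[ch] = level
    return required


def allowed(restr, option, op, user_level):
    """restr maps option names to parse_rights() results."""
    if option not in restr:
        return True                 # not mentioned in restr.ini: unlimited access
    need = restr[option].get(op)
    return need is not None and user_level >= need


restr = {"base_dir": parse_rights("#rw"), "init_f": parse_rights("r#w")}
print(allowed(restr, "init_f", "r", ANYONE))   # True
print(allowed(restr, "init_f", "w", ANYONE))   # False
```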

Note that the location of this file can be changed on the command line (with the --base_dir option).

4.6 Semipermanent Personal Preferences

If many users with different needs and aesthetic feelings share the same Epos daemon installation, they may choose to place the appropriate TTSCP commands into a file and to point the TTSCP_USER environment variable to this file. The contents of that file will be transmitted by the client to set up the working environment at the beginning of every session.

An example:

set language czech
set voice kadlec
set init_f 90
set init_i 110
set language slovak

This example will adjust the preferred pitch and volume for a certain Czech voice; it will also set the default language to Slovak. See the TTSCP specification for more info.

4.7 Reloading Configuration Files

Under UNIX, Epos reinitializes itself upon a hangup signal (SIGHUP). Existing TTSCP connections are terminated and configuration is reloaded.

4.8 Soft Options

Most options (discussed until this point) have built-in meanings and semantics; for most uses this is sufficient and necessary. However, the user may also decide to define additional options to be provided by a language to its voices. This mechanism is called soft options; soft options are always voice options and are described at the language level (that is, the name, type and default value are supplied with the language, but individual voices may choose to specify a value for the option).

The soft options are described by the language option soft_options. It is an autoconcatenating list of colon separated descriptions of individual soft options; every item is of the format name[(type)][=default], where name is an arbitrary option name and type is b (meaning boolean; other possibilities might include s, n, c for strings, integers and characters, respectively, but these don't seem to be useful). The default is the value to be used if the option is left unspecified by a voice. It should be chosen as a backward compatible value for a new option if applicable.

The type and/or the default may be left unspecified. The default type is boolean, the default default is an empty string.

An example:

        soft_options   "colloquial=false"
        soft_options   "segment_listing_file(s)=traditnl.dph"

This example defines two options, a boolean colloquial and a string segment_listing_file.
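The item format name[(type)][=default] is regular enough to parse with a single regular expression. The following Python sketch is an illustration rather than the Epos parser; the exact set of characters allowed in a name is an assumption.

```python
import re

# name[(type)][=default]; the name pattern is an assumption.
SOFT_ITEM = re.compile(r"([A-Za-z_][A-Za-z0-9_]*)(?:\(([bsnc])\))?(?:=(.*))?")


def parse_soft_options(spec):
    """Return {name: (type, default)} for a colon separated soft_options value."""
    result = {}
    for item in spec.split(":"):
        if not item:
            continue
        m = SOFT_ITEM.fullmatch(item)
        if m is None:
            raise ValueError("malformed soft option: " + item)
        name, typ, default = m.groups()
        result[name] = (typ or "b", default or "")   # defaults: boolean, ""
    return result


# The two soft_options lines of the example autoconcatenate into one value:
print(parse_soft_options("colloquial=false:segment_listing_file(s)=traditnl.dph"))
# {'colloquial': ('b', 'false'), 'segment_listing_file': ('s', 'traditnl.dph')}
```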

The sets of soft options for individual languages are independent and never clash with each other. However, built-in option names may not be used as soft option names.

Please note the difference between rule file macros and soft options: rule file macros are useful for arbitrary string replacement and serve well for keeping every single idea in a single place; they are expanded at startup time. Soft options, on the other hand, are limited to conditioning rules, but can change value later just as any other option, without the need to recompile the rules. Indeed, multiple users may use the same rules with different values of the same soft option simultaneously.

4.9 Level of Description Dependent Options

Some options, especially pertaining to parsing the input and formatting the output, are set separately for every layer of the TSR, so that each of them is actually an array of options, indexed using the commercial at character (@) followed by a layer name. For example,

        perm@colon       ":,"

defines the permissible colon terminators. Since the TSR layer names are themselves defined by the unit_levels option, the availability of such options is dependent on the current value of some other option. For other options, such as default_scope, the value is a layer name, and is thus meaningful only after the layers are defined; all such options can only be set after the unit_levels option has been set correctly.
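Because the valid suffixes after "@" come from unit_levels, checking such an option name means consulting the current value of another option. A Python sketch of that check (the unit_levels value below is hypothetical, apart from segm and phone being the lowest levels and text the highest, as unit_levels requires):

```python
def split_level_option(name, unit_levels):
    """Split e.g. 'perm@colon' into ('perm', 'colon'), checking the level name."""
    base, sep, level = name.partition("@")
    if not sep:
        return name, None                     # an ordinary option
    if level not in unit_levels.split(":"):
        raise KeyError("unknown level of description: " + level)
    return base, level


# Hypothetical unit_levels value: segm and phone lowest, text highest.
levels = "segm:phone:syll:word:colon:sent:text"
print(split_level_option("perm@colon", levels))  # ('perm', 'colon')
```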

As level of description dependent options of the former type do not yet exist upon program startup, the access control in restr.ini only works for the whole arrays of options, not for individual options:

        perm    r

For these and other reasons, changing the unit_levels option is not recommended after any level of description dependent option of either type has been set.

4.10 Selected Individual Options

Most individual options will be described in this section. The rest are either straightforward or rarely useful. A complete list of options can be obtained through eposd -H or in src/options.lst.

The type and the semantic class of the argument are indicated for all individual options except for truth value (on/off) options. This is because these options, when found without an argument, are automatically interpreted as on.

Many options may not make any sense to you; indeed, some of them don't actually make sense to me. Such options are usually relics from now forgotten ad hoc configurations. I appreciate any suggestions on how to replace any old fashioned configuration mechanisms with more generic and/or simpler ones.

Overall Options

Some options control overall preferences, strategies and assumptions to be used by Epos. Most of them are global booleans. They are usually of a technical nature and the output produced by Epos should not change when they are changed, but they can be useful in some special configurations. Some of them were also added to resolve simple software engineering dilemmas.

The options in this subsection are not static unless specified otherwise.

--comma delimiter

This is the delimiter for Epos-generated lists, especially in TTSCP; an arbitrary string is allowed. Do not change.

--default_char character

The character to replace any unknown characters in the input text. See also the relax_input option.

--end_of_file character

The character to terminate the input text. This defaults to the escape character. The length of the input text is usually defined externally (by the apply command within a control TTSCP connection or by the end of an input), however, there are cases where this is not applicable or desirable for some reason. The character specified by this option terminates the input text, but not the input stream. It is also necessary to press Enter after the character. Applies only to the monolithic Epos.

--asyncing

Turn on to enable asynchronous close() processing. Usable only on unices; uses fork to delegate the synchronous close() to a child. This option can be useful for eliminating communication delays when closing a sound card file descriptor, but it can cause subsequent references to the same device to fail, because the child may not have released the device yet. The option has no meaning on non-UNIX systems, where close() is always synchronous.

--forking

Turn on to allow forking and fully detaching the Epos daemon. Usable only on unices. If off, some debugging information will be written to stdout in some configurations.

--init_time n

If set to a non-zero value under a UNIX, the parent process will wait for at most n seconds for the daemon process to start accepting connections. One second is likely to be more than enough except when the machine is severely overloaded. Note that most Epos initialization takes place before the fork, whereas this option is only used after the fork.

--markup_language ml

The parameter is either "ansi", "rtf" or "none". This parameter is only effective in fixed.ini or on the command line. Depending on its value, the ansi.ini or rtf.ini configuration file is appended to the fixed.ini file during parsing the configuration. These two files contain the complete output formatting information necessary for printing text in either the ANSI escape sequences (ISO 6429) or the Rich Text Format; they use colors to distinguish between symbols of different levels of description.

--pend_max n

The maximum number of subtasks waiting in an input queue for a single agent. If this limit is reached, the preceding agent stops processing further input until only pend_min subtasks are left in the queue. Setting this limit higher will consume additional memory, as more processing can happen in advance, but setting it too low may cause unnecessary delays.

--pend_min n

The minimum desired number of subtasks waiting in an input queue for a single agent. If the queue length drops to this limit and the preceding agent has enough input to process, it resumes operation. This limit should be set to roughly half the pend_max value.
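Together, pend_max and pend_min form a simple hysteresis: the producer stops at the upper mark and restarts only after the queue has drained to the lower mark, so it is not switched on and off for every single subtask. A Python sketch of that logic (a model of the described behaviour only; it ignores whether the preceding agent actually has input left):

```python
class AgentQueue:
    """Input queue of one agent with pend_min/pend_max hysteresis (a sketch)."""

    def __init__(self, pend_min, pend_max):
        self.pend_min, self.pend_max = pend_min, pend_max
        self.length = 0
        self.producer_running = True   # may the preceding agent keep producing?

    def push(self):
        self.length += 1
        if self.length >= self.pend_max:
            self.producer_running = False   # queue full: stall the producer

    def pop(self):
        self.length -= 1
        if not self.producer_running and self.length <= self.pend_min:
            self.producer_running = True    # drained enough: resume the producer


q = AgentQueue(pend_min=2, pend_max=4)
for _ in range(4):
    q.push()
print(q.producer_running)  # False
q.pop()
q.pop()
print(q.producer_running)  # True
```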

--memory_low

Turn on if you're very low on memory. This mode sacrifices speed for a little bit of saved memory. Basically, turning this on causes various dictionaries to be discarded whenever the rule which has used them has been applied, and reloaded the next time they are needed. Likewise, cached files are released upon the last unclaim. Otherwise these data structures are kept cached for the next use.

--paranoid

With this option on, Epos will tend to detect more errors in various kinds of input files than without it. It will rather reject suspicious or formally incorrect inputs than try to do something reasonable with them. Useful for debugging. This option is not static.

--pausing

With this option on, Epos will pause and wait for a keypress after every rule is applied. Of course, this is incompatible with the forking option and some other setups. Should be only used for debugging.

--ptr_trusted

One of the checks performed when the trusted option is disabled is checking whether some pointers are actually pointers, that is, very big numbers when cast to an integer. This can be useful on some machines, but it is completely non-portable. It may or may not work with your compiler; enable this option to skip these checks.

--relax_input

Turn on in real life situations. When off, Epos will quit parsing any text which contains an unknown character (not listed in one of the perm* or input_perm* options). This option replaces such characters with the value of the default_char option before they're classified. This option is not static.

--show_rule

Print each rule before it is applied. This is useful mostly for debugging situations (when a text is parsed in an unexpected way and the user is trying to find out which rule has escaped his attention). This option may not work with all setups.

--profile filename

Setting this option to a file name causes profiling information to be recorded to the file named. The file is created in the current directory of the server if a relative pathname is given. Each line of the profile log corresponds to one timeslice of an agent and contains three items: first, the time spent before running the agent (after the previous agent has finished); second, the agent type; third, the time spent by the agent. Both time intervals are given in microseconds and their accuracy depends on the gettimeofday system call. Using the profiler on a loaded machine is going to give almost meaningless results.

--handle_size n

The TTSCP handle length in characters. The handles are always generated randomly using a 64 character alphabet. Use small values for debugging the TTSCP implementation manually (and accept the risk of a handle-guessing attack); use higher values in a production environment.
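With a 64 character alphabet, every handle character carries 6 bits, so a handle of n characters is one of 2^(6n) possible values; this is what makes short handles guessable. The Python sketch below generates such a handle with the standard secrets module. The particular alphabet is an assumption, as the text does not say which 64 characters Epos uses.

```python
import secrets
import string

# Assumed alphabet: Epos uses some 64-character alphabet, not necessarily this one.
HANDLE_ALPHABET = string.ascii_letters + string.digits + "-_"


def new_handle(handle_size):
    """Return a random TTSCP-style handle of handle_size characters."""
    return "".join(secrets.choice(HANDLE_ALPHABET) for _ in range(handle_size))


# Each character carries 6 bits, so there are 64**n == 2**(6*n) possible handles.
print(len(HANDLE_ALPHABET), 64 ** 4 == 2 ** 24)  # 64 True
```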

--shriek_art number

The picture to be printed to stdshriek in case of an error. May or may not work. Small integers such as 0, 1 or 2 are possible.

--trusted

Code related to the class unit often calls its sanity method to detect any serious structure violation before it makes Epos crash mysteriously. In stable versions, however, these checks are unlikely to be necessary. Use this option to skip them.

--verbose

When the rules are dumped with the debug rule type and this option is set, all of them will be displayed. Otherwise only the current rule is displayed. That's all.

--localsound

Enables the use of the TTSCP #localsound output module.

--readfs

Enables the use of file system based TTSCP input modules. See the pseudo_root_dir option for more details. Note that this option cannot be turned on unless the underlying operating system has a fully functional implementation of the select call.

--writefs

Enables the use of file system based TTSCP output modules. See the pseudo_root_dir option for more details.

--unit_levels

Levels of description. Must be a colon separated list which includes segm and phone as the two lowest levels, and text as the highest level of description. There are reasons why this should not be a language dependent option; you can however define this to be the union of all levels of description needed by any language.

--default_scope

The default scope level of a rule -- one of the levels of description defined with the unit_levels option.

--default_target

The default target level of a rule -- one of the levels of description defined with the unit_levels option.

--languages list

Lists initially available languages. The parameter is a colon separated list of language names. Every language must have its associated .ini file; the name of the file is obtained by suffixing .ini to the language name, while the directory name matches the language name and is located under the directory as determined by the lang_base_dir option. The first language listed will become the default language. This option autoconcatenates.
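The mapping from the languages list to configuration file paths is mechanical. The following Python sketch illustrates it; the lang_base_dir value in the example is the default location mentioned in the configuration file section and may differ on a given installation.

```python
import posixpath


def language_config_paths(languages, lang_base_dir):
    """Map a colon separated languages value to <base>/<name>/<name>.ini paths."""
    names = [n for n in languages.split(":") if n]
    return [posixpath.join(lang_base_dir, n, n + ".ini") for n in names]


paths = language_config_paths("czech:slovak", "/usr/local/share/epos/lng")
print(paths[0])  # /usr/local/share/epos/lng/czech/czech.ini (the default language)
```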

--sampa_alts list

Lists the alternate encodings of SAMPA (i.e. non-SAMPA SAMPA-like notations used by people e.g. for languages where SAMPA is not or was not available at the moment it was needed). The parameter is a colon separated list of strings. Every alternate encoding has its associated sampa-alt-XXX.txt file where the XXX comes from this list. These encodings are loaded at Epos startup.

Types of Output

When Epos is compiled as a TTSCP server, the variability of data formats is controlled by TTSCP rather than by option settings. However, there are some options related to the output formats produced by the monolithic binary (executable). There are also some conventional informative outputs that can be produced by the monolithic and server binaries equally.

All options in this subsection are static.

--show_phones

Print the sequence of sounds generated from the text processed. Monolithic binary only.

--show_segments

Print the sequence of segments generated from the text processed. Monolithic binary only.

--show_raw_segs

When used in conjunction with the show_segments option, the segments are not only listed by name, but also accompanied by the actual numbers generated. Monolithic binary only.

--play_segments

Synthesize the waveform and say it through the sound card. Monolithic binary only.

--wave_header

When dumping the waveform into a file or a TTSCP data connection, put the RIFF WAVE file header at its beginning. Regardless of this value, the header is never added when writing the waveform to a sound output device (a file descriptor which understands the usual ioctls). TTSCP requires this option to be always on; consequently, the option is only really useful with the monolithic binary.
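For illustration only, the difference the header makes can be reproduced in Python with the standard wave module. This is a minimal sketch assuming 16-bit mono PCM; it is not how Epos writes its files, and the function name encode_pcm16 is hypothetical.

```python
import io
import struct
import wave


def encode_pcm16(samples: list[int], rate: int, wave_header: bool) -> bytes:
    """Encode mono 16-bit samples, optionally prefixed by a RIFF WAVE header."""
    # Raw little-endian 16-bit PCM, as it would be written to a sound device.
    pcm = struct.pack("<%dh" % len(samples), *samples)
    if not wave_header:
        return pcm
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(pcm)
    # For plain PCM, the wave module writes the canonical 44-byte header.
    return buf.getvalue()
```

For three samples, the headerless result is 6 bytes of raw PCM, while the result with the header starts with the RIFF and WAVE tags and carries a 44-byte header in front of the same data.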

--ulaw

Generated waveform uses mu law sample encoding instead of linear encoding.

--out_sampling_rate Hz

Voice dependent option. May be used to downsample the output by one half.

--autofilter

If this option is enabled, an appropriate low-pass filter is applied whenever downsampling. This is necessary to avoid aliasing (phantom sounds) in the output.
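The effect of autofilter can be sketched in Python under loud assumptions: the rate is halved by keeping every other sample, and the low-pass filter is a hypothetical 3-tap [1/4, 1/2, 1/4] kernel. That kernel is far gentler than a real anti-aliasing filter and is not the filter Epos uses.

```python
def downsample_half(samples: list[float], autofilter: bool = True) -> list[float]:
    """Halve the sampling rate by keeping every other sample."""
    if autofilter and samples:
        # Crude 3-tap low-pass [1/4, 1/2, 1/4] (an assumption, not Epos's filter).
        # It cancels the original Nyquist frequency but is far from a steep
        # anti-aliasing filter; the edges are padded by repeating end samples.
        padded = [samples[0], *samples, samples[-1]]
        samples = [
            (padded[i - 1] + 2 * padded[i] + padded[i + 1]) / 4
            for i in range(1, len(padded) - 1)
        ]
    return samples[::2]
```

Without filtering, a tone at the original Nyquist frequency such as [1, -1, 1, -1, ...] decimates into a constant signal, i.e. a phantom sound; with the filter, its interior samples are zeroed and only edge effects remain.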

--label_seg

Output segment (diphone) labels in the output waveform using the appropriate RIFF WAVE chunks.

--label_phones

Output phone labels in the output waveform. This is only possible if phone boundary information is made available for the inventory using the snl_file option. For other voices, this option has no effect.

--label_sseg

Very experimental. If used in conjunction with the label_phones option, the phone labels are assigned not to the phone level, but to the highest level whose boundary is detected at this place. If you do use this option, be aware that Epos will use an internal representation of the segment (diphone) stream which is (very slightly) incompatible with TTSCP 0, in order to preserve the necessary suprasegmental unit boundary information. Consequently, network voices provided by different TTSCP servers may fail, reporting unexpected segment numbers.

--immed_segments

Print the sequence of segments generated from the processed text immediately after the segments rule. This is especially useful in conjunction with the neuronet option, where the segment layer is created but later discarded by the rules. Monolithic binary only.

--neuronet

This is normally on. Turning this off skips the neuronet initialization and makes Epos shut down if the functionality is requested later. This may be useful with debugging tools like Electric Fence.

--help

Print a brief synopsis of short options upon startup. No data processing is performed.

--long_help

Print a list of long options upon startup. No data processing is performed.

--version

Print the current version number to stdshriek upon startup.

Text Output Formatting

You can tailor the conventions for printing out processed text quite a lot. Basically, we're printing out a text structure representation, so that we can see which level of description a character belongs to. Preserving this information in the output is often very desirable; it can either be done by inserting delimiters such as custom syllable breaks, or by coloring some levels of description.

This family of options can result in quite a complex configuration. That's why we provide at least two complete sets of settings, in ansi.ini and rtf.ini. You can use the markup_language option to switch between them in fixed.ini.

Some options control the colors used for output. For the time being, these options actually take the escape sequence needed to switch the color for the current format (e.g. ANSI escape sequences or RTF). In principle, other strings than escape sequences can be printed, but such configuration is discouraged.

Some options configure the appearance of the TSR to the user. The model we use assigns a few colors consistently to the individual levels of description and marks up the boundaries between units with parentheses, separators or both. The levels of description are defined at compile time. The segment and text levels may not be applicable for some of these options. This model of displaying the TSR is not used for transmitting the text over TTSCP.

All the options in this subsection are static.

--colored

If disabled, all color manipulating options will be ignored. Many configurations will enable this by default, because the escape sequences are rarely usable directly and never indirectly.

--normal_color color

String to switch to the neutral (default) color. Issued at the end of every colored piece of text.

--curr_rule_color color

String to switch to a bold color. The bold color will be used to highlight the current rule in the list of rules printed by the debug rule type.

--fatal_color color

String to switch to the color used for printing out fatal error messages.

--header filename

The value is the file name of a file in the directory specified by the ini_dir option, which is to be printed before any phonetic transcription.

--footer filename

The value is the file name of a file in the directory specified by the ini_dir option, which is to be printed after any phonetic transcription.

--begin* string

The asterisk stands for a @-separated symbolic name of a linguistic description level, such as phone, syll or word. The parameter is a string which will be printed before the first unit within this unit, for example before the word-initial syllable in the case of begin@word.

--close* string

The asterisk stands for a @-separated symbolic name of a linguistic description level, such as phone, syll or word. The parameter is a string which will be printed after the last unit within this unit, for example after the word-final syllable in the case of close@word.

--color* string

The asterisk stands for a @-separated symbolic name of a linguistic description level, such as phone, syll or word. The parameter is the string to switch the color for this level of description.

--separ* string

The asterisk stands for a @-separated symbolic name of a linguistic description level, such as phone, syll or word. The parameter is a string which will be printed between adjacent units of this level of description, for example between words in the case of separ@word.
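The interplay of the begin*, close* and separ* options can be modelled with a small recursive Python function. The sketch assumes the TSR is a nested list (a sentence is a list of words, a word a list of syllables, a syllable a list of phone characters), ignores colors, prefix and postfix, and uses hypothetical level names and markup strings.

```python
# Hypothetical level names, lowest first; in Epos they come from unit_levels.
LEVELS = ["phone", "syll", "word", "sent"]


def render(unit, level: int, begin: dict, close: dict, separ: dict) -> str:
    """Render a nested-list unit tree using begin@, close@ and separ@ strings."""
    if level == 0:
        return unit  # a phone is just its character
    child = level - 1
    parts = [render(u, child, begin, close, separ) for u in unit]
    name = LEVELS[level]
    # separ@<child level> goes between adjacent children; begin@ and close@
    # of this level go before the first child and after the last one.
    return begin.get(name, "") + separ.get(LEVELS[child], "").join(parts) + close.get(name, "")
```

With begin@word set to <, close@word to >, separ@syll to - and separ@word to a space, the sentence [[["a", "h"], ["o", "j"]], [["t", "y"]]] renders as <ah-oj> <ty>.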

--structured

Whether the verbose model of displaying the TSR as described above is used. When off, only the text itself is printed and simple spacing is used to delimit units which do not correspond to actual characters. This option is orthogonal to the colored option.

--postfix

If on, the upper level characters (such as punctuation) are printed after the lower level characters (such as letters or sounds) in the phonetic transcription.

--prefix

If on, the upper level characters (such as punctuation) are printed before the lower level characters (such as letters or sounds) in the phonetic transcription. Disabling both the prefix and postfix options effectively disables printing characters other than sounds. This option is orthogonal to the structured option.

--swallow_underbars

Epos uses the low line (_) character to represent suprasegmental units with no content at their level (e.g. syllables are often only implicitly terminated or even generated by the rules and have no associated symbol); this option, when enabled, suppresses them completely.

Limits

Most algorithms used in Epos are boundless, avoiding techniques like fixed size arrays or buffers. On the other hand, there are instances when this is inadequate, especially for speed or space considerations. In these cases Epos tries to use growable data structures, so that they perform well up to a certain size limit and then somewhat slower, but still correctly. We call such a limit a soft limit, as opposed to a hard limit which cannot be exceeded. Most limits in Epos are configurable and soft, but some hard limits have also been imposed. This subsection also covers some time vs. space trade-off configuration parameters, though these are not limits at all. Finally, some sanity check limits are imposed; these act as hard limits, but can be effectively disabled by setting them to absurdly high values, with no direct impact on efficiency in the typical case.

In fact, you can ignore this subsection completely, as the few hard limits tend to employ reasonably high values.

The options in this subsection are static unless specified otherwise.

--buffer_size bytes

Soft limit. The initial buffer size for a wave file. This value is not used if we already know that we will eventually write this waveform to a sound card device; in that case, ioctls are used to find out the size of its hardware buffer, to maximize the chance of smooth playback.

--ssif_buff_size bytes

Soft limit. The initial size of the SSIF buffer as SSIF is being extracted from the TSR.

--dev_text_len bytes

Sanity check limit. When reading from a device, this is the maximum amount of data which will be read for processing. This is not necessary when reading from a file, because the length of a file can be known in advance. This option is not static.

--hash_search n

Trade-off. Controls how many multipliers are tried out when constructing a perfect hash table, for each table size. The search begins at 1 and continues up to n. If the table still has collisions, hash table size is increased by one and the search restarts. This is iterated until a perfect hash table is found. As we only use perfect hash tables for representing constant sets and functions, they are only constructed during Epos startup. Setting this option to a small value (such as 17) speeds up Epos startup, while larger values can sometimes arrive at a smaller table, thus saving some memory.
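The search described above can be sketched in Python. The string hash (a polynomial over character codes, parameterized by the multiplier) and the max_size safety cap are assumptions made for illustration; Epos's actual hash function may differ.

```python
def poly_hash(key: str, multiplier: int, size: int) -> int:
    """Hypothetical string hash; Epos's actual hash function may differ."""
    value = 0
    for ch in key:
        value = value * multiplier + ord(ch)
    return value % size


def find_perfect_hash(keys: list[str], hash_search: int, max_size: int = 100_000) -> tuple[int, int]:
    """Return (table size, multiplier) giving a collision-free table."""
    keys = list(dict.fromkeys(keys))  # drop duplicates; they can never be separated
    size = max(1, len(keys))
    while size <= max_size:
        # Try multipliers 1..hash_search for this table size.
        for multiplier in range(1, hash_search + 1):
            slots = {poly_hash(k, multiplier, size) for k in keys}
            if len(slots) == len(keys):
                return size, multiplier
        size += 1  # still colliding: grow the table by one and restart
    raise ValueError("no perfect hash found below max_size")
```

With the keys a, b and c and hash_search set to 17, the first attempt already succeeds: a table of size 3 with multiplier 1.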

--hashes_full percentage

Trade-off. Controls how full a hash table should ideally be. The hash tables used in Epos are actually quite robust performance-wise, so that even values like 1000, that is, ten data items per hash table slot, result in near-optimal speed. Values somewhere below 100 are the best bet.

--max_errors count

Sanity check limit. If more than count errors are found in a rules file, Epos quits parsing the file.

--max_nest depth

Sanity check limit. If the include directives nest deeper than this value, Epos quits parsing the file on the assumption the inclusion is cyclic.
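A depth-limited include expansion can be sketched in Python as follows. The include directive syntax (a line starting with include) and the in-memory files dictionary are hypothetical stand-ins for Epos's actual rules file handling.

```python
def expand_includes(name: str, files: dict[str, list[str]], max_nest: int, depth: int = 0) -> list[str]:
    """Inline nested includes, giving up when nesting exceeds max_nest."""
    if depth > max_nest:
        raise ValueError(f"includes nested deeper than {max_nest}; cyclic inclusion assumed")
    lines = []
    for line in files[name]:
        if line.startswith("include "):  # hypothetical directive syntax
            target = line.split(maxsplit=1)[1]
            lines.extend(expand_includes(target, files, max_nest, depth + 1))
        else:
            lines.append(line)
    return lines
```

Two files that include each other exceed any finite max_nest and are reported as cyclic, instead of recursing forever.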

--max_line_len characters

Hard limit. Maximum line length in a text-oriented input file. Longer lines are truncated.

--max_net_cmd characters

Hard limit. Maximum TTSCP command length. TTSCP lines longer than this will be truncated. The protocol requires this value to be at least 80, but a few kilobytes is recommended.

--max_rule_weight weight

Sanity check limit. Maximum rule weight in a choice, as well as the maximum rule repeat count. Using very large weights can result in memory exhaustion. Values on the order of 10000 are still perfectly safe.

--max_text_size bytes

Sanity check limit. Maximum amount of space allowed for growable processing buffers, or for the input text (checked just before parsing). This option is generally used to avoid memory exhaustion.

--max_utterance bytes

A trigger of a hard limit. If utterance chunking is employed, Epos tries quite hard to shrink every utterance below this limit based on a fixed language independent heuristic.

--split_utterance bytes

Hard limit. If utterance chunking is employed and Epos completely fails to break an utterance below the max_utterance value, it will simply split the string after split_utterance bytes.
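The interaction of max_utterance and split_utterance can be sketched in Python. The break heuristic shown here (the last space, comma, semicolon or period within the limit) is a hypothetical stand-in for Epos's fixed heuristic, which is not described here.

```python
def chunk_utterance(text: bytes, max_utterance: int, split_utterance: int) -> list[bytes]:
    """Split text into chunks, preferring delimiter breaks over hard splits."""
    if split_utterance <= 0:
        raise ValueError("split_utterance must be positive")
    chunks = []
    while len(text) > max_utterance:
        # Hypothetical heuristic: break after the last space or punctuation
        # mark that keeps the chunk within max_utterance bytes.
        cut = max(text.rfind(sep, 0, max_utterance) for sep in (b" ", b",", b";", b"."))
        if cut >= 0:
            cut += 1  # keep the delimiter with the preceding chunk
        else:
            cut = split_utterance  # no usable break: split blindly
        chunks.append(text[:cut])
        text = text[cut:]
    if text:
        chunks.append(text)
    return chunks
```

For example, b"abcdefghij" with max_utterance 4 and split_utterance 3 contains no delimiter and is therefore cut blindly into b"abc", b"def" and b"ghij".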

--multi_subst count

Sanity check limit. How many substitutions will be applied to a unit during processing of a subst rule. The rule is applied until the unit settles down or until this limit is reached. In the latter case, the substitution is considered impossible (infinite).
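The settle-or-give-up behaviour can be sketched in Python. This is a simplified model in which a subst rule is a plain table of substring replacements applied once per pass; real Epos subst rules and their matching semantics are richer.

```python
def apply_subst(unit: str, table: dict[str, str], multi_subst: int) -> str:
    """Apply a substitution table repeatedly until the unit stops changing."""
    for _ in range(multi_subst):
        new = unit
        for old, replacement in table.items():
            new = new.replace(old, replacement)
        if new == unit:
            return unit  # the unit has settled down
        unit = new
    # Limit reached without settling: treat the substitution as infinite.
    raise ValueError("substitution did not settle within multi_subst passes")
```

With the table {"aa": "a"}, the unit aaab settles down to ab, the third pass confirming that nothing changes any more, whereas {"a": "aa"} never settles and is reported as impossible.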

--rules_in_block count

Soft limit. Number of rules in a block of rules.

--scratch_size bytes

Hard limit. Epos uses a temporary internal buffer in a few places. Usually only very few bytes of the buffer are needed, but overflowing it is fatal. It is recommended to provide at least a few hundred bytes, preferably kilobytes, for this buffer.

--seg_buff_size segments

Soft limit. Maximum number of segments generated by the diphoniser and synthesized at once. If more segments have to be generated, they are synthesized in chunks of seg_buff_size. If this option is set to zero, a growable buffer is used instead, and there is no limit on memory consumed. This option has no effect in the monolithic Epos.

--variables count

Soft limit. Number of variables used in a set of rules.

Language Dependent Configuration

This subsection lists some options defined for each language. Additional language dependent options are certain directory and file names and possibly others. As every voice is associated with a single language (two voices may share a multilingual segment inventory if desired), every voice dependent option is also language dependent.

--name languagename

This option assigns a name to a newly created language. If there is no name specified, this option defaults to the configuration file name (from the last slash to the nearest dot) the configuration has been loaded from. The name is then used to refer to the language in TTSCP. The language name must begin with an alphabetical character and consist of alphanumerical characters (dashes and underscores are also allowed).

--voices list

Lists initially available voices. The parameter is a colon-separated list of voice names. Every voice must have its associated .ini file, named by appending .ini to the voice name and located in a directory of the same name under the directory determined by the per-language (default) inv_dir option. The first voice listed becomes the default voice for its language until switched. This option autoconcatenates.

--soft_options list

Lists available soft options as described in subsection soft options. This option autoconcatenates.

--fallback_mode mode

Not used under normal circumstances. Epos initializes its synthesis type dependent structures the first time it uses a voice. Should such an initialization fail for the reason specified by this option, the current voice will be switched to the voice specified by the fallback_voice option and the initialization will be retried. The mode can either be a TTSCP error code of the 4xx class, or a template mode identifier. In the former case, the fallback occurs only if the initialization fails with the specified TTSCP error code. In the latter case, the following modes have been defined:

0 fallbacks are disabled
1 fallbacks occur on all 4xx class errors
4 fallbacks occur with uninstalled voices (445) and network errors (47x)
7 fallbacks occur with network errors (47x)
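The mode table above translates into a small decision function. This Python sketch is illustrative; the function name is hypothetical, and it only encodes the rules listed for this option.

```python
def should_fall_back(mode: int, error: int) -> bool:
    """Decide whether a failed voice initialization triggers a fallback."""
    is_4xx = 400 <= error <= 499
    is_network = 470 <= error <= 479
    if mode == 0:
        return False  # fallbacks are disabled
    if mode == 1:
        return is_4xx  # any 4xx class error
    if mode == 4:
        return error == 445 or is_network  # uninstalled voice or network error
    if mode == 7:
        return is_network  # network errors only
    if 400 <= mode <= 499:
        return error == mode  # one specific TTSCP error code
    raise ValueError(f"unknown fallback_mode {mode}")
```

For instance, under mode 4 an uninstalled voice (445) triggers a fallback, while under mode 7 it does not.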

--fallback_voice voicename

Voice to switch to if another voice fails to initialize.

--rules_file filename

The parameter is a filename. The file contains the transformational rules to be applied for this language.

--perm* set

The asterisk stands for a @-separated symbolic name of a linguistic description level, such as phone, syll or word. The parameter is a simple sequence of all characters belonging to this level of description. For example, perm@phones will list letters, numbers and other segmental symbols. Punctuation will be assigned to the higher level sets. The sets should be disjoint and only the characters actually processed by the rules should be listed here. The language independent built-in parser tries to resolve the most common ambiguities of Latin-based writing systems, like periods.

--perm_input* set

Additional characters to be permitted, beyond the respective lists specified by the perm* options, within the input text in the initial parse. These characters are, however, not permitted in later re-parses.

--perm_working* set

Additional characters to be permitted, beyond the respective lists specified by the perm* options, during internal re-parses. These characters are, however, not permitted in the initial parse of the input text.

--downgradables set

Additional permissible characters at the phone level. Whereas all of the perm* options must specify disjoint sets of characters for each language, this option typically consists of characters which are also listed as permissible for levels higher than the phone level. The characters are parsed at the higher level if possible. If that would, however, constitute an empty suprasegmental unit (i.e. there are no preceding phones since the beginning of the text or since the last unit of the same or even higher level), the character is parsed at the phone level at this particular occurrence. This affects both the initial parse and later re-parses.
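The downgrading decision can be sketched in Python. The level names are hypothetical, the perm* sets are simplified to a mapping of higher-level characters (anything unlisted is treated as a phone), and only the empty-unit test described above is modelled.

```python
# Hypothetical levels of description, lowest first (see unit_levels).
LEVELS = ["phone", "syll", "word", "sent"]


def classify(text: str, level_of: dict[str, str], downgradables: set[str]) -> list[str]:
    """Return the level at which each character of text is parsed."""
    rank = {name: i for i, name in enumerate(LEVELS)}
    # seen[i]: a phone occurred since the last boundary at level i or higher.
    seen = [False] * len(LEVELS)
    result = []
    for ch in text:
        level = level_of.get(ch, "phone")
        if level != "phone" and ch in downgradables and not seen[rank[level]]:
            level = "phone"  # a boundary here would close an empty unit
        if level == "phone":
            seen = [True] * len(LEVELS)
        else:
            # A boundary at this level closes units at this and all lower levels.
            for i in range(rank[level] + 1):
                seen[i] = False
        result.append(level)
    return result
```

With - permitted at the word level and listed in downgradables, the input -ab-c. parses the leading - as a phone (it would otherwise close an empty word) but the second - as a word boundary.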

--charset charset

The character set to be used with the current language (especially in rules, voice configuration; also in text input and output through TTSCP). If an 8-bit encoding is not already known to Epos, it's loaded from a corresponding mapping file within the directory specified by the unimap_dir option. Note that this option doesn't affect the encoding of the configuration file in which it has been set itself; see also the Character Encoding subsection for more details.

Voice Dependent Configuration

--name voicename

This option assigns a name to a newly created voice. If there is no name specified, this option defaults to the configuration file name (from the last slash to the nearest dot) the configuration has been loaded from. The name is then used to refer to the voice in TTSCP. The voice name must begin with an alphabetical character and consist of alphanumerical characters (dashes and underscores are also allowed).

--type synthtype

The parameter, the speech synthesis type, is one of the following:

none voice is mute
internet voice uses a remote speech synthesizer using TCP/IP
lpc-int voice uses an LPC synthesizer (integer based)
lpc-float voice uses an LPC synthesizer (floating point based)
lpc-vq voice uses an LPC synthesizer (vector quantized)
tdp voice uses a time domain synthesizer
mbrola voice uses an external MBROLA synthesizer

This option may influence other voice dependent options quite significantly, as some of them are speech synthesis type dependent.

--location [[voice][.language]@]hostname[:port]

If this speech synthesis is of the internet type, this option can be used to set the hostname of the remote server. If the remote server is listening on a non-standard port number (currently the standard port is considered to be 8778), the host name may be followed by a colon and the port number requested. The desired remote voice and language may be optionally specified before the host name, separated with a @ character from the host name and with a dot from each other. If a language name is specified, while a voice name is not, the language name should be preceded with a dot. The defaults for voice, language, and port number are remote default voice, local current language, and 8778, respectively.

For other synthesis types, this is a directory name which holds inventory related files (in the "inv" subtree), and is subject to normal file naming conventions, as described in file naming.
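For internet voices, the location syntax can be parsed as in the following Python sketch. The Location type and the parse_location function are hypothetical, as are the sample voice and host names; None stands for the documented defaults (remote default voice, local current language), and IPv6 literal addresses are not handled.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_TTSCP_PORT = 8778  # the standard port mentioned above


@dataclass
class Location:
    voice: Optional[str]     # None: the remote server's default voice
    language: Optional[str]  # None: the local current language
    host: str
    port: int


def parse_location(value: str) -> Location:
    """Parse [[voice][.language]@]hostname[:port]."""
    voice = language = None
    if "@" in value:
        who, value = value.split("@", 1)
        voice_part, _, language_part = who.partition(".")
        voice = voice_part or None
        language = language_part or None
    host, colon, port = value.partition(":")
    return Location(voice, language, host, int(port) if colon else DEFAULT_TTSCP_PORT)
```

For example, kadlec.czech@example.org:9000 yields voice kadlec, language czech, host example.org and port 9000, while .czech@localhost leaves the voice at the remote default and uses port 8778.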

--deadlock_timeout n

The value is a time interval in seconds. This option is meaningless for voices of types other than internet. For remote voices, if the remote server is successfully connected to, but doesn't send any TTSCP session header (nor anything else) to the local server acting as a TTSCP client, it is either severely misconfigured, overloaded, deadlocked (e.g. it tries to use itself as a remote server for its current voice), or communicating over a congested, unreliable or slow network connection; this option limits how long the local server waits for the header in such a case. A value of 0 is treated as a very small positive value, and negative values are not accepted.
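The zero-means-tiny rule matters because a zero timeout often means something else entirely; in Python, for instance, socket.settimeout(0) switches a socket to non-blocking mode. The sketch below illustrates the waiting logic with the standard socket module; the function name and the 1 ms stand-in for the very small positive value are assumptions.

```python
import socket
from typing import Optional

# Stand-in for "a very small positive value"; the real value is an assumption.
TINY_TIMEOUT = 0.001


def wait_for_session_header(sock: socket.socket, deadlock_timeout: float) -> Optional[bytes]:
    """Wait for the first bytes from a remote TTSCP server, or give up."""
    if deadlock_timeout < 0:
        raise ValueError("negative deadlock_timeout is not accepted")
    # settimeout(0) would make the socket non-blocking, so 0 becomes tiny instead.
    sock.settimeout(deadlock_timeout or TINY_TIMEOUT)
    try:
        return sock.recv(4096)
    except socket.timeout:
        return None  # nothing arrived in time: the remote server looks stuck
```

A real client would keep reading until the complete session header has arrived; the sketch only shows where the timeout applies.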

--n_segs n

The total number of segments within the segment inventory.

--models filename

The value is a file name. The file contains the segment inventory proper. Its format is speech synthesis type dependent.

--counts filename

The value is a file name. The file contains the lengths of individual segments in this segment inventory.

--dpt_file filename

The value is a file name. The file contains the symbolic segment names for user output; each consists of exactly three characters on a line (indented with spaces from the left if necessary). No blank lines nor comments are allowed. This file usually comes with a diphone inventory.

--codebook filename

The value is a file name. The file contains the code book for the vector quantized LPC speech synthesis (lpc-vq type only).

--snl_file filename

The value is a file name. The file contains phone boundary information for individual segments in this segment inventory. Each line of the file contains three space-separated items: segment number, relative position within the segment (valued from 1 to 1024, e.g. 512 is the middle of the segment) and the character (phone representation) which is to be associated with the position. Lines not conforming to this specification are ignored. Currently, at most one label may be indicated for one segment, but it would be easy to get rid of this limit (at a cost of a few extra processor instructions). See also label_phones option.
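Parsing the snl_file format described above can be sketched in Python. The function names are hypothetical; keeping the first label when a segment is listed twice, and converting positions to sample offsets by scaling with the segment length, are assumptions.

```python
def parse_snl(text: str) -> dict[int, tuple[int, str]]:
    """Map segment number -> (position 1..1024, phone character)."""
    labels: dict[int, tuple[int, str]] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip lines that are not three items with a single-character label.
        if len(parts) != 3 or len(parts[2]) != 1:
            continue
        try:
            segment, position = int(parts[0]), int(parts[1])
        except ValueError:
            continue
        if 1 <= position <= 1024:
            # At most one label per segment; keeping the first is an assumption.
            labels.setdefault(segment, (position, parts[2]))
    return labels


def label_offset(position: int, segment_length: int) -> int:
    """Convert a 1..1024 relative position into a sample offset."""
    return position * segment_length // 1024
```

A position of 512 in an 800-sample segment maps to offset 400, the middle of the segment.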

--init_f percentage

The auditory neutral integer value for the fundamental frequency. The typical value is 100.

--init_i percentage

The auditory neutral integer value for the volume. The typical value is 100.

--init_t percentage

The auditory neutral integer value for the prosodic duration of segments. The typical value is 100. Of course, it is segment length relative (some segments are longer than others), just as the init_f and init_i options are.

--channel channeltype

The parameter, the output channel type, is one of the following:

mono mono output signal
left stereo output signal, right channel is mute
right stereo output signal, left channel is mute
both stereo output signal, two identical channels

This option may be used to simulate a dialogue by assigning different output channels to different speakers.
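How the channel types lay out samples can be sketched in Python. The sketch assumes interleaved left/right frames, as in PCM WAVE data, and signed samples for which 0 is silence; for unsigned 8-bit samples, silence would be 128 instead. The function name is hypothetical.

```python
def apply_channel(samples: list[int], channel: str) -> list[int]:
    """Lay out mono samples according to the channel option."""
    if channel == "mono":
        return list(samples)
    if channel not in ("left", "right", "both"):
        raise ValueError(f"unknown channel type {channel!r}")
    out = []
    for s in samples:
        # One left and one right sample per frame; 0 is assumed to be silence.
        out.append(s if channel in ("left", "both") else 0)
        out.append(s if channel in ("right", "both") else 0)
    return out
```

Assigning left to one voice and right to another, and mixing the two outputs, places the two speakers on opposite sides of a stereo image.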

--inv_sampling_rate Hz

The sampling rate this segment inventory has been recorded at. The algorithms we use also imply that we use the same frequency for the synthesis.

--sample_size bits

Number of bits per sample. If some kind of a stereo output is turned on, this option sets the number of bits per channel. Again, this is related to the quality of recording of the segment inventory. We currently only support 8 and 16 bits.

--sampa_alternate name

For MBROLA voices, you can indicate non-standard SAMPA variants here; the value of SAMPA means the standard SAMPA; for voices where there is no SAMPA notation fixed yet, a different name should be used. The alternative SAMPA mapping to Unicode will be loaded from file sampa-alt-name.txt where name is specified by this parameter, or sampa-std.txt if it is specified as SAMPA.

Prosody Generation

The options in this subsection control how the resulting per segment prosodic information is assembled from the prosodic adjustments done to the structural units by the rules. It actually controls the interpretation of those adjustments themselves.

--pros_weight* weight

The asterisk stands for a @-separated symbolic name of a linguistic description level, such as phone, syll or word. The parameter is an integer value primarily used for enabling (1) or disabling (0) certain levels of description when the total quantities for a segment are computed. It must however be understood that when some rules like smooth are applied, the prosodic values are distributed down to the target level of such a rule and cannot be distinguished anymore. If higher values than 1 are set, that will multiply the prosodic effect assigned to a level correspondingly.

--pros_eff_multiply_*

The asterisk stands for a single letter, f, i or t, that is, a prosodic quantity symbol. This option controls how the prosodic values for individual levels of description are combined. If it is off, they are summed up (taking the corresponding pros_neutral_* as the baseline); if it is on, they are multiplied with each other (again taking pros_neutral_* as the neutral value). See prosody modelling for more information and examples.

--pros_neutral_*

The asterisk stands for a single letter, f, i or t, that is, a prosodic quantity symbol. This option controls which prosodic value is considered neutral in Epos. The current configuration files use 100, so prosodic adjustments are essentially percentages, but a higher value could be used for finer-grained prosody control (provided the synthesis algorithms can take advantage of it).
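One plausible reading of pros_weight*, pros_eff_multiply_* and pros_neutral_* together is sketched in Python below. The assumptions are loud: level names are hypothetical, an unspecified weight defaults to 1, and a weight multiplies the deviation from neutral in additive mode and acts as an exponent of the ratio to neutral in multiplicative mode.

```python
def combine_prosody(values: dict[str, float], weights: dict[str, int], neutral: float, multiply: bool) -> float:
    """Combine per-level prosodic values for one segment (a hypothetical model)."""
    if multiply:
        result = neutral
        for level, value in values.items():
            # Each level scales the result by its ratio to neutral,
            # raised to the level's weight (weight 0 disables the level).
            result *= (value / neutral) ** weights.get(level, 1)
        return result
    # Additive mode: sum the weighted deviations from the neutral value.
    return neutral + sum(weights.get(level, 1) * (value - neutral) for level, value in values.items())
```

With a neutral value of 100 and adjustments of 110 (syllable) and 120 (word), the additive mode yields 130 and the multiplicative mode approximately 132; a word weight of 0 leaves only the syllable's contribution of 110 in either mode.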

File Naming

In Epos, most of the files ever opened are located in a single directory tree. This tree usually starts at /usr/local/share/epos, but it can be changed at configure time. For example, after issuing

cd src
./configure --prefix=/usr/local

and after recompiling and reinstalling Epos, the files will be searched under /usr/local/lib/epos. See configure --help for more details on configuring Epos. It is also possible to use the command line option base_dir at Epos startup to change the tree base without recompilation. Files of the same type -- and related to the same language or voice, if applicable -- are located in the same subdirectory by default. Thus, the path name actually used by Epos consists of the base directory path, the subdirectory (or directory for short) and a relative file name. This makes it possible to move either the whole configuration structure, or a specific part of it, or a single file to another place.

The relative file name may contain slashes (directory name separators). If they only occur in the middle of the name, the file name is still relative to the directory it would normally be located in. However, if the file name begins with a slash or with ./, the file is treated as absolute or relative to the current working directory of the Epos process, respectively. The second case is thus slightly unreliable, but the first one allows placing any file in an arbitrary directory. Likewise, if the directory name begins with a slash, it is not considered to be relative to the base directory.
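The lookup rules above can be condensed into a small Python function. It assumes POSIX paths, uses a hypothetical function name, and leaves out the language and voice dependent subdirectories.

```python
import posixpath


def resolve_epos_path(base_dir: str, directory: str, filename: str) -> str:
    """Combine base directory, subdirectory and relative file name."""
    if filename.startswith("/"):
        return filename  # absolute file name: used as is
    if filename.startswith("./"):
        return filename  # relative to the current working directory
    if not directory.startswith("/"):
        directory = posixpath.join(base_dir, directory)  # relative directory
    # Slashes in the middle keep the name relative to its directory.
    return posixpath.join(directory, filename)
```

Moving one subdirectory elsewhere thus only requires an absolute directory name, while everything else still resolves under the base directory.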

Changing these options at run time does not immediately cause the renamed files to be re-read. All of the options in this section are static unless stated otherwise.

--base_dir dirname

Only changeable on the command line. The value is an absolute directory name. This option can be used to change the location of the whole configuration structure, which can also be used for trying out Epos before installing it:

cd src
./eposd --base_dir ../cfg

--pseudo_root_dir dirname

The value is a directory name. Sets the path prepended to any file name referenced in the TTSCP stream command. This subtree cannot be escaped with cute parent-of-root paths, but you can use symlinks to arbitrary accessible parts of the kernel name space, again without giving access to the rest of the file system. Write access to this subtree for any user other than Epos effectively gives that user the privilege to use the Epos file access rights anywhere in the system, by creating a symlink to the absolute root directory. This option is not static.
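The property that parent-of-root paths cannot escape the pseudo root can be illustrated in Python by normalizing the name as if it were rooted at /. This is a hypothetical sketch, not Epos's code; like the option itself, it deliberately does not resolve symlinks.

```python
import posixpath


def map_into_pseudo_root(pseudo_root_dir: str, ttscp_name: str) -> str:
    """Map a TTSCP file name into the pseudo root without allowing escapes."""
    # Normalize as if rooted at "/": leading ".." components cannot climb above it.
    confined = posixpath.normpath("/" + ttscp_name)
    # Strip the leading slash(es) so the result stays inside the pseudo root.
    return posixpath.join(pseudo_root_dir, confined.lstrip("/"))
```

A request for ../../etc/passwd is thus mapped to etc/passwd inside the pseudo root.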

--ini_dir dirname

The value is a directory name. Sets the path to language independent configuration files. This option is only changeable on the command line.

--fixed_ini_file filename

The value is a file name. This option can be used to change the file name of the fixed.ini file, which usually contains operating system independent, relatively fixed default global configuration. This option is only changeable on the command line.

--cfg_file filename

The value is a file name. This option can be used to change the file name of the epos.ini file.

--local_sound_device filename

The value is a file name. This option can be used to change the file name of the local sound card device. In many unices, setting this to /dev/dsp is the recommended way to actually hear Epos speak. Other unices don't have /dev/dsp, however. If the sound card has no file name at all, set this to the null device file name (that may be handled specially by the respective port of Epos).

--mbrola_binary filename

The file name of the mbrola executable file, either absolute or relative to the location of the voice configuration (which allows using different binaries with different voices, although this option is not voice dependent). Because of a limitation of the interface the binary is spawned by Epos every time it is needed and so this option is not static.

--input_file filename

The value is a file name. This option can be used to change the file name of the implicit input text used by the monolithic version of Epos. The value is language dependent.

--stddbg_file filename

The value is a file name. This option can be used to change the file name to which various debugging output should be written. If not set at all, stdout will be used.

--stdshriek_file filename

The value is a file name. This option can be used to change the file name to which output unrelated to the usual output should be printed. This includes especially error messages.

--rules_dir dirname

The language dependent value is a directory name. The directory shall contain the rules file.

--hash_dir dirname

The language dependent value is a directory name. The directory shall contain any dictionaries used by the rules.

--input_dir dirname

The language dependent value is a directory name. The directory shall contain the implicit input text file for the monolithic version of Epos.

--lang_base_dir dirname

The value is a global directory name, not a language dependent one. It serves as the base directory for looking up the newly constructed languages.

--voice_base_dir dirname

The value is a global directory name, not a language dependent one. It serves as the base directory for looking up the newly constructed voices. It is however only used for configuration files, not for inventories. A language dependent subdirectory name is appended to it.

--inv_base_dir dirname

The value is a global directory name, not a language dependent one. It serves as the base directory for looking up inventories and related data. A voice dependent subdirectory name (the location option) is appended to it.

--unimap_dir dirname

The value is a global directory name. It serves as the base directory for looking up mappings between individual character sets and Unicode, and also between SAMPA notation and Unicode.

--ttscp_help_dir dirname

The value is a directory name. In this directory, TTSCP help files for individual TTSCP commands and other help topics are located. The contents of these files are sent to the TTSCP control connection in reply to a corresponding help command. This option is not static.

--wav_dir dirname

The value is a global directory name. Any waveform files created by Epos without explicit directory specification will be created in this directory. Applies only to the monolithic Epos.

Daemon Startup Options

All options in this subsection are static. They also usually have no effect if changed during run time; change them in the configuration files instead and request Epos reinitialization.

--preload_voices

When set, Epos tries to initialize the synthesizer configurations for all voices during the startup. This will cause unreachable remote voices and local voices without a speech inventory installed to disappear from the configuration. This option will cause a considerable increase in memory consumption or startup time in some cases.

--prefer_portaudio

This option has no effect unless Epos has been compiled with the --enable-portaudio=yes option (a configure option, not an Epos option). Epos normally supports an unlimited number of OSS sound cards, one of which represents the #localsound TTSCP output module. With this option, Epos uses the PortAudio library for #localsound output instead. Using this option has an adverse impact on some functionality, such as the intr TTSCP command.

--daemon_log filename

The value is a file name. This option sets the file where various information about the Epos process is recorded. At the moment, this is of little practical use except for debugging.

--syslog

Log all TTSCP error messages with syslogd, if the syslog facility is available. Due to the internal design of Epos, some of these messages are never actually sent over TTSCP to anyone (for example, one reporting a fatal misconfiguration detected before the first client connects), but they are logged anyway.

--full_syslog

Log all TTSCP completion messages with syslogd, if the syslog facility is available, including 1xx and 2xx class messages.

--authpriv

Log all security relevant TTSCP completion messages with the facility authpriv instead of daemon. This includes messages concerning denial of access or incorrectly specified resource or password. In that case, the err message level is used instead of warn. Notice that network errors are not affected by this option.

--log_codes

When set, all TTSCP messages logged through syslogd are preceded by their numeric TTSCP codes.
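The interplay of the syslog, full_syslog, authpriv and log_codes switches can be summarized in a small C++ sketch built on the POSIX syslog() call. The LogOptions structure and all helper functions are hypothetical. Several assumptions are made here and are not confirmed by this document: every message outside the 1xx and 2xx classes is treated as an error message, messages not redirected by authpriv go to the daemon facility at warning level, and the prefix format produced by log_codes is a guess.

```cpp
#include <cassert>
#include <string>
#include <syslog.h>

// Hypothetical switches mirroring the syslog, full_syslog, authpriv and log_codes options.
struct LogOptions {
    bool use_syslog = false;
    bool full_syslog = false;
    bool authpriv = false;
    bool log_codes = false;
};

// 1xx and 2xx messages reach syslog only with full_syslog; all others
// (assumed to be the error messages) are logged with either option.
bool should_log(const LogOptions& o, int code) {
    bool routine = code >= 100 && code < 300;
    return o.full_syslog || (o.use_syslog && !routine);
}

// With authpriv, security relevant messages move from daemon/warning to authpriv/err.
int priority_for(const LogOptions& o, bool security_relevant) {
    if (o.authpriv && security_relevant) return LOG_AUTHPRIV | LOG_ERR;
    return LOG_DAEMON | LOG_WARNING;
}

// log_codes prepends the numeric code (the "code text" layout is an assumption).
std::string format_message(const LogOptions& o, int code, const std::string& text) {
    return o.log_codes ? std::to_string(code) + " " + text : text;
}

void log_ttscp(const LogOptions& o, int code, const std::string& text,
               bool security_relevant) {
    if (!should_log(o, code)) return;
    syslog(priority_for(o, security_relevant), "%s",
           format_message(o, code, text).c_str());
}
```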

--server_pwd_file filename

The value is a file name. This option can be used to force the server to store its internal password in a file. This password can then be used for TTSCP authentication in order to issue restricted commands such as down. If the file cannot be created, no error is reported.

--debug_password password

The value is a string. The string may be used instead of the server password. Use of this option usually makes TTSCP insecure and is discouraged.

--restr_file filename

The file named by this parameter provides access control in TTSCP to individual options. Its syntax is described above. This option can only be changed from the command line.

--listen_port port

The TCP port number on which the daemon listens for incoming TTSCP connections. The daemon checks whether another service is already running on that port and refuses to run if the port is occupied.

--local_only

When this option is set, the daemon accepts no new connections on network interfaces except the localhost one. This way, only clients running on the same machine can connect to the server. If this option is not set, the server accepts new connections on all available interfaces.
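One plausible way to implement both the port check and local_only with POSIX sockets is to let bind() fail with EADDRINUSE on an occupied port and to bind to the loopback address instead of all interfaces. The C++ sketch below covers IPv4 only; open_listener is a hypothetical name, and Epos may perform its check differently.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Open a listening TCP socket. With local_only, bind to 127.0.0.1 so that
// only local clients can connect; otherwise bind to all interfaces.
// Returns -1 on failure; errno is EADDRINUSE if the port is occupied.
int open_listener(unsigned short port, bool local_only) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(local_only ? INADDR_LOOPBACK : INADDR_ANY);

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof addr) < 0 ||
        listen(fd, 16) < 0) {
        int saved = errno;  // close() may overwrite errno
        close(fd);
        errno = saved;      // lets the caller distinguish EADDRINUSE
        return -1;
    }
    return fd;
}
```

A daemon following the described behavior would simply exit when open_listener returns -1 with errno set to EADDRINUSE.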

Debugging Options

Various kinds of debugging information can be printed by Epos to standard output. The amount of it is configurable. Most debugging information is printed throughout the code using the D_PRINT macro; other sources of debugging information are not discussed in this subsection. The D_PRINT macro takes three parameters: the severity level, the format string and any additional parameters implied by the format string. Its semantics are analogous to those of the printf family of standard library functions, except that only information whose severity level is sufficiently high under the current settings is printed. A variant of the macro, DO_PRINT, prints its message unconditionally. This is useful for temporarily promoting individual debugging messages without losing their standard severity levels. One more variant, DBG, can be used for debugging printouts that a printf-style function does not handle well, such as dumps of arrays.

Note also that #define DEBUGGING must be enabled in interf.h, else the debugging macros are ignored altogether.
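A minimal sketch of how such severity-gated macros can be built in C++, assuming two hypothetical globals (debug_enabled and debug_level) that stand in for the debug and debug_level options described below. This is not the definition from the Epos sources; in particular, whether DO_PRINT also ignores the debug option is an assumption, and DBG is omitted.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-ins for the debug and debug_level options.
bool debug_enabled = false;
int debug_level = 3;

// In Epos this switch is set in interf.h; without it the macros expand to nothing.
#define DEBUGGING

// True if a message of the given severity passes the current settings.
inline bool should_print(int level) {
    return debug_enabled && level >= debug_level;
}

#ifdef DEBUGGING
// Printed only when debugging is on and the severity reaches the threshold.
#define D_PRINT(level, ...) \
    do { if (should_print(level)) std::printf(__VA_ARGS__); } while (0)
// Printed unconditionally (assumed to bypass the debug option too); the level
// argument stays in the source so the message can be demoted again later.
#define DO_PRINT(level, ...) \
    do { (void)(level); std::printf(__VA_ARGS__); } while (0)
#else
#define D_PRINT(level, ...) do { } while (0)
#define DO_PRINT(level, ...) do { } while (0)
#endif
```

With debug_enabled set and debug_level at 2, a level 3 message passes the check while a level 1 message is suppressed.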

The severity level may have one of these values:

level   debugging messages   easy invocation   typical examples
3       rare                 -D                major events, warnings
2       normal               -DD               informative messages
1       verbose              -DDD              detailed debugging printouts
0       detailed             -DDDD             miscellaneous chaos

Severity levels
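The easy invocation column follows a simple rule: -D selects level 3, and each further D lowers the threshold by one, down to level 0. A hypothetical C++ helper computing the level from such a switch might look like this. It assumes the switch is a single token and does not model whether Epos also turns on the debug option.

```cpp
#include <cassert>
#include <string>

// Hypothetical parser: "-D" -> 3, "-DD" -> 2, "-DDD" -> 1, "-DDDD" -> 0.
// Returns -1 for anything that is not one of these switches.
int level_from_switch(const std::string& arg) {
    if (arg.size() < 2 || arg[0] != '-') return -1;
    std::string::size_type count = arg.size() - 1;  // number of 'D' characters
    if (count > 4) return -1;
    for (std::string::size_type i = 1; i < arg.size(); ++i)
        if (arg[i] != 'D') return -1;
    return 4 - static_cast<int>(count);
}
```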

Both options in this subsection are static.

--debug

This option must be on to provide any debugging information (except for daemon activity logging, controlled by the daemon_log option, and syslog logging).

--debug_level

The minimum severity level of debugging messages which should be printed.

