In Epos, nearly all of the TTS processing is controlled by a rule file; there is one rule file per language and it usually has the .rul suffix. The rule file for the German language, for instance, resides by default in lng/german/german.rul. The rules may also vary slightly for the individual voices using the soft options.
The text being processed by Epos is internally stored in a multi-level data
structure suitable for the application of transformational rules. Every phonetic
unit (or an approximation of one) is represented by a single node in the
structure. The nodes are organized into layers corresponding to linguistic
levels of description, such that a unit of level n can list its immediate constituents, that is, units of level n-1. Every layer also has a symbolic name, which is used to refer to it in the rules. The number and symbolic names of the individual levels can be specified with the unit_levels option before the languages are defined. An example is given in the table below.
Level name | written TSR semantics  | spoken TSR semantics
text       | the whole text         | the whole text
sent       | sentence construction  | terminated utterance
colon      | sentence/clause/colon  | intonational unit
word       | word                   | stress unit
syll       | word                   | syllable
phone      | letter                 | sound
segment    | segment                |
Every unit, whether at the segmental level or not, may contain a character. The TSR, as generated by the text parser, contains the appropriate punctuation at suprasegmental levels (that is, levels above the phone level): spaces at the word level, commas at the intonational unit level, while periods, question marks and such become the contents of a sentence (terminated utterance) level unit. Some suprasegmental units will have no content, because they have been delimited only implicitly; for example, a colon-final word has been delimited by a comma, but the comma is actually a colon-level symbol, so the last word will have no content. This content may be modified by the rules, and in practice it often is. This allows marking up a unit for later use: its content can be changed into an arbitrary character, such as a digit, and some rules can then be applied only within units having this content, using a rule of type inside.
The rules are applied sequentially, unless stated otherwise. Each rule operates on units of a certain level within a unit of some other level; for instance, a rule may assimilate phones within a word, while another rule may change the syllabic prosody within a colon. The smaller units being manipulated are called target units, the larger unit is referred to as a scope unit; the respective levels are called the target and the scope. Each scope unit is always processed separately (from any other scope units) as if no other text ever existed. For example, if the scope of some assimilation happens to be "word", every word will have the rule applied in isolation; the assimilation will never apply across a word boundary, nor will it be able to distinguish a word boundary from a sentence boundary.
Any line of the rules file may contain at most one rule and possibly a comment. A rule begins with an operation code specifier (what to do), followed by the parameter (one word, opcode-specific), and possibly by a scope and target specification, if the defaults (usually word and phone, respectively) are not suitable.
The scope and the target can be any of the available levels of linguistic description as defined with the unit_levels option. If the target or even the scope for a rule is not specified, the default_target or default_scope option value, respectively, will be used. The typical defaults are phone and word, respectively.
Every rule is evaluated within a certain unit, and the scope specifies what kind of unit that is. The meaning of the target is somewhat opcode-specific, but generally it is the level affected by the rule, or the lowest level affected by the rule within the scope. See the individual rule descriptions in this section in conjunction with real-world rule files for the exact interpretation of the target level.
The opcode, scope and target identifiers are not case-sensitive, but the parameter usually is.
As you sometimes need different character encodings for different languages, there is a mechanism for switching character encodings in text files, including rule files and dictionaries.
You can use the backslash to escape any special character including the backslash itself anywhere in the rules just as in the configuration files. See the corresponding section for details.
Notice especially the possibility of referring to several internal pseudocharacters; these have the nice property that they can never be found in the input text and are therefore suitable as temporary markers of all kinds in the rules. See the raise rule example.
In addition, special characters listed in the table of escape sequences can be inserted using the same mechanism.
The @include Directive
Any text starting with a semicolon or a # not in the middle of a word, up to the end of the line, is a comment. It will be properly ignored. If a line doesn't contain anything except whitespace and/or a comment, it is also ignored. The @include directive can be used to nest rule files. The same rules apply within .ini files; for more details, see the @include directive in configuration files.
A line which doesn't contain a rule may contain a macro definition instead. It is specified as identifier = replacement, for example:

$vowel = aeiouy

Alternatively, the keyword external may follow an identifier instead of the equality sign and the replacement:

$some_pathname external

This way the macro identifier is assigned the value of its corresponding configuration parameter (for the current language, if possible).
The macros will get expanded anywhere they occur except at their own point of definition. Therefore, $vowel = $short$long will be a valid macro definition, provided that $short and $long have already been defined. The expansion is performed at definition time and is not iterated, because the replacement is not expected to contain the dollar sign.
Macros can later be redefined if you wish and they can be local to a block of rules as described below.
If there is any uncertainty concerning the exact length of the identifier, you can use braces to delimit it: ${name} is usually equal to $name, but $nameaeiou is not equal to ${name}aeiou. It is also possible to use a colon or an ampersand as a delimiter: $name&aeiou.
It is a good practice to use macros extensively for classes of symbols so that the same sets and subsets of characters are listed only once in the rules and therefore are kept consistent throughout. The exact values of the macros are however always language specific and so Epos doesn't specify any built-in macros. If any macros are used in the examples for specific rules below, reasonable definitions of the macros are assumed to precede the rule.
For an abundance of examples see existing rule files.
The ! Operator
Whenever an unordered list of tokens should be specified within the parameter to some rule (use common sense and/or the individual rule descriptions above), you can also make negative specifications, such as "all consonants except l and r". To do this, use the exclamation mark, which serves as an "except" operator:

$consonants!lr

(The right operand is subtracted from the left one.) If there is no left operand, as in !x, the semantics is "all but x". A consequence is that ! alone means "everything".
The operator is right-associative; !$vowels!ou means "all excluding vowels, but o and u don't count as vowels just now". Therefore, o and u are included in this unordered list.
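The right-associative set semantics of this operator can be sketched in Python; this is an illustration of the semantics as described above, not Epos source, and the function name and the explicit alphabet parameter are our own invention:

```python
def except_list(expr, alphabet):
    """Evaluate a token list with the right-associative "!" operator.

    "a!b" means the tokens in a minus the tokens in b;
    "!b"  means all tokens except those in b;
    "!"   alone means everything.
    """
    left, bang, right = expr.partition("!")
    if not bang:                       # no "!" at all: a plain token list
        return set(expr)
    left_set = set(left) if left else set(alphabet)
    # the right operand may itself contain "!", hence the recursion
    return left_set - except_list(right, alphabet)

alphabet = "abcdefghijklmnopqrstuvwxyz"
vowels = "aeiouy"
# "!aeiouy!ou": everything except (vowels minus "ou"),
# so "o" and "u" remain included
result = except_list("!" + vowels + "!ou", alphabet)
```

With a macro such as $vowels expanded to aeiouy, the example from the text evaluates as shown: o and u end up in the resulting set.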
This operator never works for ordered lists, not even for the syll rule sonority groups. There is, however, a similar usage associated with the rule types if, with, prep and postp, where the exclamation mark can be used to negate the condition; see the respective rule types.
The rule types described in this subsection operate in some way on a list of words (or other strings), which can range from a few items up to machine-generated megabytes of data. These strings are usually listed in a separate file, the parameter of such a rule being the file name. Alternatively, the strings can be quoted inside the rule file, especially if only a few are listed. Such a collection of strings is called a dictionary and obeys the same format for any rule type which needs external data (except for the neural networks).
The dictionary consists of multiple lines, each of which contains a single dictionary item. An item consists of two whitespace separated words, the former being the item itself, the latter being some string associated with the item. Often, the second string is used to replace every occurrence of the first string in the text being processed. That's why the strings are called replacee and replacer, respectively. The order of dictionary items is not significant.
We use adaptive hash tables, with balanced (optionally bounded-depth) AVL trees for collisions, to represent the dictionary in memory and achieve near-instant lookups of any item, even in a huge dictionary.
The replacee cannot contain whitespace (unless escaped with a backslash), but the replacer can. That is, if more than two words are found on a line, the first one is the replacee and the rest of the line, except for any post-replacee and/or trailing whitespace, becomes the replacer. However, some rule types may not allow multiple word replacers.
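The line format just described can be sketched as follows; this is our own illustration (not Epos code), and it ignores escaped whitespace and comments for brevity:

```python
def parse_dict_line(line):
    """Split one dictionary line into (replacee, replacer).

    The first whitespace-delimited word is the replacee; the rest of
    the line, stripped of surrounding whitespace, is the replacer
    (which may itself contain spaces).  Returns None for lines that
    hold no item.
    """
    parts = line.strip().split(None, 1)   # split on the first whitespace run
    if len(parts) < 2:
        return None                       # an item needs both fields
    return parts[0], parts[1].strip()
```

For example, a line such as "hello good morning" yields the replacee "hello" and the multi-word replacer "good morning".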
The dictionaries follow the same conventions for character encoding, escaping special characters, inclusion directives and comments as the rule files and other text files.
Instead of a file name reference, it is possible to quote the contents of the dictionary directly; this is done by encapsulating the contents in double quotes. Dictionary items are in this case whitespace-separated, and each replacee is separated from its replacer by a comma.
The dictionary may either be parsed and loaded into memory at Epos startup or at the moment of its first use. The former option's advantage is early error reporting, while the latter can sometimes completely avoid loading a huge unused dictionary. Use the paranoid option to choose your preference.
subst
Substring substitution. The replacers replace every occurrence of their respective replacees; longer matches are matched first; the process is iterated until no replacee occurs in the string. If there is a tie between several matches of equal length, the rightmost match is chosen.
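The core matching semantics (longest match first, rightmost on a tie, iterated to a fixpoint) can be sketched in Python. This is our own illustrative model, not Epos source; it works on plain strings and ignores targets, anchors and unit structure:

```python
def subst(text, dictionary, max_iterations=1000):
    """Repeatedly replace the longest replacee found in text
    (the rightmost occurrence on a length tie) until none occurs."""
    for _ in range(max_iterations):
        best = None                        # (length, start, replacee)
        for replacee in dictionary:
            start = text.rfind(replacee)   # rightmost occurrence
            if start < 0:
                continue
            key = (len(replacee), start)
            if best is None or key > best[:2]:
                best = (len(replacee), start, replacee)
        if best is None:
            return text                    # fixpoint reached
        length, start, replacee = best
        text = text[:start] + dictionary[replacee] + text[start + length:]
    raise RuntimeError("infinitely looping substitution")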
It is required either to have a phone target, or to keep all the replacers and replacees the same length (because of the descendants of the units affected). Note also that to be considered a match in the former case (target phone), all characters other than phones also have to match (they must be found, or not found, at the same positions in both the replacee and the occurrence in question). The only exception is the terminating scope-level separator (if any), which is ignored and preserved.
Any replacer may begin with a ^ or end with a $. That forces the substring being replaced to be at the beginning or the end of the scope unit, respectively. This ^ or $ also counts as a character when determining the longest match.
The replacer should not contain units of the scope level or higher. Unless the paranoid option is set, this is tolerated, but the replacer is truncated at the first such character. With the phone target, this rule type will drop the internal structure of the replaced text as soon as a match is found. In other words, an affected scope unit with a replacer is re-parsed as any other plain text. With any other target the original structure is always kept.
Infinitely looping substitutions are currently reported as an error condition.
As this rule type should not be used for trivial tasks with short and often matching dictionaries, the example we shall now give is somewhat involved:
< word syll
regress \ >m(!_!) word word
regress \ >d(!_!) word word
regress \ >t(!_!) word word
regress \ >q(!_!) word word
regress \ >p(!_!) word word
>
subst "^mmmmm,mmmxm ^mmmmmm,mmmxmm ^mmmmmmm,mmmmxmm \
pmm,pxm qmmm,qxmm pmmm,pxmm tmmmm,tmmxm qmmmm,qxmxm \
dmmmmm,dmmxmm tmmmmm,tmmxmm qmmmmm,qxmxmm pmmmmm,pxmmxm \
dmmmmmm,dmmxmmm tmmmmmm,tmmxmmm qmmmmmm,qxmmxmm \
pmmmmmm,pxmmxmm dmmmmmmm,dmmxmmxm tmmmmmmm,tmmxmmxm \
qmmmmmmm,qxmmmxmm pmmmmmmm,pxmmmxmm \
mmmmmmmm,mmmmxmmm" colon word
postp "m" word word
The purpose of this example sequence of rules is to form stress units out of graphical words, based on the following assumptions for the given language: polysyllables are retained as stress units, but following monosyllables may be merged to them; monosyllables which are colon-final should be retained; other monosyllables may merge to each other and/or to the preceding polysyllable; the merges should not produce too long stress units.
The first part of the example is used to mark all non-colon-final (more exactly: space-delimited) words with the letters m, d, t, q, p based on the number of syllables; note that p is used not only for pentasyllables, but also for all words of more than five syllables. Then the substitution rule is used to relabel some monosyllables (destined to become heads of stress units consisting solely of monosyllables) with x. Finally, all monosyllables that haven't been relabeled to x are merged to the preceding stress word, if there is any, using the postp rule.
The substitution rule in the example has 21 dictionary items, the first three being applicable only at the colon-initial position. Mostly it directly lists the resulting labeling for the whole colon, but with extremely long sequences of monosyllables it relies on the facts that the longest matching replacee (and the rightmost one if there are multiple) is chosen and that the substitution process is iterated. For example, pmmmmmmmmm would first be relabeled to pmmmmmxmmm using the last item as listed in the dictionary, and then once more, using a different item, to pxmmxmxmmm.
prep
Preposition. If the scope unit is identical to some replacee, it gets replaced with its respective replacer and merged to its right-hand neighbor. If there is no such neighbor, nothing happens. As with the subst rule, the target must currently be phone, or all the replacers must be of sizes corresponding to their respective replacees.
Let us take a typical example:
prep preps.dic
where the referenced file contains a list of prepositions for the language, e.g. for Czech:
bez
do
k
ke
ku
na
nad
o
od
po
pod
pro
pRed
pRes pRez
s
u
v
ve
z
za
ze
You can see that most of the prepositions have replacers identical to their replacees, so that the preposition doesn't change except for being merged with its neighbor when found. There is, however, one irregular monosyllabic preposition in Czech which does change, with regard to voicing assimilation, and this can be handled too, as shown. Notice also that the unit (here: the word) must match the dictionary item exactly, as opposed to the mere substring matching performed by the subst rule.
As a special case, if the parameter begins with an exclamation mark, then the rest of the parameter is parsed as usual and any substitutions are performed exactly as usual, but the scope units which get finally merged to their respective right hand neighbors are exactly those which are not found in the dictionary.
A typical example can be the following rule whose purpose is to abolish all syllable boundaries (within each word). The rule defines an empty dictionary and then merges each word which is not found in the dictionary and which has something to be merged to.
prep !"" syll
postp
Postposition. See rule type prep for the description and examples, but the resultant unit is merged to its left-hand neighbor instead of the right-hand neighbor.
analyze
This rule type analyzes a unit of the level immediately below the scope level into a sequence of units, based on a dictionary of known contents of the new units at the target level and priorities assigned to them. We will explain the operation of this rule in terms of morphematic analysis, i.e. its most common use: the scope level is the word, the result of the analysis is the morpheme level (just below the word level), and the target is the phone.
Each item of the dictionary corresponds to a single morpheme (some linguists would prefer to say "morph" here). The replacee is the form of the morpheme expected within the word; the replacer is a numeric value which expresses the "badness" of this particular string.
In addition to these values, the dictionary must also include two additional items, !META_unanal_unit_penalty and !META_unanal_part_penalty. These serve as global parameters of the analysis process. The rule will split every affected word into morphemes so as to minimize the sum of the badnesses of the new morphemes. Each possible analysis may contain parts (morphemes) which have been found in the dictionary and which incur the badness specified there, and also parts which failed to be found in the dictionary. For each part not found, both a per-letter penalty (as specified with !META_unanal_unit_penalty) and a per-part penalty (as specified with !META_unanal_part_penalty) are added to the total badness.
If two alternative analyses are available with the same total badness, the first (leftmost) part which is not of identical length in both is considered and the analysis with the longer one is chosen.
Usually the global parameters are set so high that the algorithm will resort to an analysis into morphemes found in the dictionary whenever at least one is possible; among such analyses it prefers one which consists of a smaller number of morphemes, and one which avoids morphemes which are known to the dictionary but repelled by larger badness values.
An example fragment of the dictionary file may look like this:
!META_unanal_unit_penalty 100
!META_unanal_part_penalty 150
cow 5
cows 5
s 5
slip 5
lip 5
In this case, the word cowslip will not be analysed as cow-s-lip, because that would yield badness 15, while there are two analyses with badness 10, namely cows-lip and cow-slip. Here the first (and wrong) one will be taken due to the tie-breaking rule, which will probably bring about a voiced pronunciation of the fricative. Several alternative changes to the dictionary can be proposed to avoid the misguided morpheme boundary inside slip: removing cows from the dictionary, adding an explicit cowslip item with badness less than 10, or decreasing the badness of slip.
In the unchanged case, the word gas will be analysed as ga-s with total badness 355, as the only alternative analysis, gas, has total badness 450: that is, one unanalysed part consisting of three unanalysed units (letters). A solution would be, for example, to increase the badness of the miniature morpheme s to a value somewhere between 101 and 249; with this adjustment it will still be cheaper than an isolated unanalysed letter s in an otherwise known context, but it will not be recognized at the border of an unanalysed context.
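The scoring and tie-breaking described above can be modeled with a brute-force Python sketch; this is our own illustration of the semantics (Epos itself will use a more efficient algorithm), and the function names are hypothetical:

```python
from itertools import combinations

def better_tie(a, b):
    """True if length profile a beats b: at the first position where
    the part lengths differ, the longer part wins."""
    for x, y in zip(a, b):
        if x != y:
            return x > y
    return False

def analyze(word, dic, unit_penalty, part_penalty):
    """Try every segmentation of word, score it, keep the best one."""
    def badness(parts):
        total = 0
        for p in parts:
            if p in dic:
                total += dic[p]                       # known morpheme
            else:                                     # unanalysed part
                total += unit_penalty * len(p) + part_penalty
        return total

    best = None
    n = len(word)
    # choose any subset of the n-1 inner positions as boundaries
    for k in range(n):
        for cuts in combinations(range(1, n), k):
            bounds = (0,) + cuts + (n,)
            parts = [word[a:b] for a, b in zip(bounds, bounds[1:])]
            score = badness(parts)
            lengths = [len(p) for p in parts]
            if best is None or score < best[0] or (
                    score == best[0] and better_tie(lengths, best[1])):
                best = (score, lengths, parts)
    return best[0], best[2]
```

With the dictionary fragment above (penalties 100 and 150), this reproduces both worked examples: cowslip is analysed as cows-lip with badness 10, and gas as ga-s with badness 355.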
prosody
This rule type is a prosody modeling rule which uses a dictionary of prosodic adjustments to be applied. More details below.
segments
You don't want to read about this rule type unless you are preparing a new voice for a synthesizer with the traditional segment-level interface, based on a newly structured speech segment inventory.
Set up the segment layer below the phone layer. The parameter names a file which contains phone-to-segment mappings, again in the dictionary format. Each replacee represents a three-character segment identifier; the replacers are the respective segment codes (decimal). It is possible, and indeed typical, to include multiple identifiers for the same segment number.
The middle character denotes the phone the resulting segment will be assigned to. The left-hand and right-hand characters may either be a question mark, or they may require the corresponding neighbor to match a specific character. The question mark is therefore a kind of wildcard.
If both fully specified and partly specified segments exist for a given triplet of phones, they will be placed from left to right in this order: lt?, ?t?, ?tr, ltr.
A sentence may contain these segments with the Czech diphone inventory by Tomas Dubeda:
p l o u t e f
0p? pl? ?lo ?o? ou? ?u? ut? ?te ?e? ef? ?f0
or, with the traditional Czech segment inventory:
p l o u t e f
0p? ?pl pl? ?lo ?o? ou? ?u? ut? ?te ?e? ef? ?f0
(In this second example, for instance, the diphones ?pl and ?pt would actually share the segment number and would correspond to the p-any consonant diphone.)
There are more possibilities for representing a segment inventory; it is necessary to decide, for the major diphone types, whether they should live in their initial or their final sound. That is unfortunate, but it is the way it is.
It is possible to repeat a segment a few times. This effect can be controlled by adding 10000 times the number of extra repetitions to the segment number. Therefore,

?e? 20241

generates three identical segments number 241 for the stationary part of the specified vowel.
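The encoding arithmetic can be shown in a two-line Python sketch; the function name is our own, not part of Epos:

```python
def decode_segment(code):
    """Split a dictionary segment code into (segment_number, count):
    each 10000 in the code adds one extra repetition of the segment."""
    return code % 10000, code // 10000 + 1
```

For example, the code 20241 above decodes to segment number 241 emitted three times, while a plain code below 10000 decodes to a single segment.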
with
This is actually a conditional rule, though it also uses a dictionary. It applies an arbitrary rule upon the units (words) listed in the dictionary. More details below.
The contentual rules manipulate unit contents. That is, they're suitable for implementing the more regular letter-to-sound rules, character replacements and other transformations. They are an order of magnitude faster than e.g. the more general (and more heavyweight) subst rule, so they should be used whenever possible.
regress
Assimilation, elision or other mutation of phones or other units depending on their immediate environment. The parameter is of the form o>n(l_r), where o, n, l and r are arbitrary strings. The semantics is "change tokens in o to their corresponding tokens in n whenever the left neighbor is in l and the right one is in r". The first two strings should therefore either be of equal length, or n should be a single character, with the obvious interpretations of "corresponding".
The zero character (0) may be included in any of the strings; it means "no element", and it can be used to insert new units, to delete old ones, and to limit the change to the beginning or the end of the scope unit, respectively. On the other hand, if the content of some unit is a literal 0 before the application of this rule, it will stay untouched. This special meaning of 0 with this rule type can be suppressed by escaping.
Examples:
regress 0>'(0_aeiou) word phone
inserts the apostrophe before the vowels listed at the beginning of a word.
regress $voiceless>$voiced(!_$voiced) word phone
assimilates voiceless consonants to their voiced counterparts (assuming $voiced and $voiceless have been defined previously) when they're followed by a voiced consonant. The change proceeds from right to left, therefore ppb will change to bbb. See the description of the ! operator above for the explanation of the exclamation mark (here: "everywhere").
progress
As above, but the change proceeds from left to right. In the second example for the regress rule, the result would be pbb if progress were employed.
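The difference in scan direction can be illustrated with a Python sketch of the common single-character case; this is our own model of the o>n(l_r) semantics (it omits the 0 insertion/deletion mechanics), with "None" standing in for the "everywhere" condition (!):

```python
def mutate(text, old, new, left, right, reverse):
    """Apply o>n(l_r) to single-character tokens.  left/right are sets
    of allowed neighbors, or None for "anything" (the ! condition).
    reverse=True scans right to left (regress), False left to right
    (progress); changes made earlier in the scan are visible later."""
    chars = list(text)
    indices = range(len(chars))
    if reverse:
        indices = reversed(indices)
    for i in indices:
        l_ok = left is None or (i > 0 and chars[i - 1] in left)
        r_ok = right is None or (i + 1 < len(chars) and chars[i + 1] in right)
        if chars[i] in old and l_ok and r_ok:
            # pick the corresponding token, or the single new token
            chars[i] = new[old.index(chars[i])] if len(new) > 1 else new
    return "".join(chars)
```

With old="ptk", new="bdg" and the right condition "followed by a voiced consonant", a right-to-left scan turns ppb into bbb (the change propagates), while a left-to-right scan yields pbb, as stated above.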
The structural rules can be used to restructuralize the text. They usually interact with multiple levels of description simultaneously.
raise
Move a unit to a higher level of description, e.g. when a segment-level unit should directly affect the prosody. The parameter is of the form from:to (from and to are arbitrary strings, and they can employ the except operator, the exclamation mark). The tokens in from, if found at the target level, are copied to the scope level, if the original scope token is listed in to. It is also possible to omit the colon and the to string; the default interpretation is "everywhere".
This rule is usually found as a link between rules operating on different levels. For example, suppose we want to split every colon before any occurrence of one of the words nebo and anebo:
with "nebo anebo" word
{
regress 0>\X(0_!)
}
raise \X:! word phone
syll \X<\ _ colon word
regress \X>\ (!_!) colon word
regress \X>0(!_!)
Having inserted an internal pseudocharacter \X at the phone level at the beginning of each of the words listed in the dictionary used by the with rule, we raise this pseudocharacter to the word level and treat it as the least "sonorous" element in the following "syllabification" (splitting) rule. The last two rules perform a simple clean-up: they change all word-level occurrences of the pseudocharacter to a space and delete all phone-level occurrences thereof.
syll
Roughly speaking, this rule type can be used to split words into syllables according to the theory of sonority, i.e. at the least sonorous phones.
More generally, it is used to insert unit boundaries of any sort depending on local extremes of a simple metric defined on the target units. A split occurs at the scope-level unit and, whenever necessary, at all levels between the scope and the target units.
The parameter is an ordering of the target units (typically, phones), starting from the extremal (least sonorous) ones, with groups of equal status (equal sonority) delimited by <. Example:

syll 0<ptkf<bdgv<mnN<lry<aeiou syll phone
inserts the following (and other) syllable boundaries:
a|pa ap|pa ap|ppppa arp|pa ar|pra a|pr|pa
Tokens not listed are considered least sonorous; the order of tokens within the same sonority group (see the example) is irrelevant. It is not possible to use the except operator with this rule type.
As you can see from the example, the syllable boundaries are inserted exactly once per every sequence of equivalent target units (e.g. equisonorous phones) such that both preceding and following target units of the group have higher sonority, and they're inserted either between the first and second element of the group, or, if the group consists of a single unit, before that unit.
This semantics is suitable for the syllabification task in all languages known to us where syllabification is not primarily morphologically based, but this rule type can also be used for other tasks involving a unit split at some point defined by its contents, e.g. splitting a higher-level prosodic unit before or after certain words, as shown in the example for the raise rule. The authors are eager to hear from you if you'd prefer an extension or simplification of this rule type or if you can comment on automated syllabification issues over a wide range of languages.
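The boundary-insertion semantics can be modeled in Python; this sketch is our own illustration (not Epos source) and works on plain strings, marking each inserted boundary with "|":

```python
def syllabify(word, groups):
    """Sonority-based splitting: groups lists the phones from least to
    most sonorous (e.g. ["0", "ptkf", "bdgv", "mnN", "lry", "aeiou"]);
    unlisted phones count as least sonorous.  A boundary is inserted
    once per maximal run of equally sonorous phones flanked by more
    sonorous phones on both sides: between the run's first and second
    phone, or before the run if it consists of a single phone."""
    son = {c: i for i, g in enumerate(groups) for c in g}
    s = [son.get(c, 0) for c in word]
    cuts = []
    i = 0
    while i < len(word):
        j = i
        while j < len(word) and s[j] == s[i]:
            j += 1                     # [i, j) is a maximal equal run
        if i > 0 and j < len(word) and s[i - 1] > s[i] and s[j] > s[i]:
            cuts.append(i if j - i == 1 else i + 1)
        i = j
    for c in reversed(cuts):           # insert right to left
        word = word[:c] + "|" + word[c:]
    return word
```

Run against the sonority ordering from the example above, this reproduces all the listed boundaries, e.g. a|pa, ap|pa and a|pr|pa.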
The utterance prosody is modeled in Epos by assigning values for the prosodic quantities of individual text structure units (possibly at multiple levels of description): frequency (pitch), intensity (volume) and duration. Currently, these are per-cent values, 100 being the neutral value.
Epos doesn't currently provide sets of segment inventories for multiple pitch ranges; therefore extreme values, such as 15 or 1500, may sound very unnatural. The prosody adjustments at different levels sum up to the actual values assigned to the generated segments. For example, a phone with a frequency (pitch) value of 130 in a word with a value of 120 will contain segments (after the segments rule is applied) with a frequency of 150. Alternatively, it is possible to multiply the values for pitch, volume and duration instead, by setting the pros_eff_multiply_f, pros_eff_multiply_i and pros_eff_multiply_t options, respectively.
It is also possible to change the neutral value of 100 to a different base value with the f_neutral, i_neutral and t_neutral options.
contour
This rule assigns a specified prosody contour to units at some level of description within a unit which consists of them. For example, the rule can be used to assign pitch contours to stress units; individual values will probably be assigned to syllables.
The parameter describes a single prosody contour. The first letter denotes the prosodic quantity (frequency, intensity or duration) to be specified; the second character is a slash; the adjustments follow as colon-separated decimal integers. For example,
contour f/+2:+0:-2 word syll
assigns a falling pitch contour to a trisyllabic word. The number of syllables in a word, or, more generally, of the target units in a scope unit, must match the number of adjustments specified in a contour rule, otherwise an error occurs; consider the length-based selection of rules to ensure that. As an exception, it is possible to specify padding in the contour: at most one adjustment may be immediately followed by an asterisk. This adjustment will be used for zero or more consecutive target units as necessary to stretch the contour over the scope unit.
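The asterisk padding can be modeled with a small Python sketch; this is our own illustration of the stretching behavior, representing each adjustment as a (value, starred) pair:

```python
def stretch_contour(adjustments, n):
    """Fit a contour onto n target units.  adjustments is a list of
    (value, starred) pairs; the single starred value, if any, repeats
    as needed (possibly zero times) to fill the scope unit."""
    starred = [i for i, (_, s) in enumerate(adjustments) if s]
    if not starred:
        if len(adjustments) != n:
            raise ValueError("contour length mismatch")
        return [v for v, _ in adjustments]
    i = starred[0]
    repeats = n - (len(adjustments) - 1)   # how often the starred value runs
    if repeats < 0:
        raise ValueError("scope unit too short for this contour")
    values = [v for v, _ in adjustments]
    return values[:i] + [values[i]] * repeats + values[i + 1:]
```

For instance, a contour such as +2:+0*:-2 applied to a pentasyllable stretches to +2, 0, 0, 0, -2, and applied to a disyllable collapses to +2, -2.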
prosody
Individual prosodic feature generation. (See also the contour rule for assigning whole contours more conveniently.)
Typically, there will be many instances of this rule in the rules file, each of which will use a different configuration file for a different purpose (e.g. one may handle word stress, another the sentence-final melody of wh-questions, another semantic emphasis corresponding to an exclamation mark). The parameter of this rule is the name of a file formatted as a dictionary (see dictionary-oriented rules), whose contents are further specified here.
Each prosodic adjustment occupies one line; it affects exactly one of frequency, intensity and duration (F, I, or T, respectively) of units positioned among others as specified. Their ordering is insignificant, because each of them affects different units or a different quantity.
The structure of an adjustment is very simple, so let's just pick an example: i/3:4 -20. The first letter must be one of T, I, F and specifies the quantity to be adjusted; the first number denotes the position within a unit whose length must equal the second number: here, the rule applies to every third syllable of every tetrasyllable, provided that the target of the rule is the syllable, while the scope is the word (this is specified in the rules file as usual, not in the prosody file). The last number, separated by whitespace, is the intensity adjustment to be added everywhere this specification applies. It is an integer value.
It is also possible to have an adjustment applied for any length of the scope unit (in the example above, for words of any number of syllables). To do this, use "*" as the second number of the adjustment. Also, it may make sense to count the target unit starting from the end of the scope unit; in this case append the word "last" to the first number. An example could be f/1last:* -30, or "drop the pitch by 30 for the last syllable of every word". Consequently, at most three distinct rules may affect a unit; if that happens, only one is chosen: the more specific one, or, if both contain the asterisk, the one counting from the beginning.
An example, in order of decreasing precedence:
t/1:2 +30
t/1:* +20
t/2last:* +5
You can therefore override general adjustments with exceptions for some lengths which have to be handled separately.
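The matching and precedence rules can be sketched in Python; this is our own model of the selection logic (not Epos source), representing each adjustment as ((pos, from_end, any_length, req_length), value):

```python
def matches(rule, position, length):
    """Does the adjustment apply to the unit at 1-based position
    within a scope unit of the given length?"""
    pos, from_end, any_length, req_length = rule
    if not any_length and req_length != length:
        return False
    actual = length - pos + 1 if from_end else pos   # resolve "last"
    return actual == position

def pick(rules, position, length):
    """Among matching adjustments, prefer an exact-length one; among
    asterisk ones, prefer the one counting from the beginning."""
    found = [r for r in rules if matches(r[0], position, length)]
    if not found:
        return None
    found.sort(key=lambda r: (r[0][2], r[0][1]))   # exact first, then not-"last"
    return found[0][1]
```

Encoding the example above as t/1:2 +30, t/1:* +20 and t/2last:* +5, the first syllable of a disyllable gets +30 (all three match, the exact-length rule wins), the first syllable of a trisyllable gets +20, and its second syllable gets +5.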
If multiple prosodic rules (using their own files) supply adjustments for a certain unit, the adjustments are summed.
It is important to understand the difference between e.g. a syllable and its phones: the syllable can have an entirely different prosodic value than its phones; for every given segment, the value for any prosodic quantity is obtained by totalling the values for all of the higher-level units it is contained in. This independence of the levels of description might theoretically be useful for modeling tone languages.
smooth
Smoothing out one of the F, I, T quantities. The parameter is

quantity/left_weights/base_weight\right_weights

where the left_weights, if there are multiple, are slash-separated, and the right_weights are backslash-separated. The new value of the quantity specified for any target is computed as a weighted average of the values for the surrounding units at the same level. If the target is too near to the scope boundary to have enough neighbors in some direction, the value for the last unit in that direction is used instead.
Example:
smooth i/10/20/40\20\10 word syll
applied to the second word un-ne-ce-ssa-ry will adjust the intensity
values for all of its syllables. E.g. the new value for the second
syllable will be computed as
0.3 x i("un") + 0.4 x i("ne") + 0.2 x i("ce") + 0.1 x i("ssa")
(The missing second left neighbor of "ne" is clamped to "un", which
therefore receives the combined weight 0.1 + 0.2 = 0.3.)
The computations for different units do not interfere. The weights can also be specified as negative quantities and/or as sums of several values. This permits linear parameterization of the rules.
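The clamping and normalization can be sketched in a few lines of Python. This is an illustration of the behavior described above, not the Epos source; the weight lists here are given nearest-neighbor first, which may differ from the ordering in the rule parameter:

```python
def smooth(values, left_w, base_w, right_w):
    """Weighted-average smoothing with boundary clamping: a missing
    neighbor is replaced by the last existing unit in that direction."""
    n = len(values)
    total = sum(left_w) + base_w + sum(right_w)
    out = []
    for i in range(n):
        acc = base_w * values[i]
        for dist, w in enumerate(left_w, start=1):
            acc += w * values[max(i - dist, 0)]
        for dist, w in enumerate(right_w, start=1):
            acc += w * values[min(i + dist, n - 1)]
        # results are computed from the original values only, so the
        # computations for different units do not interfere
        out.append(acc / total)
    return out

# un-ne-ce-ssa-ry with illustrative intensity values
vals = [50.0, 100.0, 80.0, 60.0, 40.0]
smoothed = smooth(vals, left_w=[20, 10], base_w=40, right_w=[20, 10])
print(smoothed[1])  # 0.3*50 + 0.4*100 + 0.2*80 + 0.1*60 = 77.0
```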
The smooth rule also has an unavoidable side effect. If (some of)
the prosodic adjustments are assigned at the word level, for example,
and smoothing should take place at the syllable level, it is first
necessary to move the prosodic information down to the syllable level.
This is done by adding the quantity found at the word level to every
contained syllable and removing it from the word level altogether.
The unit::project method is responsible for that; it is called before
the actual smoothing. Prosodic adjustments existing at levels lower
than the one being smoothed are ignored by the smooth rule.
Multiple rules are occasionally necessary where the syntax has a placeholder for a single rule only. Alternatively, several rules may have to be grouped in a certain way -- for example, when one rule has to be chosen nondeterministically out of a set of rules. To satisfy these needs, Epos rules include three types of composite rules with different semantics. A composite rule is syntactically treated as a single rule.
A block is a sequence of rules enclosed within braces ("{" and "}").
Both the opening and the closing brace follow the rule syntax, but
they take no parameters except for an optional scope specification.
The block is treated as a single rule, which is useful especially
with conditional rules:
if condition
{
do this
do that
}
The rules are applied sequentially, as you would expect, for every
unit of the proper size as given by the scope of the opening brace.
This means that every word (if the scope is word) is processed
separately throughout all the rules in the block. This involves
some splitting of execution on entering the block. By default, no
such splitting is done and the block inherits its scope from its
master rule (a conditional rule, a block it is encapsulated in,
or the global implicit block which covers all the rules altogether).
Consequently, the scope of any enclosed rule may not be larger
than the scope of the block.
Any macros defined in the block are local to the block. The semantic details are C-like and are by no means important.
A choice is a sequence of rules enclosed within brackets ("[" and "]").
Both the opening and the closing bracket follow the rule syntax, but
they take no parameters except for a possible scope specification.
The choice is treated as a single rule.
Whenever the choice is applied, one of its subordinate rules is chosen at random for every unit of the proper size as given by the scope of the opening bracket, and only this rule is applied.
Generally, choices behave like blocks; the main difference is that with blocks, all of the rules are applied, whereas with choices, exactly one of them gets applied (possibly different rules for different pieces of the text processed).
Empty choices (with no rules within) are not tolerated, unlike empty blocks.
A (length-based) switch is a sequence of rules enclosed within angle
brackets ("<" and ">"). Both the opening and the closing bracket follow
the rule syntax, but they take no parameters except for a possible scope
and target specification. The switch is treated as a single rule.
Whenever the switch is applied to a scope unit, the target units contained
within are counted. If n units are found, the n-th subordinate rule in
the sequence is applied. If there are fewer than n rules available,
the last one will be used.
You can avoid this behavior by specifying "nothing" after the last rule.
An example is supplied inside the example for the subst rule.
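The selection logic can be summarized in a couple of lines. This Python fragment is a sketch of the counting behavior described above; modeling an explicit "nothing" as a trailing None entry is an assumption made for the illustration, not Epos syntax:

```python
def pick_switch_rule(rules, n):
    """Return the rule to apply when `n` target units were counted
    in the scope unit; a trailing None models an explicit "nothing"."""
    if n <= len(rules):
        return rules[n - 1]
    return rules[-1]  # fewer rules than units: reuse the last one

print(pick_switch_rule(["one", "two", "three"], 2))  # -> "two"
print(pick_switch_rule(["one", "two", "three"], 7))  # -> "three"
print(pick_switch_rule(["one", "two", None], 7))     # -> None (do nothing)
```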
Write "3x" before a rule to repeat it three times (in a block)
or to make it three times more probable (in a choice):
[
3x prosody typical.dic
prosody variant.dic
]
(The first alternative now has a 75% chance of being chosen, while the remaining 25% is left for the other one.)
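The resulting distribution is easy to compute: each alternative's probability is its repeat count divided by the sum of all counts. A small Python sketch of this arithmetic, reusing the two rules from the example above:

```python
import random

weighted_rules = [(3, "prosody typical.dic"), (1, "prosody variant.dic")]

def choice_probabilities(weighted_rules):
    # probability of each alternative = its count / sum of all counts
    total = sum(count for count, _ in weighted_rules)
    return {rule: count / total for count, rule in weighted_rules}

def apply_choice(weighted_rules, rng=random):
    # pick exactly one rule, weighted by the repeat counts
    counts = [count for count, _ in weighted_rules]
    rules = [rule for _, rule in weighted_rules]
    return rng.choices(rules, weights=counts, k=1)[0]

print(choice_probabilities(weighted_rules))
# -> {'prosody typical.dic': 0.75, 'prosody variant.dic': 0.25}
```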
The repeat count must be a positive integer. You cannot use this feature directly after a conditional rule, because a repeated rule is not counted as a single rule for syntactic purposes:
if $something
2x regress 0>x(!_!) #...wrong!
You should rewrite this to
if $something
{
2x regress 0>x(!_!)
}
Huge repeat counts (like one million) are disallowed. This is because the current implementation needs a few bytes of memory (one pointer) per repetition.
The conditional rules execute the following rule if and only if a condition is met. The condition is specified as the parameter, the following (conditioned) rule is given on a separate line (or lines, if e.g. a composite rule follows). (Comments, whitespace and empty lines may intervene as usual.) It is not syntactically necessary to indent the conditioned rules with whitespace, but it is strongly recommended for readability.
The conditioned rule is syntactically considered to be a part of the conditional rule.
inside
Apply a rule or a block of rules within certain units only. The parameter is a list of values at the scope level, wherein the following rule should be applied; the except operator may be used.
Every unit (a sentence, for example), which fulfills the criterion,
is processed separately; therefore the scope of the following rule may
be at most that of the inside rule itself.
Example:
if phr_break
{
regress 0>\#(!_0) colon
inside \# phone
{
contour t/-65 phone
}
}
This example takes action only if the phr_break variable is set.
The action is to insert a hash character (representing a pause)
into the phone level at the end of every colon, and to adjust the
prosodic values of the new character so that the pause is
sufficiently short. Notice the necessary escaping of the hash
character so as not to confuse it with the comment-out character.
near
Apply a rule or a block of rules within units which contain at least one of the specified units. The parameter is a list of values at the target level, which are looked up in a unit of the scope level; the except operator may be used. If an occurrence is found, the following rule gets applied to the scope level unit.
If the parameter begins with an asterisk, the test is strengthened: the following rule gets applied only if every target level unit contained meets the set description (with the leading asterisk ignored). You can combine the asterisk with an extra except operator to get tests of the "contains no characters of this class" type.
Example:
near *!$vowel
{
regress $lowercase>$uppercase(!_!)
subst spellout.dic
}
A fragment of this kind can be used to spell out all
words which contain no vowels (and are thus supposedly
unpronounceable). The referenced dictionary spellout.dic
should contain the spelled-out equivalents for each upper case
letter. The shift of the word to upper case may look
puzzling, but it is actually only a technical trick to
prevent the spell-out phrases (which are supposedly listed
in lower case) from being spelled out themselves.
Likewise, near *$vowel operates only on words consisting solely of
vowels; near $vowel operates on words which contain at least one
vowel; and near !$vowel operates on words which contain at least
one non-vowel.
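The four variants can be captured by a small predicate. The following Python sketch uses a hard-coded stand-in for the $vowel class (the real class is defined in the rule file) and only illustrates the quantifier and negation logic described above:

```python
VOWELS = set("aeiouy")  # a stand-in for the $vowel character class

def near_applies(word, spec):
    """spec is one of '$vowel', '!$vowel', '*$vowel', '*!$vowel':
    a leading '*' means every unit must match, '!' negates the class."""
    must_match_all = spec.startswith("*")
    if must_match_all:
        spec = spec[1:]
    negated = spec.startswith("!")
    if negated:
        spec = spec[1:]
    assert spec == "$vowel"  # the only class this sketch understands
    def member(ch):
        return (ch in VOWELS) != negated
    units = (member(c) for c in word)
    return all(units) if must_match_all else any(units)

print(near_applies("grr", "*!$vowel"))  # True: no vowels at all -> spell out
print(near_applies("eau", "*$vowel"))   # True: consists solely of vowels
print(near_applies("task", "$vowel"))   # True: at least one vowel
print(near_applies("eau", "!$vowel"))   # False: no non-vowel present
```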
with
Apply a rule or a block of rules for listed units. In contrast with the preceding rule type, this refers not only to the token at the scope level (such as space), but to the whole structure (such as the string of phones delimited by the space).
The parameter is a dictionary filename or a quoted dictionary; it should list the strings subject to the following rule, such as special words. All the details concerning the syntax of the parameter are exactly the same as with other dictionary oriented rules and a simple example is given at the raise rule.
(Advanced users: replacers can be specified in the dictionary and they will be used to replace the replacee as with any other dictionary-oriented rule, but the replacement process will not be iterated.)
The parameter can optionally be prefixed by an exclamation mark, in which case the subordinate rule will be applied exactly to those units which did not match instead of those which did.
An example of how to apply a block of rules to all words except the words "exception" and "resistant":
with !"exception resistant" word
{
...
}
if
Apply a rule or a block of rules only if a condition (given by the parameter) is met. The condition must currently be specified as a boolean voice configuration option (possibly a soft option) or its negation (i.e. prefixed with an exclamation mark).
Example:
if !colloquial
{
...
}
The rules within the block will be applied only if the colloquial option is not set.
This if rule inherits its scope from its parent rule if not specified
explicitly. Again, the scope of a subordinate rule may not be larger
than that of the if rule itself.
regex
Regular expression substitution. The parameter is of the form
/regular_expression/replacement/
. This rule type is similar to subst with only one dictionary item,
but it is far more powerful and more arcane; its use is not intended
for end users nor for trivial tasks.
For an overview of regular expressions, UNIX users can consult e.g.
the grep manual page, whereas Windows users can telnet to a nearby
UNIX machine and type man grep there.
Epos uses the extended regular expression syntax with the following
difference: in "regular" regular expressions, parentheses match
themselves, while the open-group and close-group operators are \( and
\), respectively. As we use groups heavily and next to no real
parentheses, we decided to do it the other way round. Also, sed users
may be surprised by the iterative behavior of the regex rule type in
Epos.
The replacement may contain escape sequences \1 to \9, referring to
the match of the n-th group within the regular expression.
\0 represents the entire match, but this is probably unusable under
the current design, as it would cause an infinite substitution loop.
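The iterative behavior mentioned above can be imitated with Python's re module, which, like Epos, uses bare parentheses for groups. The loop below is a sketch of substitution-until-fixpoint under that assumption, not the real Epos implementation:

```python
import re

def iterative_sub(pattern, replacement, text, max_iters=1000):
    """Apply a substitution repeatedly until the text stops changing."""
    for _ in range(max_iters):
        new = re.sub(pattern, replacement, text)
        if new == text:
            return text
        text = new
    raise RuntimeError("substitution did not converge")

# collapse any run of repeated letters down to a single letter;
# a single pass would leave "aabcd", the iteration finishes the job
print(iterative_sub(r"(.)\1", r"\1", "aaabccd"))  # -> "abcd"
```

Note that a replacement which reintroduces its own pattern would never converge, which is exactly why \0 is unusable.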
In order to use this type of rule, you need to have the rx or regex
library installed and WANT_REGEX enabled in common.h. This is because
we don't actually implement the regex parsing ourselves; we leave it
to your OS libraries. In case you don't have such libraries installed,
we use the glibc implementation (rx.c in the Epos distribution).
Note that if your system neither supports locale setting nor provides
a usable regex library, you cannot use named character classes such
as [:upper:] in your regular expressions. This is the case on
Windows CE.
debug
Debugging information during the application of the rules. Scope and target are ignored; the parameter is parsed lazily.
Parameter "elem": dump the current state of the text being processed.
Parameter "pause": wait for a keypress.