Festival supports a number of other synthesis systems.
A very simple and very efficient LPC diphone synthesizer using the "donovan" diphones is also supported. This synthesis method is primarily the work of Steve Isard and later Alistair Conkie. Its synthesis quality is not as good as that of the residual excited LPC diphone synthesizer, but it has the advantage of being much smaller: the donovan diphone database is under 800k.
The diphones are loaded through the Donovan_Init function, which takes the name of the dictionary file and the diphone file as arguments. For details see `lib/voices/english/don_diphone/festvox/don_diphone.scm'.
As an example of how Festival may use a completely external synthesis method we support the free system MBROLA. MBROLA is both a diphone synthesis technique and an actual system that constructs waveforms from segment, duration and F0 target information. For details see the MBROLA home page at `http://tcts.fpms.ac.be/synthesis/mbrola.html'. MBROLA already supports a number of diphone sets including French, Spanish, German and Romanian.
Festival support for MBROLA is in the file `lib/mbrola.scm'. It is all in Scheme. The function MBROLA_Synth is called when parameter Synth_Method is MBROLA. The function simply saves the segment, duration and target information from the utterance, calls the external `mbrola' program with the selected diphone database, and reloads the generated waveform back into the utterance.
An MBROLA-ized version of the Roger diphone set is available from the MBROLA site. The simple Festival end is distributed as part of the system in `festvox_en1.tar.gz'. The following variables are used by the process:
mbrola_progname
mbrola_database
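As a minimal sketch of how these settings might look in a voice definition or site initialization file: the two pathnames below are hypothetical placeholders rather than locations shipped with Festival, and the exact form of the Synth_Method value is assumed from the description above and may differ between releases.

```scheme
;; Hypothetical locations: adjust to wherever mbrola and the
;; diphone database are installed on your system.
(set! mbrola_progname "/usr/local/bin/mbrola")
(set! mbrola_database "/usr/local/share/mbrola/en1/en1")

;; Select MBROLA as the waveform synthesis method (assumed form).
(Parameter.set 'Synth_Method 'MBROLA)
```

With settings along these lines, synthesis should pass through MBROLA_Synth, which hands the segment, duration and target information to the external program named by mbrola_progname.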
In addition to the above synthesizers, Festival also supports CSTR's older PSOLA synthesizer, written by Paul Taylor. However, as the newer diphone synthesizer produces output of similar quality and is a more recent (and hence cleaner) implementation, further development of the older module is unlikely.
A general selection-based synthesis module is being developed. It includes an implementation of the techniques published in hunt96, but as the original work (not our current implementation) was done at ATR, we will not distribute it. A newer method of selection-based synthesis is being developed and produces similar quality to the hunt96 work. This new technique is discussed in black97c, but it still requires much work to make it a stable, easy-to-use synthesis method.
A more general method of synthesis from arbitrary units is already included in the system, but it is still quite young and none of the released voices use it. It splits the task of synthesis into distinct, well-defined steps (unit selection, joining, prosodic modification and resynthesis) to make it easier to choose between different types of databases (diphone vs general units), signal processing techniques, etc. Future versions of the system will use this more. If you intend to develop new unit selection or signal processing techniques, you may wish to use this new framework.
As one of our funded projects is to specifically develop new selection based synthesis algorithms we expect to include such models within later versions of the system.
Also, now that Festival has been released, other groups are working on new synthesis techniques in the system. Many of these will become available, and where possible we will give pointers to them from the Festival home page. In particular, there is an alternative residual excited LPC module implemented at the Center for Spoken Language Understanding (CSLU) at the Oregon Graduate Institute (OGI).