Susan R. Hertz
Dr. Hertz is President and Chief Scientist of NovaSpeech LLC.
She has more than thirty years of
experience in multi-language text-to-speech synthesis, including both
text analysis and speech generation, and both rule-based and concatenative methods. She has extensive business, technical, and
research experience in these areas, and a strong background in speech
processing, software development, software optimization, linguistics,
acoustic phonetics, speech perception, and computer science.
In 1983, Dr. Hertz founded Eloquent Technology,
Inc. (ETI), a text-to-speech software company, and transformed it over a
number of years from a basement operation into a profitable, worldwide
leader in multi-language text-to-speech technology.
As President and Chief Technology
Officer at ETI, Dr. Hertz oversaw all of the company’s business and technical
operations throughout its seventeen-year existence, and invented or
designed much of its core technology. This technology included the
multi-voice ETI-Eloquence text-to-speech system for thirteen
languages/dialects and the sophisticated Delta programming language
and interactive environment used to develop the
ETI-Eloquence synthesis rules. The ETI-Eloquence product was known for its
extremely small memory footprint, flexibility, accurate text
processing, and consistent and intelligible speech output. The
ETI-Eloquence product line is developed and marketed today by Scansoft, Inc.
At ETI, Dr. Hertz established relations and
negotiated several large software contracts with Fortune 50 companies,
including a five-year strategic alliance with IBM, which acquired portions
of ETI's technology and incorporated ETI-Eloquence into its ViaVoice line of
speech products.
In January 2001, Dr. Hertz sold Eloquent
Technology, Inc. to SpeechWorks International, Inc. (now part of Scansoft,
Inc.). See
ETI's Smithsonian Speech Synthesis History Project (SSSHP) page for more
details on ETI's history and technology, and the
narrative by Sue Hertz
on the SSSHP site for a personal
account of the events that led to ETI's formation.
After the ETI-SpeechWorks merger, Dr.
Hertz worked for a year and a half in the SpeechWorks Ithaca office
as Chief Scientist and Executive Director of text-to-speech technologies.
Since 1979, Dr. Hertz has also held positions in
the Linguistics Department at Cornell University. She is currently an
Adjunct Professor in the Linguistics Department,
teaching occasional graduate-level and upper-level courses in speech
synthesis and phonetics.
Dr. Hertz has conducted or overseen
many research projects
in text analysis, speech synthesis, speech perception, phonetic
model development, and the phonology/phonetics interface. In the
area of speech synthesis, she has designed or developed a number of
sophisticated models and algorithms for both text analysis and
speech generation.
Dr. Hertz has been the Principal Investigator or Project Director on 14
government grants or contracts in the area of speech synthesis.
Selected R&D Accomplishments
Synthesis Tools
- 1974-1983: Dr. Hertz designed and implemented
SRS (Speech Research System), an interactive software
system that provided a linguistically-oriented rule formalism and
associated tools for expressing and testing phonological and
acoustic generalizations about human languages through speech
synthesis. SRS was the first large-scale system that enabled
linguists without any prior programming or computer experience to
express, test, and edit phonological and phonetic rules in
linguistically familiar ways. The system also included text
processing capabilities, so that it could be used for
text-to-speech synthesis.
- 1983-1995: Dr. Hertz designed and
supervised the development of the Delta System, a sophisticated
programming language and interactive environment for text-to-speech
synthesis and other areas of natural language processing. The
Delta System was the first system of its kind to be
centered around an integrated multi-tiered utterance representation
in which user-defined units of different kinds, such as phrases,
words, syllables, phonemes, and quantitative values (durations,
formant values, etc.), could be related, tested, and manipulated in
straightforward ways.
Synthesis Rules
and (with others) more rudimentary rule sets for German, Dutch, and Spanish.
- 1996-2000: Dr. Hertz, with ETI employees, developed Delta-based
text-to-speech rules for thirteen languages/dialects: US and UK English,
Parisian and Canadian French, Castilian and Mexican Spanish, German,
Italian, Finnish, Brazilian Portuguese, Mandarin Chinese, Japanese, and
Korean. See Hertz et al. (1999) [PDF] (265 KB).
Synthesis Models
- 1980-1985: Dr. Hertz developed a
variety of models and strategies for text analysis in English and
other languages, including certain novel, knowledge-based approaches
for morphological analysis and letter-to-sound conversion. These
approaches formed the foundations for the Delta-based text analysis
components in ETI-Eloquence.
- 1989-1995: Dr. Hertz developed a
dialect-universal text-to-speech rule module for English
based on acoustic and phonological analysis of dialects of Brooklyn,
Boston, Alabama, General American, and Black English. This module
underlies the ETI-Eloquence synthesis rules for English.
- 1992-1994: Dr. Hertz developed the
phone-and-transition model of speech timing,
which underlies all of the ETI-Eloquence synthesis rules. This model
enabled the straightforward expression of language-universal
acoustic patterns, and relatively large language-universal
components in ETI-Eloquence. (Our recent projects
at NovaSpeech have validated many of the fundamental premises
underlying the model, which has been evolving into a full-fledged
theory that is helping us to account not only for the acoustic speech
patterns we observe, but also for the perceptual constraints that govern
these patterns.)
- 2000-2001: Dr. Hertz discovered a hybrid synthesis approach
in which waveform concatenation (unit selection) and formant-based
approaches are combined (see [PDF] (329 KB)). (See projects for work
by NovaSpeech in this area.)