
 
People    
 

Susan R. Hertz

Dr. Hertz is President and Chief Scientist of NovaSpeech LLC. She has more than thirty years of experience in multi-language text-to-speech synthesis, including both text analysis and speech generation, and both rule-based and concatenative methods. She has extensive business, technical, and research experience in these areas, and a strong background in speech processing, software development, software optimization, linguistics, acoustic phonetics, speech perception, and computer science.

In 1983, Dr. Hertz founded Eloquent Technology, Inc. (ETI), a text-to-speech software company, and transformed it over a number of years from a basement operation into a profitable, worldwide leader in multi-language text-to-speech technology.

As President and Chief Technology Officer at ETI, Dr. Hertz oversaw all of the company’s business and technical operations throughout its seventeen-year existence, and invented or designed much of its core technology. This technology included the multi-voice ETI-Eloquence text-to-speech system for thirteen languages/dialects and the sophisticated Delta programming language and interactive environment used to develop the ETI-Eloquence synthesis rules. The ETI-Eloquence product was known for its extremely small memory footprint, flexibility, accurate text processing, and consistent and intelligible speech output.  The ETI-Eloquence product line is developed and marketed today by Scansoft, Inc.

At ETI, Dr. Hertz established relations and negotiated several large software contracts with Fortune 50 companies, including a five-year strategic alliance with IBM, which acquired portions of ETI's technology and incorporated ETI-Eloquence into its ViaVoice line of speech products.

In January 2001, Dr. Hertz sold Eloquent Technology, Inc. to SpeechWorks International, Inc. (now part of Scansoft, Inc.). See ETI's Smithsonian Speech Synthesis History Project (SSSHP) page for more details on ETI's history and technology, and the narrative by Sue Hertz on the SSSHP site for a personal account of the events that led to ETI's formation.

After the ETI-SpeechWorks merger, Dr. Hertz worked for a year and a half in the SpeechWorks Ithaca office as Chief Scientist and Executive Director of text-to-speech technologies.

Since 1979, Dr. Hertz has also held positions in the Linguistics Department at Cornell University. She is currently an Adjunct Professor in the Linguistics Department, teaching occasional graduate-level and upper-level courses in speech synthesis and phonetics.

Dr. Hertz has conducted or overseen many research projects in text analysis, speech synthesis, speech perception, phonetic model development, and the phonology/phonetics interface. In the area of speech synthesis, she has designed or developed a number of sophisticated models and algorithms for both text analysis and speech generation.

Dr. Hertz has been the Principal Investigator or Project Director on 14 government grants or contracts in the area of speech synthesis.

Selected R&D Accomplishments 

Synthesis Tools

  • 1974-1983: Dr. Hertz designed and implemented SRS (Speech Research System), an interactive software system that provided a linguistically-oriented rule formalism and associated tools for expressing and testing phonological and acoustic generalizations about human languages through speech synthesis. SRS was the first large-scale system that enabled linguists without any prior programming or computer experience to express, test, and edit phonological and phonetic rules in linguistically familiar ways. The system also included text processing capabilities, so it could be used for text-to-speech synthesis as well.

  • 1983-1995: Dr. Hertz designed and supervised the development of the Delta System, a sophisticated programming language and interactive environment for text-to-speech synthesis and other areas of natural language processing. The Delta System was the first system of its kind to be centered around an integrated multi-tiered utterance representation in which user-defined units of different kinds, such as phrases, words, syllables, phonemes and quantitative values (durations, formant values, etc.), could be related, tested, and manipulated in straightforward ways.  

Synthesis Models

  • 1980-1985: Dr. Hertz developed a variety of models and strategies for text analysis in English and other languages, including certain novel, knowledge-based approaches for morphological analysis and letter-to-sound conversion. These approaches formed the foundations for the Delta-based text analysis components in ETI-Eloquence.

  • 1989-1995: Dr. Hertz developed a dialect-universal text-to-speech rule module for English, based on acoustic and phonological analysis of dialects of Brooklyn, Boston, Alabama, General American, and Black English. This module underlies the ETI-Eloquence synthesis rules for English.

  • 1992-1994: Dr. Hertz developed the phone-and-transition model of speech timing, which underlies all of the ETI-Eloquence synthesis rules. This model enabled the straightforward expression of language-universal acoustic patterns, and made possible relatively large language-universal components in ETI-Eloquence. (Our recent projects at NovaSpeech have validated many of the fundamental premises underlying the model, which has been evolving into a full-fledged theory that is helping us to account not only for the acoustic speech patterns we observe, but also for the perceptual constraints that govern these patterns.)

  • 2000-2001: Dr. Hertz discovered a hybrid synthesis approach in which waveform concatenation (unit selection) and formant-based approaches are combined (see [PDF] (329 KB)). (See projects for work by NovaSpeech in this area.)

 

 


 
Copyright © 2006-2008 NovaSpeech LLC
Last modified: 05/02/06