Understanding say-as
Background on this confusing feature of the language
Dr. Daniel C. Burnett, Chief Editor, SSML
Overview
• Guiding principles of SSML
• Guiding principles of say-as
• History of say-as
• FAQ
• Summary, conclusions, and two questions
Guiding principles of SSML
• Convenient annotation of existing text for audio rendering
• Control at all levels, from text structure and normalization to prosodic control and even voice characteristics
• Limited critical error conditions: “rendering must go on”
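One way a processor might honor “rendering must go on” is to treat say-as hints as optional: if it does not recognize an interpret-as value, it reads the enclosed text as it would without the hint instead of failing. SSML 1.0 asks for essentially this behavior when an interpret-as value is unknown or unsupported. The Python sketch below illustrates the fallback; the handler table and function names are hypothetical and not taken from any real synthesizer.

```python
from typing import Callable, Dict

def spell_characters(text: str) -> str:
    # Hypothetical handler for a "characters" hint: read letter by letter.
    return " ".join(text)

# Hypothetical table of interpret-as values this toy processor understands.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "characters": spell_characters,
}

def normalize_say_as(text: str, interpret_as: str) -> str:
    """Apply a say-as hint if it is understood; otherwise fall back to the
    plain text rather than raising an error."""
    handler = HANDLERS.get(interpret_as)
    if handler is None:
        # Unknown or unsupported value: behave as if no hint were given.
        return text
    return handler(text)

print(normalize_say_as("SSML", "characters"))  # S S M L
print(normalize_say_as("SSML", "foo"))         # SSML
```

The design choice is that an unrecognized hint degrades to ordinary text normalization, so a document written for a richer processor still produces audio on a simpler one.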
Guiding principles of say-as
• Primary purpose of say-as: to correctly interpret text as it is commonly written in human-readable documents
• “Intended for when the processor has insufficient context to interpret ambiguous text”
• Interpretation, not rendering
  • Expertise about rendering should be left in the synthesis processor as much as possible; authors can always do text normalization themselves if necessary
• Types limited in behavior
Interpretation, not rendering
• Assist the processor when interpretation is ambiguous
  • “When specified, the interpret-as and format values are to be interpreted by the synthesis processor as hints provided by the markup document author to aid text normalization and pronunciation.”
  • “In all cases, the text enclosed by any say-as element is intended to be a standard, orthographic form of the language currently in context. A synthesis processor should be able to support the common, orthographic forms of the specified language for every content type that it supports.”
• No direct rendering control
  • “Indicating the content type or format does not necessarily affect the way the information is pronounced. A synthesis processor should pronounce the contained text in a manner in which such content is normally produced for the language.”
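To make “interpretation, not rendering” concrete: the string 3/4/2005 is ambiguous between March 4 and 3 April. A format hint tells the processor which date the author meant, but says nothing about how that date should be spoken. The Python sketch below parses a small SSML document with the standard library and turns each date hint into an interpretation (a `datetime.date`), leaving pronunciation entirely to the processor. The `interpret-as="date"` and `format="mdy"`/`"dmy"`/`"ymd"` values are assumed here to follow the W3C say-as attribute values Note; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import List

SSML_NS = "http://www.w3.org/2001/10/synthesis"
SUPPORTED_FORMATS = {"mdy", "dmy", "ymd"}  # assumed subset of date formats

def interpret_date(text: str, fmt: str) -> date:
    """Resolve an ambiguous numeric date using the format hint.
    Returns an interpretation (a date), not a spoken form."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported date format hint: {fmt}")
    values = [int(part) for part in text.strip().split("/")]
    fields = dict(zip(fmt, values))  # e.g. "mdy" -> {"m": 3, "d": 4, "y": 2005}
    return date(fields["y"], fields["m"], fields["d"])

def interpret_dates(ssml: str) -> List[date]:
    """Collect interpretations for every date-typed say-as element."""
    root = ET.fromstring(ssml)
    results = []
    for el in root.iter(f"{{{SSML_NS}}}say-as"):
        fmt = el.get("format")
        if el.get("interpret-as") == "date" and fmt in SUPPORTED_FORMATS:
            results.append(interpret_date(el.text or "", fmt))
    return results

doc = f"""<speak version="1.0" xmlns="{SSML_NS}" xml:lang="en-US">
  US style: <say-as interpret-as="date" format="mdy">3/4/2005</say-as>,
  UK style: <say-as interpret-as="date" format="dmy">3/4/2005</say-as>.
</speak>"""

print(interpret_dates(doc))
# [datetime.date(2005, 3, 4), datetime.date(2005, 4, 3)]
```

Both elements contain identical text; only the hint differs, and the output is a resolved date rather than audio. Whether the processor then says “March fourth” or “the fourth of March” remains its own decision.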
Types limited in behavior
• Inclusion criteria for types (“interpret-as” values) in the say-as Note:
  • Needed by the processor to interpret ambiguous input, and
  • Broadly desired within the industry, and
  • Either
    • Needed to maintain consistency with the rendering the processor already uses when input is not ambiguous, or
    • Difficult to write the orthography for by hand
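The inclusion test is a small boolean rule: both of the first two conditions must hold, plus at least one of the last two. A minimal Python sketch of that logic follows. The class, field names, and example candidates are hypothetical, and the truth values assigned to the examples are illustrative only, not the working group's actual assessments.

```python
from dataclasses import dataclass

@dataclass
class CandidateType:
    """A proposed interpret-as value, described by the Note's inclusion criteria."""
    name: str
    needed_for_ambiguous_input: bool
    broadly_desired: bool
    keeps_rendering_consistent: bool  # matches how unambiguous input is already rendered
    hard_to_write_by_hand: bool       # orthography is difficult to write manually

def meets_inclusion_criteria(c: CandidateType) -> bool:
    # Both of the first two criteria are required, plus at least one of the last two.
    return (
        c.needed_for_ambiguous_input
        and c.broadly_desired
        and (c.keeps_rendering_consistent or c.hard_to_write_by_hand)
    )

# Hypothetical candidates with illustrative truth values.
interpretive = CandidateType("example-interpretive", True, True, True, False)
rendering_only = CandidateType("example-rendering-style", False, True, False, False)

print(meets_inclusion_criteria(interpretive))    # True
print(meets_inclusion_criteria(rendering_only))  # False
```

A candidate that only changes how something sounds fails the first criterion no matter how popular it is, which matches the FAQ answer about removed or missing types.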
History of say-as
• Original say-as types
  • Mixed interpretation and rendering
• Revised <say-as> element
  • Lengthy disagreement about attribute names and values
  • To make progress, attribute names were frozen, with a promise to work on values later
• <say-as> attribute values Note
  • Lays out attribute values in order of how highly group members valued them, from most to least
  • Declined to work on types that did not meet the criteria
  • Although the boundary between interpretation and rendering was clear, the result is still not satisfactory: often the processor does not have enough context to render the item properly (e.g., counting numerals in Japanese)
Frequently asked questions
• Why did you remove type blah? Why is there no type foo?
  • Either the use case was about rendering rather than interpretation, or there was only limited agreement on its importance
  • The extension mechanism is intended to address this
• Why don’t the attribute names match how they’re used?
  • Attribute names, but not values, were agreed on before SSML was published
  • Behavior for specific values was agreed on during the <say-as> Note work
  • This can be fixed in a later version of SSML
• Why is it still called “say-as” when it doesn’t control rendering?
  • Too many group members disliked changing the name
Summary, conclusions, and two questions
• Summary: say-as was designed to be an “interpret-as”, ideally with no rendering control
• Conclusions:
  • This is not intuitively obvious from reading the specification(s)
  • The community still strongly desires rendering control based on semantic category
• A question: what is the best way to accomplish semantic category-based rendering control within the W3C?
• Another question: are there “interpretation” needs unique to other languages that say-as does not accommodate?