Media Manager Mail Access: Unified Messaging. Barbara Hohlt, UC Berkeley. Ericsson Presentation, August 22, 2000.
[Figure: Messages from many sources. Labels: Desktop, Pager, Cell-Phone, PSTN Phone, MediaManager Mail Access, "???".]
Project Overview • Make messages more accessible • Get all types of messages • Access from different devices with different capabilities • Enable faster browsing of many voicemails • Media Mail services • A unified messaging infrastructure • Voicemail is email encoded in MIME • Transcoding services • Enhance voicemail interaction • Includes: skimmed audio, transcript, text/audio summary, and outline
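Because voicemail is carried as MIME-encoded email, a recording can sit in an ordinary mailbox as an attachment. The Java sketch below wraps raw audio bytes in a `multipart/mixed` message using only the JDK. The class name, boundary string, header values, and the `audio/wav` content type are illustrative assumptions, not the project's actual encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class VoicemailMime {
    // Illustrative boundary; it must not occur anywhere inside the message parts.
    static final String BOUNDARY = "voicemail-part-boundary";
    static final String CRLF = "\r\n"; // MIME line ending

    /** Wraps raw audio bytes as a base64 attachment in a multipart/mixed message. */
    static String toMime(String from, String subject, byte[] audio, String audioType) {
        // The MIME encoder emits 76-character lines separated by CRLF.
        String body = Base64.getMimeEncoder().encodeToString(audio);
        return "From: " + from + CRLF
            + "Subject: " + subject + CRLF
            + "MIME-Version: 1.0" + CRLF
            + "Content-Type: multipart/mixed; boundary=\"" + BOUNDARY + "\"" + CRLF
            + CRLF
            + "--" + BOUNDARY + CRLF
            + "Content-Type: text/plain; charset=us-ascii" + CRLF
            + CRLF
            + "Voicemail attached." + CRLF
            + "--" + BOUNDARY + CRLF
            + "Content-Type: " + audioType + CRLF
            + "Content-Transfer-Encoding: base64" + CRLF
            + CRLF
            + body + CRLF
            + "--" + BOUNDARY + "--" + CRLF;
    }

    public static void main(String[] args) {
        byte[] audio = "placeholder audio bytes".getBytes(StandardCharsets.US_ASCII);
        System.out.print(toMime("voicemail@example.com", "New voicemail", audio, "audio/wav"));
    }
}
```

Any mail store that understands MIME can then hold voicemail next to text email, which is what lets the Media Manager treat both uniformly.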
Related Work • Universal Inboxes/Unified Messaging • onebox.com • CoolMail.net • Lucent/Octel Unified Messenger • Stanford Mobile People Architecture • Audio Content Extraction Techniques • SpeechSkimmer, MIT’s MultiMedia Lab [Arons95] • Auto-Summarization, Microsoft Research • CueVideo, IBM
Architecture [Figure: Components: Clients, Media Manager Interface, Media Manager Service, Folder Store, Transcoder Service, and one Mail Access Interface each for NinjaMail, POP, and IMAP.] • Transcoder Service • Voicemail -> Text Transcript • Voicemail -> Text Summary • Voicemail -> Text Outline • Email -> Plain Audio • Email -> GSM Audio • Voicemail -> GSM Summary • Voicemail -> Audio Summary • Voicemail -> Skimmed Audio
Applications • Conventional GUIs • Context-Aware Applications • Iceberg Universal Inbox Component [Figure: Desktop and MediaManager Mail Access.] A conventional desktop GUI can contact the Media Manager directly and request messages as text; the Media Manager returns both emails and voicemails as text.
Context-Aware Application [Figure: Palm Device, Redirection Proxy, Desktop, and MediaManager Mail Access.] • 1: The Palm device asks for a list of messages as text and selects a voicemail. • 2: The Palm device requests a redirection from the proxy, which forwards the redirection request to the desktop. • 3: The desktop asks for the voicemail and plays it.
Iceberg Universal Inbox [Figure: Labels: Barbara's PSTN Phone, Bhaskar's Cell-Phone, Universal Inbox, MediaManager Mail Access. Numbered steps: 1 Naming Service (800-MEDIA-MGR -> UID: mediamgr@cs.berkeley.edu); 2 Preference Registry (mediamgr: cluster location); 3 Automatic Path Creation Service.]
MediaManagerServiceIF • getFolders() and getFoldersAs() • Given a username, return a list of folder names • getFoldersAs() returns the list as audio or GSM • getList() and getListAs() • Given a username, folder name, and count, return a list of messages (sender name, title, date) • getListAs() returns the list as audio or GSM • getMessage() • Given a MediaRef, return the entire message • getMessageContent() • Given a Content ID and a return type, return one part of the message as that type
Messages and Content Objects • Media Message: Media Reference ID and an array of Content Objects • Content Object: Content ID and data • Content ID: Media Reference ID, content part index, and content type
Interface Example [Figure: Cell-Phone and MediaManager Mail Access exchanging Media Message Headers, a Content ID, and a Content Object.] • User asks for a list of messages as GSM • Media Manager returns a list of message headers • Cell phone sends a Content ID back • Media Manager sends a voicemail Content Object
Audio Tools • Speech Recognition/Synthesis • Transcribe voicemail to text • IBM ViaVoice SDK and custom audio libs • Natural Language Processing • Directed word spotting by “understanding” content • ViaVoice SRCL • Pitch • Detecting important words by emphasized pitch • Pause • Compression through pause removal • Spurts • Retrieve sentence structure of voicemail
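The Pause and Spurts tools both start from a per-frame speech/pause decision. Assuming that decision is already available as one `boolean` per frame, the Java sketch below groups speech frames into talk spurts separated by long pauses and reports how much of the original duration remains when every pause frame is dropped. The class name and the `minGapFrames` parameter are hypothetical; the deck does not state the exact grouping rule.

```java
import java.util.ArrayList;
import java.util.List;

final class TalkSpurts {
    /** A talk spurt as a half-open frame range [start, end). */
    record Spurt(int start, int end) {}

    /**
     * Groups non-pause frames into spurts. A pause run of at least minGapFrames
     * ends the current spurt; shorter pauses stay inside it.
     */
    static List<Spurt> spurts(boolean[] isPause, int minGapFrames) {
        List<Spurt> out = new ArrayList<>();
        int start = -1;      // first frame of the open spurt, -1 if none
        int lastSpeech = -1; // most recent speech frame seen
        for (int i = 0; i < isPause.length; i++) {
            if (isPause[i]) continue;
            if (start < 0) {
                start = i;
            } else if (i - lastSpeech - 1 >= minGapFrames) {
                out.add(new Spurt(start, lastSpeech + 1)); // long gap closes the spurt
                start = i;
            }
            lastSpeech = i;
        }
        if (start >= 0) out.add(new Spurt(start, lastSpeech + 1));
        return out;
    }

    /** Fraction of the original duration left after removing every pause frame. */
    static double keptFraction(boolean[] isPause) {
        if (isPause.length == 0) return 1.0;
        int speech = 0;
        for (boolean p : isPause) if (!p) speech++;
        return (double) speech / isPause.length;
    }
}
```

Compression and structure come from the same input here: dropping pause frames shortens playback, while the positions of the long pauses give the spurt boundaries.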
Examples • Original Voicemail: “Hello, this is Barbara. How are you and the cats doing? I was wondering if you would feed them a little more the first time in case they eat too much. My number is (713) 465-5155. You can call me anytime. Have a very good holiday. Bye bye” • Processed Voicemail: (Skimmed) (Just pitch) (Pitch-emphasized words in green) • Translated talk spurts: • Phyllis Barbara • Area in the cat staring • And then if you run but feed them • A little more the first time in case they eat too much • On my number is (713) 465-5155 • You can call me anytime. • Have every holiday • Of light • Translated using NLP: • Hello this is Barbara • My number is (713) 465-5155
Examples continued • Original Voicemail: “Faced with a seemingly inevitable engineering task authors tend to adopt one of two strategies for adding new services to the Internet landscape: inflexible, highly tuned, hand-constructed services….” • Processed Voicemail: (Skimmed) (Just pitch) (Pitch-emphasized words in green) • Translated talk spurts: • Faced with a seemingly inevitable engineering task authors tend to adopt what it to strategies for adding new services to the internet landscape. • Inflexible, highly Tate, had constructed services…. • Translated using NLP: • <Nothing>
Results • Pause detection • Worked well for given applications • Playback speedup by 50-70% • Pitch detection • Problems due to high pitch sounds and transitions • Speech recognition • Performance decrease in conversational settings • Natural Language Processing • Performed well with small grammar
Example: Adding GSM Access • Define specific types, i.e., GSMAudio and GSMSummary • Optionally create new Content Objects • Add the Content Object definitions to MediaManager • Add a GSM transcoder to TranscoderService
Detail: Adding GSM Access • Add Content Object definitions to MediaManager • Define GSMAUDIO and GSMSUMMARY • Add cases to createObject() in Content Object • Add cases to Media Manager • Add GSM to Transcoder • Add method toGSM() to Transcoder • Edit .config file • External.transcoder.gsm rungsm • Edit related transcoders • speechSynthesizer and audioSummary()
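The GSM steps pair new content types with an external encoder named in the `.config` file. A minimal Java sketch follows, assuming the config holds whitespace-separated `key value` lines such as `External.transcoder.gsm rungsm` and that the external command takes an input file and an output file (both assumptions). Only `GSMAUDIO`, `GSMSUMMARY`, `toGSM()`, and the config key come from the slide; the class name, the other enum constants, `parseConfig()`, and `needsGsmStep()` (a stand-in for the new `createObject()` cases) are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class GsmTranscoderSketch {
    // GSMAUDIO and GSMSUMMARY come from the slide; the other constants are hypothetical.
    enum ContentType { TEXT, AUDIO, AUDIO_SUMMARY, GSMAUDIO, GSMSUMMARY }

    /** Parses "key value" lines; blank lines and lines starting with '#' are skipped (assumed format). */
    static Map<String, String> parseConfig(List<String> lines) {
        Map<String, String> cfg = new HashMap<>();
        for (String line : lines) {
            String t = line.strip();
            if (t.isEmpty() || t.startsWith("#")) continue;
            String[] kv = t.split("\\s+", 2);
            if (kv.length == 2) cfg.put(kv[0], kv[1]);
        }
        return cfg;
    }

    /** Stand-in for the new createObject() cases: true when the type needs a GSM encoding step. */
    static boolean needsGsmStep(ContentType type) {
        return switch (type) {
            case GSMAUDIO, GSMSUMMARY -> true;
            default -> false;
        };
    }

    /** Builds, but does not start, the external GSM encoder command; argument order is hypothetical. */
    static ProcessBuilder toGSM(Map<String, String> cfg, String inputFile, String outputFile) {
        String cmd = cfg.get("External.transcoder.gsm");
        if (cmd == null) throw new IllegalStateException("no GSM transcoder configured");
        return new ProcessBuilder(cmd, inputFile, outputFile);
    }
}
```

Calling `start()` on the returned `ProcessBuilder` would run the encoder; keeping the command name in the config means a different GSM tool can be swapped in without recompiling.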
Implementing Other Mail Stores • Examples: IMAP, POP, Microsoft Exchange Server • Implement MailAccessIF • String [] getMAFolders( userName ) • MediaMessage [] getMAList( userName, folderName, count ) • MediaMessage getMAMessage( MediaRef ) • ContentObject getMAMessageContent( ContentID ) • Add the new protocol to the Media Manager protocol table • Optionally add the protocol for users into the FolderStore
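A new mail store plugs in by implementing the four `MailAccessIF` methods and being registered under a protocol name. In the Java sketch below, the record fields, the map-backed `MapMailStore`, and the `ProtocolTable` class are hypothetical stand-ins; an IMAP, POP, or Exchange adapter would fetch from its server inside the same four methods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal types; the slides name them but do not show their Java fields.
record MediaRef(String id) {}
record ContentID(MediaRef ref, int partIndex, String contentType) {}
record ContentObject(ContentID id, byte[] data) {}
record MediaMessage(MediaRef ref, ContentObject[] parts) {}

// The four methods a new mail store must implement, per the slide.
interface MailAccessIF {
    String[] getMAFolders(String userName);
    MediaMessage[] getMAList(String userName, String folderName, int count);
    MediaMessage getMAMessage(MediaRef ref);
    ContentObject getMAMessageContent(ContentID id);
}

// Toy store backed by maps; a real adapter would talk to a mail server instead.
final class MapMailStore implements MailAccessIF {
    private final Map<String, Map<String, List<MediaMessage>>> users = new LinkedHashMap<>();
    private final Map<MediaRef, MediaMessage> byRef = new HashMap<>();

    void deliver(String user, String folder, MediaMessage m) {
        users.computeIfAbsent(user, u -> new LinkedHashMap<>())
             .computeIfAbsent(folder, f -> new ArrayList<>())
             .add(m);
        byRef.put(m.ref(), m);
    }

    public String[] getMAFolders(String userName) {
        return users.getOrDefault(userName, Map.of()).keySet().toArray(new String[0]);
    }

    public MediaMessage[] getMAList(String userName, String folderName, int count) {
        List<MediaMessage> msgs =
            users.getOrDefault(userName, Map.of()).getOrDefault(folderName, List.of());
        int n = Math.max(0, Math.min(count, msgs.size()));
        return msgs.subList(0, n).toArray(new MediaMessage[0]);
    }

    public MediaMessage getMAMessage(MediaRef ref) {
        return byRef.get(ref);
    }

    public ContentObject getMAMessageContent(ContentID id) {
        MediaMessage m = byRef.get(id.ref());
        if (m == null || id.partIndex() < 0 || id.partIndex() >= m.parts().length) return null;
        return m.parts()[id.partIndex()];
    }
}

// Hypothetical protocol table: the Media Manager picks a store adapter by protocol name.
final class ProtocolTable {
    private final Map<String, MailAccessIF> stores = new HashMap<>();

    void register(String protocol, MailAccessIF store) {
        stores.put(protocol, store);
    }

    MailAccessIF lookup(String protocol) {
        MailAccessIF s = stores.get(protocol);
        if (s == null) throw new IllegalArgumentException("unknown protocol: " + protocol);
        return s;
    }
}
```

Because callers only see `MailAccessIF`, adding Exchange support would mean one new class plus one `register("exchange", ...)` call.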
Conclusion • Overall • System useful as navigational hints • To achieve total comprehension, need better voice recognition • What works well • Skimming using pause removal • Detecting spurts for structure • What needs work • Speech detection in conversational settings • Pitch emphasis needs refining • Future Directions • Implementing more mail stores • Enhancing interfaces • Pause detection/word boundaries using speech detection • Developing voicemail grammars • Using NLP feedback with pitch emphasis detection • Improved speech detection in noisy environments
MediaManagerServiceIF • String[] getFolders( userName ) • byte[][] getFoldersAs( userName, returnType ) • MediaMessage [] getList( userName, folderName, count ) • byte[][] getListAs( userName, folderName, count, returnType ) • MediaMessage getMessage( MediaRef ) • ContentObject getMessageContent( ContentID, returnType )
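Written out as a Java interface, the signatures above look as follows. The sketch adds a toy in-memory implementation so the call sequence from the cell-phone interface example (list as GSM, pick a Content ID, fetch the Content Object) can run end to end. The record fields, the `String` return-type selector, and `InMemoryMediaManager` are assumptions, and its `render()` stub returns text bytes instead of calling a real transcoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal stand-ins for the deck's types.
record MediaRef(String id) {}
record ContentID(MediaRef ref, int partIndex, String contentType) {}
record ContentObject(ContentID id, byte[] data) {}
record MediaMessage(MediaRef ref, String sender, String title, String date, ContentObject[] parts) {}

interface MediaManagerServiceIF {
    String[] getFolders(String userName);
    byte[][] getFoldersAs(String userName, String returnType);
    MediaMessage[] getList(String userName, String folderName, int count);
    byte[][] getListAs(String userName, String folderName, int count, String returnType);
    MediaMessage getMessage(MediaRef ref);
    ContentObject getMessageContent(ContentID id, String returnType);
}

final class InMemoryMediaManager implements MediaManagerServiceIF {
    private final Map<String, Map<String, List<MediaMessage>>> users = new LinkedHashMap<>();

    void add(String user, String folder, MediaMessage m) {
        users.computeIfAbsent(user, u -> new LinkedHashMap<>())
             .computeIfAbsent(folder, f -> new ArrayList<>())
             .add(m);
    }

    // Placeholder for the Transcoder Service: returns UTF-8 text bytes for every returnType.
    private static byte[] render(String text, String returnType) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public String[] getFolders(String userName) {
        return users.getOrDefault(userName, Map.of()).keySet().toArray(new String[0]);
    }

    public byte[][] getFoldersAs(String userName, String returnType) {
        String[] names = getFolders(userName);
        byte[][] out = new byte[names.length][];
        for (int i = 0; i < names.length; i++) out[i] = render(names[i], returnType);
        return out;
    }

    public MediaMessage[] getList(String userName, String folderName, int count) {
        List<MediaMessage> msgs =
            users.getOrDefault(userName, Map.of()).getOrDefault(folderName, List.of());
        int n = Math.max(0, Math.min(count, msgs.size()));
        return msgs.subList(0, n).toArray(new MediaMessage[0]);
    }

    public byte[][] getListAs(String userName, String folderName, int count, String returnType) {
        MediaMessage[] msgs = getList(userName, folderName, count);
        byte[][] out = new byte[msgs.length][];
        for (int i = 0; i < msgs.length; i++) {
            MediaMessage m = msgs[i];
            out[i] = render(m.sender() + ", " + m.title() + ", " + m.date(), returnType);
        }
        return out;
    }

    public MediaMessage getMessage(MediaRef ref) {
        for (Map<String, List<MediaMessage>> folders : users.values())
            for (List<MediaMessage> msgs : folders.values())
                for (MediaMessage m : msgs)
                    if (m.ref().equals(ref)) return m;
        return null;
    }

    public ContentObject getMessageContent(ContentID id, String returnType) {
        MediaMessage m = getMessage(id.ref());
        if (m == null || id.partIndex() < 0 || id.partIndex() >= m.parts().length) return null;
        // The stored bytes are returned unchanged; a real service would transcode them.
        ContentObject part = m.parts()[id.partIndex()];
        return new ContentObject(new ContentID(id.ref(), id.partIndex(), returnType), part.data());
    }
}
```

A phone client would call `getListAs(user, "INBOX", n, "gsm")` to hear the headers, then `getMessageContent(contentId, "gsm")` for the chosen voicemail part.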
Pitch Detection • The Idea • A speaker’s pitch naturally changes when introducing topics or emphasizing words [Hirshberg92] • Use pitch increases as hints for “important” words • Algorithm [Arons95] • Determine the pitch of each 20 ms frame (FFT with subharmonic summation, SHS) • Set the emphasis threshold at the top 1% of pitch values (by histogram) • Mark a 1 s interval as emphasized if it contains at least 3 emphasized frames
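The emphasis-marking steps can be sketched in Java, assuming a pitch tracker (such as the FFT/SHS step, not shown) has already produced one pitch estimate in Hz per 20 ms frame, with 0 for unvoiced frames. The 1 Hz histogram bins, the 1000 Hz ceiling, counting the 1% over voiced frames only, and the use of non-overlapping 1-second windows are assumptions rather than details from the slide.

```java
final class PitchEmphasis {
    static final int FRAME_MS = 20;                     // frame length from the slide
    static final int FRAMES_PER_SEC = 1000 / FRAME_MS;  // 50 frames per 1-second interval
    static final int MIN_EMPHASIZED = 3;                // "at least 3 emphasized frames"
    static final int MAX_HZ = 1000;                     // assumed histogram ceiling (1 Hz bins)

    // Histogram bin for a pitch value; values above MAX_HZ share the top bin.
    private static int bin(double hz) {
        return (int) Math.min(hz, MAX_HZ);
    }

    /** Lowest bin such that it and all higher bins hold at least 1% of voiced frames. */
    static int threshold(double[] pitchHz) {
        int[] hist = new int[MAX_HZ + 1];
        int voiced = 0;
        for (double p : pitchHz) {
            if (p > 0) {
                hist[bin(p)]++;
                voiced++;
            }
        }
        if (voiced == 0) return Integer.MAX_VALUE; // no voiced frames: nothing is emphasized
        int target = (int) Math.ceil(voiced * 0.01);
        int seen = 0;
        for (int b = MAX_HZ; b >= 0; b--) {
            seen += hist[b];
            if (seen >= target) return b; // ties in this bin may push slightly past 1%
        }
        return 0; // not reached: seen ends at voiced, which is >= target
    }

    /** One flag per 1-second window: true if it holds at least 3 frames at or above the threshold. */
    static boolean[] emphasizedSeconds(double[] pitchHz) {
        int thr = threshold(pitchHz);
        int seconds = (pitchHz.length + FRAMES_PER_SEC - 1) / FRAMES_PER_SEC;
        boolean[] out = new boolean[seconds];
        for (int s = 0; s < seconds; s++) {
            int end = Math.min(pitchHz.length, (s + 1) * FRAMES_PER_SEC);
            int count = 0;
            for (int f = s * FRAMES_PER_SEC; f < end; f++) {
                if (pitchHz[f] > 0 && bin(pitchHz[f]) >= thr) count++;
            }
            out[s] = count >= MIN_EMPHASIZED;
        }
        return out;
    }
}
```

Requiring three high frames per second, rather than one, filters out isolated pitch-tracker spikes, which matters given the reported problems with high-pitch sounds and transitions.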
Pause Detection [Figure: histogram of percent of frames vs. average energy (dB).] • Why is pause detection useful? • Removing pauses speeds up playback • Playback typically takes 50-70% of the original time [Foulke71] • Long pauses signify groups (talk spurts) • Noise and soft sounds create difficulties • Algorithm: Smoothed Histogram [Lamel81] • Calculate energy per 10 ms frame • Set the threshold from the smoothed histogram (5 dB above the first peak) • Use heuristics to remove artifacts
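A Java sketch of the smoothed-histogram threshold follows. It assumes mono samples scaled to [-1, 1], 1 dB histogram bins, a 3-bin moving average for smoothing, and the peak bin's centre as the reference point; none of these specifics come from the slide, and the artifact-removal heuristics are omitted.

```java
import java.util.Arrays;

final class PauseDetector {
    static final double FRAME_SEC = 0.010; // 10 ms frames (from the slide)
    static final double OFFSET_DB = 5.0;   // threshold = first peak + 5 dB (from the slide)

    /** Average energy in dB for each 10 ms frame; a trailing partial frame is dropped. */
    static double[] frameEnergyDb(double[] samples, int sampleRate) {
        int n = (int) Math.round(sampleRate * FRAME_SEC);
        int frames = samples.length / n;
        double[] db = new double[frames];
        for (int f = 0; f < frames; f++) {
            double sum = 0;
            for (int i = f * n; i < (f + 1) * n; i++) sum += samples[i] * samples[i];
            db[f] = 10 * Math.log10(sum / n + 1e-12); // epsilon avoids log(0) on digital silence
        }
        return db;
    }

    /** Threshold = centre of the first peak of the smoothed 1 dB histogram, plus 5 dB. */
    static double threshold(double[] energyDb) {
        double min = Arrays.stream(energyDb).min().orElse(0);
        double max = Arrays.stream(energyDb).max().orElse(0);
        int bins = (int) Math.floor(max - min) + 1;
        int[] hist = new int[bins];
        for (double e : energyDb) hist[(int) Math.floor(e - min)]++;
        double[] smooth = new double[bins]; // 3-bin moving average, truncated at the edges
        for (int b = 0; b < bins; b++) {
            double s = 0;
            int k = 0;
            for (int j = b - 1; j <= b + 1; j++) {
                if (j >= 0 && j < bins) { s += hist[j]; k++; }
            }
            smooth[b] = s / k;
        }
        // First peak: scanning up from the quietest bin, the first bin that is not
        // below its left neighbour and is above its right neighbour.
        int peak = bins - 1;
        for (int b = 0; b < bins - 1; b++) {
            boolean risingOrStart = b == 0 || smooth[b] >= smooth[b - 1];
            if (risingOrStart && smooth[b] > smooth[b + 1]) { peak = b; break; }
        }
        return min + peak + 0.5 + OFFSET_DB;
    }

    /** true for frames whose energy falls below the threshold (candidate pauses). */
    static boolean[] pauseFrames(double[] energyDb) {
        double thr = threshold(energyDb);
        boolean[] pause = new boolean[energyDb.length];
        for (int i = 0; i < energyDb.length; i++) pause[i] = energyDb[i] < thr;
        return pause;
    }
}
```

When much of a voicemail is background noise, that noise forms the first histogram peak, so the threshold lands just above the noise floor and louder frames are kept as speech.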
Works Cited • [Arons95] B. Arons. Interactively Skimming Recorded Speech. Ph.D. dissertation, MIT, 1994. • [Foulke71] E. Foulke. The Perception of Time Compressed Speech. Ch. 4 in Perception of Language, edited by P.M. Kjeldergaard, D.L. Horton, and J.J. Jenkins. Charles E. Merrill Publishing Company, 1971, pp. 79-107. • [Hirshberg92] J. Hirschberg and B. Grosz. Intonational Features of Local and Global Discourse. In Proceedings of the Speech and Natural Language Workshop (Harriman, NY, Feb. 23-26). Morgan Kaufmann Publishers, 1992, pp. 441-446. • [Lamel81] L.F. Lamel, L.R. Rabiner, A.E. Rosenberg, and J.G. Wilpon. An Improved Endpoint Detector for Isolated Word Recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-29, 4 (Aug. 1981), pp. 771-785.