
Behrooz Chitsaz Lorrie Apple Johnson Microsoft Research U.S. Department of Energy


Presentation Transcript


  1. Behrooz Chitsaz, Lorrie Apple Johnson • Microsoft Research, U.S. Department of Energy

  2. Multimedia Research • Speech search • Face identification • Object recognition • Video browsing • Semantic extraction • (3D) Segmentation • (3D) Image search

  3. Speech Applications • Speech as an interface • Speech as first-class content

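  4. Speech recognition • Spectral analysis → observations $o_1 \ldots o_T$ • Acoustic models: $p(o_t \ldots o_{t'} \mid \text{phoneme})$ • Dictionary: $P(\text{phonemes} \mid w)$ • Grammar (language model): $P(w_1 \ldots w_N)$ • Matching (decoding): time alignment and selection of the most likely hypothesis $\hat{W} = \arg\max_{w_1 \ldots w_N} p(o_1 \ldots o_T \mid w_1 \ldots w_N)\, P(w_1 \ldots w_N)$ • Output: the recognized word sequence $\hat{W} = (\hat{w}_1 \ldots \hat{w}_N)$, e.g. “Hello World”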

  5. MAVIS technology • Problem with indexing automatic transcripts as plain text: automatic transcription accuracy is only 50-80% • MAVIS techniques: • Word-level lattice indexing: indexes word alternatives, so search is robust to recognizer errors (50-140% accuracy improvement), and indexes timing, so users can navigate to the exact point in a video • Vocabulary adaptation: uses NLP and Bing Search to expand the word dictionary • Automatic keywords exposed to search engines: enables discovery of speech content through search engines; a by-product of vocabulary adaptation • See http://research.microsoft.com/mavis

  6. MAVIS Architecture • Microsoft Azure: store content to be processed in temporary Azure storage, perform vocabulary adaptation using Bing, run the recognition engine on the content, and store the results of the recognition process as an audio index blob (AIB) • Content owner: web server(s) and SQL Server(s) • Workflow: (1) submit audio/video RSS feed; (2) retrieve AIB; (3) import AIB into SQL Server; (4) search/retrieve results

  7. U.S. Department of Energy Office of Scientific and Technical Information (OSTI) Mission • DOE invests more than $10 billion per year in basic sciences, clean energy technology, and nuclear research. • The immediate output from this investment is information, knowledge, and R&D results. • OSTI’s mission is to accelerate scientific progress by accelerating access to this information.

  8. OSTI’s Core Products • Information Bridge • Science Accelerator • Science.gov

  9. WorldWideScience.org

  10. Emerging Forms of Scientific Information Require New Tools • Numeric data, multimedia, and social media are emerging forms of scientific information • Each form presents special opportunities and challenges

  11. Search and Retrieval Challenges with Multimedia Science Information • Lack of written transcripts, i.e., no “full text” to search • Metadata, if available, is often minimal • Scientific, technical, and medical terminology/vocabulary • Videos can be long, often an hour or more

  12. OSTI and Microsoft Research Partnership • Video files collected from DOE’s National Laboratories • RSS feeds with metadata and URLs sent to Microsoft Research • Audio indexing performed via MAVIS • Audio index blob (AIB) returned to OSTI and integrated with its SQL servers • Users can search for a precise term within a video and be directed to the exact point where the term was spoken

  13. Demonstration of ScienceCinema

  14. Looking to the Future • Additional content from DOE researchers • Integration of multimedia search into WorldWideScience.org by June • High-quality automatic closed captions • Multilingual translation capabilities
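
The decoding rule on slide 4 can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the MAVIS recognizer: it scores a few hand-written candidate transcripts with made-up acoustic likelihoods p(o_1..o_T | W) and language-model priors P(W), then picks the argmax in log space. The candidate set and all scores are hypothetical; a real decoder searches a vast hypothesis space (typically with Viterbi or beam search) instead of enumerating a handful of candidates.

```python
import math

# Hypothetical candidate transcripts W with made-up scores, for illustration only:
# "acoustic" stands for p(o_1..o_T | W) and "lm" for the language-model prior P(W).
CANDIDATES = {
    ("hello", "world"): {"acoustic": 1e-5, "lm": 1e-3},
    ("hollow", "world"): {"acoustic": 2e-5, "lm": 1e-6},
    ("hello", "word"): {"acoustic": 8e-6, "lm": 5e-5},
}


def log_score(scores):
    """log p(O|W) + log P(W); summing logs avoids underflow on long utterances."""
    return math.log(scores["acoustic"]) + math.log(scores["lm"])


def decode(candidates):
    """Return the candidate W that maximizes p(O|W) * P(W)."""
    return max(candidates, key=lambda w: log_score(candidates[w]))


best = decode(CANDIDATES)
print(" ".join(best))  # hello world
```

Note that “hollow world” has the higher acoustic score, but its low language-model prior outweighs it; this is the role the grammar term plays in the formula.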
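
The word-level lattice indexing on slide 5 can be sketched as an inverted index whose postings keep every word alternative the recognizer considered, along with its start time and posterior probability. This is a minimal sketch under assumptions: the class `LatticeIndex`, the `(word, start_sec, posterior)` arc format, and the sample data are hypothetical and do not describe MAVIS’s actual index format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    video_id: str
    start_sec: float   # when the word starts in the video
    posterior: float   # recognizer's probability that this word was spoken there


class LatticeIndex:
    """Hypothetical inverted index over all word alternatives in recognition lattices."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add_lattice(self, video_id, arcs):
        """arcs: iterable of (word, start_sec, posterior), competing alternatives included."""
        for word, start_sec, posterior in arcs:
            self._postings[word.lower()].append(Posting(video_id, start_sec, posterior))

    def search(self, term, min_posterior=0.0):
        """Return postings for term, most probable first."""
        hits = [p for p in self._postings.get(term.lower(), [])
                if p.posterior >= min_posterior]
        return sorted(hits, key=lambda p: p.posterior, reverse=True)


index = LatticeIndex()
# At 12.4 s the 1-best transcript says "fission", but "fusion" is a close alternative.
index.add_lattice("talk-01", [
    ("fission", 12.4, 0.55),
    ("fusion", 12.4, 0.40),
    ("reactor", 13.1, 0.92),
])
for hit in index.search("fusion"):
    print(hit.video_id, hit.start_sec, hit.posterior)  # talk-01 12.4 0.4
```

A plain-text index of the 1-best transcript would contain only “fission” at 12.4 s, so a search for “fusion” would miss it; the lattice index still returns it, together with a timestamp to jump to.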
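
The four numbered workflow steps on slide 6 can be mocked up end to end. In this sketch, Python’s built-in `sqlite3` stands in for SQL Server, steps 1 and 2 (submitting the RSS feed and retrieving the AIB from Azure) are stubbed out, and the JSON layout of the audio index blob, the `postings` table, and the helper names are hypothetical choices for illustration; the slides do not describe the real AIB format or schema.

```python
import json
import sqlite3


def create_schema(conn):
    """Create a single postings table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE postings (video_id TEXT, word TEXT, start_sec REAL, posterior REAL)"
    )


def import_aib(conn, aib_json):
    """Step 3: import an audio index blob (hypothetical JSON layout) into SQL."""
    aib = json.loads(aib_json)
    rows = [
        (aib["video_id"], w["word"].lower(), w["start"], w["posterior"])
        for w in aib["words"]
    ]
    conn.executemany(
        "INSERT INTO postings (video_id, word, start_sec, posterior) VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()


def search(conn, term):
    """Step 4: return (video_id, start_sec, posterior) rows, most confident first."""
    return conn.execute(
        "SELECT video_id, start_sec, posterior FROM postings "
        "WHERE word = ? ORDER BY posterior DESC, start_sec",
        (term.lower(),),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_schema(conn)

# Steps 1-2 (submit the RSS feed, retrieve the AIB from Azure) are out of scope;
# this hand-written blob stands in for what step 2 would return.
sample_aib = json.dumps({
    "video_id": "talk-01",
    "words": [
        {"word": "neutron", "start": 30.2, "posterior": 0.88},
        {"word": "Neutron", "start": 95.0, "posterior": 0.61},
    ],
})
import_aib(conn, sample_aib)
print(search(conn, "neutron"))
```

Keeping recognition in the cloud and only the compact index on the content owner’s servers means search queries never touch the heavy audio processing.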
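
The last point on slide 12, directing users to the exact point where a term was spoken, amounts to turning each hit’s start time into a seekable link. The sketch below assumes the video player honors the W3C Media Fragments temporal syntax (`#t=<seconds>`); the slides do not say how ScienceCinema implements seeking, and the function `deep_links`, the example URL, and the two-second `lead_in_sec` padding are hypothetical.

```python
def deep_links(video_url, hits, lead_in_sec=2.0):
    """Return one seekable link per hit, in playback order.

    hits: iterable of (start_sec, posterior) pairs for the searched term.
    lead_in_sec: hypothetical padding so playback starts just before the word.
    """
    links = []
    for start_sec, _posterior in sorted(hits):
        offset = max(0.0, start_sec - lead_in_sec)  # never seek before the start
        # W3C Media Fragments temporal dimension: #t=<start in seconds>
        links.append(f"{video_url}#t={offset:.1f}")
    return links


hits = [(95.0, 0.61), (30.2, 0.88), (1.5, 0.70)]
for link in deep_links("https://example.org/videos/talk-01.mp4", hits):
    print(link)
```

Starting playback slightly before the word gives the listener some context, and clamping at zero keeps hits near the very start of a video valid.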
