Searching Video
Ananth Sankar, Distinguished Engineer, Cisco, asankar@cisco.com
Outline
• Value of video to enterprise
• Video search today
• Audio/video analytics for video search/navigation
• Accuracy of analytics
• Summary
The Value of Video Content in the Enterprise
• Organizational & Executive Communication
• Training/Meetings
• Internal/External Events
• Marketing
Marketing: video is a valuable marketing asset. People who view videos on Cisco.com:
• View 44% more pages and are 41% more likely to return to Cisco.com
• Are five times more likely to click through on a blog post containing a video
• Are twice as likely to click through on an email containing a video
Training: in the first 2.5 years that Cisco used video content, it conducted 14,000 video training sessions. This:
• Saved $57M in travel costs for trainers
• Saved $21M in productivity time for the trainers
• Saved 62,000 hours of productivity time for attendees who didn’t have to travel to sessions
Organizational communication: 38% of videos on Cisco’s portal in 2010 provided organizational updates.
• Recorded communications allow broader reach to global teams
• Employees can interact with executives via comments
Events: recording sessions at events such as Cisco Live and the Annual Sales meeting has expanded audiences by thousands of attendees.
The Volume of Video Content is Growing
• Cisco employees view more than 85,000 videos on demand each month
• On YouTube alone, there are over 18 million “how-to” videos
• Over 4 million visitors per month take a lesson on KhanAcademy.com
Enterprise Video Content Length Going Beyond the Average Attention Span
• The average YouTube video is ~6.4 minutes long.
• TED Talks are 18 minutes long. “It’s long enough to be serious and short enough to hold people’s attention” - Chris Anderson, TED curator
• Enterprises generate videos that are 30 to 60 minutes long.
We Need New Ways to Effectively Engage with Video Content
• Find video content that is relevant to us
  • Today we rely on manual tagging and titles
  • Search isn’t effective
• Efficiently navigate video content
  • Today navigation is linear
Suppose we want to find this video based on content
Professor Ng talks about “parametric learning algorithms” in a Stanford lecture on machine learning

Extra terms must be added to find the video
Needed to add this term
Even then, we are left with 2 lectures to sort through

Linear Playback – Play, Fast Forward, Rewind
Can’t find “parametric learning algorithms” buried in the video!

How can we make video search better?
Automatically extract information and convert it to useful metadata
Information Contained In Videos
• Speakers
• Speech
• Text
• People
• Pictures or slides
• Behaviors
• Sentiments
• Events
• Landmarks
• Places
Extracting this information to create metadata enables much better video search and navigation.
Analytics System Overview
[Diagram labels: Metadata extraction (speech recognition, speaker recognition, slide detection, …); Index video with metadata; Index; Video portal and video player queries; Video pointer and metadata]
• Video ingested into analytics engine
  • Through recording workflow or the video portal
• Analytics engine processes video
  • Generates speaker and key-phrase metadata
• Augmented video available on portal
  • Indexed using metadata
  • Player also augmented with metadata
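One way to picture the "index video with metadata" step is an inverted index that maps each key phrase to the videos and time offsets where it was detected. The following is a minimal Python sketch under that assumption, not a description of the actual analytics engine; the function names, video IDs, and timestamps are all hypothetical.

```python
from collections import defaultdict


def build_index(metadata):
    """Build an inverted index from (video_id, start_seconds, phrase) tuples.

    Keys are lower-cased key phrases; values are lists of
    (video_id, start_seconds) pointers into the videos.
    """
    index = defaultdict(list)
    for video_id, start_seconds, phrase in metadata:
        index[phrase.lower()].append((video_id, start_seconds))
    return index


def search(index, query):
    """Return sorted (video_id, start_seconds) hits for an exact phrase match."""
    return sorted(index.get(query.lower(), []))


# Hypothetical key-phrase metadata, as a speech recognizer might emit it.
metadata = [
    ("ml-lecture-03", 754.0, "parametric learning algorithms"),
    ("ml-lecture-03", 1210.5, "locally weighted regression"),
    ("ml-lecture-02", 95.0, "gradient descent"),
]

index = build_index(metadata)
hits = search(index, "Parametric Learning Algorithms")
for video_id, start_seconds in hits:
    print(f"{video_id} at {start_seconds:.0f}s")
```

Because each hit carries a time offset as well as a video pointer, the same lookup could serve both search on the portal and non-linear navigation inside the player.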
Expected Speech Recognition Accuracy
Inaccurate recognition can still support some key-phrase applications
Many Factors Influence Accuracy
• Accents
  • Native
  • Non-native
• Domain-specific language
  • Vocabularies
  • Word patterns
• Speaking styles
  • Conversational
  • Presentation style
• Acoustic conditions
  • Clean
  • Noisy, reverberant
Speech Recognition Accuracy Improvement
• Keywords are important words within a domain
• Out-of-box model may not know these words
• Adapting the system to learn language for specific domains increases accuracy
• Adapted domain models can give > 70% precision and recall of keywords
• Challenges with adapting
  • Acquiring sample training data while maintaining privacy
  • Handling new vocabulary items, e.g., acronyms
  • Multiple accents within the same domain or customer
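Keyword precision and recall can be made concrete with a small calculation. The sketch below assumes a simple presence-based definition: a keyword counts as correct if it appears in both the reference transcript and the recognizer output. The slides do not say how the figures were measured, so this definition, the function name, and the example transcripts are illustrative assumptions.

```python
def keyword_precision_recall(reference, hypothesis, keywords):
    """Compute presence-based keyword precision and recall.

    reference:  human transcript (string)
    hypothesis: recognizer output (string)
    keywords:   domain key phrases to score
    """
    def found(text):
        # Normalize whitespace and case, then match whole phrases only.
        padded = " " + " ".join(text.lower().split()) + " "
        return {k for k in keywords if " " + k.lower() + " " in padded}

    ref_hits = found(reference)
    hyp_hits = found(hypothesis)
    correct = ref_hits & hyp_hits

    precision = len(correct) / len(hyp_hits) if hyp_hits else 0.0
    recall = len(correct) / len(ref_hits) if ref_hits else 0.0
    return precision, recall


# Hypothetical example: the recognizer gets two keywords right,
# misses one, and inserts one that was never spoken.
keywords = [
    "chemotherapy",
    "radiation treatment",
    "metastatic melanoma",
    "pancreatic cancer",
]
reference = "the patient began chemotherapy after radiation treatment for metastatic melanoma"
hypothesis = "the patient began chemotherapy after radiation treatment for pancreatic cancer"

precision, recall = keyword_precision_recall(reference, hypothesis, keywords)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the share of keywords the recognizer reported that were really spoken; recall is the share of spoken keywords that the recognizer found.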
How Adaptation Works
• Generic Vocabulary: Health care, Finance, Economy, Education
• Domain Vocabulary: Pancreatic cancer, Metastatic melanoma, Radiation treatment, Chemotherapy
[Diagram labels: Training: Training data, Training algorithm, Models. Adaptation: Domain data, Adaptation algorithm, Adapted models]
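The slide does not name the adaptation algorithm. As one common illustration of the idea, the sketch below linearly interpolates a generic unigram language model with one estimated from domain text, so that domain terms the generic model has never seen receive non-zero probability. The technique, the function names, the interpolation weight, and the toy word lists are assumptions for illustration, not the method used in the talk.

```python
from collections import Counter


def unigram_model(text):
    """Estimate unigram probabilities from whitespace-separated text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def adapt(generic, domain, weight=0.5):
    """Linearly interpolate two unigram models.

    weight is the share given to the domain model; 0.0 keeps the
    generic model unchanged and 1.0 uses only the domain model.
    """
    vocabulary = set(generic) | set(domain)
    return {
        word: (1 - weight) * generic.get(word, 0.0) + weight * domain.get(word, 0.0)
        for word in vocabulary
    }


# Toy training data standing in for generic and domain corpora.
generic = unigram_model("health care finance economy education finance economy")
domain = unigram_model("chemotherapy radiation treatment chemotherapy")

adapted = adapt(generic, domain, weight=0.5)

print("chemotherapy" in generic)   # the generic model does not know the word
print(adapted["chemotherapy"])     # the adapted model assigns it probability
```

The interpolation weight controls how far the adapted model moves toward the domain data; in practice it would be tuned on held-out domain text.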
What else is possible?
• Transcripts
• Closed captioning
• Topics
• Summaries
• Sentiments
• Translation
Closing
• Video provides a very rich experience, but is an opaque medium
• Keywords, phrases and speakers are examples of useful metadata
• Accuracy is impacted by the large variation in data
• Adaptation is a useful technique to improve accuracy
• Audio and video analytics make video as easy to find & navigate as text