Video retrieval and User interaction and digital rights management. From Multimedia Retrieval, Springer, Blanken et al.
“Multimodal” is the keyword… • Based on a case study • Video recordings of Formula race cars • Fusion of multimodal information • Sound • Audio signal analysis to detect interesting events – when the commentator gets excited • At the beginning of an event, there is an overview by the commentator • They capture the audio signal and screen out the signal outside the voice range • They also look for specific words – not general voice recognition, but searching only for a handful of race-specific words
Fusion • Audio • Analysis of image stream • To catch start of race and other events • Used to locate time boundaries of isolatable events • Superimposed text • Projected on tv screen • Information on the driver • Driver’s place in race, etc.
Audio processing • Mix of human language, car noise, background noise, crowd cheering, horns • Look for human voice frequency • Short time energy (STE) • To remove noise • Wave form based • Pitch – fundamental frequency (F0), the higher, the more excitement in the voice • Search for phonemes • Pause rate – to detect quantity of speech • Keyword spotting – less semantics, but lower error rate
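A minimal sketch of three of the features listed above: short time energy, pause rate, and pitch (F0). The slides do not give formulas, so the definitions here are assumptions: STE is taken as mean squared amplitude per frame, pause rate as the fraction of low-energy frames, and F0 is estimated with a plain autocorrelation search. The function names, frame size, threshold, and the synthetic test signal are all hypothetical.

```python
import math

def short_time_energy(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies

def pause_rate(energies, threshold):
    """Fraction of frames whose energy is below the threshold."""
    if not energies:
        return 0.0
    return sum(1 for e in energies if e < threshold) / len(energies)

def estimate_f0(frame, sample_rate, f_min=80, f_max=400):
    """Pick the lag with the highest autocorrelation inside the voice range."""
    best_lag, best_value = None, float("-inf")
    for lag in range(sample_rate // f_max, sample_rate // f_min + 1):
        value = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag

# Synthetic input (assumption): four frames of a 200 Hz tone, then four of silence.
RATE = 8000
FRAME = 400
tone = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(4 * FRAME)]
signal = tone + [0.0] * (4 * FRAME)

energies = short_time_energy(signal, FRAME)
rate = pause_rate(energies, threshold=0.01)
f0 = estimate_f0(signal[:FRAME], RATE)
```

On real commentary the threshold would have to be tuned against car and crowd noise, and a rising F0 over successive frames would be the excitement cue the slide refers to.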
Image stream • Searched for places where commentator raised his voice • Searched histogram, looking for certain colors and shapes • Tracked the changing of colors and shapes over a series of frames • Focus on • Start of race • Passing • Fly-outs (sand and dust)
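One way to picture "tracking the changing of colors over a series of frames" is to compare the histograms of consecutive frames and flag large jumps. This is only a sketch under assumptions: the slides do not say how histograms are compared, so the L1 distance, the bin count, the threshold, and the toy grayscale frames below are all hypothetical.

```python
def histogram(pixels, bins=8, max_value=256):
    """Normalized histogram of grayscale pixel values (assumes a non-empty frame)."""
    counts = [0] * bins
    width = max_value // bins
    for p in pixels:
        counts[min(p // width, bins - 1)] += 1
    return [c / len(pixels) for c in counts]

def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (range 0 to 2)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def find_changes(frames, threshold=0.5):
    """Indices of frames whose histogram differs sharply from the previous frame."""
    changes = []
    previous = histogram(frames[0])
    for index in range(1, len(frames)):
        current = histogram(frames[index])
        if histogram_distance(previous, current) > threshold:
            changes.append(index)
        previous = current
    return changes

# Toy frames (assumption): flat dark and flat bright images of 100 pixels each.
dark = [20] * 100
bright = [230] * 100
frames = [dark, dark, bright, bright, dark]
changes = find_changes(frames)
```

A sudden burst of sand-colored pixels, as in a fly-out, would show up as this kind of histogram jump; detecting shapes would need more than a histogram.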
Text • Two classes • Scene text • Superimposed text • The same text can span many frames, and so they count on its position being fixed to limit processing time
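The fixed-position idea can be sketched as follows: crop only the region where the superimposed text is known to appear, and rerun recognition only when that region changes. The `recognize` callable stands in for a real text recognizer and is purely hypothetical, as are the tiny frames and the region coordinates.

```python
def crop(frame, top, left, height, width):
    """Return the fixed region of a frame as a hashable, comparable patch."""
    return tuple(tuple(row[left:left + width]) for row in frame[top:top + height])

def read_overlay(frames, region, recognize):
    """Recognize overlay text per frame, reusing the result while the patch is unchanged."""
    results = []
    last_patch = None
    last_text = None
    calls = 0
    for frame in frames:
        patch = crop(frame, *region)
        if patch != last_patch:
            last_text = recognize(patch)  # the expensive step, run only on change
            calls += 1
            last_patch = patch
        results.append(last_text)
    return results, calls

def make_frame(noise, label):
    """Toy frame (assumption): three rows of scene pixels, one row encoding the label."""
    return [[noise] * len(label) for _ in range(3)] + [[ord(c) for c in label]]

def fake_recognize(patch):
    """Hypothetical stand-in for a text recognizer: decodes the toy label row."""
    return "".join(chr(v) for v in patch[0])

frames = [make_frame(1, "P1"), make_frame(2, "P1"), make_frame(3, "P2")]
region = (3, 0, 1, 2)  # top, left, height, width of the overlay area
texts, calls = read_overlay(frames, region, fake_recognize)
```

The scene pixels change in every frame, but recognition runs only twice for three frames because the overlay region changes only once.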
Interaction • Ways to pose queries • Ways to give feedback • Ways to explore
Interaction types • Retrieval • Query formulation • Concept-based • Content-based • Concept-based • Keywords in natural language • People use different words for the same thing • Metadata is often missing • Easy for user, hard for software • Content-based • Query-by-example paradigm • User provides examples
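Query by example can be sketched as a nearest-neighbor ranking: the user's example is turned into a feature vector and the collection is sorted by distance to it. The slides do not specify features or a distance measure, so the two-dimensional vectors, the Euclidean distance, and the clip names below are assumptions for illustration only.

```python
import math

def distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def query_by_example(example, collection, k=2):
    """Return the names of the k items closest to the example vector."""
    ranked = sorted(collection.items(), key=lambda kv: distance(example, kv[1]))
    return [name for name, _ in ranked[:k]]

# Hypothetical clips with made-up feature vectors.
collection = {
    "start.mp4": [0.9, 0.1],
    "pit.mp4": [0.2, 0.8],
    "passing.mp4": [0.8, 0.3],
}
results = query_by_example([1.0, 0.0], collection)
```

This is the "hard for user, easy for software" counterpart to keyword queries: the software only compares numbers, but the user has to find a suitable example first.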
Dynamic query interaction • Sliders, buttons, etc. • Visual is the key • Of the query • Of the results • Example system, page 299 • Interaction cycle is short
Browsing • Links, with a feeling similar to using the web • Browsing model • To get impression of search space • To find something when you aren’t sure what it is • Browsing a collection of objects and browsing a single object • Browsing keywords or namespace hierarchy • Example on page 301
User input and relevance feedback • Modalities • Visual, audio, tactile • Or touch screen, electronic pen, camera, mic, eye tracker, locality sensor, mouse, keyboard • No user guide needed • If it is speech only, it is difficult to process • Multiple modalities at once • Such as speech and a map for location or distance • Use of ambient intelligence to collect information • Relevance feedback • Binary feedback • Weighted relevance feedback – image on page 305 • Personalization • Similar to 1-to-1 marketing concept • User profiles are used • Users not excited about providing profile info, though • Users are grouped into content interest groups
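The difference between binary and weighted relevance feedback can be shown with a simple vector update, loosely in the style of Rocchio feedback. The slides name no algorithm, so this update rule, the `rate` parameter, and the feature vectors are assumptions. Binary feedback is the special case where every weight is +1 (relevant) or -1 (not relevant).

```python
def update_query(query, feedback, rate=0.5):
    """Move the query vector toward or away from judged results.

    feedback is a list of (feature_vector, weight) pairs, with weight in [-1, 1].
    """
    if not feedback:
        return list(query)
    updated = list(query)
    for vector, weight in feedback:
        for i, value in enumerate(vector):
            # Each judgment contributes in proportion to its weight.
            updated[i] += rate * weight * value / len(feedback)
    return updated

query = [1.0, 0.0]

# Binary feedback: one result marked relevant, one marked not relevant.
binary = update_query(query, [([0.0, 1.0], 1.0), ([1.0, 0.0], -1.0)])

# Weighted feedback: the same results, but only mildly liked or disliked.
weighted = update_query(query, [([0.0, 1.0], 0.5), ([1.0, 0.0], -0.5)])
```

With weighted feedback the query moves half as far, which is the point of letting users grade results instead of only accepting or rejecting them.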
Feedback • Passive works well, like skipping songs on a feed • Making an offer that adds to a query works sometimes, like Amazon trying to sell you similar books • User profiles can be built automatically from a history of purchases or a clickstream • Filtering techniques • Content-based – based on triples • Attribute – value – fit • Title – War and Peace – 0 • Social-based – by putting people into groups, getting larger user samples, and putting profiles into groups
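A sketch of content-based filtering with attribute, value, fit triples, where the profile is built automatically from a purchase history. How "fit" is computed is not stated in the slides; here it is assumed to be the share of past items that carry the attribute value, with higher meaning a better match. The history and candidate items are made up.

```python
from collections import Counter

def build_profile(history):
    """Build (attribute, value, fit) triples from a list of past items."""
    counts = Counter()
    for item in history:
        for attribute, value in item.items():
            counts[(attribute, value)] += 1
    total = len(history)
    # fit = fraction of past items that had this attribute value (assumption)
    return [(a, v, n / total) for (a, v), n in counts.items()]

def score(item, profile):
    """Sum the fit of every profile triple that the item matches."""
    return sum(fit for a, v, fit in profile if item.get(a) == v)

# Hypothetical purchase history.
history = [
    {"genre": "novel", "author": "Tolstoy"},
    {"genre": "novel", "author": "Austen"},
]
profile = build_profile(history)

candidate = {"title": "War and Peace", "genre": "novel", "author": "Tolstoy"}
other = {"title": "A Cookbook", "genre": "cookbook", "author": "Someone"}
candidate_score = score(candidate, profile)
other_score = score(other, profile)
```

Social-based filtering would apply the same scoring to a profile pooled over a group of similar users instead of a single history.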
Presentation • Must provide metadata and data in an integrated way • Inherently multimedia in nature in query and response • Tree maps for complex metadata or data • Graphs to put multimedia objects together into single conceptual objects • Starfield display • Breaking videos into segments to aid non-linear searching • Providing sample frame for each segment • Images on pages 314, 315, and 316 • Key factors in presenting multimedia data – content adaptation • What capabilities the device has • Limits of device – like size, color, formats of data • Must often change formats of data to fit a device
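Content adaptation can be sketched as a function that takes a media item and a device description and returns what should actually be delivered. The `Device` fields and the fallback rule (use the device's first supported format) are assumptions made for this example and do not come from the slides.

```python
from dataclasses import dataclass

@dataclass
class Device:
    """Hypothetical description of a device's display limits."""
    max_width: int
    max_height: int
    color: bool
    formats: tuple

def adapt(width, height, fmt, device):
    """Scale an image down to fit the device and pick a supported format."""
    # Never scale up; shrink by the tighter of the two dimension limits.
    scale = min(1.0, device.max_width / width, device.max_height / height)
    target_format = fmt if fmt in device.formats else device.formats[0]
    return {
        "width": int(width * scale),
        "height": int(height * scale),
        "format": target_format,
        "grayscale": not device.color,
    }

# A small monochrome device that cannot show PNG (made-up values).
handheld = Device(max_width=480, max_height=320, color=False, formats=("jpeg", "gif"))
plan = adapt(1920, 1080, "png", handheld)
```

The same call with a large color display that supports the source format would return the item unchanged, which keeps the adaptation logic in one place.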
Digital rights • DRM (digital rights management) • Preventative approach • Encryption • Node locking • Dongle • Reactive approach • Embedding extra information in the product • Tracking behavior and looking for a violation • Sometimes called forensic tracking • Looking for specific watermarks, often specific to a given user • Makes it hard to pass content on • Application domains • Legal – concept: Personal Entertainment Domain (PED) • To keep content secure, commercially and intelligence-wise • Diagrams on pages 325, 326, and 331 • Sometimes the media is free and commercials are embedded
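Node locking, one of the preventative approaches above, can be illustrated by binding a license key to a machine identifier. This is a toy sketch, not how any particular DRM product works: the vendor secret, the node IDs, and the function names are hypothetical, and a real system would also need a trustworthy hardware identifier and protected key storage.

```python
import hashlib
import hmac

def issue_license(secret: bytes, node_id: str) -> str:
    """Derive a license key that is valid only for one node ID."""
    return hmac.new(secret, node_id.encode("utf-8"), hashlib.sha256).hexdigest()

def is_licensed(secret: bytes, node_id: str, license_key: str) -> bool:
    """Check a license key against the node it is presented on."""
    expected = issue_license(secret, node_id)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, license_key)

VENDOR_SECRET = b"example-secret"  # placeholder value for the sketch
key_for_a = issue_license(VENDOR_SECRET, "node-A")

valid_on_a = is_licensed(VENDOR_SECRET, "node-A", key_for_a)
valid_on_b = is_licensed(VENDOR_SECRET, "node-B", key_for_a)
```

Copying the key to another machine fails the check, which is the "hard to pass content on" effect; the reactive approaches instead leave the content usable and rely on watermarks to trace a leak afterwards.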