
Multimedia Information: Concepts, Challenges, and Solutions

This lecture gives an overview of the major concepts, issues, and challenges in multimedia information organization and retrieval. It reviews current approaches to creating media metadata, then discusses new solutions, methodological considerations, and future work. It also introduces Prof. Marc Davis's research areas in digital media at UC Berkeley's SIMS. The course is taught by Prof. Ray Larson and Prof. Marc Davis.


Presentation Transcript


  1. SIMS 202: Information Organization and Retrieval • Lecture 07: Multimedia Information • Prof. Ray Larson & Prof. Marc Davis • UC Berkeley SIMS • Tuesday and Thursday, 10:30 am - 12:00 pm • Fall 2002 • http://www.sims.berkeley.edu/academics/courses/is202/f02/

  2. Last Time • Review • Dublin Core • Other Metadata Systems • Controlled Vocabularies • Name Authority Files • Choice of Names • Form of Names • Other Types of Controlled Vocabularies • Faceted vs. Hierarchic Organization of Vocabularies

  3. Hierarchical Classification • Each category is successively broken down into smaller and smaller subdivisions • No item occurs in more than one subdivision • Each level divided out by a “character of division” (also known as a feature) • Example: • Distinguish “Literature” based on: • Language • Genre • Time Period Slide author: Marti Hearst

  4. Hierarchical Classification • [Tree diagram] • Root: Literature • Level 1 (language): English, French, Spanish, ... • Level 2 (genre, under each language): Prose, Poetry, Drama, ... • Level 3 (period, under each genre): 16th, 17th, 18th, 19th Slide author: Marti Hearst

  5. Labeled Categories for Hierarchical Classification • LITERATURE • 100 English Literature • 110 English Prose • 111 English Prose 16th Century • 112 English Prose 17th Century • 113 English Prose 18th Century • ... • 120 English Poetry • 121 English Poetry 16th Century • 122 English Poetry 17th Century • ... • 130 English Drama • 131 English Drama 16th Century • … • 200 French Literature Slide author: Marti Hearst

  6. Faceted Classification • Create a separate, free-standing list for each characteristic or division (feature) • Combine features to create a classification Slide author: Marti Hearst

  7. Faceted Classification • Facet A, Language: a English, b French, c Spanish • Facet B, Genre: a Prose, b Poetry, c Drama • Facet C, Period: a 16th Century, b 17th Century, c 18th Century, d 19th Century • Combined codes: • Aa English Literature • AaBa English Prose • AaBaCa English Prose 16th Century • AbBbCd French Poetry 19th Century • BcCd Drama 19th Century Slide author: Marti Hearst
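The facet notation on this slide can be decoded mechanically. The sketch below is an illustration only: the facet tables are copied from the slide, while the function names `decode_facets` and `label` are hypothetical. It assumes every code is a sequence of two-character pairs (an uppercase facet letter followed by a lowercase value letter).

```python
# Facet tables copied from the slide; function names are hypothetical.
FACETS = {
    "A": ("Language", {"a": "English", "b": "French", "c": "Spanish"}),
    "B": ("Genre", {"a": "Prose", "b": "Poetry", "c": "Drama"}),
    "C": ("Period", {"a": "16th Century", "b": "17th Century",
                     "c": "18th Century", "d": "19th Century"}),
}


def decode_facets(code):
    """Split a code such as 'AaBaCa' into (facet name, value) pairs."""
    if len(code) % 2 != 0:
        raise ValueError(f"malformed facet code: {code!r}")
    pairs = []
    for i in range(0, len(code), 2):
        facet_key, value_key = code[i], code[i + 1]
        facet_name, values = FACETS[facet_key]  # KeyError on an unknown facet
        pairs.append((facet_name, values[value_key]))
    return pairs


def label(code):
    """Join the decoded values into a readable class label."""
    return " ".join(value for _, value in decode_facets(code))


print(label("AaBaCa"))  # English Prose 16th Century
print(label("AbBbCd"))  # French Poetry 19th Century
```

Because each facet is a free-standing list, adding a value (say, a fourth language) needs one new table entry, whereas the hierarchical scheme would need a new subtree of genre and period classes.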

  8. Today’s Lecture Goals • Overview of major concepts, issues, and challenges for multimedia information • Introduction to some of my research areas in digital media at SIMS • Not a survey of existing systems • Not an in-depth discussion of algorithms for multimedia indexing and retrieval • For more breadth and depth, talk to me and take “IS 246: Multimedia Information” next semester

  9. Lecture 07: Multimedia Information • Problem Setting • Representing Media • Current Approaches • New Solutions • Methodological Considerations • Future Work

  10. Lecture 07: Multimedia Information • Problem Setting • Representing Media • Current Approaches • New Solutions • Methodological Considerations • Future Work

  11. Marc Davis Research • Creating technology and applications that will enable daily media consumers to become daily media producers • Research and teaching in the theory, design, and development of digital media systems for creating and using media metadata to automate media production and reuse

  12. Global Media Network • Digital media produced anywhere by anyone accessible to anyone anywhere • Today’s media users become tomorrow’s media producers • Not 500 Channels — 500,000,000 multimedia Web Sites

  13. What is the Problem? • Today people cannot easily create, find, edit, share, and reuse media • Computers don’t understand media content • Media is opaque and data rich • We lack structured representations • Without content representation (metadata), manipulating digital media will remain like word-processing with bitmaps

  14. Technology Goals • Goals • Increase access to media content • Decrease effort in media handling and reuse • Improve usefulness of media content • Technology • Create metadata about media content • Use metadata to manipulate media

  15. Types of Multimedia Data • 1D • Audio (speech, music, sound effects, etc.) • MIDI • 2D • Photographs • Graphics • 3D • Video (2D + Time) • Animation (2D + Time) • Computer graphic models • 4D • Computer graphic model animation (3D + Time)

  16. Chang: Content-Based Media Analysis • “Traditional views of content-based technologies focus on search and retrieval—which is important but relatively narrow.” • “[…] emphasizing the end-to-end content chain and the many issues evolving around it. What’s the best way to integrate manual and automatic solutions in different parts of the chain?”

  17. Media Production Chain PRE-PRODUCTION PRODUCTION POST-PRODUCTION DISTRIBUTION

  18. Chang: Content-Based Media Analysis • Areas of research • Reverse engineering of the media capturing and editing processes • Extracting and matching objects • Meaning decoding and automatic annotation • Analysis and retrieval with user feedback • Generating time-compressed skims

  19. Chang: Content-Based Media Analysis • Impact criteria • Generating metadata not available from production • Providing metadata that humans aren’t good at generating • Focusing on content with large volume and low individual value • Adopting well-defined tasks and performance metrics

  20. Lecture 07: Multimedia Information • Problem Setting • Representing Media • Current Approaches • New Solutions • Methodological Considerations • Future Work

  21. Representing Video • Streams vs. Clips • Video syntax and semantics • Ontological issues in video representation

  22. Video is Temporal

  23. Streams vs. Clips

  24. Stream-Based Representation • Makes annotation pay off • The richer the annotation, the more numerous the possible segmentations of the video stream • Clips • Change from being fixed segmentations of the video stream, to being the results of retrieval queries based on annotations of the video stream • Annotations • Create representations which make clips, not representations of clips
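A minimal sketch of the stream-based idea, under stated assumptions: annotations are time-stamped descriptors on one continuous stream, and a "clip" is whatever span a query returns. The `Annotation` class, the `find_clips` function, and the sample data are all hypothetical and are not taken from any system described in the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    descriptor: str
    start: float  # seconds from the beginning of the stream
    end: float


def find_clips(annotations, required):
    """Return (start, end) spans during which every required descriptor holds."""
    relevant = [a for a in annotations if a.descriptor in required]
    # Only annotation boundaries can start or end a matching span.
    points = sorted({p for a in relevant for p in (a.start, a.end)})
    clips = []
    for lo, hi in zip(points, points[1:]):
        active = {a.descriptor for a in relevant if a.start <= lo and a.end >= hi}
        if set(required) <= active:
            if clips and clips[-1][1] == lo:
                clips[-1] = (clips[-1][0], hi)  # extend a span that touches
            else:
                clips.append((lo, hi))
    return clips


# Hypothetical annotations layered on one 30-second stream.
STREAM = [
    Annotation("Jack", 0.0, 30.0),
    Annotation("walking", 5.0, 20.0),
    Annotation("waving", 10.0, 15.0),
    Annotation("waving", 25.0, 28.0),
]

print(find_clips(STREAM, {"Jack", "walking"}))  # [(5.0, 20.0)]
print(find_clips(STREAM, {"Jack", "waving"}))   # [(10.0, 15.0), (25.0, 28.0)]
```

No clip boundaries are stored anywhere: each query yields a different segmentation of the same stream, and every added annotation increases the number of segmentations that can be retrieved.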

  25. Video Syntax and Semantics • The Kuleshov Effect • Video has a dual semantics • Sequence-independent invariant semantics of shots • Sequence-dependent variable semantics of shots

  26. Ontological Issues for Video • Video plays with rules for identity and continuity • Space • Time • Character • Action

  27. Space and Time: Actual vs. Inferable • Actual Recorded Space and Time • GPS • Studio space and time • Inferable Space and Time • Establishing shots • Cues and clues

  28. Lecture 07: Multimedia Information • Problem Setting • Representing Media • Current Approaches • New Solutions • Methodological Considerations • Future Work

  29. The Search for Solutions • Current approaches to creating metadata don’t work • Signal-based analysis • Keywords • Natural language • Need standardized metadata framework • Designed for video and rich media data • Human and machine readable and writable • Standardized and scalable • Integrated into media capture, archiving, editing, distribution, and reuse

  30. The Semantic Gap • “[…] the semantic gap between the rich meaning that users want when they query and browse media and the shallowness of the content descriptions that we can actually compute is weakening today’s automatic content-annotation systems.” • Dorai and Venkatesh, “Computational Media Aesthetics: Finding Meaning Beautiful”

  31. Signal-Based Parsing • Practical problem • Parsing unstructured, unknown video is very, very hard • Theoretical problem • Mismatch between percepts and concepts

  32. Perceptual/Conceptual Issue • Similar Percepts / Dissimilar Concepts • Clown nose • Red sun

  33. Perceptual/Conceptual Issue • Dissimilar Percepts / Similar Concepts • John Dillinger’s car • Timothy McVeigh’s car

  34. Signal-Based Parsing • Effective and useful automatic parsing • Video • Scene break detection • Camera motion analysis • Facial recognition • Feature tracking • Low level visual similarity • Audio • Pause detection • Audio pattern matching • Simple speech recognition • Approaches to automated parsing • At the point of capture, integrate the recording device, the environment, and agents in the environment into an interactive system • After capture, use “human-in-the-loop” algorithms to leverage human and machine intelligence

  35. Keywords vs. Semantic Descriptors dog, biting, Steve

  36. Keywords vs. Semantic Descriptors dog, biting, Steve

  37. Why Keywords Don’t Work • Are not a semantic representation • Do not describe relations between descriptors • Do not describe temporal structure • Do not converge • Do not scale

  38. Natural Language vs. Visual Language Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking.

  39. Natural Language vs. Visual Language Jack, an adult male police officer, while walking to the left, starts waving with his left arm, and then has a puzzled look on his face as he turns his head to the right; he then drops his facial expression and stops turning his head, immediately looks up, and then stops looking up after he stops waving but before he stops walking.

  40. Notation for Time-Based Media: Music

  41. Visual Language Advantages • A language designed as an accurate and readable representation of time-based media • For video, especially important for actions, expressions, and spatial relations • Enables Gestalt view and quick recognition of descriptors due to designed visual similarities • Supports global use of annotations

  42. Retrieving Video • Query: • Retrieve a video segment of “a hammer hitting a nail into a piece of wood” • Sample results: • Video of a hammer hitting a nail into a piece of wood • Video of a hammer, a nail, and a piece of wood • Video of a nail hitting a hammer, and a piece of wood • Video of a sledgehammer hitting a spike into a railroad tie • Video of a rock hitting a nail into a piece of wood • Video of a hammer swinging • Video of a nail in a piece of wood

  43. Types of Video Similarity • Low-level numeric features • Color • Motion • Blobs • Semantic • Similarity of descriptors • Relational • Similarity of relations among descriptors in compound descriptors • Temporal • Similarity of temporal relations among descriptors and compound descriptors

  44. Retrieval Examples to Think With • “Video of a hammer, a nail, and a piece of wood” • Exact semantic and temporal similarity, but no relational similarity • “Video of a nail hitting a hammer, and a piece of wood” • Exact semantic and temporal similarity, but incorrect relational similarity • “Video of a sledgehammer hitting a spike into a railroad tie” • Approximate semantic similarity of the subject and objects of the action and exact semantic similarity of the action; and exact temporal and relational similarity • “Video of a hammer swinging” cut to “Video of a nail in a piece of wood”
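The distinctions in these examples can be made concrete with a small sketch. Everything here is an assumption made for illustration: a description is reduced to a set of object descriptors plus (subject, action, object) triples, the `BROADER` table of "is a kind of" links is invented, and temporal similarity is left out entirely. The query's "into a piece of wood" is also simplified away, so only the hitting relation is modeled.

```python
from dataclasses import dataclass

# Hypothetical "is a kind of" links, invented for this illustration.
BROADER = {"sledgehammer": "hammer", "spike": "nail", "railroad tie": "wood"}


def generalize(term):
    """Map a descriptor to its broader term, if one is listed."""
    return BROADER.get(term, term)


@dataclass(frozen=True)
class Description:
    objects: frozenset    # descriptors for the things in the video
    relations: frozenset  # (subject, action, object) triples


def semantic_similarity(query, candidate):
    if candidate.objects == query.objects:
        return "exact"
    if {generalize(o) for o in candidate.objects} == set(query.objects):
        return "approximate"
    return "none"


def relational_similarity(query, candidate):
    if not candidate.relations:
        return "none"
    generalized = {(generalize(s), a, generalize(o))
                   for s, a, o in candidate.relations}
    return "exact" if generalized == set(query.relations) else "incorrect"


QUERY = Description(frozenset({"hammer", "nail", "wood"}),
                    frozenset({("hammer", "hitting", "nail")}))
PARTS_ONLY = Description(frozenset({"hammer", "nail", "wood"}), frozenset())
REVERSED = Description(frozenset({"hammer", "nail", "wood"}),
                       frozenset({("nail", "hitting", "hammer")}))
SLEDGE = Description(frozenset({"sledgehammer", "spike", "railroad tie"}),
                     frozenset({("sledgehammer", "hitting", "spike")}))

for name, cand in [("parts only", PARTS_ONLY), ("reversed", REVERSED),
                   ("sledgehammer", SLEDGE)]:
    print(name, semantic_similarity(QUERY, cand),
          relational_similarity(QUERY, cand))
```

A keyword index sees only the `objects` sets, so it cannot tell the first two candidates apart from the query; the relation triples are what separate them.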

  45. What is Retrieval For? • Redefine retrieval task as part of a larger user goal • Using a recipe • Getting to a location • Making a video greeting • Smoliar: Rethinking information organization and retrieval • Context • Form • Content

  46. Lecture 07: Multimedia Information • Problem Setting • Representing Media • Current Approaches • New Solutions • Methodological Considerations • Future Work

  47. New Solutions for Creating Metadata • After Capture • During Capture

  48. Evolution of Media Production • Customized production • Skilled creation of one media product • Mass production • Automatic replication of one media product • Mass customization • Skilled creation of adaptive media templates • Automatic production of customized media

  49. Editing Paradigm Has Not Changed

  50. Central Idea: Movies as Programs • [Diagram with components: Media, Parser, Content Representation, Producer] • Movies change from being static data to programs • Shots are inputs to a program that computes new media based on content representation and functional dependency (US Patents 6,243,087 & 5,969,716)
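One way to picture "movies as programs" is a template whose slots are filled from annotated shots. The sketch below is a loose illustration under that assumption, not the method of the cited patents; the `assemble` function, the template, and the shot names and annotations are all hypothetical.

```python
def assemble(template, shots):
    """Fill each template slot with the first unused shot whose annotations satisfy it."""
    used, sequence = set(), []
    for required in template:
        for name, annotations in shots.items():
            if name not in used and required <= annotations:
                used.add(name)
                sequence.append(name)
                break
        else:
            # No remaining shot carries all the descriptors this slot needs.
            raise LookupError(f"no shot satisfies {sorted(required)}")
    return sequence


# Hypothetical template: each slot lists the descriptors a shot must carry.
TEMPLATE = [{"establishing"}, {"person", "waving"}, {"person", "smiling"}]

# Hypothetical shots with their content representation (annotation sets).
SHOTS = {
    "beach_wide": {"establishing", "beach"},
    "anna_smiles": {"person", "smiling", "close-up"},
    "anna_waves": {"person", "waving"},
}

print(assemble(TEMPLATE, SHOTS))  # ['beach_wide', 'anna_waves', 'anna_smiles']
```

The same template run over a different set of annotated shots yields a different movie, which is the sense in which the shots act as inputs and the movie as the computed output.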
