Content of the Complete Course • Introduction to Multimedia Database Systems • Introduction to MPEG-7 • Introduction to MPEG-21 • Distributed Multimedia Database System
This course: • Introduction to all important subtopics: • Multimedia Systems • What does the user expect from an MMDBMS? • Data Types • MMDBMS and Retrieval / Difference from Queries • MMDBMS Indexing • Demo Systems • MPEG-7
Multimedia System Definition A computer hardware/software system used for • Acquiring and Storing • Managing • Indexing and Filtering • Manipulating (quality, editing) • Transmitting (multiple platforms) • Accessing large amounts of visual information such as images, video, graphics, and associated multimedia Examples: image and video databases, web media search engines, home media gateways, mobile media navigators, etc.
Multimedia Systems (cont.) • Why is it important? • Adoption of Digital Video • New Content Creation Tools • Deployment of High-Speed Networks • New Content and Services • Mobile Internet • 3D graphics, network games • Media portals • Standards becoming available: coding, delivery, and description
Multimedia Systems: User Needs • Access multimedia information anytime, anywhere, on any device, from any source • Network/device transparency • Quality of service (graceful degradation) • Intelligent tools and interfaces • Automated protection and transactions
User Needs: Example • Example: PDR (TiVo) Any-Time Paradigm http://www.tv-anytime.org • Time-shift, local storage: ~20 hours • Instant record, live pause, simultaneous record/playback • Search/retrieval, multi-source comparison/summarization, bookmarking -> indexing! (MPEG-7 contr.) • Personal profile, multi-user profiles • Targeted services, ads, consumer usage data • Pay per choice, e-commerce
Multimedia Information Retrieval (and Indexing) • Historically associated with MMDBMSs (1990-) • Multimedia information retrieval: • deals with the storage, retrieval, transport and presentation of different types of multimedia data (e.g., images, video clips, audio clips, texts, …) • there is a real need for managing multimedia data, including their retrieval • Multimedia information retrieval in general: • retrieval process: • queries • indexing the documents • matching document and query representations
Short description of some multimedia data types • Image: • storage: • encoded as a set of pixels or cell values • in compressed form to save space: e.g., GIF, JPEG • image shape descriptor: describes the geometric shape of the raw image: • a rectangular m-by-n grid of cells • each cell contains a pixel (= picture element) value that describes the cell content in one bit (black/white image) or more bits (gray-scale image, e.g., 8 bits; color image, e.g., 24 bits)
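To make the grid-of-cells view concrete, here is a minimal Python sketch (not part of the original slides) that loads an image and inspects it at the three bit depths mentioned above. It assumes Pillow and NumPy are installed; "photo.jpg" is a placeholder file name.

```python
# Minimal sketch: an image as an m-by-n grid of cell (pixel) values.
# Assumes Pillow and NumPy are installed; "photo.jpg" is a placeholder path.
from PIL import Image
import numpy as np

img = Image.open("photo.jpg")          # stored in compressed form (JPEG)

gray = np.asarray(img.convert("L"))    # 8 bits per cell: gray-scale grid
rgb  = np.asarray(img.convert("RGB"))  # 24 bits per cell: 3 x 8-bit channels
bw   = np.asarray(img.convert("1"))    # 1 bit per cell: black/white grid

m, n = gray.shape                      # the image shape descriptor: m x n
print(f"{m} x {n} grid, pixel (0,0): gray={gray[0,0]}, rgb={rgb[0,0]}")
```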
Short description of some multimedia data types • Video data = a stream of images (a sequence of frames) plus audio • frame = still image • presentation at specified rates per time unit • divided into video segments: • each segment: • is made up of a sequence of contiguous frames that include the same objects/activities = a semantic unit • and the corresponding audio phrases • identified by its starting and ending frames • stored in compressed form to save space: e.g., MPEG
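A possible way to represent such a segment in code is sketched below; the field names and the 25 fps frame rate are illustrative assumptions, not a standard layout.

```python
# Minimal sketch: a video segment identified by its starting and ending frames.
# Field names are illustrative, not part of any standard.
from dataclasses import dataclass, field

@dataclass
class VideoSegment:
    start_frame: int                 # first frame of the semantic unit
    end_frame: int                   # last frame of the semantic unit
    objects: list = field(default_factory=list)     # objects/activities it shows
    audio_phrases: list = field(default_factory=list)

    def duration(self, fps: float = 25.0) -> float:
        """Presentation time in seconds at the given frame rate."""
        return (self.end_frame - self.start_frame + 1) / fps

goal = VideoSegment(1200, 1450, objects=["player", "ball", "goal"])
print(goal.duration())               # 10.04 seconds at 25 frames per second
```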
Short description of some multimedia data types • Audio data: • speech, music, ... • can be structured in sequences: • characterized by tone, duration, … • when sequence contains speech: characteristics of a certain person’s voice: e.g., loudness, intensity, pitch and clarity • when sequence contains music: beat, pitch, chords, ...
Short description of some multimedia data types • Composite or mixed multimedia data (e.g. video and audio data): • may be physically mixed to yield a new storage format • or logically mixed while retaining original types and formats • additional control information describing how the information should be rendered • build up presentations (e.g., SMIL)
MMDBMS and Retrieval: What is that? A first attempt at a clearer meaning • Example: an insurance company's accident claim report as a multimedia object; it includes: • images of the accident • insurance forms with structured data • audio recordings of the parties involved in the accident • a text report of the insurance company's representative • Multimedia databases store (slightly) structured data and unstructured data • Multimedia retrieval systems must retrieve structured and unstructured data
MMDBMS and Retrieval (cont.) • Retrieval of structured data from databases: • typically handled by a Database Management System (DBMS) • the DBMS provides a query language (e.g., the Structured Query Language, SQL, for the relational data model) • deterministic matching of query and data • Retrieval of unstructured data from databases: • typically handled by an Information Retrieval (IR) system • similarity matching of uncertain query and document representations • result: a list of documents ranked by relevance
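The contrast between the two retrieval styles can be sketched roughly as follows; the table, columns, documents and feature vectors are invented for illustration, and the similarity measure (cosine) is just one common choice.

```python
# Sketch: deterministic DBMS matching vs. similarity-based IR matching.
# Table name, columns, documents and feature vectors are illustrative only.
import sqlite3
import math

# 1) Structured data: exact matching with SQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE claims (id INTEGER, claimant TEXT, year INTEGER)")
db.executemany("INSERT INTO claims VALUES (?, ?, ?)",
               [(1, "Smith", 2001), (2, "Jones", 2002)])
exact = db.execute("SELECT id FROM claims WHERE year = 2001").fetchall()

# 2) Unstructured data: rank documents by cosine similarity to the query vector.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

docs = {"report_1": [0.9, 0.1, 0.3], "report_2": [0.2, 0.8, 0.5]}
query = [1.0, 0.0, 0.2]
ranked = sorted(docs, key=lambda d: cosine(docs[d], query), reverse=True)

print(exact)    # [(1,)]                   -- deterministic result set
print(ranked)   # ['report_1', 'report_2'] -- relevance-ordered list
```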
MMDBMS and Retrieval (cont.) • Multimedia database management systems should combine Database Management System (DBMS) and Information Retrieval (IR) technology: • the data modeling capabilities of DBMSs with the advanced, similarity-based query capabilities of IR systems • Challenge = finding a data model that ensures: • effective query formulation and document representation • efficient storage • efficient matching • effective delivery
MMDBMS and Retrieval (cont.) • Query formulation: • must accommodate the information needs of users of multimedia systems • Document representations and their storage: • an appropriate modeling of the structure and content of the wide range of data in many different formats (= indexing) -> XML? -> MPEG-7 • cf. dealing with thousands of images, documents, audio and video segments, and free text • at the same time, modeling of physical properties for: • compression/decompression, synchronization, delivery -> MPEG-21
MMDBMS and Retrieval (cont.) • Matching of query and document representations: • taking into account the variety of attributes in query and document representations, and the relationships among them • combination of exact matching of structured data with uncertain matching of unstructured data • Delivery of data: • browsing, retrieval • temporal constraints of video and audio presentation • merging of data from different sources (e.g., in medical networks)
MMDBMS Queries 1) As in many retrieval systems, the user can browse and navigate through hyperlinks in addition to querying; this requires: • topic maps • summary descriptions of the multimedia objects 2) Queries specifying the conditions of the objects of interest • idea of a multimedia query language: • should provide predicates for expressing conditions on the attributes, structure and content (semantics) of multimedia objects
MMDBMS Queries (cont.) • attribute predicates: • concern the attributes of multimedia objects with an exact value (cf. traditional DB attributes): • e.g., date of a picture, name of a show • structural predicates: • temporal predicates to specify temporal synchronization: • for continuous media such as audio and video • for expressing temporal relationships between the frame representations of a single audio or video • e.g., “Find all the objects in which a jingle is playing for the duration of an image display”
MMDBMS Queries (cont.) • spatial predicates to specify spatial layout properties for the presentation of multimedia objects: • examples of predicates: contains, is contained in, intersects, is adjacent to • e.g., "Find all the objects containing an image overlapping the associated text" • temporal and spatial predicates can be combined: • e.g., "Find all the objects in which the logo of the car company is displayed, and when it disappears, a graphic showing the increase in the company's sales is shown in the same position where the logo was" • temporal and spatial predicates can: • refer to whole objects • refer to subcomponents of objects (with a data model that supports complex object representation)
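A rough sketch of how such temporal and spatial predicates could be evaluated over simple representations is shown below; representing time as (start, end) intervals in seconds and layout as axis-aligned boxes is an assumption made for illustration.

```python
# Sketch: evaluating temporal and spatial predicates on simple representations.
# Intervals are (start, end) in seconds, boxes are (x1, y1, x2, y2); both are
# illustrative stand-ins for a real presentation model.

def during(inner, outer):
    """Temporal predicate: the inner interval lies inside the outer interval."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def overlaps(a, b):
    """Spatial predicate: two axis-aligned boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

image_display = (10.0, 25.0)       # image shown from t=10s to t=25s
jingle        = (8.0, 30.0)        # jingle plays from t=8s to t=30s
image_box     = (0, 0, 200, 150)
text_box      = (150, 100, 400, 300)

# "a jingle is playing for the duration of an image display"
print(during(image_display, jingle))        # True: the jingle covers the display
# "an image overlapping the associated text"
print(overlaps(image_box, text_box))        # True
```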
MMDBMS Queries (cont.) • semantic predicates: • concern the semantic and unstructured content of the data involved • represented by the features that have been extracted and stored for each multimedia object • e.g., "Find all the objects containing the word OFFICE" or "Find all red houses" • uncertainty, proximity and weights can be expressed in the query • multimedia query language: • a structured language • users do not formulate queries in this language directly, but enter query conditions by means of interfaces • natural language queries? • the interface translates the query into the correct query syntax
MMDBMS Queries 3) Query by example: • e.g., video, audio • the query is composed by picking an example and choosing the features the object must comply with • e.g., in a graphical user interface (GUI): the user chooses the image of a house and domain features for the query: "Retrieve all houses of similar shape and different color" • e.g., music: a recorded melody, or a note sequence entered via the Musical Instrument Digital Interface (MIDI) 4) Question answering? • e.g., questioning video images: "How many helicopters were involved in the attack on Kabul of December 20, 2001?"
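A minimal sketch of the query-by-example idea, for the "similar shape and different color" query above; the feature vectors, distance measure and thresholds are invented for illustration.

```python
# Sketch: query by example -- "houses of similar shape and different color".
# Feature values and thresholds are invented for illustration.
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Each object: a shape descriptor vector and a dominant-color vector (RGB).
collection = {
    "house_1": {"shape": [0.8, 0.2, 0.5], "color": [200, 30, 30]},   # red
    "house_2": {"shape": [0.7, 0.3, 0.5], "color": [40, 60, 220]},   # blue
    "house_3": {"shape": [0.1, 0.9, 0.2], "color": [45, 70, 210]},   # blue, other shape
}
example = collection["house_1"]          # the user picks this image as the example

hits = [name for name, obj in collection.items()
        if name != "house_1"
        and dist(obj["shape"], example["shape"]) < 0.3       # similar shape
        and dist(obj["color"], example["color"]) > 100]      # different color
print(hits)                               # ['house_2']
```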
MMDBMS Example: Oracle's interMedia • Enables Oracle9i to manage rich content, including images, audio, and video, in an integrated fashion with other traditional business data. • With interMedia, developers can parse, index, and store rich content, develop content-rich Web applications, deploy rich content on the Web, and tune Oracle9i content repositories. • interMedia provides data management services to support the rich data types used in electronic commerce catalogs, corporate repositories, Web publishing, corporate communications and training, media asset management, and other internet, intranet, extranet, and traditional applications, in an integrated fashion • http://technet.oracle.com and more at the end of the course
MMDBMS Indexing • Remember: Indexing and Retrieval Systems. • Indexing = assigning or extracting features that will be used for unstructured and structured queries (unfortunately, this often refers only to low-level features) • Often also segmentation: detection of retrieval units • Two main approaches: • manual: • segmentation • indexing = naming of objects and their relationships with key terms (natural language or controlled language) • automatic analysis: • identify the mathematical characteristics of the contents • different techniques depending on the type of multimedia source (image, text, video, or audio) • possible manual correction
Indexing multimedia and features • a multimedia object is typically represented as a set of features (e.g., as a feature vector) • features can be weighted (expressing the uncertainty or significance of a value) • features can be stored and searched in an index tree • features have to be linked to the semantic content • …
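A small sketch of weighted feature vectors searched through an index tree; it uses SciPy's KDTree as one possible index structure (an assumption, SciPy must be installed), and applies the weights by scaling each dimension.

```python
# Sketch: weighted feature vectors searched via an index tree.
# Uses SciPy's KDTree as one possible index structure; weights and vectors
# are invented for illustration.
import numpy as np
from scipy.spatial import KDTree

features = np.array([   # one row of low-level features per multimedia object
    [0.9, 0.1, 0.4],
    [0.2, 0.8, 0.6],
    [0.85, 0.15, 0.5],
])
weights = np.array([1.0, 0.2, 0.5])   # significance of each feature dimension

# Scaling each dimension by its weight turns weighted Euclidean distance
# into plain Euclidean distance inside the tree.
tree = KDTree(features * weights)

query = np.array([0.9, 0.2, 0.45]) * weights
distances, indices = tree.query(query, k=2)
print(indices)        # the two most similar objects, e.g. [0 2]
```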
Indexing images • Automatic indexing of images: • segmentation into homogeneous segments: • a homogeneity predicate defines the conditions for automatically grouping the cells • e.g., in a color image, cells that are adjacent to one another and whose pixel values are close are grouped into a segment • indexing: recognition of objects and simple patterns: • recognition of low-level features: color histograms, textures, shapes (e.g., person, house), position • appearance features are often not important in retrieval
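As one concrete low-level feature, the sketch below computes a coarse 4 x 4 x 4 RGB color histogram (64 bins); Pillow and NumPy are assumed, and "house.jpg" is a placeholder file name.

```python
# Sketch: a coarse 4x4x4 RGB color histogram as a low-level image feature.
# Assumes Pillow and NumPy; "house.jpg" is a placeholder file name.
from PIL import Image
import numpy as np

def color_histogram(path, bins_per_channel=4):
    rgb = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    # Quantize each 8-bit channel into bins_per_channel levels.
    quantized = rgb // (256 // bins_per_channel)
    hist, _ = np.histogramdd(quantized, bins=(bins_per_channel,) * 3,
                             range=[(0, bins_per_channel)] * 3)
    return hist.flatten() / hist.sum()     # 64-dimensional, normalized

print(color_histogram("house.jpg").shape)  # (64,)
```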
Indexing audio • Automatic indexing of audio: • segmentation into sequences (= basic units for retrieval): often manual • indexing: • speech recognition and indexing of the resulting transcripts (cf. indexing for written text retrieval) • acoustic analysis (e.g., sounds, music, songs: melody transcription: note encoding, interval and rhythm detection, and chord information), translated into a string • e.g., key melody extraction (Tseng, 1999)
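A toy sketch of melody transcription into a searchable string of pitch intervals and coarse rhythm symbols; the encoding scheme is a simplification invented for illustration, not the one used by Tseng (1999).

```python
# Sketch: translating a melody into a searchable string of pitch intervals
# and coarse rhythm symbols. The encoding is a simplification for illustration.

def melody_to_string(notes):
    """notes: list of (midi_pitch, duration_in_beats) tuples."""
    symbols = []
    for (p1, d1), (p2, d2) in zip(notes, notes[1:]):
        interval = p2 - p1                       # pitch interval in semitones
        rhythm = "L" if d2 > d1 else ("S" if d2 < d1 else "E")  # longer/shorter/equal
        symbols.append(f"{interval:+d}{rhythm}")
    return " ".join(symbols)

# Opening of "Frere Jacques": C D E C, all quarter notes.
print(melody_to_string([(60, 1), (62, 1), (64, 1), (60, 1)]))
# +2E +2E -4E
```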
Indexing video • Automatic indexing of video: • segment: basic unit for retrieval • objects and activities identified in each video segment can be used to index the segment • segmentation: • detection of video shot breaks, camera motions • boundaries in the audio material (e.g., a different music tune, changes of speaker) • textual topic segmentation of audio transcripts and of closed captions (see below) • heuristic rules based on knowledge of: • the type-specific schematic structure of the video (e.g., documentary, sports) • certain cues: appearance of the anchor person in news => new topic
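One simple way to detect shot breaks is to compare gray-level histograms of consecutive frames; the sketch below does this on synthetic frames, and the threshold value is an illustrative assumption rather than a tuned parameter.

```python
# Sketch: detecting video shot breaks from histogram differences between
# consecutive frames. Frames are gray-scale NumPy arrays; the threshold is
# an illustrative value, not a tuned one.
import numpy as np

def shot_breaks(frames, threshold=0.4):
    breaks = []
    prev_hist = None
    for i, frame in enumerate(frames):
        hist, _ = np.histogram(frame, bins=64, range=(0, 256))
        hist = hist / hist.sum()
        if prev_hist is not None:
            # A large change in the gray-level distribution suggests a cut.
            if 0.5 * np.abs(hist - prev_hist).sum() > threshold:
                breaks.append(i)
        prev_hist = hist
    return breaks

# Two synthetic "shots": dark frames followed by bright frames.
frames = [np.full((120, 160), 30)] * 5 + [np.full((120, 160), 220)] * 5
print(shot_breaks(frames))    # [5] -- the cut between the two shots
```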
An example of indexing • Learning textual descriptions of images from surrounding text (Mori et al., 2000): • training: • images are segmented into image parts of equal size • feature extraction for each image part (by quantization): • 4 x 4 x 4 RGB color histogram • 8 directions x 4 resolutions intensity histogram • words that accompany the image are inherited by each image part: • words are selected from the text of the document that contains the image by selecting nouns and adjectives that occur with a frequency above a threshold • similar image parts are clustered based on their extracted features: • single-pass partitioning algorithm with a minimum similarity threshold value
An example of indexing • for each word wi and each cluster cj, P(wi|cj) is estimated as P(wi|cj) = mji / Mj, where mji = total frequency of word wi in cluster cj and Mj = total frequency of all words in cj • testing: • an unknown image is divided into parts and image features are extracted • for each part, the nearest cluster is found as the cluster whose centroid is most similar to the part • the average likelihood of all the words of the nearest clusters is computed • the k words with the largest average likelihood are chosen to index the new image (in the example k = 3) • Use of ImageText in MPEG-7 ??
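The testing step can be sketched as follows; the cluster centroids, word likelihoods and part features are invented toy values, only the procedure follows the description above.

```python
# Sketch of the labelling step described above: each image part votes via its
# nearest cluster, word likelihoods are averaged, and the top-k words are kept.
# Clusters, likelihoods, and part features are invented for illustration.
import math

def nearest(part, centroids):
    return min(centroids, key=lambda c: math.dist(part, centroids[c]))

centroids = {"c1": [0.1, 0.9], "c2": [0.8, 0.2]}
p_word_given_cluster = {           # P(wi|cj) = mji / Mj from the training step
    "c1": {"sky": 0.6, "house": 0.1, "grass": 0.3},
    "c2": {"sky": 0.1, "house": 0.7, "grass": 0.2},
}

parts = [[0.15, 0.85], [0.75, 0.25], [0.9, 0.1]]    # features of the test image parts
clusters = [nearest(p, centroids) for p in parts]   # ['c1', 'c2', 'c2']

words = p_word_given_cluster["c1"].keys()
avg = {w: sum(p_word_given_cluster[c][w] for c in clusters) / len(clusters)
       for w in words}
top_k = sorted(avg, key=avg.get, reverse=True)[:3]
print(top_k)     # ['house', 'sky', 'grass'] -- the k = 3 index words
```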
Demo Systems • Hermitage Museum Web Site (QBIC) http://hermitagemuseum.org/ http://hermitagemuseum.org/fcgi-bin/db2www/qbicColor.mac/qbic?selLang=English • Media Portal: WebSEEk http://www.ctr.columbia.edu/webseek/ • Video Search Engine: VideoQ http://www.ctr.columbia.edu/videoq • Geographical Application http://nayana.ece.ucsb.edu/M7TextureDemo/Demo/client/M7TextureDemo.html • http://www-db.stanford.edu/IMAGE/
QBIC features • Color: QBIC computes the average Munsell (Miyahara et al., 1988) coordinates of each object and image, plus a k-element color histogram (k is typically 64 or 256) that gives the percentage of the pixels in each image in each of the k colors. • Texture: QBIC's texture features are based on modified versions of the coarseness, contrast, and directionality features proposed in (Tamura et al., 1978). Coarseness measures the scale of the texture (pebbles vs. boulders), contrast describes the vividness of the pattern, and directionality describes whether the image has a favored direction or is isotropic (grass versus a smooth object). • Shape: QBIC has used several different sets of shape features. One is based on a combination of area, circularity, eccentricity, major axis orientation and a set of algebraic moment invariants. A second is the turning angles or tangent vectors around the perimeter of an object, computed from smooth splines fit to the perimeter. The result is a list of 64 turning-angle values.
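The turning-angle shape feature can be sketched as below; this version skips the spline fitting and works directly on perimeter points, resampling the result to 64 values.

```python
# Sketch of the turning-angle shape feature: angles between successive edges
# of an object's perimeter, resampled to a fixed-length list (64 values).
# This skips the spline fitting and works directly on perimeter points.
import numpy as np

def turning_angles(perimeter, n_values=64):
    pts = np.asarray(perimeter, dtype=float)
    edges = np.roll(pts, -1, axis=0) - pts             # vectors between neighbours
    headings = np.arctan2(edges[:, 1], edges[:, 0])    # direction of each edge
    turns = np.diff(headings, append=headings[:1])     # change of direction
    turns = (turns + np.pi) % (2 * np.pi) - np.pi      # wrap into (-pi, pi]
    idx = np.linspace(0, len(turns) - 1, n_values)
    return np.interp(idx, np.arange(len(turns)), turns)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(turning_angles(square)[:4])   # roughly pi/2 turns at the corners
```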
MPEG-7 • New standard of the Moving Picture Experts Group (03/06 2002) • MPEG-7 is a content description standard (not a new compression standard) aimed at maximal content accessibility • standardization of metadata for multimedia content and retrieval • including images, graphics, audio, speech, video and composition information • deals with technical features (e.g., color, shape, motion) as well as content features (e.g., facial expressions) • but standardizes only what is necessary, so that the description of the content can adapt to different users and application domains • – very important to us – here only a short intro