This chapter explores the application of Singular Value Decomposition (SVD) in media analysis, specifically in the context of the MPESAR system. It also discusses different approaches to segmentation and semantic retrieval in multimedia. Common analysis tools, such as color space and word space, are explored, along with their principles and techniques.
Understanding The Semantics of Media Chapter 8 Camilo A. Celis
Questions 1. What kinds of applications does SVD have? How is it used in this paper? 2. What does MPESAR stand for? What does this system do? 3. How does MPESAR generally work?
Contents Understanding the problem Analysis Tools Segmenting Video Semantic Retrieval
Contents Understanding the problem Different Approaches Segmentation Literature Semantic Retrieval Literature Analysis Tools Segmenting Video Semantic Retrieval
Understanding the problem Semantics: (n.) the study of meaning; the study of linguistic development by classifying and examining changes in meaning and form. Rapid growth of media: personal media, social media... Low price. Social pressure. We are not understanding media.
Different Approaches Increasing number of methods to retrieve information from media. [Aner and Kender] Finds a background in a video shot, and then clusters shots into physical scenes by noting shots with a common background. QBIC (IBM) Allows searching for images based on the colors and other visual content of an image. Known as query-by-example. • Where is the semantics of the media? • "The most important information is in the WORDS!"
Segmentation Literature Extension of others' work. Latent semantic indexing (LSI): summarizes the semantic content of a document and measures similarities. Visualization and segmentation algorithm based on wavelet analysis of text documents (time and frequency). Scale-space ideas applied to the segmentation problem. Multi-dimensional signals.
Semantic Retrieval Literature Multimedia retrieval systems (audio and video). Mixture of probability experts for semantic-audio retrieval (MPESAR) is a sophisticated model connecting words and media. It considers the acoustic and semantic similarity of sounds, allowing users to retrieve sounds without searching for an exact word. "MPESAR algorithm is appropriate for mapping one type of media to another."
Contents Understanding the problem Analysis Tools SVD Principles Color Space Word Space Segmenting Video Semantic Retrieval
Analysis Tools Common tools and mathematics used to analyze multimedia signals. Two types of transformations, which reduce raw text and video signals into meaningful spaces. Preprocessing the data. * Mapping from a one-dimensional signal (speech) into a multidimensional signal (video).
Analysis Tools: SVD SVD (Singular Value Decomposition) principles: Factorization of real or complex matrices. Noise reduction. Semantic and video data are expressed as vector-valued functions of time. Collect data from an entire video and put the data into a matrix X. (Columns of X represent the signal at different times.) Using SVD, rewrite the matrix X in terms of 3 matrices U, S, V.
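The SVD step on this slide can be sketched as follows. This is a minimal illustration, assuming NumPy and a small synthetic matrix X whose columns are the signal at different times; the function name `reduce_with_svd` and the rank `k` are hypothetical and not taken from the chapter.

```python
import numpy as np

def reduce_with_svd(X, k):
    """Return the top-k factors of X and the rank-k approximation of X."""
    # Compact factorization: X = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    X_k = U_k @ np.diag(s_k) @ Vt_k
    return U_k, s_k, Vt_k, X_k

rng = np.random.default_rng(0)
# Synthetic data (an assumption): a rank-2 signal, 8 features x 50 time steps, plus small noise.
signal = rng.normal(size=(8, 2)) @ rng.normal(size=(2, 50))
X = signal + 0.01 * rng.normal(size=(8, 50))

U_k, s_k, Vt_k, X_k = reduce_with_svd(X, k=2)
# Each column of `reduced` is a 2-dimensional summary of the signal at one time step.
reduced = np.diag(s_k) @ Vt_k
```

Keeping only the largest singular values is what gives the noise reduction mentioned on the slide: the discarded directions carry mostly the small noise term.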
Analysis Tools: Color Space Color changes are useful metrics for finding the boundary between shots. Collect a histogram of the colors of each frame (512 histogram bins). Convert the three RGB intensities (0-255) to a single histogram bin by finding the log base 2 of each intensity value. Pack the three colors into a 9-bit number, using floor() to convert to an integer.
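A hedged sketch of the binning described on this slide. Two details are assumptions, since the slide does not state them: an intensity of 0 is clamped to 1 before taking the logarithm (log2 of 0 is undefined), and red occupies the high bits of the 9-bit number. The function names are hypothetical.

```python
import math

def color_bin(r, g, b):
    """Map 8-bit RGB intensities to one of 512 histogram bins (a 9-bit number)."""
    def log_level(v):
        # floor(log2(v)) is 0..7 for v in 1..255, i.e. 3 bits per channel.
        # Assumption: v = 0 is clamped to 1.
        return int(math.floor(math.log2(max(v, 1))))
    # Assumption: red in the high bits, then green, then blue.
    return (log_level(r) << 6) | (log_level(g) << 3) | log_level(b)

def frame_histogram(pixels):
    """Count how many (r, g, b) pixels of one frame fall into each of the 512 bins."""
    hist = [0] * 512
    for r, g, b in pixels:
        hist[color_bin(r, g, b)] += 1
    return hist

# Tiny made-up "frame" of three pixels.
hist = frame_histogram([(255, 255, 255), (0, 0, 0), (1, 1, 1)])
```

Because of the logarithm, dark intensities get finer bins than bright ones, and each channel needs only 3 bits, which is where the 512 bins come from.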
Analysis Tools: Word Space Latent semantic indexing (LSI) uses an SVD in direct analogy to the color analysis. Analyse the audio data by collecting a histogram of the words in a transcript of the video. Only one document to study. Consider sentences of the document, which define a semantic space. Issues? Synonymy and polysemy. SVD captures both relationships.
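The word-space idea can be illustrated with a toy transcript. This is a sketch under assumptions: the four sentences are invented, whitespace tokenization stands in for real transcript processing, and `k = 2` is an arbitrary choice of dimensionality.

```python
import numpy as np

def sentence_word_matrix(sentences):
    """Build a word-by-sentence count matrix (one word histogram per column)."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            X[index[w], j] += 1
    return vocab, X

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Invented sentences: two about driving, two about the market.
sentences = [
    "the car drove down the road",
    "the automobile drove down the street",
    "the market fell on bad news",
    "bad news hit the market",
]
vocab, X = sentence_word_matrix(sentences)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
coords = (np.diag(s[:k]) @ Vt[:k, :]).T  # one k-dimensional point per sentence

same_topic = cosine(coords[0], coords[1])
different_topic = cosine(coords[0], coords[2])
```

In this toy case the two driving sentences end up close together in the reduced space even though "car" and "automobile" never appear in the same sentence, which is the synonymy effect the slide refers to.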
Contents Understanding the problem Analysis Tools Segmenting Video Temporal Properties Video Segmentation Overview Scale Space Combined Image and Audio Data Hierarchical Segmentation Results Semantic Retrieval
Segmenting Video Indexing by combining two major sources of data: images and words. Describe the semantic path of a video's transcript as a signal, from the initial sentence to the conclusion. Instead of trying to find similarities (segments), see audio-visual content as a signal and look for large changes in this signal.
Scale Space Used to find boundaries in a signal. Analyse a signal with many different kernels that vary in the size of the temporal neighborhood included in the analysis at each point in time. Look for changes in the signal over time. (Do so by calculating the derivative of the signal with respect to time.)
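A minimal sketch of this scale-space idea on a one-dimensional signal, assuming NumPy. The Gaussian kernel, the set of scales, and the step-shaped test signal are illustrative assumptions; the chapter applies the same idea to multi-dimensional signals.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized Gaussian kernel covering about three standard deviations each side."""
    radius = int(np.ceil(3 * sigma))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-(t ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def scale_space_derivative(signal, sigmas):
    """Return |d/dt| of the signal smoothed at each scale; shape (len(sigmas), len(signal))."""
    rows = []
    for sigma in sigmas:
        k = gaussian_kernel(sigma)
        radius = len(k) // 2
        # Repeat the end values so the smoothed signal keeps its original length.
        padded = np.pad(signal, radius, mode="edge")
        smooth = np.convolve(padded, k, mode="valid")
        rows.append(np.abs(np.gradient(smooth)))
    return np.array(rows)

# Test signal (an assumption): a single boundary halfway through.
signal = np.concatenate([np.zeros(50), np.ones(50)])
deriv = scale_space_derivative(signal, sigmas=[1.0, 2.0, 4.0, 8.0])
boundaries = deriv.argmax(axis=1)  # location of the strongest change at each scale
```

A larger sigma means a wider temporal neighborhood: the derivative peak gets lower and broader, but for a true boundary it stays at the same place across scales.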
Overall Form a hierarchical segmentation and compare it with other forms of segmentation. A simple description of a video is possible by unifying the representations. Combine two well-known techniques to find boundaries in a video. Reduce dimensionality with SVD so that the color and word data share the same format.
Combining color, words and scale space The result is a 20-dimensional vector function of time and scale.
Results (cont.) Representations of the semantic information in the Headline News video in scale space. The top image shows the cosine of the angular change of the semantic trajectory with different amounts of low-pass filtering. The middle plot shows the peaks of the scale-space derivative. The bottom plot shows the peaks traced back to their original starting point. These peaks represent topic boundaries.
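One plausible reading of "cosine of the angular change" is the cosine of the angle between the trajectory's vectors at consecutive time steps. The sketch below assumes that reading, assumes NumPy, uses an invented 2-D trajectory, and leaves out the low-pass filtering that the figure varies.

```python
import numpy as np

def cosine_of_angular_change(trajectory):
    """trajectory: array of shape (T, d). Returns T-1 cosines between consecutive vectors."""
    a, b = trajectory[:-1], trajectory[1:]
    dots = np.sum(a * b, axis=1)
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dots / norms

# Hypothetical trajectory: stays near one direction, then jumps to an orthogonal one.
trajectory = np.array([[1.0, 0.0], [2.0, 0.1], [1.5, 0.0], [0.0, 1.0], [0.1, 2.0]])
cosines = cosine_of_angular_change(trajectory)
likely_boundary = int(np.argmin(cosines))  # lowest cosine = largest change in direction
```

A cosine near 1 means the semantic direction barely moved between two steps; a dip toward 0 marks a candidate topic boundary.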
Segmentation in Perspective New framework for combining multiple types of information from a video into a unified representation and for segmenting it. Described hierarchical segmentation. (Unexpectedly) good amount of information in the color. This method is also applicable to other types of information (musical key, audio emotion, etc.).
Contents Understanding the problem Analysis Tools Segmenting Video Semantic Retrieval The algorithm Testing Conclusions
Semantic Retrieval: MPESAR Connecting sounds to words and vice versa. Queries with sounds and words. Learn about the connections between semantic space and acoustic space.
Algorithm Semantic Features: Uses the Porter stemmer to remove common suffixes from words, and deletes common words before further processing. Partition the space into overlapping clusters of regions. Acoustic Features: Signal processing and machine learning calculations endeavor to capture the sound. MFCC (mel-frequency cepstral coefficients): used to analyse speech sounds and to reduce the audio signal. GMM (Gaussian mixture model) captures the long-term characteristics of each sound.
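The semantic preprocessing on this slide (suffix removal plus deletion of common words) can be sketched as below. Note the substitution: `crude_stem` is a deliberately simplified stand-in and is not the Porter algorithm, which has many more rules. The stop-word list and the example sentence are also invented for illustration.

```python
# Tiny illustrative stop-word list (an assumption, not the list used by MPESAR).
STOP_WORDS = {"a", "an", "the", "of", "and", "is", "in", "on", "to"}
# Crude stand-in for the Porter stemmer: strip one of a few common suffixes.
SUFFIXES = ("ing", "ed", "es", "s")

def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Lowercase, drop punctuation and stop words, then stem what remains."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [crude_stem(w) for w in words if w and w not in STOP_WORDS]

tokens = preprocess("The dogs barked and the birds are singing.")
```

After this step, words such as "barked" and "barking" map to the same token, so they count as the same semantic feature.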
Semantic Retrieval Acoustic signal processing chain Building MPESAR models
Testing Audio to semantic testing procedure.
Retrieval Results Histogram of true label ranks based on likelihood from the audio-to-semantic test. Histogram of true label ranks based on likelihood from the semantic-to-audio test.
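A "histogram of true label ranks" can be computed as sketched here: for each test item, sort the candidate labels by likelihood and record the position of the correct one. The function names and the likelihood values are hypothetical; they only illustrate the bookkeeping, not the chapter's results.

```python
from collections import Counter

def true_label_rank(scores, true_label):
    """scores: dict label -> likelihood. Rank 1 means the true label scored highest."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_label) + 1

def rank_histogram(trials):
    """trials: list of (scores, true_label). Returns a Counter mapping rank -> count."""
    return Counter(true_label_rank(s, t) for s, t in trials)

# Invented likelihoods for three test sounds.
trials = [
    ({"dog": 0.7, "car": 0.2, "rain": 0.1}, "dog"),
    ({"dog": 0.1, "car": 0.3, "rain": 0.6}, "car"),
    ({"dog": 0.2, "car": 0.5, "rain": 0.3}, "car"),
]
hist = rank_histogram(trials)
```

The more mass the histogram has at rank 1, the more often the system placed the correct label first.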
Questions 1. What kinds of applications does SVD have? How is it used? The SVD also has applications in digital signal processing, e.g., as a method for noise reduction. It makes it possible to summarize different kinds of video data and combine the results into a common representation. 2. What does MPESAR stand for? What does this system do? (Mixture of Probability Experts for Semantic-Audio Retrieval) Learns the connections between a semantic space and an acoustic space. Example: given a word description, the system finds the audio signal that best fits it. 3. How does MPESAR generally work? The semantic space maps words into a high-dimensional probabilistic space. The acoustic space describes sounds by a multidimensional vector. A many-to-many connection.