Multimedia Information Retrieval

Multimedia Information Retrieval
• Unlike alphanumeric data, multimedia data do not have any semantic structure
• Achieving symmetry between annotation and query is difficult
• Retrieval is based on similarity between query and stored information instead of exact match
• Stored information is represented using indexing

IR Model
• Information is preprocessed to extract features and semantic contents
• Items are indexed based on these features and semantics
• User’s query is processed and its main features are extracted
• Query’s features are then compared with the features or index of each information item in the database
• Information items whose features are most similar to those of the query are retrieved and presented to the user
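The steps of the IR model can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the term-count `extract_features` function stands in for real media feature extraction, and cosine similarity is one assumed choice of similarity measure. All names and sample data are hypothetical.

```python
import math

def extract_features(text):
    """Toy feature extractor: a term-count vector stored as a dict.
    A real system would extract media-specific features instead."""
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse feature vectors."""
    dot = sum(value * b.get(key, 0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, index, top_k=2):
    """Rank stored items by similarity to the query and return the best ids."""
    query_features = extract_features(query)
    scored = [(cosine_similarity(query_features, features), item_id)
              for item_id, features in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored[:top_k] if score > 0]

# Hypothetical text annotations standing in for stored items.
database = {
    "d1": "sunset over the sea",
    "d2": "city skyline at night",
    "d3": "sea waves at sunset beach",
}
# Preprocessing step: extract features once and keep them as the index.
index = {item_id: extract_features(text) for item_id, text in database.items()}
results = retrieve("sunset sea", index)
print(results)
```

Items are returned by similarity ranking rather than exact match, so "d1" and "d3" both qualify even though neither equals the query.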

Design Issues
• Indexing
  • a mechanism that reduces the search space of an operator without losing any relevant information
• Similarity Computation
  • should be easy to compute and should conform to human judgement

Performance Measures
• Retrieval speed, recall, precision
• Recall measures the ability to retrieve relevant information items from the database
  • defined as the ratio of the number of retrieved relevant items to the total number of relevant items in the database
• Precision measures retrieval accuracy
  • defined as the ratio of the number of retrieved relevant items to the total number of retrieved items
• Recall and precision are usually considered together, because one can be raised at the expense of the other
  • retrieving many items tends to give high recall and low precision
  • retrieving few items tends to give high precision and low recall
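The two ratios can be computed directly from the sets of retrieved and relevant items. The sketch below uses made-up item ids to show the trade-off; the function name `precision_recall` is hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved items that are relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ground truth: four relevant items in the database.
relevant = {"d1", "d2", "d3", "d4"}

# A narrow result list: everything returned is relevant, but half is missed.
narrow = precision_recall(["d1", "d2"], relevant)

# A broad result list: nothing is missed, but half of the output is noise.
broad = precision_recall(
    ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"], relevant)

print(narrow, broad)
```

The narrow list gives precision 1.0 with recall 0.5, and the broad list gives precision 0.5 with recall 1.0.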

Text Retrieval
• Text may be used to annotate other media such as audio, images, and video, so conventional IR techniques can be used to retrieve multimedia information
• Boolean IR systems or text-pattern search systems
• Substantial effort is spent in analyzing the contents of the documents and in generating keywords and indices
• Boolean queries are keywords connected with logical operators (AND, OR, NOT)
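A Boolean query can be evaluated against the keyword set of each document. The sketch below assumes a nested-tuple query representation, which is a hypothetical choice for illustration and not a standard query syntax.

```python
# Hypothetical keyword annotations for three media items.
documents = {
    "D1": {"audio", "retrieval", "speech"},
    "D2": {"image", "retrieval"},
    "D3": {"video", "indexing"},
}

def matches(keywords, query):
    """Evaluate a query against one keyword set.
    A query is a keyword string, or a tuple such as
    ("AND", q1, q2, ...), ("OR", q1, q2, ...), or ("NOT", q)."""
    if isinstance(query, str):
        return query in keywords
    operator, operands = query[0], query[1:]
    if operator == "AND":
        return all(matches(keywords, q) for q in operands)
    if operator == "OR":
        return any(matches(keywords, q) for q in operands)
    if operator == "NOT":
        return not matches(keywords, operands[0])
    raise ValueError(f"unknown operator: {operator}")

# retrieval AND NOT image
query = ("AND", "retrieval", ("NOT", "image"))
hits = sorted(doc_id for doc_id, keywords in documents.items()
              if matches(keywords, query))
print(hits)
```

Only D1 satisfies the query: D2 contains "image" and D3 lacks "retrieval".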

File Structures
• Flat files
• Inverted files
  • for each term, a separate index is constructed that stores the document identifiers of all documents containing the term
  • each term and the IDs of the documents containing it are organized into one row
  • searching and retrieval are fast because only the rows containing the query terms need to be retrieved; there is no need to search the whole database
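An inverted file can be sketched as a mapping from each term to the set of document IDs that contain it. The example is a minimal in-memory version with hypothetical document IDs and function names; a real system would store the rows on disk.

```python
from collections import defaultdict

def build_inverted_file(documents):
    """Map each term to the set of document IDs containing it (one row per term)."""
    inverted = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            inverted[term].add(doc_id)
    return inverted

def search_all(inverted, terms):
    """Return IDs of documents containing every query term.
    Only the rows for the query terms are read; no document is scanned."""
    rows = [inverted.get(term, set()) for term in terms]
    return set.intersection(*rows) if rows else set()

# Hypothetical flat collection of three documents.
documents = {
    "R1": "multimedia information retrieval",
    "R2": "information theory",
    "R3": "image retrieval",
}
inverted = build_inverted_file(documents)
found = search_all(inverted, ["information", "retrieval"])
print(found)
```

The query touches two rows, `information` and `retrieval`, and intersects them, which is why lookup cost depends on the query terms and not on the size of the whole collection.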

Extensions
• Nearness parameters used in query specification help define the topic more precisely and therefore increase the probable relevance of the retrieved items
• Within Sentence and Adjacency specifications in queries
• Term location information is included in the inverted file
  • Term i: document ID, paragraph no., sentence no., word no.
• For example, suppose an inverted file has the following entries:
  information: R99, 10, 8, 3; R155, 15, 3, 6; R166, 2, 3, 1
  retrieval: R77, 9, 7, 2; R99, 10, 8, 4; R166, 10, 2, 5
  • a query for “information” within the same sentence as “retrieval” retrieves only R99 (paragraph 10, sentence 8)
  • an adjacency query for the two terms also retrieves only R99, where they are words 3 and 4 of that sentence
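The nearness queries can be sketched over a positional inverted file. This minimal Python example reuses the entries from the slide; the function names `within_sentence` and `adjacent` are hypothetical.

```python
# Each entry is (document id, paragraph no., sentence no., word no.).
positional = {
    "information": [("R99", 10, 8, 3), ("R155", 15, 3, 6), ("R166", 2, 3, 1)],
    "retrieval": [("R77", 9, 7, 2), ("R99", 10, 8, 4), ("R166", 10, 2, 5)],
}

def within_sentence(index, term_a, term_b):
    """Documents in which both terms occur in the same sentence."""
    sentences_b = {(doc, par, sen) for doc, par, sen, _ in index.get(term_b, [])}
    return sorted({doc for doc, par, sen, _ in index.get(term_a, [])
                   if (doc, par, sen) in sentences_b})

def adjacent(index, first, second):
    """Documents in which `second` is the word immediately after `first`."""
    positions = set(index.get(second, []))
    return sorted({doc for doc, par, sen, word in index.get(first, [])
                   if (doc, par, sen, word + 1) in positions})

same_sentence = within_sentence(positional, "information", "retrieval")
phrase = adjacent(positional, "information", "retrieval")
print(same_sentence, phrase)
```

R166 contains both terms but in different paragraphs, so it is excluded by both queries, while a plain Boolean AND would have returned it.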

Indexing
• Stop words -- grammatical function words, such as “of,” “the,” and “a,” which are excluded from the index
• Stemming -- reducing words to a common root form
• Thesaurus -- list of synonyms
• Weighting -- term significance derived from occurrence frequency within a document and among different documents
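The indexing operations can be chained into a small pipeline. The sketch below rests on several assumptions: the stop list is a tiny illustrative sample, `stem` is a crude suffix stripper and not a real stemming algorithm such as Porter's, and the weight is a basic tf-idf product, which is one common way to combine within-document and across-document frequency. Thesaurus expansion is omitted.

```python
import math
from collections import Counter

STOP_WORDS = {"of", "the", "a", "and", "in", "is", "for"}  # illustrative only

def stem(word):
    """Crude suffix stripping (an assumption for illustration)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Drop stop words, then reduce the remaining words to root forms."""
    return [stem(w) for w in text.lower().split() if w not in STOP_WORDS]

def tf_idf(documents):
    """Weight = term frequency in the document * log(N / document frequency)."""
    term_lists = {doc_id: index_terms(text) for doc_id, text in documents.items()}
    n_docs = len(term_lists)
    doc_freq = Counter()
    for terms in term_lists.values():
        doc_freq.update(set(terms))  # count each term once per document
    weights = {}
    for doc_id, terms in term_lists.items():
        tf = Counter(terms)
        weights[doc_id] = {t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf}
    return weights

# Hypothetical documents.
documents = {
    "D1": "indexing of the images",
    "D2": "retrieval of images",
    "D3": "the indexes for audio",
}
weights = tf_idf(documents)
print(index_terms("indexing of the images"))
print(weights["D2"])
```

In D2, "retrieval" occurs in only one document and so outweighs the stem of "images", which occurs in two; a term present in every document would get weight zero.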

Relevance Feedback
• Query modification
  • terms occurring in documents previously identified as relevant are added to the original query or the weight of such terms is increased
  • terms occurring in documents previously identified as irrelevant are deleted from the query or the weight of such terms is reduced
• Document modification
  • terms in the query, but not in the user-judged relevant documents, are added to the document index list with an initial weight
  • weights of index terms in the query and also in relevant documents are increased by a certain amount
  • weights of index terms not in the query but in the relevant documents are decreased by a certain amount
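Query modification can be sketched as a weight update on the query terms. The additive update with a fixed `step` is an assumption made for illustration; the slides do not specify how much weights change. The function name and sample data are hypothetical.

```python
def modify_query(query, relevant_docs, irrelevant_docs, step=0.5):
    """Return a new query (term -> weight) adjusted by user feedback.
    `step` is an assumed fixed increment, not a value from the source."""
    updated = dict(query)
    # Terms from relevant documents are added or have their weight increased.
    for doc_terms in relevant_docs:
        for term in doc_terms:
            updated[term] = updated.get(term, 0.0) + step
    # Terms from irrelevant documents have their weight reduced.
    for doc_terms in irrelevant_docs:
        for term in doc_terms:
            if term in updated:
                updated[term] -= step
    # Terms whose weight falls to zero or below are deleted from the query.
    return {term: weight for term, weight in updated.items() if weight > 0}

query = {"jaguar": 1.0, "speed": 1.0}
relevant_docs = [{"jaguar", "cat", "habitat"}]
irrelevant_docs = [{"jaguar", "car", "speed"}, {"speed", "engine"}]

new_query = modify_query(query, relevant_docs, irrelevant_docs)
print(new_query)
```

After feedback, "cat" and "habitat" enter the query, "speed" is removed because it appears only in irrelevant documents, and "jaguar" keeps its weight because the two kinds of evidence cancel.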

Audio Search and Retrieval
• Keywords can be highly subjective because annotators may have a different perspective or even a different taxonomy
• Audio is hard to browse directly since it must be heard in real time (unlike video, which can be keyframed)
• Two categories: speech and non-speech
  • with speech, indexing and retrieval are based on spoken words obtained either manually or by speech recognition techniques
  • with non-speech, indexing and retrieval may be based on text annotation (but would that help with a query like “find the first occurrence of the note G-sharp”?)
