A Framework for Effective Annotation of Information from Closed Captions Using Ontologies
Authors: Latifur Khan, Dennis McLeod, Eduard Hovy
Presenter: Mohamed Mustafa Khimani
TOPICS
- Introduction
- Related Work
- Content extraction
- Ontologies
- Metadata acquisition and management of metadata
- Experimental Implementation
- Conclusions
INTRODUCTION
- Keyword-based techniques and use of query expansion mechanisms
- Ontology-based model
- Extraction of semantic concepts from keywords
- Document indexing
- Precision and recall
- Effective selection/retrieval of audio information
RELATED WORK
- Query expansion through use of semantically related terms, e.g. using WordNet
- Use of a conceptual distance measure between query and document to model relevance
CONTENT EXTRACTION
- Fully automated content extraction: converting speech to equivalent text
- Selected content extraction: word spotting
- An audio object is composed of a sequence of contiguous segments
- Audio object Oi is defined as: (Idi, Si, Ei, Vi, Ai)
- Vi (description) is a finite set of tags or labels
- e.g. {10, 1145.59, 1356.00, {Gretzky Wayne}, *}
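The five-tuple above maps naturally onto a small record type. The following is a minimal sketch, not the authors' implementation: the field names are hypothetical, and reading Si and Ei as start and end times in seconds and Ai as the audio payload is an assumption, since the slide only names Vi.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AudioObject:
    """Hypothetical container for the tuple (Id_i, S_i, E_i, V_i, A_i)."""
    obj_id: int                    # Id_i
    start: float                   # S_i (assumed: start time in seconds)
    end: float                     # E_i (assumed: end time in seconds)
    description: FrozenSet[str]    # V_i: finite set of tags or labels
    audio: Optional[bytes] = None  # A_i (assumed: audio data, "*" on the slide)

    def __post_init__(self):
        # Reject intervals that run backwards.
        if self.end < self.start:
            raise ValueError("end must not precede start")

    @property
    def duration(self) -> float:
        return self.end - self.start


# The example from the slide: {10, 1145.59, 1356.00, {Gretzky Wayne}, *}
clip = AudioObject(10, 1145.59, 1356.00, frozenset({"Gretzky Wayne"}))
print(round(clip.duration, 2))
```

A frozen dataclass with a frozenset description keeps the object hashable, which is convenient if audio objects are later used as index keys.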
ONTOLOGIES
- An ontology defines a set of representational terms called concepts
- Interrelationships among these concepts describe a target world
- Ontology as a DAG
- Each node in the DAG represents a concept
- Concept = unique label name + synonym list (l1, l2, l3, ..., li, ..., ln); user requests are matched against the elements li of this list
- Interrelationships
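As an illustration of the concept structure described above, here is a minimal sketch of a DAG node with a label and a synonym list. The class and function names are hypothetical, the sample concepts and synonyms are made up for the example, and matching case-insensitively against the label as well as the synonyms is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    label: str                                   # unique label name
    synonyms: List[str] = field(default_factory=list)       # l1 ... ln
    parents: List["Concept"] = field(default_factory=list)  # DAG: several parents allowed

    def matches(self, term: str) -> bool:
        # A request term matches if it equals the label or any synonym.
        t = term.strip().lower()
        return t == self.label.lower() or any(t == s.lower() for s in self.synonyms)


def find_concepts(concepts: List[Concept], term: str) -> List[Concept]:
    """Return every concept whose label or synonym list contains the term."""
    return [c for c in concepts if c.matches(term)]


# Illustrative data only (not taken from the authors' ontology).
nba = Concept("NBA")
lakers = Concept("Los Angeles Lakers", ["Lakers"], [nba])
kobe = Concept("Bryant Kobe", ["Bryant", "Kobe Bryant"], [lakers])
ontology = [nba, lakers, kobe]

print([c.label for c in find_concepts(ontology, "lakers")])
```

Storing parents as a list rather than a single reference is what allows one concept instance to hang under two parents, which a plain tree cannot express.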
NPC
Region = League + Team + Player
Disjoint concepts
ONTOLOGIES
- Each league, together with its teams and players, forms a region
- During annotation of concepts, a particular region is chosen
- Due to the disjoint property, an object is associated with only one of two disjoint concepts, never both
- A player may play in several leagues; two ways to represent this:
  - Multiple instances of the player in the ontology (sub-tree)
  - A single instance with two parents (DAG)
Interrelationships: Disjoint, Is-A, Part-Of
METADATA ACQUISITION
- The process through which descriptions are provided
- Extract concepts from keywords
- Concept scoring
- Stemming: e.g. the stem "comput" covers computer, computation, etc.
- Keyword-to-concept mapping is one-to-many
- Disambiguation: a set of keywords occurring together determines a context for one another
- Disambiguation methods:
  - Co-occurrence (disambiguates across several regions)
  - Semantic closeness (disambiguates within a region)
METADATA ACQUISITION
- Example caption: "Lakers keep grooving with 8th straight win. Kobe Bryant scores 21 points as the Lakers remain perfect on their eastern road trip with a 97-89 triumph over the Nets. Bryant discussed the eight game win streak and his performance in the All Star game."
- Candidate concepts per keyword:
  - Lakers: Los Angeles Lakers
  - Nets: New Jersey Nets
  - Bryant: Reeves Bryant, Bryant Mark, Bryant Kobe
  - Eastern: Eastern Washington, Eastern Michigan
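The co-occurrence method can be illustrated on this caption. The sketch below is a simplification under stated assumptions: the region labels ("NBA", "NCAA") attached to each candidate are illustrative guesses rather than data from the slides, and a region's support is counted as the number of distinct keywords with a candidate in it instead of the score-based measure defined later.

```python
from collections import defaultdict

# keyword -> list of (candidate concept, region); regions are assumed for illustration
CANDIDATES = {
    "Lakers": [("Los Angeles Lakers", "NBA")],
    "Nets": [("New Jersey Nets", "NBA")],
    "Bryant": [("Reeves Bryant", "NBA"), ("Bryant Mark", "NBA"), ("Bryant Kobe", "NBA")],
    "Eastern": [("Eastern Washington", "NCAA"), ("Eastern Michigan", "NCAA")],
}


def pick_region(candidates):
    """Pick the region supported by the largest number of distinct keywords."""
    support = defaultdict(set)
    for keyword, concepts in candidates.items():
        for _, region in concepts:
            support[region].add(keyword)
    return max(support, key=lambda region: len(support[region]))


def concepts_in_region(candidates, region):
    """Keep only the candidates that belong to the chosen region."""
    kept = {}
    for keyword, concepts in candidates.items():
        in_region = [name for name, r in concepts if r == region]
        if in_region:
            kept[keyword] = in_region
    return kept


region = pick_region(CANDIDATES)
remaining = concepts_in_region(CANDIDATES, region)
print(region, remaining)
```

Under these assumptions the "Eastern" candidates drop out because no other keyword supports their region, while "Bryant" still has three candidates inside the chosen region; resolving those is the job of the within-region semantic closeness step.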
FORMAL DEFINITIONS
- Element-Score (Escoreij): score of element lj of a particular concept Ci
- Concept-Score (Scorei): Scorei = max Escoreij, where 1 <= j <= n
- Region-Score (CscoreR): for a region R, the sum of the Concept-Scores of the selected concepts that belong to this region
- Semantic Distance SD(Ci, Cj): length of the shortest path between two concepts Ci and Cj
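These three definitions can be sketched in a few lines. This is an illustrative sketch only: how Escore itself is computed is not given on the slide, so element scores are passed in as plain numbers; treating ontology edges as undirected and of unit length for the shortest path is an assumption; and the small graph, including the placeholder node "Team X", is hypothetical.

```python
from collections import deque


def concept_score(element_scores):
    """Score_i = max over j of Escore_ij."""
    return max(element_scores)


def region_score(concept_scores):
    """Cscore_R = sum of the Concept-Scores of the selected concepts in region R."""
    return sum(concept_scores.values())


def semantic_distance(graph, source, target):
    """SD(Ci, Cj): number of edges on a shortest path, found by breadth-first search.

    Returns None when the two concepts are not connected.
    """
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical undirected adjacency lists for a fragment of one region.
GRAPH = {
    "NBA": ["Los Angeles Lakers", "Team X"],
    "Los Angeles Lakers": ["NBA", "Bryant Kobe"],
    "Team X": ["NBA", "Reeves Bryant"],
    "Bryant Kobe": ["Los Angeles Lakers"],
    "Reeves Bryant": ["Team X"],
}

print(semantic_distance(GRAPH, "Los Angeles Lakers", "Bryant Kobe"))
print(semantic_distance(GRAPH, "Los Angeles Lakers", "Reeves Bryant"))
```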
FORMAL DEFINITIONS
- Propagated-score (Si): Si = Scorei + Scorej/SD(Ci, Cj) + ...
- Smax: for an object, Smax is the largest propagated-score Si among all its selected concepts
- Threshold-score (γscore): for an object, a certain fraction of its Smax, i.e. Smax * threshold-constant, where the constant lies between 0 and 1
- For high values of the threshold, we may lose some relevant concepts while discarding many irrelevant ones
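A worked example may make the propagated score and threshold concrete. The sketch below assumes the elided terms of Si sum Scorej/SD(Ci, Cj) over every other selected concept Cj, and that a concept is kept when its Si is at least the threshold; the scores, distances, and the constant 0.8 are made-up illustration values.

```python
def propagated_scores(scores, distance):
    """S_i = Score_i + sum over j != i of Score_j / SD(C_i, C_j).

    `distance` maps frozenset({Ci, Cj}) to SD(Ci, Cj); pairs without an
    entry are treated as unrelated and contribute nothing.
    """
    result = {}
    for ci, score_i in scores.items():
        total = score_i
        for cj, score_j in scores.items():
            if cj == ci:
                continue
            d = distance.get(frozenset((ci, cj)))
            if d:
                total += score_j / d
        result[ci] = total
    return result


def select_concepts(scores, distance, threshold_constant):
    """Keep concepts whose propagated score reaches Smax * threshold_constant."""
    propagated = propagated_scores(scores, distance)
    s_max = max(propagated.values())
    cutoff = s_max * threshold_constant
    return {c for c, s in propagated.items() if s >= cutoff}


# Illustration values only.
SCORES = {"Los Angeles Lakers": 1.0, "Bryant Kobe": 0.5, "Reeves Bryant": 0.5}
DISTANCE = {
    frozenset(("Los Angeles Lakers", "Bryant Kobe")): 1,
    frozenset(("Los Angeles Lakers", "Reeves Bryant")): 3,
    frozenset(("Bryant Kobe", "Reeves Bryant")): 4,
}

print(select_concepts(SCORES, DISTANCE, 0.8))
```

With these numbers the propagated scores are roughly 1.67, 1.63, and 0.96, so Smax is about 1.67 and the cutoff about 1.33; the concept far from the others falls below it and is discarded.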
CHARACTERISTICS
- Relevant concepts may be discarded along with irrelevant ones: a relevant concept that does not correlate with other concepts will have a low Si
- If there is no correlation, the algorithm fails to resolve ambiguity and all the selected concepts are kept
- Due to incompleteness of ontologies, some irrelevant concepts may be associated
- Disambiguation fails when there is little or no correlation among the selected concepts
IMPLEMENTATION
- Total number of clips: 2,481
- Maximum length of a clip: 5 min
- Average size of closed caption for a clip: 25 words
- Total number of concepts in ontologies: 7,000
- Average number of concepts associated with an object: 4.47
CONCLUSION
- The proposed ontology can be used to generate information selection requests in database queries
- Can be extended to video with closed captions
- Better than the keyword-based technique
- Remaining challenge: the cost of building domain-specific ontologies and connecting domain data to them automatically
- Further directions: evolving ontologies, extracting highlighted sections of audio, addressing retrieval questions in the video domain, facilitation of cross-media indexing