1. 1 Video Indexing and Retrieval using an MPEG7 Based Inference Network Andrew Graves andrew@dcs.qmul.ac.uk
2. 2
3. 3 Introduction
4. 4 Project Aims Metadata based retrieval using MPEG7
Assume we have the metadata
Build a modular retrieval system {Video analysis -> MPEG7 -> Video retrieval}
Exploit MPEG7 structure, context and concepts
5. 5 Background Information Retrieval
IR Models {Inference Network model}
Text retrieval: Indexing & Retrieval; Term Statistics
Structured Information Retrieval at QM & Dortmund
Multimedia
MPEG 1/2/4/7
MPEG7, “Multimedia Content Description Interface”
Video Indexing and Retrieval
{Annotation, Content, Metadata} based approaches
Assume we have the Metadata {Shot/Scene detection}
Feature extraction / Acquisition of Semantics
6. 6 MPEG7 {Description Definition Language (DDL), Descriptors (D) and Description Schemes (DS)}
Just another XML format
7. 7 Inference Network Model Probabilistic Framework for IR that uses a Bayesian Network (so based on proven statistical theory)
Complete Network = Document Network + Query Network + Attachment + Evaluation
Complete Network used to estimate the “probability of relevance” for each Document Node
8. 8 Positioning Inference Network
Allows a “combination of evidence”
Allows hierarchical document nodes (structure)
MPEG7
Structural, conceptual & contextual info
So, we process DSs and Ds to form IN
9. 9 In Other Words... Build a Document Network that represents all of the Ds (concepts) and DSs (structure)
Attach a Query Network and evaluate
10. 10 MPEG7 Collection
11. 11 Collection
12. 12 Annotations “Abstract” from box
“StructuredAnnotation” for each scene to specify exactly participants and location
“FreeTextAnnotation” to describe action
“FreeTextAnnotation” with speech extracts
13. 13 MPEG7 Excerpt #1 <AudioVisual id="Communication Problems">
<MediaInformation/>
<MediaProfile/>
<CreationInformation>
<Creation>
<Title>Communication Problems
</Title>
<Abstract>
<FreeTextAnnotation>
It's not a wise man who entrusts his furtive winnings on the horses to a geriatric Major, but Basil was never known for that quality. Parting with those ill-gotten gains was Basil's first mistake; his second was to tangle with the intermittently deaf Mrs Richards.
</FreeTextAnnotation>
</Abstract>
<Creator>BBC</Creator>
</Creation>
<Classification>
<Genre>Comedy</Genre>
<Language>English</Language>
</Classification>
</CreationInformation>
14. 14 MPEG7 Excerpt #2 <SegmentDecomposition decompositionType="temporal" gap="true" id="TableOfContent" overlap="false">
<Segment id="A satisfied customer" xsi:type="AudioVisualSegmentType">
<TextAnnotation>
<FreeTextAnnotation>Basil receives a tip on a horse from a customer. Sybil warns Basil not to bet. Basil says Sybil is a dragon to Polly.</FreeTextAnnotation>
<StructuredAnnotation>
<Who>Basil,Sybil,Major,Polly</Who>
<Where>Lobby</Where>
</StructuredAnnotation>
</TextAnnotation>
<SegmentDecomposition decompositionType="temporal" gap="true" overlap="false">
<Segment id="Shot_1" xsi:type="AudioVisualSegmentType">
<TextAnnotation><FreeTextAnnotation>
Glad you enjoyed it. Polly will you get Mr Firkins bill please.
</FreeTextAnnotation></TextAnnotation>
<MediaTime><MediaIncrDuration timeUnit="PT1N25F">86</MediaIncrDuration>
</MediaTime>
</Segment>
</SegmentDecomposition>
<MediaTime>
<MediaIncrDuration timeUnit="PT1N25F">3028</MediaIncrDuration>
</MediaTime>
15. 15 Model
16. 16 Model Overview Document Network (built during indexing)
Static, contains information about the collection
Query Network (built during retrieval)
Query Language based upon INQUERY
Statistical operators (and approximations of Boolean)
Attachment process
Builds the “Complete Network”
Create DN->QN links where concepts are the same
Evaluation process
Calculate probability of relevance for each element
17. 17 Document Network Document Node layer. Created from MPEG7 structural aspects
Context Node layer. Provides contextual information
Concept Node layer. Contains all the concepts present in the collection
18. 18 Query Network 1 Query text is parsed to produce Query tree
Inverted DAG with a single final node
Terms & Operators
Boolean Operators: #and #or #not
Statistical Operators: #sum #wsum #max
Constraints: #constraint #tree
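The Boolean and statistical operators above have well-known closed forms in INQUERY-style inference networks. The sketch below uses those standard forms. Whether this system uses exactly the same ones is an assumption, and the function names are illustrative only.

```python
from math import prod

def op_and(beliefs):
    """#and: product of the parent beliefs."""
    return prod(beliefs)

def op_or(beliefs):
    """#or: one minus the probability that every parent is false."""
    return 1.0 - prod(1.0 - b for b in beliefs)

def op_not(belief):
    """#not: complement of the single parent belief."""
    return 1.0 - belief

def op_sum(beliefs):
    """#sum: unweighted mean of the parent beliefs."""
    return sum(beliefs) / len(beliefs)

def op_wsum(weights, beliefs):
    """#wsum: weighted mean, normalised by the total weight."""
    return sum(w * b for w, b in zip(weights, beliefs)) / sum(weights)

def op_max(beliefs):
    """#max: largest parent belief."""
    return max(beliefs)
```

Each function maps parent beliefs in [0, 1] to a belief in [0, 1], so operators can be nested freely in a query tree.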
19. 19 Query Network 2 {No; Simple; Complex} constraints
#constraint and #tree
20. 20 Attachment Attachment creates DN->QN links (at concept level)
Find candidate links & then consider constraints
Strength of link can be determined by closeness of match
Perform Tree Matching to find “Edit Distance” (ED)
Use ED by a) testing against threshold, b) reduce weight
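The two uses of the Edit Distance can be sketched as follows. This is a simplification: it compares a constraint and a context as flat paths of element names using Levenshtein distance, rather than performing full tree matching. The function names, the `1 / (1 + ED)` weighting and the default threshold are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of labels."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def link_weight(constraint_path, context_path, mode, threshold=1):
    """Weight of a DN->QN link under the two strategies.

    mode="threshold": attach (1.0) only if ED is below the threshold.
    mode="weighted":  always attach, but reduce the weight as ED grows
                      (the 1 / (1 + ED) form is an assumed choice).
    """
    ed = edit_distance(constraint_path, context_path)
    if mode == "threshold":
        return 1.0 if ed < threshold else 0.0
    if mode == "weighted":
        return 1.0 / (1.0 + ed)
    raise ValueError(f"unknown mode: {mode}")
```

For example, a constraint path `["Video", "Shot"]` against a context `["Video", "Scene", "Shot"]` has an Edit Distance of 1, so it is rejected with a threshold of 1 but attached at half weight in the weighted mode.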
21. 21 Evaluation After attachment we have formed the Complete Network
This is evaluated for every Document Node and resultant probabilities are used for ranking
All required nodes are evaluated using 1) the values of their parent nodes and 2) conditional probabilities
Nodes may inherit parental contexts (Link Inheritance)
The parents outside the constraint may be ignored (Path Cropping)
22. 22 Extraction Structural Extraction. About the hierarchical makeup.
Attribute Extraction. Data about the structural elements.
Concept Extraction. Obtain the concepts that appear.
Text preprocessing
Luhn’s Analysis, Term Statistics
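A minimal sketch of the text preprocessing step, assuming concepts are plain lower-cased terms. Following the idea behind Luhn's analysis, terms are kept only if their frequency lies between a lower and an upper cut-off. The stopword list, the cut-off parameters and the function name are illustrative assumptions, not the system's actual settings.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "on", "the", "to", "from", "is", "not", "says"}

def extract_concepts(text, min_count=1, max_count=None):
    """Return {term: frequency} for terms inside the frequency cut-offs."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return {
        term: n
        for term, n in counts.items()
        if n >= min_count and (max_count is None or n <= max_count)
    }

annotation = ("Basil receives a tip on a horse from a customer. "
              "Sybil warns Basil not to bet. "
              "Basil says Sybil is a dragon to Polly.")
frequent = extract_concepts(annotation, min_count=2)
```

Run on the scene annotation from Excerpt #2, a lower cut-off of 2 keeps only the terms that recur in the text.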
23. 23 Probability Estimation Probability that a document is relevant to the query
Conditional probabilities between the nodes
Context->Context (e.g. Video->Scene)
Context->Concept
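The slides do not state how the Context->Context probabilities are estimated. One formula that reproduces the Scene and Shot weights in the Experiment 1 listing gives each child a base weight of 0.5 plus a share proportional to its fraction of the parent's duration. Treating this as the estimator actually used is an assumption, and the function name and `floor` parameter are hypothetical.

```python
def duration_weight(duration, parent_duration, floor=0.5):
    """Weight of a child context from its share of the parent's duration.

    floor is the weight a zero-length child would receive (assumed 0.5).
    """
    return floor + (1.0 - floor) * duration / parent_duration

# Values taken from the Experiment 1 document network.
scene1 = duration_weight(400, 1000)   # Scene1 inside Video1
scene2 = duration_weight(600, 1000)   # Scene2 inside Video1
shot1 = duration_weight(100, 400)     # Shot1 inside Scene1
shot2 = duration_weight(300, 400)     # Shot2 inside Scene1
```

Under this assumed formula, longer segments receive a stronger link to their parent, while even very short segments keep at least the floor weight.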
24. 24 Experiments
25. 25 Experiment Overview Software written in {C++ NT}
Not using INQUERY
1. Basic. Does the model work at all?
2. Real Data. Does the model work with our real metadata collection?
3. Metrics. What are the precision/recall metrics?
26. 26 Remember... Link Inheritance (LI)
Link Degradation (LID)
Tree Matching (TM)
Threshold (TMT): The attachment is made only if the constraint is met, and if the Edit Distance is below the specified threshold.
Weighted (TMW): The attachment is made if the constraint is met. The Edit Distance is used as a weight upon the DN->QN link.
Path Cropping (PC)
27. 27 Representations <Root>
<Operator Type="WSUM">
<Concept weight="0.2">breakfast </Concept>
<Concept weight="0.8">view </Concept>
</Operator>
</Root>
<Root>
<Operator Type="AND">
<Concept>
<Text>BBC</Text>
<Constraint>Creation </Constraint>
</Concept>
<Concept>Basil</Concept>
</Operator>
</Root>
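As a sketch of how such a query representation could be read into a tree, the code below parses the XML with Python's standard `xml.etree.ElementTree`. The dictionary layout and the function names are assumptions made for illustration; the original software was written in C++.

```python
import xml.etree.ElementTree as ET

QUERY = """<Root>
  <Operator Type="WSUM">
    <Concept weight="0.2">breakfast </Concept>
    <Concept weight="0.8">view </Concept>
  </Operator>
</Root>"""

def parse_node(elem):
    """Convert an <Operator> or <Concept> element into a nested dict."""
    if elem.tag == "Operator":
        return {"op": elem.get("Type"),
                "children": [parse_node(child) for child in elem]}
    # A concept's term is either inline text or held in a <Text> child.
    text_elem = elem.find("Text")
    text = text_elem.text if text_elem is not None else elem.text
    constraint = elem.findtext("Constraint")
    return {"term": text.strip(),
            "weight": float(elem.get("weight", 1.0)),
            "constraint": constraint.strip() if constraint else None}

def parse_query(xml_text):
    """Parse a query document and return the tree under <Root>."""
    root = ET.fromstring(xml_text)
    return parse_node(root[0])

tree = parse_query(QUERY)
```

The same function handles the constrained form, where a `<Concept>` carries `<Text>` and `<Constraint>` children instead of inline text.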
28. 28 Experiment 1 <Root>
<Video id='Video1' Duration='1000' weight='1.000000'>
<CreationInformation weight='0.750000'>
<Creation weight='1.000000'>
<Concept weight='0.8' cid='1'>banana</Concept>
</Creation>
</CreationInformation>
<MediaInformation weight='0.750000'/>
<Scene id='Scene1' KeyFrame='none.jpg' Duration='400' weight='0.700000'>
<Shot id='Shot1' KeyFrame='none.jpg' Duration='100' weight='0.625000'/>
<Shot id='Shot2' KeyFrame='none.jpg' Duration='300' weight='0.875000'>
<Concept weight='0.7' cid='1'>banana</Concept>
</Shot>
</Scene>
<Scene id='Scene2' Duration='600' weight='0.800000'/>
</Video>
<Video id='Video2' weight='1.000000'/>
<Video id='Video3' weight='1.000000'/>
</Root>
<Root>
<Concept>
<Text>banana</Text>
<Constraint>CreationInformation</Constraint>
</Concept>
</Root>
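The candidate-link step for this query can be sketched as below, using an abridged copy of the document network above. The code walks the network, collects every Concept node matching the term, and keeps only those whose context path contains the constraint. The function name and the simple "constraint appears in the path" test are assumptions; the model's actual constraint handling uses tree matching.

```python
import xml.etree.ElementTree as ET

# Abridged from the Experiment 1 document network.
DOC_NETWORK = """<Root>
  <Video id='Video1' Duration='1000' weight='1.000000'>
    <CreationInformation weight='0.750000'>
      <Creation weight='1.000000'>
        <Concept weight='0.8' cid='1'>banana</Concept>
      </Creation>
    </CreationInformation>
    <Scene id='Scene1' Duration='400' weight='0.700000'>
      <Shot id='Shot2' Duration='300' weight='0.875000'>
        <Concept weight='0.7' cid='1'>banana</Concept>
      </Shot>
    </Scene>
  </Video>
</Root>"""

def candidate_links(xml_text, term, constraint=None):
    """Return (context_path, concept_weight) for each matching Concept."""
    root = ET.fromstring(xml_text)
    matches = []

    def walk(elem, path):
        for child in elem:
            if child.tag == "Concept":
                same_term = (child.text or "").strip() == term
                allowed = constraint is None or constraint in path
                if same_term and allowed:
                    matches.append((path, float(child.get("weight"))))
            else:
                walk(child, path + [child.tag])

    walk(root, [])
    return matches

constrained = candidate_links(DOC_NETWORK, "banana", "CreationInformation")
unconstrained = candidate_links(DOC_NETWORK, "banana")
```

With the constraint, only the occurrence under CreationInformation qualifies; without it, the occurrence inside Shot2 is also a candidate.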
29. 29 Experiment 1 Model works
Different levels of document granularity (Video/Scene/Shot) retrieved in same list
Parameters work, but it is unclear whether they help
30. 30 Experiment 2 Model worked with real collection to produce real results
Results were as expected given knowledge of material
31. 31 Experiment 3 Recall/Precision metrics calculated
Rank in results list (not result rank) used for analysis
Ten best Video/Scene/Shot chosen by author
Ranking seems good:
6/10 required in top 10
All 10 within top 93 (out of 362 in total)
Figures suggest that the model is working effectively, although this is not conclusive
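As a worked check of the figures above, precision and recall at a cut-off follow directly from the counts reported on this slide. The helper name is illustrative, and only the reported counts are used, not the individual ranks, which the slides do not give.

```python
def precision_recall(hits, cutoff, total_relevant):
    """Precision and recall when `hits` relevant items fall in the top `cutoff`."""
    return hits / cutoff, hits / total_relevant

# 6 of the 10 chosen items appear in the top 10.
p10, r10 = precision_recall(hits=6, cutoff=10, total_relevant=10)

# All 10 chosen items appear within the top 93.
p93, r93 = precision_recall(hits=10, cutoff=93, total_relevant=10)
```

At a cut-off of 10 both precision and recall are 0.6; full recall is reached at rank 93, where precision has fallen to 10/93, roughly 0.11.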
32. 32 Discussion Size of collection too small to produce significant results. No known MPEG7 collections.
No independent queries with relevance assessments exist (obviously)
Software efficiency crucial - simplifying assumptions can be made to ensure that the IN is computationally viable. Size of computation is not proportional to size of collection.
33. 33 Concluding Remarks
34. 34 Concluding Remarks MPEG7 was found to contain useful “tools”
Model for VIR developed
Based on Inference Network, Built from MPEG7 files
Indexing captures structure, context and concepts
Retrieval done using Terms, Operators and Constraints
Model parameters devised
Results suggest that the approach taken is well founded, although the lack of data is problematic
35. 35 Next... Build an independent MPEG7 collection with relevance assessments etc!
Automatic methods for generating metadata
Eliminate bias, Increase consistency, Improve quality
Feature extraction etc. to produce Simple Semantics
Solve the Semantic Gap issue
Build metadata based models that exploit contextual information
Assume contextual information can help retrieval
Assume we have good metadata
Efficiency of the evaluation vital
36. 36 The End