This presentation discusses the implementation and progress of a hierarchical framework for content-based image retrieval. It explores the advantages, problem formulation, knowledge encoding, remaining work, and future possibilities of the framework.
Hierarchical Framework for Content Based Image Retrieval: Status and Implementation • Bob Curry
Agenda • BRIEF Review of work to date • Progress since last presentation • Overview of implementation • Remaining work • Future Possibilities
Proposed Solution – Hierarchical Framework • Use complementary strengths of RSC and RIE • Create a small domain-specific RSC database (domain database) that will process semantic queries relevant to that problem domain and will generate a set of example images. • Send the set of example images to the larger RIE database (target database) to get matched images From Previous Presentation
Advantages of Hierarchical Framework • Provides a semantically relevant method of querying an image database • Decouples the knowledge organization from the image matching mechanism • Requires expert involvement only to encode knowledge for the smaller domain data set • Images and image matching algorithms in the large target database can change and improve with no impact on the encoded knowledge • Multiple, distributed domain databases can be used against one target database From Previous Presentation
Hierarchical CBIR Framework Diagram From Previous Presentation
Problem Formulation • In order for this framework to be realized, several problems must be solved: • Definition and method for capturing domain knowledge that allows semantic queries • Method to allow semantic features in domain to be mapped to an example image • Definition of an interface to an RIE image retrieval system that possesses the necessary properties. • Development of a query language to specify desired semantic values for image retrieval • The solution of these problems will be the contribution of this work to the general knowledge in CBIR From Previous Presentation
Knowledge Encoding for the Problem Domain • Various possible ways to encode knowledge • This approach uses • Structured Annotation (agent, action, object) to specify semantic values of interest • Shown to be effective in semantically describing visual data • Domain-Specific Ontology to represent all agents and objects • Ontology intrinsically codes general-to-specific relationships • Spatial Relationships to represent all actions • Eliminates need to develop example images for all possible actions between all possible agents and objects. From Previous Presentation
Remaining Work • Refine mechanism to transform from action specification to translation/rotation data • Create prototype image synthesis process – wring out problems – generate images from examples • Expand example to fully populate ontology with instances including images. Do full end-to-end scenarios From Previous Presentation
Additional Comments from Previous Presentation • In addition to Agents, Actions, Objects, use Setting as a relevant concept • Develop a method to select the best image from group of possible images
Accomplishments since last Presentation • Refined mechanism to transform from Action specification to image clip placement • Defined and implemented all data structures required in ontology to support the approach • Developed general methodology and specific implementation for clip image pruning based on • Image knowledge • Action Constraints • Created support for Setting information and Setting images • Defined format and algorithm for queries
Accomplishments since last Presentation • Created implementation with capabilities including: • Creating clip images • Processing query results from ontology • Synthesizing example images from queries • Tested example scenarios on simplified knowledge base of geometrical shapes • Generated example images as expected • Confirmed proper operation of all elements including pruning process
Overview of Implementation • Consists of three main parts • Domain Database (Win2000) • Protégé ontology • Example domain images • Processing Module (Win2000) • MS Visual C++ code • Interpretation of query results • Image selection • Image synthesis • Target Database (Linux) • GIFT image matching system • Simulated heterogeneous images
Query Definition
Query → [Agent; Action;] Object [;Setting]
Agent → Ont_Descriptor
Object → Ont_Descriptor
Setting → SettingImages::ImageName
Action → Actions::ActionName
Ont_Descriptor → Term [, (Ont_Descriptor)]
Term → ClassName::SlotName (= | < | >) Value
Value → String | Integer | Real | Enum
An example of a query is:
GeometricShape::Edges > 2, GeometricShape::Verticies > 3; RightOf; GeometricShape::NAME = AcuteTriangle1; Outdoors
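The query grammar on this slide can be illustrated with a small parser. This is a minimal sketch, not the parser used in the implementation: the names `Term`, `Query`, `parseTerm`, `parseDescriptor` and `parseQuery` are hypothetical, values are kept as plain text rather than typed as String, Integer, Real or Enum, and only the unparenthesized comma-separated descriptor form used in the slide's example is handled. It also assumes that a two-field query means Object followed by Setting.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One Term of the grammar: ClassName::SlotName (= | < | >) Value.
struct Term {
    std::string className;
    std::string slotName;
    char op;            // '=', '<' or '>'
    std::string value;  // kept as text; typing the value is left out of this sketch
};

// Parsed form of: [Agent; Action;] Object [;Setting]
struct Query {
    std::vector<Term> agent;   // empty when the query has no Agent
    std::string action;        // empty when the query has no Action
    std::vector<Term> object;
    std::string setting;       // empty when the query has no Setting
};

// Remove leading and trailing blanks.
std::string trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Split on a single-character separator and trim every piece.
std::vector<std::string> split(const std::string& s, char sep) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        const std::size_t pos = s.find(sep, start);
        if (pos == std::string::npos) {
            parts.push_back(trim(s.substr(start)));
            return parts;
        }
        parts.push_back(trim(s.substr(start, pos - start)));
        start = pos + 1;
    }
}

// Parse "ClassName::SlotName op Value" into a Term.
Term parseTerm(const std::string& text) {
    const std::size_t scope = text.find("::");
    if (scope == std::string::npos) throw std::invalid_argument("missing '::' in term");
    const std::size_t opPos = text.find_first_of("=<>", scope + 2);
    if (opPos == std::string::npos) throw std::invalid_argument("missing operator in term");
    return Term{trim(text.substr(0, scope)),
                trim(text.substr(scope + 2, opPos - scope - 2)),
                text[opPos],
                trim(text.substr(opPos + 1))};
}

// Parse a comma-separated Ont_Descriptor into its Terms.
std::vector<Term> parseDescriptor(const std::string& text) {
    std::vector<Term> terms;
    for (const std::string& piece : split(text, ',')) terms.push_back(parseTerm(piece));
    return terms;
}

// Split the query on ';' and assign the fields according to how many there are.
Query parseQuery(const std::string& text) {
    const std::vector<std::string> f = split(text, ';');
    Query q;
    switch (f.size()) {
        case 1: q.object = parseDescriptor(f[0]); break;
        case 2: q.object = parseDescriptor(f[0]); q.setting = f[1]; break;
        case 3: q.agent = parseDescriptor(f[0]); q.action = f[1];
                q.object = parseDescriptor(f[2]); break;
        case 4: q.agent = parseDescriptor(f[0]); q.action = f[1];
                q.object = parseDescriptor(f[2]); q.setting = f[3]; break;
        default: throw std::invalid_argument("query must have 1 to 4 fields");
    }
    return q;
}
```

For the example query on the slide, `parseQuery` would yield two Agent terms, the Action `RightOf`, one Object term and the Setting `Outdoors`.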
Image Synthesis Expanded • Tasks are • Obtain Query Results • Agent Image Descriptors • Object Image Descriptors • Setting Image Descriptor • Action Descriptor • Prune Query Results • Synthesize Images
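Descriptors
// Image knowledge linked to each clip image
class CKnowledgeDef {
public:
    Direction Direction_Front;  // direction in which the front of the object points
    Direction Direction_Top;    // direction in which the top of the object points
};
// A clip image together with its image knowledge
class CClipImage : public CImage {
public:
    CString ImageName;
    CString FileName;
    CKnowledgeDef MyKnowledgeDef;
};
// Constraints that an Action places on image knowledge
class CConstraintDef {
public:
    Orientation AgentToObjectOr;  // Orientation of Agent toward Object
    Orientation ObjectToAgentOr;  // Orientation of Object toward Agent
};
// Placement data and constraints for an Action
class CAction {
public:
    int DeltaX;
    int DeltaY;
    int DeltaZ;
    double Theta;
    CConstraintDef MyConstraintDef;
};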
Descriptors
enum Direction {
    Left = -3,
    In = -2,
    Down = -1,
    None = 0,
    Up = 1,
    Out = 2,
    Right = 3
};
enum Orientation {
    Top,
    Front,
    Back,
    Bottom,
    LSide,
    RSide,
    Any
};
Pruning – Image Knowledge/Action Constraint • Semantically relevant image knowledge is linked to each image • Example: • In a passport photo, • Direction_Front = Out; (The front of the object is pointing out of the image) • Direction_Top = Up; (The top of the object is pointing up in the image) • In a silhouette, • Direction_Front = Right; (or left) • Direction_Top = Up;
Pruning – Image Knowledge/Action Constraint • Semantically relevant constraints on image knowledge between Object and Agent are specified in Action descriptor • Example: • If the Action between two people is “talking” it is appropriate to consider only images where the two people are facing each other. • AgentToObjectOr = Front; (Front of Agent is toward Object) • ObjectToAgentOr = Front; (Front of Object is toward Agent) • Requires that: Direction(AgentToObjectOr) = - Direction(ObjectToAgentOr)
Pruning – Image Knowledge/Action Constraint • Putting it together • Each pair of candidate images obtained from query is examined to see if the direction knowledge meets the orientation constraint. If so, that pair can be combined. [Diagram: candidate Agent and Object images labelled with their direction knowledge: (Front = Right, Top = Up), (Front = Out, Top = Up), (Front = In, Top = Up), (Front = Left, Top = Up)]
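The pair check described on the pruning slides can be sketched as follows. This is an illustrative sketch rather than the actual implementation: `KnowledgeDef`, `ConstraintDef`, `faceDirection`, `satisfies` and `prunePairs` are hypothetical stand-ins for the MFC-based classes, and two behaviours are assumptions that the slides do not specify: a constraint of `Any` accepts every pair, and `LSide`/`RSide` are not resolved, so pairs that need them are rejected. The check relies on the `Direction` values being chosen so that opposite directions negate each other.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

enum Direction { Left = -3, In = -2, Down = -1, None = 0, Up = 1, Out = 2, Right = 3 };
enum Orientation { Top, Front, Back, Bottom, LSide, RSide, Any };

// Simplified stand-ins for CKnowledgeDef and CConstraintDef (hypothetical names).
struct KnowledgeDef { Direction front; Direction top; };
struct ConstraintDef { Orientation agentToObjectOr; Orientation objectToAgentOr; };

// Opposite directions carry opposite enum values, so negation flips a direction.
Direction opposite(Direction d) { return static_cast<Direction>(-static_cast<int>(d)); }

// Direction in which the named face of an image object points.
// Returns None for faces this sketch does not resolve (LSide, RSide, Any).
Direction faceDirection(const KnowledgeDef& k, Orientation face) {
    switch (face) {
        case Front:  return k.front;
        case Back:   return opposite(k.front);
        case Top:    return k.top;
        case Bottom: return opposite(k.top);
        default:     return None;
    }
}

// Rule from the slide: Direction(AgentToObjectOr) = -Direction(ObjectToAgentOr).
bool satisfies(const KnowledgeDef& agent, const KnowledgeDef& object, const ConstraintDef& c) {
    // Assumption: Any on either side means the Action imposes no constraint.
    if (c.agentToObjectOr == Any || c.objectToAgentOr == Any) return true;
    const Direction a = faceDirection(agent, c.agentToObjectOr);
    const Direction o = faceDirection(object, c.objectToAgentOr);
    if (a == None || o == None) return false;  // unresolved face: reject the pair
    return a == opposite(o);
}

// Examine every (agent, object) pair and keep the index pairs that may be combined.
std::vector<std::pair<std::size_t, std::size_t>> prunePairs(
        const std::vector<KnowledgeDef>& agents,
        const std::vector<KnowledgeDef>& objects,
        const ConstraintDef& c) {
    std::vector<std::pair<std::size_t, std::size_t>> kept;
    for (std::size_t i = 0; i < agents.size(); ++i)
        for (std::size_t j = 0; j < objects.size(); ++j)
            if (satisfies(agents[i], objects[j], c)) kept.push_back(std::make_pair(i, j));
    return kept;
}
```

With the "talking" constraint (`Front`, `Front`), an agent whose front points `Right` pairs with an object whose front points `Left`, while two images that both point `Out` are pruned.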
Test Ontology • Simple 2-D Geometric Shapes • Triangles, quadrilaterals, etc. • Allows simple example images – finite set • Includes semantic attributes such as “short”, “long”, etc • Supports demonstration of all important aspects of synthesis • Synthesized images should be less susceptible to shortcomings of Target Database matching algorithms. • Example Query statement: • “Show all images of a fat cylinder under and touching a narrow red cone”
Example Cases • Simple Presence Representation • No Action term - DONE • Single Agent, Single Object • Simple synthesis – no pruning - DONE • Multiple Agents and Objects • No Pruning required - DONE • Pruning - DONE • Compound Queries • AND • OR • BUTNOT
Remaining for me • Consider changing from current DeltaX, DeltaY, DeltaZ and theta image placement to relative spatial relationships defined by Allen (Virtual Images for Similarity Retrieval) • Solves a problem of specifying touching images • More intuitive • Add Image Knowledge/Action Constraints to qualitatively classify distance of image object from observer • Create and test a realistic problem domain • Weapons and Targets – simulating battlefield image info • “Show images of missile chasing airplane over desert” • Incorporate results and recent changes into thesis
Further work possible • Automation of all system interfaces • A complete turn-key test bed for future use • Expand concept beyond unary and binary (Agent – Object) to n-ary object synthesis • Includes associated enhancement of queries • Generalized implementation of image knowledge/action constraints • This could be different for different problem domains
Problem [Diagram: Query, Knowledge base, Image Synthesizer, Synthesized Image, Image database]
Future Work Dipti’s Pitch
Example (Bob’s system) • Ontology: agent (man); object (hat; two-wheelers: bike, motorbike) • Sample database: man, hat, bike • Role • Wears(agent, object) = object above agent • Rides(agent, two-wheeler) = agent above two-wheeler
Example (simple queries) Query: man wears hat Query: man rides a bike
Example (Dipti’s system) • Ontology: agent (girl); object (hat; two-wheelers: bike, motorbike) • Sample database: man, hat, Rides(Girl, two-wheeler) • Role • Wears(agent, object) = object above agent • Rides(agent, two-wheeler) = agent above two-wheeler
Example (composite queries) Query: girl rides a bike and wears a hat
Challenges • Representing the semantic content of an image • Identifying objects in an image • Resolving queries • Synthesizing images
Representing the semantic content of an image • Use Description Logic (DL) to represent the content of the image in a way that is hierarchical and compositional • What is DL? It is a language that allows reasoning about information, in particular supporting the classification of descriptors • References: CLASSIC, Meghini (image description using DL), GALEN
Description Logic • A DL models a domain in terms of 3 things: • INDIVIDUALS • which represent the instances of objects which we are modeling • CONCEPTS • descriptions of a class of objects which represent the same thing • ROLES • the relationships between Individuals or between Concepts
Example • Concept Example • person represents all human beings • fruit represents all the fruits • Individual Example • Man, woman are individuals of the concept person • Banana is an individual of the concept fruit • Role Example • Eat(person,fruit) is a relationship describing person and something they are eating
DL • Using these small blocks we can build more complex expressions • Example • Eat(person, fruits) • Eat(Person, fruits) & Sits(Person,Chair) • Example • cool-student = student & drives(student, ferrari)
Reasoning with DL • Subsumption (⊑) • Basic inferencing tool • Checks whether one concept is more general than another • Example: • mother ⊑ woman
Reasoning with DL: Classification • A collection of descriptions can be classified using subsumption, providing a hierarchy of descriptions ranging from general to specific • Example hierarchy (general to specific): person; person driving car; person driving car and wearing hat • New concept: person wearing hat? • After classification: person wearing hat is placed below person and above person driving car and wearing hat
Automatic Classification • Existing hierarchy (general to specific): person; person driving car; person driving car and wearing hat • New concept or query: person wearing hat ???
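Subsumption-based classification of the "person wearing hat" example can be illustrated with a deliberately simplified model. This sketch assumes each concept is just a conjunction of atomic descriptors stored as a set of strings, so that a general concept subsumes a specific one when its descriptors are a subset of the specific concept's descriptors. A real DL classifier handles far richer constructs; `Concept`, `KnowledgeBase`, `subsumes` and `classify` are hypothetical names, and concept definitions are assumed to be distinct.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Assumption: a concept is a conjunction of atomic descriptors.
using Concept = std::set<std::string>;
using KnowledgeBase = std::map<std::string, Concept>;

// general subsumes specific when every descriptor of general appears in specific.
bool subsumes(const Concept& general, const Concept& specific) {
    return std::includes(specific.begin(), specific.end(),
                         general.begin(), general.end());
}

// Where a new concept lands in the hierarchy.
struct Placement {
    std::vector<std::string> parents;   // most specific known concepts above it
    std::vector<std::string> children;  // most general known concepts below it
};

Placement classify(const KnowledgeBase& known, const Concept& fresh) {
    std::vector<std::string> above, below;
    for (const auto& entry : known) {
        if (subsumes(entry.second, fresh)) above.push_back(entry.first);
        if (subsumes(fresh, entry.second)) below.push_back(entry.first);
    }
    Placement p;
    // A subsumer is a direct parent unless it also subsumes another subsumer.
    for (const std::string& a : above) {
        bool direct = true;
        for (const std::string& other : above)
            if (other != a && subsumes(known.at(a), known.at(other))) direct = false;
        if (direct) p.parents.push_back(a);
    }
    // A subsumee is a direct child unless another subsumee subsumes it.
    for (const std::string& b : below) {
        bool direct = true;
        for (const std::string& other : below)
            if (other != b && subsumes(known.at(other), known.at(b))) direct = false;
        if (direct) p.children.push_back(b);
    }
    return p;
}
```

Under this model, classifying the descriptor set {person, wearing hat} against the three known concepts would place it directly below "person" and directly above "person driving car and wearing hat".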
Architecture of DL • DL is described as being split into 2 parts: T-Box and A-Box • T-Box: subsumption and classification • A-Box: reasons about relationships between individuals, thus providing classification and retrieval • E.g. [Diagram labels: mammal, vehicle, person, dog, person wearing cap, person driving bus, bus, bike, person wearing cap & driving bus, Mary’s neighbor, driving nimbus, Ted]
Identifying objects in an image • Segment sections of images and associate them with concepts [Example image with segment labels: 2 men standing; mountains]
Resolving queries • Ontology: agent (man, girl); object (hat; two-wheelers: bike, motorbike); roles (wears, rides) • Sample database: man, hat, Rides(Girl, two-wheeler) • Role • Wears(agent, hat) = hat above agent • Rides(agent, two-wheeler) = agent above two-wheeler
Resolving queries Query: girl rides a bike and wears a hat • 1. Attempt to find an image described by the whole query (no synthesizing needed); if not found, then • 2. Break the query into components (synthesizing needed) • Girl rides bike, girl wears a hat; if not found, then • 3. Break the query into smaller components (synthesizing needed) • Girl, bike, hat, actions (for actions, we can have a geo-spatial model which maps the action; Bob’s definitions can be used here)
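The three-step fallback for resolving a query can be sketched as a staged lookup. This is an illustration under stated assumptions, not a design from the slides: images are assumed to be indexed by a plain text description, and `Relation`, `ImageIndex`, `Resolution`, `describe`, `lookupAll` and `resolve` are hypothetical names. A result at stage 2 or 3 means the returned images would still have to be synthesized into one example image; mapping the actions to placements is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One role assertion of the query, e.g. rides(girl, bike).
struct Relation { std::string agent; std::string role; std::string object; };

// Assumption: the sample database maps a text description to an image name.
using ImageIndex = std::map<std::string, std::string>;

struct Resolution {
    int stage = 0;                    // 1, 2 or 3 as on the slide; 0 = unresolved
    std::vector<std::string> images;  // images found at that stage
};

std::string describe(const Relation& r) { return r.agent + " " + r.role + " " + r.object; }

// Look up every key; succeed (and fill out) only if all keys are present.
bool lookupAll(const ImageIndex& index, const std::vector<std::string>& keys,
               std::vector<std::string>& out) {
    std::vector<std::string> found;
    for (const std::string& key : keys) {
        const auto it = index.find(key);
        if (it == index.end()) return false;
        found.push_back(it->second);
    }
    out = found;
    return true;
}

Resolution resolve(const ImageIndex& index, const std::vector<Relation>& query) {
    Resolution r;
    if (query.empty()) return r;
    std::string whole;                // description of the complete query
    std::vector<std::string> parts;   // one description per relation
    std::vector<std::string> atoms;   // distinct agents and objects, in query order
    for (const Relation& rel : query) {
        if (!whole.empty()) whole += " and ";
        whole += describe(rel);
        parts.push_back(describe(rel));
        for (const std::string& atom : {rel.agent, rel.object})
            if (std::find(atoms.begin(), atoms.end(), atom) == atoms.end())
                atoms.push_back(atom);
    }
    if (lookupAll(index, {whole}, r.images)) { r.stage = 1; return r; }  // no synthesis
    if (lookupAll(index, parts, r.images))   { r.stage = 2; return r; }  // combine relations
    if (lookupAll(index, atoms, r.images))   { r.stage = 3; return r; }  // combine atoms
    return r;                                                            // nothing usable
}
```

For "girl rides a bike and wears a hat" against an index that holds only girl, bike and hat images, the lookup would fall through to stage 3 and return those three images for synthesis.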
Synthesizing images • Issues • Find a method to segment out the image content regions • Scaling issues. Example: we have an image of a big hat and a small man
To do • Protégé + OIL: adding the OIL tag to Protégé supports the transformation rules of knowledge representation shown in the papers on DL • FaCT reasoner: FaCT (Fast Classification of Terminologies) is a DL classifier that can be used for concept validity testing • Query interface • Explore the idea of using image models instead of images