Adding Domain-Specific Knowledge
Amit Singhal & Jiebo Luo
Research Laboratories, Eastman Kodak Company
FUSION 2001, Montreal, August 7-10, 2001
Outline of Talk
• Problem Statement
• Background
• Relevant Prior Art
• Evidence Fusion Framework
• Automatic Main Subject Detection System
• Injecting Orientation Information
• Feature Detectors
• Conclusions
• Future Work
Main Subject Detection
• What is the main subject in a picture?
• 1st-party truth (the photographer): in general not available, because of the specific knowledge the photographer may have about the setting
• 3rd-party truth: in general there is good agreement among 3rd-party observers, provided the photographer successfully used the picture to communicate his interest in the main subject to the viewers
Related Prior Art
• Main subject (region-of-interest) detection
  • Milanese (1993): uses biologically motivated models to identify regions of interest in simple pictures containing highly contrasting foreground and background.
  • Marichal et al. (1996), Zhao et al. (1996): use a subjective fuzzy-modeling approach to describe semantic interest in video sequences (primarily video-conferencing).
  • Syeda-Mahmood (1998): uses a color-based approach to isolate regions in an image that are likely to belong to the same object; the main application is reducing the search space for object recognition.
• Evidence fusion
  • Pearl (1988): provides a theory and evidence-propagation scheme for Bayesian networks.
  • Rimey & Brown (1994): use Bayesian networks to control selective perception in a structured spatial scene.
  • Buxton et al. (1998): use a set of Bayesian networks to integrate sensor information and infer behaviors in a traffic-monitoring application.
The Evidence Fusion Framework
• Region-based representation scheme.
• Virtual belief sensors map the outputs of physical sensors and algorithmic feature detectors into a probabilistic space (see the sketch after this slide).
• Domain knowledge is used to generate the network structure.
• Expert knowledge and ground-truth-based training methodologies generate the priors and the conditional probability matrices.
• A Bayesian network combines the evidence generated by the sensors and feature detectors using a very fast message-passing scheme.
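A minimal sketch of what a virtual belief sensor might look like: it converts a raw detector output into a belief in [0, 1]. The sigmoid mapping and its parameters here are assumptions for illustration, not the calibrated sensors used in the actual system.

```python
import math

def virtual_belief_sensor(raw_score, midpoint=0.5, steepness=10.0):
    """Map a raw feature-detector output to a belief in [0, 1].

    The sigmoid shape, midpoint, and steepness are hypothetical; in the
    real system each sensor would be calibrated from ground truth.
    """
    return 1.0 / (1.0 + math.exp(-steepness * (raw_score - midpoint)))

# Example: a raw detector score of 0.8 maps to a belief near 0.95.
print(virtual_belief_sensor(0.8))
```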
Bayesian Networks
• A directed acyclic graph
• Each node represents an entity (random variable) in the domain
• Each link connects two nodes in the network and represents a causal relationship
• The direction of the link represents the direction of causality
• Each link encodes the conditional probability of the child node given the parent node
• Evaluating the Bayesian network is equivalent to knowing the joint probability distribution
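For reference, the joint distribution encoded by such a network factors over the parent-child links; this is the standard Bayesian-network factorization rather than anything specific to these slides:

```latex
P(X_1, \ldots, X_n) \;=\; \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{Pa}(X_i)\bigr)
```

where Pa(X_i) denotes the set of parent nodes of X_i in the graph.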
Automatic Main Subject Detection System
• An interesting research problem
  • Conventional wisdom (or how a human performs such a task): Object Segmentation -> Object Recognition -> Main Subject Determination
  • Object recognition is an unconstrained problem in consumer photographs
  • Inherent ambiguity: 3rd-party probabilistic ground truth
  • Large number of camera sensors and feature detectors
  • Speed and performance scalability concerns
• Of extreme industrial interest to digital photofinishing
  • Allows automatic image enhancements to produce better photographic prints
  • Other applications:
    • Image compression, storage, and transmission
    • Automatic image recompositing
    • Object-based image indexing and retrieval
Overview
• Methodology
  • Produce a belief map indicating the likelihood of each region in the scene being part of the main subject (a sketch of this flow follows this slide)
  • Utilize a region-based representation of the image, derived from image segmentation and perceptual grouping
  • Utilize semantic features (human flesh and face, sky, grass) and general saliency features (color, texture, shape, and geometric features)
  • Utilize a Bayesian-network-based architecture for knowledge representation and evidence inference
• Dealing with intrinsic ambiguity
  • Ground truth is "probabilistic", not "deterministic"
  • Limitations in our understanding of the problem
• Dealing with "weak" vision features
  • Reality of the state of the art of computer vision
  • Limited accuracy of the current feature-extraction algorithms
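A minimal, hypothetical sketch of the region-based flow described above: segment the image, compute per-region evidence from the feature detectors, and fuse the evidence into a belief per region. The names segment_image, feature_detectors, and fuse_evidence are placeholders standing in for the segmentation step, the virtual belief sensors, and the Bayesian-network inference, not the actual components of the system.

```python
def main_subject_belief_map(image, segment_image, feature_detectors, fuse_evidence):
    """Produce {region_id: belief} for an image.

    segment_image, feature_detectors, and fuse_evidence are stand-ins for
    the segmentation, virtual belief sensors, and Bayesian-network
    inference described in the slides.
    """
    regions = segment_image(image)  # region-based representation
    belief_map = {}
    for region in regions:
        # one belief value per feature (semantic and saliency features)
        evidence = {name: detect(region) for name, detect in feature_detectors.items()}
        belief_map[region.id] = fuse_evidence(evidence)  # e.g. BN inference
    return belief_map
```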
Injecting Metadata into the System
• Sources of metadata
  • Camera: flash fired, subject distance, orientation, etc.
  • IU algorithms: indoor/outdoor, scene type, orientation, etc.
  • User annotation
• The Bayesian network is very flexible and can be quickly adapted to take advantage of available metadata
• Metadata-enabled knowledge can be injected into the system using
  • Metadata-aware feature detectors
  • Metadata-enhanced Bayesian networks
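One way to picture a metadata-aware feature detector is as a detector that is only switched in when the relevant metadata is present. The sketch below is a hypothetical illustration of that idea; the "orientation" key and the two detector arguments are placeholders, not part of the described system.

```python
def choose_borderness_detector(metadata, aware_detector, unaware_detector):
    """Pick a metadata-aware detector when orientation metadata is present.

    metadata is a dict of available metadata; aware_detector and
    unaware_detector are hypothetical callables for the two variants.
    """
    if metadata.get("orientation") is not None:
        return lambda region: aware_detector(region, metadata["orientation"])
    return unaware_detector
```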
Orientation
• The main difference between the orientation-aware and orientation-unaware systems lies in the location features (centrality and borderness)
Borderness Feature
• Orientation unaware: a = b = c = d = e
• Orientation aware: a < b < c < d < e
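A hedged illustration of the constraint above, assuming a through e are weights attached to border-contact categories (the figure on the original slide defines the actual categories; the specific values below are made up solely to satisfy the equal versus strictly increasing orderings):

```python
# Orientation-unaware: all border categories carry the same weight (a = b = c = d = e).
UNAWARE_WEIGHTS = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5, "e": 0.5}

# Orientation-aware: weights increase from a to e (a < b < c < d < e),
# so border contact is interpreted differently once orientation is known.
AWARE_WEIGHTS = {"a": 0.1, "b": 0.3, "c": 0.5, "d": 0.7, "e": 0.9}

def borderness_belief(category, orientation_known):
    """Return the (hypothetical) borderness belief for a region's category."""
    weights = AWARE_WEIGHTS if orientation_known else UNAWARE_WEIGHTS
    return weights[category]
```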
Orientation-Aware Bayesian Network
• Use orientation-aware centrality and borderness features
• Other feature detectors are affected by orientation but are not retrained:
  • Sky, grass
  • They are not retrained when the BN is used for main subject detection, because the location features already account for the orientation information
  • Using orientation information to compute the sky and grass evidence would, however, lead to better performance for a dedicated sky or grass detection system
• Retrain the links in the Bayesian network for each feature affected by orientation information (a sketch of such retraining follows this slide):
  • BorderA-Borderness
  • BorderD-Borderness
  • Borderness-Location
  • Centrality-Location
  • Location-MainSubject
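A minimal sketch of what retraining one conditional probability matrix (CPM) from labeled ground truth could look like. The discrete states and the Laplace smoothing are assumptions made for illustration, not the training procedure reported in the paper.

```python
from collections import defaultdict

def retrain_cpm(samples, parent_states, child_states, smoothing=1.0):
    """Estimate P(child | parent) from (parent_value, child_value) pairs.

    samples: iterable of (parent_value, child_value) pairs taken from the
    ground truth. The Laplace smoothing constant is an illustrative choice.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for parent, child in samples:
        counts[parent][child] += 1.0
    cpm = {}
    for p in parent_states:
        total = sum(counts[p][c] + smoothing for c in child_states)
        cpm[p] = {c: (counts[p][c] + smoothing) / total for c in child_states}
    return cpm

# Example: retrain a (hypothetical) Location -> MainSubject link.
# cpm = retrain_cpm(ground_truth_pairs, ["central", "border"], ["subject", "background"])
```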
Conclusions and Future Work
• Bayesian networks offer the flexibility to easily incorporate domain-specific knowledge, such as orientation information, into the system
• This knowledge can be added by:
  • Modifying the feature detectors
  • Using new feature detectors
  • Changing the structure of the Bayesian network
  • Retraining the conditional probability matrices associated with the Bayesian network
• Directions for future work
  • Use of additional metadata such as indoor/outdoor, urban/rural, day/night
  • A single "super" BN versus a library of metadata-aware BNs?