Inferring Semantic Concepts from Community-Contributed Images and Noisy Tags Jinhui Tang†, Shuicheng Yan†, Richang Hong†, Guo-Jun Qi‡, Tat-Seng Chua† † National University of Singapore ‡ University of Illinois at Urbana-Champaign
Outline • Motivation • Sparse-Graph-based Semi-supervised Learning • Handling of Noisy Tags • Inferring Concepts in Semantic Concept Space • Experiments • Summary and Future Work
Our task • No manual annotation is required.
Methods That Can Be Used • With models: • SVM • GMM • … • Infer labels directly: • k-NN • Graph-based semi-supervised methods
Normal Graph-based Methods • Common disadvantages: • They have parameters that require manual tuning • Performance is sensitive to the parameter settings • The graphs are constructed based on visual distance • Many links connect samples with unrelated concepts • Label information is therefore propagated incorrectly • Locally linear reconstruction: • Still needs to select neighbors based on visual distance
Key Ideas of Our Approach • Sparse Graph based Learning • Noisy Tag Handling • Inferring Concepts in the Concept Space
Why Sparse Graph? • The human visual system seeks a sparse representation of an incoming image using a few visual words from a feature vocabulary (neuroscience). • Advantages: • Reduces concept-unrelated links to avoid the propagation of incorrect information; • Practical for large-scale applications, since the sparse representation reduces the storage requirement and is feasible for large-scale numerical computation.
Normal Graph vs. Sparse Graph (figures: Normal Graph Construction; Sparse Graph Construction)
Sparse Graph Construction • The ℓ1-norm-based linear reconstruction error minimization naturally leads to a sparse representation for the images*. • The sparse reconstruction can be obtained by solving the following convex optimization problem: min_w ||w||_1, s.t. x = Dw • w ∈ R^n: the vector of reconstruction coefficients; • x ∈ R^d: the feature vector of the image to be reconstructed; • D ∈ R^{d×n} (d < n): a matrix formed by the feature vectors of the other images in the dataset. * J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, Feb. 2009.
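For concreteness, the ℓ1 problem above can be solved as a linear program via the standard basis-pursuit split w = u − v. The sketch below is an illustration with toy data, not the authors' implementation; the SciPy solver choice is our own assumption.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_coefficients(D, x):
    """Solve min ||w||_1  s.t.  x = D w  via the split w = u - v, u, v >= 0."""
    d, n = D.shape
    c = np.ones(2 * n)            # objective: sum(u) + sum(v) = ||w||_1
    A_eq = np.hstack([D, -D])     # equality constraint: D (u - v) = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=(0, None), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v                  # the sparse reconstruction coefficients w

# Toy usage: a feature vector that truly is a sparse combination of D's columns.
rng = np.random.default_rng(0)
D = rng.standard_normal((10, 40))            # d = 10 feature dims, n = 40 images
x = D[:, :3] @ np.array([0.5, 0.3, 0.2])     # built from the first 3 columns
w = sparse_coefficients(D, x)                # w is (near) zero outside those 3
```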
Sparse Graph Construction (cont.) • Handle the noise on certain elements of x: • Reformulate x = Dw + ξ, where ξ ∈ R^d is the noise term. • Then solve: min_{w,ξ} ||w||_1 + ||ξ||_1, s.t. x = Dw + ξ, i.e., the same ℓ1 problem over the augmented dictionary B = [D, I]. • Set the edge weights of the sparse graph from the reconstruction coefficients (e.g., the weight between the image and image j is |w_j|).
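The noise-robust variant is the same ℓ1 minimization over the augmented dictionary [D, I], so the solver sketched above can be reused. Note that the edge-weight rule below (absolute coefficient magnitude) is a common sparse-graph convention assumed for illustration, not a quote of the paper's exact formula.

```python
import numpy as np

def sparse_edge_weights(D, x):
    """Noise-robust coefficients: solve min ||w'||_1 s.t. x = [D, I] w'."""
    d, n = D.shape
    B = np.hstack([D, np.eye(d)])        # augmented dictionary [D, I]
    w_aug = sparse_coefficients(B, x)    # reuses the LP solver sketched above
    w, xi = w_aug[:n], w_aug[n:]         # image coefficients and noise term
    return np.abs(w)                     # assumed edge weights to the n other images
```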
Semi-supervised Inference • Result: the labels of the unlabeled samples are given in closed form by f_u = −M_uu^{-1} M_ul f_l.
Semi-supervised Inference (cont.) • The problem with f_u = −M_uu^{-1} M_ul f_l: • M_uu is typically very large for image annotation • It is often computationally prohibitive to calculate its inverse directly • An iterative solution with non-negativity constraints may not be reasonable, since some samples may have negative contributions to other samples • Solution: • Reformulate as a sparse system of linear equations: M_uu f_u = −M_ul f_l • The generalized minimal residual method (GMRES) can then be used to solve this large-scale sparse system iteratively, effectively and efficiently.
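A minimal sketch of the GMRES step with SciPy's sparse solver; M_uu and M_ul below stand for the matrix blocks named on the slide, whose exact construction is given in the paper.

```python
from scipy.sparse.linalg import gmres

def infer_unlabeled(M_uu, M_ul, f_l):
    """Solve M_uu f_u = -M_ul f_l iteratively instead of inverting M_uu."""
    rhs = -(M_ul @ f_l)            # right-hand side of the sparse linear system
    f_u, info = gmres(M_uu, rhs)   # info == 0 signals convergence
    return f_u
```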
Different Types of Tags √: correct; ?: ambiguous; m: missing
Handling of Noisy Tags • We cannot assume that the training tags stay fixed during the inference process. • The noisy training tags should be refined during label inference. • Solution: add two regularization terms to the inference framework to handle the noise.
Handling of Noisy Tags (cont.) • Solution: • Set the original label vector y as the initial estimate of the ideal label vector f_l, then solve the regularized objective to obtain a refined f_l. • Fix f_l and solve for the noise term. • Use the refined f_l to replace y in the previous graph-based method, then solve the sparse system of linear equations to infer the labels of the unlabeled samples.
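The exact regularizers are specified in the paper; the sketch below only illustrates the alternation, under an assumed quadratic-plus-sparse objective (graph smoothness on the labeled subgraph, a fidelity term tying f_l to the observed tags y, and an ℓ1-penalized tag-noise term e). Every symbol and weight here is an illustrative assumption.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def refine_labels(M_ll, y, lam1=1.0, lam2=0.1, n_iters=10):
    """Alternating refinement under the assumed objective
    min_{f,e}  f^T M_ll f + lam1 ||f - (y - e)||^2 + lam2 ||e||_1."""
    n = len(y)
    f, e = y.astype(float).copy(), np.zeros(n)
    A = M_ll + lam1 * np.eye(n)
    for _ in range(n_iters):
        f = np.linalg.solve(A, lam1 * (y - e))        # f-step: quadratic, closed form
        e = soft_threshold(y - f, lam2 / (2 * lam1))  # e-step: sparse noise estimate
    return f                                          # refined labels replacing y
```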
Why Concept Space? • It is well known that inferring concepts directly from low-level visual features does not work well, due to the semantic gap. • To bridge this semantic gap: • Construct a concept space and then infer the semantic concepts in this space. • The semantic relations among different concepts are inherently embedded in this space and help the concept inference.
The requirements for the concept space • Low-semantic-gap: Concepts in the constructed space should have small semantic gaps; • Informative: These concepts can cover the semantic space spanned by all useful concepts (tags), that is, the concept space should be informative; • Compact: The set including all the concepts forming the space should be compact (i.e., the dimension of the concept space is small).
Concept Space Construction • Basic terms: • Ω: the set of all concepts; • Θ: the constructed concept set. • Three measures: • Semantic modelability: SM(Θ) • Coverage of the semantic concept space: CE(Θ, Ω) • Compactness: CP(Θ) = 1/#(Θ) • Objective: select the Θ that maximizes a combination of SM(Θ), CE(Θ, Ω), and CP(Θ).
Solution for Concept Space Construction • Simplification: fix the size of the concept space. • The maximization can then be transformed into a standard quadratic programming problem. • See the paper for more details.
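As a rough stand-in only (the paper's actual solution is the QP mentioned above), the greedy sketch below picks a fixed-size concept set by trading per-concept modelability against coverage of the full tag vocabulary; the inputs (a modelability score vector sm and a concept-similarity matrix S) and the scoring rule are assumptions.

```python
import numpy as np

def select_concepts(sm, S, k, alpha=1.0):
    """sm: (n,) modelability scores; S: (n, n) concept similarities; pick k."""
    n = len(sm)
    chosen, covered = [], np.zeros(n)
    for _ in range(k):
        # coverage gain: how much each candidate raises the per-concept
        # max-similarity coverage of the whole vocabulary
        gain = np.maximum(S, covered).mean(axis=1) - covered.mean()
        score = sm + alpha * gain
        score[chosen] = -np.inf          # never re-pick a chosen concept
        c = int(np.argmax(score))
        chosen.append(c)
        covered = np.maximum(covered, S[c])
    return chosen
```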
Inferring Concepts in Concept Space • Image mapping: x_i → D(i) • Query concept mapping: c_x → Q(c_x) • Rank the given images by the similarity between D(i) and Q(c_x) in the concept space.
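A sketch of the ranking step: each image i is represented by its concept-space vector D(i) and the query concept by Q(c_x), and images are ranked by cosine similarity. The exact definitions of the two mappings are in the paper; the similarity choice here is an assumption.

```python
import numpy as np

def rank_images(image_vecs, query_vec):
    """image_vecs: (num_images, dim) concept-space vectors D(i);
    query_vec: (dim,) concept-space vector Q(c_x). Returns indices, best first."""
    A = image_vecs / (np.linalg.norm(image_vecs, axis=1, keepdims=True) + 1e-12)
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    return np.argsort(-(A @ q))          # cosine similarity, descending
```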
Experiments • Dataset • NUS-WIDE-Lite version (55,615 images) • Low-level features • Color Histogram (CH) and Edge Direction Histogram (EDH), concatenated directly. • Evaluation • 81 concepts • Average Precision (AP) and Mean Average Precision (MAP)
Experiments Ex1: Comparisons among Different Learning Methods
Experiments • Ex2: Concept Inference with and without Concept Space
Experiments Ex3: Inference with Tags vs. Inference with Ground-truth • We achieve an MAP of 0.1598 by inferring from tags in the concept space, which is comparable to the MAP obtained by inferring from the ground-truth training labels.
Summary • Explored the problem of inferring semantic concepts from community-contributed images and their associated noisy tags. • Three key points: • Sparse-graph-based label propagation • Noisy tag handling • Inference in a low-semantic-gap concept space
Future Work • Training set construction from web resources