An Adaptive Machine Learning Framework with User Interaction for Ontology Matching
Hoai-Viet To (1), Ryutaro Ichise (2), and Hoai-Bac Le (1)
(1) Ho Chi Minh University of Science, Vietnam
(2) National Institute of Informatics, Japan
Ontology Matching (OM) Problem
• An ontology is a hierarchical structure used to organize concepts.
• Ontologies play an important role in Semantic Web development.
• Ontology matching finds correspondences between concepts from two ontologies.
• Ontology matching is an important step when integrating heterogeneous information sources in the new Semantic Web environment.
Machine Learning Framework for OM
• We introduced a machine learning framework for the ontology matching problem in [Ichise, 2008].
• Our hypothesis: using a semi-supervised learning method will reduce the manual annotation cost.
[Figure: two ontologies, one with concepts Ca1, Ca2, Ca3 and one with concepts Cb1, Cb2, Cb3]
• Pre-alignment
  • Correct mappings: (Ca1, Cb1), (Ca2, Cb1), …
  • Incorrect mappings: (Ca1, Cb2), (Ca1, Cb3), …
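The pre-alignment on this slide can be read as a set of labeled concept pairs. The following is a minimal sketch under that reading: the function name `label_pairs` and the use of `None` for pairs the slide does not list are assumptions made for illustration, not part of the original framework.

```python
from itertools import product

# Concept names follow the slide's toy example.
concepts_a = ["Ca1", "Ca2", "Ca3"]
concepts_b = ["Cb1", "Cb2", "Cb3"]

# Only the pairs explicitly listed on the slide are labeled.
correct = {("Ca1", "Cb1"), ("Ca2", "Cb1")}
incorrect = {("Ca1", "Cb2"), ("Ca1", "Cb3")}


def label_pairs(concepts_a, concepts_b, correct, incorrect):
    """Map every candidate pair to True, False, or None (unlabeled).

    Hypothetical helper: the slide elides the remaining pairs ("…"),
    so they stay unlabeled here rather than being guessed.
    """
    labeled = {}
    for pair in product(concepts_a, concepts_b):
        if pair in correct:
            labeled[pair] = True
        elif pair in incorrect:
            labeled[pair] = False
        else:
            labeled[pair] = None
    return labeled


pairs = label_pairs(concepts_a, concepts_b, correct, incorrect)
unlabeled = [pair for pair, label in pairs.items() if label is None]
```

The unlabeled pairs are the ones a semi-supervised learner would try to label without further manual annotation.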
Semi-supervised Learning with User Interaction
• Basic idea: propagate labels through the unlabeled data.
• Problem: few labeled samples → low-confidence predictions.
[Figure: unlabeled points marked "?" alongside Red and Blue labeled points, resolved through user interaction]
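One way to picture this idea is a self-training loop in which confident predictions become labels and the least confident samples go to the user. The sketch below is an illustration only: the centroid-based confidence score, the threshold of 0.8, and the names `centroid_confidence`, `self_train`, and `ask_user` are all assumptions, since the slides do not specify the propagation rule. It also assumes the initial labeled set contains at least one sample of each class.

```python
def centroid_confidence(labeled, x):
    """Return (predicted_label, confidence) from distances to class means.

    `labeled` is a list of (score, bool) pairs; both classes must be present.
    """
    pos = [v for v, y in labeled if y]
    neg = [v for v, y in labeled if not y]
    mu_pos = sum(pos) / len(pos)
    mu_neg = sum(neg) / len(neg)
    d_pos, d_neg = abs(x - mu_pos), abs(x - mu_neg)
    total = d_pos + d_neg
    if total == 0:
        return True, 0.5
    p_pos = d_neg / total  # closer to the positive mean -> higher p_pos
    return p_pos >= 0.5, max(p_pos, 1 - p_pos)


def self_train(labeled, unlabeled, ask_user, threshold=0.8,
               queries_per_round=1, rounds=2):
    """Propagate labels; send the least confident samples to the user."""
    labeled = list(labeled)
    pool = list(unlabeled)
    asked = 0
    for _ in range(rounds):
        if not pool:
            break
        scored = [(centroid_confidence(labeled, x), x) for x in pool]
        scored.sort(key=lambda item: item[0][1])  # least confident first
        for _, x in scored[:queries_per_round]:
            labeled.append((x, ask_user(x)))  # user resolves the hard case
            asked += 1
        pool = []
        for (label, conf), x in scored[queries_per_round:]:
            if conf >= threshold:
                labeled.append((x, label))  # confident: accept the prediction
            else:
                pool.append(x)  # still uncertain: keep for the next round
    return labeled, pool, asked


# Toy run on 1-D similarity scores; the lambda stands in for a human annotator.
result, remaining, asked = self_train(
    labeled=[(0.9, True), (0.1, False)],
    unlabeled=[0.95, 0.5, 0.05, 0.6],
    ask_user=lambda x: x >= 0.5,
)
```

Querying the least confident sample first is one common design choice; it spends the user's effort where the learner is weakest.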
Adaptive Machine Learning Framework
• Use multiple learning strategies + user interaction
[Figure: system architecture. Components: Ontology Storage, Ontology Parser, Similarity Calculator, Initialize, User Interaction, Pre-alignment; the learners alternate between training and labeling]
Adaptive Machine Learning Framework
• The similarity measures are based on those used in the machine learning framework proposed in [Ichise, 2008], which:
  • include 24 string-based similarity measures
  • calculate similarity over three kinds of features: concept features, concept structure features, and concept hierarchical features.
• Our system: Machine Learning Framework for Ontology Matching with User Interaction (MalfomUI)
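To make the idea of a similarity feature vector concrete, here is a small sketch that applies two string measures to three text fields of a concept pair. Everything specific in it is an assumption: the two measures (a `difflib` sequence ratio and a token Jaccard score) are stand-ins rather than the 24 measures of [Ichise, 2008], and the field names `label`, `parent`, and `path` are hypothetical proxies for the concept, structure, and hierarchical features.

```python
from difflib import SequenceMatcher


def sequence_ratio(a, b):
    """Case-insensitive similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a, b):
    """Jaccard overlap of whitespace-separated, lower-cased tokens."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Stand-in measures; the original framework uses a different, larger set.
MEASURES = (sequence_ratio, token_jaccard)
# Hypothetical text fields describing one concept.
FIELDS = ("label", "parent", "path")


def similarity_vector(concept_a, concept_b):
    """Apply every measure to every field, giving one feature per combination."""
    return [
        measure(concept_a[field], concept_b[field])
        for field in FIELDS
        for measure in MEASURES
    ]


camera_a = {"label": "Digital Cameras", "parent": "Photography",
            "path": "Top Arts Photography"}
camera_b = {"label": "Cameras Digital", "parent": "Photo",
            "path": "Top Shopping Photo"}
features = similarity_vector(camera_a, camera_b)
```

Each concept pair becomes a fixed-length numeric vector, which is the form a classifier such as Naïve Bayes expects.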
Experiments
• Purpose:
  • Compare the performance of our learning framework with other matching systems.
• General setting:
  • Dataset from the directory track of the OAEI 2008 campaign [Caracciolo et al., 2008].
  • The dataset is constructed from three internet directories: Yahoo, Google, and Looksmart.
  • Only simple equivalence relations are considered.
  • The dataset includes 4487 labeled matching tasks, of which 2160 are positive samples and 2327 are negative samples.
  • Base learner: Naïve Bayes
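The slide names Naïve Bayes as the base learner without giving details. As a rough sketch of what such a learner does with similarity features, the code below implements a small Gaussian Naïve Bayes in plain Python. The Gaussian variant, the variance floor, the function names, and the toy training data are all assumptions for illustration and are not taken from the experiments.

```python
import math


def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for cls in set(y):
        rows = [x for x, label in zip(X, y) if label == cls]
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The floor keeps near-constant features from producing zero variance.
        variances = [
            max(sum((v - m) ** 2 for v in col) / n, var_floor)
            for col, m in zip(columns, means)
        ]
        model[cls] = (n / len(y), means, variances)
    return model


def predict_proba(model, x):
    """Return class probabilities for one feature vector."""
    log_scores = {}
    for cls, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        log_scores[cls] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exps = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    total = sum(exps.values())
    return {cls: e / total for cls, e in exps.items()}


# Invented toy data: rows are similarity vectors, labels mark correct mappings.
X = [[0.9, 0.8], [0.85, 0.9], [0.1, 0.2], [0.2, 0.1]]
y = [True, True, False, False]
model = fit_gaussian_nb(X, y)
proba = predict_proba(model, [0.8, 0.85])
```

The class probabilities are what make Naïve Bayes convenient here: they give a direct confidence value for deciding which predictions to trust and which to hand to the user.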
Experiments
• Pre-experiment: supervised learning method
  • Used as a baseline for comparison with the semi-supervised learning method.
  • Studies the effect of training-set size on the performance of the supervised learning method.
Experimental Results
• MalUI-5 to MalUI-4000: the suffix gives the training-set size.
Experiments
• Experiment: semi-supervised learning with user interaction
  • Studies the performance of the semi-supervised learning method with user interaction.
  • The user annotates 20 samples in the initialization phase, then labels 4 more samples in each of 2 feedback rounds.
Experimental Results
• MalUI-RF: comparison with other matching systems [Caracciolo et al., 2008]
Experimental Results
• Semi-supervised learning with user feedback can reduce the cost of manual annotation.
* In the MalfomUI-RF experiment, users need to label 28 samples in total.
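The total of 28 follows from the annotation protocol: 20 initial labels plus 4 labels in each of 2 feedback rounds. The short sketch below reproduces that arithmetic and, as an illustrative comparison only, relates it to the largest supervised training set named in the slides (MalUI-4000). The constant and function names are assumptions.

```python
INITIAL_LABELS = 20     # samples annotated in the initialization phase
LABELS_PER_ROUND = 4    # extra samples labeled in each feedback round
FEEDBACK_ROUNDS = 2
LARGEST_SUPERVISED_SET = 4000  # training-set size of MalUI-4000


def total_user_labels(initial, per_round, rounds):
    """Number of samples the user labels over the whole session."""
    return initial + per_round * rounds


total = total_user_labels(INITIAL_LABELS, LABELS_PER_ROUND, FEEDBACK_ROUNDS)
share = total / LARGEST_SUPERVISED_SET
print(f"{total} labels, {share:.1%} of the largest supervised training set")
```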
Conclusion
• Conclusions:
  • Our adaptive machine learning framework is effective: it requires less annotation effort yet achieves nearly as good performance.
  • Machine learning approaches with user interaction are promising for ontology matching systems.
• Future work:
  • Integrate more similarity measures to cover real datasets.
  • Consider more complex semi-supervised models.