Speeding Up Batch Alignment of Large Ontologies Using MapReduce
Uthayasanker Thayasivam and Prashant Doshi
Dept. of Computer Science, University of Georgia
Introduction
Ontology: formalizes the knowledge of a domain by defining concepts and the properties that relate them.
Problem Definition: Ontology Alignment
The ontology alignment problem: find a set of correspondences between two ontologies O1 = <V1, E1, L1> and O2 = <V2, E2, L2>.
• Ontology
  • V: set of labeled vertices
  • E: set of edges, each an ordered pair of vertices from V
  • L: mapping from each edge to its label
• A correspondence m_aα between x_a ∈ O1 and y_α ∈ O2
  • Relation
  • Confidence
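The definitions above can be written down as plain data structures. The following is a minimal sketch, not the authors' implementation: the class names, the field names, the use of strings for entities, and the "=" encoding for an equivalence relation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # ordered pair of vertices (source, target)


@dataclass
class Ontology:
    """O = <V, E, L>: labeled vertices, directed edges, edge labels."""
    vertices: Set[str] = field(default_factory=set)
    edges: Set[Edge] = field(default_factory=set)
    edge_labels: Dict[Edge, str] = field(default_factory=dict)

    def add_edge(self, source: str, target: str, label: str) -> None:
        # Adding an edge also registers both endpoints as vertices.
        self.vertices.update((source, target))
        self.edges.add((source, target))
        self.edge_labels[(source, target)] = label


@dataclass(frozen=True)
class Correspondence:
    """One correspondence between an entity of O1 and an entity of O2."""
    source: str        # entity x_a from O1
    target: str        # entity y_alpha from O2
    relation: str      # assumed encoding, e.g. "=" for equivalence
    confidence: float  # assumed to lie in [0, 1]


o1 = Ontology()
o1.add_edge("Dog", "Animal", "subClassOf")
o2 = Ontology()
o2.add_edge("Canine", "Animal", "subClassOf")

alignment = {
    Correspondence("Animal", "Animal", "=", 1.0),
    Correspondence("Dog", "Canine", "=", 0.8),
}
```

An alignment is then simply a set of `Correspondence` objects, which is why the frozen (hashable) dataclass is used here.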
Ontology Alignment Challenges
• Improving the alignment quality
  • Structural & lexical disparity
• Improving the alignment efficiency
  • Quickly producing a quality alignment
• Improving the scalability
[Figure labels: Efficiency / Quality; Resources; Ontology Sizes]
Space of Alignments & Alignment between
Bipartite graph
many-to-many, one-to-many, one-to-one
Alignment Space Size:
Evaluating An Alignment:
Cartesian Product of entities
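To give a feel for how quickly the alignment space grows, here is a small counting sketch. It rests on a simplifying assumption that is not stated on the slide: an alignment is any subset of the Cartesian product of the two entity sets, with a single relation type. The function names are hypothetical, and the formulas are standard counting results rather than the slide's own expression.

```python
from math import comb, factorial


def many_to_many_space(m: int, n: int) -> int:
    """Each of the m*n entity pairs is either in the alignment or not."""
    return 2 ** (m * n)


def one_to_one_space(m: int, n: int) -> int:
    """Number of partial one-to-one matchings between m and n entities.

    Choose k entities on each side, then pair them in k! ways.
    """
    return sum(comb(m, k) * comb(n, k) * factorial(k)
               for k in range(min(m, n) + 1))


# Even tiny ontologies give a large many-to-many space.
small = many_to_many_space(10, 10)
```

Under these assumptions, two ontologies with only 10 entities each already admit 2^100 many-to-many alignments, which is why reducing or partitioning the space matters for large ontologies.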
Large Ontology Matching
[Figure: O1 with partitions P11, P12, P13 and O2 with partitions P21, P22, P23; 4 blocks]
• Reduction of alignment space
  • Early pruning of dissimilar element pairs
    • AFlood (Hanif and Masaki '09)
  • Partition-based matching
    • Falcon-AO (Jian et al. '05)
• Parallel matching
  • MapPSO (Bock and Hettenhausen '10)
  • VDoc+ (Zhang '12)
Batch Alignment of Large Ontologies
• Scalability is challenging
  • OAEI 2012 - Very Large Biomedical Ontology Track
  • 8 out of 21 tools completed
• Ontology repositories (e.g., NCBO at Stanford)
  • Batch alignment of ontologies
    • New ontologies posted
    • Ontologies get updated
The approach allows any alignment algorithm to be utilized on a MapReduce architecture.
Contributions: Batch Alignment of Large Ontologies
General & novel approach to speed up batch alignment of large ontologies using MapReduce
• No impact on alignment quality for some algorithms
• Benefits ontology repositories
MapReduce Framework
• Map output: Key -> Value
• Grouped by key: Key -> <Value1, Value2>
• Reduce output: Key -> Output Value
• Key identifies a subproblem
MapReduce Framework
[Figure: O1 with parts O11, O21, O31; O2 with parts O12, O22]
Mapper & Reducer Algorithms
MAP
• ← parse the Value in the record
• emit()
• emit(,)
REDUCE
• ← align using an alignment algorithm
• emit(,)
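The mapper and reducer roles can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not the authors' Hadoop code: the record layout `(subproblem_key, side, labels)`, the function names, and the toy matcher (exact, case-insensitive label equality) are all hypothetical stand-ins. In the real setting the reducer would call a full alignment algorithm on the two ontology parts.

```python
from collections import defaultdict


def map_record(record):
    """MAP: parse one record and emit (subproblem key, ontology part)."""
    key, side, labels = record  # hypothetical record layout
    yield key, (side, labels)


def toy_align(part1, part2):
    """Stand-in matcher: exact, case-insensitive label equality."""
    index = {label.lower(): label for label in part2}
    return [(a, index[a.lower()], 1.0) for a in part1 if a.lower() in index]


def reduce_subproblem(key, values):
    """REDUCE: align the two parts that share a subproblem key."""
    parts = dict(values)  # {"O1": [...], "O2": [...]}
    yield key, toy_align(parts["O1"], parts["O2"])


def run_job(records):
    """Simulate map, shuffle (group by key), and reduce in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    results = {}
    for key in sorted(groups):
        for out_key, alignment in reduce_subproblem(key, groups[key]):
            results[out_key] = alignment
    return results


records = [
    ("s1", "O1", ["Heart", "Lung"]),
    ("s1", "O2", ["heart", "Kidney"]),
    ("s2", "O1", ["Paper"]),
    ("s2", "O2", ["Paper", "Author"]),
]
result = run_job(records)
```

Because each key identifies one subproblem, the reducers are independent of one another, which is what lets a cluster process them in parallel.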
Identifying Alignment Subproblems
Entities from one cluster are predominantly in correspondence with entities in one other cluster
• Approach: Hamdi et al. 2010
  • Identify anchors: entity pairs with identical names or labels
  • Cluster concepts around the anchors
    • Using structural neighborhood
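The two steps above, finding anchors and clustering around them, can be sketched as follows. This is a deliberately simplified illustration and not the procedure of Hamdi et al.: anchors are found by normalized label equality, and "structural neighborhood" is approximated by assigning every entity to its nearest anchor via breadth-first search. The function names and the adjacency-dictionary input are assumptions.

```python
from collections import deque


def find_anchors(labels1, labels2):
    """Return entity pairs whose labels are identical after normalization.

    labels1 / labels2 map an entity identifier to its label.
    """
    by_label = {}
    for entity, label in labels2.items():
        by_label.setdefault(label.strip().lower(), []).append(entity)
    return [(e1, e2)
            for e1, label in labels1.items()
            for e2 in by_label.get(label.strip().lower(), [])]


def cluster_around(seeds, neighbors):
    """Assign each reachable entity to its nearest seed (multi-source BFS).

    neighbors maps an entity to the entities adjacent to it.
    """
    owner = {seed: seed for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in owner:
                owner[nxt] = owner[node]
                queue.append(nxt)
    return owner


anchors = find_anchors(
    {"o1:Heart": "Heart", "o1:Lung": "Lung"},
    {"o2:heart": "heart ", "o2:Liver": "Liver"},
)
clusters = cluster_around(
    ["a", "d"],
    {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]},
)
```

Each anchor pair then yields one subproblem: the cluster grown around the anchor in O1 paired with the cluster grown around the matching anchor in O2.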
Merging Subproblem Alignments
Crisscross mappings
• Correspondence1:
• Correspondence2:
• &
• is a subclass of and is a subclass of inconsistent
• We remove the one with the lower confidence score while merging.
Redundant mappings
• Correspondence1:
• Correspondence2:
• &
• is a subclass of inconsistent
• We remove
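A possible way to merge subproblem alignments while dropping crisscross mappings is sketched below. The symbols on the slide did not survive, so the conflict test here is an assumption: two correspondences (x1, y1) and (x2, y2) are treated as crisscross when x1 is a subclass of x2 in O1 while y2 is a subclass of y1 in O2. The function names, the tuple layout `(x, y, confidence)`, and the subclass sets `sub1` and `sub2` (assumed to hold `(child, parent)` pairs) are hypothetical. Redundant mappings are not handled in this sketch.

```python
def is_crisscross(c1, c2, sub1, sub2):
    """Assumed test: the subclass order in O1 is reversed in O2."""
    (x1, y1, _), (x2, y2, _) = c1, c2
    return (((x1, x2) in sub1 and (y2, y1) in sub2)
            or ((x2, x1) in sub1 and (y1, y2) in sub2))


def merge_alignments(alignments, sub1, sub2):
    """Union the subproblem alignments, keeping the higher-confidence
    correspondence whenever two of them form a crisscross pair."""
    candidates = sorted(
        (c for alignment in alignments for c in alignment),
        key=lambda c: c[2],
        reverse=True,
    )
    merged = []
    for cand in candidates:
        # Visiting in descending confidence means any conflict is
        # resolved in favor of the already-kept, stronger mapping.
        if not any(is_crisscross(cand, kept, sub1, sub2) for kept in merged):
            merged.append(cand)
    return merged


sub1 = {("Dog", "Animal")}      # Dog is a subclass of Animal in O1
sub2 = {("Dog2", "Animal2")}    # Dog2 is a subclass of Animal2 in O2
merged = merge_alignments(
    [[("Dog", "Animal2", 0.9)],
     [("Animal", "Dog2", 0.6), ("Cat", "Cat2", 0.8)]],
    sub1,
    sub2,
)
```

The greedy pass is quadratic in the number of correspondences; that is acceptable for a sketch but would likely need indexing for very large alignments.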
Performance Evaluation
• Datasets
  • Conference track from OAEI (120 pairs)
  • Large ontologies from OAEI (SNOMED, NCI, ... 5 pairs)
  • New biomedical ontology testbed (50 pairs from NCBO)
• Algorithms
  • Falcon-AO, Optima+, LogMap, YAM++
• Compare F-measure & runtime
  • Default setup on a single node
  • MapReduce setup using Hadoop (12 nodes each with 24 2GB & 2GHz Intel Xeon processors)
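For reference, the F-measure used to compare outputs is the harmonic mean of precision and recall against a reference alignment. The sketch below assumes correspondences are compared as plain `(source, target)` pairs, ignoring relation type and confidence; the function name and the example data are hypothetical.

```python
def f_measure(found, reference):
    """Return (precision, recall, F-measure) of a found alignment."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


found = {("a", "1"), ("b", "2"), ("c", "3")}
reference = {("a", "1"), ("b", "2"), ("d", "4"), ("e", "5")}
precision, recall, f1 = f_measure(found, reference)
```

In this example two of the three found correspondences are correct and two of the four reference correspondences are recovered, so precision is 2/3, recall is 1/2, and the F-measure is 4/7.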
Results – 3 Datasets
[Figure panels: Conference, Large OAEI, Biomedical]
Results – Large OAEI ontologies
• Other Datasets
• LogMap & YAM++:
  • Tradeoff is in the alignment quality
• Falcon-AO & Optima+:
  • No change in output
• Conference Track
  • No partitioning
  • No change in output
Discussion
• First inter-matcher parallelization approach
  • Especially using MapReduce
• Exhibits significant speedup for batch alignment
  • Some algorithms may see a small reduction in alignment quality due to the partitioning
• Significant speedup for a single ontology pair
  • Falcon-AO, Optima+ & YAM++
• Any alignment algorithm can fit in our framework
Thank you
Questions?
Parallel Alignment of Large Ontologies on a Computing Cluster
• Current divide-and-conquer approaches
  • Rely heavily on structure
  • Size-based partitioning techniques are not effective
• Current parallel matching algorithms
  • Parallelize the process within the algorithms
  • Do not support a multi-node cluster architecture