SIMILARITY OF WEIGHTED DIRECTED ACYCLIC GRAPHS

SIMILARITY OF WEIGHTED DIRECTED ACYCLIC GRAPHS Presented By: Jing Jin Supervisors: Dr. Virendra C. Bhavsar and Dr. Harold Boley Aug. 31, 2006

Outline • Introduction • Schema Matching • E-Business & Multi-agent System • Tree and Graph Similarity Techniques • The wDAG Similarity Algorithm • wDAG Representation • The Flow Chart • Main Functions • The Data Structure for Reusing Computation Results • Illustrative Example • Comparison of Typical Schema Matching Algorithms • Computational Results • Conclusion • Future Work • References

IntroductionSchema Matching Fig. 1 The match operation [30] • Schema matching is a fundamental problem in many application domains, such as e-Business, semantic web, data integration and semantic query processing.

IntroductionE-Business & Multi-agent System • The wide acceptance of advanced technologies from the semantic web and web services in e-Business. • A multi-agent system can provide a virtual marketplace for seller and buyer agents to conduct e-Business activities, where the match-making between them is a crucial step.

IntroductionTree and Graph Similarity Techniques • Much research has been done on tree [5, 21, 27, 32, 37, 39, 40] and graph [22, 23, 36, 41] similarity techniques. • The weighted tree similarity algorithm [5] Fig. 2 An example of two weighted trees describing second hand car information

IntroductionTree and Graph Similarity Techniques (cont’d) • The tree edit distance algorithm [21, 27, 39] Fig. 3 A tree edit distance example

Similarity flooding [23] IntroductionTree and Graph Similarity Techniques (cont’d) Fig. 4 The sequence of steps to determine the correspondences between tables and columns in S1 and S2

CUPID [22] IntroductionReusing Schema and Mapping Information Fig. 5 An example of lazy schema tree expansion

COMA [11, 31] IntroductionReusing Schema and Mapping Information (cont’d) Fig. 6 An example of reusing previous similarity value in COMA

Fig. 3 Weighted tree representation of a purchase

Fig. 4 The weighted tree from Fig. 3 folded into a wDAG representation

The wDAG Similarity Algorithm wDAG Representation Fig. 5 wDAG serialization in weighted OO RuleML

The wDAG Similarity AlgorithmThe Flow Chart Fig. 6 The flow chart of the wDAG similarity algorithm

The wDAG Similarity AlgorithmMain Functions Fig. 7 The comparison between wDAGs A and B

The wDAG Similarity AlgorithmMain Functions:wDAGsim(g, g’) The equation embedded in the main function wDAGsim(g, g’) that computes the similarity of two (sub)wDAGs is shown below.

The wDAG Similarity AlgorithmMain Functions:wDAGplicity(g) The equation embedded in the wDAGplicity(g) that computes the simplicity of the missing wDAG is shown below. where, DI: depth degradation index, 0 < DI  1.0 DF: depth degradation factor, 0 < DF  0.5 m: breadth of the wDAG g that is not a leaf. DI= 1.0 DF =0.5

The wDAG Similarity AlgorithmThe Data Structure for Reusing Computation Results • Two-dimensional Matrix • Adjacency Lists

The Data Structure for Reusing Computation ResultsTwo-dimensional Matrix • The matrix would be a sparse one which contains many NULL values. Table 1 The two-dimensional matrix corresponding to the wDAGs in Fig. 7 Table 3 Single-dimensional array of wDAG B Table 2 Single-dimensional array of wDAG A

The Data Structure for Reusing Computation ResultsAdjacency Lists Fig. 8 The structure of adjacency lists Fig. 9 The corresponding adjacency lists of the example in in Fig. 7

Illustrative Example

Table 4: Comparison of Typical Schema Matching Algorithms

Table 4: Comparison of Typical Schema Matching Algorithms (cont’d)

Computational ResultsExample Set 1

Computational ResultsExample Set 1 (cont’d)

Example Set 1 (cont’d)

Example Set 2 Example 1 Example 2

Example Set 2 (cont’d) Example 2 Example 3

Example Set 2 (cont’d) Example 4 Example 5

Remarks • The size of OO RuleML source file of wDAG is smaller than that of weighted tree because wDAG eliminates the duplicate copies of shared substructures. And the time complexity of the wDAG algorithm is much lower than that of the weighted tree algorithm.

Conclusion • Mechanisms of schema matching approaches have been reviewed, and relevant tree and graph similarity techniques have been studied. • A wDAG similarity algorithm for the match-making in e-Business environments have been proposed. If two wDAGs are intuitively more similar, our similarity value is higher, otherwise, it is lower. • The intermediate similarity and simplicity values of shared sub-wDAGs can be reused efficiently through the adjacency lists structure. • The applicability and efficiency of the weighted tree similarity algorithm have been improved. • To our knowledge, in the field of match-making in e-Business environments, wDAG representations and their similarity measures have not been studied before.

Future Work • The test data is currently developed manually. A tool should be developed to provide a graphical user interface for users to define such data, like the Teclantic application of the AgentMatcher group (http://teclantic.ca/). • Improvements should be made on the simplicity function. One way is to replace the arithmetic mean by the geometric mean, as currently explored by the AgentMatcher group. • The proposed algorithm outputs the similarity between two wDAGs. The similarity values for multiple wDAGs should be ranked by some sorting algorithm, like bubble sort, just as the AgentMatcher group has already implemented ranking for tree similarity.

References • Allali, J. and Sagot, M.F., A New Distance for High Level RNA Secondary Structure Comparison, IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 2, No. 1, pp. 3-14, 2005. • Batini, C., Lenzerini, M. and Navathe, S.B., A comparative analysis of methodologies for database schema integration, ACM Comput Surv, 18(4), pp. 323–364, 1986. • Bergamaschi, S., Castano, S. and Vincini, M., Semantic integration of semistructured and structured data sources, ACM SIGMOD Record, 28(1), pp. 54–59, 1999. • Berners-Lee, T., Hendler, J. and Lassila, O., The semantic web, Sci. Am, 284(5), pp. 34–43, 2001. • Bhavsar, V.C., Boley, H. and Yang, L., A Weighted-Tree Similarity Algorithm for Multi-Agent Systems in e-Business Environments, Computational Intelligence, 20(4), pp.584-602, 2004. • Boley, H., Bhavsar, V.C., Hirtle, D., Singh, A., Sun, Z. and Yang, L., A match-making system for learners and learning Objects, Learning & Leading with Technology, International Society for Technology in Education, Eugene, OR, 2(3), pp. 171–178, 2005. • Boley, H., Object-Oriented RuleML: User-level roles, URI-grounded clauses and order-sorted terms, Springer-Verlag, Heidelberg, LNCS-2876, pp. 1-16, 2003. • Bouquet, P., Serafini, L. and Zanobini, S., Semantic coordination: A new approach and an application. Proceedings of the International Semantic Web Conference (ISWC), pages 130–145, 2003. • Castano, S., Antonellis, D.V. and Vimercati, S., Global viewing of heterogeneous data sources, IEEE Trans Data Knowl Eng, 13(2), pp. 277–297, 2001. • Deo, N., Graph Theory with Applications to Engineering and Computer Science, Prentice Hall of India Private Limited, twelfth printing, April 1997. • Do, H.H. and Rahm, E., COMA – A system for flexible combination of schema matching approaches, Proceedings of 28th International Conference on Very Large Databases (VLDB), Hong Kong, pp. 610-621, 2002. • Do, H.H., Melnik, S. and Rahm, E., Comparison of Schema Matching Evaluations, Web, Web-Services, and Database Systems, pp. 221-237, 2002. • Doan, A.H., Domingos, P. and Levy, A., Learning source descriptions for data integration, Proc. WebDB Workshop, pp. 81–92, 2000. • Elmagarmid, A.K. and Pu, C., Guest editors’ introduction to the special issue on heterogeneous databases, ACM Comput Surv, 22(3), pp. 175–178, 1990. • Giunchiglia, F. and Zaihrayeu, I., Making peer databases interact - a vision for an architecture supporting data coordination. Proceedings of the International workshop on Cooperative Information Agents (CIA), pages 18–35, 2002. • JDOM [homepage of JDOM] [on line] available at:http://www.jdom.org [accessed 2006-3-1] • Jin, J., Sarker, B.K., Bhavsar, V.C., Boley, H. and Yang, L., Towards a Weighted-Tree Similarity Algorithm for RNA Secondary Structure Comparison, Proceedings of the 8th International Conference on High Performance Computing in Asia Pacific Region (HPC-Asia 2005), IEEE Computer Society, ISBN 0-7695-2486-9, pp. 639-644, Nov 30-December 3, 2005. • Kifer, M., Lausen, G., Kifer, M. and Wu, J., Logical Foundations of Object-Oriented and Frame-Based Languages, JACM, Vol.43, No.2, 741-843, 1995. • Lassila, O. and Swick, R.R., Resource Description Framework (RDF) Model and Syntax Specification, Recommendation REC-rdf-syntax-19990222, W3C, February 1999. • Li, W. and Clifton, C., Semantic integration in heterogeneous databases using neural networks, Proc. 20th Int. Conf On Very Large Data Bases, pp. 1–12, 1994.

References (cont’d) • Lu, S., A Tree-to-Tree Distance and Its Application to Cluster Analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-1, No.2, April 1979. • Madhavan, J., Bernstein, P. and Rahm, E., Generic schema matching using Cupid, Proceedings of 27th International Conf. on Very Large Data Bases (VLDB’01), Rome, Italy , pp. 49-58, 2001. • Melnik, S., Molina, H.G. and Rahm, E., Similarity flooding: A versatile graph matching algorithm, Proceedings of Eighteenth International Conference on Data Engineering, San Jose, California, pp. 117-128, February 2002. • Miller, R., Ioannidis, Y.E. and Ramakrishnan, R., Schema equivalence in heterogeneous systems: bridging theory and practice, Inf. Syst. 19(1):3–31, 1994. • Milo, T. and Zohar, S., Using schema matching to simplify heterogeneous data translation, Proc. 24th Int. Conf On Very Large Data Bases, pp. 122–133, 1998. • Mitra, P., Wiederhold, G. and Jannink, J., Semiautomatic integration of knowledge sources, Proc. of the 2nd Int. Conf. On Information FUSION'99, Sunnyvale, USA, 1999. • Noetzel, A.S. and Selkow, S.M., An analysis of the general tree-editing problem, In D. Sankoff and J.B. Kruskal, editors, Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, pages 237-252, Addison-Wesley, Reading, MA, 1983. • Palopoli, L., Sacca, D. and Ursino, D., Semi-automatic, semantic discovery of properties from database schemas. Proc Int. Database Engineering and Applications Symp. (IDEAS), IEEE Comput, pp. 244–253, 1998. • Parent, C. and Spaccapietra, S., Issues and approaches of database integration, CACM, 41(5), pp. 166–178, 1998. • Rahm, E. and Bernstein, P., A Survey of Approaches to Automatic Schema Matching, VLDB Journal, 10(4), pp. 334-350, Dec. 2001. • Rahm, E., Do, H.H., Maßmann, S., Matching Large XML Schemas, SIGMOD Record 33(4), pp. 26-31, 2004. • Shasha, D., Wang, J. and Zhang, K., Treediff: Approximate Tree Matcher for Ordered Trees, http://www.cs.nyu.edu/cs/faculty/shasha/papers/tree.html, October 25, 2001. • Sheth, A.P. and Larson, J.A., Federated database systems for managing distributed, heterogeneous, and autonomous databases, ACM Comput Surv, 22(3), pp. 183–236, 1990. • Shvaiko, P. and Euzenat, J., A Survey of Schema-based Matching Approaches. Journal on Data Semantics (JoDS), IV, LNCS 3730, pp. 146-171, 2005. • Wagner, R.A. and Fischer, M.J., The string-to-string correction problem, J. ACM, Vol. 21, No. 1, pp. 168-173, 1974. • Wang, J.T.L., Zhang, K. and Chirn, G.W., Algorithms for Approximate Graph Matching, Inf. Sci., 82(1-2), pp. 45-74, 1995. • Yang, L., Sarker, B.K., Bhavsar, V.C. and Boley, H., A Weighted-Tree Simplicity Algorithm for Similarity Matching of Partial Product Descriptions, Proceedings of ISCA 14th International Conference on Intelligent and Adaptive Systems and Software Engineering, Toronto, pp. 55-60, 2005. • Yang, L., Sarker, B.K., Bhavsar, V.C. and Boley, H., Range Similarity Measures between Buyers and Sellers in e-Marketplaces, Proceedings of the Second Indian International Conference on Artificial Intelligence, Pune, pp. 2559-2572, Dec 20-22, 2005. • Zhang, K. and Shasha, D., Simple fast algorithms for the editing distance between trees and related problems, SIAM J. Comput, Vol.18, No.6, pp.1245–1262, 1989. • Zhang, K., Shasha, D. and Wang, J.T.L., Approximate Tree Matching in the Presence of Variable Length Don’t Cares, J. of Algorithms, Vol. 16, No. 1, pp. 33-66, 1994. • Zhang, K., Wang, J.T.L. and Shasha, D., On the Editing Distance between Undirected Acyclic Graphs and Related Problems, Combinatorial Pattern Matching (CPM), pp. 395-407, 1995.

Thank you! Mercy! 谢谢!

SIMILARITY OF WEIGHTED DIRECTED ACYCLIC GRAPHS

SIMILARITY OF WEIGHTED DIRECTED ACYCLIC GRAPHS

Presentation Transcript

Directed Acyclic Graphs

Parameterized Tractability of Edge-Disjoint Paths on Directed Acyclic Graphs

Using Directed Acyclic Graphs (DAGs) to assess confounding

Directed graphs

Confounding and DAGs (Directed Acyclic Graphs)

Weighted graphs

Seepage in Directed Acyclic Graphs

Distance sensitivity oracles in weighted directed graphs

Directed acyclic graphs with the unique dipath property

Demand-driven Execution of Directed Acyclic Graphs Using Task Parallelism

Directed Acyclic Graphs and Topological Sort

Weighted Graphs

SSSP in DAGs (directed acyclic graphs)

Directed Acyclic Graph

A dynamic algorithm for topologically sorting directed acyclic graphs

Sparse Compact Directed Acyclic Word Graphs

Ternary Directed Acyclic Word Graphs (TDAWG)

Improved Shortest Path Algorithms for Nearly Acyclic Directed Graphs

Adapting Prime Number Labeling Scheme for Directed Acyclic Graphs

Weighted Graphs

WEIGHTED GRAPHS

Directed Graphs