This paper presents a weighted tree similarity algorithm for semantic buyer-seller match-making, focusing on partonomy similarity and local similarity measures. The research covers tree representation, simplicity, inner-node and leaf-node similarity, and experimental results. The motivation lies in e-business and e-learning scenarios, enhancing match-making systems for learners and learning objects.
Weighted Partonomy-Taxonomy Trees with Local Similarity Measures for Semantic Buyer-Seller Match-Making
Lu Yang, Marcel Ball, Virendra C. Bhavsar and Harold Boley
BASeWEB, May 8, 2005
Outline
• Introduction
• Motivation
• Partonomy Similarity Algorithm
  • Tree representation
  • Tree simplicity
  • Partonomy similarity
  • Experimental Results
• Node Label Similarity
  • Inner-node similarity
  • Leaf-node similarity
• Conclusion
Introduction
• Buyer-seller matching in e-business, e-learning
[Figure: a multi-agent system. User agents with user info and profiles connect via web browsers through a main server to matchers (Matcher 1 … Matcher n) and cafes (Cafe-1 … Cafe-n), linked over the network to other sites.]
Introduction
• An e-learning scenario
[Figure: learners (Learner 1 … Learner n) and course providers (Course Provider 1 … Course Provider m) interact through a cafe and a matcher.]
H. Boley, V. C. Bhavsar, D. Hirtle, A. Singh, Z. Sun and L. Yang. A match-making system for learners and learning objects. Learning & Leading with Technology, International Society for Technology in Education, Eugene, OR, 2005 (to appear).
Motivation • Metadata for buyers and sellers • Keywords/keyphrases • Trees • Tree similarity
Tree representation
• Characteristics of our trees
  • Node-labeled, arc-labeled and arc-weighted
  • Arcs are labeled in lexicographical order
  • Weights sum to 1
[Figure: example tree with root Car and arcs Make (0.3) to Ford, Model (0.2) to Explorer, Year (0.5) to 2002.]
Tree representation – Serialization of trees
• Weighted Object-Oriented RuleML
• XML attributes for arc weights and subelements for arc labels

<Cterm>
  <Ctor>Car</Ctor>
  <slot weight="0.3"><Ind>Make</Ind><Ind>Ford</Ind></slot>
  <slot weight="0.2"><Ind>Model</Ind><Ind>Explorer</Ind></slot>
  <slot weight="0.5"><Ind>Year</Ind><Ind>2002</Ind></slot>
</Cterm>

Tree serialization in WOO RuleML
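As an illustration of how such a WOO RuleML serialization could be decoded back into an arc-labeled, arc-weighted tree, here is a minimal sketch using Python's standard XML parser; the (arc label, weight, leaf) triple representation is an assumption for this example, not part of the slides:

```python
import xml.etree.ElementTree as ET

WOO = """<Cterm>
  <Ctor>Car</Ctor>
  <slot weight="0.3"><Ind>Make</Ind><Ind>Ford</Ind></slot>
  <slot weight="0.2"><Ind>Model</Ind><Ind>Explorer</Ind></slot>
  <slot weight="0.5"><Ind>Year</Ind><Ind>2002</Ind></slot>
</Cterm>"""

root = ET.fromstring(WOO)
label = root.findtext("Ctor")               # node label of the root: "Car"
arcs = [(s.findall("Ind")[0].text,          # arc label from the first <Ind>
         float(s.get("weight")),            # arc weight from the attribute
         s.findall("Ind")[1].text)          # leaf node from the second <Ind>
        for s in root.findall("slot")]
# arcs == [("Make", 0.3, "Ford"), ("Model", 0.2, "Explorer"), ("Year", 0.5, "2002")]
```

The slots come back in document order, i.e. the lexicographical arc-label order the representation requires, and the weights sum to 1 as the slides specify.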
Tree simplicity
[Figure: example tree with root A and arcs a, b (weights 0.7, 0.3) to B and C; B has arcs c, d (0.9, 0.1) to leaves D and E; C has arcs e, f (0.8, 0.2) to leaves F and G. Depth degradation values 0.9, 0.45 and 0.225 at depths 0, 1 and 2; resulting tree simplicity 0.0563.]
• The deeper the leaf node, the less its contribution to the tree simplicity
• Depth degradation index (0.9)
• Depth degradation factor (0.5)
• Reciprocal of tree breadth
L. Yang, B. Sarker, V. C. Bhavsar and H. Boley. A weighted-tree simplicity algorithm for similarity matching of partial product descriptions (submitted for publication).
Tree simplicity – Computation

Š(T) = DI · DF^d                          if T is a leaf node,
Š(T) = (1/m) · Σ_{j=1..m} w_j · Š(T_j)    otherwise.

Š(T): the simplicity value of a single tree T
DI and DF: depth degradation index and depth degradation factor
d: depth of a leaf node
m: root node degree of a tree T that is not a leaf node
w_j: arc weight of the jth arc below the root node of tree T
T_j: subtree below the jth arc with arc weight w_j
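A minimal Python sketch of this computation, assuming trees are encoded as (label, children) pairs with children given as (arc label, weight, subtree) triples (an encoding chosen for this sketch, not taken from the slides). It reproduces the example value from the tree simplicity slide:

```python
DI = 0.9  # depth degradation index
DF = 0.5  # depth degradation factor

def simplicity(tree, depth=0):
    """Simplicity of an arc-labeled, arc-weighted tree: leaves
    contribute DI * DF^depth; inner nodes average the weighted
    subtree simplicities over the root degree m (breadth reciprocal)."""
    label, children = tree
    if not children:
        return DI * DF ** depth            # deeper leaves contribute less
    m = len(children)
    return sum(w * simplicity(sub, depth + 1)
               for _, w, sub in children) / m

# the example tree from the slide: A with subtrees B and C
t = ("A", [("a", 0.7, ("B", [("c", 0.9, ("D", [])),
                             ("d", 0.1, ("E", []))])),
           ("b", 0.3, ("C", [("e", 0.8, ("F", [])),
                             ("f", 0.2, ("G", []))]))])
print(simplicity(t))  # 0.05625, reported as 0.0563 on the slide
```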
Partonomy similarity – Simple trees
[Figure: trees t and t´, with roots Car and Car (or House), each having arcs Make and Model with weights 0.7 and 0.3; leaves Ford and Mustang in t, Ford and Escape in t´. Inner-node and leaf-node comparisons yield 1 for identical labels and 0 otherwise.]
Partonomy similarity – Complex trees
[Figure: LOM metadata trees t and t´, each rooted at lom with arcs educational, general and technical (weights ≈ 0.3333 each) to edu-set, gen-set and tec-set subtrees; leaves include title (“Basic Oracle” vs. “Introduction to Oracle”), language (en), format (HTML) and platform (WinXP); “*” denotes Don’t Care.]
The similarity of two trees is the sum over matching arc pairs of A(s_i) · (w_i + w'_i)/2, where s_i is the similarity of the subtrees under the ith arc pair, w_i and w'_i are the two arc weights, and A is an adjustment function satisfying A(s_i) ≥ s_i.
Partonomy similarity – Main functions
• Three main functions (Relfun)
  • Treesim(t,t'): recursively compares any (unordered) pair of trees
    • Parameters N and i
  • Treemap(l,l'): recursively maps two lists, l and l', of labeled and weighted arcs; descends into identically-labeled subtrees
  • Treeplicity(i,t): decreases the similarity with decreasing simplicity
V. C. Bhavsar, H. Boley and L. Yang. A weighted-tree similarity algorithm for multi-agent systems in e-business environments. Computational Intelligence, 2004, 20(4):584-602.
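The full Relfun algorithm is given in the cited paper; the following Python sketch captures only its core recursion (simultaneous top-down traversal, bottom-up similarity). The adjustment function A(s) = (s + 1)/2 is one illustrative choice satisfying A(s) ≥ s, the tree encoding is assumed, and the Treeplicity contribution for arcs present on only one side is omitted:

```python
def node_sim(a, b):
    """Exact string matching on node labels: 1.0 or 0.0."""
    return 1.0 if a == b else 0.0

def A(s):
    """Illustrative adjustment function with A(s) >= s."""
    return (s + 1.0) / 2.0

def treesim(t1, t2):
    """Core recursion only. Trees are (label, children) pairs;
    children are (arc_label, weight, subtree) triples."""
    (l1, kids1), (l2, kids2) = t1, t2
    if not kids1 or not kids2:
        return node_sim(l1, l2)              # leaf-level comparison
    a1 = {a: (w, sub) for a, w, sub in kids1}
    a2 = {a: (w, sub) for a, w, sub in kids2}
    total = 0.0
    for arc in a1.keys() & a2.keys():        # identically-labeled arcs
        (w1, sub1), (w2, sub2) = a1[arc], a2[arc]
        s = treesim(sub1, sub2)
        if sub1[1] and sub2[1]:              # both subtrees are inner:
            s = A(s)                         # reward shared structure
        total += s * (w1 + w2) / 2.0
    # arcs on one side only would contribute via the Treeplicity
    # simplicity term in the full algorithm (omitted here)
    return total

t1 = ("Car", [("make", 0.5, ("Ford", [])), ("year", 0.5, ("2002", []))])
t2 = ("Car", [("make", 0.5, ("Ford", [])), ("year", 0.5, ("1998", []))])
print(treesim(t1, t2))  # 0.5: make matches, year does not
```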
Experiments – Results
[Figure/table: similarity of simple trees. Experiments 1 and 2 compare auto trees with arcs make and year under varying arc-weight assignments; the leaves agree or differ in make (chrysler vs. ford) and year (1998 vs. 2002, or 2002 vs. 2002).]
Similarity of simple trees (Cont'd)
[Figure/table: experiments 3 and 4 compare auto trees with arcs make, model and year (leaves ford, mustang vs. explorer, 2000) under different arc-weight assignments; reported similarities 0.2823 and 0.1203.]
Similarity of identical tree structures
[Figure/table: auto trees with identical structure (arcs make, model and year; leaves ford and explorer, with years 1999 vs. 2002) compared under two arc-weight assignments; reported similarities 0.55 and 0.7000.]
Similarity of complex trees
[Figure: trees t and t´, each rooted at A with arcs b, c and d (weights ≈ 0.3333) to subtrees B, C and D carrying leaf variants B1–B4, C1–C4, D1, E and F; the six leaf configurations yield similarities 0.9316, 0.8996, 0.9230, 0.9647, 0.9793 and 0.8160.]
Similarity of complex trees (Cont'd)
[Figure: the same pair of trees with modified leaf sets; the six configurations yield similarities 0.9626, 0.9314, 0.9499, 0.9824, 0.9902 and 0.8555.]
Similarity of complex trees (Cont'd)
[Figure: the same pair of trees with a Don’t Care (“*”) inner node in t´; the six configurations yield similarities 0.9697, 0.9530, 0.9641, 0.9844, 0.9910 and 0.9134.]
Node label similarity
• For inner nodes and leaf nodes
• Exact string matching: binary result, 0.0 or 1.0
• Permutation of strings: “Java Programming” vs. “Programming in Java”

Example: for the two node labels “a b c” and “a b d e”, their similarity is
(number of identical words) / (maximum length of the two strings) = 2/4 = 0.5
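This word-overlap measure fits in one Python function; whitespace tokenization and case-sensitive comparison are assumptions of the sketch:

```python
def label_sim(a, b):
    """(number of identical words) / (max word count of the two labels)."""
    wa, wb = a.split(), b.split()
    return len(set(wa) & set(wb)) / max(len(wa), len(wb))

print(label_sim("a b c", "a b d e"))  # 2/4 = 0.5
```

It also handles the permutation case: "Java Programming" vs. "Programming in Java" share two words out of a maximum length of three, giving 2/3.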
Node label similarity (Cont'd)

Example: the node labels “electric chair” and “committee chair” share one word, giving 1/2 = 0.5. Is this meaningful?
• Semantic similarity is needed
Node label similarity – Inner nodes vs. leaf nodes • Inner nodes — class-oriented • Inner node labels can be classes • classes are located in a taxonomy tree • taxonomic class similarity measures • Leaf nodes — type-oriented • address, currency, date, price and so on • type similarity measures (local similarity measures)
Node label similarity
• Non-semantic matching
  • Exact string matching (both inner and leaf nodes)
  • String permutation (both inner and leaf nodes)
• Semantic matching
  • Taxonomic class similarity (inner nodes)
  • Type similarity (leaf nodes)
Inner node similarity – Partonomy trees
[Figure: course trees t1 and t2 with roots Distributed Programming and Object-Oriented Programming, each having arcs Credit (3), Duration (2 months vs. 3 months), Textbook (“Introduction to Distributed Programming” vs. “Object-Oriented Programming Essentials”) and Tuition ($1000 vs. $800) under varying arc weights.]
Inner node similarity – Taxonomy tree
[Figure: taxonomy tree rooted at Programming Techniques with subclasses General, Object-Oriented Programming, Sequential Programming, Applicative Programming, Automatic Programming and Concurrent Programming; Concurrent Programming has subclasses Parallel Programming and Distributed Programming; arcs carry subsumption weights such as 0.2, 0.3, 0.5, 0.7 and 0.9.]
• Arc weights
  • at the same level of a subtree they do not need to add up to 1
  • assigned by human experts or extracted from documents
A. Singh. Weighted tree metadata extraction. MCS Thesis (in preparation), University of New Brunswick, Fredericton, Canada, 2005.
Inner node similarity – Taxonomic class similarity
[Figure: the taxonomy tree again, with red arrows tracing the paths from Distributed Programming and from Object-Oriented Programming up to their nearest common ancestor, Programming Techniques.]
• red arrows stop at the nearest common ancestor
• the similarity is the product of the subsumption factors on the two paths = 0.018
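The path-product computation can be sketched as follows. The parent links and subsumption factors below are an assumed reading of the slide's taxonomy, chosen so that sim(Distributed Programming, Object-Oriented Programming) = 0.3 · 0.3 · 0.2 = 0.018 as reported; the exact factor assignment is a hypothesis:

```python
# child -> (parent, subsumption factor); hypothetical factor assignment
PARENT = {
    "Object-Oriented Programming": ("Programming Techniques", 0.2),
    "Concurrent Programming": ("Programming Techniques", 0.3),
    "Parallel Programming": ("Concurrent Programming", 0.9),
    "Distributed Programming": ("Concurrent Programming", 0.3),
}

def ancestors(c):
    """(ancestor, product of subsumption factors from c up to it)."""
    chain, prod = [(c, 1.0)], 1.0
    while c in PARENT:
        c, f = PARENT[c]
        prod *= f
        chain.append((c, prod))
    return chain

def class_sim(c1, c2):
    """Product of the subsumption factors along both paths up to
    the nearest common ancestor; 0.0 if there is none."""
    up1 = dict(ancestors(c1))
    for anc, p2 in ancestors(c2):   # bottom-up: first hit is nearest
        if anc in up1:
            return up1[anc] * p2
    return 0.0

print(class_sim("Distributed Programming",
                "Object-Oriented Programming"))  # 0.3 * 0.3 * 0.2 = 0.018
```

Identical classes get similarity 1.0, since the "paths" are empty and the product of factors is 1.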
Inner node similarity – Integration of the taxonomy tree into partonomy trees
• Taxonomy tree
  • requires extra taxonomic class similarity measures
• Semantic similarity without
  • changing our partonomy similarity algorithm
  • losing taxonomic semantic similarity
• Encode (subsections of) the taxonomy tree into the partonomy trees
www.teclantic.ca
Inner node similarity – Encoding the taxonomy tree into a partonomy tree
[Figure: the encoded taxonomy tree, rooted at Programming Techniques with weighted arcs (0.1 to 0.3) to General, Object-Oriented Programming, Sequential Programming, Applicative Programming, Automatic Programming and Concurrent Programming; Concurrent Programming branches to Parallel Programming and Distributed Programming with weights 0.6 and 0.4; “*” marks Don’t Care leaves.]
Inner node similarity – Encoding the taxonomy tree into a partonomy tree (Cont'd)
[Figure: encoded partonomy trees t1 and t2, each rooted at course with arcs Classification (0.65), Credit, Duration, Title and Tuition; under Classification, each tree embeds the relevant slice of the taxonomy (Programming Techniques down to Distributed Programming in t1 and down to Object-Oriented Programming in t2), with “*” Don’t Care leaves elsewhere.]
Leaf node similarity (local similarity)
• Different leaf node types require different type similarity measures
• Various leaf node types
  • “Price”-typed leaf nodes, e.g. for a buyer ≤ $800, i.e. the interval [0, Max]; for a seller ≥ $1000, i.e. the interval [Min, ∞]
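The slides do not give a price measure. Purely as an illustration of a type-specific local similarity, one might score the buyer interval [0, Max] against the seller interval [Min, ∞] as 1.0 when they overlap and otherwise decay with the relative gap; this particular formula is an invention for the sketch, not the authors' measure:

```python
def price_sim(buyer_max, seller_min):
    """Hypothetical local similarity for "Price"-typed leaves:
    buyer accepts [0, buyer_max], seller demands [seller_min, inf)."""
    if seller_min <= buyer_max:        # intervals overlap: a deal exists
        return 1.0
    gap = (seller_min - buyer_max) / buyer_max
    return max(0.0, 1.0 - gap)         # decay with the relative gap

print(price_sim(800, 1000))  # gap 200/800 = 0.25 -> 0.75
```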
Leaf node similarity (local similarity)
Example: “Date”-typed leaf nodes

DS(d1, d2) = 0.0 if |d1 – d2| ≥ 365, and 1 – |d1 – d2|/365 otherwise.

[Figure: project trees t1 and t2 with arcs start_date (0.5) and end_date (0.5); start dates May 3, 2004 vs. Nov 3, 2004 and end dates Jan 20, 2004 vs. Feb 18, 2005; resulting similarity 0.74.]
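This date measure translates directly to Python, using the standard library's date arithmetic for the day difference:

```python
from datetime import date

def date_sim(d1, d2):
    """DS(d1, d2): 0.0 if the dates are a year or more apart,
    otherwise 1 - |d1 - d2| / 365."""
    gap = abs((d1 - d2).days)
    return 0.0 if gap >= 365 else 1.0 - gap / 365

print(date_sim(date(2004, 5, 3), date(2004, 11, 3)))   # 184 days apart
print(date_sim(date(2004, 1, 20), date(2005, 2, 18)))  # >= 365 days -> 0.0
```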
Conclusion • Arc-labeled and arc-weighted trees • Partonomy similarity algorithm • Traverses trees top-down • Computes similarity bottom-up • Node label similarity • Exact string matching (inner and leaf nodes) • String permutation (inner and leaf nodes) • Taxonomic class similarity (inner nodes) • Taxonomy tree • Encoding taxonomy tree into partonomy tree • Type similarity (leaf nodes) • date-typed similarity measures