Restrictions on Concept Lattices for Pattern Management. Léonard Kwuida, Rokia Missaoui, Beligh Ben Amor, Lahcen Boumedjout, Jean Vaillancourt. October 20, 2010.
Outline • Introduction • Pattern management • Restrictions on concept lattices • Projection • Selection • Algorithms • Depth-first Search (DFS) • Breadth-first Search (BFS) • Leading Bits Sort (LBS) • Bottom-up Search (BUS) • Experiments • Conclusion
Objectives • Adapt the relational operators (e.g. projection) to the formal concept analysis framework in order to manipulate sets of concepts. • Manage patterns using restrictions on the objects or attributes of a given data set. • Query a concept lattice through a restriction (projection or selection). • Compare restrictions on formal contexts vs. restrictions on concept lattices.
Pattern Management • Objective • Store, process and retrieve patterns defined over raw data. • Different types of patterns • Rules, clusters, decision trees, … • Basic operations • Selection, projection, join, union, difference, … • Cross-over operations • Drill-through: from a pattern to raw data • Covering: does a pattern hold for a given data set? • Approximation (Quafafou, Missaoui & Kwuida)
Pattern Management • European PANDA Project • A generic framework to model various classes of patterns. • SQL operators • CINQ Project • Inductive databases. • Terrovitis et al. (2007) • A uniform framework for data and pattern management. • Links between data and pattern spaces. • Jeudy et al. (2007) • A Model for Managing Collections of Patterns.
Restrictions on Concept Lattices • Projection of a concept set onto N. • The projection of a concept set r over a set of attributes N ⊆ M is given by: Π_N(r) = Project(r, N) = {c1 = (Ext(c), Int(c) ∩ N) | c ∈ r and c1 is maximal in its equivalence class}. • Two concepts c1 and c2 are equivalent if Int(c1) ∩ N = Int(c2) ∩ N.
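As an illustration of the definition above, here is a minimal Python sketch of Project(r, N) that works directly on a list of concepts. The names `RAW`, `CONCEPTS` and `project` are hypothetical, and the data is the running example of this talk as read off its lattice slide (an assumption wherever the slide text is ambiguous). The representative of a class is taken to be the concept with the largest extent among those sharing the same Int(c) ∩ N.

```python
# Running example as "intent:extent" pairs.
# Assumption: pairs read off the example lattice slide of this talk.
RAW = ("a:12345678 ab:12356 ac:34678 ad:5678 ag:1234 abc:36 abg:123 acd:678 "
       "adf:568 agh:234 abdf:56 abgh:23 acdf:68 acde:7 acgh:34 abcdf:6 "
       "abcgh:3 acghi:4 abcdefghi:")
CONCEPTS = [tuple(pair.split(":")) for pair in RAW.split()]  # (intent, extent)


def project(concepts, n):
    """Return {Int(c) ∩ N: Ext(c)}, keeping one maximal concept per class."""
    n = frozenset(n)
    best = {}
    for intent, extent in concepts:
        key = frozenset(intent) & n      # projected intent
        ext = frozenset(extent)
        # The maximal concept of a class is the one with the largest extent.
        if key not in best or len(ext) > len(best[key]):
            best[key] = ext
    return best


result = project(CONCEPTS, "abcd")
for key in sorted(result, key=lambda k: (len(k), sorted(k))):
    print("".join(sorted(key)), "".join(sorted(result[key])))
```

On this data the 19 concepts collapse into eight classes, one for each of the projected intents a, ab, ac, ad, abc, abd, acd and abcd.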
Restrictions on Concept Lattices • Selection on a concept set. • The selection on a concept set r w.r.t. a (conjunctive) restriction F on attributes Ai (i ≤ N) is the set of concepts c that logically satisfy that restriction: Select(r, F = {A1 = a1 ∧ … ∧ AN = aN}) = {c | c ∈ r and c ⊨ F}. • The output corresponds to the order ideal in r generated by ⋀_{i ≤ N} μ(ai), where μ(ai) = (ai′, ai″). • For simplicity, we assume that F is in conjunctive form.
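A small sketch may help to see why the selection is an order ideal. It assumes binary attributes, so that a conjunctive restriction F simply lists the attributes a concept must have; the names `select` and `generator` are hypothetical, and the data is the running example as read off the lattice slide.

```python
# Running example as "intent:extent" pairs.
# Assumption: pairs read off the example lattice slide of this talk.
RAW = ("a:12345678 ab:12356 ac:34678 ad:5678 ag:1234 abc:36 abg:123 acd:678 "
       "adf:568 agh:234 abdf:56 abgh:23 acdf:68 acde:7 acgh:34 abcdf:6 "
       "abcgh:3 acghi:4 abcdefghi:")
CONCEPTS = [tuple(pair.split(":")) for pair in RAW.split()]  # (intent, extent)


def select(concepts, required):
    """Concepts whose intent contains every attribute required by F."""
    required = frozenset(required)
    return [(i, e) for i, e in concepts if required <= frozenset(i)]


def generator(selected):
    """Top of the order ideal: the selected concept with the largest extent."""
    return max(selected, key=lambda concept: len(concept[1]))


chosen = select(CONCEPTS, "cd")
print(sorted(intent for intent, _ in chosen))
print(generator(chosen))
```

With F requiring c and d, the selected concepts are exactly those below the concept with intent acd, which is the meet of the attribute concepts of c and d.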
Example • Basket market analysis • Transactions and items (products) • Context K := (G, M, I), with the transactions as objects (rows) and the items as properties (columns):
     a  b  c  d  e  f  g  h  i
1    X  X  .  .  .  .  X  .  .
2    X  X  .  .  .  .  X  X  .
3    X  X  X  .  .  .  X  X  .
4    X  .  X  .  .  .  X  X  X
5    X  X  .  X  .  X  .  .  .
6    X  X  X  X  .  X  .  .  .
7    X  .  X  X  X  .  .  .  .
8    X  .  X  X  .  X  .  .  .
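Example • Concept lattice of K (line diagram). Its 19 concepts, written as intent : extent, are: a : 12345678; ab : 12356; ac : 34678; ad : 5678; ag : 1234; abc : 36; abg : 123; acd : 678; adf : 568; agh : 234; abdf : 56; abgh : 23; acdf : 68; acde : 7; acgh : 34; abcdf : 6; abcgh : 3; acghi : 4; abcdefghi : ∅.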
Example - Projection • Project(r, {abcd}) (figure: line diagram of the concept lattice of K, on which the projection is illustrated).
Projection • Projection on {S, T, U, V} of the initial concept lattice. On the left, the equivalence classes are marked on the initial lattice. On the right, each equivalence class is represented by a single node (which stands for the whole class).
Algorithms - Projection • Depth-first Search (DFS) • Breadth-first Search (BFS) • Leading Bits Sort (LBS) • Bottom-up Search (BUS)
Depth-first Search • Input lattice B • Algorithm idea: • Set the first class with the top element. • Test whether the current node is in the same class as one of its marked parents or children. • If it is not, create a new equivalence class for it. • Set up the links between the representatives of equivalence classes. • Output lattice B1 (figure, built starting from the top concept (12345678, a))
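The marking step of the depth-first traversal can be sketched as follows. This is not the authors' implementation: the names `lower_covers` and `dfs_classes` are hypothetical, the lattice is rebuilt naively from the example intents (as read off the lattice slide), and a dictionary keyed by the projected intent stands in for the comparisons with marked parents and children. Setting up the links between representatives is left out.

```python
# Intents of the running example (assumption: read off the lattice slide).
INTENTS = ("a ab ac ad ag abc abg acd adf agh abdf abgh acdf acde acgh "
           "abcdf abcgh acghi abcdefghi").split()


def lower_covers(intents):
    """Map each intent to the intents of its children in the lattice."""
    sets = [frozenset(s) for s in intents]
    kids = {}
    for parent in sets:
        below = [c for c in sets if parent < c]   # strictly larger intents
        # A child is a larger intent with no other larger intent in between.
        kids[parent] = [c for c in below if not any(m < c for m in below)]
    return kids


def dfs_classes(intents, n):
    """Depth-first marking: projected intent -> class representative."""
    n = frozenset(n)
    kids = lower_covers(intents)
    top = min(kids, key=len)
    rep, seen, stack = {}, {top}, [top]
    while stack:
        node = stack.pop()
        key = node & n                             # class of the node
        # The representative is the class member with the smallest intent.
        if key not in rep or len(node) < len(rep[key]):
            rep[key] = node
        for child in kids[node]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return rep


classes = dfs_classes(INTENTS, "abcd")
for key, node in sorted(classes.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(key)), "<-", "".join(sorted(node)))
```

Because a depth-first walk may reach a class through a non-maximal member first, the sketch replaces the stored representative whenever a member with a smaller intent turns up.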
Breadth-first Search • Input lattice B • Algorithm idea: • Start with the top element e. • Move to each child of this element and compare it with e. • If the child is not in the same class, check whether all its parents are marked; if so, create a new class for it. • Set up the links between the representatives of equivalence classes. • Output lattice B1 (figure, built starting from the top concept (12345678, a))
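A breadth-first variant can be sketched in the same spirit. The assumption made here is that a node is processed only once all of its parents have been marked, which is how the "all parents are marked" test is read; under that reading the first node met with a given projected intent is the top of its class. The names `lower_covers` and `bfs_classes` are hypothetical and the intents are the running example as read off the lattice slide.

```python
from collections import deque

# Intents of the running example (assumption: read off the lattice slide).
INTENTS = ("a ab ac ad ag abc abg acd adf agh abdf abgh acdf acde acgh "
           "abcdf abcgh acghi abcdefghi").split()


def lower_covers(intents):
    """Map each intent to the intents of its children in the lattice."""
    sets = [frozenset(s) for s in intents]
    kids = {}
    for parent in sets:
        below = [c for c in sets if parent < c]
        kids[parent] = [c for c in below if not any(m < c for m in below)]
    return kids


def bfs_classes(intents, n):
    """Breadth-first marking: projected intent -> class representative."""
    n = frozenset(n)
    kids = lower_covers(intents)
    unmarked_parents = {c: 0 for c in kids}
    for parent in kids:
        for child in kids[parent]:
            unmarked_parents[child] += 1
    queue, rep = deque([min(kids, key=len)]), {}
    while queue:
        node = queue.popleft()
        # All parents are already marked, so the first node seen with this
        # projected intent is the maximal element of its class.
        rep.setdefault(node & n, node)
        for child in kids[node]:
            unmarked_parents[child] -= 1
            if unmarked_parents[child] == 0:
                queue.append(child)
    return rep


classes = bfs_classes(INTENTS, "abcd")
print(sorted("".join(sorted(k)) for k in classes))
```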
Leading Bits Sort • Input: the intents of the lattice B • Algorithm idea: Project(r, {abcd}) • The lectic order on subsets of M states that A precedes B if the first position in which A and B differ contains 0 in A and 1 in B. • Equivalent concepts/intents are necessarily consecutive in this order. • Use the iPred procedure of Baixeries et al. to set the links between the representatives of equivalence classes.
Leading Bits Sort • Output lattice B1 for Project(r, {abcd}) (figure): eight concepts, with intents a, ab, ac, ad, abc, abd, acd and abcd.
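The sorting idea can be sketched as follows, under the assumption (suggested by the name of the method) that the attributes of N are placed in the leading bit positions; only then do equal projections end up next to each other. The function name `lbs_classes` is hypothetical, and the intents are the running example as read off the lattice slide. The linking step with iPred is not included.

```python
from itertools import groupby

# Intents of the running example (assumption: read off the lattice slide).
INTENTS = ("a ab ac ad ag abc abg acd adf agh abdf abgh acdf acde acgh "
           "abcdf abcgh acghi abcdefghi").split()


def lbs_classes(intents, attributes, n):
    """Sort intents lectically with N first, then cut into classes."""
    # Put the attributes of N first so that they become the leading bits.
    order = [m for m in attributes if m in n] + [m for m in attributes if m not in n]
    lead = sum(m in n for m in attributes)

    def bits(intent):
        return tuple(int(m in intent) for m in order)

    # Tuple comparison is the lectic order: at the first differing
    # position, the intent holding 0 comes first.
    ranked = sorted(intents, key=bits)
    classes = []
    for prefix, group in groupby(ranked, key=lambda s: bits(s)[:lead]):
        first = next(group)   # smallest intent of the class = representative
        projected = "".join(m for m, bit in zip(order, prefix) if bit)
        classes.append((projected, first))
    return classes


classes = lbs_classes(INTENTS, "abcdefghi", "abcd")
for projected, representative in classes:
    print(projected, "<-", representative)
```

Since a subset always precedes its supersets in the lectic order, the first intent of each run is the one of the maximal concept of the class, so a single linear pass suffices.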
Bottom-up Search • Input lattice B • Algorithm idea: • We start the exploration of the lattice (upwards from the bottom) with the most general concept c whose intent contains N. • There are two possibilities: • If the concept c has exactly N as its intent, then the output of the projection is the filter generated by c. • If N is not an intent, then the attributes that are in N″ \ N are deleted one by one from the intents of the concepts in the filter generated by c. • Output lattice B1
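The two cases of the bottom-up search can be sketched in a few lines. The sketch assumes that the attributes to delete are those of N″ that are not in N, and it removes them in one step rather than one by one; the name `bus_project` is hypothetical and the intents are the running example as read off the lattice slide.

```python
# Intents of the running example (assumption: read off the lattice slide).
INTENTS = ("a ab ac ad ag abc abg acd adf agh abdf abgh acdf acde acgh "
           "abcdf abcgh acghi abcdefghi").split()


def bus_project(intents, n):
    """Bottom-up projection: projected intent -> class representative."""
    n = frozenset(n)
    sets = [frozenset(s) for s in intents]
    # Most general concept whose intent contains N: its intent is N''.
    closure = min((s for s in sets if n <= s), key=len)
    # Filter generated by that concept: all intents included in N''.
    filt = [s for s in sets if s <= closure]
    extra = closure - n          # empty when N is itself an intent
    rep = {}
    for s in sorted(filt, key=len):
        # Smaller intents come first, so the maximal concept of each
        # class is the one that gets recorded.
        rep.setdefault(s - extra, s)
    return rep


classes = bus_project(INTENTS, "abcd")
print(sorted("".join(sorted(k)) for k in classes))
print(bus_project(INTENTS, "ab"))
```

For N = {a, b, c, d} the closure is abcdf, so only the ten concepts of that filter are touched and the single attribute f is removed; for N = {a, b}, which is an intent, the filter itself is the answer.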
Experiments • Environment • Java, 1.9 GHz processor and 3 GB of memory • Parameters • Number of concepts in K = (G, M, I) • Density of K: 40%, 50%, 60% • Ratio N/M: 10%, ..., 80% • Data: from 71114 to 234946 concepts
Experiments • Results • LBS and BUS perform better when the percentage of projection is higher than 40% • LBS shows less variation than BUS • DFS is the worst algorithm • Projection on the context is not the best choice!
Conclusion • Focus on projection • The work can be adapted for selection • Possibility to handle the two operations in one shot on a given concept lattice • Projection on lattices vs. on contexts • Special cases where the projection on lattices is more efficient • More experiments are needed
Future Work • An important fact: the projection is the inverse operation of the assembly of two lattices! • Projection on implication sets • Algorithm improvement • Execution time and memory consumption • Other operations on concept lattices
Projection • K=(G, M, W, I) • Projection on a set N of attributes
Selection • K=(G, M, W, I) • Selection on a set of objects
DFS complexity • To analyze the complexity of this procedure, we consider the number of accesses to each node and the number of comparisons. • Each node is visited at least twice (on the way down and on the way back). • If q is the number of equivalence classes, then there are on average q/2 comparisons to mark a node.
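BFS complexity • To evaluate the complexity of this algorithm, we consider two parameters: the number of comparisons needed and the number of times each node is accessed. Each node o is visited exactly #parent(o) + 1 times. Hence, the overall number of accesses to nodes is Σ_{o ∈ B} (#parent(o) + 1) = n + e, where n is the number of concepts in B and e is the number of edges in its line diagram.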
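The access count can be checked on the running example with a short sketch. It assumes that an access is either the visit that processes a node or a look at it from one of its parents; `bfs_accesses` and `lower_covers` are hypothetical names, and the intents are the example as read off the lattice slide.

```python
from collections import deque

# Intents of the running example (assumption: read off the lattice slide).
INTENTS = ("a ab ac ad ag abc abg acd adf agh abdf abgh acdf acde acgh "
           "abcdf abcgh acghi abcdefghi").split()


def lower_covers(intents):
    """Map each intent to the intents of its children in the lattice."""
    sets = [frozenset(s) for s in intents]
    kids = {}
    for parent in sets:
        below = [c for c in sets if parent < c]
        kids[parent] = [c for c in below if not any(m < c for m in below)]
    return kids


def bfs_accesses(intents):
    """Return (number of nodes, number of edges, accesses counted by BFS)."""
    kids = lower_covers(intents)
    pending = {c: 0 for c in kids}
    for parent in kids:
        for child in kids[parent]:
            pending[child] += 1
    edges = sum(pending.values())
    queue, accesses = deque([min(kids, key=len)]), 0
    while queue:
        node = queue.popleft()
        accesses += 1                  # the visit that processes the node
        for child in kids[node]:
            accesses += 1              # one look at the child per parent
            pending[child] -= 1
            if pending[child] == 0:
                queue.append(child)
    return len(kids), edges, accesses


n, e, total = bfs_accesses(INTENTS)
print(n, e, total)
```

Every node is processed once and looked at once per parent, so the total is the number of nodes plus the number of edges of the line diagram.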
LBS complexity • The sorting process with respect to the lectic order can be done in O(n ln(n)), where n is the number of concepts in B. Marking the equivalence classes on B is straightforward, since it takes one linear pass over the sorted list of concepts. Thus, the overall process has a complexity of O(n ln(n)).
iPred • It sorts the elements of the lattice by size. • Δ[ci] is initialized to the empty set for each element ci of the input set. • Δ[ci] will contain the accumulation of faces for each element. • The first element in the border is the first element in the sequence. • All remaining elements in the input sequence are processed in the order in which they appear in the enumeration. • The candidate set is computed by intersecting the current element ci with all the elements in the border. • We check whether the current element belongs to the upper set of the elements that are in the candidate set. • If the test result is positive, ci ≺ c̃, so we add this connection to the output set, then we add that face to the set of accumulated faces of c̃ and, finally, we remove c̃ from the border. • Before the next element is processed, we make sure that ci is added to the border.
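The steps listed above can be sketched in Python as follows. This is one reading of them, not the reference implementation: the test on the candidate is taken to be "the faces accumulated for the candidate do not meet the current element", the input is assumed to be a family of intents closed under intersection, and the name `ipred` as a function is hypothetical. The intents are the running example as read off the lattice slide.

```python
# Intents of the running example (assumption: read off the lattice slide).
INTENTS = ("a ab ac ad ag abc abg acd adf agh abdf abgh acdf acde acgh "
           "abcdf abcgh acghi abcdefghi").split()


def ipred(intents):
    """Return the cover pairs (smaller intent, larger intent).

    Assumes the family of intents is closed under intersection.
    """
    # Sort by size; the second key only makes the order reproducible.
    seq = sorted({frozenset(s) for s in intents},
                 key=lambda s: (len(s), sorted(s)))
    faces = {c: set() for c in seq}        # accumulated faces, initially empty
    border = [seq[0]]
    links = set()
    for ci in seq[1:]:
        candidates = {ci & b for b in border}
        for cand in candidates:
            # Assumed test: no face already recorded for cand lies in ci.
            if not faces[cand] & ci:
                links.add((cand, ci))
                faces[cand] |= ci - cand
                if cand in border:
                    border.remove(cand)
        border.append(ci)
    return links


links = ipred(INTENTS)
projected = ipred(["a", "ab", "ac", "ad", "abc", "abd", "acd", "abcd"])
print(len(links), len(projected))
```

The second call links the eight projected intents obtained for {a, b, c, d}, which is the use the Leading Bits Sort makes of this procedure.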
BUS complexity • The complexity of this procedure depends on two factors: • How soon we find the most general concept whose intent contains the set of attributes N. • The number of attributes to be deleted.
Work of Jeudy et al. • Sort the concepts in topological order. • Find the equivalence classes and their representatives. • Scan the input lattice a second time to build the links between the representatives of equivalence classes.