This article explores the use of restrictions on concept lattices to manage patterns in data sets, including the concepts of projection and selection. Various algorithms are discussed, such as DFS, BFS, LBS, and BUS.
Restrictions on Concept Lattices for Pattern Management Léonard Kwuida, Rokia Missaoui, Beligh Ben Amor, Lahcen Boumedjout, Jean Vaillancourt October 20, 2010
Outline • Introduction • Pattern management • Restrictions on concept lattices • Projection • Selection • Algorithms • Depth-first Search (DFS) • Breadth-first Search (BFS) • Leading Bits Sort (LBS) • Bottom-up Search (BUS) • Experiments • Conclusion
Objectives • Adapt relational operators (e.g. projection) to the formal concept analysis framework in order to manipulate sets of concepts. • Manage patterns using restrictions on the objects or attributes of a given data set. • Query a concept lattice through a restriction (projection or selection). • Compare restriction on formal contexts vs. restriction on concept lattices.
Pattern Management • Objective • Store, process and retrieve patterns defined over raw data. • Different types of patterns • Rules, clusters, decision trees, … • Basic operations • Selection, projection, join, union, difference, … • Cross-over operations • Drill-through: from a pattern to raw data • Covering: does a pattern hold for a given data set? • Approximation (Quafafou, Missaoui & Kwuida)
Pattern Management • European PANDA Project • A generic framework to model various classes of patterns. • SQL operators • CINQ Project • Inductive databases. • Terrovitis et al. (2007) • A uniform framework for data and pattern management. • Links between data and pattern spaces. • Jeudy et al. (2007) • A Model for Managing Collections of Patterns.
Restrictions on Concept lattices • Projection of a concept set onto N. • The projection of a concept set r over a set of attributes N ⊆ M is given by: π_N(r) = Project(r, N) = {c1 = (Ext(c), Int(c) ∩ N) | c ∈ r and c1 is maximal in its equivalence class}. • Two concepts c1 and c2 are equivalent if Int(c1) ∩ N = Int(c2) ∩ N.
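The projection definition can be tried out on the example lattice of this deck. The following is a minimal sketch, not the authors' implementation: the names `CONCEPTS` and `project` are hypothetical, concepts are written as intent/extent strings taken from the example lattice, and the maximal concept of an equivalence class is identified as the member with the largest extent.

```python
# Concepts of the example lattice, written intent -> extent (strings of item
# letters and transaction digits). The names used here are illustrative only.
CONCEPTS = {
    "a": "12345678", "ab": "12356", "ac": "34678", "ad": "5678", "ag": "1234",
    "abc": "36", "abg": "123", "acd": "678", "adf": "568", "agh": "234",
    "abdf": "56", "abgh": "23", "acdf": "68", "acde": "7", "acgh": "34",
    "abcdf": "6", "abcgh": "3", "acghi": "4", "abcdefghi": "",
}


def project(concepts, n):
    """Return {Int(c) ∩ N: Ext(c)} for the maximal concept c of each class."""
    n = frozenset(n)
    result = {}
    for intent, extent in concepts.items():
        key = frozenset(intent) & n        # Int(c) ∩ N identifies the class
        ext = frozenset(extent)
        # The maximal concept of a class is the one with the largest extent.
        if key not in result or len(ext) > len(result[key]):
            result[key] = ext
    return result


projected = project(CONCEPTS, "abcd")
for intent in sorted(projected, key=lambda i: (len(i), sorted(i))):
    print("".join(sorted(intent)), "".join(sorted(projected[intent])))
```

On this data the 19 concepts collapse into 8 equivalence classes, one per projected intent.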
Restrictions on Concept lattices • Selection on a concept set. • The selection on a concept set r w.r.t. a (conjunctive) restriction F on attributes Ai (i ≤ N) is the set of concepts c that logically satisfy that restriction: Select(r, F = {A1 = a1 ∧ … ∧ AN = aN}) = {c | c ∈ r and c ⊨ F}. • The output corresponds to the order ideal in r generated by ⋀_{i ≤ N} μ(ai), where μ(ai) = (ai′, ai″). • For simplicity, we assume that F is in conjunctive form.
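A small sketch of the selection operator on the example lattice follows. It assumes a one-valued context, so a restriction Ai = ai is reduced to "item ai is present"; the names `CONCEPTS`, `select` and `ideal_of_meet` are hypothetical. The second function computes the order ideal generated by the meet of the attribute concepts, so the two views of the result can be compared.

```python
# Concepts of the example lattice, written intent -> extent (illustrative names).
CONCEPTS = {
    "a": "12345678", "ab": "12356", "ac": "34678", "ad": "5678", "ag": "1234",
    "abc": "36", "abg": "123", "acd": "678", "adf": "568", "agh": "234",
    "abdf": "56", "abgh": "23", "acdf": "68", "acde": "7", "acgh": "34",
    "abcdf": "6", "abcgh": "3", "acghi": "4", "abcdefghi": "",
}


def select(concepts, required):
    """Concepts whose intent contains every attribute of the restriction."""
    required = frozenset(required)
    return {i for i in concepts if required <= frozenset(i)}


def ideal_of_meet(concepts, required):
    """Order ideal generated by the meet of the attribute concepts."""
    ext = {i: frozenset(e) for i, e in concepts.items()}
    meet_extent = frozenset("".join(concepts.values()))   # start from all objects
    for a in required:
        # Extent of the attribute concept of a: the largest extent among the
        # concepts whose intent contains a.
        meet_extent &= max((ext[i] for i in ext if a in i), key=len)
    # The ideal consists of every concept below the meet.
    return {i for i in ext if ext[i] <= meet_extent}


print(sorted(select(CONCEPTS, "bc")))
```

For the restriction "b and c", both functions return the concepts abc, abcdf, abcgh and the bottom concept.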
Example • Basket market analysis • Transactions (objects) and items (properties) • Context K := (G, M, I)
Transaction  a  b  c  d  e  f  g  h  i
1            X  X  .  .  .  .  X  .  .
2            X  X  .  .  .  .  X  X  .
3            X  X  X  .  .  .  X  X  .
4            X  .  X  .  .  .  X  X  X
5            X  X  .  X  .  X  .  .  .
6            X  X  X  X  .  X  .  .  .
7            X  .  X  X  X  .  .  .  .
8            X  .  X  X  .  X  .  .  .
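Example • Concept lattice of K (19 concepts, written intent: extent) • a: 12345678 • ab: 12356 • ac: 34678 • ad: 5678 • ag: 1234 • abc: 36 • abg: 123 • acd: 678 • adf: 568 • agh: 234 • abdf: 56 • abgh: 23 • acdf: 68 • acde: 7 • acgh: 34 • abcdf: 6 • abcgh: 3 • acghi: 4 • abcdefghi: ∅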
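The concepts of the example can be recomputed from the context with a brute-force closure, as in the sketch below. The rows of `CONTEXT` are an assumption: they are read off the object concepts of the example lattice (for instance, the smallest concept containing transaction 1 has intent abg). The function name `all_concepts` is hypothetical, and the approach only suits small contexts.

```python
# Assumed context rows: transaction -> items, read off the example lattice.
CONTEXT = {1: "abg", 2: "abgh", 3: "abcgh", 4: "acghi",
           5: "abdf", 6: "abcdf", 7: "acde", 8: "acdf"}


def all_concepts(context):
    """Return {intent: extent} for every formal concept of the context."""
    rows = {g: frozenset(items) for g, items in context.items()}
    m = frozenset().union(*rows.values())      # full attribute set M
    intents = {m}                              # M is always an intent
    frontier = [m]
    # Intents are exactly the intersections of object rows (plus M itself).
    while frontier:
        x = frontier.pop()
        for row in rows.values():
            y = x & row
            if y not in intents:
                intents.add(y)
                frontier.append(y)
    # The extent of an intent is the set of objects having all its attributes.
    return {i: frozenset(g for g, row in rows.items() if i <= row)
            for i in intents}


concepts = all_concepts(CONTEXT)
for intent in sorted(concepts, key=lambda i: (len(i), sorted(i))):
    extent = "".join(str(g) for g in sorted(concepts[intent]))
    print("".join(sorted(intent)), extent or "-")
```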
Example - Projection • Project(r, {a, b, c, d}): the equivalence classes are marked on the concept lattice of K (figure).
Projection • Projection on {S, T, U, V} of the initial concept lattice. On the left, the equivalence classes are marked on the initial lattice. On the right, each equivalence class is represented by a single node (behind which the whole class is attached).
Algorithms - Projection • Depth-first Search (DFS) • Breadth-first Search (BFS) • Leading Bits Sort (LBS) • Bottom-up Search (BUS)
Depth-first Search • Algorithm idea: • Set the first class with the top element. • Test whether the current node is in the same class as one of its marked parents or children. • If it is not, create a new equivalence class for it. • Set up the links between the representatives of the equivalence classes. • Figures: input lattice B; output lattice B1, which starts with the top concept a (12345678)
Depth-first Search • Output lattice B1 as the traversal proceeds (figure frames): the classes a, ac, ab, abc and abcd are added to B1.
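The depth-first marking of equivalence classes can be sketched on the example lattice as follows. This is an illustrative variant rather than the authors' code: the names `CONCEPTS`, `cover_children` and `dfs_classes` are hypothetical, the Hasse diagram is rebuilt by brute force from the extents, and a small union-find structure (an added assumption) merges classes when a node meets marked neighbours with the same projected intent. The link set-up between representatives is left out.

```python
# Concepts of the example lattice, written intent -> extent (illustrative names).
CONCEPTS = {
    "a": "12345678", "ab": "12356", "ac": "34678", "ad": "5678", "ag": "1234",
    "abc": "36", "abg": "123", "acd": "678", "adf": "568", "agh": "234",
    "abdf": "56", "abgh": "23", "acdf": "68", "acde": "7", "acgh": "34",
    "abcdf": "6", "abcgh": "3", "acghi": "4", "abcdefghi": "",
}


def cover_children(concepts):
    """Lower covers of every concept, computed by brute force from extents."""
    ext = {i: frozenset(e) for i, e in concepts.items()}
    children = {i: [] for i in ext}
    for p in ext:
        below = [c for c in ext if ext[c] < ext[p]]
        for c in below:
            if not any(ext[c] < ext[z] for z in below):
                children[p].append(c)
    return children


def dfs_classes(concepts, n):
    """Return {representative intent: sorted members of its class}."""
    n = frozenset(n)
    children = cover_children(concepts)
    parents = {i: [] for i in concepts}
    for p, cs in children.items():
        for c in cs:
            parents[c].append(p)
    key = {i: frozenset(i) & n for i in concepts}
    leader = {}                        # union-find forest over marked nodes

    def find(x):
        while leader[x] != x:
            x = leader[x]
        return x

    def visit(node):
        leader[node] = node            # provisional new class
        for nb in parents[node] + children[node]:
            if nb in leader and key[nb] == key[node]:
                leader[find(nb)] = find(node)   # same class as a marked neighbour
        for child in children[node]:
            if child not in leader:
                visit(child)

    top = max(concepts, key=lambda i: len(concepts[i]))
    visit(top)
    classes = {}
    for i in concepts:
        classes.setdefault(find(i), []).append(i)
    # The representative is the member with the largest extent.
    return {max(m, key=lambda i: len(concepts[i])): sorted(m)
            for m in classes.values()}


result = dfs_classes(CONCEPTS, "abcd")
for rep, members in sorted(result.items()):
    print(rep, members)
```

The union-find step matters because a depth-first traversal can reach two members of one class before it reaches the concept that connects them.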
Breadth-first Search • Algorithm idea: • Start with the top element e. • Move to each child of this element and compare it with e. • If the child is not in the same class, check whether all of its parents are marked; if so, create a new class for it. • Set up the links between the representatives of the equivalence classes. • Figures: input lattice B; output lattice B1, which starts with the top concept a (12345678)
Breadth-first Search • Output lattice B1 after the first level (figure): a (12345678), ab (12356), ac (34678), ad (5678)
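A breadth-first variant can be sketched in the same style. The names `CONCEPTS`, `hasse` and `bfs_classes` are hypothetical, and the sketch compares a node with all of its parents (not only the one just dequeued), which is a simplifying assumption. A node is marked only once every parent is marked, so a node that matches none of its parents is necessarily the representative of a new class. The link set-up is again omitted.

```python
from collections import deque

# Concepts of the example lattice, written intent -> extent (illustrative names).
CONCEPTS = {
    "a": "12345678", "ab": "12356", "ac": "34678", "ad": "5678", "ag": "1234",
    "abc": "36", "abg": "123", "acd": "678", "adf": "568", "agh": "234",
    "abdf": "56", "abgh": "23", "acdf": "68", "acde": "7", "acgh": "34",
    "abcdf": "6", "abcgh": "3", "acghi": "4", "abcdefghi": "",
}


def hasse(concepts):
    """Lower and upper covers of every concept, by brute force on extents."""
    ext = {i: frozenset(e) for i, e in concepts.items()}
    children = {i: [] for i in ext}
    parents = {i: [] for i in ext}
    for p in ext:
        below = [c for c in ext if ext[c] < ext[p]]
        for c in below:
            if not any(ext[c] < ext[z] for z in below):
                children[p].append(c)
                parents[c].append(p)
    return children, parents


def bfs_classes(concepts, n):
    """Map every concept to the representative of its equivalence class."""
    n = frozenset(n)
    children, parents = hasse(concepts)
    key = {i: frozenset(i) & n for i in concepts}
    top = max(concepts, key=lambda i: len(concepts[i]))
    rep = {top: top}                   # marked nodes -> class representative
    queue = deque([top])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            # Skip marked nodes and nodes that still have an unmarked parent.
            if child in rep or any(p not in rep for p in parents[child]):
                continue
            same = [p for p in parents[child] if key[p] == key[child]]
            rep[child] = rep[same[0]] if same else child   # join or open a class
            queue.append(child)
    return rep


rep = bfs_classes(CONCEPTS, "abcd")
print(sorted(set(rep.values())))
```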
Leading Bits Sort • Algorithm idea: • Work on the intents of the input lattice B. • The lectic order on subsets of M states that A precedes B if the first position in which A and B differ contains 0 in A and 1 in B. • Equivalent concepts (intents) are necessarily consecutive in this order. • Use the iPred procedure of Baixeries et al. to set the links between the representatives of the equivalence classes. • Example: Project(r, {a, b, c, d})
Leading Bits Sort • Output lattice B1 for Project(r, {a, b, c, d}) (figure): a (12345678), ab (12356), ac (34678), ad (5678), abc (36), abd (56), acd (678), abcd (6)
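The sorting idea can be illustrated with the short sketch below. It assumes, as the name of the method suggests, that the attributes of N are moved to the leading bit positions before sorting, so that intents with the same projection share a prefix and end up next to each other. The names `INTENTS` and `lbs_representatives` are hypothetical, and the linking step (iPred) is not included.

```python
from itertools import groupby

# Intents of the example lattice (illustrative name).
INTENTS = ["a", "ab", "ac", "ad", "ag", "abc", "abg", "acd", "adf", "agh",
           "abdf", "abgh", "acdf", "acde", "acgh", "abcdf", "abcgh", "acghi",
           "abcdefghi"]


def lbs_representatives(intents, n, all_attrs="abcdefghi"):
    """Return one representative intent per equivalence class, in sorted order."""
    # Assumption: the attributes of N occupy the leading bit positions.
    order = list(n) + [m for m in all_attrs if m not in n]

    def bits(intent):
        return tuple(int(m in intent) for m in order)

    # Tuple comparison puts 0 before 1 at the first differing position,
    # which is the lectic order on fixed-length bit vectors.
    ranked = sorted(intents, key=bits)
    lead = len(n)
    reps = []
    # One linear pass: consecutive intents with the same leading bits
    # form one equivalence class.
    for _, run in groupby(ranked, key=lambda i: bits(i)[:lead]):
        # The first intent of a run is the smallest one, hence the intent of
        # the maximal concept of the class.
        reps.append(next(run))
    return reps


reps = lbs_representatives(INTENTS, "abcd")
print(reps)
```

Each returned intent belongs to the original lattice; intersecting it with N gives the intent of the corresponding node of the projected lattice.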
Bottom-up Search • Algorithm idea: • We start the exploration of the lattice (upwards from the bottom) with the most general concept c whose intent contains N. • There are two possibilities: • If the concept c has exactly N as intent, then the output of the projection is the filter generated by c. • If N is not an intent, then the attributes in N″ \ N are deleted one by one from the intents of the concepts in the filter generated by c. • Figure: input lattice B
Bottom-up Search • Figures: the filter generated by c in B; the output lattice B1
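The two cases of the bottom-up search can be sketched as follows on the example lattice. The names `CONCEPTS` and `bus_project` are hypothetical. One step is an added assumption: after the attributes of N″ \ N are deleted, several concepts of the filter may carry the same intent, and the sketch keeps the one with the largest extent.

```python
# Concepts of the example lattice, written intent -> extent (illustrative names).
CONCEPTS = {
    "a": "12345678", "ab": "12356", "ac": "34678", "ad": "5678", "ag": "1234",
    "abc": "36", "abg": "123", "acd": "678", "adf": "568", "agh": "234",
    "abdf": "56", "abgh": "23", "acdf": "68", "acde": "7", "acgh": "34",
    "abcdf": "6", "abcgh": "3", "acghi": "4", "abcdefghi": "",
}


def bus_project(concepts, n):
    """Project onto N by working only inside the filter generated by (N', N'')."""
    n = frozenset(n)
    ext = {frozenset(i): frozenset(e) for i, e in concepts.items()}
    # Most general concept whose intent contains N, i.e. c = (N', N'').
    c = max((i for i in ext if n <= i), key=lambda i: len(ext[i]))
    # Filter generated by c: every concept above c.
    filt = {i: e for i, e in ext.items() if ext[c] <= e}
    extra = c - n                      # N'' \ N, empty when N is an intent
    out = {}
    for intent, extent in filt.items():
        k = intent - extra             # delete the extra attributes
        # Assumed merge step: keep the largest extent per resulting intent.
        if k not in out or len(extent) > len(out[k]):
            out[k] = extent
    return {"".join(sorted(k)): "".join(sorted(e)) for k, e in out.items()}


print(bus_project(CONCEPTS, "abcd"))   # N is not an intent: N'' = abcdf
print(bus_project(CONCEPTS, "ab"))     # N is an intent: the filter is the answer
```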
Experiments • Environment • Java, 1.9 GHz processor and 3 GB memory • Parameters • Number of concepts in K = (G, M, I) • Density of K: 40%, 50%, 60% • Ratio |N|/|M|: 10%, ..., 80% • Data: from 71114 to 234946 concepts
Experiments • Results • Better performance for LBS and BUS when the percentage of projection is higher than 40% • LBS has lower variation than BUS • DFS is the worst algorithm • Projection on the context is not the best choice!
Conclusion • Focus on projection • The work can be adapted for selection • Possibility to handle the two operations in one shot on a given concept lattice • Projection on lattices vs. on contexts • Special cases where the projection on lattices is more efficient • More experiments are needed
Future Work • An important fact: the projection is the inverse operation of the assembly of two lattices! • Projection on implication sets • Algorithm improvement • Execution time and memory consumption • Other operations on concept lattices
Projection • K=(G, M, W, I) • Projection on a set N of attributes
Selection • K=(G, M, W, I) • Selection on a set of objects
DFS complexity • To analyze the complexity of this procedure, we consider the number of accesses to each node and the number of comparisons. • Each node is visited at least twice (on the way down and on the way back). • If q is the number of equivalence classes, then there are on average q/2 comparisons to mark a node.
BFS complexity • To evaluate the complexity of this algorithm, we consider two parameters: the number of comparisons needed and the number of times each node is accessed. • Each node o is visited exactly #parent(o) + 1 times. • The overall number of node accesses is therefore Σ_{o ∈ B} (#parent(o) + 1).
LBS complexity • Sorting with respect to the lectic order can be done in O(n ln n), where n is the number of concepts in B. • Marking the equivalence classes on B is straightforward, since it takes one linear pass over the sorted set of concepts. • Thus, the overall process has a complexity of O(n ln n).
iPred • It sorts the elements of the lattice by size. • For each element ci of the input set, Δ[ci] is initialized to the empty set. • Δ[ci] will contain the accumulation of faces for that element. • The first element in the border is the first element in the sequence. • All remaining elements in the input sequence are processed in the order in which they appear in the enumeration. • The candidate set is computed by intersecting the current element ci with all the elements in the border. • We check whether the current element belongs to the upper set of the elements that are in the candidate set. • If the test result is positive, ci ≺ c̃, so we add this connection to the output set, then add that face to the set of accumulated faces of c̃, and finally remove c̃ from the border. • Before the next element is processed, we make sure that ci is added to the border.
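The steps listed for iPred can be sketched as below. This is a reading of the bullet points, not a verified transcription of the published procedure: the function name `ipred` and the variable names are hypothetical, the acceptance test is written as "the accumulated faces of the candidate do not meet the current element", and the input family is assumed to be closed under intersection (as a set of intents is). The input used here is the set of projected intents from the Project(r, {a, b, c, d}) example.

```python
def ipred(intents):
    """Return the cover links (larger set, smaller set) of a closed family."""
    sets = sorted((frozenset(i) for i in intents), key=len)   # sort by size
    delta = {c: frozenset() for c in sets}     # accumulated faces, initially empty
    border = [sets[0]]                         # starts with the first element
    links = []
    for ci in sets[1:]:
        # Candidates: intersections of the current element with the border.
        candidates = {ci & b for b in border}
        for cand in candidates:
            # Accept the link if no previously recorded face of cand lies in ci.
            if not (delta[cand] & ci):
                links.append((ci, cand))
                delta[cand] = delta[cand] | (ci - cand)   # record the new face
                if cand in border:
                    border.remove(cand)
        border.append(ci)                      # ci joins the border
    return links


projected_intents = ["a", "ab", "ac", "ad", "abc", "abd", "acd", "abcd"]
links = ipred(projected_intents)
for upper, lower in links:
    print("".join(sorted(lower)), "<", "".join(sorted(upper)))
```

On these eight intents the sketch yields the twelve edges of a three-dimensional cube, which is the shape of the projected lattice in the example.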
BUS complexity • The complexity of this procedure depends on two factors: • The point at which we find the most general concept whose intent contains the set of attributes N. • The number of attributes to be deleted.
Work of Jeudy et al. • Sort the concepts in topological order. • Find the equivalence classes and their representatives. • Scan the input lattice a second time to build the links between the representatives of the equivalence classes.