
Data Mining in Clinical Databases by using Association Rules


Presentation Transcript


  1. Data Mining in Clinical Databases by using Association Rules Department of Computing Charles Lo

  2. Outline • What is Association Rule ? • Previous Works • Target Problems • Methodology and Algorithm • Experiment and Discussion • Q & A

  3. What is Association Rule ? (1) • It was introduced in “Agrawal, Imielinski, & Swami 1993”. • Rule over a database: A, B ⇒ C • 30% of the transactions that contain A and B also contain C; 5% of all the transactions contain all of them.

  4. What is Association Rule ? (2) • In a supermarket, 20% of transactions that contain Coca-Cola also contain Pepsi; 3% of all transactions contain both items. • 20% is the confidence of the rule • 3% is the support of the rule • Association rules can be applied in • Decision Support • Market Strategy • Financial Forecast
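The two measures on this slide can be computed directly from a list of transactions. The following is a minimal sketch; the function names and the toy baskets are illustrative assumptions, not data from the presentation.

```python
def support(transactions, itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing `antecedent` that also contain `consequent`."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


# Hypothetical toy data: 10 baskets, 5 contain coke, 1 contains coke and pepsi.
baskets = [
    {"coke", "pepsi"}, {"coke"}, {"coke", "chips"}, {"coke", "bread"},
    {"pepsi", "bread"}, {"bread"}, {"chips"}, {"milk"},
    {"milk", "bread"}, {"coke", "milk"},
]

rule_support = support(baskets, {"coke", "pepsi"})
rule_confidence = confidence(baskets, {"coke"}, {"pepsi"})
```

Support is measured against all transactions, while confidence is measured only against the transactions that contain the left-hand side of the rule.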

  5. Related Work (1) In 1993, Agrawal, Imielinski and Swami • Generate all significant association rules between items • Algorithm Apriori • Pruning Techniques • Buffer management • An association rule is significant if support > min support and confidence ≥ min confidence

  6. Related Work (2) [Figure: itemsets AB, AC, BC, and ABC] • Pruning Technique • Frequency Constraint • Memory Management • Memory to store any itemset and all its 1-extensions
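The frequency-based pruning mentioned here can be sketched as a level-wise search in the Apriori style: a candidate of size k is counted only if all of its (k-1)-subsets were frequent. This is a simplified illustration, not the authors' implementation; `min_support` is assumed to be an absolute count and the comparison `>=` is a choice made for the sketch.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical transactions over the items A, B, C.
db = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
found = apriori(db, min_support=3)
```

In this toy database every pair is frequent, so ABC survives the prune step and is counted, but it occurs in only two transactions and is rejected.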

  7. Related Work (3) In 1997, Srikant, Vu and Agrawal • Consider constraints that are boolean expressions over the presence or absence of items in the rules • Incomplete candidate generation [Figure: itemsets AB, AC, ABC] • The boolean constraint: (B ∧ C) ∨ (X ∧ Y)

  8. Related Work (4) • Selected Items approach 1. generate a set of selected items • for B = (1 ∧ 2) ∨ 3, the sets {1, 3}, {2, 3}, and {1, 2, 3, 4, 5} all qualify • any (non-empty) itemset that satisfies B will contain an item from this set 2. only count candidates that contain selected items 3. discard frequent itemsets that do not satisfy the boolean expression
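The selected-items property can be checked mechanically. The sketch below assumes the constraint is given in disjunctive normal form with positive literals only (item presence); the function names are hypothetical. A set of selected items is valid exactly when it shares at least one item with every conjunct, because each conjunct is itself an itemset that satisfies B.

```python
def satisfies(itemset, dnf):
    """True if `itemset` contains all items of at least one conjunct."""
    return any(conj <= itemset for conj in dnf)


def is_valid_selected_set(selected, dnf):
    """True if every itemset satisfying `dnf` must contain a selected item."""
    return all(conj & selected for conj in dnf)


def keep_candidates(candidates, selected):
    """Step 2 of the approach: only keep candidates that contain a selected item."""
    return [c for c in candidates if c & selected]


# B = (1 AND 2) OR 3, written as a list of conjuncts.
B = [frozenset({1, 2}), frozenset({3})]

candidates = [frozenset({1, 4}), frozenset({2, 4}), frozenset({3, 5})]
kept = keep_candidates(candidates, frozenset({1, 3}))
```

Candidates kept in step 2 may still fail B (here {1, 4} does), which is why step 3 discards frequent itemsets that do not satisfy the expression.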

  9. Related Work (5) In 1998, Ng, Lakshmanan, Han and Pang • Achieved a maximized degree of pruning for different categories of constraints. • Two properties critical to pruning • Anti-monotonicity • Succinctness • Algorithm CAP handles four cases 1. Both anti-monotone and succinct 2. Succinct but non-anti-monotone 3. Anti-monotone and non-succinct 4. Non-anti-monotone and non-succinct

  10. Related Work (6) • Anti-Monotone Constraint • S ⊇ S’ & S satisfies C ⇒ S’ satisfies C • Domain Constraint: S ≤ v, S = v, S ≥ v, S ⊆ V • Aggregate Constraint: min(S) ≥ v, max(S) ≤ v, count(S) ≤ v, sum(S) ≤ v
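Anti-monotonicity is what allows a violating itemset to be dropped without ever extending it. The sketch below uses the aggregate constraint sum(S) ≤ v with non-negative values; the item names, prices, and budget are hypothetical.

```python
def grow_with_antimonotone(prices, budget):
    """Enumerate all itemsets with sum(price) <= budget, level by level.

    Because the constraint is anti-monotone for non-negative prices,
    an itemset that violates it is never extended.
    """
    items = sorted(prices)
    level = [(i,) for i in items if prices[i] <= budget]
    kept = []
    while level:
        kept.extend(level)
        next_level = []
        for s in level:
            for i in items:
                if i > s[-1]:  # extend in sorted order to avoid duplicates
                    cand = s + (i,)
                    if sum(prices[x] for x in cand) <= budget:
                        next_level.append(cand)
        level = next_level
    return kept


prices = {"a": 10, "b": 20, "c": 40, "d": 100}
within_budget = grow_with_antimonotone(prices, budget=50)
```

Once ('b', 'c') exceeds the budget it is dropped, so no superset of it is ever generated.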

  11. Related Work (7) • Succinct Constraint • pruning can be done once-and-for-all before any iteration takes place [Table: domain constraints (comparisons of S with a value v, set relations between S and V) and aggregate constraints (min(S), max(S), count(S), sum(S) compared with v) that are succinct]
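As an illustration of "once-and-for-all" pruning, consider the constraint max(S) ≤ v. It can be enforced by filtering the item universe a single time before mining starts, after which every itemset built from the remaining items satisfies it automatically. The names and values below are hypothetical.

```python
from itertools import combinations


def prefilter_items(values, v):
    """Keep only items whose value is <= v, so that max(S) <= v holds for any S."""
    return {item for item, value in values.items() if value <= v}


values = {"a": 10, "b": 20, "c": 40, "d": 100}
allowed = prefilter_items(values, v=40)

# Every itemset over the allowed items satisfies the constraint without re-checking.
all_ok = all(
    max(values[i] for i in subset) <= 40
    for k in range(1, len(allowed) + 1)
    for subset in combinations(sorted(allowed), k)
)
```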

  12. Target Problems (1) • Association of quantitative items that satisfy a given inequality constraint composed of either (+, -) or (*, /) • ( Ii1 ⊕ Ii2 . . . ⊕ Iim ) ⊖ ( Ij1 ⊕ Ij2 . . . ⊕ Ijn ) θ C • The constraint is described by six parameters: 1. size m 2. size n 3. ⊕: + (*) 4. ⊖: - (/) 5. θ: one of <, >, =, ≤, ≥ 6. constant C • Examples: (3,2,+,-,>,100) and (1,1,0,/,=,2)

  13. Target Problems (2) • Temporal aspect of the data [Figure: serial pattern, parallel pattern, and sequence pattern over items A, B, C, D] • Hierarchies over the data [Figure: example hierarchy with nodes PolyU, Engineering, Arts, Computer, Civil]

  14. Problem Statement • V = {I1, I2, . . . , IM}, a set of quantitative items • T, the transactions of a database D • t[k] > 0 means t contains item Ik; t[k] = 0 means Ik does not exist in t • Association of items which satisfy ( Ii1 ⊕ Ii2 . . . ⊕ Iim ) ⊖ ( Ij1 ⊕ Ij2 . . . ⊕ Ijn ) θ C, where ⊕ is + (*), ⊖ is - (/), θ is one of <, >, =, ≤, ≥, and C is a scalar value

  15. Application in Clinical Database • Relationship between the treatments and clinical diagnosis • nursing : 100, clinical test : 30, pharmacies : 165, . . . • nursing : 120, injection : 130, pharmacies : 100, . . . • Operation : 220, injection : 542, clinical test : 60, . . . • (X + Y) - Z > 100 • X / Y = 2
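To make the constraint form concrete, the sketch below evaluates an inequality on a single record. The three records reuse only the values visible on this slide (the elided ". . ." parts are left out), and the assignment of X, Y, Z to particular treatments is an assumption made for illustration. A record is treated as not matching when any named item is absent, following the rule that a value of 0 means the item does not exist.

```python
import operator
from functools import reduce

COMBINE = {"+": operator.add, "*": operator.mul}
DIFF = {"-": operator.sub, "/": operator.truediv}
COMPARE = {"<": operator.lt, ">": operator.gt, "=": operator.eq,
           "<=": operator.le, ">=": operator.ge}


def holds(record, left, right, combine, diff, compare, c):
    """Check (left items combined) diff (right items combined) compare c."""
    left_vals = [record.get(item, 0) for item in left]
    right_vals = [record.get(item, 0) for item in right]
    if any(v <= 0 for v in left_vals + right_vals):
        return False  # an item named in the constraint is absent
    a = reduce(COMBINE[combine], left_vals)
    b = reduce(COMBINE[combine], right_vals)
    return COMPARE[compare](DIFF[diff](a, b), c)


records = [
    {"nursing": 100, "clinical test": 30, "pharmacies": 165},
    {"nursing": 120, "injection": 130, "pharmacies": 100},
    {"operation": 220, "injection": 542, "clinical test": 60},
]

# (nursing + pharmacies) - clinical test > 100, an assumed reading of (X + Y) - Z > 100
matches = [
    holds(r, ["nursing", "pharmacies"], ["clinical test"], "+", "-", ">", 100)
    for r in records
]
constraint_support = sum(matches) / len(records)
```

With a single item on each side, `combine` is never applied, so any valid key such as "+" can be passed for a ratio constraint like X / Y.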

  16. QMIC (1) • QMIC (Quantitative Mining under Inequality Constraints) • Candidate generation • reduce the number of itemsets • Max_Min pruning • Support counting • reduce the number of database scans • Generation sequence • Memory requirement • limited by the available memory

  17. QMIC (2) [Figure: generation sequence over L1, L2, L3, L4, L5, L8, . . .] • Skip generation steps by the pre-defined size m and n • Generation Steps • Algorithm Apriori : Lk-1 → Lk • Algorithm QMIC : Lk/2 → Lk
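One way to read the Lk/2 → Lk step is as a join of two disjoint large itemsets of half the target size. The slides do not show the actual QMIC procedure, so the sketch below is only an assumed illustration of the idea; `double_join` is a hypothetical name. The result is a superset of the large itemsets at the doubled size, since every large itemset of size 2k splits into two disjoint large itemsets of size k.

```python
def double_join(frequent_k):
    """Candidates of size 2k: unions of two disjoint large k-itemsets."""
    frequent_k = list(frequent_k)
    candidates = set()
    for i, a in enumerate(frequent_k):
        for b in frequent_k[i + 1:]:
            if not a & b:  # disjoint, so the union has exactly 2k items
                candidates.add(a | b)
    return candidates


# Hypothetical L2 over the items A..E.
L2 = [frozenset(p) for p in ("AB", "AC", "AD", "BC", "BD", "CD", "AE")]
C4 = double_join(L2)
```

In this example ABCE becomes a candidate even though BE and CE are not in L2; a step-by-step Apriori pass would have pruned it, which illustrates why skipping levels can yield more candidates.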

  18. QMIC (3) • Candidate itemsets generation

  19. QMIC (4) • Why in this sequence ? • How about using 3, 4 or a larger factor ? • Or even the power series ? • Memory Management • keep the previous L’s to generate the next level of large itemsets • Only limited memory is available • In QMIC, only three previous L’s are needed in order to generate the next level of large itemsets in the generation sequence.

  20. QMIC (5) • What is the trade-off of the generation sequence ? • a larger number of candidate itemsets • longer processing time in pruning • Max_Min Pruning • apply the inequality constraint during pruning • Maximum value itemset list (maxlst) • Sorted list in descending order according to the maximum value of the sum (product) • Minimum value itemset list (minlst) • Sorted list in ascending order according to the minimum value of the sum (product)

  21. QMIC (6) • Max_Pruning • θ ∈ {≥, >} • A ⊖ B θ C where A = ( Ii1 ⊕ Ii2 . . . ⊕ Iim ), B = ( Ij1 ⊕ Ij2 . . . ⊕ Ijn ) • Minimum value of A • Over pruning ? • Using maxlst • Sliding window with size m+ . . . [Figure: window of maxlst1] • Stop sliding if the total sum of the items inside the window is smaller than C
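The sliding-window idea can be sketched as follows. This is one possible reading of the slide, not the authors' procedure: items are sorted by their maximum value in descending order, a window of a fixed size slides over the list, and sliding stops at the first window whose total is smaller than C. Any itemset of that size drawn entirely from the items at or after the stopping position has a sum of maxima below C. The window size `m`, the function name, and the per-item maxima (taken from the three example clinical records) are assumptions.

```python
def max_prune_cut(max_values, m, c):
    """Return (maxlst, cut).

    maxlst lists the items by descending maximum value. Every m-itemset
    drawn entirely from maxlst[cut:] has a sum of maxima smaller than c.
    """
    maxlst = sorted(max_values, key=max_values.get, reverse=True)
    for start in range(len(maxlst) - m + 1):
        window = maxlst[start:start + m]
        if sum(max_values[item] for item in window) < c:
            return maxlst, start  # stop sliding here
    # No window fell below c: fewer than m items remain past this point.
    return maxlst, max(len(maxlst) - m + 1, 0)


max_values = {
    "operation": 220, "injection": 542, "pharmacies": 165,
    "nursing": 120, "clinical test": 60,
}
maxlst, cut = max_prune_cut(max_values, m=2, c=300)
```

Here the windows sum to 762, 385, and 285, so sliding stops at the third position and pairs made only of pharmacies, nursing, and clinical test cannot reach 300.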

  22. QMIC (7) • Max_Pruning procedure

  23. Experiments (1) • Number of items

  24. Experiments (2) • Number of transactions

  25. Future Plan • Association Rules of Sequence Patterns • Time constraint • Association Rules of Multi-layer data
