1 / 16

Scalable Rule-Based Gene Expression Data Classification

Scalable Rule-Based Gene Expression Data Classification. Mark A. Iwen Department of Mathematics Willis Lang, Jignesh M. Patel EECS Department University of Michigan, USA (ICDE 2008). Outline. Introduction Preliminaries Method – Boolean Structure Table (BST)

hawa
Download Presentation

Scalable Rule-Based Gene Expression Data Classification

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Scalable Rule-Based Gene Expression Data Classification Mark A. Iwen Department of Mathematics Willis Lang, Jignesh M. Patel EECS Department University of Michigan, USA (ICDE 2008)

  2. Outline • Introduction • Preliminaries • Method– Boolean Structure Table(BST) Boolean Structure Table Classifier (BSTC) • Experiments • Conclusions

  3. Introduction • association rule-based classifiers for gene expression data operate in two phases: (i) Association rule mining from training data followed by (ii) Classification of query data using the mined rules • require an exponential search of the training data set’s samples • heuristic rule-based gene expression data classifier

  4. Introduction • g1,g2 → cancer • g5, g6 → healthy

  5. Preliminaries • A finite set G of genes and N collections of subsets from G • Ci : class type or class label • si,j ⊂ G as a sample • every element g ∈ G as a gene • C1 ={s1,1, . . . , s1,m1}, . . . , CN = {sN,1, . . . , sN,mN} • if g ∈ G , g ∈ si,jwe will say that samplesi,jexpressesgene g

  6. Preliminaries • Conjunctive associationrule (CAR) • If a query sample s contains all genes gj1, . . . , gjr then it should be grouped with class type Cn. gj1, . . . , gjr⇒ n • Support: • Confidence:

  7. Preliminaries • CARg1, g3 ⇒ Cancer • Support = 2, conf = 1

  8. Preliminaries • Boolean association rule (BAR) • if B(s[g1], . . . , s[gn]) evaluates to true for a given sample s,then s should belong to class Ci.” • Support • Confidence • B(x1, x2, x3, x4, x5, x6) = (x1 ∧ x3) ∨ (x2 ∧ x4) • support 3 and confidence 1

  9. Boolean Structure Table(BST) • if g ∈ G , g ∈ si,jwe will say that samplesi,jexpressesgene g

  10. Boolean association rule (BAR) • Gene Row BARs with 100% Confidence Values • Gene g1: (g1 expressed) ⇒ Cancer. • Gene g2: (g2 expressed AND [EITHER (g1 expressed) OR (either g5 or g3 not expressed)] ) ⇒ Cancer. • Gene g3: (g3 expressed AND [EITHER {(g1 expressed) AND (either g4 or g6 not expressed)} OR { (either g2 or g5 not expressed) AND (either g4 or g5 not expressed)} ] ) ⇒ Cancer. • … • exclusion list

  11. Why BARs with 100% Confidence? • We remove all exclusion list clauses related to sample row s5 • (g3 expressed AND [EITHER (g1 expressed) OR (either g2 or g5 not expressed) ] ⇒ Cancer • Confidence = • theorem CAR , exists a 100% confident BST generated BAR B ⇒ C supp( ) = supp, non-C samples.

  12. Boolean Structure Table Classifier (BSTC) • BSTEC (T(i),Q) • Q = {g1 expressed, g2 not expressed, g3 not expressed, g4 expressed, g5 expressed, g6 not expressed} • Healthy classification value of 3/8

  13. Runtime Complexity for BST Creation • C1, . . . , CNis • BSTs canbe constructed for all Cis in time

  14. Runtime Complexity for BST • construct all the BSTs T(1), . . . , T (N) • Thus,BSTC requires time and space O (|S|^2・|G|) • during classification BSTC must calculate BSTCE(T(i),Q) for 1 ≤ i ≤ N. BSTCE runs in O ((|S| − |Ci|)・|G|・|Ci|) time per query sample • worst case evaluation time is also O(|S|^2・|G|) per query sample

  15. Experiments

  16. Conclusions • BSTC retains the classification accuracy of current association rule-based methods while being orders of magnitude faster than the leading classifier RCBT on large datasets • Rulebased classifiers: (i) BSTC is easy to use (requires no parameter tuning) (ii) BSTC can easily handle datasets with any number of class types

More Related