Mining for constant valued biclusters using RCB Sean Landman March 7th 2011
Outline • Review • Biclustering • Apriori • Range Support Patterns (RAP) • RCB (Atluri et al., 2009) • Definition • Algorithm • Genetic Interactions (GI) data • Experimental results
Review: Biclusters • Clustering along both dimensions • i.e. Genes co-expressed across a subset of conditions rather than across all conditions • Different types of biclusters: Image: Atluri et al. (2009)
Motivation • Constant value biclusters are important for analyzing genetic interaction data • More later… • Problems with previous approaches: • Reliance on heuristics • e.g. top-down greedy search • Focus on different types of biclusters • Need a way to find constant valued biclusters without relying on heuristics
Review: Apriori principle Image: Feb. 7 Lecture Slides
Review: Apriori algorithm • Input: support threshold, transaction data • Start with set of all 1-itemsets • Discard itemsets with support less than threshold • For k = 2 to N • Generate all possible k-itemsets from (k-1)-itemsets • Discard k-itemsets with support less than threshold
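The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's reference implementation: the function name `apriori`, the absolute-count `min_support` parameter, and the toy transactions are all assumptions made for the example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    min_support is an absolute count (an assumption for this sketch).
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: keep the single items that meet the threshold.
    items = {item for t in transactions for item in t}
    level = {}
    for item in items:
        count = support(frozenset([item]))
        if count >= min_support:
            level[frozenset([item])] = count
    frequent = dict(level)

    k = 2
    while level:
        prev = list(level)
        # Join: union pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        level = {}
        for cand in candidates:
            # Prune: every (k-1)-subset must itself be frequent (Apriori principle).
            if all(frozenset(s) in frequent for s in combinations(cand, k - 1)):
                count = support(cand)
                if count >= min_support:
                    level[cand] = count
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical toy data, invented for illustration.
demo = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(demo, min_support=3)
```

With these toy transactions every single item and every pair is frequent, while the triple {a, b, c} appears only twice and is discarded.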
Review: RAP framework • Efficient and exhaustive discovery of all constant row/column biclusters • “Association analysis for real-valued data” • Range Support measure:
Review: RAP framework • Range Support = 1.4 + 0.9 = 2.3 Image: Pandey et al. (2008)
Review: RAP framework • Range Support measure is anti-monotonic • i.e. adding an item can only decrease the Range Support (or leave it unchanged) • Algorithm: • Apriori-like algorithm using Range Support measure instead of Support count
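The anti-monotonic property can be checked on a small example. The sketch below assumes one reading of Range Support: a row contributes its minimum value over the chosen items when those values lie within a relative range `alpha`, and contributes nothing otherwise. That reading matches the sum on the earlier slide (1.4 + 0.9 = 2.3), but the exact definition is in Pandey et al. (2008). The function name, the restriction to positive values, and the data are assumptions made for illustration.

```python
def range_support(rows, items, alpha):
    """Sum of per-row contributions for the given item (column) indices.

    Assumed rule: a row contributes min(values) if all values are positive
    and (max - min) / min <= alpha; otherwise it contributes 0.
    """
    total = 0.0
    for row in rows:
        values = [row[i] for i in items]
        lo, hi = min(values), max(values)
        if lo > 0 and (hi - lo) / lo <= alpha:
            total += lo
    return total

# Hypothetical data: three rows (transactions), three real-valued items.
data = [
    [1.4, 1.6, 1.5],
    [0.9, 1.0, 3.0],
    [0.5, 2.0, 1.0],
]

rs_two = range_support(data, items=[0, 1], alpha=0.5)       # rows 1 and 2 contribute
rs_three = range_support(data, items=[0, 1, 2], alpha=0.5)  # only row 1 contributes
```

Adding the third item removes the second row's contribution, so the Range Support of the larger itemset is no greater than that of the smaller one, which is what lets an Apriori-like search prune.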
Outline • Review • Biclustering • Apriori • RAP – constant row/column biclusters • RCB (Atluri et al., 2009) • Definition • Algorithm • Genetic Interactions (GI) data • Experimental results
Range Constrained Blocks (RCB) • Similar idea to RAP • Association analysis framework • Exhaustive and efficient discovery of all (nearly-) constant valued biclusters • RAP : constant-row/column :: RCB : constant-value
Range Constrained Blocks (RCB) • Definition: • i.e., submatrices with all values within a relative range • Range measure is monotonic • i.e. adding rows or columns to a block can never lower its Range score
Range Constrained Blocks (RCB) • Range = (5 – 2) / 2 = 1.5 • Range = (45 - 30) / 30 = 0.5
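Both worked values above have the form (max - min) / min over the entries of a block. A small sketch under that assumption follows; the function name `block_range` and the two matrices are hypothetical, chosen only so that their minimum and maximum entries match the numbers on the slide.

```python
def block_range(matrix, rows, cols):
    """Relative range (max - min) / min over the selected submatrix.

    Assumes all selected entries are positive.
    """
    values = [matrix[r][c] for r in rows for c in cols]
    lo, hi = min(values), max(values)
    if lo <= 0:
        raise ValueError("this sketch assumes positive entries")
    return (hi - lo) / lo

# Hypothetical blocks whose extremes match the slide's examples.
a = [[2, 3],
     [4, 5]]
b = [[30, 40],
     [35, 45]]

print(block_range(a, [0, 1], [0, 1]))  # 1.5
print(block_range(b, [0, 1], [0, 1]))  # 0.5

# Monotonicity: a sub-block never has a larger Range than the full block.
print(block_range(a, [0], [0, 1]) <= block_range(a, [0, 1], [0, 1]))  # True
```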
Why not post-process from RAP? • RCB is 2-dimensional, RAP is 1-dimensional • Combinatorial explosion of examining all submatrices of RAP patterns • Not all RCB patterns are contained within the RAP patterns
Apriori approach? • Not quite… • Item sets are 1-dimensional • Evaluated with Support / Range Support measures • RCB blocks are 2-dimensional • Evaluated with Range measure • Thus, “item set lattice” is exponentially larger
Apriori approach? Image: Feb. 7 Lecture Slides
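To get a feel for the size difference, one can count candidates directly: n items give 2^n - 1 non-empty itemsets, while an m x n matrix has (2^m - 1)(2^n - 1) non-empty submatrices, since any row subset can pair with any column subset. The helper names below are hypothetical.

```python
def num_itemsets(n_items):
    """Non-empty subsets of n items (nodes in the itemset lattice)."""
    return 2 ** n_items - 1

def num_submatrices(n_rows, n_cols):
    """Non-empty row subsets times non-empty column subsets."""
    return (2 ** n_rows - 1) * (2 ** n_cols - 1)

print(num_itemsets(10))         # 1023
print(num_submatrices(10, 10))  # 1046529
```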
Algorithm outline • Two separate Apriori-like discovery steps: • 1 – Discover all square RCBs • 2 – Merge square RCBs to discover all RCBs • Examples: • 1.1: Find all 1x1 RCBs • 1.2: Find all 1xN or Nx1 RCBs (for all N) • 2.1: Find all 2x2 RCBs • 2.2: Find all 2xN or Nx2 RCBs (for all N) • etc… • Only keep RCBs of size 3x3 or larger
1 - Discovering all square RCBs Image: Atluri et al. (2009)
2 - Merging square RCBs • For each set of square RCBs of a particular size that share a common dimension: • Merge using an Apriori-like algorithm Image: Atluri et al. (2009)
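The idea of growing blocks along one shared dimension can be illustrated with a simplified sketch. This is not the procedure from Atluri et al. (2009), whose details are in the figure; it only shows Apriori-like, level-wise growth of column sets for a fixed set of rows, pruning with the monotonic Range measure. The names `block_range` and `grow_columns`, the threshold `delta`, and the matrix are assumptions made for illustration.

```python
from itertools import combinations

def block_range(matrix, rows, cols):
    """Relative range (max - min) / min; assumes positive entries."""
    values = [matrix[r][c] for r in rows for c in cols]
    lo, hi = min(values), max(values)
    return (hi - lo) / lo

def grow_columns(matrix, rows, delta):
    """All column sets C for which the block (rows, C) has Range <= delta."""
    n_cols = len(matrix[0])
    # Level 1: single columns that satisfy the constraint on the fixed rows.
    level = [frozenset([c]) for c in range(n_cols)
             if block_range(matrix, rows, [c]) <= delta]
    valid = set(level)
    k = 2
    while level:
        # Join valid (k-1)-column sets into candidate k-column sets.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: Range never drops when columns are added, so a candidate
        # with an invalid (k-1)-subset cannot satisfy the constraint.
        level = [c for c in candidates
                 if all(frozenset(s) in valid for s in combinations(c, k - 1))
                 and block_range(matrix, rows, sorted(c)) <= delta]
        valid.update(level)
        k += 1
    return valid

# Hypothetical matrix; rows 0 and 1 are near-constant on columns 0, 1 and 3.
m = [
    [10, 11, 50, 12],
    [11, 10, 55, 11],
    [12, 12,  9, 10],
]
found = grow_columns(m, rows=[0, 1], delta=0.25)
largest = max(found, key=len)
```

Here column 2 is valid on its own but cannot be combined with any other column, so the largest block found for rows 0 and 1 uses columns 0, 1 and 3.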
Outline • Review • Biclustering • Apriori • RAP – constant row/column biclusters • RCB (Atluri et al., 2009) • Definition • Algorithm • Genetic Interactions (GI) data • Experimental results
Application: Genetic Interactions • Rows and columns both represent genes • Entries represent the level of genetic interaction between genes • Determined using gene knockout experiments • $\varepsilon = F_{AB} - F_A F_B$ • i.e. $F_A$ represents fitness after gene A is deleted, and $F_{AB}$ fitness after both A and B are deleted
Application: Genetic Interactions • $\varepsilon = F_{AB} - F_A F_B$ • Negative $\varepsilon$ represents functional redundancy • Positive $\varepsilon$ represents interactions within a functional pathway • Focus in this paper • Positive RCBs in this context represent a complex of functionally related genes
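A short arithmetic check of the score's sign, using made-up fitness values (the function name and the numbers are hypothetical, not taken from any dataset):

```python
def interaction_score(f_ab, f_a, f_b):
    """epsilon = F_AB - F_A * F_B (observed minus expected double-mutant fitness)."""
    return f_ab - f_a * f_b

# Two single mutants with fitness 0.5 each: expected double-mutant fitness is 0.25.
negative = interaction_score(f_ab=0.0, f_a=0.5, f_b=0.5)  # double mutant does worse
positive = interaction_score(f_ab=0.5, f_a=0.5, f_b=0.5)  # double mutant does better
print(negative, positive)  # -0.25 0.25
```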
Application: Genetic Interactions Image: Costanzo et al. (2010)
Application: Genetic Interactions • Figure panels: Between Pathway Interactions (compensatory); Within Complex/Pathway Interactions • Genes shown: RAD55, RAD57, RAD51, RAD54, RAD52, REV7, REV1, REV3 Images: Kelly & Ideker (2005), Schuldiner et al. (2005)
Results • Small biclusters • Low Range score Image: Atluri et al. (2009)
Results • Mean functional evaluation (FE) score corresponds well with the Range measure used to define RCB blocks Image: Atluri et al. (2009)
Results • RCB patterns tend to have a much tighter spread between minimum and maximum values than FP or RAP (i.e. better Range score) Image: Atluri et al. (2009)
Conclusion • RCB framework is used to find constant valued biclusters… • Exhaustively • Efficiently • Used for discovering functionally related gene modules in GI data • Other applications: gene expression data?