Tuck et al., BMC Bioinformatics, 2006. Case Study: Characterizing Diseased States from Expression/Regulation Data
Background • How do we classify processes/expression related to disease/phenotype (separating signal/data)? • How do we use all of the data available to us – sequences, expression, regulation? • Present case study of acute leukemia and breast cancer (normal vs. diseased cells).
Summary of Contributions • Construct sample-specific regulatory networks. • Identify links between transcription factors (TFs) and regulated genes that differentiate healthy states from diseased states. • Generalize to simultaneous changes in functionality of multiple regulatory links pointing to a regulated gene or emanating from one TF.
Summary of Contributions • Examine distances in transcriptional networks for subsets of genes that characterize the diseased state. • Observe that genes that optimally classify samples are concentrated in neighborhoods. • Genes that are deregulated in diseased states exhibit high connectivity. • TF-regulated gene links and centrality of genes can be used to characterize diseased cells.
Background • Current work largely focuses on identification of individual differentially expressed genes, or co-regulated gene sets. • There is significant work on module identification (graph models, SVD, connected components, etc.) • There is work on expression patterns of genes that can classify tumor types. • There is some work on transcription networks prior to this work as well [TRANSFAC/CREME]
Constructing Disease Cell Networks • Intersect the connectivity network, representing TF binding to gene promoter regions, with co-expression networks, representing TF-target gene co-expression. • Use TRANSFAC to relate known TF binding sites to promoter regions of genes and known TF-target gene interactions. • For data derived from each microarray (sample or patient), construct a co-expression network such that each TF-gene pair is assigned +1 or -1 based on up/down co-regulation.
Constructing Disease Cell Networks • Intersection of connectivity and individual co-expression networks gives condition-specific (CS) regulatory networks. • CS networks derived from 6 gene expression studies using 3 types of datasets – normal cell lineages, tumor vs. normal tissues, and disease-specific tumors associated with variable clinical outcomes. • 4821 genes and 196 TFs on early Affymetrix arrays and 13363 genes and 233 TFs on newer arrays.
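The construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names (`binarize`, `cs_networks`), the toy gene names, and the rule for calling a gene "up" or "down" (comparison with that gene's median across samples) are all assumptions made for the example. The slides only state that each TF-gene pair receives +1 or -1 based on up/down co-regulation.

```python
from statistics import median

def binarize(expression):
    """Map each gene's expression to +1 (up) or -1 (down) in every sample.
    ASSUMPTION: 'up' means above the gene's median across samples."""
    states = {}
    for gene, values in expression.items():
        mid = median(values)
        states[gene] = [1 if v > mid else -1 for v in values]
    return states

def cs_networks(connectivity, expression):
    """Return one condition-specific (CS) network per sample.
    Each network maps a (tf, gene) link taken from the connectivity network
    to +1 if TF and gene share the same up/down state in that sample,
    and to -1 otherwise."""
    states = binarize(expression)
    n_samples = len(next(iter(expression.values())))
    networks = []
    for s in range(n_samples):
        net = {}
        for tf, gene in connectivity:
            # Intersection step: keep a link only if it is in the
            # connectivity network AND both endpoints were measured.
            if tf in states and gene in states:
                net[(tf, gene)] = states[tf][s] * states[gene][s]
        networks.append(net)
    return networks

# Hypothetical toy data: 3 candidate links, 4 samples (arrays).
connectivity = [("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")]
expression = {
    "TF1": [5.0, 1.0, 4.0, 0.5],
    "G1": [7.0, 2.0, 1.0, 6.0],
    "G2": [0.2, 3.0, 0.1, 2.5],
}
nets = cs_networks(connectivity, expression)
print(nets[0])
```

The link ("TF2", "G3") is dropped because neither endpoint appears in the toy expression data, which mirrors the intersection of the two network types.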
Classifying based on network features. • Assume that each disease sample has a distinct regulatory network (pattern of activated links that gives rise to its expression profile). • Examine how different aspects of network structure characterize different phenotypes.
Classifying based on network features. Link Based Approach • Examine differences between patient samples by analyzing activity status of regulatory links • Construct networks unique to patients • Yields complete discriminatory networks.
Classifying based on network features. Degree Based Approach • “Centrality” of individual genes in networks. • Degree – number of TFs activating or suppressing a particular gene (in-degree), or number of genes regulated by a single TF (out-degree). • Use genome-wide degree profile – identifying nodes with the largest changes in centrality (rewiring) assists in detecting hotspots.
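As a rough sketch of the degree-based idea, the snippet below counts in-degree and out-degree per sample network and ranks nodes by how much their average degree shifts between two groups of samples. The function names and toy networks are hypothetical, and counting only links with status +1 as "active" is an assumption of this example, not something the slides specify.

```python
from collections import Counter

def degrees(network):
    """In-degree per gene and out-degree per TF for one sample network.
    ASSUMPTION: only links with status +1 count as active."""
    in_deg, out_deg = Counter(), Counter()
    for (tf, gene), status in network.items():
        if status == 1:
            out_deg[tf] += 1
            in_deg[gene] += 1
    return in_deg, out_deg

def mean_degree(networks, which):
    """Average degree of each node over a group of sample networks.
    which = 0 selects in-degree, which = 1 selects out-degree."""
    totals = Counter()
    for net in networks:
        totals.update(degrees(net)[which])
    return {node: total / len(networks) for node, total in totals.items()}

def degree_shift(group_a, group_b, which):
    """Nodes sorted by absolute change in mean degree from group_a to group_b."""
    a = mean_degree(group_a, which)
    b = mean_degree(group_b, which)
    shift = {n: b.get(n, 0.0) - a.get(n, 0.0) for n in set(a) | set(b)}
    return sorted(shift.items(), key=lambda kv: (-abs(kv[1]), kv[0]))

# Hypothetical toy networks: two normal samples and two tumor samples.
normal = [
    {("TF1", "G1"): 1, ("TF1", "G2"): 1, ("TF2", "G1"): 1},
    {("TF1", "G1"): 1, ("TF1", "G2"): 1, ("TF2", "G1"): -1},
]
tumor = [
    {("TF1", "G1"): -1, ("TF1", "G2"): -1, ("TF2", "G1"): 1},
    {("TF1", "G1"): -1, ("TF1", "G2"): 1, ("TF2", "G1"): 1},
]
print(degree_shift(normal, tumor, which=1))
```

In this toy case TF1 loses most of its active targets in the tumor group, so it tops the out-degree ranking as a candidate rewiring hotspot.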
Classifying based on network features. Sample Classification • Create regulatory networks for every sample and apply a classifier. • Rank features to identify set of TF-gene links. • Use training sets to identify features and rank links, genes, and degree of nodes that undergo most substantial changes. • Acute lymphoblastic leukemia vs. acute myeloid leukemia • Two different myeloid leukemia types • Different matched cell types (renal-cell carcinoma vs. normal)
Classifying based on network features. Sample Classification • Pass the top-ranked links to train a basic classifier. • Cross-validate.
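The rank-then-classify loop might look like the sketch below. The slides do not name the ranking statistic or the classifier, so both are stand-ins chosen for brevity: links are ranked by the difference in mean status between the two classes, and the "basic classifier" is a weighted vote over the top k links. Evaluation uses leave-one-out cross-validation, with ranking redone inside every fold so the held-out sample never influences feature selection. All names and the toy data are hypothetical.

```python
def rank_links(networks, labels):
    """Rank links by absolute difference in mean status between two classes.
    ASSUMPTION: every network carries the same set of links."""
    classes = sorted(set(labels))
    links = sorted(networks[0])

    def mean(link, cls):
        vals = [net[link] for net, y in zip(networks, labels) if y == cls]
        return sum(vals) / len(vals)

    scored = [(link, mean(link, classes[1]) - mean(link, classes[0]))
              for link in links]
    return sorted(scored, key=lambda kv: -abs(kv[1]))

def predict(net, top_links, classes):
    """Stand-in classifier: each top link votes with weight equal to its
    class difference; a positive total selects classes[1]."""
    score = sum(net[link] * diff for link, diff in top_links)
    return classes[1] if score > 0 else classes[0]

def loo_accuracy(networks, labels, k):
    """Leave-one-out cross-validation using the top k links per fold.
    Requires at least two samples per class so no fold loses a class."""
    classes = sorted(set(labels))
    hits = 0
    for i in range(len(networks)):
        train_nets = networks[:i] + networks[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        top = rank_links(train_nets, train_labels)[:k]
        if predict(networks[i], top, classes) == labels[i]:
            hits += 1
    return hits / len(networks)

# Hypothetical toy data: link a separates the classes, b and c are noise.
a, b, c = ("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")
networks = [
    {a: 1, b: 1, c: -1}, {a: 1, b: -1, c: 1}, {a: 1, b: 1, c: 1},
    {a: -1, b: 1, c: -1}, {a: -1, b: -1, c: 1}, {a: -1, b: 1, c: 1},
]
labels = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
print(rank_links(networks, labels)[0])
print(loo_accuracy(networks, labels, k=1))
```

Any standard classifier could replace the vote rule; the point of the sketch is only the order of operations: rank on the training fold, keep the top links, then test on the held-out sample.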