
From Expression to Regulation: the online analysis of microarray data

From Expression to Regulation: the online analysis of microarray data. Gert Thijs K.U.Leuven, Belgium ESAT-SCD. K.U.Leuven. Founded in 1425 Situated in the center of Belgium Some numbers: 25.000 students 2.500 researchers 1.000 professors University Hospital with 1.500 beds. ESAT-SCD.



Presentation Transcript


  1. From Expression to Regulation: the online analysis of microarray data • Gert Thijs • K.U.Leuven, Belgium • ESAT-SCD

  2. K.U.Leuven • Founded in 1425 • Situated in the center of Belgium • Some numbers: • 25.000 students • 2.500 researchers • 1.000 professors • University Hospital with 1.500 beds http://www.esat.kuleuven.ac.be/~dna/BioI/

  3. ESAT-SCD • Faculty of Engineering • Mathematical engineering (120) • Systems and control • Data mining and Neural Nets • Biomedical signal processing • Telecommunications • Bioinformatics • Cryptography

  4. Bioinformatics team • Research in medical informatics and bioinformatics • Research on algorithmic methods • Interdisciplinary team • 15 researchers (1 full professor, 4 post-docs, 10 Ph.D. students) • Engineering, physics, mathematics, computer science, biotech, and medicine • Collaborative research with molecular biologists and clinicians • VIB MicroArray Facility: primary analysis of microarray data • University of Gent-VIB, Plant Genetics: motif discovery • KUL-VIB, Center for Human Genetics • Neuronal development in mice neurons • Targets of PLAG1 (pleiomorphic adenoma gene) • KUL, Obstetrics and Gynecology • Diagnosis of ovarian tumors from ultrasonography (IOTA) • Microarray analysis of ovarian tumor biopsies

  5. Overview • Short introduction to microarrays • Exploratory analysis of microarray data • Clustering gene expression profiles • Upstream sequence retrieval • Motif finding in sets of co-expressed genes

  6. cDNA microarrays • Collaboration with VIB microarray facility. • 5000 cDNAs (genes, ESTs) spotted on array • Cy3, Cy5 labeling of samples • Hybridization (test, control) • Laser scanning & image analysis • Arabidopsis, mouse, and human

  7. Microarray experiment • Collecting samples • Extracting mRNA • Labeling • Hybridizing • Scanning • Visualizing

  8. Microarray production • Clones • Plasmid preparation • PCR amplification • Reordering • Spotting (zoom: pins)

  9. From expression to regulation • Figure labels: Microarrays, Clustering, GenBank (A1234, Z4321), Blast, Gibbs sampler

  10. Exploratory data analysis

  11. Data exploration • Subset selection based on • Gene Ontology functional classes • Keywords, gene names • Check the expression profiles of individual genes • Visualization of expression profiles of gene families • Link to upstream sequence retrieval

  12. Gene Ontology

  13. Subset selection

  14. Profile inspection

  15. Profile visualization

  16. Sequence Retrieval

  17. Clustering

  18. Goal of clustering • Exploration of microarray data • Form coherent groups of • Genes • Patient samples (e.g., tumors) • Drug or toxin response • Study these groups to get insight into biological processes • Genes in same clusters can have the same function or same regulation

  19. K-means (figure: initialization) • Initialization • Choose the number of clusters K and start from random positions for the K centers • Iteration • Assign points to the closest center • Move each center to the center of mass of the assigned points • Termination • Stop when the centers have converged or the maximum number of iterations is reached

  20. K-means (figure: iteration 1; same bullets as slide 19)

  21. K-means (figure: iteration 1; same bullets as slide 19)

  22. K-means (figure: iteration 3; same bullets as slide 19)
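The three K-means phases listed on the slides (initialization, iteration, termination) can be sketched in a few lines of plain Python. This is a minimal illustration, not the code behind the slides: the function name `kmeans`, the `seed` parameter, and the choice to initialize the centers at K randomly chosen data points (rather than arbitrary random positions) are assumptions made for the example.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Initialization: assumed here to be K randomly chosen data points.
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Iteration, part 1: assign every point to its closest center
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Iteration, part 2: move each center to the center of mass
        # of the points assigned to it.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # An empty cluster keeps its previous center.
                new_centers.append(centers[c])
        # Termination: stop when the centers no longer move.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels
```

Calling `kmeans(profiles, k=2)` on a list of expression profiles returns the final centers and one cluster label per profile. Note that every profile receives a label, which is the "all genes forced into cluster" behaviour criticized later in the talk.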

  23. Hierarchical clustering • Construction of gene tree based on correlation matrix
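As a rough illustration of building a gene tree from a correlation matrix, the sketch below computes pairwise Pearson correlations and repeatedly merges the two most correlated clusters. The slide does not say which linkage rule was used; average linkage is an assumption here, as are the names `pearson` and `gene_tree`. Profiles are assumed to be non-constant (a constant profile has zero variance and no defined correlation).

```python
from itertools import combinations
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def gene_tree(profiles):
    """profiles: dict gene name -> list of values. Returns nested tuples."""
    # Correlation matrix, stored once per unordered gene pair.
    corr = {frozenset((a, b)): pearson(profiles[a], profiles[b])
            for a, b in combinations(profiles, 2)}
    # Each cluster is (subtree, list of member genes).
    clusters = [(g, [g]) for g in profiles]

    def linkage(m1, m2):
        # Average linkage (assumed): mean correlation over all cross pairs.
        return mean(corr[frozenset((a, b))] for a in m1 for b in m2)

    while len(clusters) > 1:
        # Merge the two clusters with the highest average correlation.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]][1],
                                          clusters[ij[1]][1]))
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return clusters[0][0]
```

The result is a binary tree of nested tuples whose leaves are gene names; genes with similar profiles end up under a common subtree.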

  24. Need for new clustering algorithms • K-means clustering • Noisy genes deteriorate consistency of profiles in cluster • All genes forced into cluster

  25. Adaptive quality-based clustering • For discovery, biologists are looking for highly coherent, reliable clusters • Other needs for clustering microarray data • Fast + limited memory (need to analyze thousands of genes) • No need to specify number of clusters in advance • Few and intuitive parameters • AQBC = two-step algorithm • Cluster center localization • Cluster radius estimation with EM • Read more: • De Smet et al. (2002) Bioinformatics, in press.

  26. Step 1: localization of cluster center

  27. Step 2: re-estimation of cluster radius • Distance from cluster center randomly distributed except for small group (= cluster elements) • Size of cluster can be estimated automatically by EM • Step 3: remove cluster points and look for new cluster
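To make the idea of estimating a cluster radius with EM concrete, here is a deliberately simplified stand-in. It is not the published AQBC model: it assumes the distances to the cluster center follow a mixture of two 1-D Gaussians (a tight "cluster" component and a broad "background" component), fits that mixture by EM, and returns the largest distance whose posterior cluster membership reaches a threshold `qc`. The function names, the Gaussian assumption, and the default values are all illustrative.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))


def estimate_radius(distances, qc=0.95, n_iter=100, min_sigma=1e-3):
    """Fit a two-component 1-D Gaussian mixture to the distances by EM.

    Returns the largest distance with posterior cluster membership >= qc,
    or None if no point reaches that threshold.
    """
    # Component 0 = cluster (starts at the smallest distance),
    # component 1 = background (starts at the largest distance).
    mu = [min(distances), max(distances)]
    mean_d = sum(distances) / len(distances)
    spread = sqrt(sum((d - mean_d) ** 2 for d in distances) / len(distances))
    sigma = [max(spread, min_sigma)] * 2
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each point is a cluster member.
        resp = []
        for d in distances:
            p = [weight[k] * normal_pdf(d, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append(p[0] / total if total > 0 else 0.5)
        # M-step: re-estimate weight, mean and spread of both components.
        for k in range(2):
            r = resp if k == 0 else [1 - x for x in resp]
            n_k = sum(r)
            if n_k == 0:
                continue
            weight[k] = n_k / len(distances)
            mu[k] = sum(ri * d for ri, d in zip(r, distances)) / n_k
            var = sum(ri * (d - mu[k]) ** 2 for ri, d in zip(r, distances)) / n_k
            sigma[k] = max(sqrt(var), min_sigma)
    inside = [d for d, r in zip(distances, resp) if r >= qc]
    return max(inside) if inside else None
```

In this sketch `qc` plays a role loosely analogous to the quality criterion mentioned in the comparison with K-means: raising it makes the accepted cluster smaller and more coherent.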

  28. Comparison with K-means
      A.Q.B.C. • User-defined parameters • Quality criterion (QC): % defines how significantly a cluster should be separated from background • Minimal number of genes in a cluster • Advantages • Outcome not sensitive to parameter setting • Number of clusters is determined automatically • Based on QC an optimal radius is calculated for each cluster • Set of smaller clusters containing genes with highly similar expression profile (fewer false positives) • Noisy genes are rejected • Disadvantages • Some information is rejected: clusters too small
      K-means • User-defined parameters • Number of clusters • Number of iterations • Advantages • Fewer true positives are rejected • Disadvantages • Outcome sensitive towards parameter setting • Extensive fine-tuning required to find optimal number of clusters • Separation and merging of clusters based on visual inspection and not on statistical foundation • No quality criterion: more false positives • All genes will be clustered (noisy clusters)

  29. Adaptive Quality-Based Clustering Web Interface

  30. Cluster results page Upstream Sequence Retrieval

  31. Upstream sequence retrieval

  32. Upstream Sequence Retrieval • Identify all genes in cluster based on given accession number and gene name. • Delineate upstream region based on sequence annotation. • Check for presence of annotated upstream gene. • IF upstream gene found THEN select intergenic region ELSE BLAST gene to find genomic DNA where gene is annotated. • Parse BLAST reports to find intergenic regions. • Report results in GFF.
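The IF/ELSE step above can be sketched as follows. This is a hypothetical illustration, not the tool's code: the function names, the 1-based inclusive coordinates, the 2000 bp cap on region length, and the `source`, `feature` and attribute values in the GFF line are assumptions. When no annotated upstream gene is available the function returns `None`, standing in for the BLAST fallback, which is not shown.

```python
def intergenic_upstream(gene_start, gene_end, strand, upstream_gene, max_len=2000):
    """Select the intergenic region upstream of a gene.

    Coordinates are 1-based and inclusive. `upstream_gene` is the
    (start, end) of the annotated neighbouring gene on the upstream side,
    or None if no such gene is annotated. Returns (start, end), or None
    when the caller should fall back to a BLAST search (not shown here).
    """
    if upstream_gene is None:
        return None  # ELSE branch: BLAST the gene against genomic DNA
    if strand == "+":
        # Upstream lies at lower coordinates, ending just before the gene.
        end = gene_start - 1
        start = max(end - max_len + 1, upstream_gene[1] + 1)
    else:
        # On the minus strand, upstream lies at higher coordinates.
        start = gene_end + 1
        end = min(start + max_len - 1, upstream_gene[0] - 1)
    return (start, end) if start <= end else None


def gff_line(seqname, start, end, strand, gene_id,
             source="sketch", feature="upstream_region"):
    """Format one region as a 9-column, tab-separated GFF record."""
    return "\t".join([seqname, source, feature, str(start), str(end),
                      ".", strand, ".", "gene=" + gene_id])
```

For a plus-strand gene at 5000..6000 with an upstream neighbour ending at 4200, the selected region is 4201..4999; if the neighbour is further away than `max_len`, the region is truncated to the last 2000 bases before the gene.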

  33. Gene Identification

  34. Selected sequences & genes to be blasted

  35. Results of BLAST report parsing

  36. Selected sequences

  37. Motif Finding

  38. Transcriptional regulation • Complex integration of multiple signals determines gene activity • Combinatorial control

  39. Identifying regulatory elements from expression data • Cluster genes from microarray expression data to build clusters of co-expressed genes • Co-expressed genes may share regulatory mechanisms • Most regulatory sequences are found in the upstream region of the genes (up to 2 kb in A. thaliana) • Motifs that are statistically overrepresented in the upstream regions are candidate regulatory sequences

  40. Upstream sequence model • Motifs are hidden in noisy background sequence. • Data set contains two types of sequences: • Sequences with one or more copies of the common motif. • Sequences with no copy of the common motif.

  41. Motif Sampler • Algorithm based on the original Gibbs Sampling algorithm (Lawrence et al. 1993, Science 262:208-214) • Probabilistic sequence model • Changes and additions: • Use of higher-order background model. • Use of probability distribution to estimate number of copies. • Different motifs are found and masked in consecutive runs of the algorithm. • Read more: • Thijs et al. (2001) Bioinformatics 17(12), 1113-1122 • Thijs et al. (2002) J.Comp.Biol. 9(2), 447-464

  42. Background model • Figure labels: intergenic region, core promoter, gene • Representation of DNA sequence by higher-order Markov chain • Reliable model can be built from selected intergenic DNA sequences. • Intergenic sequence = non-coding region between two consecutive genes. • Only regions that contain core promoter are selected.
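A higher-order Markov background model of this kind can be illustrated with a small counting sketch: for every context of `order` preceding bases, estimate the probability of the next base from intergenic training sequences. The function names, the default order of 3, the pseudocount of 1, and the uniform fallback for unseen contexts are assumptions for the example, not details taken from the Motif Sampler. Scored sequences are assumed to contain only A, C, G and T.

```python
from collections import defaultdict
from math import log

ALPHABET = "ACGT"


def train_background(sequences, order=3, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) from training sequences."""
    counts = defaultdict(lambda: {b: 0.0 for b in ALPHABET})
    for seq in sequences:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip windows containing ambiguity codes such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values()) + pseudocount * len(ALPHABET)
        model[context] = {b: (row[b] + pseudocount) / total for b in ALPHABET}
    return model


def log_prob(seq, model, order=3):
    """Log-probability of seq under the model, skipping the first `order` bases."""
    uniform = {b: 1.0 / len(ALPHABET) for b in ALPHABET}
    seq = seq.upper()
    total = 0.0
    for i in range(order, len(seq)):
        # Contexts never seen in training fall back to a uniform distribution.
        total += log(model.get(seq[i - order:i], uniform)[seq[i]])
    return total
```

A sequence that resembles the training data scores a higher log-probability than one that does not, which is what lets a motif finder separate genuine motif instances from background-like sequence.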

  43. Algorithm: Initialization • Calculate background model score • Start from random set of motif positions • Create initial motif model

  44. Algorithm: iterative procedure • Score sequences with current motif model • Calculate distribution • Sample new alignment position • Iterate for fixed number of steps

  45. Algorithm: Convergence • Select best scoring positions from Wx to create motif and alignment
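The three phases (initialization, iterative sampling, convergence) follow the general shape of a Gibbs site sampler, which can be sketched as below. This is a simplified illustration and not the Motif Sampler itself: it assumes exactly one motif occurrence per sequence and a uniform background (0.25 per base) instead of a higher-order model, and the function names, pseudocount and seed are invented for the example. Sequences are assumed to be uppercase A/C/G/T and at least `width` long.

```python
import random

ALPHABET = "ACGT"


def build_pwm(sequences, positions, width, skip=None, pseudocount=0.5):
    """Motif model (position weight matrix) from the current sites."""
    counts = [{b: pseudocount for b in ALPHABET} for _ in range(width)]
    for idx, (seq, pos) in enumerate(zip(sequences, positions)):
        if idx == skip:
            continue  # leave out the sequence that is being resampled
        for j in range(width):
            counts[j][seq[pos + j]] += 1
    return [{b: col[b] / sum(col.values()) for b in ALPHABET} for col in counts]


def window_scores(seq, pwm, width):
    """Likelihood ratio motif / uniform background for every start position."""
    scores = []
    for start in range(len(seq) - width + 1):
        w = 1.0
        for j in range(width):
            w *= pwm[j][seq[start + j]] / 0.25
        scores.append(w)
    return scores


def gibbs_site_sampler(sequences, width, steps=200, seed=0):
    rng = random.Random(seed)
    # Initialization: random motif position in every sequence.
    positions = [rng.randrange(len(s) - width + 1) for s in sequences]
    # Iterative procedure, repeated for a fixed number of steps.
    for _ in range(steps):
        for i, seq in enumerate(sequences):
            pwm = build_pwm(sequences, positions, width, skip=i)
            scores = window_scores(seq, pwm, width)
            # Sample the new alignment position from the score distribution.
            positions[i] = rng.choices(range(len(scores)), weights=scores)[0]
    # Convergence: keep the best-scoring position in each sequence.
    pwm = build_pwm(sequences, positions, width)
    final = []
    for seq in sequences:
        scores = window_scores(seq, pwm, width)
        final.append(max(range(len(scores)), key=scores.__getitem__))
    return final
```

The function returns one start position per sequence; the windows at those positions form the alignment from which the final motif model would be built. Because the procedure is stochastic, different seeds can give different alignments, so in practice several runs are usually compared.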

  46. Motif Sampler

  47. Motif Sampler results page

  48. Example: Plant wounding • 150 Arabidopsis genes • Mechanical plant wounding • 7 (or 8) time points over a 24h period • Adaptive quality-based clustering produces 8 clusters of which 4 contain 5 or more genes. • Search for a motif of length 8 and a motif of length 12 in 4 clusters • Reymond, P. et al. (2000) Differential gene expression in response to mechanical wounding and insect feeding in Arabidopsis. Plant Cell 12(5), 707-20.

  49. Results: Cluster 1

  50. Results: Cluster 2
