
Biology and Cells

CZ5211 Topics in Computational Biology Lecture 2: Gene Expression Profiles and Microarray Data Analysis Prof. Chen Yu Zong Tel: 6874-6877 Email: yzchen@cz3.nus.edu.sg http://xin.cz3.nus.edu.sg Room 07-24, level 7, SOC1, NUS. Biology and Cells. All living organisms consist of cells.

kswinney

Presentation Transcript


  1. CZ5211 Topics in Computational Biology • Lecture 2: Gene Expression Profiles and Microarray Data Analysis • Prof. Chen Yu Zong • Tel: 6874-6877 • Email: yzchen@cz3.nus.edu.sg • http://xin.cz3.nus.edu.sg • Room 07-24, level 7, SOC1, NUS

  2. Biology and Cells • All living organisms consist of cells. • Humans have trillions of cells. Yeast - one cell. • Cells are of many different types (blood, skin, nerve), but all arose from a single cell (the fertilized egg) • Each* cell contains a complete copy of the genome (the program for making the organism), encoded in DNA.

  3. DNA • DNA molecules are long double-stranded chains; 4 types of bases are attached to the backbone: adenine (A), guanine (G), cytosine (C), and thymine (T). A pairs with T, C with G. • A gene is a segment of DNA that specifies how to make a protein. • Human DNA has about 25-35K genes; Rice about 50-60K but shorter genes.
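
The pairing rule on this slide (A with T, C with G) can be illustrated with a minimal Python sketch. The function names `complement` and `reverse_complement` are chosen here for illustration and are not part of the lecture.

```python
# Watson-Crick pairing as stated on the slide: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ATGC"))          # TACG
    print(reverse_complement("ATGC"))  # GCAT
```

Taking the reverse complement twice returns the original strand, which is a quick sanity check on the pairing table.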

  4. Exons and Introns • exons are coding DNA (translated into a protein), which make up only about 2% of the human genome • introns are non-coding DNA, which provide structural integrity and regulatory (control) functions • exons can be thought of as program data, while introns provide the program logic • Humans have much more control structure than rice

  5. Gene Expression • Cells are different because of differential gene expression. • About 40% of human genes are expressed at one time. • A gene is expressed by transcribing DNA into single-stranded mRNA • mRNA is later translated into a protein • Microarrays measure the level of mRNA expression

  6. Molecular Biology Overview • [Figure: cell, nucleus, chromosome, gene (DNA), gene (mRNA, single strand), cDNA, protein]

  7. Gene Expression • Genes control cell behavior by controlling which proteins are made by a cell • Housekeeping genes vs. cell/tissue-specific genes • Regulation: • Transcriptional (promoters and enhancers) • Post-transcriptional (RNA splicing, stability, localization, small non-coding RNAs)

  8. Gene Expression • Regulation: • Translational (3’UTR repressors, poly A tail) • Post-transcriptional (RNA splicing, stability, localization, small non-coding RNAs) • Post-translational (protein modification: carbohydrates, lipids, phosphorylation, hydroxylation, methylation, precursor protein)

  9. Gene Expression Measurement • mRNA expression represents dynamic aspects of the cell • mRNA expression can be measured with the latest technology • mRNA is isolated and labeled with a fluorescent dye • Labeled mRNA is hybridized to probes on the array; the level of hybridization corresponds to the fluorescence emitted when the array is scanned with a laser

  10. Traditional Methods • Northern Blotting • Single RNA isolated • Probed with labeled cDNA • RT-PCR • Primers amplify specific cDNA transcripts

  11. Microarray Technology • Microarray: • New technology (first paper: 1995) • Allows study of thousands of genes at the same time • Glass slide of DNA molecules • Molecule: string of bases (25 bp – 500 bp) • Uniquely identifies the gene or unit to be studied

  12. Gene Expression Microarrays • The main types of gene expression microarrays: • Short oligonucleotide arrays (Affymetrix) • cDNA or spotted arrays (Brown/Botstein) • Long oligonucleotide arrays (Agilent Inkjet) • Fiber-optic arrays • ...

  13. Fabrication of Microarrays • Size of a microscope slide • Images: http://www.affymetrix.com/

  14. Differing Conditions • Ultimate Goal: • Understand expression level of genes under different conditions • Helps to: • Determine genes involved in a disease • Pathways to a disease • Used as a screening tool

  15. Gene Conditions • Cell types (brain vs. liver) • Developmental (fetal vs. adult) • Response to stimulus • Gene activity (wild type vs. mutant) • Disease states (healthy vs. diseased)

  16. Expressed Genes • Genes under a given condition • mRNA extracted from cells • mRNA labeled • Labeled mRNA is mRNA present in a given condition • Labeled mRNA will hybridize (base pair) with corresponding sequence on slide

  17. Two Different Types of Microarrays • Custom spotted arrays (up to 20,000 sequences) • cDNA • Oligonucleotide • High-density (up to 100,000 sequences) synthetic oligonucleotide arrays • Affymetrix (25 bases) • SHOW AFFYMETRIX LAYOUT

  18. Custom Arrays • Mostly cDNA arrays • 2-dye (2-channel) • RNA from two sources (cDNA created) • Source 1: labeled with red dye • Source 2: labeled with green dye

  19. Two Channel Microarrays • Microarrays measure gene expression • Two different samples: • Control (green label) • Sample (red label) • Both are washed over the microarray • Hybridization occurs • Each spot is one of 4 colors

  20. Microarray Technology

  21. Microarray Image Analysis • Each spot on a two-channel microarray shows one of 4 colors: • Green: high control • Red: high sample • Yellow: equal • Black: none • The problem is to quantify the image signals
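
The four-color reading of a spot can be turned into a rule on the two channel intensities. The sketch below is only an illustration: the `background` and `fold` thresholds are hypothetical values chosen for the example, not numbers from the lecture, and real image-analysis software also handles spot segmentation and background correction.

```python
def spot_color(red: float, green: float,
               background: float = 100.0, fold: float = 2.0) -> str:
    """Classify one spot from its red (sample) and green (control) intensity.

    background and fold are illustrative thresholds (assumptions).
    """
    # Both channels near background: nothing hybridized to this spot.
    if red < background and green < background:
        return "black"
    # Guard against division by zero for very weak channels.
    ratio = max(red, 1.0) / max(green, 1.0)
    if ratio >= fold:
        return "red"      # higher in the sample
    if ratio <= 1.0 / fold:
        return "green"    # higher in the control
    return "yellow"       # roughly equal in both


if __name__ == "__main__":
    for r, g in [(5000, 400), (300, 4000), (2000, 1800), (20, 30)]:
        print(r, g, spot_color(r, g))
```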

  22. Single Color Microarrays • Prefabricated • Affymetrix (25mers) • Custom • cDNA (500 bases or so) • Spotted oligos (70-80 bases)

  23. Microarray Animations • Davidson College: • http://www.bio.davidson.edu/courses/genomics/chip/chip.html • Imagecyte: • http://www.imagecyte.com/array2.html

  24. Basic idea of Microarray • Construction • Place array of probes on microchip • Probe (for example) is oligonucleotide ~25 bases long that characterizes gene or genome • Each probe has many, many clones • Chip is about 2cm by 2cm • Application principle • Put (liquid) sample containing genes on microarray and allow probe and gene sequences to hybridize and wash away the rest • Analyze hybridization pattern
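
The application principle can be sketched as a toy model in Python. It assumes an idealized chip where a probe captures a sample sequence only if the sample contains a perfectly complementary stretch; real hybridization tolerates mismatches and depends on temperature and salt conditions. The probe names and sequences are made up for the example.

```python
# Idealized hybridization: perfect Watson-Crick complementarity only.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Complementary strand, read in the opposite direction."""
    return "".join(PAIR[b] for b in reversed(seq.upper()))


def hybridizes(probe: str, target: str) -> bool:
    """True if the target contains a stretch perfectly complementary to the probe."""
    return reverse_complement(probe) in target.upper()


def hybridization_pattern(probes: dict, sample: list) -> dict:
    """Count, per probe, how many sample sequences it would capture."""
    return {name: sum(hybridizes(seq, t) for t in sample)
            for name, seq in probes.items()}


if __name__ == "__main__":
    probes = {"geneA": "ATGGC", "geneB": "TTTTT"}   # hypothetical probes
    sample = ["CCGCCATCC", "GGGCCATAA"]             # hypothetical sample
    print(hybridization_pattern(probes, sample))    # {'geneA': 2, 'geneB': 0}
```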

  25. Microarray analysis • Operation principle: samples are tagged with fluorescent material to show the pattern of sample-probe interaction (hybridization) • A microarray may have 60K probes

  26. Microarray Processing sequence

  27. Gene Expression Data • Gene expression data on p genes (rows) for n mRNA samples (columns):

      Gene   sample1  sample2  sample3  sample4  sample5  …
      1        0.46     0.30     0.80     1.51     0.90   ...
      2       -0.10     0.49     0.24     0.06     0.46   ...
      3        0.15     0.74     0.04     0.10     0.20   ...
      4       -0.45    -1.03    -0.79    -0.56    -0.32   ...
      5       -0.06     1.06     1.35     1.09    -1.09   ...

      • Entry (i, j) is the gene expression level of gene i in mRNA sample j: either Log(Red intensity / Green intensity) for two-channel arrays, or Log(Avg. PM - Avg. MM) for Affymetrix arrays
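
The two expression measures named on this slide can be computed as follows. This is a minimal sketch under stated assumptions: the slide does not give the base of the logarithm, so base 2 is assumed here, PM and MM are read as perfect-match and mismatch probe intensities, and the function names are illustrative.

```python
import math


def two_channel_log_ratio(red: float, green: float) -> float:
    """log2(red / green) for one spot on a two-channel array (base 2 assumed)."""
    return math.log2(red / green)


def affy_log_signal(pm: list, mm: list):
    """log2(avg PM - avg MM) for one probe set.

    Returns None when the difference is not positive, since the
    logarithm is undefined there.
    """
    diff = sum(pm) / len(pm) - sum(mm) / len(mm)
    return math.log2(diff) if diff > 0 else None


if __name__ == "__main__":
    print(two_channel_log_ratio(800, 200))            # 2.0
    print(affy_log_signal([300, 500], [100, 188]))    # 8.0
```

A positive log ratio means the gene is more abundant in the red-labeled sample, a negative one that it is more abundant in the green-labeled control, which is why the table above contains values on both sides of zero.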

  28. Some possible applications • Sample from specific organ to show which genes are expressed • Compare samples from healthy and sick host to find gene-disease connection • Probes are sets of human pathogens for disease detection

  29. Huge amount of data from single microarray • If just two color, then amount of data on array with N probes is 2N • Cannot analyze pixel by pixel • Analyze by pattern – cluster analysis

  30. Major Data Mining Techniques • Link Analysis • Associations Discovery • Sequential Pattern Discovery • Similar Time Series Discovery • Predictive Modeling • Classification • Clustering

  31. Cluster Analysis: Grouping Similarly Expressed Genes, Cell Samples, or Both • Strengthens signal when averages are taken within clusters of genes (Eisen) • Useful (essential ?) when seeking new subclasses of cells, tumours, etc. • Leads to readily interpreted figures

  32. Some clustering methods and software • Partitioning: K-Means, K-Medoids, PAM, CLARA, … • Hierarchical: Cluster, HAC, BIRCH, CURE, ROCK • Density-based: CAST, DBSCAN, OPTICS, CLIQUE, … • Grid-based: STING, CLIQUE, WaveCluster, … • Model-based: SOM (self-organizing map), COBWEB, CLASSIT, AutoClass, … • Two-way clustering • Block clustering

  33. Assessment of various methods • Algorithmic Approaches to Clustering Gene Expression Data, Ron Shamir, School of Computer Science, Tel-Aviv University • http://citeseer.nj.nec.com/shamir01algorithmic.html • Conclusion: hierarchical clustering exceptional

  34. Partitioning

  35. Density-based clustering

  36. Hierarchical (used most often)

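  37. Hierarchical Clustering: grouping similarly expressed genes • Gene Expression Profile Analysis, before clustering; rows are samples (labeled A, B, C, … in the original figure), columns are genes 1 to 1000:

      gene:   1     2     3     4     ..   ..   1000
              0.4   0.9   0     0.5   ..   ..   0.8
              0.2   0.8   0.3   0.2   ..   ..   0.7
              0.6   0.2   0     0.7   ..   ..   0.3
              …     …     …     …     …    …    …

  38. After Clustering • Gene Expression Profile Analysis, with the gene columns reordered so that similarly expressed genes are adjacent; rows are samples (labeled A, B, C, … in the original figure):

      gene:   ..   3     1     4     ..   2     1000
              ..   0     0.4   0.5   ..   0.9   0.8
              ..   0.3   0.2   0.2   ..   0.8   0.7
              ..   0     0.6   0.7   ..   0.2   0.3
              …    …     …     …     …    …     …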

  39. [Figure: clustered data compared with the same data randomized by row, by column, and by both; axis: time. Source: Eisen et al., Proc. Natl. Acad. Sci. USA 95 (1998)]

  40. Types of Similarity Measurements • Distance measurements • Correlation coefficients • Association coefficients • Probabilistic similarity coefficients

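  41. Correlation Coefficients • The most popular correlation coefficient is the Pearson correlation coefficient (1892) • Correlation between X={X1, X2, …, Xn} and Y={Y1, Y2, …, Yn}: $s_{XY} = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{\sqrt{\sum_{i=1}^{n}(X_i-\bar{X})^2}\,\sqrt{\sum_{i=1}^{n}(Y_i-\bar{Y})^2}}$ • where $s_{XY}$ is the similarity between X and Y, and $\bar{X}$, $\bar{Y}$ are the means of X and Y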
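
The Pearson correlation between two expression profiles can be computed directly from its standard definition. The sketch below uses plain Python; the function name `pearson` is illustrative, and the two example profiles are the rows for genes 1 and 4 from the expression table on slide 27.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    gene1 = [0.46, 0.30, 0.80, 1.51, 0.90]
    gene4 = [-0.45, -1.03, -0.79, -0.56, -0.32]
    print(pearson(gene1, gene4))
```

The result always lies between -1 and 1 and does not change if a profile is shifted or rescaled by a positive factor, so it compares the shape of two profiles rather than their absolute levels.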

  42. Use of Similarity for Tree Construction • Normalize similarity so that $s_{XX} = 1$ • This gives an n×n similarity matrix S whose diagonal elements are 1 • Define the distance matrix by (for example) D = 1 – S; the diagonal elements of D are 0 • Now use the distance matrix to build a tree (using some tree-building software; recall the lecture on phylogeny)
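
The step from similarities to distances can be sketched in a few lines. This example assumes Pearson correlation as the similarity (one possible choice among the measures listed earlier) and uses the five gene rows of the slide 27 table as input; the helper names are illustrative.

```python
import math


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity_matrix(profiles: list) -> list:
    """Symmetric matrix S with S[i][i] = 1 and S[i][j] = correlation of rows i, j."""
    n = len(profiles)
    S = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            S[i][j] = S[j][i] = pearson(profiles[i], profiles[j])
    return S


def distance_matrix(S: list) -> list:
    """D = 1 - S, so the diagonal of D is 0."""
    return [[1.0 - s for s in row] for row in S]


if __name__ == "__main__":
    genes = [
        [0.46, 0.30, 0.80, 1.51, 0.90],
        [-0.10, 0.49, 0.24, 0.06, 0.46],
        [0.15, 0.74, 0.04, 0.10, 0.20],
        [-0.45, -1.03, -0.79, -0.56, -0.32],
        [-0.06, 1.06, 1.35, 1.09, -1.09],
    ]
    for row in distance_matrix(similarity_matrix(genes)):
        print(["%.2f" % d for d in row])
```

With correlation as the similarity, D ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated).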

  43. A dendrogram (tree) for clustered genes • Let p = number of genes. • 1. Calculate within-class correlation. • 2. Perform hierarchical clustering, which will produce (2p-1) clusters of genes. • 3. Average within clusters of genes. • 4. Perform testing on averages of clusters of genes as if they were single genes. • E.g. p = 5 (leaves 1 2 3 4 5): Cluster 6 = (1,2), Cluster 7 = (1,2,3), Cluster 8 = (4,5), Cluster 9 = (1,2,3,4,5)
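
The count of 2p-1 clusters (p single genes plus p-1 merges) can be reproduced with a small agglomerative clustering sketch. Average linkage is assumed here because the slide does not name a linkage rule, and the toy distance matrix is invented so that the merges match the p = 5 example on the slide.

```python
def hierarchical_clusters(dist: list) -> dict:
    """Average-linkage agglomerative clustering on a p x p distance matrix.

    Returns {cluster_id: tuple of 1-based gene ids}. Ids 1..p are the
    single genes; ids p+1..2p-1 are merged clusters in order of creation.
    """
    p = len(dist)
    clusters = {g: (g,) for g in range(1, p + 1)}
    active = set(clusters)
    next_id = p + 1

    def avg_dist(a: int, b: int) -> float:
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        return sum(dist[i - 1][j - 1] for i, j in pairs) / len(pairs)

    while len(active) > 1:
        ids = sorted(active)
        candidates = [(a, b) for a in ids for b in ids if a < b]
        a, b = min(candidates, key=lambda ab: avg_dist(*ab))  # closest pair
        clusters[next_id] = tuple(sorted(clusters[a] + clusters[b]))
        active -= {a, b}
        active.add(next_id)
        next_id += 1
    return clusters


def cluster_average(profiles: list, members: tuple) -> list:
    """Step 3: average the expression profiles of the genes in one cluster."""
    rows = [profiles[g - 1] for g in members]
    return [sum(col) / len(rows) for col in zip(*rows)]


if __name__ == "__main__":
    far = 0.9  # toy distances (assumption), chosen to mirror the slide's example
    D = [
        [0.0, 0.1, 0.2, far, far],
        [0.1, 0.0, 0.2, far, far],
        [0.2, 0.2, 0.0, far, far],
        [far, far, far, 0.0, 0.3],
        [far, far, far, 0.3, 0.0],
    ]
    for cid, members in hierarchical_clusters(D).items():
        print(cid, members)
```

For this matrix the merged clusters come out as 6 = (1, 2), 7 = (1, 2, 3), 8 = (4, 5) and 9 = (1, 2, 3, 4, 5), for a total of 9 = 2p-1 clusters.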

  44. A real case • Nature, Feb 2000 • Paper by Alizadeh, A. et al. • Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling

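  45. Validation Techniques: Hubert’s Γ Statistics • X = [X(i, j)] and Y = [Y(i, j)] are two n×n matrices • X(i, j): similarity of gene i and gene j • Y(i, j) = 1 if genes i and j are in the same cluster, 0 otherwise • Hubert’s Γ statistic represents the point serial correlation; in its normalized form, $\Gamma = \frac{1}{M}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\frac{(X(i,j)-\bar{X})(Y(i,j)-\bar{Y})}{\sigma_X\,\sigma_Y}$ • where M = n (n - 1) / 2, and $\bar{X}$, $\bar{Y}$, $\sigma_X$, $\sigma_Y$ are the means and standard deviations of the entries of X and Y • A higher value of Γ represents better clustering quality.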
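
A Γ statistic of this kind can be computed as the correlation between the pairwise similarities and a same-cluster indicator. The sketch below assumes the normalized form of the statistic and an indicator Y(i, j) that is 1 for genes in the same cluster and 0 otherwise; both are assumptions, since the formula itself did not survive in this transcript. The toy similarity matrix is invented for the example.

```python
import math


def hubert_gamma(X: list, labels: list) -> float:
    """Normalized Hubert's Gamma for a similarity matrix X and cluster labels.

    Correlates X(i, j) with Y(i, j) over the M = n(n-1)/2 gene pairs i < j,
    where Y(i, j) = 1 if genes i and j share a cluster label, else 0.
    """
    n = len(labels)
    xs, ys = [], []
    for i in range(n - 1):
        for j in range(i + 1, n):
            xs.append(X[i][j])
            ys.append(1.0 if labels[i] == labels[j] else 0.0)
    M = len(xs)
    mean_x, mean_y = sum(xs) / M, sum(ys) / M
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / M)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / M)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("Gamma is undefined when X or Y is constant")
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / M
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    hi, lo = 0.9, 0.1  # toy similarities: genes 0,1 alike and genes 2,3 alike
    X = [
        [1.0, hi, lo, lo],
        [hi, 1.0, lo, lo],
        [lo, lo, 1.0, hi],
        [lo, lo, hi, 1.0],
    ]
    print(hubert_gamma(X, [0, 0, 1, 1]))  # clustering that matches the data
    print(hubert_gamma(X, [0, 1, 0, 1]))  # clustering that splits similar genes
```

The clustering that keeps similar genes together scores higher than the one that separates them, in line with the statement that a higher Γ indicates better clustering quality.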

  46. Discovering sub-groups

  47. Gene Expression is Time-Dependent Time Course Data

  48. Sample of time course of clustered genes • [Figure: three panels of clustered gene profiles; horizontal axis: time]

  49. Limitations • Cluster analyses: • Usually outside the normal framework of statistical inference • Less appropriate when only a few genes are likely to change • Needs lots of experiments • Single gene tests: • May be too noisy in general to show much • May not reveal coordinated effects of positively correlated genes. • Hard to relate to pathways

  50. Useful Links • Affymetrix: www.affymetrix.com • Michael Eisen Lab at LBL (hierarchical clustering software “Cluster” and “Tree View” (Windows)): rana.lbl.gov/ • Review of Currently Available Microarray Software: www.the-scientist.com/yr2001/apr/profile1_010430.html • ArrayExpress at the EBI: http://www.ebi.ac.uk/arrayexpress/ • Stanford MicroArray Database: http://genome-www5.stanford.edu/ • Yale Microarray Database: http://info.med.yale.edu/microarray/ • Microarray DB: www.biologie.ens.fr/en/genetiqu/puces/bddeng.html
