New issues in storage and analysis


Presentation Transcript


  1. Christophe Roos, MediCel Ltd, christophe.roos@medicel.fi • New issues in storage and analysis: high-throughput data acquisition; understanding whole genomes • Part 6: Functional genomics • Annotating genomes with functional information: automatic but without errors?

  2. Genome annotation • Annotation is the sum of all non-sequence information that can be connected to a sequence. (Figure: annotation types arranged around a gene sequence: phylogenetic inference, metabolic profiles, connectors to other maps, sequence homologs in other genomes, cofactors and metabolites, metabolic map locator, functional chemistry, experimental data, genome location, expression info, structure, raw images, numerical values, cluster genes, raw data, electron density, SS assignments, structure annotation.) Christophe Roos - 6/6 Functional genomics

  3. Genome annotation • The primary sources of information about what genes do are laboratory experiments, and it may take several experiments to obtain one data point. • All that data should ideally be associated, i.e. hyperlinked among databases. • Magpie is an environment for genome annotation. • Compare genomes to learn how their structure affects function. • Bacteria have modules of genes that function together, organised in ‘operons’. • Higher organisms need to pack their DNA to fit it into the nucleus. Activating a gene means unpacking it, which is not efficient if it is done for each gene separately.

  4. Functional genomics • High-throughput technologies give us long lists of the parts of systems (chromosomes, genomes, cells, etc.). We can now analyse how these parts work together to produce the complexity of organisms. • The functions of the genome are: • Metabolism: metabolic pathways convert chemical energy derived from food into useful work in the cell. • Regulation: regulatory pathways are biochemical mechanisms that control what genomic DNA does; they switch genes on and off in a controlled way. • Signalling: signalling pathways control the movement of information (chemicals) from one component to another on many levels. • Construction • Functional genomics tries to map these pathways.

  5. Analysing the activity of the genome • Genomics: look at the transcriptional activity of genes • Transcription: when a gene is transcriptionally active, messenger RNA (mRNA) is synthesised. The amount of mRNA from each active gene varies over time. • Turnover: different mRNA species have different half-lives. • Translation: the production of an mRNA does not imply that the corresponding protein is translated; transcripts can also be produced for storage and later use. • Technically feasible: it is possible to isolate all mRNAs from cells and to quantitate them within certain limits. • Proteomics: look at proteins instead of transcripts • Limited: at present, acceptable efficiency comes at the expense of insufficient quality. • Closer to ’reality’, since proteins are the players.
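
The turnover point can be made concrete with simple first-order decay. The sketch below assumes that an mRNA pool decays exponentially once transcription stops, so the fraction left after time t is 0.5^(t / t_half); the two half-lives are hypothetical values chosen only to show how quickly species with different half-lives diverge.

```python
import math


def fraction_remaining(t_minutes: float, half_life_minutes: float) -> float:
    """Fraction of an mRNA pool left after t_minutes of first-order decay,
    assuming no new transcription during that time."""
    return 0.5 ** (t_minutes / half_life_minutes)


def decay_constant(half_life_minutes: float) -> float:
    """First-order decay rate k = ln(2) / t_half, in 1/min."""
    return math.log(2) / half_life_minutes


# Two hypothetical mRNA species with illustrative half-lives (minutes).
for label, t_half in [("short-lived mRNA", 5.0), ("stable mRNA", 60.0)]:
    left = fraction_remaining(30.0, t_half)
    print(f"{label}: {left:.3f} left after 30 min (k = {decay_constant(t_half):.4f}/min)")
```

Under this model a snapshot of mRNA levels mixes synthesis and degradation, which is one reason transcript amounts vary over time.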

  6. EST: Expressed sequence tags • ESTs are partial sequences of cDNA clones; cDNA is DNA synthesised in vitro using mRNA as a template. • Why? cDNA is more stable than mRNA. • How? cDNA can be made ‘en masse’ from total cellular mRNA isolates. cDNA libraries are specific to a tissue, developmental time, stimulation, etc. • Therefore, looking at cDNA is looking at mRNA, which is looking at active genes. • Looking at cDNA means sequencing (part of) it. • Clones are picked at random (10,000-200,000). • Each clone is sequenced once from one or both ends (no proofreading). • Sequences are entered into EST sequence databases.
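
The relationship between an mRNA and its cDNA follows directly from base pairing. A minimal sketch, assuming sequences are written 5'->3' and ignoring real-world details such as priming and incomplete reverse transcription: the first cDNA strand is the reverse complement of the mRNA in the DNA alphabet, and the second strand reproduces the mRNA sequence with T in place of U. The function names are illustrative only.

```python
# Watson-Crick pairing from an RNA base to the complementary DNA base.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
# Pairing between DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """First-strand cDNA (5'->3') for an mRNA given 5'->3'.

    Reverse transcriptase reads the template 3'->5', so the product is the
    reverse complement of the mRNA, written with DNA bases."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def second_strand(first_strand: str) -> str:
    """Complementary DNA strand (5'->3'); equals the mRNA with T for U."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first_strand))


cdna = first_strand_cdna("AUGGCU")
print(cdna, second_strand(cdna))
```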

  7. EST: Expressed sequence tags • Constructing a clone by inserting a piece of DNA into a ’vector’. • The vector and its insert behave as an independent unit (a ’plasmid’) in the bacterial host and carry additional genes that allow selection (only bacteria carrying the vector survive on antibiotics). • Amplify and sequence • Iterate (in parallel)

  8. DNA hybridisation • DNA is a double helix that can be separated into two strands by denaturing treatment. Each strand becomes ’sticky’ and attempts to renature with complementary single-strand sequences to form hybrids. • Single-strand DNA from all known genes of a given species can be attached to a matrix and then probed with labelled cDNA molecules from a given sample. Only complementary probes hybridise, and they can be detected because they have been labelled beforehand (radioactivity, fluorescent stain, ...). • The technique can be multiplexed: • High-density arrays carrying sticky probes from a full genome • Parallel hybridisation with cDNA from various sources
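
The rule "only complementary probes hybridise" can be sketched as string matching. This is a deliberate simplification (an assumption, not how real arrays behave): a labelled target binds a spot only if it contains the exact reverse complement of the probe, whereas real hybridisation tolerates mismatches and depends on temperature, salt and probe length. The spot names and sequences are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hybridising_spots(array_probes: dict[str, str], labelled_targets: list[str]) -> dict[str, int]:
    """Count labelled target molecules that would bind each spot.

    Simplified model: a target binds a probe only if it contains the probe's
    exact reverse complement (perfect base pairing over the whole probe)."""
    signal = {}
    for spot, probe in array_probes.items():
        binding_site = reverse_complement(probe)
        signal[spot] = sum(binding_site in target for target in labelled_targets)
    return signal


probes = {"gene1": "ATGGCT", "gene2": "GGGCCC"}  # hypothetical array
targets = ["TTAGCCATAA", "AGCCAT"]               # hypothetical labelled cDNA
print(hybridising_spots(probes, targets))
```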

  9. The process of using microarrays • Building the chip: massive PCR, PCR purification and preparation, preparing slides, printing • Preparing RNA: cell culture and harvest, RNA isolation, cDNA production, probe labelling • Hybridising the chip: array hybridisation, post-processing, data analysis

  10. The output: the image raw data • cDNA is prepared from two samples (in this example), and each sample is labelled with a distinct color. The array is then hybridised with the double probe and the signal is recorded as images. (Figure: laser 1 and laser 2 excite the two labels, the emission is scanned, and the images are overlaid, normalised and passed on to analysis.)
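
Once the two images are overlaid, each spot has one intensity per dye, and the usual quantity of interest is their ratio. A minimal sketch, assuming spot intensities have already been extracted: it computes log2(red/green) per gene and applies global median centring, a common but simple normalisation that assumes most genes are unchanged between the two samples. The channel values are hypothetical.

```python
import math
from statistics import median


def normalised_log_ratios(red: dict[str, float], green: dict[str, float]) -> dict[str, float]:
    """log2(red / green) per spot, shifted so that the median log-ratio is 0.

    Median centring assumes most genes are not differentially expressed, so
    an overall shift is attributed to dye or laser differences."""
    raw = {gene: math.log2(red[gene] / green[gene]) for gene in red}
    centre = median(raw.values())
    return {gene: ratio - centre for gene, ratio in raw.items()}


red = {"a": 200.0, "b": 400.0, "c": 800.0}    # hypothetical channel 1
green = {"a": 100.0, "b": 100.0, "c": 100.0}  # hypothetical channel 2
print(normalised_log_ratios(red, green))
```

Log ratios are symmetric: a two-fold increase and a two-fold decrease become +1 and -1.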

  11. Problems in image analysis • Noise • Spot detection and intensity • Alignment of overlaid images
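
Noise and spot intensity are often handled together by comparing a spot with its local background. The sketch below is one simple approach, not a specific published method: pixels within `radius` of the spot centre are foreground, a surrounding ring up to `bg_radius` is background, intensity is mean foreground minus median background, and a spot counts as detected if that exceeds `n_sd` background standard deviations. All parameter names and the threshold are illustrative assumptions.

```python
from statistics import mean, median, pstdev


def quantify_spot(image, row, col, radius, bg_radius, n_sd=3.0):
    """Return (background-corrected intensity, detected flag) for one spot.

    image is a list of pixel rows; (row, col) is the assumed spot centre."""
    foreground, background = [], []
    for r, pixels in enumerate(image):
        for c, value in enumerate(pixels):
            d2 = (r - row) ** 2 + (c - col) ** 2
            if d2 <= radius ** 2:
                foreground.append(value)
            elif d2 <= bg_radius ** 2:
                background.append(value)
    # The median is robust to stray bright pixels (dust, neighbouring spots).
    intensity = mean(foreground) - median(background)
    detected = intensity > n_sd * pstdev(background)
    return intensity, detected


image = [[10] * 5 for _ in range(5)]  # flat background of 10
for r, c in [(2, 2), (1, 2), (3, 2), (2, 1), (2, 3)]:
    image[r][c] = 110                 # a small bright spot
print(quantify_spot(image, 2, 2, radius=1, bg_radius=2))
```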

  12. A set of experiments on yeast... • Each row represents one gene • Each column represents one experiment • The columns have been organised into related sets of experiments (ALPH, ELU,...) • The colors indicate gene activity (from high to absent)

  13. Clustering the resulting data • Looking at 10,000 genes is not easy • Group genes into clusters that behave the same way over a set of several experiments • Hierarchical clustering • K-means clustering • Self-organising maps (SOM) • Etc.
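
Of the listed methods, k-means is the shortest to sketch. Each gene is a profile of values over the experiments (for example log ratios), and genes are repeatedly assigned to the nearest centroid, after which each centroid is recomputed as the mean of its members. Assumptions: Euclidean distance (correlation-based distances are also common for expression data) and, for reproducibility, the first k profiles as starting centroids; the gene names and values are hypothetical.

```python
def squared_distance(x, y):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def k_means(profiles: dict[str, list[float]], k: int, iterations: int = 100) -> dict[str, int]:
    """Assign each gene to one of k clusters by Lloyd's k-means algorithm."""
    names = list(profiles)
    # Deterministic start (an assumption): the first k profiles are centroids.
    centroids = [list(profiles[name]) for name in names[:k]]
    assignment: dict[str, int] = {}
    for _ in range(iterations):
        new_assignment = {
            name: min(range(k), key=lambda j: squared_distance(profiles[name], centroids[j]))
            for name in names
        }
        if new_assignment == assignment:  # converged: no gene changed cluster
            break
        assignment = new_assignment
        for j in range(k):
            members = [profiles[name] for name in names if assignment[name] == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(values) / len(members) for values in zip(*members)]
    return assignment


profiles = {  # hypothetical log ratios over three experiments
    "g1": [2.0, 2.1, 1.9], "g2": [-2.0, -1.8, -2.2],
    "g3": [1.8, 2.0, 2.2], "g4": [-1.9, -2.1, -2.0],
}
print(k_means(profiles, k=2))
```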

  14. The overall process with microarrays • Microarray data has to be used in a larger frame of experimentation

  15. Making a model of the data • Sequence → Structure → Function • Interaction → Network → Function • Genome → Transcriptome → Proteome • Elements • Binary relations • Networks (Figure labels: assembly, neighbour, cluster, pathway, genome, hierarchical tree.)

  16. Comparing networks • Gain new biological information by comparing networks: pathway vs. pathway, pathway vs. genome, genome vs. genome, cluster vs. pathway • What is the metric? • How is it done? Is it simply a problem of graph isomorphism?

  17. Biological graph comparison • Search heuristically for clusters of correspondence (Figure: graph 1 with nodes A-K and graph 2 with nodes a-k; the correspondences A-a, B-b, C-c, D-d, ... are passed to a clustering algorithm.)
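
A "cluster of correspondence" can be illustrated with a simplified rule. This sketch is an assumption, not the heuristic used in the presentation: two correspondences (u1, u2) and (v1, v2) are linked when u1-v1 is an edge in graph 1 and u2-v2 is an edge in graph 2, and clusters are connected groups of linked correspondences. Practical heuristics typically also tolerate gaps (nodes a few steps apart), which this version does not.

```python
from collections import defaultdict


def adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def correspondence_clusters(edges1, edges2, correspondences, min_size=2):
    """Group node correspondences that are adjacent in both graphs."""
    adj1, adj2 = adjacency(edges1), adjacency(edges2)
    linked = defaultdict(set)
    for i, (u1, u2) in enumerate(correspondences):
        for j, (v1, v2) in enumerate(correspondences):
            if i != j and v1 in adj1[u1] and v2 in adj2[u2]:
                linked[i].add(j)
    # Connected components of the "linked" relation, found by depth-first search.
    seen, clusters = set(), []
    for start in range(len(correspondences)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            i = stack.pop()
            component.append(correspondences[i])
            for j in linked[i]:
                if j not in seen:
                    seen.add(j)
                    stack.append(j)
        if len(component) >= min_size:
            clusters.append(sorted(component))
    return clusters


graph1 = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]  # hypothetical
graph2 = [("a", "b"), ("b", "c"), ("d", "e"), ("e", "f")]  # hypothetical
pairs = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d"), ("E", "e"), ("F", "f")]
print(correspondence_clusters(graph1, graph2, pairs))
```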

  18. Example: genomic, metabolic, structural • Genome-pathway comparison, which reveals the correlation between the physical coupling of genes in the genome, i.e. operon structure (a), and the functional coupling (b) of gene products in the pathway (Figure: E. coli genome segment hisL hisG hisD hisC hisB hisH hisA hisF hisI yefM yzzB.)
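
Physical coupling in a genome-pathway comparison can be sketched by treating the genome as an ordered list of genes. Assumptions: a linear chromosome (the wrap-around of a circular bacterial chromosome is ignored), a known set of genes annotated to the pathway, and a hypothetical `max_gap` parameter giving how many gene positions apart two pathway genes may be and still belong to the same run. The gene names below are placeholders, not real data.

```python
def physically_coupled_runs(genome_order: list[str], pathway_genes: set[str], max_gap: int = 1) -> list[list[str]]:
    """Split the pathway's genes into runs of nearby genes along the genome.

    max_gap=1 means only directly adjacent genes share a run."""
    positions = [i for i, gene in enumerate(genome_order) if gene in pathway_genes]
    runs: list[list[int]] = []
    current: list[int] = []
    for pos in positions:
        if current and pos - current[-1] > max_gap:
            runs.append(current)
            current = []
        current.append(pos)
    if current:
        runs.append(current)
    return [[genome_order[p] for p in run] for run in runs]


genome = ["g1", "p1", "p2", "p3", "g2", "g3", "p4", "g4", "p5"]  # hypothetical
pathway = {"p1", "p2", "p3", "p4", "p5"}
print(physically_coupled_runs(genome, pathway))             # operon-like run p1-p3
print(physically_coupled_runs(genome, pathway, max_gap=2))  # tolerates one gene gap
```

A long run such as p1-p3 is the pattern an operon produces: genes whose products act in the same pathway sit next to each other on the chromosome.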

  19. Example: genomic, metabolic, structural (Figure: histidine metabolism pathway map, linked to the pentose phosphate cycle and purine metabolism. Intermediates include PRPP, phosphoribosyl-ATP, phosphoribosyl-AMP, phosphoribosyl-formimino-AICAR-P, 5P-D-1-ribulosyl-formimine, imidazole-glycerol-3P, imidazole-acetol-P, L-histidinol-P, L-histidinal, AICAR and L-histidine; downstream products include histamine, carnosine, anserine, 1-methyl-L-histidine, imidazole acetaldehyde, imidazole-4-acetate, imidazolone acetate and N-formyl-L-aspartate. Reactions are labelled with EC numbers such as 2.4.2.17, 3.6.1.31, 3.5.4.19, 5.3.1.16, 4.2.1.19, 2.6.1.9, 3.1.3.15 and 1.1.1.23.)

  20. Example: genomic, metabolic, structural • SCOP hierarchical tree: 1. All alpha; 2. All beta; 3. Alpha and beta (a/b): 3.1 beta/alpha (TIM) barrel, 3.2 Cellulases, ..., 3.74 Thiolase, 3.75 Cytidine deaminase; 4. Alpha and beta (a+b); 5. Multi-domain (alpha and beta); 6. Membrane and cell surface proteins; 7. Small proteins; 8. Peptides; 9. Designed proteins; 10. Non-protein (Figure: pathway map "…NE, TYROSINE AND TRYPTOPHAN BIOSYNTHESIS". Intermediates include 3-deoxy-D-arabino-heptonate, 3-dehydroquinate, 3-dehydroshikimate, shikimate, chorismate, prephenate, pretyrosine, phenylpyruvate, 4-hydroxyphenylpyruvate, phenylalanine, tyrosine, anthranilate and L-tryptophan; the map links to tyrosine metabolism, tryptophan metabolism, alkaloid biosynthesis I, ubiquinone biosynthesis and folate biosynthesis.)

  21. More challenges? The list of genes that are activated, inactivated or unaffected when two samples are compared becomes more informative if the genes can be mapped onto maps from which functions can be deduced.
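
The two steps in this slide, classifying genes and mapping them onto maps, can be sketched as follows. Assumptions: each gene has a log2 ratio between the two samples, a hypothetical cut-off of 1.0 (a two-fold change) separates activated and inactivated genes from unaffected ones, and pathway membership is given as a plain dictionary of gene lists. Counting per pathway is only a first step; a statistical over-representation test would be a natural extension.

```python
def classify(log_ratios: dict[str, float], threshold: float = 1.0) -> dict[str, str]:
    """Label genes by log2(sample B / sample A); threshold 1.0 = two-fold change."""
    labels = {}
    for gene, ratio in log_ratios.items():
        if ratio >= threshold:
            labels[gene] = "activated"
        elif ratio <= -threshold:
            labels[gene] = "inactivated"
        else:
            labels[gene] = "unaffected"
    return labels


def map_onto_pathways(labels: dict[str, str], pathway_members: dict[str, list[str]]) -> dict[str, dict[str, int]]:
    """Count activated, inactivated and unaffected genes on each pathway map.

    Genes on a map that were not measured are simply not counted."""
    summary = {}
    for pathway, genes in pathway_members.items():
        counts = {"activated": 0, "inactivated": 0, "unaffected": 0}
        for gene in genes:
            if gene in labels:
                counts[labels[gene]] += 1
        summary[pathway] = counts
    return summary


labels = classify({"g1": 2.0, "g2": -1.5, "g3": 0.2, "g4": 1.0})  # hypothetical data
print(map_onto_pathways(labels, {"P1": ["g1", "g4", "g5"], "P2": ["g2", "g3"]}))
```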

  22. More challenges?
