Bioinformatics and Evolutionary Genomics: High-throughput “functional” data / functional genomics / Omics
High-throughput data on gene function • What do I mean: omics, microarray, ChIP-on-chip • Why are people generating these data? • Post-genomic era / systems biology: the challenge is to understand the roles of, e.g., the 6,000 gene products in yeast and how they interact to create a eukaryotic organism. • Because they can: automation is also applied to areas of molecular biology beyond sequencing. • To have “screens” for the research question at hand, rather than having to test one guess at a time. • What about evolutionary genomics? • Yeast • Accuracy / noise
HTP data • What do they mean: they are experimental knowledge, but what do they mean in terms of, e.g., function? • A deluge • Bioinformatics is needed for basic data handling, and has IMHO only scratched the surface in terms of coming up with biological questions with which we can probe these data
Microarray data • Two conditions; often used for “screens”
(Correlated) mRNA expression • mRNA levels are systematically measured under a variety of different cellular conditions, and genes are grouped if they show a similar transcriptional response to these conditions.
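The grouping step described above can be illustrated with a minimal sketch. It assumes Pearson correlation as the similarity measure and a fixed cutoff of 0.8; the slide names neither, and the gene names and expression values are made-up toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpressed_pairs(profiles, threshold=0.8):
    """Return (gene1, gene2, r) for every gene pair with r >= threshold.

    `profiles` maps a gene name to its mRNA levels across conditions.
    The threshold is an arbitrary illustrative choice.
    """
    pairs = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            pairs.append((g1, g2, r))
    return pairs


# Hypothetical toy data: four conditions, three genes.
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],   # rises with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # falls while geneA rises
}

pairs = coexpressed_pairs(toy_profiles)
```

With these toy values only geneA and geneB pass the cutoff; geneC is anti-correlated with both and is left out.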
Hughes et al. 2000 Cell • Profile Similarity Identifies Sterol-Pathway Disturbance Resulting from Deletion of Uncharacterized ORF YER044c (ERG28) and from Dyclonine Treatment • Prominent gene clusters responding to interference with ergosterol biosynthesis. • Comparison of the transcript profile of an erg28Δ strain to that of an erg3Δ strain. • (C) Sterol content of wild-type (left) and erg28Δ (right) strains.
Ihmels et al. 2002 Nature Genetics • Conventional hierarchical clustering of co-expression data can fail, because genes can play a role in multiple cellular processes and their common regulatory element can only be detected in a subset of experiments. • Aim: detect genes that are co-expressed under a subset of conditions. • Result: a comprehensive set of overlapping ‘transcriptional modules’.
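The idea of finding genes that are co-expressed only under a subset of conditions can be sketched as an iterative refinement: start from seed genes, pick the conditions in which they respond, then pick all genes that respond in those conditions, and repeat. This is a simplified illustration and not the authors' published procedure: the scoring, the thresholds, the function names, and the toy data are all assumptions.

```python
def select_conditions(expr, genes, cond_threshold):
    """Conditions in which the gene set responds strongly on average.

    Returns (condition_index, mean_response) pairs.
    """
    n_cond = len(next(iter(expr.values())))
    selected = []
    for c in range(n_cond):
        mean = sum(expr[g][c] for g in genes) / len(genes)
        if abs(mean) >= cond_threshold:
            selected.append((c, mean))
    return selected


def select_genes(expr, conditions, gene_threshold):
    """Genes whose response, weighted by the condition scores, is high."""
    if not conditions:
        return set()
    members = set()
    for gene, profile in expr.items():
        score = sum(profile[c] * w for c, w in conditions) / len(conditions)
        if score >= gene_threshold:
            members.add(gene)
    return members


def find_module(expr, seed, cond_threshold=1.0, gene_threshold=1.0, max_iter=10):
    """Alternate between condition and gene selection until the set is stable."""
    genes = set(seed)
    for _ in range(max_iter):
        conditions = select_conditions(expr, genes, cond_threshold)
        new_genes = select_genes(expr, conditions, gene_threshold)
        if not new_genes or new_genes == genes:
            break
        genes = new_genes
    conditions = [c for c, _ in select_conditions(expr, genes, cond_threshold)]
    return genes, conditions


# Hypothetical log-ratio data over five conditions.
toy_expr = {
    "g1": [2.0, 2.0, 0.0, 0.0, 0.1],
    "g2": [2.2, 1.8, 0.1, -0.1, 0.0],
    "g3": [1.5, 1.5, -2.0, 2.0, 0.0],   # responds in both condition subsets
    "g4": [0.0, 0.0, -2.0, 2.0, 0.0],
    "g5": [0.0, 0.0, 0.0, 0.1, 0.0],
}

module_a = find_module(toy_expr, {"g1"})
module_b = find_module(toy_expr, {"g4"})
```

In this toy example g3 ends up in both modules, each defined over a different pair of conditions, which is the kind of overlap that a strict hierarchical clustering over all conditions cannot represent.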
Citric acid cycle? Different activity under different experimental conditions
Rapid divergence in expression between duplicate genes inferred from microarray & promoter data • 0.1 = 3.2 My
Clustering conditions where the conditions are genes: yet another way to get to functional “links”
Yeast-2-hybrid • Pairs of proteins to be tested for interaction are expressed as fusion proteins ('hybrids') in yeast: one protein is fused to a DNA-binding domain, the other to a transcriptional activator domain. Any interaction between them is detected by the formation of a functional transcription factor.
Examples from the original Ito publication: (A) autophagy, (B) spindle pole body function, and (C) vesicular transport. Arrows indicate the orientation of the two-hybrid interaction, from the bait to the prey.
Improving reliability using protein-complex reasoning / internal consistency • Internal filtering!
Mass spectrometry of purified complexes. • Individual proteins are tagged and used as 'hooks' to biochemically purify whole protein complexes. These are then separated and their components identified by mass spectrometry.
Figure: Exosome and Ski complexes; stages in mRNA degradation. Socio-affinity indices: dotted lines, 5–10; dashed lines, 10–15; plain lines, >15. Bait proteins are shown in bold, and shaded circles around groups of proteins indicate cores and modules.
Figure labels: cellular function, phylogenetic profile, PDB, Y2H
Protein interactions: literature databases • Literature derived, normally manually curated (as opposed to text mining) • Biased? • No new knowledge • Useful for benchmarking & for the study of the evolution of e.g. protein complexes • For example: Munich Information Center for Protein Sequences (MIPS) • Databases that contain literature and omics: Database of Interacting Proteins (DIP), Biomolecular Interaction Network Database (BIND)
Systematic screening for lethality of knockouts on a rich medium • The functions of many open reading frames (ORFs) identified in genome-sequencing projects are unknown. New, whole-genome approaches are required to systematically determine their function. • A total of 6925 Saccharomyces cerevisiae strains were constructed, by a high-throughput strategy, each with a precise deletion of one of 2026 ORFs. • Of the deleted ORFs, 17 percent were essential for viability in rich medium. • Winzeler et al. 1999 Science
Genetic interactions (synthetic lethal/sick) • Two nonessential genes that cause lethality when mutated at the same time form a synthetic lethal interaction. Such genes are often functionally associated and their encoded proteins may also interact physically. • Tong et al. 2001 Science
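The definition above translates directly into a filter over knockout data: keep a pair only if each single deletion is viable and the double deletion is not. The sketch below assumes viability has already been reduced to a yes/no call; the data layout, function name, and gene names are hypothetical.

```python
def synthetic_lethal_pairs(single_viable, double_viable):
    """Return gene pairs that are synthetic lethal.

    single_viable: gene -> True if the single-deletion strain is viable
                   (i.e. the gene is nonessential).
    double_viable: (gene_a, gene_b) -> True if the double mutant is viable.
    A pair qualifies only if both genes are nonessential on their own and
    the double mutant is inviable.
    """
    pairs = []
    for (a, b), viable in sorted(double_viable.items()):
        both_nonessential = single_viable.get(a, False) and single_viable.get(b, False)
        if both_nonessential and not viable:
            pairs.append((a, b))
    return pairs


# Hypothetical toy screen.
single_viable = {"geneA": True, "geneB": True, "geneC": True, "geneD": False}
double_viable = {
    ("geneA", "geneB"): False,  # both nonessential, double mutant dies
    ("geneA", "geneC"): True,   # double mutant is fine
    ("geneB", "geneD"): False,  # geneD is essential alone, so not synthetic
}

sl_pairs = synthetic_lethal_pairs(single_viable, double_viable)
```

The geneB/geneD pair is excluded because the lethality is already explained by the single deletion of geneD. Synthetic sick interactions would need a quantitative fitness measure instead of a yes/no call.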
One thing we can do with synthetic lethals • Ideker: protein interactions
What to do with synthetic lethals? • Kelley and Ideker 2005 Nature Biotech
ChIP-on-chip • Tagged strains (one strain for each regulator). • A microarray for each strain shows which pieces of DNA are found in excess when you isolate the regulator plus its bound DNA.
GFP localization • Mating of strains carrying organelle-specific fluorescent protein markers with strains carrying a fluorescent protein tag for each gene
Other functional genomics data: the omes • Quantitative proteomics • Kinome • PTMome • (Almost) all of these data are freely and publicly available • Take-home message: “wow, this exists!!!”
Bioinformatics for Benchmarking & Integration • Figure: coverage (fraction of reference set covered by data) versus accuracy (fraction of data confirmed by reference set) for purified complexes (TAP, HMS-PCI), genomic context, mRNA co-expression, synthetic lethality, yeast two-hybrid, and combined evidence (two methods, three methods). Legend: raw data, filtered data, parameter choices.
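The two benchmark measures named on this slide, coverage (fraction of the reference set covered by the data) and accuracy (fraction of the data confirmed by the reference set), can be computed with plain set operations. This is a minimal sketch: interactions are treated as unordered protein pairs, and the protein names and toy sets are hypothetical.

```python
def as_pair_set(pairs):
    """Treat interactions as unordered pairs so (A, B) equals (B, A)."""
    return {frozenset(p) for p in pairs}


def coverage(data, reference):
    """Fraction of the reference set that is covered by the data."""
    data, reference = as_pair_set(data), as_pair_set(reference)
    return len(data & reference) / len(reference)


def accuracy(data, reference):
    """Fraction of the data that is confirmed by the reference set."""
    data, reference = as_pair_set(data), as_pair_set(reference)
    return len(data & reference) / len(data)


# Hypothetical reference set (e.g. curated complexes) and one HTP data set.
reference_pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]
htp_pairs = [("B", "A"), ("A", "C"), ("A", "F")]

cov = coverage(htp_pairs, reference_pairs)   # 2 of 4 reference pairs found
acc = accuracy(htp_pairs, reference_pairs)   # 2 of 3 reported pairs confirmed
```

A pair missing from the reference set is counted here as unconfirmed, so an incomplete reference lowers the apparent accuracy. Both functions assume non-empty inputs.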