Epidemiology 217 Molecular and Genetic Epidemiology Bioinformatics & Proteomics. John Witte. Coding Genotypes. Post-Genomic Era: Lots of Data!. “The study of genetic and other biological information using computer and statistical techniques.” A Genome Glossary, Science, Feb 16, 2001.
Epidemiology 217: Molecular and Genetic Epidemiology
Bioinformatics & Proteomics
John Witte
“The study of genetic and other biological information using computer and statistical techniques.” A Genome Glossary, Science, Feb 16, 2001
Bioinformatics in Genetic Epi
Some key aspects:
• Data management
• Candidate regions / genes (selection and SNP mining)
• Genetic analyses (e.g., genotyping)
• Statistical analyses
Data Management
[Figure: the CaP Genes databases hub, linking the laboratory, demographic, clinical, genomic, health and habits, and nutritional databases.]
From gene to polymorphisms
Given a gene, how do I…
• Find its polymorphisms?
• Find information about those polymorphisms?
Hands-on guide for browsing and analyzing genomic data. Contains worked examples, providing: overview of the types of data available, details on how these data can be browsed, and step-by-step instructions for using many of the most commonly-used tools for sequence based discovery. www.nature.com/cgi-taf/dynapage.taf?file=/ng/journal/v35/n1s/
Nature Genetics: A User's Guide to the Human Genome
3 of the 13 worked example questions:
• How does one find a gene of interest and determine that gene's structure?
• How would one retrieve the sequence of a gene, along with all annotated exons and introns, as well as a certain number of flanking bases for use in primer design?
• A user wishes to find all the single nucleotide polymorphisms that lie between two sequence-tagged sites. Do any of these single nucleotide polymorphisms fall within the coding region of a gene? Where can any additional information about the function of these genes be found?
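The third question above comes down to two interval checks: which SNPs fall between the two sequence-tagged sites (STSs), and which of those fall inside a coding exon. A minimal Python sketch of that logic follows. The SNP names, positions, gene name, and exon coordinates are all made up for illustration; in practice they would be exported from a browser or database, and the sketch assumes everything is on one chromosome and in the same coordinate system.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    snp_id: str
    pos: int  # 1-based position on the chromosome


class Exon(NamedTuple):
    gene: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def snps_between(snps, sts_pos_1, sts_pos_2):
    """Return the SNPs located between two STS positions (inclusive)."""
    lo, hi = sorted((sts_pos_1, sts_pos_2))
    return [s for s in snps if lo <= s.pos <= hi]


def coding_hits(snps, coding_exons):
    """Map each SNP that lies inside a coding exon to that exon's gene."""
    hits = {}
    for s in snps:
        for ex in coding_exons:
            if ex.start <= s.pos <= ex.end:
                hits[s.snp_id] = ex.gene
                break  # one hit per SNP is enough for this sketch
    return hits


# Hypothetical data, for illustration only.
snps = [Snp("snpA", 1050), Snp("snpB", 2300), Snp("snpC", 4100), Snp("snpD", 9000)]
exons = [Exon("GENE1", 2000, 2500), Exon("GENE1", 4000, 4050)]

in_region = snps_between(snps, 1000, 5000)  # the two STS positions
coding = coding_hits(in_region, exons)
print([s.snp_id for s in in_region], coding)
```

The gene names returned by the second step are what one would then look up in a gene-function resource to answer the last part of the question.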
Look for SNPs in Databases
• General databases:
--- dbSNP (http://www.ncbi.nlm.nih.gov/)
--- UCSC Genome Bioinformatics (http://genome.ucsc.edu/)
--- HapMap (http://www.hapmap.org/)
--- The SNP Consortium (TSC) (http://snp.cshl.org/)
--- Human Genome Variation Database (HGVbase) (http://hgvbase.cgb.ki.se)
• Special databases:
--- The UW-FHCRC Variation Discovery Resource (SeattleSNPs) (http://pga.gs.washington.edu/)
--- Cancer Genome Anatomy Project - SNP500Cancer Database (http://snp500cancer.nci.nih.gov/home_1.cfm)
--- InnateImmunity (http://innateimmunity.net)
--- Drug response (http://pharmgkb.org)
• More….
UCSC Browser
[Figure: UCSC Genome Browser view annotated to show gene structure, comparative genomics, and SNPs.]
SeattleSNPs
• Resequencing the complete genomic region of each gene among 24 African-American (AA) subjects and 23 European (CEPH) subjects:
--- 2000 bp upstream of first exon
--- 1500 bp downstream of poly-A signal
--- All exons and introns for genes below 35 kbp
• Summary data (2/18/05):
--- Number of genes sequenced: 208
--- Total kilobases sequenced: 4408.78
--- Number of SNPs found: 23,590
--- SNPs in AA sample: 20,765
--- SNPs in CEPH sample: 12,937
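A few derived quantities make the summary data easier to read. The Python sketch below computes SNP density and, under an assumption not stated on the slide, the overlap between the two samples: it assumes that every one of the 23,590 SNPs was observed in at least one of the two samples and that each count refers to distinct sites, so the shared count follows by inclusion-exclusion.

```python
# Summary figures taken from the slide (2/18/05).
genes_sequenced = 208
kb_sequenced = 4408.78
snps_total = 23_590
snps_aa = 20_765
snps_ceph = 12_937

# SNP density across everything that was resequenced.
snps_per_kb = snps_total / kb_sequenced
bp_per_snp = kb_sequenced * 1000 / snps_total
snps_per_gene = snps_total / genes_sequenced

# Assumption: total = |AA union CEPH|, so shared = |AA| + |CEPH| - total.
shared = snps_aa + snps_ceph - snps_total
aa_only = snps_total - snps_ceph
ceph_only = snps_total - snps_aa

print(f"{snps_per_kb:.2f} SNPs per kb (one per {bp_per_snp:.0f} bp)")
print(f"{snps_per_gene:.1f} SNPs per gene on average")
print(f"shared: {shared}, AA only: {aa_only}, CEPH only: {ceph_only}")
```

Under that assumption the data work out to roughly 5.35 SNPs per kilobase (about one every 187 bp), with 10,112 SNPs seen in both samples, 10,653 seen only in the AA sample, and 2,825 seen only in the CEPH sample.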
From Genomics to Proteomics
• Our ~25,000 genes carry the blueprint for making proteins, of which all living matter is made.
• Each protein has a particular shape and function that determine its role in the body.
• Proteomics is the study of protein shape, function, and patterns of expression.
Anatomy of a gene
[Figure: a gene shown as DNA (5′ to 3′), pre-splicing RNA, post-splicing RNA, and protein. Labeled elements: enhancer, promoter, coding exons, non-coding exons (5′ UTR, 3′ UTR), introns, and polyadenylation.]
Proteomics
• Characterize proteins derived from genetic code.
• Compare variations in their expression levels under different conditions.
• Study their interactions.
• Identify their functional role.
Proteome Complexity
• Recall that the genome is relatively static.
• In contrast, many cellular proteins are continually moving and undergoing changes such as:
--- binding to a cell membrane,
--- partnering with another protein,
--- gaining or losing a chemical group such as a sugar, fat, or phosphate, or
--- breaking into two or more pieces.
Size of Proteome?
• > 1 million proteins, far more than the ~25,000 genes in humans.
• The large number reflects complexity: a given gene can make many different proteins.
• Features such as folds and motifs allow proteins to be categorized into groups and families.
• This should make proteomic research easier to undertake.
• But no proteome has yet been sequenced.
How to Analyze Proteomes
• Broad range of technologies.
• Central paradigm: 2-D gel electrophoresis (2D-GE) and mass spectrometry (MS).
--- 2D-GE is used to separate the proteins by isoelectric point and then by size.
--- MS determines their identity and characteristics.
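One common way MS data are used to identify a protein is peptide mass fingerprinting: the protein is digested with trypsin, the peptide masses are measured, and they are compared with masses predicted from candidate sequences. The slides do not say which MS approach is meant, so the Python sketch below is only an illustration of that one technique. The candidate sequences and "observed" peaks are made up, the tolerance is an arbitrary choice, and the digestion rule is the simplified one (cleave after K or R unless the next residue is P) with no missed cleavages or modifications.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(seq):
    """Split after K or R, except when the next residue is P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def best_match(observed, candidates, tol=0.01):
    """Return the candidate whose predicted peptides explain the most peaks."""
    def n_explained(seq):
        predicted = [peptide_mass(p) for p in tryptic_digest(seq)]
        return sum(any(abs(m - t) <= tol for t in predicted) for m in observed)
    return max(candidates, key=lambda name: n_explained(candidates[name]))


# Hypothetical candidates and simulated peaks, for illustration only.
candidates = {"protX": "GAKSVR", "protY": "LLKFFR"}
observed = [peptide_mass("GAK"), peptide_mass("SVR")]
print(best_match(observed, candidates))
```

Real search engines score matches statistically and allow for missed cleavages, modifications, and charge states; this sketch only counts peaks within a fixed tolerance.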
Bioinformatics in Proteomics
• Creation and maintenance of databases of protein info.
• Development of methods to predict the structure and/or function of newly discovered proteins and structural RNA sequences.
• Clustering protein sequences into families of related sequences and the development of protein models.
• Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships.
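Aligning sequences and grouping them into families both start from a pairwise similarity score. As a minimal Python sketch, the function below computes a Needleman-Wunsch global alignment score and then picks the most similar pair from a toy set. The scoring values (match +1, mismatch -1, gap -2) and the three sequences are assumptions chosen for illustration; real protein tools typically use substitution matrices such as BLOSUM62 and affine gap penalties.

```python
from itertools import combinations


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


# Hypothetical toy sequences, for illustration only.
seqs = {"p1": "MKTAYIAK", "p2": "MKTAYLAK", "p3": "GGSWWPHE"}

# Score every pair; the highest-scoring pair is the most similar.
pair_scores = {
    (x, y): global_alignment_score(seqs[x], seqs[y])
    for x, y in combinations(seqs, 2)
}
closest = max(pair_scores, key=pair_scores.get)
print(pair_scores, closest)
```

A matrix of such pairwise scores, converted to distances, is the usual input to clustering sequences into families and to distance-based tree building.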