Introduction to Genomics and the Tree of Life Chapter 13
Extra reading • Next-generation sequencers • What can next-generation sequencers do for genetics/genomics research? • Comparative genomics • What can we learn from comparative genomics?
Outline of today’s lecture
• Introduction: five perspectives; history of life
• Genome-sequencing projects: chronology
• Genome analysis: selection criteria, resequencing, metagenomics
• DNA sequencing technologies: Sanger, 454, Solexa
• Process of genome sequencing: sequencing centers, repositories
• Genome annotation: features, prokaryotes, eukaryotes
Five approaches to genomics
As we survey the tree of life, consider these perspectives:
• Approach I: cataloguing genomic information. Genome size; number of chromosomes; GC content; isochores; number of genes; repetitive DNA; unique features of each genome
• Approach II: cataloguing comparative genomic information. Orthologs and paralogs; COGs; lateral gene transfer
• Approach III: function, biological principles, evolution. How genome size is regulated; polyploidization; birth and death of genes; neutral theory of evolution; positive and negative selection; speciation
• Approach IV: relevance to human disease
• Approach V: bioinformatics aspects. Algorithms, databases, websites
Page 519
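GC content, one of the Approach I descriptors, is the fraction of G and C among the bases of a sequence; scanning it in fixed windows is one crude way to look for isochore-like regions. A minimal Python sketch: the function names are hypothetical, and excluding N and other ambiguity codes from the denominator is an assumption of this sketch (conventions vary between tools).

```python
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    counts = {base: seq.count(base) for base in "ACGT"}
    called = sum(counts.values())  # N and other ambiguity codes are not counted
    if called == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (counts["G"] + counts["C"]) / called


def windowed_gc(seq: str, window: int, step: int) -> List[float]:
    """GC content of consecutive windows of length `window`, advancing by `step`."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


print(f"{gc_content('ATGCGCNNAT'):.3f}")  # 4 of 8 called bases are G/C -> 0.500
print(windowed_gc("GGCCAATT", 4, 4))      # [1.0, 0.0]
```

For real chromosomes the window would be tens to hundreds of kilobases rather than four bases.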
Introduction
Lessons learned from comparative genomics
• What have we learned about genes by comparing genomic sequences?
• What have we learned about regulation?
• About 5% of the human genome is under purifying selection
• Positively selected regions
• Mechanisms and history of mammalian evolution
• Nonuniformity of neutral evolutionary rates within species
• Nonuniformity of evolution along the branches of a phylogeny
Learning more from existing data
• Choice of species
• Choice of tools
Future of comparative genomics
Levels of analysis in genomics
Level        | Topics                 | Databases
DNA          | genes, chromosomes     | GenBank
RNA          | ESTs, ncRNA            | UniGene, GEO
protein      | ORFs, composition      | UniProt
complexes    | binary, multimeric     | BIND
pathways     |                        | COGs, KEGG
organelles   |                        |
organs       |                        |
individuals  | variation and disease  | HapMap
species      | speciation             | TaxBrowser; SGD
genus        |                        | JAX mouse
phylum       |                        | FishBase
kingdom      |                        | TOL
Definitions of terms
Genomics is the study of genomes (the DNA comprising an organism) using the tools of bioinformatics.
Bioinformatics is the study of proteins, genes, and genomes using computer algorithms and databases.
Systematics is the scientific study of the kinds and diversity of organisms and of any and all relationships among them.
Classification is the ordering of organisms into groups on the basis of their relationships. The relationships may be evolutionary (phylogenetic) or may reflect similarities of phenotype (phenetic).
Taxonomy is the theory and practice of classifying organisms.
Pace (2001) described a tree of life based on small-subunit rRNA sequences. The tree shows the three main branches (Bacteria, Archaea, and Eucarya) described by Woese and colleagues. Fig. 13.1, Page 521
Molecular sequences as the basis of trees
Historically, trees were generated primarily from morphological characters. Molecular sequence data are now commonly used, including highly conserved sequences such as small-subunit rRNAs. The European Small Subunit Ribosomal RNA database offers 20,000 SSU rRNA sequences. Page 523
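Before a tree is built from molecular sequences, the aligned sequences are usually reduced to pairwise distances. The simplest measure is the p-distance, the proportion of aligned sites at which two sequences differ. A hedged Python sketch: the function names and the three-sequence toy alignment are invented, columns containing a gap are skipped (one of several possible conventions), and no correction for multiple substitutions (such as Jukes-Cantor) is applied.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # pairwise deletion: ignore any column with a gap
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differences / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """p-distance for every unordered pair of named, aligned sequences."""
    return {(i, j): p_distance(alignment[i], alignment[j]) for i, j in combinations(alignment, 2)}


aln = {"s1": "ACGT-ACGTA", "s2": "ACGTTACGAA", "s3": "ACTTTACGAA"}
for pair, dist in distance_matrix(aln).items():
    print(pair, round(dist, 3))
```

Because conserved genes such as SSU rRNA change slowly, their p-distances stay small even between distant lineages, which is what makes them usable across the whole tree of life.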
[Figure: Tree of life from David Hillis’ lab, based on ~3,000 rRNA sequences. Labelled groups: animals, plants, fungi, protists, bacteria, archaea; a zoomed-in view marks "you are here".] http://www.zo.utexas.edu/faculty/antisense/Download.html
Ribosomal RNA database: Ribosomal Database Project http://rdp.cme.msu.edu/index.jsp
Santos, S. R. and Ochman, H. Identification and phylogenetic sorting of bacterial lineages with universally conserved genes and proteins. Environmental Microbiology 2004 Jul;6(7):754-9.
►Download fusA sequences (fusA encodes translation elongation factor G [EF-G], the bacterial counterpart of EF-2)
►Obtain the DNA sequences in FASTA format
►Align them with ClustalW in MEGA
►Create a neighbor-joining tree
Page 524
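The last step of this exercise can be reproduced outside MEGA to see what the program computes. Below is a plain-Python sketch of the standard neighbor-joining algorithm (Saitou and Nei, 1987): repeatedly join the pair of clusters that minimises the Q criterion, estimate their branch lengths, and update the distances. It is not MEGA's code; the function name, the tie-breaking rule (the first minimal pair wins), and the five-taxon toy matrix are choices made for illustration. In practice the input would be a symmetric, zero-diagonal distance matrix computed from the ClustalW alignment.

```python
from typing import Dict, List


def neighbor_joining(names: List[str], d: List[List[float]]) -> str:
    """Neighbor-joining on a symmetric distance matrix; returns an unrooted Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    nodes = list(names)  # current clusters, each labelled by its Newick fragment
    dist: Dict[str, Dict[str, float]] = {
        a: {b: d[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(dist[a][b] for b in nodes) for a in nodes}  # row sums
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to a and to b.
        la = 0.5 * dist[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = dist[a][b] - la
        u = f"({a}:{la:g},{b}:{lb:g})"
        # Distance from the new node to every other remaining cluster.
        dist[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                dist[u][c] = dist[c][u] = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three clusters to a single central node.
    a, b, c = nodes
    la = 0.5 * (dist[a][b] + dist[a][c] - dist[b][c])
    lb = 0.5 * (dist[a][b] + dist[b][c] - dist[a][c])
    lc = 0.5 * (dist[a][c] + dist[b][c] - dist[a][b])
    return f"({a}:{la:g},{b}:{lb:g},{c}:{lc:g});"


matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(["a", "b", "c", "d", "e"], matrix))  # (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

The result is unrooted, so the outermost parentheses mark an arbitrary central node rather than a root; rooting requires an outgroup.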
European Small Subunit Ribosomal RNA database (http://www.psb.ugent.be/rRNA/ssu/)
[Figure: Neighbor-joining tree of ~150 fusA (GTPase) DNA sequences; labelled lineages include Yersinia pestis, Clostridium, Aquifex aeolicus, Mycoplasma, Bacillus anthracis, Mycobacterium, Rickettsia, and Treponema.]
History of life on earth
• 4.55 BYA (billion years ago): formation of the earth (a violent period lasting ~100 million years)
• 4.4-3.8 BYA: last ocean-evaporating impacts
• 3.9 BYA: oldest dated rocks
• 3.8 BYA: sun brightened to 70% of today’s luminosity
• Atmosphere of ammonia, methane, or carbon dioxide
• Earliest life: RNA, protein
Source: Schopf J.W. (ed.), Life’s Origins (U. Calif. Press, 2002) Page 521
Millions of years ago (MYA), 1,000 MYA to present (Proterozoic and Phanerozoic eons): deuterostome/protostome divergence; echinoderm/chordate divergence; Cambrian explosion; land plants; insects; Age of Reptiles ends. Page 522
Millions of years ago (MYA), 100 MYA to present: mass extinction (dinosaurs extinct); mammalian radiation; human/chimp divergence. Page 522
Millions of years ago (MYA), 10 MYA to present: Homo sapiens/chimp divergence; Australopithecus ("Lucy"); earliest stone tools; emergence of Homo erectus. Page 522
Years ago, 1,000,000 to present: Homo erectus emerges in Africa; "Mitochondrial Eve". Page 523
Years ago, 100,000 to present: emergence of anatomically modern H. sapiens; Neanderthals and Homo erectus disappear. Page 523
Years ago, 10,000 to present: "Ice Man" from the Alps; earliest pyramids; Aristotle. Page 523
Years ago, 1,000 to present: algebra; Gutenberg; calculus; Darwin, Mendel. Page 523
Chronology of genome sequencing projects We will next summarize the major achievements in genome sequencing projects from a chronological perspective. Page 525
Chronology of genome sequencing projects
1976: first viral genome. Fiers et al. sequence bacteriophage MS2, an RNA genome of 3,569 nucleotides (accession NC_001417).
1977: Sanger et al. sequence bacteriophage φX174, a 5,386-base genome encoding 11 genes (accessions J02482; NC_001422).
Page 527
Chronology of genome sequencing projects
1981: human mitochondrial genome, 16,569 base pairs (encoding 13 proteins, 2 rRNAs, and 22 tRNAs). As of October 2009, over 1,800 mitochondrial genomes had been sequenced.
1986: chloroplast genome, 156,000 base pairs (most chloroplast genomes are 120 kb to 200 kb).
Page 527
[Figure: tree of life with labels "mitochondrion", "chloroplast", and "lack mitochondria (?)".]
Entrez Genomes organelle resource at NCBI http://www.ncbi.nlm.nih.gov/genomes/ORGANELLES/organelles.html
GOBASE: resource for organelle genomes http://megasun.bch.umontreal.ca/gobase/
MitoDat: resource for organelle genomes “This database is dedicated to the nuclear genes specifying the enzymes, structural proteins, and other proteins, many still not identified, involved in mitochondrial biogenesis and function. MitoDat highlights predominantly human nuclear-encoded mitochondrial proteins.” Not updated recently. http://www-lecb.ncifcrf.gov/mitoDat/
MitoMap: resource for organelle genomes http://www.mitomap.org/
It is possible to map mutations in human mitochondrial DNA that are responsible for disease
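At its simplest, mapping a mutation means locating where a sample differs from a reference sequence and naming the change by position. The sketch below does that for an aligned, indel-free stretch of mitochondrial DNA and writes each substitution in the m.<position><reference>><variant> form (as in m.3243A>G, a well-known disease-associated variant). The function name and the six-base sequences are invented for illustration and are not the real reference sequence at those positions; real analyses align complete genomes or sequencing reads against the revised Cambridge Reference Sequence with dedicated tools.

```python
from typing import List


def call_substitutions(reference: str, sample: str, start: int = 1) -> List[str]:
    """Name single-base substitutions between equal-length, aligned sequences.

    `start` is the 1-based genome coordinate of the first base of `reference`.
    Positions where the sample has an 'N' (no call) are skipped.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length (indels are not handled)")
    variants = []
    for offset, (ref, alt) in enumerate(zip(reference.upper(), sample.upper())):
        if alt != "N" and ref != alt:
            variants.append(f"m.{start + offset}{ref}>{alt}")
    return variants


# Invented 6-base snippet, positioned so that the change falls at 3243.
print(call_substitutions("TTAAGA", "TTGAGA", start=3241))  # ['m.3243A>G']
```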
Chronology of genome sequencing projects 1995: first genome of a free-living organism, the bacterium Haemophilus influenzae Page 530
Chronology of genome sequencing projects 1996: first eukaryotic genome The complete genome sequence of the budding yeast Saccharomyces cerevisiae was reported. We will describe this genome soon. Also in 1996, TIGR reported the sequence of the first archaeal genome, Methanococcus jannaschii. Page 532
Chronology of genome sequencing projects
1997: more bacteria and archaea. Escherichia coli: 4.6 megabases, 4,200 proteins (38% of unknown function)
1998: first multicellular organism. Nematode Caenorhabditis elegans: 97 Mb; 19,000 genes
1999: first human chromosome. Chromosome 22: 49 Mb, 673 genes
Page 532
Chronology of genome sequencing projects
2000: fruit fly Drosophila melanogaster (13,000 genes); the plant Arabidopsis thaliana; human chromosome 21
2001: draft sequence of the human genome (public consortium and Celera Genomics)
Page 534
Overview of genome analysis • Selection of genomes for sequencing • Sequence one individual genome, or several? • How big are genomes? • Genome sequencing centers • Sequencing genomes: strategies • When has a genome been fully sequenced? • Repository for genome sequence data • Genome annotation Page 537
Overview of genome analysis Fig. 13.8 p.539
Criteria for selecting genomes for sequencing
• Criteria include:
• genome size (some plant genomes are far larger than the human genome)
• cost
• relevance to human disease (or other disease)
• relevance to basic biological questions
• relevance to agriculture
Page 538
• Recent projects: chicken, chimpanzee, cow, dog, fungi (many), honey bee, sea urchin, rhesus macaque
Page 540
Selection criteria Selection of genomes for sequencing is based on specific criteria. For an overview, see a series of white papers posted on the National Human Genome Research Institute (NHGRI) website: http://www.genome.gov/10002154 For a description of NHGRI selection criteria, visit: http://www.genome.gov/10001495 Page 540
Criteria for selecting genomes for sequencing
Sequence one individual genome, or several? Start with one:
-- each genome center may study one chromosome from an organism
-- it is necessary to measure polymorphisms (e.g., SNPs) in large populations
For viruses, thousands of isolates may be sequenced. For the human genome, cost is the main impediment to sequencing many individuals.
Page 540
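Measuring a SNP in a population starts with counting alleles. A minimal sketch, assuming a hypothetical encoding in which each diploid individual's genotype is the number of copies (0, 1, or 2) of the alternate allele and no genotypes are missing:

```python
from typing import Sequence, Tuple


def allele_frequencies(genotypes: Sequence[int]) -> Tuple[float, float]:
    """Return (alternate-allele frequency, minor-allele frequency) for one biallelic SNP."""
    if not genotypes:
        raise ValueError("no genotypes supplied")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be coded 0, 1, or 2")
    alt = sum(genotypes) / (2 * len(genotypes))  # two allele copies per diploid individual
    return alt, min(alt, 1 - alt)


print(allele_frequencies([0, 1, 2, 1, 0]))  # (0.4, 0.4)
print(allele_frequencies([2, 2, 2, 1]))     # (0.875, 0.125)
```

The minor-allele frequency is reported alongside because which allele is called "alternate" is arbitrary; rare variants only become visible when many individuals are genotyped, which is why population-scale sampling matters.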