Sequence Databases What are they and why do we need them
What is sequence data? DNA, RNA and Protein (Amino Acids) Why do I need it? • Evolution • Mutation • Natural Selection • Intra and Inter-species relationships • Niche exploitation • Ecosystems REALLY?
YES! • Evolution • Mutation • Natural Selection • Intra and Inter-species relationships • Niche exploitation • Ecosystems Phenotypes • Phenotypes come from the proteins. • Proteins come from the DNA via RNA. • Changes in DNA cause changes in proteins. • Changes in proteins cause changes in phenotypes. How do we find those changes? Sequencing
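The chain from DNA change to protein change can be sketched in a few lines of Python. This is a minimal illustration and not part of the original slides: the function name `translate` and the example mutation are assumptions made here, the codon table is the standard genetic code, and the sketch reads the coding strand directly (T in place of U) rather than modelling the RNA step.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding-strand DNA string into one-letter amino acids."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


original = "ATGGACTTACCG"  # first 12 bases of the sorghum sequence in this deck
mutant = "ATGGAATTACCG"    # hypothetical C-to-A change at the sixth base

print(translate(original))  # MDLP
print(translate(mutant))    # MELP
```

A single base change turns aspartate (D) into glutamate (E); that kind of difference is what sequencing lets us detect.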
Is the Sequence everything? The sequence itself is not informative; it must be analyzed by comparative methods against existing databases to develop hypotheses concerning relatives and function. What do Databases let you do? • Explore and investigate sequence data • Classify organisms • Assign a possible function to a gene • Verify a sequence's identity • Annotate a genome • Design primers for PCR and probe experiments
What is a Database? Databases allow us to more easily find what we need
What Databases are there? Many other specialized Databases are available. Bioinformatics for Dummies, 2003
What Database should I use? A.K.A. GenBank
How big is GenBank? Technology milestones: 1977 DNA sequencing; 1985 PCR; 1987 automated sequencing; 1997 capillary sequencing.
Who can put data into GenBank? Sequence data are submitted to GenBank by scientists from around the world. Warning: GenBank does not check the validity or accuracy of submitted sequences. Verification is left to the scientific community, as with all published scientific data.
How do I use GenBank? www.ncbi.nlm.nih.gov Problem 1. You are constructing a phylogeny of Euglenoids and you have determined from the literature that the Beta-tubulin gene is a good gene for this purpose. How do I start???
How do I use GenBank? www.ncbi.nlm.nih.gov Search term: Euglenozoa AND tubulin NOT kinetoplastida Accession: AF182759
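The slides run this Boolean search through the NCBI website. For readers who prefer a script, the same query string can be passed to NCBI's E-utilities `esearch` endpoint. The sketch below only builds the request URL and does not contact NCBI; the helper name `build_search_url` and the `retmax` default are assumptions for illustration.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(term, db="nucleotide", retmax=20):
    """Return an esearch URL for a Boolean query (hypothetical helper)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


# AND narrows the search to tubulin records; NOT excludes kinetoplastids.
query = "Euglenozoa AND tubulin NOT kinetoplastida"
url = build_search_url(query)
print(url)
```

Opening the printed URL in a browser should return an XML list of matching record IDs, which can then be narrowed by hand as on the website.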
How do I use GenBank? Problem 2. You are studying domestication of Sorghum vulgare. From reading about sorghum you find out that it is closely related to Zea mays. You also find out that maize has a wild relative, teosinte, that forms multiple stalks. Domesticated maize forms a single stalk. Domesticated sorghum has a single stalk, while wild sorghum (Johnsongrass) has multiple stalks.
Broomcorn (Sorghum vulgare): domesticated. Johnsongrass (Sorghum halepense): wild.
How do I use GenBank? Problem 2, continued. Moreover, the paper states that this trait is controlled by a single gene, teosinte branched 1 (tb1). You wonder, “Does sorghum have this gene?” The paper does provide a set of PCR primers (forward and reverse) that were used to isolate and sequence the tb1 gene. Will they work for sorghum?
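One rough way to think about "will the primers work?" is an exact-match check: the forward primer should occur in the template, and the reverse complement of the reverse primer should occur downstream of it. The sketch below makes several assumptions: the two primers are hypothetical (the paper's primers are not given in these slides), the template is just the first 40 bases of the sorghum sequence in this deck, and real primers can tolerate some mismatches, which this check ignores.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def predicted_amplicon_length(template, forward, reverse):
    """Length of the product if both primers match exactly, else None."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so look for its
    # reverse complement downstream of the forward primer site.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


template = "ATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGC"
forward_primer = "ATGGACTTACCG"  # hypothetical primer
reverse_primer = "GCGGGCTGAGCT"  # hypothetical primer

print(predicted_amplicon_length(template, forward_primer, reverse_primer))  # 40
```

In practice the comparison is done against database sequences with BLAST, which also reports near matches.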
Sequencing Sorghum Does sorghum have the tb1 gene? >Sorghum_vulgare_sequence ATGGACTTACCGCTTTACCAACAACTGCAGCTCAGCCCGCCTTCCCCAAAGCCGGACCAATCAAGCAGCT TCTACTGCTGCTACCCATGCTCCCCTCCCTTCGCCGCCGCCGCCGCCGACGCCAGCTTTCACCTGAGCTA CCAGATCGGTAGTGCCGCCGCCGCCATCCCTCCACAAGCCGTGATCAACTCGCCGGAGGACCTGCCGGTG CAGCCGCTGATGGAGCAGGCGCCGGCGCCGCCTACAGAGCTTGTCGCCTGCGCCAGTGGTGGTGCACAAG GCGCCGGCGTCAGCGTCAGCCTGGACAGGGCGGCGGCCGCGGCCGCCGCGAGGAAAGACCGGCACAGCAA GATATGCACCGCCGGCGGGATGAGGGACCGCCGGATGCGGCTGTCCCTTGACGTCGCCCGCAAGTTCTTC GCGCTCCAGGACATGCTTGGCTTCGACAAGGCCAGCAAGACGGTACAATGGCTCCTCAACACGTCCAAGG CCGCCATCCAGGAGATCATGGCCGACGACGTCGACGCGTCGTCGGAGTGCGTGGAGGATGGCTCCAGCAG CCTCTCCGTCGACGGCAAGCACAACCCGGCGGAGCAGCTGGGAGATCAGAAGCCCAAGGGTAATGGCCGC AGCGAGGGGAAGAAGCCGGCCAAGTCAAGGAAGGCGGCGACCACCCCAAAGCCGCCAAGAAAATCGGGGA ATAATGCGCACCCGGTCCCCGACAAGGAGACGAGGGCGAAGGCGAGGGAGAGGGCGAGGGAGCGAACCAA GGAGAAGCACCGGATGCGTTGGGTAAAGCTTGCATCAGCAATTGACGTGGAGGCGGCGGCTGCCTCGGTG GCTAGCGACAGGCCGAGCTCGAACCATTTGAACCACCACCACCACTCATCGTCGTCCATGAACATGCCGC GTGCTGCGGAGGCTGAATTGGAGGAGAGGGAGAGGTGCTCATCAACTCTCAACAATAGAGGAAGGATGCA AGAAATCACAGGGGCGAGCGAGGTGGTCCTAGGCTTTGGCAACGGAGGAGGATACGGCGGCGGCAACTAC TACTGCCAAGAACAATGGGAACTCGGTGGAGTCGTCTTTCAGCAGAACTCACGCTTCTACTGA www.ncbi.nlm.nih.gov/BLAST/
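The sequence above is in FASTA format: a header line starting with `>` followed by the bases. Before pasting a sequence into BLAST it can help to read it into a program and check basic properties. The following is a minimal sketch; the function names `parse_fasta` and `gc_fraction` are assumptions, and the example record is shortened to the first 24 bases of the sequence.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping record name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]          # header line: start a new record
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


example = """>Sorghum_vulgare_sequence
ATGGACTTACCG
CTTTACCAACAA
"""

records = parse_fasta(example)
seq = records["Sorghum_vulgare_sequence"]
print(len(seq))  # 24
print(round(gc_fraction(seq), 3))
```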
Resources at NCBI GenBank – Molecular Databases Nucleotides, Proteins, Structures, Expression (ESTs) and Taxonomy. Literature Databases PubMed, Journals, OMIM, Book, and Citation Matcher. Genomes and Maps – Entrez Map Viewer, UniGene, COGs, Organism-specific, Organelle, Virus, and Plasmids. Tools – Software Engineering BLAST, Sequence Analysis, 3-D Structures, Gene Expression, Literature and Genome Analysis. Education Books, Courses, Public Information. Research Biology, Computers.
Objectives Explain what you can do with sequence data. Explain what a database is. Describe what kinds of data and resources are available. Describe some of the uses of databases.