Nucleotide Databases: GenBank Dr Maha Al-Sulaimani Department of Biochemistry
Learning Objectives • Understand what a nucleotide database is. • Distinguish the structure of eukaryotic and prokaryotic genes. • Make sense of a GenBank entry. • Understand the difference between GenBank and a gene-centric resource.
Nucleotide Databases • The Nucleotide database is a collection of sequences from several sources, including GenBank and RefSeq. • Genome, gene and transcript sequence data provide the foundation for biomedical research and discovery.
Examples of Nucleotide Databases • GenBank. • INSDC (International Nucleotide Sequence Database Collaboration). • NDB (Nucleic Acid Database). • NCBI (National Center for Biotechnology Information). • EMBL (European Molecular Biology Laboratory). • EBI (European Bioinformatics Institute).
Looking for DNA Sequences • There are many types of DNA sequences. • The most common are: • Regulatory regions, often upstream of genes. • Untranslated regions (UTRs), flanking the coding sequence. • Protein-coding regions (coding exons). • Non-coding regions between genes (intergenic regions) and within genes (introns). • All these sequences can be found in GenBank.
Prokaryotes vs. Eukaryotes • Prokaryotes: • Genome = one large circular chromosome + a few small circular DNA molecules (plasmids). • 0.5 to 8 Mb/chromosome. • Genes in one piece. • About 70% of the genome is coding. • About 1 gene/kb. • Eukaryotes: • Genome = many large linear chromosomes. • 10 to 700 Mb/chromosome. • Genes split into exons and introns. • About 5% of the genome is coding. • About 1 gene/100 kb (human).
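To put the gene-density figures on this slide in perspective, here is a minimal Python sketch that estimates a gene count from chromosome size and average gene spacing. The function name `estimate_gene_count` and the two example chromosome sizes (4 Mb and 100 Mb) are illustrative assumptions picked from within the ranges quoted on the slide; they are not measurements of any real organism.

```python
def estimate_gene_count(chromosome_size_bp: int, kb_per_gene: float) -> int:
    """Rough gene count: chromosome size divided by the average spacing between genes."""
    return round(chromosome_size_bp / (kb_per_gene * 1_000))


# Hypothetical 4 Mb prokaryotic chromosome at about 1 gene per kb
prokaryote_genes = estimate_gene_count(4_000_000, kb_per_gene=1)

# Hypothetical 100 Mb eukaryotic chromosome at about 1 gene per 100 kb
eukaryote_genes = estimate_gene_count(100_000_000, kb_per_gene=100)

print(prokaryote_genes)  # 4000
print(eukaryote_genes)   # 1000
```

Under these assumptions the eukaryotic chromosome is 25 times larger yet carries a quarter as many genes, which is what the difference in coding fraction implies.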
Typical Eukaryotic Protein-Coding Gene • The coding sequence is made of coding exons separated by introns. • Introns are spliced out and the exons are joined together to form the open reading frame (ORF). • One gene can code for several alternative proteins through alternative splicing.
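The splicing step described on this slide can be sketched in a few lines of Python. The sequence, the exon coordinates, and the helper name `splice` are all invented for illustration; the example only shows how joining different sets of exons yields different ORFs, not how real splice sites are recognized.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join exon segments (0-based, end-exclusive coordinates), dropping the introns between them."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy gene (invented): uppercase = exon, lowercase = intron
gene = "ATGGCC" + "gtaagtag" + "GAATTC" + "gtcccag" + "AAATAA"
exon1, exon2, exon3 = (0, 6), (14, 20), (27, 33)

# Constitutive splicing: all three exons are kept
full_orf = splice(gene, [exon1, exon2, exon3])

# Alternative splicing: exon 2 is skipped
skipped_orf = splice(gene, [exon1, exon3])

print(full_orf)     # ATGGCCGAATTCAAATAA
print(skipped_orf)  # ATGGCCAAATAA
```

Both products start with ATG and end with the stop codon TAA, but they encode different proteins, which is the point of alternative splicing.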
GenBank • Hosted by the National Center for Biotechnology Information (NCBI). • GenBank is the memory of biological science. • Contains virtually every DNA sequence ever published.
GenBank • GenBank is the original information source for most biological databases. • GenBank is more complicated to use than gene-centric databases.
Limitations of GenBank • GenBank entries can contain: • Entire genes. • Portions of genes. • Many genes. • GenBank entries can be of uneven quality: • Entries can be duplicated and/or inaccurate. • The database does not select or filter submissions. • All data are treated equally.
Limitations of GenBank • GenBank entries are not the final word on particular genes: • They have no authoritative biological meaning. • They merely keep a record of what was done. • Gene-centric databases are needed to compile everything that is known about a given gene and to correct potential errors.
Using Gene-centric Databases: Entrez Gene • Entrez Gene can be accessed from NCBI. • In GenBank, each entry is one sequence from one publication. • In Entrez Gene, each entry is one gene. • Entrez Gene is built from GenBank data.
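The difference between a sequence-centric and a gene-centric view can be illustrated with a toy Python example. The accessions, gene symbols, and the function `build_gene_index` are hypothetical, and this is only a sketch of the idea of regrouping entries by gene; it does not describe how Entrez Gene is actually built or curated.

```python
from collections import defaultdict

# Invented sequence-centric entries: (accession, gene symbol, description)
entries = [
    ("TOY0001", "GENE_A", "complete gene"),
    ("TOY0002", "GENE_A", "partial cDNA"),
    ("TOY0003", "GENE_B", "complete gene"),
    ("TOY0004", "GENE_A", "exon 2 only"),
]


def build_gene_index(entries):
    """Group sequence entries by gene symbol, giving one record per gene."""
    index = defaultdict(list)
    for accession, gene, _description in entries:
        index[gene].append(accession)
    return dict(index)


gene_index = build_gene_index(entries)

print(len(entries))          # 4
print(len(gene_index))       # 2
print(gene_index["GENE_A"])  # ['TOY0001', 'TOY0002', 'TOY0004']
```

Four sequence entries collapse into two gene records, each listing every sequence known for that gene, which is the kind of compilation a gene-centric resource provides.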