Online (DNA and amino acid) Sequence Information. Lecture 9.
Introduction • Annotation of genes • Basic bioinformatics Databases • NCBI home page • Query and return results • DNA sequence results page • Protein sequence results page
Bioinformatics Databases • The biological data generated by various labs is submitted to and stored in specific databases. • The data can be: • Nucleotide: DNA and mRNA (cDNA) • Protein sequences • The main nucleotide sequence databases are: • United States: GenBank (NCBI) • Europe: Nucleotide Sequence Database (EMBL) • Japan: DNA Data Bank of Japan (DDBJ) • These databases also contain sequences related to: • Expressed sequence tags (ESTs): short (800 bp) fragments of mRNA that can be used to see what genes are expressed…
Protein Databases • The main protein database is: • UniProt, which contains data from three related databases: • SWISS-PROT (most up-to-date information) • TrEMBL (translation of coding sequences) • PIR (Protein Information Resource) • Both the nucleotide and protein databases contain much more detail than just sequences. This additional data is referred to as gene annotation data.
The Annotation of Genes • Once the gene sequences have been determined, the data must be annotated. This basic annotated data includes (Klug 2010): • Identification of regulatory regions • Identification of coding sequences (CDS): the exons/introns (if the sequence is eukaryotic)…. • The amino acid sequence for the gene • Other organisms in which the DNA/amino acid sequence is found • Journals/references to where the data came from • Links to other databases that contain information about the gene, Global Sequence
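The link between annotated exons, the CDS and the amino acid sequence can be sketched in a few lines of Python. This is a minimal illustration only: the genomic sequence and exon coordinates are made up, and the helper names `join_exons` and `translate` are hypothetical. Coordinates are 1-based and inclusive, in the style of a `join(1..6,17..22)` feature location, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def join_exons(genomic, exons):
    """Concatenate exon segments given as 1-based inclusive (start, end) pairs."""
    return "".join(genomic[start - 1:end] for start, end in exons)

def translate(cds):
    """Translate a CDS codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Made-up toy gene: exon 1 (1..6), intron (7..16), exon 2 (17..22).
toy_genomic = "ATGGCCGTAAGTCCAGAAATAA"
toy_exons = [(1, 6), (17, 22)]

toy_cds = join_exons(toy_genomic, toy_exons)
print(toy_cds)
print(translate(toy_cds))
```

A real record would supply the exon coordinates in its feature table; here they are chosen by hand so that the intron is removed and the remaining exons form a complete reading frame.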
Bioinformatics Database • To facilitate finding annotated data about genes and proteins, a number of sites provide specific search engines: • NCBI has Entrez • EMBL has the EBI search page (previously the SRS engine) • The SIB has the ExPASy search engine (this is more focused on protein-related information) • Consider the following query: • What are the DNA and amino acid sequences for the following gene: Human BTEB • Type the following into the search text box: • Human[organism] AND BTEB[title]
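The same query can also be prepared for programmatic use. The sketch below only builds the query string and a URL for the NCBI E-utilities ESearch service; it does not contact the server. The helper names `build_entrez_query` and `build_esearch_url` are hypothetical, and the choice of the `nuccore` database and `retmax=5` are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch service.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_entrez_query(organism, title_word):
    """Combine two field-restricted terms with AND, as typed in the search box."""
    return f"{organism}[organism] AND {title_word}[title]"

def build_esearch_url(term, db="nuccore", retmax=5):
    """Return an ESearch URL for the term (assumed db and retmax; no network call)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"

query = build_entrez_query("Human", "BTEB")
url = build_esearch_url(query)
print(query)
print(url)
```

The square-bracket field tags restrict each word to one field of the record, so `BTEB[title]` matches only records with BTEB in the title rather than anywhere in the entry.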
Coding section of gene • The exon/intron structure is also available in graphic form
Further information • In the right-hand column you will find links to online analytical resources, e.g. BLAST (PSI-BLAST), a tool to search for similar sequences contained in the database. • Information on the amino acid sequence obtained from the CDS of the gene. The text box also provides a link to information on the protein in the UniProt database.
An EMBL nucleotide record • Annotated data can also be found in the EMBL database: • BTEB EMBL record: shows the main record. • Clicking on the “text” link at the top right-hand corner will give the essential features of the gene: BTEB-EMBL-EBI_text_record. • An ExPASy database search gives the following information for this gene: type BTEB, and then BTEB and Human
The BTEB Protein record • A link to a graphic representation of the protein and the relevant annotated data can be found at: BTEB Human Protein
Other databases • The nucleotide (GenBank and EMBL) and protein (UniProt) databases contain the “raw data” and are referred to as “primary databases”. • More specific databases derive data from these and are referred to as secondary databases; examples include protein family and sequence similarity databases such as PROSITE and PRINTS • There are databases which contain information about specific organisms, such as E. coli; these can be found using the Genomes OnLine Database (GOLD)
Other databases • Databases for specific types of sequences, such as those associated with promoters and other regulatory elements; dbEST; homologous structure alignment database. • Structural databases from the Protein Data Bank • Online Mendelian Inheritance in Man (OMIM), which contains information on human genes and genetic disorders. • The January issue of the journal Nucleic Acids Research provides an up-to-date analysis of current online bioinformatics databases: Nucleic Acids Research database issue
Other important information sources • PubMed: literature research: journal articles/conference proceedings/books etc. • Search under many fields: keyword, author…. • Returns: journal articles/abstracts • Two types: general/review. • BTEB PubMed search found at: • http://www.ncbi.nlm.nih.gov/pubmed?term=BTEB&cmd=DetailsSearch • The user can register an NCBI account to manage their activity and store the findings of gene searches, PubMed searches…. This information can be downloaded, emailed….
Exercise • The EMBL-EBI record: BTEB_“text”_record. • The NCBI: BTEB NCBI Nucleotide Record • The DDBJ: BTEB flatfile Record • Exercise: write a brief report comparing and contrasting the core elements of these records; refer to pages 8-16 in Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd edition; the book can be found in the library.
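As a starting point for the comparison, the sketch below lists which sections appear in an EMBL-style flat file and the GenBank keyword that plays the same role. The toy record is made up for illustration and is not a real database entry; the function name `embl_sections` is hypothetical, and the mapping covers only a few of the line codes.

```python
# Correspondence between some EMBL two-letter line codes and GenBank keywords.
EMBL_TO_GENBANK = {
    "ID": "LOCUS",
    "AC": "ACCESSION",
    "DE": "DEFINITION",
    "KW": "KEYWORDS",
    "OS": "SOURCE",
    "FT": "FEATURES",
    "SQ": "ORIGIN",
}

def embl_sections(text):
    """Return the known EMBL line codes in order of first appearance."""
    seen = []
    for line in text.splitlines():
        code = line[:2]
        if code in EMBL_TO_GENBANK and code not in seen:
            seen.append(code)
    return seen

# Made-up record, for illustration only.
toy_embl = """ID   TOY0001; SV 1; linear; genomic DNA; STD; HUM; 12 BP.
AC   TOY0001;
DE   Made-up record for illustration only
OS   Homo sapiens
FT   CDS             1..12
SQ   Sequence 12 BP;
     atggccaaat aa                                                   12
//
"""

for code in embl_sections(toy_embl):
    print(code, "->", EMBL_TO_GENBANK[code])
```

Running the same function on the saved BTEB text record shows which of these sections it contains; the report can then discuss how each one is presented in the NCBI and DDBJ records.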
Exercise • Search for the following gene “DNA” sequence: • Human Leukocyte Elastase gene, linear DNA [hint: should be 5292 bp long]. • Retrieve the record, then download and save the FASTA file.
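Once the FASTA file has been saved, its length can be checked against the hint. The sketch below is a minimal FASTA reader written for this purpose; the function name `read_fasta` is hypothetical and the example record is made up, so it only demonstrates the file layout (a header line starting with ">" followed by sequence lines).

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

# Made-up record, for illustration only.
example = """>demo_record hypothetical example, not a real database entry
ATGGCGTACG
TTAGC
"""

for header, seq in read_fasta(example):
    print(header, len(seq))
```

For the exercise, the text of the downloaded file would be read with `open(...).read()` and passed to `read_fasta`; the reported sequence length should then match the 5292 bp given in the hint.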