BTN323: INTRODUCTION TO BIOLOGICAL DATABASES

BTN323: INTRODUCTION TO BIOLOGICAL DATABASES. Lecturer: Junaid Gamieldien, PhD junaid@sanbi.ac.za. Day2: Specialized Databases. http://www.sanbi.ac.za/training-2/undergraduate-training/. WHAT YOU NEED TO LEARN:.


Presentation Transcript


  1. BTN323: INTRODUCTION TO BIOLOGICAL DATABASES • Lecturer: Junaid Gamieldien, PhD (junaid@sanbi.ac.za) • Day 2: Specialized Databases • http://www.sanbi.ac.za/training-2/undergraduate-training/

  2. WHAT YOU NEED TO LEARN: • What are protein pattern/fingerprint/motif databases and why are they important? • What are the benefits of using ontologies in database design? • How do model organism databases support human health research?

  3. PATTERN DATABASES • Sometimes alignment-based methods find no hits, leaving us without clues about a novel gene's or protein's function • We then turn to finding MOTIFS - common conserved sequence elements in protein families • In many cases a motif consists of distinct subparts that are highly conserved across the sequences, while the regions between these subparts have little in common • With a database of these patterns, we can assign a potential function to a novel protein by finding one or more known motifs in it

  4. Protein • Similar sequence → similar function • Also true for subsections of a protein • Motifs or signature sequences, e.g. DNA binding motifs • EVOLUTIONARY CONSTRAINT! [Figure: Sequence A and Sequence B]

  5. INTERPRO: INTEGRATED PATTERN DATABASE • Integrated resource for protein families, domains, regions and sites • Combines several databases that use different methodologies, and varying degrees of biological information on well-characterised proteins, to derive protein signatures • Capitalises on their individual strengths => powerful integrated database and diagnostic tool (InterProScan)

  6. MEMBER DATABASES • ProDom: provider of sequence clusters • PROSITE: provider of patterns written as regular expressions • PRINTS: provider of protein ‘fingerprints’ • PANTHER, PIRSF, Pfam, SMART, TIGRFAMs, Gene3D and SUPERFAMILY: providers of hidden Markov models (HMMs)
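To make the idea of a pattern as a regular expression concrete, here is a minimal sketch that translates PROSITE-style pattern syntax into a Python regular expression and scans a sequence with it. The function names and the test sequence are made up for illustration, and the translation covers only the basic syntax elements (x, [..], {..}, (n) or (n,m), < and >); it is not how InterProScan or the PROSITE tools themselves are implemented.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regex (basic syntax only)."""
    pattern = pattern.rstrip(".")  # patterns conventionally end with a period
    parts = []
    for element in pattern.split("-"):
        # {P} means "any residue except P" -> [^P]
        element = element.replace("{", "[^").replace("}", "]")
        # x(2,4) means "2 to 4 of any residue" -> .{2,4}
        element = element.replace("(", "{").replace(")", "}")
        # lowercase x is the wildcard; residue codes are uppercase
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- or C-terminus
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def scan(sequence: str, pattern: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# N-glycosylation-style pattern: N, then not P, then S or T, then not P
glyco = "N-{P}-[ST]-{P}."
toy_sequence = "MKNGTAANPSQ"  # invented sequence, not a real protein

print(prosite_to_regex(glyco))
print(scan(toy_sequence, glyco))

# Conserved subparts (C, C, H, H) separated by variable-length spacers
print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"))
```

The second pattern shows the point made on slide 3: the conserved residues are fixed, while the spacers between them are only constrained in length. Note that `finditer` reports non-overlapping matches, so overlapping sites would need a lookahead-based search.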

  7. INTERPRO PROTEIN ‘SITES’ • Conserved site - any short sequence pattern that may contain one or more unique residues • Active site - one or more signatures cover all the active site residues • Binding site - binds chemical compounds • Post-translational modification - modifies the primary protein structure, e.g. glycosylation, phosphorylation, etc.

  8. INTERPRO SEQUENCE ANALYSIS: INTERPROSCAN • Searching against different functional site databases has become vital for the prediction of protein function (e.g. where BLAST fails) • Different DBs have different strengths and weaknesses, depending on their underlying analysis methods • Ideally, all of the secondary databases should be searched to ensure the best results • This is exactly what InterProScan does (part of today's practical topic)

  9. BIO-ONTOLOGIES • Community-developed agreements on the terms/concepts describing a topic and on the relationships between them • The Gene Ontology (GO) is the most widely used • The GO provides a common language to describe a gene product's biology in terms of Molecular Function, Biological Process and Cellular Component (cellular location) • Several other ontologies exist, e.g. for anatomy, cell types, disease, phenotype, pathway, …

  10. [Figure: ontology diagram with a term annotation labelled ‘involves GENE-X’]

  11. ADVANTAGES OF GO (AND MANY OTHER BIO-ONTOLOGIES) IN DB DESIGN • A common language applicable to any organism • Represents and organises information in a way that both humans and machines can understand • GO terms can be used to annotate gene products from any species • Enables easy comparison of information across species

  12. ADVANTAGES OF GO (AND MANY OTHER BIO-ONTOLOGIES) IN DB DESIGN (2) • Terms make good entry points for database searches • Researchers can search for what they really mean (and meaning is more consistent between individuals) • Transitive links from biological objects to a query term via its child terms ensure that ALL relevant results are returned automatically • ‘Reverse’ queries can easily be done to return terms when biological objects are used as queries

  13. [Figure: ontology diagram with GENE-X annotated to ‘cytokinesis’ (‘involves GENE-X’)] • GENE-X will be returned even if the query is done at the level of a parent term • Using GENE-X as the query can return ‘cytokinesis’ and even all its parent terms
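The forward and reverse queries described on the two slides above can be sketched in a few lines. This is a toy illustration only: the term hierarchy is a simplified, assumed fragment rather than the real GO graph, GENE-Y is an invented second gene, and the function names are hypothetical.

```python
# child term -> parent terms (simplified, assumed hierarchy for illustration)
PARENTS = {
    "cytokinesis": ["cell division"],
    "cell division": ["cellular process"],
    "cellular process": ["biological process"],
    "biological process": [],
}

# gene -> terms it is directly annotated with (GENE-Y is invented)
ANNOTATIONS = {
    "GENE-X": ["cytokinesis"],
    "GENE-Y": ["cell division"],
}


def ancestors(term: str) -> set[str]:
    """All parent terms reachable from `term` by following links upward."""
    seen: set[str] = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def terms_for_gene(gene: str) -> set[str]:
    """'Reverse' query: direct annotations plus all of their parent terms."""
    terms: set[str] = set()
    for term in ANNOTATIONS.get(gene, []):
        terms.add(term)
        terms |= ancestors(term)
    return terms


def genes_for_term(term: str) -> set[str]:
    """Forward query: genes annotated to `term` or to any of its child terms."""
    return {gene for gene in ANNOTATIONS if term in terms_for_gene(gene)}


print(genes_for_term("cell division"))  # GENE-X is found via its child-term annotation
print(terms_for_gene("GENE-X"))
```

Because the annotation is propagated upward through the hierarchy, a query at a broad term still finds genes annotated only to a more specific child term, which is the behaviour the slides describe.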

  14. MODEL ORGANISM GENETIC DATABASES • Very useful for collecting results from genetic (and other) experiments that cannot be done on humans: disease models, gene knockouts, drug testing, environmental manipulation • In terms of genomics, model organism data is invaluable for unravelling gene and protein functions, gene-to-phenotype relationships and gene-to-disease associations • The aim of these databases is to integrate all relevant information in one place, which makes it easier to mine the database for novel associations and enables linking between databases

  15. RAT AND MOUSE GENOME DBs – DATA TYPES • Genes, proteins and their annotations, including Gene Ontology links and expression information • Phenotypes, described by terms in the Mammalian Phenotype Ontology, drawn from gene knockout models produced by the projects and their partners and from evidence mined from the literature • Disease, Pathway and Behaviour ontologies and the relevant gene associations are also present in RGD

  16. DESIGNED FOR EASE OF USE • Web query interfaces are intuitive • Several traditional ways to query – gene names, symbols, chromosomal location • Query interfaces for ontologies (Disease, Phenotype, Pathway, Behaviour) • Ontology annotations can easily be retrieved for any gene or protein • Both databases link to human genes, which simplifies in silico exploration of human diseases and phenotypes driven by mouse and rat evidence
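As a rough illustration of why links to human genes matter, the sketch below follows a human gene to its rat and mouse counterparts and collects the phenotype annotations recorded for them. Every gene symbol, phenotype label and table name here is a placeholder invented for the example; it does not reflect the actual schema, identifiers or contents of the rat or mouse genome databases.

```python
# human gene -> linked model organism genes (placeholder data)
HUMAN_LINKS = {
    "GENE-X": {"mouse": "Gene-x", "rat": "Gene-x-rat"},
}

# model organism gene -> phenotype terms (placeholder labels, not real ontology terms)
PHENOTYPES = {
    "Gene-x": ["phenotype term A", "phenotype term B"],
    "Gene-x-rat": ["phenotype term B", "phenotype term C"],
}


def model_organism_evidence(human_gene: str) -> dict[str, list[str]]:
    """Map each phenotype term to the species whose linked gene carries it."""
    evidence: dict[str, list[str]] = {}
    for species, gene in sorted(HUMAN_LINKS.get(human_gene, {}).items()):
        for term in PHENOTYPES.get(gene, []):
            evidence.setdefault(term, []).append(species)
    return evidence


for term, species in model_organism_evidence("GENE-X").items():
    print(f"{term}: supported by {', '.join(species)}")
```

A phenotype supported by both species would be a stronger candidate to investigate for the human gene than one seen in a single model.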
