Bioinformatics

Bioinformatics. Ayesha M. Khan 20/2/12. Introduction to databases. If we are to derive the maximum benefit from the deluge of sequence information, we must deal with it in a concerted way by establishing, maintaining, and disseminating the information contained in databases.


Presentation Transcript


  1. Bioinformatics Ayesha M. Khan 20/2/12 Lec-3

  2. Lec-3 Introduction to databases If we are to derive the maximum benefit from the deluge of sequence information, we must deal with it in a concerted way by doing the following: • Establish • Maintain • Disseminate the information contained in databases

  3. Lec-3 Introduction to Databases • Databases are effectively electronic filing cabinets: a convenient and efficient method of storing vast amounts of information. • Central, shareable resources • Many different types of databases, depending on: - the nature of the information being stored - the manner of data storage

  4. Lec-3 Primary & Secondary databases • Primary and secondary databases are used to address different aspects of sequence analysis, because they store different levels of protein sequence information • A database is either primary or derived (secondary) • Primary databases: experimental results are entered directly into the database • Secondary databases: hold the results of analysing primary databases • Composite databases: aggregates of many databases, offering: - links to other data items - combination of data - consolidation of data
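The distinction between primary and secondary databases on slide 4 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three short sequences are made up, they are assumed to be already aligned, and "fully conserved positions" is just one simple stand-in for the kind of analysis a secondary database might store.

```python
# Toy primary database: sequences as they would be deposited.
# The identifiers and sequences are invented for illustration only.
primary_db = {
    "seq1": "MKTAYIAKQR",
    "seq2": "MKTAHIAKQR",
    "seq3": "MKTAYLAKQR",
}


def conserved_positions(sequences):
    """Return {position: residue} for columns where every sequence agrees.

    Assumes the sequences are already aligned; only the shared prefix
    length is examined if the lengths differ.
    """
    seqs = list(sequences)
    length = min(len(s) for s in seqs)
    conserved = {}
    for i in range(length):
        column = {s[i] for s in seqs}
        if len(column) == 1:
            conserved[i] = column.pop()
    return conserved


# Toy secondary database: it stores the result of analysing the primary
# entries, not the experimental data themselves.
secondary_db = {"toy_family": conserved_positions(primary_db.values())}

for position, residue in sorted(secondary_db["toy_family"].items()):
    print(position, residue)
```

The point of the sketch is only the direction of the dependency: the secondary entry is computed from primary entries and would need to be recomputed whenever they change.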

  5. Lec-3 Primary sequence databases • Early 1980s • Nucleic acid: EMBL (Europe), GenBank (USA), DDBJ (Japan) • Protein: PIR, MIPS, SWISS-PROT, TrEMBL, NRL-3D

  6. Lec-3 EMBL: • EMBL is the nucleotide sequence database from the European Bioinformatics Institute (EBI) • It has sequences from direct author submissions, genome sequencing groups, the scientific literature, and patent applications. DDBJ: • DNA Data Bank of Japan, produced, maintained, and distributed at the National Institute of Genetics. GenBank: • DNA database from the National Center for Biotechnology Information (NCBI).

  7. Lec-3 Principal requirements of a database The principal requirements of public data services are: • Data quality - data quality has to be of the highest priority. However, because the data services in most cases lack access to supporting data, the quality of the data must remain the primary responsibility of the submitter. • Supporting data - database users will need to examine the primary experimental data, either in the database itself or by following cross-references back to network-accessible laboratory databases. • Deep annotation - deep, consistent annotation comprising supporting and ancillary information should be attached to each basic data object in the database. • Timeliness - the basic data should be available on an Internet-accessible server within days (or hours) of publication or submission. • Integration - each data object in the database should be cross-referenced to representations of the same or related biological entities in other databases. Data services should provide capabilities for following these links from one database or data service to another.
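The integration requirement on slide 7 (cross-referenced data objects, with links that can be followed from one database to another) can be sketched as a small in-memory model. The database names, accessions, and descriptions below are hypothetical placeholders, not real resources or records.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    database: str      # hypothetical database name
    accession: str     # hypothetical accession
    description: str
    # Cross-references as (database, accession) pairs.
    xrefs: list = field(default_factory=list)


def build_index(entries):
    """Index entries by (database, accession) so links can be resolved."""
    return {(e.database, e.accession): e for e in entries}


def linked_entries(index, start_key):
    """Return the keys of all entries reachable from start_key via xrefs.

    Links that point at entries missing from the index are skipped.
    """
    seen = []
    stack = [start_key]
    while stack:
        key = stack.pop()
        if key in seen or key not in index:
            continue
        seen.append(key)
        stack.extend(index[key].xrefs)
    return seen


entries = [
    Entry("NucleotideDB", "N0001", "toy gene",
          xrefs=[("ProteinDB", "P0001")]),
    Entry("ProteinDB", "P0001", "toy protein",
          xrefs=[("NucleotideDB", "N0001"), ("StructureDB", "S0001")]),
    Entry("StructureDB", "S0001", "toy structure",
          xrefs=[("ProteinDB", "P0001"), ("MissingDB", "X0001")]),
]

index = build_index(entries)
for database, accession in linked_entries(index, ("NucleotideDB", "N0001")):
    print(database, accession, index[(database, accession)].description)
```

Starting from the nucleotide entry, the traversal reaches the protein and structure entries even though the nucleotide entry does not link to the structure directly, which is the practical benefit of consistent cross-referencing.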

  8. Lec-3 Exercise Look up a gene of interest in the three primary nucleic acid databases (EMBL, GenBank, and DDBJ) and compare the information given in each of them.
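One way to approach the exercise on slide 8 is to bring records from different databases into a common set of fields before comparing them. The sketch below is an assumption-laden illustration: the two records are invented and heavily abbreviated, they only imitate the keyword style of GenBank-type flat files (LOCUS, DEFINITION, ACCESSION, SOURCE) and the two-letter line codes of EMBL-type flat files (ID, AC, DE, OS), and the parser ignores continuation lines, features, and sequence data. For real records, an established parser such as Biopython's `SeqIO` (which reads both "genbank" and "embl" formats) is likely a better choice.

```python
# Invented, abbreviated records; not real database entries.
GENBANK_STYLE = """\
LOCUS       TOY00001      60 bp    DNA     linear
DEFINITION  Hypothetical example gene, partial sequence.
ACCESSION   TOY00001
SOURCE      Example organism
"""

EMBL_STYLE = """\
ID   TOY00001; SV 1; linear; genomic DNA; STD; UNC; 60 BP.
AC   TOY00001;
DE   Hypothetical example gene, partial cds.
OS   Example organism
"""

# Map each format's line keys onto shared field names.
GENBANK_KEYS = {"DEFINITION": "description", "ACCESSION": "accession",
                "SOURCE": "organism"}
EMBL_KEYS = {"DE": "description", "AC": "accession", "OS": "organism"}


def parse_flat_file(text, key_map):
    """Extract the mapped single-line fields from a flat-file record."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts
        if key in key_map:
            record[key_map[key]] = value.strip().rstrip(";")
    return record


def differing_fields(record_a, record_b):
    """Return the sorted field names whose values differ between records."""
    fields = set(record_a) | set(record_b)
    return sorted(f for f in fields if record_a.get(f) != record_b.get(f))


genbank_record = parse_flat_file(GENBANK_STYLE, GENBANK_KEYS)
embl_record = parse_flat_file(EMBL_STYLE, EMBL_KEYS)

print(genbank_record)
print(embl_record)
print("Fields that differ:", differing_fields(genbank_record, embl_record))
```

In this toy case the accession and organism agree while the description wording differs, which is the kind of difference the exercise asks you to look for in real entries.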
