560 likes | 657 Views
LSM3241: Bioinformatics and Biocomputing Lecture 3: Molecular database development Prof. Chen Yu Zong Tel: 6516-6877 Email: csccyz@nus.edu.sg http://bidd.nus.edu.sg Room 07-24, level 7, SOC1, National University of Singapore. Development of a protein inhibitor database.
E N D
LSM3241: Bioinformatics and BiocomputingLecture 3: Molecular database developmentProf. Chen Yu ZongTel: 6516-6877Email: csccyz@nus.edu.sghttp://bidd.nus.edu.sgRoom 07-24, level 7, SOC1, National University of Singapore
Development of a protein inhibitor database • Step 1: Start from a protein (alternative names from Swissprot) • Step 2: Search for inhibitors using protein name and alternative names (from Medline, or Google) • Step 3: Find additional information about each of the inhibitors (from ChemIDplus database) • Step 4: Database construction
Assignment of a protein • Each student is given an Id number, use this number to get your protein from the protein list posted on the module home page. • In case that you wish to work on a different protein instead of the one assigned to you, get another one from a list of back-up proteins posted on the module home page. • Earn extra credit by working on extra number of proteins
Inhibitor search • Search Medline by using keywords (protein name + inhibitor). • In case no inhibitor or too few inhibitors can be found from medline, search google or yahoo as an alternative source. • Make sure the protein name and its alternative names (get from Swissprot) are used in the search. • Record inhibitor name and reference source (journal issue, page and publication year; internet source etc.). • Record the number of references checked (including those found from the keyword-search but without inhibitors) • Try to find as many inhibitors as you can. The more known inhibitors you get, the more credit you can earn.
Finding chemical information • Search ChemIDPlus database (http://chem.sis.nlm.nih.gov/chemidplus/) for additional information of each inhibitor • Before using ChemIDPlus, download Chime from MDL Click here to search Select “Equals” for inhibitor name Select “Contains” for other keywords Input inhibitor name Click here to select Chime option in ChemIDPlus (Chime is needed to view structure) More Info at: http://sis.nlm.nih.gov/Chem/ChemMain.html
ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? Click here to get full record Click here to get structure
ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database?
ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database?
ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database?
ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database?
ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database?
ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database?
ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? Right-click in this field to get 3D structure file
ChemIDPlus database Additional information of each inhibitor may be found from this database How to use this database? • Steps for saving the file: • Open a word or text file • Select all text in the source file • Copy (using Ctrl+C) and paste (using Ctrl+V) to the word or text file • Save the word or text file as chem_name.doc or chem_name.txt
Generating 3D structure from 2D structure Use of ViewerPro software Save 2D structure as a MDL file
ViewerPro software Load 2D structure MDL file (viracept.mol) Save 2D structure as a MDL file
ViewerPro software Save to 3D structure: select File => Save as (pdb format) Save 2D structure as a MDL file
ViewerPro software Re-load pdb file and clean structure: select Modify => Clean Structure (click here 10 times) Save 2D structure as a MDL file
ViewerPro software Save as pdb file: select File => Save as (pdb file) Save 2D structure as a MDL file
Database construction Information needed to construct database Inhibitor NameAspirin Protein TargetCox I (SwissProt accession number) [1] CAS Number50-78-2 [2] FormulaC9-H8-O4 [2] Systematic NameO-Acetylsalicylic acid, Salicylic acid, acetate Synonyms2-(Acetyloxy)benzoic acid, 2-Acetoxybenzoic acid [2], A.S.A. empirin [2] ClassificationAnalgesic, Anti-inflammatory agents, non-steroidal Notes: The prototypical analgesic used in the treatment of mild to moderate pain. It has anti-inflammatory and antipyretic properties and acts as an inhibitor of cyclooxygenase which results in the inhibition of the biosynthesis of prostaglandins. Aspirin also inhibits platelet aggregation and is used in the prevention of arterial and venous thrombosis. (From Martindale, The Extra Pharmacopoeia, 30th ed,p5) References: 1:OMIM 176805 (http://www.ncbi.nlm.nih.gov/htbin-post/Omim/dispmim?176805) 2:ChemIDplus Database (http://chem.sis.nlm.nih.gov/chemidplus/) 3D structureaspirin.pdb
Database construction Database development
Database construction Database development
Database construction Database development
Database Construction Database Definition: • A Database is a shared collection of Inter-Related data designed to meet the needs of multiple types of users and applications. This implies that multiple user VIEWS can be defined • Data stored is independent of the programs which use it • Data is structured to provide a basis for future applications DATABASE = Stored Collection of Related Data May be physically distributed
Database construction Database development
Database Construction Data/Information A General Definition: DATA - raw (unprocessed or partly processed) facts which represent the state of entities (things) which have occurred INFORMATION - data which has been processed into a form USEFUL TO THE USER What is Information to one user may be Data to another user.
Database construction Database development
Database Models - Hierarchical owner/parent child /parent owner member child child/parent
Database Models - Network ‘set’ of data owner member owner member ‘set’ of data Note: Only linked sets can be accessed
Database Models - Relational table table table table table A B C DE Any table(s) can be joined to any other table(s), provided there is a means of effecting the join Primary key / Foreign key concept. Data redundancy No fixed linkages
Example: 2 Relations in a Database EMPNUM NAME Date of Birth DEPTNUM 3 JONES 16-05-1956 605 7 SMITH 23-09-1965 432 11 ADAMS 11-08-1972 201 15 NGUYEN 23-10-1964 314 18 PHAN 16-11-1976 201 Relation (Table) Name : EMP Relation Schema: EMP(empnum,name,date of birth,deptnum) DEPTNUM DEPTNAME 201 Production 314 Finance 432 Information Systems 605 Administration Relation (Table) Name : DEPT Relation Schema: DEPT(deptnum, deptname)
Table Name: EMP Table Name: DEPT EMPNOENAMEJOB DEPTNO 7839 KING EXECUTIVE 10 7698 BLAKE MANAGER 30 7782 CLARK MANAGER 10 7566 JONES MANAGER 20 DEPTNODNAME LOC 10 ACCOUNTING CLAYTON 20 RESEARCH CAULFIELD 30 SALES PENINSULA 40 OPERATIONS GIPPSLAND Definition of a Relational Database • A relational database is a collection of relations or two-dimensional tables. Database
Server Data Models Model ofsystemin client’smind Entity model ofclient’s model Table modelof entity model Tables on disk
Relational Database Management System Server User tables Data dictionary
String Matching Based on 1. Analysis of query pattern 2. Auxiliary table 3. Character mapping 1 2 3 4 5 6 7 8 9 10 11 L E C T U R E R O O M L occurs in position 1 E occurs in positions 2 and 7 C occurs in position 3 T occurs in position 4 U occurs in position 5 R occurs in positions 6 and 8 O occurs in positions 9 and 10 M occurs in position 11 Matching starts <Right> character and progressively shifts left on match
Database construction Database development: data format
Database construction Database development: data languages
Database construction Database development: An example of data language
SQL statementis entered Statement is sent to database SQL> SELECT loc FROM dept; Database Data is displayed LOC ------------- CLAYTON CAULFIELD PENINSULA GIPPSLAND Communicating with a RDBMS Using SQL
Database construction Data Definition Language (DDL) used to describe data at the database level Schema level - complete database description Sub-Schema level - user views (restricted) Data Manipulation Language (DML) Provides for Create Insert Update Retrieve (extract) Delete Drop Modify Calculation Report capabilities
Using Oracle • SQL • Structured Query Language: DDL, DML, DCL • Universally supported by database servers • SQL*Plus • Oracle command-line SQL interpreter • Account (schema) = independent working space
SQL data types a: length.b: length, scale.
General Data Types • Used to augment an attribute description and to provide a means of Integrity ’Normal’ data types are : Character (or text) Numeric - Integer, Decimal, Money, Float (in Access Long and Short Integer, Decimal with options of a number of ‘decimal’ places) Date - Standard date format - Access has various forms Logical - Yes/No True/False Exists/Does Not Exist
Data Attribute Size In many cases this is set by the Data Definition facilities e.g.. Date, Short Integer, Long Integer, Logical,Number Others are set by the Designer: Number of Characters, Size of a ‘Decimal Number’ Access allows for a Default value Duplicates/ No Duplicates allowed Primary Key nomination Indexing ‘Required’ Status of an Attribute
DDL Example CREATE TABLE books ( title VARCHAR2(60), author VARCHAR2(60), isbn NUMBER(10,0) CONSTRAINT pk_books PRIMARY KEY, pub_date DATE DEFAULT SYSDATE ); PRIMARY KEY: column serves as a unique identifier SYSDATE: returns the system’s current date “;” terminates a SQL statement
DDL Example CREATE TABLE book_reviews ( isbn NUMBER(10,0) CONSTRAINT fk_books_booksrev REFERENCES books(isbn), reviewer VARCHAR2(30), comments VARCHAR2(150) ); • REFERENCES: value in this column must be one of the values in the isbn column of the table “books”. This is called referential integrity.
DML Example INSERT INTO books VALUES( ’The Importance of Being Earnest’, ’Oscar Wilde’, -- this is a comment 9876543210, ’14-FEB-1895’ ); --: Comment. The part from this symbol till the end of the line will be ignored by the database server. ’14-FEB-1895’: Date format. DD-MMM-YYYY or DD-MMM-YY
DML Example SELECT * FROM books; SELECT * FROM books WHERE author = ’Oscar Wilde’; *: every column ’Oscar Wilde’: strings are quoted by ‘. If ‘ is part of the string, it is escaped by double it (e.g. string a’b should be ‘a’’b’ in SQL)
DML Example SELECT title FROM books WHERE author LIKE ’Oscar%’; You will get: The Importance of Being Earnest LIKE: fuzzy matching (in contrast to =) %: wild character: % represents any string of any length (Osc%r matches Oscar, Osceer…) _ represents any single character (Osc_r matches Oscar, Oscer…)
DML Example COMMIT; ROLLBACK; if you have modified the data in database, such as inserted or deleted an entry, you can either: COMMIT to make this change permanent, or ROLLBACK to cancel your operation (the data will look as if you just finished your last COMMIT).