150 likes | 268 Views
Migrating to DayCart. Introduction. Thor database concepts. Data (Chemical Structure) hierarchy. Thor data model Daylight/Oracle cartridge data model. Other Considerations. What are the steps. Demo. Thor Database Concepts. Datatrees, Datatypes, Dataitems and Datafields.
E N D
Introduction • Thor database concepts. • Data (Chemical Structure) hierarchy. • Thor data model • Daylight/Oracle cartridge data model. • Other Considerations. • What are the steps. • Demo.
Thor Database Concepts • Datatrees, Datatypes, Dataitems and Datafields. • These four concepts are closely related. • Datatrees - Is the method for representing chemical information. • Datatypes - A set of definition that indicate the meaning of a dataitems fields. • Dataitems - Tag names a datatype, which in turn defines the meaning of each datafield. • Datafields - A string of characters; the unit of data.
Data (Chemical Structure) hierarchy • Data stored for each chemical entity are organized on a hierarchical basis with each entity expressed in terms of Parent, Version and Preparation(Oracle only). • Parent is the basic structure which is free of salts, solvates and radio labels. • Version is the translation of a Parent into an actual compound, i.e. a free base, salt, solvates, etc... • Preparation being a discrete batch of that Version.
Parent Version 1 Version 2 Version n Prep Prep Prep Prep Prep Prep Prep Prep Prep Data (Chemical Structure) hierarchy Thor Database, with all molecule information
SMILES CC(C)C(O)C1CCCCC1 FP Timestamp 199701291010.03 Graph CC(C)C(O)C1CCCCC1 DB_NO 873 PISM CC(C)[C@@H](O)C1CCCCC1 ATOM_STER 2,1,1,4,1,7,1,10,1 . . DB_NO 97 VISM Cl.CC(C)[C@@H](O)C1CCCCC1 SALT 1,10 Daylight Thor data tree model
Thor data tree in lexical form $SMI<CC(C)C(O)C1CCCCC1> $DB_NO<873> PISM< CC(C)[C@@H](O)C1CCCCC1> ATOM_STER<2,1,1,4,1,7,1,10,1> $DB_NO<750> PISM< CC(C)[C@H](O)C1CCCCC1> ATOM_STER<2,1,1,4,2,7,1,10,1> BOND_STER<1,1,1,5,1,6> ISO<1,0,0> $DB_NO<97> VISM< Cl.CC(C)[C@@H](O)C1CCCCC1> SALT<1,10> . . .
Daylight/Oracle data model(1) Table:R_PARENT Name Null? Type ------------------------------------------------------------------------------------------------- DB_NO NOT NULL NUMBER(7) SMILES NOT NULL VARCHAR(700) ATOM_STER VARCHAR(200) BOND_STER VARCHAR(200) ISO VARCHAR(200) MIXT_INFO VARCHAR(200) CST VARCHAR(200) Table:R_VERSION Name Null? Type ------------------------------------------------------------------------------------------------- DB_NO NOT NULL NUMBER(7) SMILES NOT NULL VARCHAR(700) PARENT_DB_NO NOT NULL NUMBER(7) SALT_INFO VARCHAR(80)
Daylight/Oracle data model(2) Table:R_PARENT Name Null? Type ------------------------------------------------------------------------------------------------- DB_NO NOT NULL NUMBER(7) SMILES NOT NULL VARCHAR(700) Table:R_PARENT_INFO Name Null? Type ------------------------------------------------------------------------------------------------- DB_NO NOT NULL NUMBER(7) DATA_TYPE NOT NULL NUMBER(2) DATA_INFO NOT NULL VARCHAR(200) Table:R_VERSION Name Null? Type ------------------------------------------------------------------------------------------------- DB_NO NOT NULL NUMBER(7) SMILES NOT NULL VARCHAR(700) PARENT_DB_NO NOT NULL NUMBER(7) SALT_INFO VARCHAR(80)
Parent DB_NO SMILES Data Version_1 Version_2 Version_n DB_NO DB_NO DB_NO SMILES SMILES SMILES Parent_DB_NO Parent_DB_NO Parent_DB_NO Data Data Prep_2 Prep_1 Prep_1 Prep_2 Prep_3 Prep_2 Daylight/Oracle Cartridge model Data Data Prep_1 DB_NO Version_DB_NO Data
Other Considerations • 2D coordinates or connection table. • 2D coordinates as in Thor. • Connection table (Molfile, RXNfile, TGFfile … etc) • Data conversion for input/search. • Nitro groups (charge separated, double bond or don’t care). • Parent and salt molecules. • Indexes for normal column data. • Indexes for chemical structure data • i.e. ddexact, ddgraph, ddblob … etc.
Other Considerations • SMILES column size (what to use?). • 700 bytes or less , Oracle will allow you to put unique indexes and blob-based index. This may or may-not be important to you. • greater than 700 bytes you can use blob-based index.
What are the steps • Design the data model. • Consider input, access and modification of your data. • Consider how chemical data is going to be searched. • Design the database schema. • Export data from Thor (or use SD files). • Create PL/SQL program to load the data. If the data is formatted you may be able to use ‘sqlldr’ command to load the tables. • For data from SD files use mol2smi procedure.
Thanks • John Bradshaw • Jack Delany • Daylight team • Michael Wainwright (Modgraph)