SDF file analysis: creation, composition, checking
Concerning chemical table files • Chemical table files are files that contain information about chemicals • Various formats: • RGfiles, Rxnfiles, RDfiles, XDfiles and Clipboard • Molfile, SDF
MDL Molfile • A file format for holding information about the atoms, bonds, connectivity and coordinates of a molecule • Readable by most cheminformatics and some computational chemistry programs • Standard version: V2000 • Contains a header and a connection table
MDL Molfile content
Generated by Molgen 5.0
11 9 0 0 0 0
-0.0666 -1.5989 0.0514 C 0 0 0 0 0 0 0 0 0 0 0 0 0
1.2913 -1.6184 -0.1221 C 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.9621 -1.2620 -0.9586 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.0783 1.8974 -0.4702 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.4844 1.6346 0.9333 O 0 0 0 0 0 0 0 0 0 0 0 0 0
-0.5244 -1.8601 1.0528 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1.7535 -1.3543 -1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1.9833 -1.8974 0.7324 H 0 0 0 0 0 0 0 0 0 0 0 0 0
-1.9833 -1.2177 -0.8648 H 0 0 0 0 0 0 0 0 0 0 0 0 0
0.8090 1.5332 -0.8167 H 0 0 0 0 0 0 0 0 0 0 0 0 0
-1.3677 1.1615 1.1238 H 0 0 0 0 0 0 0 0 0 0 0 0 0
1 2 2 0 0 0 0
1 3 1 0 0 0 0
1 6 1 0 0 0 0
2 7 1 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
4 5 1 0 0 0 0
4 10 1 0 0 0 0
5 11 1 0 0 0 0
M  END
$$$$
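The molfile layout (three header lines, a counts line, the atom block, the bond block and `M  END`) can be read with a few lines of Python. This is a minimal sketch, not a full V2000 reader: it assumes the standard fixed column positions, ignores the properties block (charges, isotopes) and the V3000 format, and runs on a hand-made 4-atom excerpt holding only the H2O2 atoms of the record shown in "SDF content §1", renumbered 1-4. The helper names `parse_v2000` and `bond_length` are hypothetical.

```python
import math
from collections import Counter

# Hand-made V2000 excerpt (hypothetical file): the H2O2 atoms of the record in
# "SDF content §1", renumbered 1-4. Column positions follow the V2000 layout.
MOLFILE = """h2o2-excerpt
  hand-made

  4  3  0  0  0  0  0  0  0  0999 V2000
   -0.3941    2.0659    0.3307 O   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5836    1.3451    0.7668 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0506    1.3459   -0.1697 H   0  0  0  0  0  0  0  0  0  0  0  0
   -2.2708    1.8377    0.2828 H   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  2  4  1  0  0  0  0
M  END
"""


def parse_v2000(text):
    """Return (atoms, bonds): atoms as (symbol, (x, y, z)), bonds as (a1, a2, order)."""
    lines = text.splitlines()
    counts = lines[3]  # lines 0-2 are the three-line header block
    if "V2000" not in counts:
        raise ValueError("only V2000 connection tables are handled in this sketch")
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = []
    for line in lines[4:4 + n_atoms]:
        xyz = (float(line[0:10]), float(line[10:20]), float(line[20:30]))
        atoms.append((line[31:34].strip(), xyz))
    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        # atom indices are 1-based; order 1/2/3 = single/double/triple
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


def bond_length(atoms, a1, a2):
    """Distance between two 1-based atom indices, in the file's units (Å here)."""
    return math.dist(atoms[a1 - 1][1], atoms[a2 - 1][1])


atoms, bonds = parse_v2000(MOLFILE)
print(Counter(symbol for symbol, _ in atoms))
print(bonds)
print(round(bond_length(atoms, 1, 2), 3))
```

The slicing follows the V2000 fixed columns; the whitespace-collapsed listings on these slides would need token splitting instead.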
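MDL SDF file • SDF = structure-data file • Wraps the molfile format: each record is a molfile (ending with "M  END"), optionally followed by data items (a "> <field name>" line, the value lines, then a blank line); records are separated by "$$$$"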
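Reading the SDF layer on top of a molfile is mostly line bookkeeping: collect lines up to `$$$$`, keep everything through `M  END` as the molfile block, then pick up `> <name>` headers and their value lines until a blank line. A minimal sketch under these assumptions (data headers without a `<name>` are not handled, multi-line values are joined with newlines); `read_sdf` and `parse_record` are hypothetical helpers, and the one-record input is hand-made.

```python
import re

# Hand-made one-record SDF (hypothetical): heavy-atom H2O2 plus two data items.
SDF_TEXT = """h2o2
  hand-made

  2  1  0  0  0  0  0  0  0  0999 V2000
   -0.3941    2.0659    0.3307 O   0  0  0  0  0  0  0  0  0  0  0  0
   -1.5836    1.3451    0.7668 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  END
> <Stoichiometry>
H2O2

> <SMI>
OO

$$$$
"""

HEADER = re.compile(r"^>.*?<([^>]*)>")  # e.g. "> <Scale factor>" -> "Scale factor"


def parse_record(lines):
    """Split one record into (molfile block, {field name: value})."""
    end = next((i for i, line in enumerate(lines) if line.split() == ["M", "END"]), None)
    if end is None:
        raise ValueError("record has no 'M  END' line")
    data, name = {}, None
    for line in lines[end + 1:]:
        match = HEADER.match(line)
        if match:
            name = match.group(1)
            data[name] = []
        elif not line.strip():
            name = None  # a blank line closes the current data item
        elif name is not None:
            data[name].append(line)
    return "\n".join(lines[:end + 1]), {k: "\n".join(v) for k, v in data.items()}


def read_sdf(text):
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":  # record separator
            records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # last record without "$$$$"
        records.append(parse_record(current))
    return records


for molblock, fields in read_sdf(SDF_TEXT):
    print(molblock.splitlines()[0], fields)
```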
SDF content §1 – molecular information
./MinCheck/C2_H6_N0_O3_F0_S0_1.log
OpenBabel04161413273D
Gaussian 09 # G3MP2B3 Opt(Cartesian,Tight,CalcAll,MaxStep=1,MaxCycles=300) QCISD
11 9 0 0 0 0 0 0 0 0999 V2000
0.4466 -1.5390 0.0292 C 0 0 0 0 0 0 0 0 0 0 0 0
1.4790 -2.1676 -0.5273 C 0 0 0 0 0 0 0 0 0 0 0 0
-0.2693 -0.5704 -0.6322 O 0 0 0 0 0 0 0 0 0 0 0 0
-0.3941 2.0659 0.3307 O 0 0 0 0 0 0 0 0 0 0 0 0
-1.5836 1.3451 0.7668 O 0 0 0 0 0 0 0 0 0 0 0 0
0.1141 -1.7508 1.0446 H 0 0 0 0 0 0 0 0 0 0 0 0
1.7979 -1.9482 -1.5413 H 0 0 0 0 0 0 0 0 0 0 0 0
2.0238 -2.9170 0.0345 H 0 0 0 0 0 0 0 0 0 0 0 0
-1.0239 -0.2837 -0.0806 H 0 0 0 0 0 0 0 0 0 0 0 0
0.0506 1.3459 -0.1697 H 0 0 0 0 0 0 0 0 0 0 0 0
-2.2708 1.8377 0.2828 H 0 0 0 0 0 0 0 0 0 0 0 0
1 6 1 0 0 0 0
2 1 2 0 0 0 0
2 8 1 0 0 0 0
3 9 1 0 0 0 0
3 1 1 0 0 0 0
4 5 1 0 0 0 0
7 2 1 0 0 0 0
10 4 1 0 0 0 0
11 5 1 0 0 0 0
M  END
SDF content §2 – input and calculated parameters
> <Scale factor>
0.96

> <Stoichiometry>
C2H6O3

> <Charge>
0

> <Multiplicity>
1

> <Molecular mass>
78.03169

> <DegreeOfFreedom>
27

> <Permanent dipole moment(B3LYP, Debye)>
1.475

> <ABC(cm-1)>
14.133 1.731 1.655

> <Scaled freq(cm-1)>
49.1 59.1 80.1 182.8 222.6 335.5 460.0 529.6 663.0 762.0 812.3 911.3 928.1 944.3 1124.8 1287.3 1299.6 1321.8 1403.2 1483.7 1689.2 3041.9 3064.2 3147.0 3408.9 3472.7 3557.0

> <IR intensities(rel.)>
4.5 3.8 6.6 7.8 25.1 93.3 16.9 79.8 60.8 214.2 73.0 2.9 55.0 16.5 33.8 210.3 56.9 126.8 4.4 22.8 90.0 19.2 0.4 8.3 59.4 559.4 26.8

> <Temp(K)>
298.150

> <Pressure(atm)>
1.00000

> <DfHg_G3MP2B3(kJ/mol)>
-269.7

> <Scaled S(J/molK)>
363.4

> <UNScaled CV(J/molK)>
98.9
SDF content §3 – molecular descriptors
> <MPD>
2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; 9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; 8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; 13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; 13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;

> <MNA>
-C(-H(-C)-C(-H-H-C)-O(-H-C)) -C(-H(-C)-H(-C)-C(-H-C-O)) -O(-H(-O)-C(-H-C-O)) -O(-H(-O)-O(-H-O)) -O(-H(-O)-O(-H-O)) -H(-C(-H-C-O)) -H(-C(-H-H-C)) -H(-C(-H-H-C)) -H(-O(-H-C)) -H(-O(-H-O)) -H(-O(-H-O))

> <SMI>
C(=C)O.OO

> <MolRT>
3

> <InChi>
InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H

> <InChiKey>
JJZZTHKXWWHOAE-UHFFFAOYSA-N

> <MCDL>
CH;CHH;3OH[2,3;;;5]

$$$$
Molecular fragment schemes • Developed in the ’50s • Screens (structural keys, fingerprints) were developed in the ’70s • They are generally long strings that can be stored efficiently in compressed form • Important role: • in providing efficient substructure searching capabilities in large chemical databases, • in similarity searching, • in clustering large data sets, • in assessing chemical diversity, • in conducting SAR and QSAR studies
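The screening and similarity roles listed above come down to simple bit operations. A minimal sketch, storing a fingerprint as a Python int used as a bit string: a substructure screen rejects a target when any query bit is missing from it, and the Tanimoto coefficient (bits shared by both over bits set in either) is one common similarity measure. The 8-bit fingerprints are made up for illustration; real structural keys and hashed fingerprints run to hundreds or thousands of bits.

```python
def popcount(bits: int) -> int:
    """Number of set bits (portable; int.bit_count() needs Python 3.10+)."""
    return bin(bits).count("1")


def passes_screen(query: int, target: int) -> bool:
    """Substructure screen: every bit set in the query must also be set in the target."""
    return (query & target) == query


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A and B| / |A or B| of two bit-string fingerprints."""
    union = popcount(a | b)
    # two empty fingerprints are treated as identical (a convention, not a rule)
    return popcount(a & b) / union if union else 1.0


# Made-up 8-bit fingerprints, for illustration only
mol_a = 0b10110010
mol_b = 0b10010011
query = 0b10010000

print(tanimoto(mol_a, mol_b))
print(passes_screen(query, mol_a), passes_screen(0b00001000, mol_a))
```

A passing screen only means the target may contain the substructure; the exact atom-by-atom match is still needed, but the cheap bit test removes most candidates first.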
Images of the optimized structure (depicted differently): GaussView, ChemDraw, www.chemicalize.org (found by searching for the InChI)
MPD (MOLPRINT 2D) • MPD = MolPrint2D descriptor format • A molecular similarity searching technique based on atom environments • Atom environments are count vectors of the heavy atoms present at each topological distance from each heavy atom of a molecule
> <MPD>
2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; 9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; 8;1-1-8;1-1-13;2-1-13; 13;1-1-2;2-1-2;2-1-9; 13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; 13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;
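The MPD string can be unpacked into one atom environment per atom. The reading used here is inferred from this single record, not taken from the MOLPRINT 2D documentation: each space-separated token is `type;distance-count-type;…`, i.e. the central atom's type followed by how many atoms of each type sit at topological distance 1 and 2. Checked against the bond list in SDF content §1, the first token fits C1 (bonded to C2, O3 and H6, with three hydrogens two bonds away). `parse_mpd` is a hypothetical helper.

```python
from collections import Counter

MPD = ("2;1-1-2;1-1-9;1-1-13;2-3-13; 2;1-1-2;1-2-13;2-1-9;2-1-13; "
       "9;1-1-2;1-1-13;2-1-2;2-1-13; 8;1-1-8;1-1-13;2-1-13; 8;1-1-8;1-1-13;2-1-13; "
       "13;1-1-2;2-1-2;2-1-9; 13;1-1-2;2-1-2;2-1-13; 13;1-1-2;2-1-2;2-1-13; "
       "13;1-1-9;2-1-2; 13;1-1-8;2-1-8; 13;1-1-8;2-1-8;")


def parse_mpd(field):
    """Return a list of (centre type, {(distance, neighbour type): count}), one per atom.

    Assumed token layout (inferred from the example): 'type;d-n-t;d-n-t;...'
    """
    environments = []
    for token in field.split():
        centre, *shells = [part for part in token.split(";") if part]
        counts = {}
        for shell in shells:
            distance, count, atom_type = map(int, shell.split("-"))
            counts[(distance, atom_type)] = count
        environments.append((int(centre), counts))
    return environments


envs = parse_mpd(MPD)
print(len(envs), Counter(centre for centre, _ in envs))
print(envs[0])
```

In this record the six environments of type 13 line up with the six hydrogens of the connection table, and the two of type 8 with the peroxide oxygens.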
MNA • MNA = Multilevel Neighbourhoods of Atoms • 2D molecular fragments suitable for use in QSAR modelling • Output: a complete descriptor fingerprint per molecule • Fragment: starting at the origin atom, each atom is appended to the descriptor immediately followed by a parenthesized list of its neighbours
> <MNA>
-C(-H(-C)-C(-H-H-C)-O(-H-C)) -C(-H(-C)-H(-C)-C(-H-C-O)) -O(-H(-O)-C(-H-C-O)) -O(-H(-O)-O(-H-O)) -O(-H(-O)-O(-H-O)) -H(-C(-H-C-O)) -H(-C(-H-H-C)) -H(-C(-H-H-C)) -H(-O(-H-C)) -H(-O(-H-O)) -H(-O(-H-O))
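The nesting rule can be reproduced for this record. A minimal sketch, assuming: atoms and bonds taken from the connection table in SDF content §1 (bond orders ignored, since they do not appear in the strings), every atom label prefixed with "-" as in the output above, and neighbours ordered by atomic number with ties broken alphabetically (the tie-break is a guess). Ring atoms are not covered. Under these assumptions the level-2 descriptors come out identical to the `<MNA>` field shown above; `adjacency` and `mna` are hypothetical helpers.

```python
from collections import defaultdict

# Connection table of the record in SDF content §1 (1-based indices)
SYMBOLS = {1: "C", 2: "C", 3: "O", 4: "O", 5: "O",
           6: "H", 7: "H", 8: "H", 9: "H", 10: "H", 11: "H"}
BONDS = [(1, 6), (2, 1), (2, 8), (3, 9), (3, 1), (4, 5), (7, 2), (10, 4), (11, 5)]
ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8}


def adjacency(bonds):
    adj = defaultdict(list)
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def mna(atom, level, symbols, adj):
    """MNA descriptor of `atom` at the given level (level 0 is the bare label)."""
    label = "-" + symbols[atom]
    if level == 0:
        return label
    # neighbours' lower-level descriptors; the origin atom reappears inside them
    neighbours = [mna(n, level - 1, symbols, adj) for n in adj[atom]]
    # order by the neighbour's element (atomic number), then by descriptor text
    neighbours.sort(key=lambda d: (ATOMIC_NUMBER[d[1:].split("(")[0]], d))
    return label + "(" + "".join(neighbours) + ")"


adj = adjacency(BONDS)
print(" ".join(mna(atom, 2, SYMBOLS, adj) for atom in sorted(SYMBOLS)))
```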
SMILES (SMI) • SMILES = Simplified Molecular Input Line Entry Specification • A linear text format that can describe the connectivity and chirality of a molecule • Specifically represents a valence model of a molecule, not a computer data structure, a mathematical abstraction, or an "actual substance"
> <SMI>
C(=C)O.OO
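In the `<SMI>` value, the dot separates two disconnected components, C(=C)O (ethenol) and OO (hydrogen peroxide), and hydrogens are implicit. As a consistency check against the `<Stoichiometry>` field, here is a toy sketch that handles only unbracketed C, N and O atoms, branches and `-`/`=`/`#` bonds (no rings, aromaticity, charges or brackets) and fills in implicit hydrogens from one fixed valence per element. `smiles_formula` and `hill_formula` are hypothetical helpers; anything beyond this subset needs a real SMILES parser.

```python
from collections import Counter

VALENCE = {"C": 4, "N": 3, "O": 2}   # one fixed valence per element (simplification)
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def smiles_formula(smiles):
    """Element counts (implicit H included) for a toy subset of SMILES."""
    counts = Counter()
    for component in smiles.split("."):        # "." separates disconnected parts
        symbols, used = [], []                  # element and bond-order sum per atom
        branch_points, prev, order = [], None, 1
        for ch in component:
            if ch in BOND_ORDER:
                order = BOND_ORDER[ch]
            elif ch == "(":
                branch_points.append(prev)      # branch starts at the previous atom
            elif ch == ")":
                prev = branch_points.pop()      # continue from the branch point
            elif ch in VALENCE:
                symbols.append(ch)
                used.append(0)
                current = len(symbols) - 1
                if prev is not None:
                    used[prev] += order
                    used[current] += order
                prev, order = current, 1
            else:
                raise ValueError(f"outside the toy subset: {ch!r}")
        for symbol, bond_sum in zip(symbols, used):
            counts[symbol] += 1
            counts["H"] += VALENCE[symbol] - bond_sum   # implicit hydrogens
    return counts


def hill_formula(counts):
    """Hill order: C, then H, then the rest alphabetically (all alphabetical without C)."""
    if counts["C"]:
        order = ["C", "H"] + sorted(e for e in counts if e not in ("C", "H"))
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order if counts[e])


print(hill_formula(smiles_formula("C(=C)O.OO")))
```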
MolRT (Easter egg: it’s molarity…)
InChI • InChI = IUPAC International Chemical Identifier • A reliable, computer-readable way to represent chemical identity • A detailed representation of the chemical structure • A simple but unique identifier for molecules (like a barcode) • Layers are separated by the delimiter "/": • Main layer • Charge layer • Stereochemical layer • Isotopic layer • Fixed-H layer • Reconnected layer
> <InChi>
InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H
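Because the layers are delimited by "/", splitting the string already exposes the structure. A minimal sketch (the helper `inchi_layers` is hypothetical): after the `InChI=` prefix come the version (`1S`, standard InChI version 1) and the formula, and each further layer is tagged by its first letter, e.g. `c` for connections and `h` for hydrogens in the main layer. Within a layer, components are separated by `.` in the formula and by `;` elsewhere. The result is an ordered list rather than a dict because some tags can occur more than once (for instance inside a fixed-H layer).

```python
def inchi_layers(inchi):
    """Split an InChI string into [(tag, content), ...] in their original order."""
    prefix, sep, body = inchi.partition("=")
    if prefix != "InChI" or not sep:
        raise ValueError("not an InChI string")
    version, formula, *rest = body.split("/")
    layers = [("version", version), ("formula", formula)]
    # every later layer starts with a one-letter tag (c, h, q, p, b, t, m, s, i, f, r, ...)
    layers += [(layer[0], layer[1:]) for layer in rest if layer]
    return layers


layers = inchi_layers("InChI=1S/C2H4O.H2O2/c1-2-3;1-2/h2-3H,1H2;1-2H")
for tag, content in layers:
    print(tag, content)
```

Reading the numbers as InChI's canonical atom numbers (carbons first), the `c` layer gives a C1-C2-O3 chain plus an O1-O2 pair, and the `h` layer puts one H on atoms 2 and 3 and two on atom 1 of the first component (ethenol) and one H on each oxygen of H2O2.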
InChIKey • A shortened (hashed), web-search-friendly form of the InChI • Its length is fixed at 27 characters: three blocks of 14, 10 and 1 characters joined by hyphens • The first 14 characters encode the molecular skeleton (connectivity) • The next block contains 8+1 characters, followed by the version character: • the 8-character part encodes stereochemistry and isotopic substitution information • the +1 character defines the kind of InChIKey (S = standard, N = non-standard) • Next character: the InChI version used • Final character: protonation indicator
> <InChiKey>
JJZZTHKXWWHOAE-UHFFFAOYSA-N
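The fixed layout makes an InChIKey easy to validate and take apart. A minimal sketch with a regular expression (the helper `split_inchikey` is hypothetical); the comments follow the description above, plus the usual meaning of `A` (InChI version 1) and of a final `N` (neutral, no protonation change).

```python
import re

# 14 letters - 8 letters + kind flag + version - protonation letter (27 characters)
INCHIKEY = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key):
    match = INCHIKEY.match(key)
    if not match:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    skeleton, stereo_isotope, kind, version, protonation = match.groups()
    return {
        "skeleton": skeleton,              # hash of the connectivity (main) layer
        "stereo_isotope": stereo_isotope,  # hash of stereo/isotopic layers
        "kind": kind,                      # S = standard, N = non-standard
        "version": version,                # A = InChI version 1
        "protonation": protonation,        # N = neutral
    }


parts = split_inchikey("JJZZTHKXWWHOAE-UHFFFAOYSA-N")
print(parts)
```

Comparing only the first block of two keys is a quick way to see whether two structures share the same skeleton: stereoisomers agree there, constitutional isomers do not.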
MCDL • MCDL = Modular Chemical Descriptor Language; first published in 2001 • Developed for the linear representation of structural and other chemical information in chemical databases • Similar to InChI: both languages are modular; constitution, connectivity, and stereochemistry are represented by individual "modules" • MCDL places hydrogen atoms directly with their heavy atoms, whereas InChI lists them in a separate layer
> <MCDL>
CH;CHH;3OH[2,3;;;5]
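The `<MCDL>` value can be unpacked in a similar way. This sketch uses a reading inferred from this one record only, not from the MCDL specification: the part before `[` lists fragments separated by `;`, each a heavy atom written with its hydrogens, with a leading number repeating a fragment (`3OH` = three OH fragments); inside the brackets, the i-th `;`-separated entry lists the fragments bonded to fragment i, each bond written once. Under that reading, CH;CHH;3OH[2,3;;;5] gives C1 bonded to C2 and O3, and O4 bonded to O5, which matches the connection table in SDF content §1. Bond orders are not part of this module, and anything after `]` is ignored. `parse_mcdl` is a hypothetical helper.

```python
import re


def parse_mcdl(mcdl):
    """Return (fragments, bonds) under the assumed reading described above."""
    composition, _, rest = mcdl.partition("[")
    connectivity = rest.split("]", 1)[0]     # ignore any modules after the bracket
    fragments = []
    for token in composition.split(";"):
        # optional repeat count, then the fragment itself (e.g. "3OH" -> 3 x "OH")
        repeat, fragment = re.fullmatch(r"(\d*)(\D.*)", token).groups()
        fragments += [fragment] * (int(repeat) if repeat else 1)
    bonds = []
    for i, entry in enumerate(connectivity.split(";"), start=1):
        bonds += [(i, int(j)) for j in entry.split(",") if j]
    return fragments, bonds


fragments, bonds = parse_mcdl("CH;CHH;3OH[2,3;;;5]")
print(fragments)
print(bonds)
print(sum(f.count("H") for f in fragments), "hydrogens")
```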
Other useful links and references • Todeschini R, Consonni V: Molecular Descriptors for Chemoinformatics, 2nd revised and enlarged edition, Wiley-VCH, Weinheim, 2009. ISBN 978-3-527-31852-0 • Bender A, Mussa HY, Glen RC, Reiling S: Similarity searching of chemical databases using atom environment descriptors (MOLPRINT 2D): evaluation of performance, J Chem Inf Comput Sci. 2004 Sep-Oct;44(5):1708-18. • Gakh AA, Burnett MN: Modular Chemical Descriptor Language (MCDL): composition, connectivity, and supplementary modules, J Chem Inf Comput Sci. 2001 Nov-Dec;41(6):1494-9. • http://arxiv.org/ftp/arxiv/papers/1311/1311.3723.pdf • http://openbabel.org/wiki/Multilevel_Neighborhoods_of_Atoms • http://openbabel.org/wiki/SMILES • http://www.daylight.com/meetings/summerschool98/course/dave/smiles-intro.html • http://www.inchi-trust.org/ (and references therein) • http://www.iupac.org/home/publications/e-resources/inchi/download.html (and references therein) • http://www.chemspider.com/inchi-resolver/
Your objectives for today • To check your .sdf file for two chosen isomers • To collect all the codes • To compare them with each other and find the differences
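Once the data items of both isomers are in dictionaries (from any SDF reader), a field-by-field comparison lists which codes agree and which differ. A minimal sketch; `compare_codes` is a hypothetical helper, record A reuses values from the SDF record above, and record B is a hypothetical second C2H6O3 isomer (ethane-1,1,2-triol, OCC(O)O) given only by its SMILES, stoichiometry and charge.

```python
def compare_codes(record_a, record_b):
    """Split the fields of two records into identical, differing and one-sided ones."""
    shared = sorted(record_a.keys() & record_b.keys())
    same = [f for f in shared if record_a[f].strip() == record_b[f].strip()]
    different = {f: (record_a[f], record_b[f]) for f in shared if f not in same}
    only_one = sorted(record_a.keys() ^ record_b.keys())   # present in just one record
    return same, different, only_one


# Record A: values from the SDF record above; record B: hypothetical isomer
isomer_a = {"Stoichiometry": "C2H6O3", "Charge": "0", "SMI": "C(=C)O.OO",
            "InChiKey": "JJZZTHKXWWHOAE-UHFFFAOYSA-N"}
isomer_b = {"Stoichiometry": "C2H6O3", "Charge": "0", "SMI": "OCC(O)O"}

same, different, only_one = compare_codes(isomer_a, isomer_b)
print("same:", same)
print("different:", different)
print("missing in one record:", only_one)
```

For isomers, the stoichiometry and the InChI formula layer should agree while the structural codes (SMILES, MNA, InChIKey first block) differ, which is exactly what such a comparison makes visible.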