BeeHive: a data mining tool at Biovitrum and iNovacia
Mats Dahlberg
Research Informatics, iNovacia AB, Sweden
ChemAxon UGM, Budapest, June 7, 2006

Research Informatics’ Philosophy
• All data in Oracle
  • Safe, pharma industry standard (e.g. many chemical cartridges: ChemAxon, MDL, Accelrys, ...)
  • "Data is our asset. Programs come and go."
• Integration through the database layer
  • ...but hidden from the users. Multiple front-ends allowed
• Applications rapidly adapted to users' needs
  • Close connection between developers and users
  • Workflow support requires full control over the code
• Unorthodox solutions are allowed
  • Sometimes quick and dirty development
  • Sometimes unstable code (but usually fixed quickly...)
  • Sometimes a non-standard technical platform (e.g. the Bee language)

BeeHive
• Function
  • Main repository for (almost) ALL research data
  • Used by all project teams
  • Technical platform for various modules
• Features
  • Advanced on-the-fly joins of DB tables
  • Versatile handling of lists (compounds, batches, projects, ...) and queries
  • Data grouping ("one line per compound")
  • Fully customisable through meta-data; easy to add new branches (CBT, ELN stats, etc.)
  • Structure searching through the ChemAxon Oracle cartridge
  • Built on the Bee language from MolSoft LLC, San Diego
• Status
  • Moved from MDL's cartridge in 2006
  • Business critical. Approx. 250 users throughout R&D

The heart – just a SQL generator…
• Defines column types and costs for all joinable columns
• All possible joins are pre-calculated (a travelling salesman problem across more than 300 tables)
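
The idea of a cost-driven SQL generator can be sketched as follows. This is a minimal illustration, not BeeHive's actual code (which is built on the Bee language): it treats tables as graph nodes and joinable columns as weighted edges, and it uses Dijkstra's shortest-path search as a stand-in for whatever optimisation BeeHive performs. The table names, column names and costs are all hypothetical.

```python
import heapq

# Hypothetical join metadata: (table_a, table_b, shared_column, cost).
JOINS = [
    ("compound", "batch", "compound_id", 1),
    ("batch", "assay_result", "batch_id", 2),
    ("compound", "assay_result", "compound_id", 5),
    ("batch", "solubility", "batch_id", 1),
]


def build_graph(joins):
    """Turn the join list into an undirected adjacency map."""
    graph = {}
    for a, b, column, cost in joins:
        graph.setdefault(a, []).append((b, column, cost))
        graph.setdefault(b, []).append((a, column, cost))
    return graph


def cheapest_join_path(graph, start, goal):
    """Dijkstra search; returns a list of (left, right, column) join steps."""
    queue = [(0, start, [])]
    seen = set()
    while queue:
        cost, table, path = heapq.heappop(queue)
        if table == goal:
            return path
        if table in seen:
            continue
        seen.add(table)
        for nxt, column, step_cost in graph.get(table, []):
            if nxt not in seen:
                step = (table, nxt, column)
                heapq.heappush(queue, (cost + step_cost, nxt, path + [step]))
    return None


def generate_sql(graph, start, goal, select="*"):
    """Emit a SELECT statement that follows the cheapest join path."""
    path = cheapest_join_path(graph, start, goal)
    if path is None:
        raise ValueError(f"no join path from {start} to {goal}")
    sql = f"SELECT {select} FROM {start}"
    for left, right, column in path:
        sql += f" JOIN {right} ON {left}.{column} = {right}.{column}"
    return sql


if __name__ == "__main__":
    print(generate_sql(build_graph(JOINS), "compound", "assay_result"))
```

With these made-up costs the generator routes the query through `batch` (total cost 3) rather than taking the direct but more expensive join (cost 5). Pre-calculating the paths for every table pair, as the slide describes, would amount to running such a search once per pair and caching the result.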

Meta data structure
[Diagram labels: Prog 1, Prog 2, Cross database client. Example from Biovitrum]
• Define entities and clean up the dictionaries
  • Compound numbers, protein targets, batches, plasmids, ...
• One source for every entity: possible to validate numbers, no misspellings, improved data quality
• This is the core of integration, not a particular client or system
• None of this comes out-of-the-box!
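
A minimal sketch of what "one source for every entity" buys in practice: every incoming value is checked against a single authoritative dictionary, so misspelled identifiers are rejected at entry. The entity names, identifier formats and dictionary contents below are hypothetical; in the setup described here the dictionaries would live in Oracle rather than in a Python literal.

```python
# Hypothetical entity dictionaries: one authoritative set per entity type.
ENTITY_SOURCES = {
    "compound": {"BVT-0001", "BVT-0002", "BVT-0003"},
    "target": {"TARGET-A", "TARGET-B"},
}


def validate(entity_type, values):
    """Split values into (valid, rejected) using the entity's single source."""
    known = ENTITY_SOURCES[entity_type]
    valid = [v for v in values if v in known]
    rejected = [v for v in values if v not in known]
    return valid, rejected


if __name__ == "__main__":
    ok, bad = validate("compound", ["BVT-0001", "BVT-001", "BVT-0003"])
    print("accepted:", ok)
    print("rejected:", bad)
```

Because every program validates against the same dictionary, a typo such as `BVT-001` is caught once, at entry, instead of surfacing later as an unmatched row in a join.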

BeeHive Overview
[Screenshot callouts]
• Query builder with structural searching
• Navigate through all tables
• Activity, solubility, chemist etc.

Query builder
• All unique values in drop-down lists
  • No hard-coded values
  • Easy to spot errors
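
Populating drop-down lists from the data itself can be sketched with a `SELECT DISTINCT` query. The example uses Python's built-in `sqlite3` module as a stand-in for Oracle, and the table, column and chemist names are hypothetical.

```python
import sqlite3


def dropdown_values(conn, table, column):
    """Return the sorted distinct non-null values of one column."""
    # Identifiers cannot be bound as SQL parameters. In this sketch they are
    # assumed to come from trusted metadata, never from user input.
    query = (
        f"SELECT DISTINCT {column} FROM {table} "
        f"WHERE {column} IS NOT NULL ORDER BY {column}"
    )
    return [row[0] for row in conn.execute(query)]


def demo_connection():
    """Build a small in-memory table with one deliberately misspelled name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE assay_result (compound_id TEXT, chemist TEXT)")
    conn.executemany(
        "INSERT INTO assay_result VALUES (?, ?)",
        [
            ("BVT-0001", "Andersson"),
            ("BVT-0002", "Andersson"),
            ("BVT-0003", "Anderson"),  # misspelling
            ("BVT-0004", "Berg"),
            ("BVT-0005", None),
        ],
    )
    return conn


if __name__ == "__main__":
    print(dropdown_values(demo_connection(), "assay_result", "chemist"))
```

Since nothing is hard-coded, the misspelled `Anderson` shows up in the list right next to `Andersson`, which is what makes such errors easy to spot.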

Extraction of data for SAR analysis
• One compound per line
• Average IC50 and SD values
• Hill number from ActivityBase
• Structure pop-up window
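
The "one compound per line" grouping can be illustrated as below. The measurements are made-up numbers, and the sketch assumes an arithmetic mean and the sample standard deviation; the slides do not say which averaging BeeHive uses (IC50 values are often averaged on a log scale instead).

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical raw measurements: (compound number, IC50 in nM).
MEASUREMENTS = [
    ("BVT-0001", 12.0),
    ("BVT-0001", 18.0),
    ("BVT-0002", 150.0),
]


def one_line_per_compound(rows):
    """Collapse repeated measurements into one summary row per compound."""
    grouped = defaultdict(list)
    for compound, ic50 in rows:
        grouped[compound].append(ic50)
    summary = {}
    for compound, values in grouped.items():
        # The sample SD is undefined for a single measurement.
        sd = stdev(values) if len(values) > 1 else None
        summary[compound] = {"n": len(values), "mean_ic50": mean(values), "sd": sd}
    return summary


if __name__ == "__main__":
    for compound, stats in one_line_per_compound(MEASUREMENTS).items():
        print(compound, stats)
```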

Systems and applications: BeeHive Modules That Use JChem
• CIMS
  • Chemical Inventory Management System
  • Keeps track of all chemicals (bottle history, location, risk phrases, etc.)
  • Replaced previous MDL system
  • Fully barcoded (bottles, shelves, people, ...)
  • Has improved compliance, reagent availability and speed of inventory work
• Reagent Search
  • ACX database of chemical catalogues from CambridgeSoft
  • Cross-linked to CIMS
  • "Give me all amines under 250 Da and show in-house on top of the list"

Systems and applications: BeeHive Modules (cont'd)
• ChemSpec
  • Registration of all new compounds
  • Structure-based logic for new compounds and batches
  • BVT (iNo) number assignment
  • Connection point for analytical data and requests
  • Used by all medicinal and analytical chemists
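
A rough sketch of structure-based registration logic: a structure not seen before receives a new compound number, while a structure that is already registered only receives a new batch number. This is not ChemSpec's implementation. The number format is hypothetical, and the sketch assumes each structure arrives as an already canonicalised string (for example a canonical SMILES produced by a chemistry toolkit), so that identical structures compare equal.

```python
class Registry:
    """Toy compound registry keyed by a canonical structure string."""

    def __init__(self, prefix="BVT"):
        self.prefix = prefix
        self._numbers = {}  # canonical structure -> compound number
        self._batches = {}  # compound number -> batches registered so far

    def register(self, structure_key):
        """Return (compound_number, batch_number) for a submitted sample."""
        number = self._numbers.get(structure_key)
        if number is None:
            # New structure: assign the next compound number (format is made up).
            number = f"{self.prefix}-{len(self._numbers) + 1:06d}"
            self._numbers[structure_key] = number
            self._batches[number] = 0
        # Known or new, every submission becomes a new batch of that compound.
        self._batches[number] += 1
        return number, self._batches[number]


if __name__ == "__main__":
    registry = Registry()
    print(registry.register("CCO"))       # new compound, first batch
    print(registry.register("CCO"))       # same structure, second batch
    print(registry.register("c1ccccc1"))  # different structure, new compound
```

In a production system the duplicate check would be a structure search in the chemistry cartridge rather than a string comparison.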

What is next on the list?
• JChem calculated properties on all molecule databases
  • pKa, logP, logD, ...
• Generation of diverse screening sets on the fly (BCUT?)
• ...

Summary - informatics
• Data sharing is crucial
• Excel is not enough!
• No database, no modelling
• Each organisation must define its own meta data
• You need a database administrator
• Define the data structure first; applications can be improved gradually