MartLoader Anthony Cros Joachim Baran
Overview
• Motivation
• Creating ICGC marts
• Performance evaluation
• Improvisation
• Current MartLoader implementation
• MartLoader in development
  • Aims
  • Architecture
    • Front-end: configuration and mart planning
    • Back-end: parallelization and mart generation
Creating ICGC Marts
• Import data in line with the ICGC Submission Manual
• Transform data values/ranges
  • for BioMart compatibility
  • for presentation through BioMart’s web-interface
• Generate a mart database
Comparison to MartBuilder
• Similar workflows, but different requirements
[Diagram: Builder (3NF; Schema Discovery; Denormalization; Generic) vs. Loader (Not in 3NF; Decoding and Rewriting; Validation; Transformation*)]
*Currently focusing on ICGC only.
Current MartLoader: over to Anthony
Performance Evaluation
• Is the choice of the database system relevant?
  • PostgreSQL vs. MySQL MyISAM vs. MySQL InnoDB
• Is using UNIX file-based joins faster than database joins?
  • relational databases are optimized for joining data
  • file-based joins require
    • sorting of the key columns of the input files
    • joining on the sorted key columns
    • importing the result into the database
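The three file-based steps listed above (sort on the key columns, join on the sorted keys, import the result) can be sketched in Python. This is only an illustration of the technique, not the MartLoader code: Python's `sorted` stands in for UNIX `sort`, the merge loop stands in for UNIX `join`, SQLite stands in for the PostgreSQL/MySQL import, and the table, column and sample names are hypothetical.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter


def sort_rows(rows, key_index=0):
    """Step 1: sort rows on the key column (stand-in for UNIX `sort`)."""
    return sorted(rows, key=itemgetter(key_index))


def merge_join(left, right, left_key=0, right_key=0):
    """Step 2: inner join of two row lists already sorted on their keys."""
    left_groups = groupby(left, key=itemgetter(left_key))
    right_groups = groupby(right, key=itemgetter(right_key))
    lk, lrows = next(left_groups, (None, None))
    rk, rrows = next(right_groups, (None, None))
    while lrows is not None and rrows is not None:
        if lk < rk:
            lk, lrows = next(left_groups, (None, None))
        elif lk > rk:
            rk, rrows = next(right_groups, (None, None))
        else:
            # Matching keys: pair every left row with every right row.
            right_list = list(rrows)
            for lrow in lrows:
                for rrow in right_list:
                    rest = [c for i, c in enumerate(rrow) if i != right_key]
                    yield list(lrow) + rest
            lk, lrows = next(left_groups, (None, None))
            rk, rrows = next(right_groups, (None, None))


def load_joined(conn, rows):
    """Step 3: bulk-import the joined rows (hypothetical table layout)."""
    conn.execute(
        "CREATE TABLE donor_specimen (donor_id TEXT, sex TEXT, specimen_id TEXT)"
    )
    conn.executemany("INSERT INTO donor_specimen VALUES (?, ?, ?)", rows)
    conn.commit()


# Hypothetical sample data, deliberately unsorted.
donors = [["D2", "female"], ["D1", "male"]]
specimens = [["D1", "S1"], ["D2", "S2"], ["D1", "S3"]]

joined = list(merge_join(sort_rows(donors), sort_rows(specimens)))
conn = sqlite3.connect(":memory:")
load_joined(conn, joined)
```

Because both inputs are sorted first, the join is a single linear pass over each file, and the database only has to perform a bulk insert rather than the join itself.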
Performance Evaluation
• PostgreSQL slightly outperforms MySQL
• Database backend is not of major importance, though
Performance Evaluation
• UNIX file-based sort + join + SQL import is fastest
Developing an Improved MartLoader
• Rapid configuration with a bespoke UI
• Optimized mart generation and data loading
• Being generic & flexible
Exchangeable components:
• Front-end
  • Abstract data model
  • Planning algorithm
• Back-end
  • Parallelization
  • Mart generation
[Architecture diagram labels: Backend Interface File; Abstract Data Model File; Graphical User Interface; Planning Algorithm; File-Processing; Mart Generation]
Versatile Use of Interfaces
• Serve ICGC specific needs
  • interface elements reflecting ICGC Submission Manual parameters
  • consistency checking and verification of the data model
• WormBase / WormMart
  • ease the migration to 0.8
  • demonstrate MartLoader’s generic interface to inspire other developers
• MartBuilder
  • incorporate the advanced backend into MartBuilder
[Diagram labels: .xls; *; ICGC configuration; MartBuilder; WormBase; tailored interface; input assistance and verification; Backend Interface File; Abstract Data Model File; Graphical User Interface; Planning Algorithm]
*AceDB: a C. elegans database
Mart Planning
• Decoding and Rewriting
• Validation
• Transformation
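The slide names only the three planning stages, so the sketch below is an assumption about how they might be chained, not a description of MartLoader itself. The code list, the field names (`sex`, `age`) and the function names are all hypothetical and are not taken from the ICGC Submission Manual.

```python
# Hypothetical code list; NOT taken from the ICGC Submission Manual.
SEX_CODES = {"1": "male", "2": "female"}


def decode(record):
    """Decoding and rewriting: replace coded values with readable terms."""
    out = dict(record)
    out["sex"] = SEX_CODES.get(record["sex"], record["sex"])
    return out


def validate(record):
    """Validation: return a list of problems found in a decoded record."""
    problems = []
    if record["sex"] not in SEX_CODES.values():
        problems.append("unknown sex value: " + repr(record["sex"]))
    if not record["age"].isdigit():
        problems.append("age is not a whole number: " + repr(record["age"]))
    return problems


def transform(record):
    """Transformation: convert values into the form stored in the mart."""
    out = dict(record)
    out["age"] = int(record["age"])
    return out


def plan(records):
    """Run each record through the three stages; collect rejects separately."""
    accepted, rejected = [], []
    for record in records:
        decoded = decode(record)
        problems = validate(decoded)
        if problems:
            rejected.append((record, problems))
        else:
            accepted.append(transform(decoded))
    return accepted, rejected


# Hypothetical input: one well-formed record and one faulty record.
accepted, rejected = plan([
    {"sex": "1", "age": "54"},
    {"sex": "9", "age": "unknown"},
])
```

Keeping the stages as separate functions mirrors the idea of exchangeable components: a different submission format would only need its own decoding table and validation rules.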
Backend: over to Anthony