HKIS Project: An integrated platform for data driven biological experiments. Brussels, 18th-19th March 2004. HKIS Project Partners: ISoft (F), Coordinator; Curie Institute (F); Ulm University Hospital (G); European Oncology Institute (I); Informatics Research Laboratory (F).
HKIS Project: An integrated platform for data driven biological experiments
Brussels, 18th-19th March 2004
HKIS – Workshop – Brussels 18-19 March 2004
HKIS Project Partners
• ISoft (F) – Coordinator
• Curie Institute (F)
• Ulm University Hospital (G)
• European Oncology Institute (I)
• Informatics Research Laboratory (F)
HKIS Platform: user requirements and project architecture
• The HKIS platform is an integration platform aimed at supporting bio experts in their data driven experiments.
• Supports all phases of data driven experimentation
  • Exploration
  • Studies
  • Automatisation, reuse and sharing
• Allows sharing of information and know-how between multidisciplinary teams
  • Physicians
  • Bio-informaticians
  • Bio-statisticians
  • Biologists
HKIS Platform: main features
• It provides functions for
  • Accessing the data
  • Querying the data
  • Processing the data
  • Storing, reporting and comparing the results
  • Replaying and automating data driven experiments
HKIS Platform: main features
• Access to any data from any source
  • Genes
  • Chromosomes
  • Sequences
  • Patient
  • Other
  • Bio community reference data
  • Array data
  • Clinical data
  • Other
• Crossing of any data with any data
  • Relate immediately any data from any source to any other available data
    • Annotations, identifiers, functions, …
  • Get “semantically” related information
  • No need to know the details of formats: transparent mapping
• Embedding of any data processing
  • External tools embedding
    • Blast, Xcluster, Cluster, TreeView, …
  • Scripts and algorithms embedding
    • R, S+, Matlab, Perl, …
• Very high performance
  • Volume, speed, power of expression (e.g. 1 million lines per second, terabytes)
• Ease of use, real time and fully interactive, no programming
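The "crossing of any data with any data" and "transparent mapping" bullets can be pictured as mapping each source to a common record shape and then joining on a shared identifier. The Python sketch below only illustrates that idea; it is not HKIS or AMADEA code. The sample data, the field names and the helpers `read_records` and `cross` are hypothetical.

```python
import csv
import io

# Hypothetical sample sources in two different formats (not real HKIS data).
ARRAY_CSV = "gene_id,expression\nG1,2.5\nG2,0.4\nG3,1.1\n"
ANNOTATION_TSV = "gene_id\tfunction\nG1\tkinase\nG3\ttransporter\n"


def read_records(text, delimiter):
    """Map a delimited text source to a list of dicts, hiding its format."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def cross(left, right, key):
    """Inner-join two record lists on a shared identifier field."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


arrays = read_records(ARRAY_CSV, ",")
annotations = read_records(ANNOTATION_TSV, "\t")
crossed = cross(arrays, annotations, "gene_id")
for record in crossed:
    print(record["gene_id"], record["expression"], record["function"])
```

Once both sources are plain records, the join no longer depends on whether the data arrived comma-separated or tab-separated, which is the point of hiding format details from the user.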
HKIS Platform: Definition
• The HKIS platform IS NOT
  • A monolithic integrated tool
    • Like Rosetta, GeneSpring, …
  • A graphical programming environment
    • Like Clementine, Insightful Miner, …
  • A metadata model or an interface to existing databases
    • Like Integr8, …, SRS, …
  • A set of newly developed algorithms
• The HKIS platform IS
  • An integration platform
    • Allowing traceable and re-usable integration of data processing chains
    • Based on a general purpose real time data processing engine
  • An open, extensible and modular architecture
    • Allowing addition of existing algorithms and tools or specific ones
  • A connectivity platform for data integration
    • Allowing any data crossing whatever their origin and volume
  • An interactive support for the data driven experimentation methodology
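A "traceable and re-usable" data processing chain can be sketched as an ordered list of named steps that records each intermediate result and can be run again on new data. The Python below is a minimal illustration under that assumption; the `Chain` class and the step names are hypothetical and do not describe the actual HKIS engine.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class Chain:
    """A hypothetical processing chain that records what each step produced."""
    steps: List[Tuple[str, Callable[[Any], Any]]] = field(default_factory=list)
    trace: List[Tuple[str, Any]] = field(default_factory=list)

    def add(self, name, func):
        self.steps.append((name, func))
        return self  # allow chained .add() calls

    def run(self, data):
        self.trace = []  # start a fresh trace for each run
        for name, func in self.steps:
            data = func(data)
            self.trace.append((name, data))
        return data


chain = (
    Chain()
    .add("drop_missing", lambda values: [v for v in values if v is not None])
    .add("threshold", lambda values: [v for v in values if abs(v) >= 1.0])
    .add("sort", sorted)
)

first = chain.run([0.2, None, -1.5, 2.0])
second = chain.run([3.0, None, 0.1])  # the same chain replayed on new data
print(first, second, chain.trace)
```

Keeping the step list separate from the data is what makes the chain re-usable, and the per-step trace is what makes a result traceable back to the operations that produced it.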
HKIS Platform: data access features
• An open connectivity platform
  • An answer to the lack of standards
  • Gives homogeneous access to
    • EMBL, RefSeq, UniGene, LocusLink, MapView, NetAffx, Swiss-Prot, TrEMBL, GO, Golden Path
    • Chip array data, CGH data, clinical data, etc.
    • Access to EnsEMBL, GenBank, dbSTS, dbSNP, InterPro, OMIM, GeneCards and HUGO is in development
  • Generic parsing mechanisms available for easy extension
    • Adding a new data source is a matter of days
• Instant data crossing
  • E.g. DNA arrays with CGH
  • Gene grouping by function / location / …
  • Generic data joining mechanisms
• Allows fast answers to complex queries
  • E.g. Retrieve all ESTs for which the 3' and 5' end sequences are both present in available databases.
  • E.g. Retrieve the list of Swiss-Prot proteins encoded by a gene present in the genomic region surrounding a list of STS markers.
  • E.g. From a microarray experiment, among the differentially expressed genes, retrieve the ones encoding a protein having a signal peptide.
[Figure: Distant Databases]
…/…
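The first example query (ESTs for which both the 3' and 5' end sequences are present) amounts to grouping records by a shared key and keeping the groups that contain both ends. The Python sketch below assumes, for illustration only, that each EST record carries a clone identifier linking the two reads; the records and the function `clones_with_both_ends` are hypothetical and do not reflect how HKIS evaluates the query.

```python
from collections import defaultdict

# Hypothetical EST records: (accession, clone identifier, read end).
EST_RECORDS = [
    ("EST001", "cloneA", "5'"),
    ("EST002", "cloneA", "3'"),
    ("EST003", "cloneB", "5'"),
    ("EST004", "cloneC", "3'"),
    ("EST005", "cloneC", "5'"),
]


def clones_with_both_ends(records):
    """Return {clone: sorted accessions} for clones sequenced from both ends."""
    ends = defaultdict(set)
    accessions = defaultdict(list)
    for accession, clone, end in records:
        ends[clone].add(end)
        accessions[clone].append(accession)
    return {
        clone: sorted(accessions[clone])
        for clone in ends
        if {"5'", "3'"} <= ends[clone]  # both read ends are present
    }


paired = clones_with_both_ends(EST_RECORDS)
print(paired)
```

The other two example queries follow the same pattern: join on a shared identifier or genomic position, then filter on a property of the joined records.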
HKIS Platform: Data quality
• Data quality support
  • Quality is essential but never really addressed
    • Difficult, time-consuming and costly
    • Risk of wrong results
  • All functions to detect, count, correct and report
    • At the “morphological” level
      • Missing values, outliers, bad formats
    • At the biological level
      • Inconsistent data, contradictory data, bias
  • All functions to monitor quality evolution
    • Replay of data quality analysis
    • Comparison of successive replays
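The "morphological" checks and the comparison of successive replays can be illustrated with a small report function run twice on a column of raw values. The Python below is a sketch under stated assumptions: the outlier rule (distance from the median greater than five times the median absolute deviation) is a common heuristic chosen for the example, not the rule HKIS applies, and the function names and sample values are hypothetical.

```python
import statistics


def quality_report(raw_values, mad_factor=5.0):
    """Count missing values, unparsable entries and outliers in one column."""
    missing, bad_format, numbers = 0, 0, []
    for raw in raw_values:
        if raw is None or raw.strip() == "":
            missing += 1
            continue
        try:
            numbers.append(float(raw))
        except ValueError:
            bad_format += 1
    outliers = 0
    if numbers:
        median = statistics.median(numbers)
        mad = statistics.median(abs(x - median) for x in numbers)
        if mad > 0:  # skip the rule when the values show no spread
            outliers = sum(1 for x in numbers if abs(x - median) > mad_factor * mad)
    return {"missing": missing, "bad_format": bad_format, "outliers": outliers}


def compare_reports(before, after):
    """Return the change in each count between two replays of the analysis."""
    return {key: after[key] - before[key] for key in before}


# Hypothetical column before and after a correction pass.
run1 = quality_report(["1.0", "1.2", "", "n/a", "0.9", "1.1", "1.0", "25.0"])
run2 = quality_report(["1.0", "1.2", "1.0", "1.1", "0.9", "1.1", "1.0", "1.2"])
print(run1, run2, compare_reports(run1, run2))
```

Negative values in the comparison mean fewer problems in the later replay, which is one simple way to monitor how quality evolves.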
HKIS Platform: Implementation
• The HKIS platform is based on the AMADEA data morphing technology
  • General purpose data access and data transformation engine
  • Graphical design of data transformation processes
  • Developed in C++
• HKIS platform developments
  • Starting from biology user requirements
  • Extension to allow the definition of custom libraries
    • Specific processes
    • Specific data access modules
HKIS Platform: Implementation
[Architecture diagram. Labels: Curie, Ulm & IEO experimentation projects; Data Morphing Engine; Library manager; Data heap; Embedding libraries; Data access libraries; General purpose libraries; Biology libraries; Existing algos & scripts; Specific algos & scripts; External tools; Existing tools; Web databanks]
HKIS Platform: Conclusion
• A unique solution for
  • Data processing: an answer to the lack of standards
  • Immediate access to and crossing of large volumes of heterogeneous biological data
  • Traceable and re-usable experimentation scenarios
  • Transparent, seamless embedding of existing and/or specialised tools and algorithms
  • Deployment through the web
• Performance in terms of
  • Speed (10 to 100 times faster than Perl or SQL)
  • Large volumes (terabytes of data)
  • Ease of use (no programming)
  • Power of expression (very powerful general purpose data transformation engine)
• Team communication
  • Informatics, bio-informatics, bio-statistics, biology, medicine
• A breakthrough in the bioinformatics approach (according to the workshop report on “Bioinformatics – structures for the future”, June 2003)
HKIS Platform: Conclusion 2
• Current status
  • 9 months until the end of the project
  • First prototype available
  • Full integration of selected algorithms
  • Evaluation in progress
  • Web interface development
• Contact
  • hkis@isoft.fr
  • www.hkis-project.com