BRIDGES Status Report • Dr Richard Sinnott • Technical Director, National e-Science Centre • Deputy Director Technical, Bioinformatics Research Centre • University of Glasgow • ros@dcs.gla.ac.uk • 18th March 2004
Overview • Review goals of Bridges project • Briefly summarise technical approach • Outline achievements thus far • Demonstration • Plans for the future
Bridges Goals • High blood pressure affects 25% of adults in western societies • The Cardiovascular Functional Genomics (CFG) project is investigating this through physiological models of hypertension in the rat • Bridges is a supporting project to CFG and will provide Grid infrastructure to facilitate scientific research • CFG project partners are distributed but need to access and integrate various software and, especially, data resources • The main aim of BRIDGES is to develop re-useable infrastructure for data federation that addresses the associated security concerns
CFG Partner Distribution • [Figure: partner sites at Edinburgh, Glasgow, Leicester, Oxford, the Netherlands and London, each holding private data, alongside public curated data and shared data]
Problems to be addressed • BRIDGES will address the following problems facing CFG biologists • How to integrate data with multiple levels of security including public data, project only data and private data? • How to search multiple distributed databases through single optimised queries? • How to use multiple tools in a coordinated (and automated) manner, e.g. how to develop re-useable workflows for the CFG scientists? • Integration of a range of bioinformatics analysis and visualisation tools, e.g. BLAST, genome browsers, etc. • How to deal with inconsistencies of online databases and possible “dirty data”? • How to get more “up to date” data? • Make it all user friendly… • portals, • hidden infrastructure, e.g. security authorisation
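The three levels of data security named above (public, project-only and private) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Visibility`, `Record` and `User` types, the rule that private data is readable only from the owning site, and the record names are hypothetical and are not taken from the BRIDGES implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Visibility(Enum):
    PUBLIC = "public"    # anyone may read
    PROJECT = "project"  # any CFG project member may read
    PRIVATE = "private"  # assumed rule: only the owning site may read


@dataclass(frozen=True)
class Record:
    name: str
    owner_site: str
    visibility: Visibility


@dataclass(frozen=True)
class User:
    site: str
    is_project_member: bool


def can_read(user: User, record: Record) -> bool:
    """Apply the (hypothetical) three-level access rule to one record."""
    if record.visibility is Visibility.PUBLIC:
        return True
    if record.visibility is Visibility.PROJECT:
        return user.is_project_member
    return user.site == record.owner_site


def readable(user: User, records: list[Record]) -> list[str]:
    """Return the names of the records this user is allowed to see."""
    return [r.name for r in records if can_read(user, r)]
```

A query layer built on a rule like this would filter results per user before returning them, which is one way the "hidden infrastructure" bullet could be realised.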
Planned Approach • BRIDGES will address these problems through • Development of re-useable Grid services based upon GT3 technologies • Virtualisation of multiple distributed data sets to provide a single virtual data set for use by the biologists – exploiting IBM’s DiscoveryLink • Developing a collection of data on a well-managed platform, including copies of extracts of relevant public data, all project data, and the required software tools (administered using DB2 and DiscoveryLink) • Access to and integration of multiple distributed data sets in a Grid environment using results from the OGSA-DAI/DAIT projects • A secure environment offering authentication and authorisation • Will build on results of the PERMIS security authorisation project
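The idea of presenting several distributed data sets as a single virtual data set can be sketched in miniature. The example below uses Python's standard `sqlite3` module purely as a stand-in: it is not DiscoveryLink, DB2 or OGSA-DAI, and the site names, the `qtl` table, its columns and the `all_qtl` view are all hypothetical. Each in-memory database plays the role of one partner site, and a temporary view gives the user one place to query.

```python
import sqlite3


def build_virtual_dataset(site_rows):
    """Attach one in-memory database per site and expose a single view.

    site_rows maps a (trusted, hypothetical) site name to a list of
    (marker, chromosome) tuples held at that site.
    """
    conn = sqlite3.connect(":memory:")
    # ATTACH must run outside a transaction, so attach every site first.
    for site in site_rows:
        conn.execute(f"ATTACH DATABASE ':memory:' AS {site}")
    for site, rows in site_rows.items():
        conn.execute(f"CREATE TABLE {site}.qtl (marker TEXT, chromosome TEXT)")
        conn.executemany(f"INSERT INTO {site}.qtl VALUES (?, ?)", rows)
    # A TEMP view may reference tables in any attached database.
    union = " UNION ALL ".join(
        f"SELECT '{site}' AS site, marker, chromosome FROM {site}.qtl"
        for site in site_rows
    )
    conn.execute(f"CREATE TEMP VIEW all_qtl AS {union}")
    return conn


conn = build_virtual_dataset({
    "glasgow": [("marker_a", "1"), ("marker_b", "2")],
    "leicester": [("marker_c", "1")],
})
# One query against the view spans both "sites".
rows = conn.execute(
    "SELECT site, marker FROM all_qtl WHERE chromosome = '1' ORDER BY site"
).fetchall()
```

The user writes one query against `all_qtl` and never needs to know which site holds which rows; a real federation layer would additionally push predicates down to each source and enforce access control.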
Bridges team • Project Management • Richard Sinnott • Dave Berry • Database Design/Development • Derek Houghton • Grid Services Developers • Micha Bayer • Magnus Ferrier • Technical Input • David White, Jean-Christophe Mestres, Andy Knox, Emmanuel Guyonnet (IBM), Ela Hunt (Glasgow), Neil Hanlon (Glasgow) • Profs David Gilbert, Malcolm Atkinson, Anna Dominiczak
Achievements • Web site and project portal established • http://europa.nesc.gla.ac.uk/wps/portal • Engaged with CFG consortia • Staff trained in relevant technologies • GT3, DiscoveryLink, Condor • Initial version of local repository developed • Populated with data that cannot be federated • e.g. public data sets with no programmatic interface • Ensembl/EMBL-EBI, NCBI - GENBANK, REFSEQ, Gene Expression Omnibus, UCSC, SwissProt/TrEMBL, UniSTS/dbSTS, UNIGENE, LOCUSLINK, GENMAPP, OMIM, Sanger, dbSNP, dbEST, InterPro, Pfam, Prints, Cath, SCOP, ProSite, Weizmann Institute, PDB, RIKEN, Rat Genome DB, Mouse Atlas, Affymetrix, … • Includes shared data sets of CFG scientists • QTL DB, …
Achievements …ctd • GT3-based Grid services offered that make use of these local data sets • Grid-enabled BLAST services produced • Offer access to large e-Science infrastructures at Glasgow (ScotGrid) • SyntenyVista tool extended to allow Grid-enabled visual navigation of genomic data sets • Planned front end for many other tools • Externally • Poster at AHM 2003 • Tutorial submitted to ISMB/ECCB (the major bioinformatics conference) • Liaising with other projects • eDIKT, myGrid, GeneGrid, PERMIS, ...
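The slides do not say how the Grid-enabled BLAST service distributes its work. One common way to run a large BLAST search on a compute pool is to split the query sequences into batches and submit one job per batch; the sketch below shows only that splitting step, as an assumption rather than the BRIDGES design. The function names and the round-robin policy are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line, []])     # start a new record
        elif records:
            records[-1][1].append(line)    # sequence line of current record
    return [(header, "".join(parts)) for header, parts in records]


def split_into_batches(records, n_batches):
    """Distribute records round-robin over at most n_batches batches."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    batches = [[] for _ in range(min(n_batches, len(records)))]
    for i, record in enumerate(records):
        batches[i % len(batches)].append(record)
    return batches


example = """>seq1
ACGT
ACGT
>seq2
GGCC
>seq3
TTAA
"""
batches = split_into_batches(parse_fasta(example), 2)
```

Each batch could then be written to its own file and handed to a separate job, with the per-batch results concatenated afterwards.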
Achievements …ctd • Demonstration of some of the achievements
Plans • Refine and extend requirements • Further refinement of use cases & scenarios • More data sets (public, shared, private, …) • Implementation and realisation of further use cases • e.g. extended query services for microarray data interpretation, workflows for probe set mapping, … • Security realisation and roll-out • We can only help share CFG data sets if we can get SECURE access to them – following up with CFG sites • Authorisation with PERMIS coming • GSI-based authentication • Investigate application of replication manager (RLS) • Should support the illusion of data from each site being available to all other sites • Further Grid-based data visualisation services accessible via SyntenyVista • Ensure that we keep track of relevant developments (WSRF, GT4, …)
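The "illusion of data from each site being available to all other sites" rests on mapping a logical data set name to the physical copies that exist. The sketch below illustrates that mapping only; it is not the Globus RLS API, and the `ReplicaCatalogue` class, its methods, the site names and the locations are all hypothetical.

```python
class ReplicaCatalogue:
    """Maps a logical data set name to the sites that hold a copy of it."""

    def __init__(self):
        # logical name -> {site: physical location}
        self._replicas = {}

    def register(self, logical_name, site, location):
        """Record that `site` holds a copy of `logical_name` at `location`."""
        self._replicas.setdefault(logical_name, {})[site] = location

    def locate(self, logical_name, preferred_site):
        """Return a copy at the preferred site if one exists there,
        otherwise the first copy that was registered.
        Raises KeyError for an unknown logical name."""
        copies = self._replicas[logical_name]
        if preferred_site in copies:
            return copies[preferred_site]
        # dicts preserve insertion order, so this is the first registration
        return next(iter(copies.values()))


catalogue = ReplicaCatalogue()
catalogue.register("qtl_db", "glasgow", "glasgow.example.org/qtl_db")
catalogue.register("qtl_db", "leicester", "leicester.example.org/qtl_db")
```

A user at any site asks for `qtl_db` by its logical name; whether the answer comes from a local copy or a remote one is hidden behind `locate`.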
Future Vision of Tools via Portal • [Figure: portal drill-down functions] • To tabular summaries • To multiple alignment • To sequence