Discovery Net
• Yike Guo, John Darlington (Dept. of Computing)
• John Hassard (Depts. of Physics and Bioengineering)
• Bob Spence (Dept. of Electrical Engineering)
• Tony Cass (Department of Biochemistry)
• Sevket Durucan (T. H. Huxley School of Environment)
• Imperial College London
AIM
• To design, develop and implement an infrastructure to support real-time processing, interaction, integration, visualisation and mining of massive amounts of time-critical data generated by high-throughput devices.
The Consortium
• Industry connection: 4 spin-off companies + related companies (AstraZeneca, Pfizer, GSK, Cisco, IBM, HP, Fujitsu, Gene Logic, Applera, Evotec, International Power, Hydro Quebec, BP, British Energy, ….)
Industrial Contribution
• Hardware: sensors (photodiode arrays, hybrid photodiodes, PMTs), systems (optics, mechanical systems, DSPs, FPGAs)
• Software: analysis packages, algorithms, data warehousing and mining systems
• Intellectual Property: access to IP portfolio suite at no cost
• Data: raw and processed data from biotechnology, pharmacogenomic, remote sensing (GUSTO installations, satellite data from geo-hazard programmes) and renewable energy data (from our own remote tidal power systems)
High Throughput Sensing Characteristics
• Different devices but the same computational characteristics
• Data intensive & data dispersive: large-scale, heterogeneous, distributed data
• Real-time data manipulation
• Need to calibrate, integrate, analyse
Diagram labels: Distributed Reference DBs, Distributed Users, Collaborative applications, Distributed Devices, Distributed warehousing
• Discovery issues: Distributed Knowledge Discovery, Management; Incremental, Interactive Discovery & Collaborative Discovery
• Information issues: annotations, semantics, reference, integrated view of data
• Data issues: different measurements for the same object; data registration, normalisation, calibration & quality control
• GRID issues: wide area, high volume, scalability (data, users), collaboration
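The "Data issues" bullet (different measurements for the same object, needing normalisation) can be illustrated with a minimal sketch. It assumes a simple per-device z-score normalisation, which is only one of many possible schemes and is not stated on the slide; the device names and readings are hypothetical.

```python
from statistics import mean, pstdev


def normalise_by_device(readings):
    """Return per-device z-scores so that devices with different scales
    become comparable. `readings` maps a device id to a list of floats.
    Assumption: z-score normalisation; the slides do not name a method."""
    normalised = {}
    for device, values in readings.items():
        mu = mean(values)
        sigma = pstdev(values)  # population standard deviation
        if sigma == 0:
            # A constant signal carries no spread; map it to zeros.
            normalised[device] = [0.0 for _ in values]
        else:
            normalised[device] = [(v - mu) / sigma for v in values]
    return normalised


# Hypothetical example: two devices observing the same object on
# scales that differ by a factor of ten.
readings = {
    "device_a": [10.0, 12.0, 14.0],
    "device_b": [100.0, 120.0, 140.0],
}
normalised = normalise_by_device(readings)
```

After normalisation the two hypothetical devices yield the same sequence of values, which is the sense in which measurements of one object become comparable across devices.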
DNet Architecture (layers, top to bottom)
• High Throughput Sensing (HTS) Applications: Large-scale Dynamic Real-time Decision Support; Large-scale Dynamic System Knowledge Discovery
• Grid-based Knowledge Discovery (based on Kensington Discovery Platform): Grid-based Data Mining, Collaborative Visualisation
• Information Structuring: Information Integration & Composition, Semantics & Domain-based Ontologies, Sharing
• Distributed Data Engineering: Data Registration, Data Normalisation, Data Quality
• High Throughput Computing Services (based on Globus & ORB Infrastructure): Utilising Grid Infrastructure for HT Computing
• Grid Basic Infrastructure: Globus/Condor/SRB
Testbed Applications
HTS Applications: Large-scale Dynamic Real-time Decision Support; Large-scale Dynamic System Knowledge Discovery
• Renewable Energy Applications
  • Tidal Energy
  • Connections to other renewable initiatives (solar, biomass, fuel cells), & to CHP and baseload stations
  • Throughput (GB/s): 1-10; Size (petabytes): 1-10; Node number: >20000
  • Operations: Structuring, Mining, Optimisation, RT decisions
• Remote Sensing Applications
  • Air Sensing, GUSTO
  • Geological, geohazard analysis
  • Throughput (GB/s): 1-100; Size (petabytes): 10-100; Node number: >50000
  • Operations: Image Registration, Visualisation, Predictive Modelling, RT decisions
• Bio Chip Applications
  • Protein-folding chips: SNP chips, Diff. Gene chips using LFII
  • Protein-based fluorescent micro arrays
  • Throughput (GB/s): 1-1000; Size (petabytes): 10-1000; Node number: >10000
  • Operations: Data Quality, Visualisation, Structuring, Clustering
Distributed Dynamic Knowledge Management
Large-scale urban air sensing applications: GUSTO
• Each GUSTO air pollution system produces 1 kbit per second, or 10^10 bits per year.
• We expect to increase the number (from the present 2 systems) to over 20,000 over the next 3 years, to reach a total of 0.6 petabytes of data within the 3-year ramp-up.
• The useful information comes from time-resolved correlations among remote stations, and with other environmental data sets.
[Figure: NO simulant, 6.7.2001; map marker "You are here"]
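The GUSTO data-rate figures can be checked with a short worked calculation. This is a sketch under stated assumptions: 1 kbit is taken as 1000 bits, a year as 365 days, a petabyte as 10^15 bytes, and all 20,000 systems are assumed to run for a full year (the ramp-up schedule is not given on the slide). The function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, ignoring leap years


def bits_per_year(rate_bits_per_s, systems=1):
    """Total bits produced in one year by `systems` identical sensors."""
    return rate_bits_per_s * SECONDS_PER_YEAR * systems


def to_petabits(bits):
    """Convert bits to petabits (assumption: 1 petabit = 1e15 bits)."""
    return bits / 1e15


def to_petabytes(bits):
    """Convert bits to petabytes (assumption: 1 PB = 1e15 bytes)."""
    return bits / 8 / 1e15


# One system at 1 kbit/s: about 3.2e10 bits per year, i.e. of order 10^10.
single = bits_per_year(1_000)

# Assumed full deployment of 20,000 systems, each running all year.
fleet = bits_per_year(1_000, systems=20_000)

print(f"single system: {single:.3e} bits/year")
print(f"fleet: {to_petabits(fleet):.2f} petabits/year, "
      f"{to_petabytes(fleet):.3f} petabytes/year")
```

Under these assumptions the fleet produces roughly 0.63 petabits, or about 0.08 petabytes, per year. This suggests the slide's 0.6 figure is closer to petabits per year at full deployment than to petabytes.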
Renewables characterised by
• large number of small units
• often in remote areas
• wireless connectivity
• fluctuating, unpredictable loading
• As the total exceeds 12%, grid control becomes very difficult without an RT e-grid.
Electrical grid
There is large potential in embedded generation from renewable sources; they will dominate in new build (nuclear, hydro and carbon) power stations. Decentralised power is the new paradigm.
• active management
• RT monitoring
• RT control
• minute-to-minute security
• pan-network optimisation
• This requires very high bandwidth RT remote station data acquisition, warehousing and analysis.
The IC Advantage
The IC infrastructure: microgrid for the testbed
Diagram labels: End devices, Floor switches, Building Router Switches, Core Router Switches, Central Computing Facilities (workstation, cluster, wireless, SMP, storage), Proposed Firewall, London MAN/JANET
• Access to disparate off-campus sites: IC hospitals, Wye College etc.
Network
• More than 12000 end devices
• 10 Mb/s to 1 Gb/s to end devices
• 1 Gb/s between floors
• 10 Gb/s to backbone
• 10 Gb/s between backbone router matrix and wireless capability
• Network upgrade: 2x1 Gb/s to LMAN II (10 Gb/s scheduled 2004)
ICPC Resource
• 150 Gflops processing
• >100 GB memory
• 5 TB of disk storage
£3m SRIF funding
• +20 TB of disk storage
• +25 TB of tape storage
• 3 clusters (> 1 Tera Flops)
Particle Physics and Astronomy Research Council (PPARC)
• ASTROGRID (http://www.astrogrid.ac.uk/)
• a ~£5M project aimed at building a data-grid for UK astronomy, which will form the UK contribution to a global Virtual Observatory
Particle Physics and Astronomy Research Council (PPARC)
• GridPP (http://www.gridpp.ac.uk/)
• to develop the Grid technologies required to meet the LHC computing challenge
• collaboration with international grid developments in Europe and the US
EPSRC Testbeds (1)
• MyGrid: personalised extensible environments for data-intensive in silico experiments in biology
• Distributed Aircraft Maintenance Environment
• RealityGrid: closely coupling high performance computing, high throughput experiment and visualization
EPSRC Testbeds (2)
• GEODISE: Grid Enabled Optimisation and DesIgn Search for Engineering
• CombiChem: Combinatorial Chemistry Structure-Property Mapping
• Discovery Net: High Throughput Sensing