Synergies among Grid, Peer-to-Peer and Cloud Computing (Towards e-Science Communities) Luís Antunes Veiga luis.veiga@inesc-id.pt Distributed Systems Group INESC-ID Lisboa / Instituto Superior Técnico
...Why...: e-Science • Most science is becoming e-Science • large data repositories • growing every day • processed in myriads of ways • powerful applications • computational intensive • increasing demand for resources
...Why...: e-Science Communities • Researchers form natural communities • they tend to gather around... • research areas • tools, instruments, applications • data repositories used • affiliation, geography • projects, consortia • special kind of “social” network
...Why...: Synergies... • Leverage globally available computing resources • harness resources of whatever shape or source • e.g., Clusters, Grids, multiprocessors • P2P voluntary cycle-sharing, Desktop Grids • Utility and Cloud Computing • provide uniform and easy-to-use interface to resources • data storage, sharing, transfer • resource allocation and scheduling • work/task distribution • most e-scientists will not be programmers (no-code)!
...Where...: e-Science At Large... • e-Science examples... • video coding • video and image processing • raytracing, high-res rendering • face recognition in pictures/movies • molecular modeling • chemical reaction simulation
...e-Science At Large... • ...e-Science examples • network protocol simulation • financial investments • stock exchange • derivatives • statistical • numerical methods, data processing • language/speech processing
...e-Science At Large • What is common to all these e-Science activities? • large amounts of data • complex methods/algorithms • long processing times, resource intensive • no software development/classical programming • languages, API, sockets, synchronization, MPI, etc. • use mostly pre-developed/deployed applications • scripting, customization, configuration • possibly intricate and very advanced • comprises large numbers of parallelizable tasks • most can be made completely independent
Synergy Vision • [Figure labels: Resources from P2P, Grid, Utility Computing; Deployed Tasks; Job; Result]
...What:...Synergy • Application Execution Model • Gridlet concept: intuitive, simple to use, data-centered • suited to most applications used by researchers • Resource Sharing Architecture • leverage almost any computing and storage provider • a P2P-based Cloud encompassing Clusters, Grids, PCs • Community Support • social network integration (Facebook, hi5) • deployable via BOINC (SETI@home)
...How:...Gridlet • Gridlet • uniform basis of workload division, computation off-load • chunk of data with associated operations to be performed • parameters, scripts, configuration files, ... • cost estimate: G$: (CPU, Bandwidth, Memory, Disk) • jobs are gridlets sent to applications • allow adaptation of unmodified applications • operation/data transformation via XML policies • intuitive approach to • data-partitions, task-spawning, resource management
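The gridlet idea above (a chunk of data plus the operation to perform on it, with a G$ cost estimate) can be sketched as a small data structure. This is only an illustrative sketch, not the Synergy implementation: the names `Cost`, `Gridlet`, `estimate_cost` and `split_job` are hypothetical, and the cost model (every component proportional to chunk size) is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cost:
    """Hypothetical G$ cost vector: (CPU, Bandwidth, Memory, Disk)."""
    cpu: float
    bandwidth: float
    memory: float
    disk: float

    def __add__(self, other: "Cost") -> "Cost":
        # Costs of several gridlets add up component by component.
        return Cost(self.cpu + other.cpu,
                    self.bandwidth + other.bandwidth,
                    self.memory + other.memory,
                    self.disk + other.disk)


@dataclass
class Gridlet:
    """A chunk of data with the operation to be performed on it."""
    gridlet_id: int
    data: bytes
    operation: str                      # operation run by the unmodified application
    cost: Cost                          # estimated before the gridlet is sent out
    parameters: dict = field(default_factory=dict)


def estimate_cost(data: bytes, cpu_per_byte: float = 1.0) -> Cost:
    # Assumed toy model: every cost component grows linearly with the chunk size.
    size = float(len(data))
    return Cost(cpu=size * cpu_per_byte, bandwidth=size, memory=size, disk=size)


def split_job(data: bytes, chunk_size: int, operation: str, parameters=None):
    """Divide a job's input into fixed-size gridlets (no semantics awareness)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    gridlets = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        gridlets.append(Gridlet(len(gridlets), chunk, operation,
                                estimate_cost(chunk), dict(parameters or {})))
    return gridlets
```

Calling `split_job(job_bytes, 4, "transcode")` would return a list of gridlets whose `data` fields concatenate back to the original input, so each one can be shipped to a different node independently.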
...How:...Infrastructure • Synergy Infrastructure • Extendable peer-to-peer architecture • harness cycles of desktops, clusters, utility-computing • gathers asymmetric participants, different capabilities • Hybrid structured/unstructured overlay • structured: data repository, caching, results, indexes • unstructured: execution scheduled on any node • Hierarchical overlay • super-peers aggregate information of neighbors • resources, applications, reputation, cached data, ...
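As a rough illustration of the hierarchical overlay above, a super-peer can be modelled as a node that keeps a summary of what each neighbor offers and answers scheduling queries from that summary. The sketch below is an assumption-laden toy, not the actual Synergy protocol: `NodeInfo`, `SuperPeer`, `report` and `candidates` are hypothetical names, and ranking by reputation is one possible policy among many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeInfo:
    """Summary a neighbor reports to its super-peer (fields are assumptions)."""
    node_id: str
    free_cpu: float             # spare CPU capacity, arbitrary units
    applications: frozenset     # applications installed on the node
    reputation: float = 0.0


class SuperPeer:
    """Aggregates information about neighbors and answers lookups locally."""

    def __init__(self):
        self._neighbors = {}

    def report(self, info: NodeInfo) -> None:
        # A newer report from the same node replaces the older one.
        self._neighbors[info.node_id] = info

    def candidates(self, application: str, min_cpu: float = 0.0):
        """Return neighbors able to run `application`, best reputation first."""
        matches = [n for n in self._neighbors.values()
                   if application in n.applications and n.free_cpu >= min_cpu]
        return sorted(matches, key=lambda n: (-n.reputation, n.node_id))
```

A node wishing to run a gridlet would ask its super-peer for `candidates("ffmpeg", min_cpu=1.0)` instead of flooding the unstructured part of the overlay.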
Synergy Infrastructure • cloud on overlay/mesh • oceans of gridlets • flow across the overlay • lifecycle • cost estimate • G$ = (CPU, BW)
...How:...Community Support • e-Science Infrastructure driven by Communities • Social network integration • Facebook, hi5, widgets on web pages • execute code on idle computers of “friends” • discover similar interests • e.g., tools, applications • Community-driven portals • data sets, benchmark data, results • algorithm, topology, process descriptions • ask/donate storage and CPU • code deployable via BOINC (SETI@home)
...What For:...Current and Next Activities • Application Scenarios • Video Transcoding • Network Topology/Protocol Simulation • Raytracing for 3D rendering • Face Detection on Film Archives (e.g., Cinemateca) • Synergy VM for transactional-memory applications • Execution Infrastructures (combined) • P2P cycle-sharing, volunteer computing • Clustered Virtual Machines (e.g., Java, .NET) • Grids, Utility Computing Infrastructures
...What For:...Video Transcoding (1) • file splitting • semantics-aware data-partitioning • append/prepend gridlet-data • complete frames • movie header information • keep full (intra) & predicted frames • XML description: • format • headers • boundaries • constraints • transformations
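The semantics-aware splitting described above can be sketched with a toy model of a video stream. The assumptions are loud ones: frames are represented as `(kind, payload)` pairs with kind `"I"` (intra) or `"P"` (predicted), no real container or codec is parsed, and `partition_frames` and `min_frames` are hypothetical names. The point shown is only that cuts happen at intra frames, so every gridlet holds complete frames and can be decoded on its own, and that the movie header is prepended to each gridlet.

```python
def partition_frames(header: bytes, frames, min_frames: int):
    """Split a frame sequence into gridlets that each start on an intra frame.

    `frames` is a list of (kind, payload) tuples; the first frame is assumed
    to be an intra frame, as it would be in a decodable stream.
    """
    if min_frames <= 0:
        raise ValueError("min_frames must be positive")
    chunks, current = [], []
    for kind, payload in frames:
        # Cut only at an intra frame, and only once the chunk is large enough,
        # so predicted frames always travel with the intra frame they depend on.
        if kind == "I" and len(current) >= min_frames:
            chunks.append(current)
            current = []
        current.append((kind, payload))
    if current:
        chunks.append(current)
    # Each gridlet carries a copy of the movie header plus whole frames.
    return [{"index": i, "header": header, "frames": chunk}
            for i, chunk in enumerate(chunks)]
```

With a stream of frame kinds `I P P I P I P P` and `min_frames=3`, this yields two gridlets: `I P P` and `I P I P P`. The second intra frame of the latter does not trigger a cut because the chunk is still below the minimum size.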
...What For:...Video Transcoding (2) • gather available gridlet-results • sent by servicing peers • extract result data & discard headers • reassemble file according to semantics • new header • ordering • constraints • transformations • special cases: • discard gridlets • crypto-challenge
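The reassembly steps above can be sketched as follows. This is a minimal illustration under stated assumptions: each result is a dict with hypothetical keys `index`, `header` and `payload`; "discard gridlets" is interpreted here as dropping duplicate results for the same gridlet, which is only one possible reading; and the crypto-challenge special case is not modelled at all.

```python
def reassemble(new_header: bytes, results, expected: int) -> bytes:
    """Rebuild the output file from gridlet results that may arrive unordered.

    `results` is an iterable of dicts with keys "index", "header", "payload".
    """
    by_index = {}
    for result in results:
        # Keep the first result seen for each gridlet and discard later
        # duplicates; the per-gridlet header is dropped, only the payload is kept.
        by_index.setdefault(result["index"], result["payload"])

    missing = [i for i in range(expected) if i not in by_index]
    if missing:
        raise ValueError(f"results missing for gridlets: {missing}")

    # A single new header is written, followed by the payloads in original order.
    return new_header + b"".join(by_index[i] for i in range(expected))
```

The caller supplies `expected`, the number of gridlets originally sent out, so that a missing result is reported rather than silently producing a truncated file.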
...What For:...Network Simulation COGITARE addresses: • limits on size & complexity of simulations • inefficient resource utilization (e.g., multi-core) • no agnostic topology description languages • no repository for research result interchange • absence of a teaching platform
Conclusion • e-Science is becoming dominant • increasing demand for computing resources • harness resources from various sources (P2P,Grid,Cloud) • minority of computer researchers and programmers • intuitive application and resource model • manage activities around communities • Future Work • assessment of financial derivative products • chemical reaction and process simulation • Thank you: Questions? www.gsd.inesc-id.pt/~lveiga