E-Research Infrastructure? Markus.Buchhorn@anu.edu.au Head, ANU Internet Futures; Grid Services Coordinator, GrangeNet; Leader, APAC Information Infrastructure Program; (PhD Mt Stromlo 1988-1992)
A gentle (and fast!) overview • Themes: • What does e-Research mean? • What kind of infrastructure is involved? • How is it being developed? • What are the problems?
e-Research + infrastructure • The use of IT to enhance research • and education! • Access resources transparently • Make data readily available • Make collaboration easier • Is it The Grid? • No, and yes – the Grid is a tool in the kit • Who funds it? The Govt – when building for a large community • NCRIS (SII+MNRF), ARC, e-Research Coordinating Committee
ANU Internet Futures • A cross-discipline, cross-campus “applied” research group • e-Research infrastructure development • Objectives: • To investigate and deploy advanced Internet-based technologies that support university research and education missions. • Bring research-edge technologies into production use • Engage with APAC, GrangeNet, ARIIC/SII, …, Internet2, APAN, TERENA, … • A strong focus on User Communities • Identify common requirements
What does “Grid” mean? • Analogy with the power grid • A standard service (AC, 240V, 50Hz) • A standard connection • A standard user interface • Users do not care about • Various generation schemes • Deregulated market • Power auctions • Synchronised generators • Transmission switching, fail-over systems • Accounting and Billing
What does “Grid” mean in IT? • Transparent use of resources • Distributed, and networked • Multiple “administrative domains” • Other people’s resources become available to you • Various IT resources • Computing, Data, Visualisation, Collaboration, etc. • Hide complexity • It should be a “black box”, one just plugs in.
What are the bits in eRI? A layered stack, from the top down: • Applications and Users • Grid / Middleware Services Layer • (Advanced) Communications Services Layer • Network Layer (Physical and Transmission)
What's in that middle bit? Between the Applications/Users above and the (Advanced) Communications Services Layer below, the middleware ties together: • Computing • Data • Collaboration • Instruments • Visualisation
Networks
• Physical networks are fundamental to link researchers, observational facilities and IT facilities
• Demand for high (and flexible) bandwidth to every astronomical site
  • Universities, observatories, other research sites/groups
• GrangeNet, AARNet3, AREN, … have a big-city focus
  • Today remote sites have wet bits of string, and station wagons
• At least 1-10 Gigabit links soon-ish (SSO, ATCA, Parkes, MSO)
• Getting 10-20 Gigabits internationally right now
  • including to the top of Mauna Kea in the next year or so
  • Canada, US, NL, … are building/running some 40+ Gb/s today
• Drivers: e-VLBI, larger detectors, remote control, multi-site collaboration, real-time data analysis/comparisons, …
  • Burst needs, as well as sustained
• Wavelength Division Multiplexing (WDM) allows for a lot more bandwidth (80λ at 80 Gb/s)
(A rough sketch of what these link speeds mean for moving data follows below.)
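To give a feel for why these link speeds matter, here is a minimal back-of-the-envelope sketch in Python. The dataset sizes and the assumed end-to-end efficiency are illustrative assumptions, not figures from any particular facility or link.

```python
# A back-of-the-envelope look at what the link speeds above mean in
# practice. Dataset sizes and the end-to-end efficiency are illustrative
# assumptions only.

def transfer_time_hours(size_terabytes: float, link_gbps: float,
                        efficiency: float = 0.7) -> float:
    """Hours to move `size_terabytes` over a `link_gbps` link.

    `efficiency` is an assumed fraction of the raw line rate actually
    achieved end to end (protocol overhead, sharing, tuning).
    """
    size_bits = size_terabytes * 8e12            # TB -> bits (decimal units)
    usable_bps = link_gbps * 1e9 * efficiency    # usable bits per second
    return size_bits / usable_bps / 3600.0

if __name__ == "__main__":
    for size_tb in (1, 10, 100):                 # a night's data vs. a survey chunk
        for gbps in (0.155, 1, 10):              # legacy link, GigE, 10GigE
            hours = transfer_time_hours(size_tb, gbps)
            print(f"{size_tb:>4} TB over {gbps:>6} Gb/s: {hours:8.1f} h")
```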
Common Needs - Middleware • Functionality needed by all the eRI areas • Minimise replication of services • Provide a standard set of interfaces • To applications/users • To network layer • To grid services • Can be built independently of other areas • A lot of politics, policy issues enter here
Common Needs - Middleware - 2 • Authentication • Something you have, something you know • Somebody vouches for you • Certificate Authorities, Shibboleth, … • Authorisation • Granularity of permission (resolution, slices, …) • Limits of permission (time, cycles, storage, …) • Accounting • Billing, feedback to authorisation *Collectively called AAA
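As a concrete (and deliberately simplified) illustration of the AAA pattern on this slide, the sketch below authenticates a user, checks an authorisation policy with explicit limits, and records usage for accounting. The user names, secrets and policy structure are invented for illustration; this is not any particular middleware's API.

```python
# A deliberately tiny AAA sketch: authenticate ("something you know"),
# authorise against explicit limits, and record usage for accounting.
# Users, secrets and the policy structure are invented for illustration.

import hashlib
import time

USERS = {"alice": hashlib.sha256(b"correct-horse").hexdigest()}        # something you know
POLICY = {"alice": {"max_cpu_hours": 100, "expires": time.time() + 86400}}
USAGE_LOG = []                                                          # accounting records

def authenticate(user: str, secret: str) -> bool:
    return USERS.get(user) == hashlib.sha256(secret.encode()).hexdigest()

def authorise(user: str, cpu_hours: float) -> bool:
    p = POLICY.get(user)
    return bool(p) and time.time() < p["expires"] and cpu_hours <= p["max_cpu_hours"]

def account(user: str, cpu_hours: float) -> None:
    USAGE_LOG.append({"user": user, "cpu_hours": cpu_hours, "when": time.time()})

def run_job(user: str, secret: str, cpu_hours: float) -> str:
    if not authenticate(user, secret):
        return "denied: authentication failed"
    if not authorise(user, cpu_hours):
        return "denied: outside authorised limits"
    account(user, cpu_hours)         # accounting feeds back into billing/authorisation
    return "job accepted"

print(run_job("alice", "correct-horse", 12))     # -> job accepted
print(run_job("alice", "wrong-secret", 12))      # -> denied: authentication failed
```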
Common Needs - Middleware - 3 • Security • Encryption, PKI, … • AAA, Non-repudiation • Firewalls and protocol hurdles (NATs, proxies,…) • Resource discovery • Finding stuff on the Net • Search engines, portals, registries, p2p mesh, … • Capability negotiation • Can you do what I want, when I want • Network and application signalling • Tell the network what services we need (QoS, RSVP, MPLS, …) • Tell the application what the situation is • And listen for feedback and deal with it.
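The resource discovery and capability negotiation bullets can be illustrated with a toy registry: resources advertise what they can do, and a client asks whether anything matches its requirements. The field names and entries are assumptions; real grid information services are far richer.

```python
# A toy registry for the "resource discovery" / "capability negotiation"
# bullets: resources advertise what they can do, a client asks what matches.
# Field names and entries are assumptions.

REGISTRY = [
    {"name": "cluster-a", "cpus": 128, "storage_tb": 10.0, "available_from_hr": 0},
    {"name": "viz-node",  "cpus": 4,   "storage_tb": 1.0,  "available_from_hr": 3},
]

def discover(min_cpus: int = 0, min_storage_tb: float = 0.0, needed_at_hr: float = 0.0):
    """Return resources that claim to satisfy the request at the requested time."""
    return [r for r in REGISTRY
            if r["cpus"] >= min_cpus
            and r["storage_tb"] >= min_storage_tb
            and r["available_from_hr"] <= needed_at_hr]

print(discover(min_cpus=64))                     # -> cluster-a can do it now
print(discover(min_storage_tb=50.0))             # -> [] (nothing big enough: negotiate or wait)
```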
The Computational Grid
• Presume Middleware issues are solved…
• Probably the main Grid activity
• Architectural issues
  • CPUs, endian-ness, executable format, libraries; non-uniform networking; Clusters vs SMP, NUMA, …
• Code design
  • Master/Slave, P2P; Granularity (fine-grained parallelism vs (coarse) parameter sweep)
• Scheduling
  • Multiple owners; Queuing systems; Economics (how to select computational resources, and prioritise)
• During execution
  • Job Monitoring and Steering; Access to resources (code, data, storage, …)
• But if we solve all these:
  • Seamless access to computing resources across the planet
  • Harness the power of supercomputers, large-to-small clusters, and corporate/campus desktops (Campus-Grid)
(A parameter-sweep sketch follows below.)
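A minimal sketch of the coarse parameter-sweep style mentioned above: many independent runs of the same code over different parameter sets, farmed out to whatever workers are available. Here a local process pool stands in for distributed grid resources; the "science code" and its parameters are invented for illustration.

```python
# Sketch of a coarse-grained parameter sweep, the "embarrassingly parallel"
# end of grid computing: independent runs over different parameter sets,
# farmed out to available workers. A local process pool stands in for
# distributed resources; the "science code" and parameters are invented.

from multiprocessing import Pool

def simulate(params):
    """Stand-in for a science code: one independent run per parameter set."""
    mass, metallicity = params
    return {"mass": mass, "metallicity": metallicity, "result": mass * metallicity}

if __name__ == "__main__":
    sweep = [(m, z) for m in (0.5, 1.0, 2.0) for z in (0.001, 0.02)]
    with Pool(processes=4) as workers:           # the "slaves" in master/slave terms
        results = workers.map(simulate, sweep)   # the master scatters work, gathers results
    for r in results:
        print(r)
```

In a real grid the same pattern is driven through a scheduler or broker rather than a local pool, which is where the queuing, economics and multiple-owner issues above come in.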
Computing facilities
• University computing facilities, within departments or centrally
• Standout facilities: the APAC partnership (www.apac.edu.au)
  • Qld: QPSF partnership, several facilities around UQ, GU, QUT
  • NSW: ac3 (at the ATP, Eveleigh)
  • ACT: ANU – APAC peak facility, upgraded in 2005 (top 30 in the world)
  • Vic: VPAC (RMIT)
  • SA: SAPAC (U.Adelaide?)
  • WA: IVEC (UWA)
  • Tas: TPAC (U.Tas)
• Other very noteworthy facilities, such as Swinburne's impressive clusters. There are bound to be others, and more are planned.
Data Grids • Large-scale, distributed, “federated” data repositories • Making complex data available • Scholarly output and scholarly input: • Observations, simulations, algorithms, … • to applications and other grid services • in the “most efficient” way • Performance, cost, … • in the “most appropriate” way • within the same middleware AAA framework • in a sustainable and trustworthy way
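As a sketch of the "federated" idea on this slide: the same query goes out to several independent repositories and the answers come back as one merged result set. The repository names, records and metadata fields below are invented purely for illustration.

```python
# A federated query sketch: the same request goes to several independent
# repositories and the hits come back as one merged answer. Repository
# names, records and metadata fields are invented for illustration.

REPOSITORIES = {
    "observatory-archive": [
        {"id": "obs-001", "type": "observation", "target": "LMC", "size_gb": 40},
    ],
    "simulation-store": [
        {"id": "sim-042", "type": "simulation", "target": "LMC", "size_gb": 200},
    ],
}

def federated_query(**criteria):
    """Query every repository and tag each hit with where it came from."""
    hits = []
    for repo_name, records in REPOSITORIES.items():
        for rec in records:
            if all(rec.get(k) == v for k, v in criteria.items()):
                hits.append({"repository": repo_name, **rec})
    return hits

print(federated_query(target="LMC"))             # observations and simulations, one answer set
print(federated_query(type="observation"))       # filter by record type instead
```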
Data Grid 101 [diagram]: a user is authenticated, authorised and accounted, and then accesses a set of repositories sharing a purpose or theme. Each repository is a content archive (hardware and software) with an interface and presentation layer; metadata (ontologies, semantics, DRM, …) describes the content; directories provide AAA, capabilities, workflows and DRM; queries/results and curation flow between the user and the repositories, alongside computing, visualisation and collaboration services.
Data Grid Issues
• Every arrow is a protocol, every interface is a standard
• Storage: hardware, software; file format standards, algorithms
• Describing data: metadata, external ontologies, dictionaries
• Caching/replication: instances (non-identical), identifiers, derivatives
• Resource discovery: harvesting, registries, portals
• Access: security, rights management (DRM), anonymity; authorisation granularity
• Performance: delivery in appropriate form and size; user-meaningful user interface (rendering/presentation – by location and culture)
• Standards, and the excess thereof
• Social engineering: putting data online is
  • An effort – needs to be easier, obvious
  • A requirement! – but not enforced; lacks processes
  • Not recognised nor rewarded
  • PAPER publishing is!
(A replica-selection sketch follows below.)
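The caching/replication and performance bullets imply a replica-selection step: several copies of the same dataset exist, and the data grid picks the one it expects to deliver fastest. The sites, numbers and scoring rule below are a made-up heuristic purely to illustrate the idea.

```python
# Replica selection: several copies of a dataset exist, and the data grid
# picks the one it expects to deliver fastest. Sites, numbers and the
# scoring rule are a made-up heuristic purely to illustrate the idea.

REPLICAS = [
    {"site": "canberra-tape", "latency_ms": 5,   "bandwidth_gbps": 1.0,  "online": False},
    {"site": "sydney-disk",   "latency_ms": 12,  "bandwidth_gbps": 10.0, "online": True},
    {"site": "overseas-disk", "latency_ms": 180, "bandwidth_gbps": 2.5,  "online": True},
]

def best_replica(size_gb: float):
    """Pick the online replica with the lowest estimated delivery time."""
    def est_seconds(r):
        return r["latency_ms"] / 1000.0 + (size_gb * 8) / r["bandwidth_gbps"]
    online = [r for r in REPLICAS if r["online"]]
    return min(online, key=est_seconds) if online else None

print(best_replica(size_gb=500)["site"])          # -> sydney-disk
```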
Data facilities
• In most cases these are inside departments, or maybe central services at a university
• ANU/APAC host a major storage facility (tape robot) in Canberra that is available to the R&E community
  • Currently 1.2 Petabytes peak, and connected to GrangeNet and AARNet3
  • It hosts the MSO MACHO-et-al data set at the moment, and more is to come
  • To be upgraded every 2 years or so – a factor of 2-5 in capacity each time
  • If funding is found, each time. Needs community input.
• Doesn't suit everyone (yet)
  • Mirror/collaborating facilities in other cities in AU and overseas are being discussed
  • Integration with local facilities
• VO initiatives – all data from all observatories and computers…
• Govt initiatives under ARIIC – APSR, ARROW, MAMS, ADT
Collaboration and Visualisation • A lot of intersection between the two • Beyond videoconferencing - telepresence • Sharing not just your presence, but also your research • Examples: Multiple sites of • Large-scale data visualisation, computational steering, engineering and manufacturing design, bio-molecular modelling and visualisation, Education and training • What’s the user interface? • Guided tour vs independent observation • Capability negotiation, local or remote rendering • (Arbitrary) application sharing • Tele-collaboration (Co-laboratories) • Revolve around the Access Grid • www.accessgrid.org
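The "capability negotiation, local or remote rendering" bullet can be made concrete with a small decision sketch: based on what a participant's machine and network can do, either ship them the data to render locally or stream pre-rendered pixels instead. The client fields and thresholds below are assumptions, not a real negotiation protocol.

```python
# A small decision sketch for "capability negotiation, local or remote
# rendering": based on what a participant's machine and network can do,
# ship them the data to render locally, or stream pre-rendered pixels.
# The client fields and thresholds are assumptions, not a real protocol.

def choose_rendering(client: dict, dataset_gb: float) -> str:
    """Return 'local' or 'remote' rendering for one collaboration participant."""
    has_gpu = client.get("gpu", False)
    downlink_mbps = client.get("downlink_mbps", 1)
    hours_to_fetch = dataset_gb * 8000 / downlink_mbps / 3600   # GB -> megabits -> hours
    # Rule of thumb: render locally only if the client has a GPU and can pull
    # the dataset in under an hour; otherwise render centrally and stream.
    return "local" if has_gpu and hours_to_fetch < 1.0 else "remote"

print(choose_rendering({"gpu": True,  "downlink_mbps": 1000}, dataset_gb=50))   # -> local
print(choose_rendering({"gpu": False, "downlink_mbps": 20},   dataset_gb=50))   # -> remote
```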
Access Grid “Nodes” • A collection of interactive, multimedia centres that support collaborative work • distributed large-scale meetings, sessions, seminars, lectures, tutorials and training. • High-end, large-scale “tele-collaboration” facilities • Or can run on a single laptop/PDA • Videoconferencing dramatically improved • But not the price • Much better support for • multi-site, multi-camera, multi-application interaction • Flexible, open design • Over 400 in operation around the world • 30+ in operation, design or construction in Australia • 4+ at ANU
AccessGrid facilities
• University-hosted nodes are generally available for researchers from any area to use – you just need to make friends with their hosts.
• Qld: JCU-Townsville, CQU (several cities), UQ, QUT, SQU, GU (Nathan, Gold Coast)
• NSW: USyd, UNSW (desktop), UTS
• ACT: ANU (4+, one at Mt Stromlo; SSO has been suggested)
• Vic: UMelb (soon), Monash-Caulfield, VPAC (by RMIT), Swinburne (desktop), U.Ballarat (desktop)
• SA: U.Adelaide (1 desktop and 1 room), Flinders (soon), UniSA (planning)
• WA: UWA (IVEC)
• Tas: UTas (soon)
• NT: I wish!
• Another 400+ around the world. Development by many groups; Australia has some leadership.
• Accessgrid-l@grangenet.net
Visualisation Facilities • Active visualisation research community in Australia • OzViz'04 at QUT, 6-7 Dec 2004 • Major nodes with dedicated facilities include: ANU-VizLab, Sydney-VisLab, UQ/QPSF-VisLab, IVEC-WA, I-cubed (RMIT), Swinburne, etc.
Online Instruments • Remote, collaborative access to unique / scarce instruments: • Telescopes, Microscopes, Particle accelerators, Robots, Sensor arrays • Need to interface with other eRI services • Computation – analysis of data • Data – for storage, comparison • Visualisation – for human analysis • Collaboration – to share the facility
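To show how an online instrument might plug into the other eRI services, here is a toy sketch of a remotely controllable telescope: a thin command interface whose output would be handed to the data grid for storage and to computing/visualisation for analysis. The class, its commands and the interlock are all invented for illustration.

```python
# A toy remotely controllable instrument, showing how it could hand results
# to the other eRI services (data grid for storage, computing for analysis,
# collaboration tools for shared control). The class, its commands and the
# interlock are all invented for illustration.

class RemoteTelescope:
    """Stand-in for a network-controllable instrument."""

    def __init__(self):
        self.ra, self.dec = 0.0, 0.0
        self.exposing = False

    def slew(self, ra: float, dec: float) -> str:
        if self.exposing:
            return "refused: exposure in progress"   # a basic safety interlock
        self.ra, self.dec = ra, dec
        return f"slewed to RA={ra} Dec={dec}"

    def expose(self, seconds: float) -> dict:
        self.exposing = True
        frame = {"ra": self.ra, "dec": self.dec, "exptime_s": seconds}
        self.exposing = False
        return frame                                 # would be pushed to the data grid

scope = RemoteTelescope()
print(scope.slew(83.8, -5.4))        # point roughly at the Orion Nebula
print(scope.expose(300))             # frame metadata, ready for storage and analysis
```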
So, in summary: • Transparent use of various IT resources • Research and education processes • Make existing ones easier and better • Allow new processes to be developed • Are we there yet? • Not even close!! • But development in many areas is promising • In some situations, the problems are not technical but political/social • Some of the results already are very useful • Astronomy needs to help the processes, to help Astronomy!