Building Grids: If Everybody Else Is Doing It, Why Shouldn’t You? Jay Boisseau, Texas Advanced Computing Center SURA Grid Application Planning & Implementation Workshop December 6-8, 2005
Outline • Welcome! • Overview of TACC (with Grid Computing Context) • Some Perspectives on Grid Computing • Closing Thoughts
TACC Mission To enhance knowledge discovery & education and to improve society through the application of advanced computing technologies.
TACC Strategic Approach To accomplish this mission, TACC: • Evaluates, acquires & operates advanced computing systems and software • Provides documentation, consulting, and training to users of advanced computing resources • Conducts R&D to produce new computational technologies & techniques that enhance advanced computing systems • Collaborates with users to apply advanced computing techniques in their research, development, occupations, etc. • Educates the community to broaden and deepen the pipeline of talented persons choosing careers in advanced computing • Informs society about the value of advanced computing technologies in improving knowledge and quality of life [Slide sidebar groups these as: Resources & Services | Research & Development | PR & EOT]
TACC Advanced Computing Technology Areas • High Performance Computing (HPC) • Visualization & Data Analysis (VDA) • Data & Information Systems (DIS) • Distributed & Grid Computing (DGC) • newest area of R&D, resources, services at TACC • “tying it all together”
TACC Advanced Computing Applications Focus Areas • Computational Geosciences • World-class expertise, programs at UT Austin • Strategic to state of Texas • Computational Life Sciences • Broad & deep expertise in Texas higher-ed institutions • Important to society • Emergency Situation Assessment & Response • Crucial to life, property • Leverages TACC expertise, resources, and applications • Each has need for resource sharing & coordination, workflow, data/instrument integration: grid computing
TACC HPC & Storage Systems
• LONESTAR: Cray-Dell Xeon Linux cluster, 1028 CPUs (6.3 Tflops), 1+ TB memory, 40+ TB disk
• LONGHORN: IBM Power4 system, 224 CPUs (1.16 Tflops), ½ TB memory, 7.1 TB disk
• WRANGLER: Dell Xeon EM64T Linux cluster, 656 CPUs (4.2 Tflops), 1.3 TB memory, ~4 TB disk
• STAMPEDE: Mac Xserve G5 cluster, 46 CPUs (368 Gflops), 52 GB memory, 3.7 TB disk
• ARCHIVE: STK PowderHorns (2), 2.8 PB max capacity, managed by Cray DMF
• GLOBAL DISK: Sun SANs and DataDirect disk, > 50 TB
ACES VisLab • Front and Rear Projection Systems • 3x1 semi-cylinder immersive environment, 24’ diameter • 5x2 large-screen, 16:9 panel tiled display • Matrix switch between systems, projectors, rooms • Full immersive capabilities with head/motion tracking
TACC Advanced Visualization Systems • Sun Terascale Visualization System • 128 UltraSparc 4 cores, ½ TB memory • 16 commodity graphics cards, > 3 Gpoly/sec • Remote to VisLab; very remote to TeraGrid! • SGI Onyx2 • 24 CPUs, 6 Infinite Reality 2 Graphics Pipes • 25 GB Memory, 356 GB Disk
TACC Network Connectivity • Intercampus bandwidth • Force10 switch/routers with 1.2 Tbps backplane in TACC machine room and ACES building • 10 Gbps between TACC machine room and ACES provided by Nortel DWDM (waiting for 10GigE cards) • WAN network upgrades: • UT Internet2 at OC-12 • TeraGrid connection at 10 Gbps • New Lonestar Education And Research Network (LEARN) being built for Texas universities • Texas joining National LambdaRail (10 Gbps waves) • High-bandwidth networks (local and national) to facilitate resource sharing, coordination, data flow…
TACC R&D – Distributed & Grid Computing • Web-based grid portals • GridPort, TeraGrid User Portal, SURA portal, TIGRE portal • Grid resource data collection & information services • GPIR • Overall grid deployment and integration • UT Grid, TeraGrid, TIGRE, OSG, SURA • Grid scheduling and workflow tools • GridShell, MyCluster, Metascheduling Prediction Services • Remote and collaborative grid-enabled visualization • For TeraGrid, UT Grid • Network performance for moving terascale data
TACC Activities & Scope [Slide diagram: grid projects (UT Grid, TIGRE, TeraGrid, OSG, SURAgrid, GridPort, GridShell, etc.) added since 2001, layered on activities dating to 1986]
Summary • TACC has grown into a leading center since June 2001 • 4x staff, 6x external funding • 100x compute power • New R&D in HPC, Vis, Data, and especially Grid Computing • New EOT, international, and industrial partners programs • Grid computing projects have played a major role in TACC’s growth and success so far • Leadership in software including GridPort, GridShell, MyCluster, Metascheduling Prediction Services • Partnership in grids at campus, state, regional, national, and international scales
Researchers Already Use Distributed Computing: The Case Is Already Made! • Researchers already use distributed systems: • Local workstations for some development, small simulations • HPC at big centers • Visualization back in their lab or in a VisLab • Archival storage to SANs, NAS devices, tape silos, etc. • Researchers already collaborate with peers at other institutions • Science is collaborative! • Grids should enable resource sharing, collaboration, etc. with • Greater ease • More flexibility • More capability
Or in English… “There are talented people everywhere in the world focused on solving the most challenging problems, and there are companies everywhere determined to provide the best products as efficiently as possible… people WILL collaborate and learn to share resources, as well as ideas and data, in order to ‘be first’ … people have been using distributed resources for decades, and this is only increasing… Grid computing to me is the subset of distributed computing that makes it easier… So, ‘Grid computing’ is here today and will remain important, by whatever name you want to call it.” -- me in GRIDtoday 12/05/05
Grid Computing: My View • Grid computing is a standard, ‘complete’ set of distributed computing software capabilities • Grid computing must provide some basic functions (see the sketch below) • resource discovery and information collection & publishing • data management on and between resources • process management on and between resources • a common security mechanism underlying the above • No grid computing package provides everything • Example: the Open Grid Services Architecture (OGSA), as implemented in Globus v4, makes it possible to build the components and make them work together
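To make those four basic functions concrete, here is a minimal Python sketch of the shape such a software stack takes. It is illustrative only, not the OGSA or Globus API; every class, method, and name in it is hypothetical.

```python
# Illustrative sketch only -- not the OGSA/Globus API. All names here
# are hypothetical; the point is the four basic functions and how the
# security mechanism underlies data and process management.
from dataclasses import dataclass


@dataclass
class Resource:
    """A compute or storage resource published to the grid."""
    name: str
    host: str
    free_cpus: int


class GridStack:
    def __init__(self) -> None:
        self._resources: dict[str, Resource] = {}
        self._credentials: set[str] = set()

    # 1. Resource discovery and information collection & publishing
    def publish(self, res: Resource) -> None:
        self._resources[res.name] = res

    def discover(self, min_cpus: int = 1) -> list[Resource]:
        return [r for r in self._resources.values()
                if r.free_cpus >= min_cpus]

    # 4. Common security mechanism underlying the functions below
    def grant(self, user_dn: str) -> None:
        self._credentials.add(user_dn)

    def _check(self, user_dn: str) -> None:
        if user_dn not in self._credentials:
            raise PermissionError(f"no valid credential for {user_dn}")

    # 2. Data management on and between resources
    def transfer(self, user_dn: str, src: str, dst: str) -> None:
        self._check(user_dn)
        print(f"copying {src} -> {dst}")

    # 3. Process management on and between resources
    def submit(self, user_dn: str, resource: str, command: str) -> str:
        self._check(user_dn)
        print(f"running {command!r} on {resource}")
        return f"{resource}-job-0001"


if __name__ == "__main__":
    grid = GridStack()
    grid.publish(Resource("lonestar", "lonestar.tacc.utexas.edu", 1028))
    grid.grant("/C=US/O=UT/CN=Jane Researcher")
    for r in grid.discover(min_cpus=64):
        grid.submit("/C=US/O=UT/CN=Jane Researcher", r.name, "./simulate")
```

The point of the sketch is the layering: the same security check sits under both data and process management, exactly as the bullets above describe.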
Grid Computing: My View • TACC focuses on Grid computing to • enhance our HPC, SciVis, and massive data storage • integrate researchers’ local computing systems with ours • eventually, integrate research instruments for research that also requires HPC, SciVis, massive data storage
So TACC Drank The Grid Kool-Aid • What grids are we participating in? • UT Grid: campus • TIGRE: state • SURAgrid: regional • TeraGrid: national • Open Science Grid: international • And we’re building grid tools to provide capabilities for/in these grids • Why are we participating in these grids? Some examples will answer that question….
UT Grid: Enable Campus-Wide Terascale Distributed Computing • Why Build It? To move from an ‘island’ of high-end resources to a ‘hub’ of the campus computing continuum • provide models for local resources (clusters, vislabs, etc.), training, and documentation • develop procedures for integrating local systems into UT Grid • single sign-on, data space, compute space • leverage every PC, cluster, NAS, etc. on campus! • integrate digital assets into UT Grid • integrate UT instruments & sensors into UT Grid • provide user portals and login nodes to access and use all campus resources!
UT Grid: Resources Distributed Across Two Campuses [Slide network diagram: external networks and a research-campus NOC connect via GAATN to the main-campus NOC; switches at TACC, ACES, and PGE link TACC storage, Power4, cluster, and vis systems with ICES and PGE clusters and data systems across the two campuses]
UT Grid Status • First 20 Months: • Deployed production United Devices grid (Roundup) • Deployed production Condor pool, integrated with other pools (Rodeo); jobs enter the pool via a submit description like the sketch below • Developed GridPort v4, GridShell v1 • Building user portal, downloadable client software stack • More to come… (see tomorrow’s talk)
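For flavor, here is a small HTCondor submit description of the kind used with a Condor pool like Rodeo. It is a hypothetical example: the executable, arguments, and file names are made up and are not Rodeo’s actual configuration, though the keywords (universe, executable, requirements, queue) are standard Condor submit-file syntax.

```
# Hypothetical Condor submit description (illustrative only; the
# executable and file names are invented, not Rodeo's actual config)
universe     = vanilla
executable   = simulate
arguments    = --steps 1000
output       = sim.$(Cluster).$(Process).out
error        = sim.$(Cluster).$(Process).err
log          = sim.log
requirements = (OpSys == "LINUX")
queue 10
```

Running condor_submit on this file queues ten instances, and Condor matches each to a machine in the pool whose advertised attributes satisfy the requirements expression.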
TIGRE: Texas Internet Grid for Research & Education • Why Build It? Help Texas universities & medical centers work together to share resources and advance Texas research, education, and economy • 2-year project, $2.5M • But it took 2+ years to get funding! • 5 funded participants • Rice University • Texas Tech University • Texas A&M • University of Houston • University of Texas
TIGRE: Texas Internet Grid for Research & Education • Develop, document, and deploy a grid across the 5 participants • Supporting driving applications • Enable other LEARN members to join TIGRE • Package grid software so that others can easily install it • Provide good documentation • Ensure that it’s easy and lightweight • Make it modular: enable institutions to provide just what they can offer • NOTE: a companion project (LEARN) will provide a high-bandwidth network for use by TIGRE and other Texas institutions
TIGRE Deliverables: Quick Build!
Year 1
• Q1: Project plan; web site; Certificate Authority; minimum testbed requirements; select 3 driving applications
• Q2: Alpha portal
• Q3: Define software stack & distribution mechanism; simple demo of 1 TIGRE app
• Q4: Alpha client software package distributed
Year 2
• Q1: Alpha customer management services system deployed & demonstrated
• Q2: Global grid scheduler deployed
• Q3: Stable software available (only bug fixes after this); services required to be part of TIGRE specified
• Q4: Complete hardening of software; complete documentation; finalize & document procedures and policies to join TIGRE; demonstrate TIGRE at SC
NSF TeraGrid: National Cyberinfrastructure for Computational Science • Why Build It? Provide terascale computational capabilities that go beyond just HPC to facilitate 21st-century research! • Includes NCSA, SDSC, PSC, Indiana, Purdue, Argonne, and Oak Ridge • Anointed as NSF production cyberinfrastructure for 5 years • TACC is providing terascale computing, storage, and visualization resources • UT is providing terascale geosciences data sets
So Should You Or Shouldn’t You? • Grid computing is here to stay, by one name or another… • The possibilities are too great • The needs are too great • But it’s not always needed • Simple solutions, powerful tools, sharp minds get answers • Can maximize collaboration, but can also inhibit people from working on the real problem • Get user requirements and THINK! • What is needed? • What is overkill? • Use mature technologies unless doing grid R&D • Use the minimum subset to meet requirements, build on successes incrementally
To Build Useful Grids, Software Must Be: • Easier • No more difficult than CLIs for ‘power users’ • No more difficult than the Web/PC apps for the other 99% of (potential) users (portals, desktop apps, etc.) • No more difficult than configuring office network for admins • Smarter • Smart scheduling, data transfers, workflow • Built-in help/advice, like PC apps and portals
To Build Useful Grids, Software Must Be: • More robust • Must not break more than the individual resources • Opportunity is to break less than any individual resource (but only partially successful so far) • And standards-based & interoperable • Web services, etc. • So lots of opportunities for us geeks! • But let’s not lose sight of the forest for the trees!
Finally, Enjoy Your Time HereWhile You Learn • Austin is Fun, Cool, Weird, & Wonderful • Mix of hippies, slackers, academics, geeks, politicos, musicians, filmmakers, artists, and even a few cowboys • “Keep Austin Weird” is the official slogan • Live Music Capital of the World (seriously) • Also great restaurants, cafes, clubs, bars, theaters, galleries, museums, etc. • http://www.austinchronicle.com/ • http://www.austin360.com/xl/content/xl/index.html • http://www.research.ibm.com/arl/austin/index.html (!)
Your Austin To-Do List • Eat barbecue at Rudy’s, Stubb’s, Iron Works, Green Mesquite, etc. • Eat Tex-Mex at Chuy’s, Trudy’s, Maudie’s, etc. • Have a cold Shiner Bock (but not Lone Star) • Visit 6th Street and Warehouse District at night • Go to at least one live music show • Learn to two-step at The Broken Spoke • Visit the Texas State History Museum • Walk/jog/bike around Town Lake • Visit the UT main campus and the ACES VisLab • See a movie at Alamo Drafthouse Cinema (arrive early, order beer & food) • Eat Amy’s Ice Cream • Listen to and buy local music at Waterloo Records • Buy a bottle each of Rudy’s Barbecue ‘Sause’ and Tito’s Vodka • Drive into the Hill Country, visit small towns and wineries • See sketch comedy at Esther’s Follies • See a million bats emerge from Congress Ave. bridge at sunset