Building Blocks for a Simple TeraGrid Science Gateway: Issues to Consider in Development (40min) Anurag Shankar TeraGrid Science Gateways Team Indiana University TeraGrid 2007 Madison, WI
This unit will try to answer the following questions: • What is a science gateway? • What questions to ask before building one? • What problems scientists face when computing? • What can gateways do to help? • What technologies can be used? • How to ensure that the gateway will be used? • When is using the TeraGrid appropriate? • What resources do I need to build one? TeraGrid 2007
What do you really mean by a science gateway? • A (web-based) GUI that allows a scientist to do some sort of computation by clicking buttons. • The computation requires resource(s) at the back end to carry it out, e.g. storage, CPU cycles, databases, etc. • These resources could be modest - perhaps just a PC, or significant - a compute cluster or a grid. • Pardon the subsequent, implicit CPU-cycle-centricity. The gateway could as well be a data repository, etc. TeraGrid 2007
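A minimal sketch of the "clicking buttons" idea above, assuming a hypothetical back-end executable (`/opt/science/bin/simulate`) and two hypothetical form fields (`model`, `steps`); none of these names come from TeraGrid. It shows the core job of a gateway front end: take what the user submitted in a web form, validate it, and turn it into a command for whatever resource sits at the back end.

```python
from urllib.parse import parse_qs

# Hypothetical back-end executable and allowed values (assumptions for illustration).
SIMULATION_BINARY = "/opt/science/bin/simulate"
ALLOWED_MODELS = {"small", "medium", "large"}


def form_to_command(query_string):
    """Turn a submitted web form into a back-end command line.

    Returns the argument list a gateway would hand to its back end
    (a PC, a cluster scheduler, or a grid job manager).
    """
    fields = parse_qs(query_string)
    model = fields.get("model", [""])[0]
    steps = fields.get("steps", [""])[0]

    # Never pass raw user input to the back end; validate every field.
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if not (steps.isascii() and steps.isdigit() and 1 <= int(steps) <= 10000):
        raise ValueError("steps must be an integer between 1 and 10000")

    return [SIMULATION_BINARY, "--model", model, "--steps", steps]
```

The user never sees the command line; the gateway builds it. Swapping the local executable for a cluster or grid submission changes only what is done with the returned argument list.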
What is a TeraGrid science gateway? • A web interface • with science users in the front and TeraGrid services in back (a traditional TG SGW). • that bridges an existing non-TeraGrid science grid and the TeraGrid (a grid-bridging SGW). • that allows applications running on a user’s desktop to access TeraGrid services (a personal TG SGW). Will TeraGrid build a science gateway for me? • Nope. But we will gladly help you build one. TeraGrid 2007
Why call them gateways and not portals? • We could, but we distinguish between the two here for the sake of clarity. • We use the word “portal” generally, to refer to an entry point, a URL on the web. It could be an aggregation point for information, services, or tools, a means to allow ubiquitous access, a way to customize, etc. • We define a “science gateway” as a portal designed specifically for (or by) a specific science community.
Ok, so I think I want to build a science gateway. Before we even start, what are the crucial questions to ask? What is it that we are trying to do here? • Is it to lessen the pain? For whom? • Is it to build something because it uses cool technology that users will love? • Is it to get the damn thing done so we can write that quarterly report? • etc. …
Questions to ask … 1. Will the gateway add value for the user? • For example, a command line user can perform every task that a gateway can, often with far more control and without the obfuscation layer a gateway adds. • You will wrest the command line only from these users’ dead fingers. • A gateway must add serious value to be successful here.
Questions to ask … 1.1. Precisely how will the gateway add value? (aka why would the user want to use my gateway?) • Will it solve an existing problem? • Will it add new functionality? • Will it save user time? • etc. 1.2. How am I going to find out? TeraGrid 2007
Questions to ask … 2. If yes to (1), what technologies can and should be used? • 2.1. How? 3. Cost? 4. Validation (that I accomplished what I set out to do)? How? TeraGrid 2007
1. What problems can gateways solve? • What are the common problems facing scientific users? • Increasing complexity of ITs. • No time to master increasingly complex ITs. • Repetitive tasks waste a lot of time. • No simple workflow tools. • No easy to use “toolboxes” for frequent tasks. • An alien HPC culture for many new entrants. • Command line interface too distasteful (for many). • No native clients to do useful things. TeraGrid 2007
1. Problems … • No GUI for tasks that are done a lot more easily graphically. • Frequent reinvention of the wheel (redundancy of effort). • (Insert your favorite here). All problems can be reduced to one: not being able to have the data I need delivered here, now.
1. Problems … • Ok, so which of these problems can gateways solve/address? What can they add? • Save users from back-end complexity. • especially those that do not speak HPC • Provide a simple interface to many tasks. • Save user time by providing tools for repetitive tasks. • Provide standard tools for a discipline/group of users. • Provide a GUI when/where appropriate. • Provide statefulness, persistence, historical data, etc. • Allow ubiquitous access. TeraGrid 2007
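A minimal sketch of "statefulness, persistence, historical data", assuming a hypothetical `jobs` table kept in SQLite (the table layout and function names are illustrative, not part of any gateway toolkit). It shows how a gateway can remember what each user ran, something a bare command line session does not do for them.

```python
import sqlite3


def open_history(path=":memory:"):
    """Open (or create) the job-history database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " username TEXT NOT NULL,"
        " command TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'submitted')"
    )
    return db


def record_job(db, username, command):
    """Store a newly submitted job and return its id."""
    cur = db.execute(
        "INSERT INTO jobs (username, command) VALUES (?, ?)",
        (username, command),
    )
    db.commit()
    return cur.lastrowid


def update_status(db, job_id, status):
    """Record a status change reported by the back end."""
    db.execute("UPDATE jobs SET status = ? WHERE id = ?", (status, job_id))
    db.commit()


def history_for(db, username):
    """Return (id, command, status) for every job the user has submitted."""
    rows = db.execute(
        "SELECT id, command, status FROM jobs WHERE username = ? ORDER BY id",
        (username,),
    )
    return rows.fetchall()
```

Passing a file path instead of `":memory:"` makes the history survive restarts, which is what lets a user log in from anywhere and pick up where they left off.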
1. Problems … • Is a science gateway always the right approach? • No. • For example, a PI with a small research group, all involved in extensive code modification, development, and/or testing is unlikely to benefit from a gateway. • Gateways are best used when a large group of users (community) make use of the same computational tools. • Fields using common data formats (astronomy, climate modeling, etc.) also lend themselves to gateway-ing. TeraGrid 2007
2. What technologies can I use? • Common off the shelf (COTS) • Usually PHP/MySQL, perl, ruby, python based. • Very popular, open source, portal building toolkits such as Mambo, Joomla, Drupal, e107, PHP-Nuke, etc. • Also new “web operating systems (WebOS)” like eyeOS. • Standards based • Portlets (JSR 168). • Globus (Globus Toolkit 4, COG kit). • Web services (WSDL, WSRF, WSRP, etc.). • Globus web services (WS-MDS, WS-GRAM). • Grid services (OGSA). TeraGrid 2007
2. COTS technologies … • When are COTS technologies appropriate? • When the portal needs to be built yesterday. • When the portal needs to be built yesterday and there is exactly one undergrad to do it. • When the undergrad has just taken his first programming class. In other words, when resources are limited (time, people, expertise) and/or when the project has modest needs.
2. Standards based technologies • What are all these terms and acronyms? Portlets? JSR 168/286? WSRP? WSRF? COG? OGSA? • COG kit = Commodity Grid kit • JSR = Java Specification Request • OGSA = Open Grid Services Architecture • WS-GRAM = Web Services - Grid Resource Allocation and Management • WS-MDS = Web Services - Monitoring and Discovery System • WSRF = Web Services Resource Framework • WSRP = Web Services for Remote Portlets
2. Standards based … • The acronym maze alone will give you a headache, even on a good day. • Let’s try an evolutionary approach to see if it helps. • For good bedtime reading, check out my “Portals 101” document, created in desperation: http://www.gridsphere.org/gridsphere/gridsphere/html/docsTab/r/
2. Evolution of portal technologies … [Figure: timeline running from “Prehistory” along a “Time” axis, with stages labeled Static, Dynamic, and Services based. Technologies shown: HTML, CGI, PHP, Javascript, Java applets, Servlets, Web Services, Stateful Web Services, Portlets, WSRP. Year labels shown: late 1980s, 1993, 1994, 1995, 1995, 1997, 2003, and one “?”.]
2. Evolution of grid technologies … [Figure: timeline running from “Prehistory (Distributed Computing)” along a “Time” axis. Items shown: Globus (Grid middleware), GT 1.0, GT 2.0, GT 3.0, Java COG kit (API for Globus), Global Grid Forum, Web Services, Grid Services, Open Grid Services Infrastructure, Open Grid Services Architecture. Year labels shown: 1997 (four times), 2000, 2002, 2003 (twice), 2005, and one “?”.]
2. Evolution of standards … • Web: HTML → CSS → XHTML → XML (W3C) • Modular web: Servlets → Portlets (JCP/Sun) • SOA: WS → WSDL → WS-x, WSRF (OASIS) • Portlets: JSR 168 → JSR 286 (JCP/Sun) • Grid: Globus → OGSI → OGSA (GGF/OGF) • JCP = Java Community Process (creates Java Specification Requests, or JSRs) • W3C = World Wide Web Consortium • SOA = Service-Oriented Architecture • WS-x = Various web services standards, or specifications in the process of becoming standards (maybe), such as WS-Notification, WS-Security, etc.
2. Problem with evolution … Evolution according to creationists TeraGrid 2007
2. Evolution … Man’s Evolution from the Prehistoric to Post Fast Food. Is it or is it not evolution? Depends on who you ask.
2. Portlets • Standardized Java components (special servlets) that can be put together quickly to create a complete portal page. • Plug and play. Transportable. • Generate fragments of markup. • Follow the JSR 168 standard. • JSR 168 defines • How to bundle portlets • How the portlet lifecycle is managed TeraGrid 2007
2. Portlets … • Run inside a “portlet container”. Two popular JSR 168 compliant containers are • Gridsphere • Apache Pluto • The portlet container runs inside a “servlet container”. The most popular container is • Apache Tomcat • The servlet container may work with a webserver such as Apache httpd. TeraGrid 2007
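A toy analogy for the container/portlet relationship, written in Python rather than Java and deliberately not using the real `javax.portlet` API. All class names here are hypothetical. The method names `init`, `render`, and `destroy` mirror lifecycle methods that JSR 168 defines (the real interface also has `processAction`, omitted here). The sketch shows the two ideas from the slides: each portlet produces only a fragment of markup, and the container owns the lifecycle and assembles the page.

```python
from html import escape


class Portlet:
    """Toy stand-in for a portlet: it renders a fragment, never a full page."""

    title = "Untitled"

    def init(self):
        """Called once by the container before the first render."""

    def render(self, user):
        raise NotImplementedError

    def destroy(self):
        """Called once by the container at shutdown."""


class JobStatusPortlet(Portlet):
    title = "My jobs"

    def render(self, user):
        return f"<p>No running jobs for {escape(user)}.</p>"


class FileTransferPortlet(Portlet):
    title = "File transfer"

    def render(self, user):
        return "<p>No transfers in progress.</p>"


class Container:
    """Toy portlet container: manages lifecycle and aggregates fragments."""

    def __init__(self, portlets):
        self.portlets = list(portlets)
        for portlet in self.portlets:
            portlet.init()

    def render_page(self, user):
        sections = [
            f'<div class="portlet"><h2>{escape(p.title)}</h2>{p.render(user)}</div>'
            for p in self.portlets
        ]
        return "<html><body>" + "".join(sections) + "</body></html>"

    def shutdown(self):
        for portlet in self.portlets:
            portlet.destroy()
```

"Plug and play" in this picture means adding or removing an entry in the list passed to `Container`; no portlet needs to know about the others.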
2. Portlets & the grid • What is the connection between portlets and the grid? • None. Portlets are merely generic components. • Some portlets (grid portlets) might perform grid tasks. • What about Gridsphere? It has the word grid in it. • Nope. It is simply a strategic name chosen by the Gridsphere developers. • Gridsphere is a generic, JSR 168 compliant portlet container. • It can thus run JSR 168 compliant (or not) portlets that do some grid task(s). TeraGrid 2007
2. Practical (standards-based) tools • Enough! I have a headache already. Tell me something I can actually use with TeraGrid. • COG kits • Open Grid Computing Environment (OGCE) • (Gridsphere) GridPortlets • Clarens is a web services approach to the grid • IN-VIGO virtualizes the grid • Application Hosting Environment (AHE) runs unmodified apps on the grid TeraGrid 2007
2. Globus API • Java Commodity Grid kit (COG kit) • An abstraction layer (via a Java API) that hides the underlying middleware (Globus toolkit/different toolkit versions - GT2/GT4). • Provides command line tools as well. • Also Python COG kit. http://wiki.cogkit.org/
2. Portal Creation Environments • Open Grid Computing Environment (OGCE) • A complete Java environment that allows you to develop JSR 168 portlets, Gridsphere included. • Uses the COG kit. • Provides a number of bundled portlets • Job submission and monitoring • File transfer • Collaboration tools, etc. • Current version: 2.0.4. http://www.collab-ogce.org/
2. Portal creation … • GridPortlets • GridPortlets is the name of the package. The package includes grid portlets, but note the difference. • A specific, JSR 168 compliant Java implementation. • Runs under Gridsphere (not included). • Uses the COG kit but provides an abstraction layer (API) on top of the COG kit. • Uses (depends on) Gridsphere’s simple API for creating a GUI. • Provides an “action” model for creating portlets. • Current version: 1.4. http://www.gridsphere.org/gridsphere/gridsphere/guest/download/r/ TeraGrid 2007
3. Tips for building a usable gateway • How can I make sure that my gateway will actually be used? • Keep in mind the three most important factors: a) users, b) users, and c) users. • Let users dictate; don’t assume. • If users can’t, spend time with them; observe what they do and how they do it. • Test, test some more, then test until you drop. The assumption that an IT person/developer, removed from the user/discipline, can “build it and they will come” is doomed from the get-go.
3. Usability tip #1: Determine what users want/need • Some users know and come seeking help. • Others have no idea; they don’t know what’s possible. How do I help them? • Try this: • “Can I come over and see your lab (or how you do X)?” X might be • process data/run simulation/handle results • submit/run/monitor jobs, etc. • “Ah, that’s how you do it. What if I can Y?” Y might be • make it 100x faster • make it a lot easier, etc. TeraGrid 2007
3. Usability tip #2: Design/build a good user interface • Otherwise why would Microsoft spend zillions of dollars on developing and testing its user interfaces? • The UI can be a make or break factor. • How do I ensure that I have a usable UI? • Formal usability testing in a usability lab • Scour the web to learn about usability/testing • Read the “Usability 101” document http://dhruv.uits.indiana.edu/portals/usability-101.doc • Perform poor man’s usability testing TeraGrid 2007
3. Developer/user UI disconnect … * From “DON’T MAKE ME THINK: A Common Sense Approach to Web Usability” by Steve Krug. TeraGrid 2007
3. Usability tip #3: Follow best practices • Refer to the TeraGrid Science Gateways Primer http://www.teragridforum.org/mediawiki/index.php?title=TeraGrid_Science_Gateways_Primer
4. Scaling up • Do I need to scale up? • Not necessarily. • Many scientific applications provided in a gateway may require only local resources (compute cluster, storage, databases, etc.). • Many existing science gateways use quite modest back ends for compute resources. • Some even have nothing to do with CPU cycles or grid at all. TeraGrid 2007
4. Scaling up … • Ok, so when do I need more powerful resources (such as the TeraGrid)? • Reactively: • too many new users, analyses, etc. • processing is too slow to be useful • local resources no longer sufficient • users yelling at you? • Proactively: • possible future growth designed in from the get go • close monitoring of trends • etc. … TeraGrid 2007
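A minimal sketch of the reactive and proactive checks above, assuming the gateway logs weekly CPU-hours used and knows its local capacity. The function name, the 80% threshold, and the four-week look-ahead are illustrative assumptions, not TeraGrid guidance. The reactive test asks whether current usage is already near capacity; the proactive test extrapolates the recent trend linearly.

```python
def needs_more_resources(weekly_cpu_hours, capacity_cpu_hours,
                         threshold=0.8, weeks_ahead=4):
    """Return True when current or projected usage exceeds threshold * capacity.

    weekly_cpu_hours: usage per week, oldest first (at least two entries).
    capacity_cpu_hours: CPU-hours the local resources can deliver per week.
    """
    if len(weekly_cpu_hours) < 2:
        raise ValueError("need at least two weeks of usage data")

    limit = threshold * capacity_cpu_hours
    latest = weekly_cpu_hours[-1]

    # Reactive: local resources are already (nearly) insufficient.
    if latest > limit:
        return True

    # Proactive: extend the average week-over-week growth a few weeks out.
    growth = (latest - weekly_cpu_hours[0]) / (len(weekly_cpu_hours) - 1)
    projected = latest + growth * weeks_ahead
    return projected > limit
```

A straight-line projection is crude; it is here only to show that "close monitoring of trends" can start from data the gateway already collects.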
4. Scaling up … • Why should I use the TeraGrid? • Virtually unlimited resources (CPU cycles, storage, databases, etc.) • Many services available. • Easy to get access. • TG support staff ready to help. • A production, national grid infrastructure (looks good on a grant too)
4. Scaling up … • Ok, I am convinced that I need to scale up. What do I do next? • Nancy Wilkins-Diehr will be addressing this later today.
Still awake? Had enough? TeraGrid 2007
5. Local resources needed • So what will it take locally for me to build one of these gateways? • People • Expertise • Time • Hardware • Software TeraGrid 2007
5. Local resources needed … • How many people? • Depends. For a complex, grid-based gateway, 1-2 FTEs. Much less for a modest effort (an undergrad). • What level of expertise? • For need-it-now projects, interpreted language (PHP, Perl, Ruby, etc.) programming skills + some DB (MySQL, etc.) knowledge. • For a well designed, high-end gateway, Java programming skills are a must. Also some database and UI experience. • How much time? • Anywhere from 3-6 undergrad months for a simple gateway to roughly 2 FTE-years for one that is fairly complex; this includes the learning curve (modest to high). • What hardware? • Anywhere from a Unix/Linux box to an entire Linux cluster depending on development needs.
5. Local resources needed … • What software? • Programming language(s): Perl, Python, Ruby, PHP, Java, Javascript, etc. • Development environment (compilers, editors, debuggers, etc.). • Databases: MySQL, PostgreSQL, etc. • Server environment: Apache httpd, Apache Tomcat, etc. • Grid middleware: Globus toolkit, COG kit, etc. • Portlet container: Gridsphere, Pluto. • Portlet building toolkit: OGCE, GridPortlets. • Web services: WSRF, WSRP, etc. • Popular portal building toolkit: Joomla/Drupal/Mambo/e107, etc. TeraGrid 2007