EOSDIS Alternate Architecture Study
Jim Gray, McKay Fellow, UC Berkeley, 1 May 1995, gray @ crl.com
1. Background: problem and proposed solution
2. What California proposed
Co-workers:
• Mike Stonebraker: Producer / Director / Script Writer / Propeller Head
• Bill Farrell: Ramrod and Computer-Literate DirtBag
• Jeff Dozier: Godfather
Special effects:
• Earth Science: Frank Davis, C. Roberto Mechoso, Jim Frew
• Computer Science: Reagan Moore, Jim Gray, Joe Pasquale
• Administration: Claire Mosher
• Writing: Stephanie Sides
• Prototypes: many, many people
What’s The Problem?
• Antarctica is melting -- 77% of the world’s fresh water liberated
  => sea level rises 70 meters
  => Chico and Memphis become beach-front property
  => New York, Washington, SF, LA, London, Paris
• Let’s study it! Mission to Planet Earth
• EOS: Earth Observing System (17 B$ => 10 B$)
  • 50 instruments on 10 satellites, 1997-2001
  • plus Landsat (added later)
• EOSDIS: EOS Data and Information System
  • 3-5 MB/s raw, 30-50 MB/s processed
  • 4 TB/day, 15 PB by year 2007
• Issues:
  • How to store it?
  • How to serve it to users?
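As a sanity check on those rates, a back-of-envelope calculation (mine, not from the study; it assumes roughly ten years of ingest) reproduces the slide's totals:

```python
# Back-of-envelope check of the slide's data rates (assumption:
# ~10 years of ingest, 1997-2007, at the quoted processed rate).
SECONDS_PER_DAY = 86_400

mb_per_sec = 40                                   # mid-range of 30-50 MB/s
tb_per_day = mb_per_sec * SECONDS_PER_DAY / 1e6   # ~3.5 TB/day
pb_by_2007 = 4 * 365 * 10 / 1e3                   # 4 TB/day for 10 years

print(f"{tb_per_day:.1f} TB/day, {pb_by_2007:.1f} PB")  # 3.5 TB/day, 14.6 PB
```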
What Happened?
• 1986: Mission to Planet Earth
• 1989: bids from Hughes and TRW
• 1993: contract granted; public review:
  • customers do not want it (tape/mainframe centric)
• 1994: Alternate Architecture study
  • three “outside teams”:
    • Wyoming: Internet 20,000,000
    • Maryland: software engineering
    • California: DB centric
  • one “home team”: CORBA & Z 39.50 & UNIX
• 1995: drifting in the Sequoia direction
The Hughes Plan
• 8 DAACs (Distributed Active Archive Centers) = bytes
  (one per congressional district?)
• N SCFs (Scientific Computation Facilities) = MIPS
  (typically instrument or science teams)
• Thin wires among them
• 90% of DAAC processing is PUSH:
  • building standard data products
  • fixed pipeline: calibrate, grid, derive
• Typical subscriber gets tapes or CD-ROMs (standard data products)
• One “chauffeur” per 10 customers (high operations costs)
• Build everything (operations, HSM, DBMS, ...) from scratch
• CORBA and Z 39.50 are the glue
• Criticism: not evolvable, not open, not online, not useful
What California Proposed
• 0. Design for success: expect that millions will use the system (online).
• 1. DBMS-centric design automates discovery, access, and management.
• 2. Object-relational databases enable us to:
  • automate access to data, so the NASA 500, Global Change 10,000, and Internet 20,000,000 can use the system
  • cache popular results, not all results (saves 3x or more)
  • compute on demand (saves lots of storage and CPU)
  • emphasize pull processing rather than push processing
  • use parallelism to get scaleup
  • do batch as a data pump
• 3. Be smart shoppers:
  • use COTS hardware/software (saves 400 M$)
  • just-in-time acquisition (saves 400 M$)
  • use workstation, not mainframe, technology (gives 10x more stuff)
  • depreciate over 3 years (ends in 2007 with "fresh" equipment)
• 4. 2+N node architecture:
  • 2 Super-DAACs, for fault tolerance and for growth
  • unify the 2 "big" data storage centers with 2 big data analysis centers
  • allow many "little" Peer-DAACs at science/user groups
Meta-Model for the Sequoia Proposal
• Be technological optimists:
  • couldn’t build it today; count on progress
  • ride the technology wave (= not water-cooled)
• Buy or seed, do not build:
  • use COTS where possible
  • fund 2 or more COTS vendors if a product is needed: OR DBMS, HSM, operations
• Replace people with technology (= OR DBMS):
  • automate data discovery, access, visualization
• DBMS-centric view
DBMS-Centric View
• This is a database problem (no kidding)!
• This is not:
  • a file system problem (file is the wrong abstraction)
  • an RPC problem (CORBA is the wrong abstraction)
  • a Z 39.50 problem (Z 39.50 is a FAP)
• This is an operations problem:
  • hierarchical storage management
  • network management
  • source code control
  • client-server tools
• You can BUY all this stuff. Fund COTS.
• BUILD AS LITTLE AS POSSIBLE
What California Proposed
• 0. Design for success: expect that millions will use the system (online).
• 1. DBMS-centric design automates discovery, access, and management.
• 2. Object-relational databases enable us to:
  • automate access to data, so the NASA 500, Global Change 10,000, and Internet 20,000,000 can use the system
  • cache popular results, not all results (saves 3x or more)
  • compute on demand (saves lots of storage and CPU)
  • emphasize pull processing rather than push processing
  • use parallelism to get scaleup
  • do batch as a data pump
• 3. Be smart shoppers:
  • use COTS hardware/software (saves 400 M$)
  • just-in-time acquisition (saves 400 M$)
  • use workstation, not mainframe, technology (gives 10x more stuff)
  • depreciate over 3 years (ends in 2007 with "fresh" equipment)
• 4. 2+N node architecture:
  • 2 Super-DAACs, for fault tolerance and for growth
  • unify the 2 "big" data storage centers with 2 big data analysis centers
  • allow many "little" Peer-DAACs at science/user groups
Design for Success: Expect Lots of Users
• Expect that millions will use the system (online)
• Three user categories:
  • NASA 500 -- funded by NASA to do science
  • Global Change 10,000 -- other dirt bags
  • Internet 20,000,000 -- everyone else: grain speculators, environmental impact reports, new applications
• => discovery and access must be automatic
• Allow anyone to set up a Peer-DAAC & SCF
• Design for ad hoc queries, not standard data products:
  • if push is 90%, then only 10% of the data is ever read (on average)
  • => a failure: no one uses the data (in DSS systems, push is 1% or less)
  • => computation demand is 100x the Hughes estimate (pull is 10x to 100x greater than push)
The Process Flow
[Diagram: push processing, pull processing, and other data feeding the archive]
• Data arrives and is pre-processed:
  • instrument data is calibrated, gridded, averaged
  • geophysical data is derived
• Users ask for stored data, or ask to analyze and combine data
• The push-pull split can be made dynamically (see the sketch below)
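A minimal sketch of that dynamic split (illustrative only; the threshold and product names are my assumptions, not the study's design): popular products are materialized eagerly when data arrives (push), everything else is derived lazily at query time (pull).

```python
from collections import Counter

access_counts = Counter()   # popularity of each derived product
PUSH_THRESHOLD = 100        # accesses per period; an assumed tuning knob
store = {}                  # materialized ("pushed") products

def derive(product, raw):
    # Stand-in for the real calibrate/grid/average computation.
    return f"{product}({raw})"

def on_arrival(raw, products):
    """Push phase: eagerly materialize only the popular products."""
    for p in products:
        if access_counts[p] >= PUSH_THRESHOLD:
            store[p] = derive(p, raw)

def on_query(product, raw):
    """Pull phase: serve from the cache if pushed, else compute on demand."""
    access_counts[product] += 1
    return store.get(product) or derive(product, raw)
```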
The Software Model: Global View
• SQL* is the FAP (format and protocol) and the API
• Applications use it to access data
• It includes:
  • stored procedures (which subsume RPC)
  • Global Change (GC) class libraries
• Computation is data driven
• Gateways for other interfaces: HTTP, Z 39.50, CORBA & COM
• TP or TP-lite manages workflow
Automate Access to Data
• Invest in:
  • designing the global change schema (cooperate with standards groups)
  • OR DBMS class libraries for GC datatypes
  • a browser to do resource discovery
• The community will develop access & visualization tools
• The OR DBMS will do:
  • PUSH processing: triggers and workflow
  • PULL processing: query optimization
  • (some assembly required)
How Well Did SQL Work?
• Bill Farrell and others did 30 user scenarios: schema, application, SQL, performance
  • snow cover, CO2, GCM, ...
• The average ad hoc scenario generated about 30% of the EOSDIS baseline processing
  • => validated PULL over PUSH demand
• SQL was indeed a power tool:
  • many scenarios became a few simple SQL queries
  • need a spatial & temporal SQL
• Personal view: it’s great! Much better than Farrell or I expected.
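To make "spatial & temporal SQL" concrete, here is the flavor of query the scenarios call for. The table, spatial predicate, and aggregate are hypothetical names of my own; the study's actual scenario SQL is not reproduced here.

```python
# Hypothetical spatio-temporal query in the SQL* spirit (schema and
# function names are illustrative assumptions, not the study's).
query = """
SELECT week, AVG(snow_fraction(tile.raster)) AS snow_cover
FROM   landsat_tiles AS tile
WHERE  OVERLAPS(tile.footprint, REGION('Sierra Nevada'))
  AND  tile.acquired BETWEEN DATE '1972-01-01' AND DATE '1995-05-01'
GROUP  BY week;
"""
print(query)
```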
Compute on Demand
• 90% of the data is NEVER used (according to Hughes)
• Some data is used only once
• Data is often re-calculated:
  • to repair hardware/software bugs
  • for new & better algorithms
• Optimization: store only popular data
  • predict popularity from past use (of this data and related data)
• Balance two costs:
  1. Re_Compute_Cost / Re_Use_Interval
  2. Storage_Cost x Re_Use_Interval
• Recompute is often cheaper (saves 3x, we think)
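A toy version of that balance (the units and numbers are mine, for illustration): treat both options as a cost per reuse and keep the product only when storing is the cheaper of the two.

```python
# Toy store-vs-recompute decision (assumed units: dollars and days).
# Per reuse, storing costs storage_rate * reuse_interval; recomputing
# costs recompute_cost. Store only when that is cheaper.
def should_store(recompute_cost, storage_rate_per_day, reuse_interval_days):
    cost_to_store = storage_rate_per_day * reuse_interval_days
    return cost_to_store < recompute_cost

# A product reused every 100 days, costing 5 $ to recompute and
# 0.10 $/day to keep online: recomputing wins (10 $ > 5 $).
print(should_store(recompute_cost=5.0,
                   storage_rate_per_day=0.10,
                   reuse_interval_days=100))   # False -> recompute
```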
Use Parallelism to Get Scaleup
• Many queries look at 100s or 1,000s of data tiles
  • e.g. weekly Landsat images of Berkeley since 1972
  • = 1,000 tape accesses = 4,000 tape minutes = 6 days
  • done 1,000-way parallel: 4 minutes
• Disk & tape demands are huge: multi-GOX
• Computation demands are huge: tera-ops
• The only solution:
  • use parallel execution
  • use parallel data access
• SQL* does this for you automatically
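The speedup arithmetic behind that example, using only the slide's numbers:

```python
# The Landsat example: 1,000 tape accesses at ~4 minutes each,
# sequential vs. 1,000-way parallel.
tiles = 1_000
minutes_per_access = 4

sequential_minutes = tiles * minutes_per_access   # 4,000 tape minutes
parallel_minutes = sequential_minutes / tiles     # 4 minutes, 1,000-way

print(sequential_minutes / 60 / 24)  # ~2.8 days of pure tape time
                                     # (the slide's "6 days" presumably
                                     # adds mount and queueing overhead)
print(parallel_minutes)              # 4.0
```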
Data Pump
• Compute small jobs on demand:
  • less than 1,000 tape mounts
  • less than 100 M disk accesses
  • less than 100 tera-ops
  • (less than 30-minute response time)
• For BIG JOBS, scan the entire 15 PB database once a day/week
• Any BIG JOB can piggyback on this data scan (see the sketch below)
[Figure: the DAAC in 2007]
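A minimal sketch of the pump (names and the toy archive are mine): one sequential pass over the archive per cycle, with every registered big job applied to each tile as it streams by, so N jobs share a single scan's I/O.

```python
def data_pump(tiles, jobs):
    """One sequential pass over the archive; every registered big job
    processes each tile as it streams by, sharing a single scan."""
    for tile in tiles:          # one physical read of each tile
        for job in jobs:        # N jobs piggyback on the same I/O
            job(tile)

# Toy example: two "big jobs" share one scan of a tiny archive.
archive = [{"id": i, "pixels": 10_000, "cloud": i % 3 == 0} for i in range(5)]
pixel_counts = []
cloudy_tiles = []
data_pump(archive,
          jobs=[lambda t: pixel_counts.append(t["pixels"]),
                lambda t: cloudy_tiles.append(t["id"]) if t["cloud"] else None])
print(sum(pixel_counts), cloudy_tiles)   # 50000 [0, 3]
```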
What California Proposed
• 0. Design for success: expect that millions will use the system (online).
• 1. DBMS-centric design automates discovery, access, and management.
• 2. Object-relational databases enable us to:
  • automate access to data, so the NASA 500, Global Change 10,000, and Internet 20,000,000 can use the system
  • cache popular results, not all results (saves 3x or more)
  • compute on demand (saves lots of storage and CPU)
  • emphasize pull processing rather than push processing
  • use parallelism to get scaleup
  • do batch as a data pump
• 3. Be smart shoppers:
  • use COTS hardware/software (saves 400 M$)
  • just-in-time acquisition (saves 400 M$)
  • use workstation, not mainframe, technology (gives 10x more stuff)
  • depreciate over 3 years (ends in 2007 with "fresh" equipment)
• 4. 2+N node architecture:
  • 2 Super-DAACs, for fault tolerance and for growth
  • unify the 2 "big" data storage centers with 2 big data analysis centers
  • allow many "little" Peer-DAACs at science/user groups
Use COTS Hardware/Software (saves 400 M$)
• Defense contractors want to build (and maintain) stuff (they do it for the money)
• Instead, fund SQL* (SQL-2007): an object-relational (extensible) DBMS that
  • supports Global Change data types
  • automates access
  • provides reliable storage
  • handles tertiary storage
  • parallelizes data search (automatically)
  • manages workflow (job control)
  • is reliable
• Fund operations-software companies (Tivoli, ...)
Use Workstation Technology (NOW)
• Use workstation hardware technology, not supercomputers:
  • 0.5 $/MB of disk vs 30 $/MB of disk
  • 100 $/MIPS vs 18,000 $/MIPS
  • 3 k$/tape drive vs 50 k$/tape drive
• Processor, disk, and tape ARRAYS connected by ATM: a NOW (network of workstations)
• Gives 10x (100x?) more stuff for the same dollars
• Allows an ad hoc query load
• Allows a scaleable design
• Allows the same hardware everywhere: SuperDAACs = PeerDAACs
Use Workstation Technology (NOW), continued
• The study used the RS/6000 and DEC 7000 as the "workstation" (about 100 k$ per slice); it should have used Compaq.
• [Table: price for 20 GFlops, 24 TB of disk, 2 PB of tape today; Compaq/DLT prices computed by Gray]
• A 10% Peer-DAAC costs 3 M$ today; a 1% micro-DAAC (200 TB) costs 300 K$.
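Plugging the earlier unit prices into the 24 TB disk farm shows why the workstation route wins (my arithmetic, using only the numbers quoted two slides back):

```python
# Disk cost of the 24 TB farm at the two unit prices quoted earlier
# (0.5 $/MB workstation vs 30 $/MB mainframe). Illustrative arithmetic.
DISK_TB = 24
MB_PER_TB = 1_000_000

workstation = DISK_TB * MB_PER_TB * 0.5    # 12 M$
mainframe = DISK_TB * MB_PER_TB * 30       # 720 M$

print(f"{workstation/1e6:.0f} M$ vs {mainframe/1e6:.0f} M$ "
      f"({mainframe/workstation:.0f}x)")   # 12 M$ vs 720 M$ (60x)
```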
Just-in-Time Acquisition (saves 400 M$)
• Hardware prices decline 20%-40% per year
• So buy at the last moment
• Buy the best product that day: commodity
• Depreciate over 3 years so that the facility stays fresh (after 3 years, cost is 23% of original)
• [Chart: acquisition spending over time; the 60%-decline curve peaks at 10 M$]
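The 23% figure follows from compounding the annual price decline; a quick check of the slide's arithmetic (the ~39% rate is the one that reproduces it):

```python
# Residual price after compounding an annual decline r for 3 years:
# (1 - r)**3. The slide's "23% of original" corresponds to r ~ 0.39.
def residual(r, years=3):
    return (1 - r) ** years

for r in (0.20, 0.30, 0.39, 0.40):
    print(f"{r:.0%}/yr -> {residual(r):.1%} of original after 3 years")
# 20%/yr -> 51.2%, 30%/yr -> 34.3%, 39%/yr -> 22.7%, 40%/yr -> 21.6%
```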
What California Proposed
• 0. Design for success: expect that millions will use the system (online).
• 1. DBMS-centric design automates discovery, access, and management.
• 2. Object-relational databases enable us to:
  • automate access to data, so the NASA 500, Global Change 10,000, and Internet 20,000,000 can use the system
  • cache popular results, not all results (saves 3x or more)
  • compute on demand (saves lots of storage and CPU)
  • emphasize pull processing rather than push processing
  • use parallelism to get scaleup
  • do batch as a data pump
• 3. Be smart shoppers:
  • use COTS hardware/software (saves 400 M$)
  • just-in-time acquisition (saves 400 M$)
  • use workstation, not mainframe, technology (gives 10x more stuff)
  • depreciate over 3 years (ends in 2007 with "fresh" equipment)
• 4. 2+N node architecture:
  • 2 Super-DAACs, for fault tolerance and for growth
  • unify the 2 "big" data storage centers with 2 big data analysis centers
  • allow many "little" Peer-DAACs at science/user groups
2+N DAAC Architecture
• 2 Super-DAACs: two BIG sites which
  • each store ALL the data and back each other up (there is no other way to archive a 15 PB database)
  • each service half the queries and run a data pump
  • each produce half the standard data products
  • each have a BIG MIPS farm next to the byte farm (an SCF, science computation facility)
• N Peer-DAACs:
  • each stores part of the data (obtained from a Super-DAAC)
  • can be NASA-sponsored or private
  • same software and hardware as the Super-DAACs
• Super-DAACs are "banks" (careful); Peer-DAACs are "pubs" (anything goes)
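One way to picture the topology (a toy model; the site and topic names are mine): the two Super-DAACs each hold everything and split the load, while Peer-DAACs serve the subsets they have pulled down.

```python
import random

# Toy 2+N routing model (site and topic names are illustrative).
SUPER_DAACS = ["super_a", "super_b"]    # each stores ALL the data
PEER_DAACS = {"snow_peer": {"snow"},    # each stores a subset
              "ocean_peer": {"ocean"}}

def route(topic):
    """Send a query to a peer that holds the topic; otherwise
    load-share the two supers (which also back each other up)."""
    for peer, topics in PEER_DAACS.items():
        if topic in topics:
            return peer
    return random.choice(SUPER_DAACS)

print(route("snow"), route("aerosols"))   # snow_peer, super_a or super_b
```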
Minimize Operations Costs
• Fewer sites (DAACs) means lower costs
• Use a Mosaic, email, and telephone user-support model
• Count on vendors to provide:
  • network management (NetView & SNMP)
  • data replication
  • application software version control
  • workflow control
  • help-desk software
  • more reliable hardware/software
Unify Data Storage Centers with Data Analysis
• Data analysis (the Science Computation Facilities) needs quick, high-bandwidth access to the database
• WAN technology is good, but not that good, and it is not free
• => co-locate DAACs and SCFs
• => two super SCFs, many peer SCFs
• Instrument teams often find a bug or a new algorithm
  • => reprocess all the base data to make a new data set
  • => ripple effect to data consumers
  • => must track data lineage
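Data lineage is naturally a dependency graph; a sketch of the ripple computation (the product names are hypothetical): reprocessing a base product means finding and re-deriving everything that transitively depends on it.

```python
# Sketch of lineage tracking: when a base product is reprocessed,
# everything derived from it must be found and re-derived.
derived_from = {                 # hypothetical product lineage
    "calibrated": ["raw"],
    "gridded":    ["calibrated"],
    "snow_map":   ["gridded", "dem"],
}

def ripple(product, lineage):
    """All products that (transitively) depend on `product`."""
    hit = set()
    for child, parents in lineage.items():
        if product in parents and child not in hit:
            hit.add(child)
            hit |= ripple(child, lineage)
    return hit

print(ripple("calibrated", derived_from))   # {'gridded', 'snow_map'}
```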
Budget
• We had a VERY difficult time discovering a budget. So we did our own.
• It was less.
• Big savings in operations and development.
• Hardware savings could give bigger DAACs.
What California Proposed
• 0. Design for success: expect that millions will use the system (online).
• 1. DBMS-centric design automates discovery, access, and management.
• 2. Object-relational databases enable us to:
  • automate access to data, so the NASA 500, Global Change 10,000, and Internet 20,000,000 can use the system
  • cache popular results, not all results (saves 3x or more)
  • compute on demand (saves lots of storage and CPU)
  • emphasize pull processing rather than push processing
  • use parallelism to get scaleup
  • do batch as a data pump
• 3. Be smart shoppers:
  • use COTS hardware/software (saves 400 M$)
  • just-in-time acquisition (saves 400 M$)
  • use workstation, not mainframe, technology (gives 10x more stuff)
  • depreciate over 3 years (ends in 2007 with "fresh" equipment)
• 4. 2+N node architecture:
  • 2 Super-DAACs, for fault tolerance and for growth
  • unify the 2 "big" data storage centers with 2 big data analysis centers
  • allow many "little" Peer-DAACs at science/user groups
Challenging Problems
• Design the global change schema
• Understand data lineage
• Build discovery, analysis, and visualization tools
• Build an OR DBMS, including:
  • distributed and parallel execution
  • workflow
  • lazy-eager evaluation
  • tertiary storage
  • SQL
• Build a decent & reliable HSM
• Build a way to operate a 1,000-node NOW