Challenges in Integrating Diverse Data for Ecological Synthesis Special Roles & Responsibilities for Information Managers. Judy Cushing The Evergreen State College Olympia WA judyc@evergreen.edu www.evergreen.edu/bdei NSF EIA-0310659, EIA-0131952
Challenges in Integrating Diverse Data for Ecological Synthesis: Special Roles & Responsibilities for Information Managers
Judy Cushing, The Evergreen State College, Olympia WA, judyc@evergreen.edu
www.evergreen.edu/bdei (NSF EIA-0310659, EIA-0131952)
http://canopy.evergreen.edu/canopydb (NSF DBI-0417311, DBI-0319309, …)
www2.evergreen.edu/quantecology
Challenges in Integrating Diverse Data: Lessons Learned from the Grasslands Data Integration (GDI) Project*
Information Managers: Jincheng Gao (KNZ), Nicole Kaplan (SGS), Ken Ramsey (JRN), Mark Servilla (NET), Kristin Vanderbilt (SEV)
Computer Scientists and Data Analysts: Judy Cushing, Carri LeRoy, Juli Mallett, Lee Zeman
Ecologists: Christine Laney (JRN), Alan Knapp (SGS), Daniel Milchunas (SGS), Esteban Muldavin (SEV)
Integrate Above-Ground Net Primary Productivity (ANPP) data with its drivers (contextual data) for cross-site comparisons (Ecological Synthesis), past and future (come visit our poster!)
What’s in the GDI Database?
• Recorded or calculated annual aboveground NPP values from 5 LTERs: Jornada, Sevilleta, SGS, Konza, Kruger
• 4,126,700 grams, over 20 years in 1697 plots
What Did We Find?
1. Ecology
  • Environmental drivers of ANPP
  • ANPP-based grassland community composition
2. Preliminary definition & provision of contextual data
  • EcoTrends ++…
3. Information Management: species table fixes, ideas for better experimental design documentation, scripting for data integration… CHANGE LOGS WERE ESSENTIAL; USDA PLANTS DB
4. CS: case study on data integration; need for TOOLS:
  • PASTA-like service
  • Taxonomic concept service
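To make the "species table fixes" and "change logs" points concrete, here is a minimal sketch of how site-specific species codes might be mapped to shared codes while every change is logged. It is not the GDI project's actual script: the `Harmonizer` class, the `CROSSWALK` table, and all species codes in it are hypothetical placeholders, not entries from any site table or from the USDA PLANTS database.

```python
from dataclasses import dataclass, field

# Hypothetical crosswalk: (site, local species code) -> shared code.
# These codes are invented for illustration only.
CROSSWALK = {
    ("SGS", "BOGR"): "BOGR2",
    ("SEV", "bogr2"): "BOGR2",
    ("JRN", "BOER"): "BOER4",
}


@dataclass
class Harmonizer:
    """Maps local species codes to shared codes and logs every change."""
    crosswalk: dict
    change_log: list = field(default_factory=list)

    def harmonize(self, site, code):
        """Return the shared code, or None if the local code is unmapped."""
        cleaned = code.strip()
        shared = self.crosswalk.get((site, cleaned))
        if shared is None:
            # Unmapped codes are logged rather than silently dropped.
            self.change_log.append((site, code, None, "unmapped"))
            return None
        if shared != code:
            self.change_log.append((site, code, shared, "recoded"))
        return shared


harmonizer = Harmonizer(CROSSWALK)
records = [("SGS", "BOGR"), ("SEV", "bogr2"), ("KNZ", "XXXX")]
shared_codes = [harmonizer.harmonize(site, code) for site, code in records]
```

After the run, `harmonizer.change_log` holds one entry per recoded or unmapped code, which is the kind of audit trail a cross-site comparison can be checked against.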
[Figure: ANPP vs. Precip. Slide note: no climate data yet]
[Figure: panel labels not recoverable; correlation coefficients shown: r = 0.608, r = 0.631, r = 0.329, r = 0.196]
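For readers unfamiliar with the r values reported on this slide, the following is a minimal sketch of how a Pearson correlation coefficient is computed. It assumes r here denotes Pearson's r, which the slide does not state; the function name `pearson_r` is illustrative and no GDI data are used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Called with, say, annual precipitation and ANPP values for one site, it returns a value between -1 and 1.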
CART Model: Classification and Regression Tree Model, R² = 0.642!!
Variables included in model: LTER, year, PDSI, NH4, NO3, absTmax, absTmin, Tmax, Tmin, Tmean, Precip
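As a rough illustration of what a regression tree does at each node, the sketch below finds the single best split on one predictor and reports the R² of that one-split tree. It is a simplified stand-in, not the model behind the slide's result: a real CART model searches all predictors and splits recursively. The function names and the toy numbers are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, sse_after) for the split on xs that minimises
    the combined SSE of the two groups, or None if xs has no variation."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


def r_squared(ys, sse_after):
    """Share of the variance in ys explained by the split."""
    total = sse(ys)
    if total == 0:
        raise ValueError("R^2 is undefined when ys is constant")
    return 1 - sse_after / total


# Toy values invented for illustration; not GDI data.
precip = [100, 150, 200, 400, 450, 500]
anpp = [50, 60, 55, 150, 160, 155]
threshold, remaining = best_split(precip, anpp)
fit = r_squared(anpp, remaining)
```

Each leaf predicts the mean response of its group, so R² measures how much of the variation in the response the split accounts for.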
Lesson 1: What you (IMs) do is important
• ANPP: a critical ecological measure (indicator?)
• You (Kristin, Ken, Nicole) made GDI happen…
• It’s a collaborative & interdisciplinary project, and not a technology problem…
  • IMs
  • Computer Scientists
  • Ecologists
  • Statistician (Data Analyst)
• You know the issues and physically possess the data for important ecological & scientific DB problems, e.g., global climate change, resource management
Lesson 2: The GDI DB should be dynamic, not static
A static data warehouse is an oxymoron, as is “Museum of Innovation”
• More years, future years
• Current data, further refined
• More sites, different ecosystems
Lesson 3: Volume Matters… More sites, more years, more trouble…
• More species codes
• Differences in experimental design
• Cross-site comparison highlights data anomalies
• High volumes make a qualitative difference
• A good data structure* matters even more…
* Ask me why GIS has not been a priority to illustrate my field datasets…
Lesson 4: Information Managers Critical; Computer Science in Crisis…
There won’t be enough CS graduates … to do all the jobs … even today…
NSF’S ICER (CPATH) INITIATIVE: INTEGRATIVE COMPUTING EDUCATION & RESEARCH (NSF)
• CS content changed (changing!) radically…
• No uniform agreement on the core…
• Graduates lack a systems approach…
• Dwindling pipeline…
• US industry [& science] competitiveness threatened…
NSF’S ICER (CPATH) INITIATIVE
NSF asked: Why is CS in crisis? What can be done?
Northwest Region: http://www.evergreen.edu/icer
• Improve the quality of computing education…
• Attract more people…
• Improve retention…
• Strengthen interdisciplinary connections…
• Improve CS educational research…
Google asked: What can industry do?
I ask: What should the LTER IMs do?
Lesson 4 (cont.): Computer Science in Crisis…
My charge on this panel: IMs typically come from “the sciences” (essential), yet their tasks are programming & managing software projects.
What skills or tools are essential for IMs? As an educator: which are effectively learned on the job, and which require formal training?
• Tools are learned on the job.
• Skills come through practice (but should be demonstrable before hiring).
• Concepts require (some) formal training… (is there a handful of critical concepts?)
Lesson 4 (cont.): What CS Does It Take to Do the GDI?
• Concepts
  • Formal Languages & Parsing
  • Data Structures
• Abilities
  • See patterns (and non-patterns)
  • Learn new technology fast; see when the tools won’t do it
  • Build new technology, services…
• Skills (tools)
  • Scripting languages, database tools, and SQL
But CS is not enough… we needed an interdisciplinary team: historical perspective, ecology vision, statistical expertise
Future tools: PASTA-like & TAXONOMIC SERVICES, contextual data provision (ClimDB, EcoTrends)
Questions? Judy Cushing judyc@evergreen.edu www.evergreen.edu/bdei http://canopy.evergreen.edu/canopydb www2.evergreen.edu/quantecology