280 likes | 411 Views
Surviving the Data Deluge: Challenges and Opportunities for Academic Research. Kelvin K. Droegemeier Vice President for Research Regents’ Professor of Meteorology University of Oklahoma. A HUGE Spectrum. Zillions of small data;
E N D
Surviving the Data Deluge: Challenges and Opportunities for Academic Research Kelvin K. DroegemeierVice President for ResearchRegents’ Professor of MeteorologyUniversity of Oklahoma
A HUGE Spectrum Zillions of small data; ubiquitous, streaming,unexpected, highly perishable, non-reproducible, geo-referenced, mobile Huge structured data;anticipated, well defined, streaming, fixed in space,reproducible
The Notion of Sharing • Humans are naturally selfish, even as infants, and not sharing is an innate behavior we exhibit at a very early age!
Sharing RESULTS is Foundational to Scholarship and is Getting Easier! “School of Athens” Raphael (1509-1510)
But Sharing DATA? • Today, sharing data typically means providing access to everything that led to the published article • PLOS journals require authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception • “Dataare any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances” (PLOS) • For ScienceMagazine, “all data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader, and all codes involved in the creation or analysis of data must also be available.” • New OSTP directive makes this generally applicable
Key Questions We Think About! • Definition of DATA • Who owns it and when, and who decides? • What should be kept and who decides? • How and where should it be kept and who decides? • How is quality assured and who determines it? • Who provides access and who pays? • How is access provided and who decides? • Who is given access and when, and who decides? • When should access be denied and who decides? • Who provides technical assistance in using data? • How should credit be given for generating/maintaining data, and who decides? • Who ensures and pays for compliance?
The Notion of Sharing • The academy encourages collaboration but still tends to evaluate and reward individual achievement – thus reinforcing at least caution in sharing
So Why Would We Share?? • Because it is required • To enhance our own scholarship • To enhance our discipline • To enhance our academic program • To enhance our institution • To bring benefits to our nation • To improve the world
Transdisciplinary Interdisciplinary Multi-Disciplinary The Research Spectrum Disciplinary D. Lightfoot NSF
Research at the Boundaries Social and Behavioral Sciences Physical Science Technology and Engineering Policy Economics Open Data will help (force) us to more quickly and effectively develop common vocabularies and build trust among researchers across disciplines
The Value Proposition of Open Data Access • In Research • Replication/reproducibility of previous results • New interpretation and analysis to scale for heterogeneous data sets – an open market approach • Opportunities for collaboration, probably numerous and multi-disciplinary • Follow-on studies and thus faster progress • Identification of unknown problems • Greater accountability and responsible conduct of research • Research experience in K through 16 • Credit for gathering/generating/making available data • In Teaching • Online and engaged learning • In Economic Development • Corporate R&D to use in analytics
Question: Will Open Data Access Make a Difference? 550 deaths
Open Data May Facilitate Understanding! Numerical Simulation 24 hours CPU = 1 hour real 20 TB of output Still trying to understand Mother Nature Real time! Still trying to understand
And Provide a Pathway for Saving Lives • Social and behavioral science research, especially mining of social media data, is critical for understanding why people die in tornadoes • Data from those studies, coupled with output from computer weather models, are being used to re-envision the entire severe weather warning system • More eyes on the problem = better outcome
Google Earth and Apple iPhone as Possible Role Models • Provide the substrate (base code and APIs) and open it up to the world • This is data and possibly code in the context of open access • May draw in young people and those from underrepresented groups • Availability of tools is essential, and open access data may lead to a vast array of new resources for analyzing data, especially using cloud services • Today, no real market exists but OA could create one!
The Challenges of Open Data Access in Research • Misuse of data owing to misunderstanding battles about results • Burden on researchers to explain data, deal with access problems • Applying analytics to mine vast amounts of information and make sense of it • Misappropriation of credit
Broader Challenges of Open Data Access • Privacy: S (tiny bits) = life story or valuable intelligence • Workforce is a HUGE challenge and cuts across all disciplines. How do we educate in that manner? • Tools for data analytics that are extensible and format/domain-agnostic
View from a Vice President for Research • Key Point #1: Facilitating Research With Data • A means to tackle some of the most compelling intellectual challenges at the intersections of multiple disciplines • It’s not only about providing access but also helping ensure EFFECTIVE use of data – which is not automatic! • The institution must help: bring people together, stimulate conversations, bridge language barriers, build trust, guide thinking, provide support • Library has a unique role to play – a renaissance as the intellectual commons of our campuses • As IT has opened new doors and brought people/disciplines together, so can the “data challenge” if we handle it properly! • Especially critical for engagement of social sciences and the humanities
View from a Vice President for Research • Key Point #2: Providing Credit and Support • System (but importantly also a philosophy) for giving credit to faculty for generating, maintaining, and provisioning data (similar to how IP has been added to portfolio) • Provost, Deans, Tenure Committees, Senior Faculty • Building data stewardship into research metrics aka citations, impact factors, etc • Creation of persistent identifiers/tags (EZID, DataCiteConsortium) – but also WHAT credit means • Creation of an indirect cost component for data and compliance
View from a Vice President for Research • Key Point #3: Logistics and Cost • Coordination and consolidation of data management approaches across the institution: Provost, Library Dean, CIO, VPR • Appropriate cyberinfrastructure, security, systems-level approach • Integration into the broader academic ecosystem • The strategies in which we invest today may be quite different in a short time – shifting sands • Division of responsibilities (local and national) • When will the dust settle regarding policies? • Unity versus diversity in approaches
View from a Vice President for Research • Key Point #4: Changing Times • Place-based education, especially at the undergraduate level, means changing how we educate students and bringing experiential learning to every student (aka undergraduate research and creative activity). Open Access offers HUGE possibilities! • Improved access to historical data may slow the pace of new data creation yet spur support for sustainability • Cost models may be impacted by disruptive technologies and ideas (e.g., MOOC analog in the IT sector) • Workforce creation is key – especially for dealing with nuances of statistics and blending data from highly disparate sources; and noting that curationand analytics are intertwined
Assessing Impact • Assessing impact is critical if we are to understand the real benefits (and possible drawbacks) of open data access • We must link assessment to value proposition points described earlier and be open to new ideas of measuring value (e.g., drawing in underrepresented groups to STEM fields, creation of new domains such as digital humanities) • Assessment also is critical for modifying policies to create even greater benefits and address drawbacks
Blue Sky Thoughts: Caution – A Forecast from a Meteorologist! • Open data will generate completely new models for collaborative interaction – a new “social media” in research centered around sharing, not simply communicating. This will lead to new tools and approaches (e.g., Tableau) • New disciplines likely will arise when researchers from one discipline start “playing around” with data from others, and these new perspectives will reveal new insights, solve longstanding problems and raise new questions. The “Fourth Paradigm.”
Blue Sky Thoughts: Caution – A Forecast from a Meteorologist! • The entire notion of a “publication” will change, leading to open-ended online research release events in which new analyses and data will continually be added to previously reported outcomes, creating branches and forks in time, with other disciplines linked in a virtual intellectual commons • Access to data, and tools for analysis, will enhance interest and exploration by children and offer greater opportunity to increase participation by traditionally underrepresented groups
The End Game for Research • Open Data will enable NEW and better ways of doing old things + lots of NEW things • Reproducibility (repeat/verify) • Creatibility (frontier) • Extensibility (extend/expand) • Collaborability (across)