Presentation Transcript


  1. Can Traditional R&D Evaluation Methods Be Used for Evaluating High-Risk, High-Reward Research Programs? Mary Beth Hughes, Ph.D. IDA Science and Technology Policy Institute American Evaluation Association Annual Conference Anaheim, CA November 4, 2011

  2. Overview • Introduction to the Problem • A Typology of HRHR Research Programs • Applicability of Traditional R&D Evaluation Methods to HRHR Research Programs • Example: NIH Director’s Pioneer Award • Some Concluding Thoughts

  3. A Call for High-Risk, High-Reward Research Programs…

  4. …And a Desire to Understand Program Effectiveness

  5. What are High-Risk, High-Reward Research Programs? • HRHR research programs are not well defined • Generally, HRHR research programs aim to support unconventional, innovative, creative, transformative, “outside the box” research that will have a larger impact than status-quo research • HRHR research programs may differ from traditional R&D programs in terms of funding amount and duration, mechanism of funding, target recipients, target research fields, selection criteria, and selection/review processes

  6. The Problem • What do we know about when and how traditional methods can be applied to evaluations of HRHR research programs? • Given the differences between HRHR programs and traditional R&D programs (and the hypothesized difference between “normal” research and “HRHR” research), it is not clear that the application of traditional R&D evaluation methods to HRHR programs is valid.

  7. Very Few Evaluations and Research on U.S. HRHR Programs* • Summative and formative studies to date: • NIH’s Director’s Pioneer Award – IDA Science & Technology Policy Institute • NSF’s SGER – SRI • HHMI vs. R01 – Pierre Azoulay, MIT • NIH’s Director’s New Innovator Award – IDA Science & Technology Policy Institute • NSF’s Emerging Frontiers in Research and Innovation – IDA Science & Technology Policy Institute • *Many more studies of ‘creative’ research at the individual scientist or organizational level – but no commonly agreed-upon indicators of creative research

  8. A Typology of HRHR Programs (Heinze, 2008; Hughes and Lal, 2009) • People Programs • Aimed at funding an individual scientist to undertake (almost) any research project; longer duration funding and higher funding amounts • Synergy Programs • Aimed at moving a field forward through projects based on teams or an inter- or multidisciplinary approach • Challenge Programs • Aimed at funding projects based on a technological challenge or critical national need; funding may be milestone-based • Seed Programs • Aimed at jump-starting a project (often unconventional); shorter duration funding and lower funding amounts

  9. “People” Programs Present Greatest Challenge to Translation of Methods • Synergy Programs • Emerging bibliometrics to measure interdisciplinarity • Social network analysis to measure collaborations • Challenge Programs • Specific challenges and milestones identified a priori serve as measures of desired outcomes – did research meet milestones or is it contributing to outcome? • Seed Programs • Intended to incorporate new projects into traditional funding streams – did projects go on to receive funding? • People Programs • Research projects not well defined, funding used for multiple ideas, many different types of risks possible • Longer-term funding means need for early indicators • Concept of failure less clear
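For the social network analysis mentioned under Synergy Programs, below is a minimal sketch of how collaboration measures could be computed from co-authorship data; the publication list, author names, and choice of metrics are assumptions for illustration only.

```python
# Minimal sketch: co-authorship network measures for a hypothetical
# set of publications from a synergy-type program (illustrative only).
from itertools import combinations
import networkx as nx

# Hypothetical author lists, one per publication (assumed data).
publications = [
    ["Ahmed", "Baker", "Chen"],
    ["Baker", "Chen"],
    ["Chen", "Diaz"],
    ["Ahmed", "Diaz", "Evans"],
]

G = nx.Graph()
for authors in publications:
    for a, b in combinations(authors, 2):
        # Weight edges by the number of co-authored papers.
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Simple collaboration indicators: network density and betweenness
# centrality (who brokers between otherwise unconnected researchers).
print("Density:", nx.density(G))
print("Betweenness:", nx.betweenness_centrality(G))
```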

  10. One Example: The NIH Director’s Pioneer Award • Key Features • Managed out of NIH’s Office of the Director • 5-page application reviewed externally • Interview by external panel • Three high-level criteria (Years 2+) • Scientific problem to be addressed • Investigator • Suitability for NDPA mechanism • 5 years of funding, $2.5M • 51% effort commitment • Flexibility in how funds are used • Evaluation Request • NDPA represented several “firsts” for NIH • Viewed as an “experiment” in how to fund biomedical and behavioral research • IDA/STPI was asked to evaluate short term outcomes of first 22 awardees

  11. NDPA Evaluation Used Common R&D Evaluation Methods • Bibliometrics • Case Studies (descriptive) • Expert Review • These methods are all ex post, but our understanding of ex ante evaluations of HRHR research (e.g., proposals) is also weak: How well does traditional peer review apply to HRHR research? How successful are various alternatives (shortened applications, interviews, “sandpit” process, etc.) at identifying HRHR research? Many opportunities for further study…

  12. 1. Bibliometrics for Traditional R&D Evaluations • Description of Method • Some examples of bibliometrics • Traditional: Counts (and variations thereof), Citations (and variants thereof), Content Analysis • Emerging: Interdisciplinarity, Burstness, Centrality • Uses in traditional R&D evaluations • Counts used as measures of productivity • Citations used as measures of utility and dissemination
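A minimal sketch of how the count- and citation-based indicators listed above could be computed from publication records follows; the record fields, values, and the “highly cited” threshold are assumptions for illustration.

```python
# Minimal sketch: traditional bibliometric indicators from hypothetical
# publication records for one awardee (field names and values assumed).
from statistics import mean

records = [
    {"year": 2005, "citations": 42},
    {"year": 2006, "citations": 3},
    {"year": 2006, "citations": 118},
    {"year": 2007, "citations": 0},
]

pub_count = len(records)                             # productivity: counts
total_cites = sum(r["citations"] for r in records)   # utility / dissemination
mean_cites = mean(r["citations"] for r in records)

# A simple "highly cited" variant: share of papers above an assumed
# citation threshold (in practice this would be field-normalized).
HIGHLY_CITED_THRESHOLD = 100
highly_cited_share = sum(
    r["citations"] >= HIGHLY_CITED_THRESHOLD for r in records
) / pub_count

print(pub_count, total_cites, round(mean_cites, 1), highly_cited_share)
```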

  13. Bibliometrics for HRHR Research Evaluation • Prior Work • Many studies proposing relationships between bibliometric-based indicators and creative research outcomes (Productivity (Simonton, 2004); Interdisciplinarity (Heinze, 2007); Brokerage (Burt, 1992); Burstness and Centrality (Chen, 2009)) • Azoulay (2010) found HHMI researchers had a higher level of productivity both post-award and compared to a control group (and higher levels of highly cited publications) • NDPA Findings • No apparent correlation between measures in the literature and expert review of research • Interpretation unclear – short-term nature of evaluation? Small-number statistics? • Counts cannot distinguish between a researcher trying something new and failing and an unproductive scientist • Conclusion for Use of Method for Evaluation of “People” Programs • Currently insufficient as sole method; further work needed to understand HRHR-specific bibliometric indicators (e.g., transformative outcomes may not be readily accepted by the scientific tradition (Polanyi, 1966))
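A minimal sketch of how the (lack of) correlation between a bibliometric measure and expert review scores could be checked, assuming hypothetical per-awardee citation counts and expert ratings; none of the values reflect actual NDPA data.

```python
# Minimal sketch: does a bibliometric indicator track expert judgments of
# how pioneering the research is? (All data hypothetical.)
from scipy.stats import spearmanr

citation_counts = [12, 250, 8, 95, 40, 3]   # assumed indicator per awardee
expert_scores = [4, 2, 5, 3, 3, 4]          # assumed 1-5 "pioneeringness" ratings

rho, p_value = spearmanr(citation_counts, expert_scores)
print(f"Spearman rho = {rho:.2f}, p = {p_value:.2f}")
```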

  14. Bibliometrics: Initial Comparisons Across HRHR Evaluations • [Charts comparing NDPA data (Source: STPI) with HHMI data (Source: Azoulay, 2010)] • Bibliometrics show potential for cross-evaluation comparisons and synthesis of HRHR evaluations

  15. 2. Case Studies for Traditional R&D Evaluations • Description of Method • “In-depth investigations into a program, project, facility, or phenomenon, usually to examine what happened, to describe the context in which it happened, to explore how and why, and to consider what would have happened otherwise.” (Yin, 1994) • Uses in traditional R&D evaluations • Typically used in exploratory phases of a program (Ruegg and Feller, 2003) • Helpful for understanding key relationships and variables in a complex phenomenon (Shadish, Cook, and Leviton, 1991)

  16. Case Studies for HRHR Research Evaluations • Prior Work • No HRHR research program evaluations relying on case studies; the literature on creative research includes case studies on individual scientists OR the overall work environment, but less understanding at the level of groups or projects • NDPA Findings • Case studies allowed for tracking of research trajectory and gave awardees the opportunity to state what made their research pioneering • Great diversity across awardees in terms of use of funding, other funding, research approach, research trajectory, group size, and group composition – no consistent markers of pioneering research • Conclusion for Use of Method for Evaluation of “People” Programs • Most suitable method for understanding HRHR research trajectory, especially in the near term and for small samples • But a common framework for what information to collect is needed to enable better understanding of HRHR research (sociology of science?)

  17. 3. Expert Review for Traditional R&D Evaluations • Description of Method • Using informed judgments to make assessments • Many variants on implementation of review; effects of review process variants on review outcomes not well understood • Uses in traditional R&D evaluations • Most widely used method for research evaluations • Recognized challenges, but often viewed as most robust method (Garfield, 2006; NAS, 1999; Nature, 2009)

  18. Expert Review for HRHR Research Evaluations • Prior Work • Experts widely used for evaluating “high rewards” (e.g., Nobel Prize) • Little research done on how experts identify HRHR research, although Amabile (1982) suggests a technique for determining creative outcomes • NDPA Findings • Experts in the field of the awardee independently evaluated 3 publications and the case study of the awardee and were asked to determine the level of “pioneeringness” • When asked how they made the determination, most experts gave some variant of “you know it when you see it”…but there was still disagreement between experts (consistent with non-HRHR research evaluation findings) • Conclusion for Use of Method for Evaluation of “People” Programs • Currently necessary for understanding scientific contributions • Need better understanding of what experts are looking for in their assessments • Need robust data collection to enable cross-evaluation comparisons
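A minimal sketch of how the disagreement between expert reviewers noted above could be quantified, using percent agreement and Cohen’s kappa on hypothetical “pioneeringness” ratings; the rating categories and data are assumptions.

```python
# Minimal sketch: agreement between two hypothetical expert reviewers
# rating awardees as "pioneering" / "not pioneering" (assumed data).
from collections import Counter

rater_a = ["pioneering", "pioneering", "not", "pioneering", "not", "not"]
rater_b = ["pioneering", "not", "not", "pioneering", "pioneering", "not"]

n = len(rater_a)
# Observed (percent) agreement: share of awardees rated the same way.
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Cohen's kappa corrects observed agreement for chance agreement.
pa, pb = Counter(rater_a), Counter(rater_b)
expected = sum((pa[c] / n) * (pb[c] / n) for c in set(rater_a) | set(rater_b))
kappa = (observed - expected) / (1 - expected)

print(f"Observed agreement: {observed:.2f}, Cohen's kappa: {kappa:.2f}")
```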

  19. Some Concluding Thoughts • To return to the original question…can these methods be applied to HRHR research programs? • Sort of, BUT we need a clearer understanding of the program theory of HRHR programs, need to tailor methods appropriately, and need to use multiple methods • Need more evaluations of HRHR research programs to enable synthesis of evaluation results • Studies are in progress (EURECIA, CREA, STPI, …) • As evaluators, we need to balance our roles as auditors and researchers and caveat findings appropriately

  20. Thank You! NDPA FY 2004 – 2005 Outcome Evaluation report available at: https://commonfund.nih.gov/pdf/Pioneer_Award_Outcome%20Evaluation_FY2004-2005.pdf • Acknowledgments • Bhavya Lal • Stephanie Shipp • Elizabeth Lee • Amy Marshall • Brian Zuckerman • Questions? mhughes.ipa@ida.org
