Institute for Research on Innovation & Science (IRIS)
Jason Owen-Smith, IRIS/University of Michigan
jdos@umich.edu | Iris.isr.umich.edu | @IRIS_UMETRICS

Roadmap
• Background on IRIS
• What we currently do
• Use cases for MPC/FHE/etc.
• The challenge(s)
In 2016, our society invested $220 in academic research for every man, woman, and child in the country.
For every $1: $0.55 from the federal government, $0.24 from universities, $0.06 each from states, industry, and non-profits, and $0.03 from all other sources.
• We make those investments to develop human knowledge and to improve quality of life and well-being.
• How do we understand and improve those effects?
The Wisconsin Idea: The mission of the [University of Wisconsin] system is to develop human resources, to discover and disseminate knowledge, to extend knowledge and its application beyond the boundaries of its campuses and to serve and stimulate society by developing in students heightened intellectual, cultural, and human sensitivities, scientific, professional, and technological expertise, and a sense of purpose. Inherent in this broad mission are methods of instruction, research, extended training and public service designed to educate people and improve the human condition. Basic to every purpose of the system is the search for truth.
Proposed revision: The mission of the [University of Wisconsin] system is to develop human resources to meet the state’s workforce needs, to discover and disseminate knowledge, and to develop in students heightened intellectual, cultural, and human sensitivities, scientific, professional, and technological expertise, and a sense of purpose.
Our Response: IRIS
Data for research and reporting to understand, explain, and improve the public value of academic research
Framework
[Diagram; labels only: Science Investments (Propose, Fund); Universities (Hiring, Spending; Jobs, Stimulus); Knowledge, People, Skills (Discovery, Learning, Dissemination); Innovation, Entrepreneurship, Economic Growth, Public Health, Food Safety, Security, (More) Rational Policy, …]
Background
• Founded in 2015
• Recession → STAR METRICS → UMETRICS → IRIS
• Emerged from CIC/Big Ten
• Transaction-level sponsored project expenditures on employees, vendors, and sub-awards
• 33 current member institutions (11 Big Ten) = ~30% of federal R&D spend
• Members contribute to support infrastructure and receive reports and other data products
• Goal is 150 members (~93% of federal R&D spend)
• IRB-approved data repository: Virtual Data Enclave
• ~60 current users with approved projects and signed DUAs
• Disclosure proofing procedures
• But basically a trust model
Research and reporting to understand, explain, and improve the public value of academic research
Key goal: long-term, near-comprehensive, longitudinal data about academic researchers
Key problems: no single data source; most extant data are about documents (grants, publications, patents), not people; no (public) persistent identifiers
Data sources:
1. University transaction data (Restricted)
2. US Census outcome data (Restricted)
3. Federal grant data (Public)
4. US Patent Office data (Public)
5. Publication data (Public & Restricted)
6. Dissertation data (Public & Restricted)
Submission Process
• Common data structure
• Upload through a secure portal
• Coded quality assurance checks
  • Immediate: e.g. value ranges, duplication, missing fields, record counts, etc.
  • 24 hours: e.g. normalization
• Data depositors generally don’t know what’s really in the data they submit
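The "immediate" checks listed above could look roughly like the following. This is only a sketch: the field names, the amount bounds, and the function name are hypothetical, since the slides do not describe the actual IRIS schema or validation code.

```python
from collections import Counter

# Hypothetical field names and bounds; the real submission schema is not given in the slides.
REQUIRED_FIELDS = ("employee_id", "award_id", "period_start", "period_end", "amount")
AMOUNT_RANGE = (0.0, 10_000_000.0)


def immediate_checks(records, expected_count=None):
    """Return a list of (row_index, problem) pairs for one uploaded file.

    row_index is None for file-level problems such as a record-count mismatch.
    """
    problems = []

    # Record count: compare against the count the depositor declared, if any.
    if expected_count is not None and len(records) != expected_count:
        problems.append((None, f"record count {len(records)} != expected {expected_count}"))

    # Duplication: count identical rows across all required fields.
    keys = [tuple(rec.get(f) for f in REQUIRED_FIELDS) for rec in records]
    counts = Counter(keys)

    low, high = AMOUNT_RANGE
    for i, (rec, key) in enumerate(zip(records, keys)):
        # Missing fields: absent or empty values.
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            problems.append((i, "missing fields: " + ", ".join(missing)))

        # Value ranges: flag negative or implausibly large amounts.
        amount = rec.get("amount")
        if isinstance(amount, (int, float)) and not (low <= amount <= high):
            problems.append((i, f"amount out of range: {amount}"))

        # Date ranges: ISO-8601 date strings compare correctly as text.
        start, end = rec.get("period_start"), rec.get("period_end")
        if start and end and start > end:
            problems.append((i, "period_end before period_start"))

        if counts[key] > 1:
            problems.append((i, "duplicate record"))

    return problems
```

Returning every problem with its row index, rather than stopping at the first failure, fits the point that depositors often do not know what is in their own data: one upload yields a full report.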
Process Challenges
• Universities vary dramatically in the quality of data produced
• Normalization, identifiers that are not actually unique, garbage strings, duplicates, missing data, negative values, wonky date ranges . . .
• Labor-intensive community and relationship building
• Substantial work required to integrate multi-university data
• No persistent individual or organizational identifiers, wildly different naming conventions
• Disambiguation and unstructured linkage across universities and between integrated data and public sources (e.g. PubMed, ISI, patents, ProQuest)
• Computationally intensive feature-based SVM approach in development
• Formatting issues, e.g. Census works only in SAS and requires access to identified micro-data
• Social science researchers are trained (and expect) to look closely at micro-data, construct unique variables, and integrate new data sources
• Data access is time consuming and may be a barrier: limitations in computing capacity, many “help desk” requests, disclosure review
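To make the disambiguation problem concrete, here is a minimal sketch of pairwise comparison features of the kind a feature-based classifier such as an SVM could consume. The record fields (`name`, `institution`) and the three features are assumptions chosen for illustration; the slides do not describe the features or the model IRIS is developing, and no classifier is trained here.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def pair_features(a, b):
    """Build comparison features for two person records (hypothetical schema)."""
    na, nb = normalize(a["name"]), normalize(b["name"])
    tokens_a, tokens_b = set(na.split()), set(nb.split())
    union = tokens_a | tokens_b
    return {
        # Character-level similarity, sensitive to typos and truncation.
        "name_similarity": SequenceMatcher(None, na, nb).ratio(),
        # Token overlap, insensitive to "Last, First" vs "First Last" ordering.
        "token_jaccard": len(tokens_a & tokens_b) / len(union) if union else 0.0,
        # Whether both records come from the same institution.
        "same_institution": float(a.get("institution") == b.get("institution")),
    }
```

Each candidate pair of records becomes one feature vector. With no persistent identifiers, the number of candidate pairs grows quickly across universities, which is one reason such approaches are computationally intensive.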
Use Cases for FHE
• Social science research, standard statistical methods
• Key challenge: data are supposed to be flexible and usable for research we cannot envision
• Universities that want to benchmark but don’t want to be identified
• Universities love to compare themselves to others but hate to be compared
• Agencies, policy-makers, the public (?) who want to explore aggregate data
• Issues of trust/oversight; generally need to see information across a portfolio of institutions
• ????
“Living” data: multiple updates per year, two transfers to Census, annual documented data release through a virtual and a physical enclave.
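The benchmarking use case (universities that want an aggregate without revealing their own figure) can be illustrated with a toy secure sum. Note the substitution: this sketch uses additive secret sharing, a simple MPC building block, not FHE. It assumes honest-but-curious parties, non-negative integer inputs, and a total smaller than the modulus. The function names and modulus are illustrative choices, not anything described in the slides.

```python
import secrets

# Illustrative modulus (a Mersenne prime); inputs and their total must be smaller than this.
PRIME = 2**61 - 1


def share(value, n_parties):
    """Split value into n additive shares that sum to value modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values):
    """Simulate a secure sum in one process; in practice each party runs on its own machine."""
    n = len(private_values)
    # Party i splits its private value and sends share j to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the shares it received and publishes only that partial sum.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # The published partial sums reveal the total but not any individual input.
    return sum(partial_sums) % PRIME
```

For example, `secure_sum([120, 340, 95])` returns 555, and each university can compare its own figure with the mean without seeing any other institution's value. This covers only a single aggregate. The harder challenge named above, flexible research use that cannot be anticipated in advance, is not addressed by a fixed protocol like this one.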