Improving researcher access to USDA’s Agricultural Resource Management Survey
Charles Towe and Mitch Morehart
Economic Research Service, USDA
What is the Agricultural Resource Management Survey?
ARMS is USDA’s primary survey for the annual collection of data from farm operators about their:
• Farm: ownership, governance, management, and performance
• Choice of practices, inputs, and expenditures to produce crop and livestock commodities
• Household: demographic attributes and economic and financial activities
Program Activities Supported by ARMS
• Responding to mandates: income for farms, costs for commodities, status of family farms
• Support for the U.S. National Economic Accounts (GDP, Personal Income)
• Providing data to respond to USDA policies and programs
• Enabling research to inform decision makers on a variety of issues
Data delivery (pre-2004)
• ARMS is a complex survey that has existed, in one form or another, for approximately 20 years.
• Since 1996, the data collection methodology has been standardized.
Project goals
• Allow users to customize table requests
• Allow two-way tables
• Add state-level analysis
• Support graphical representation of data
• Give advanced users access to a suite of regression-type methods
Provide all of this in an environment that protects survey participants’ confidentiality, to ensure future participation.
Primary and complementary cell suppression algorithm
• Primary disclosure
  • Examines individual cells
  • Class disclosure
• Secondary (complementary) disclosure
  • Solving for suppressed values from totals or known formulae
  • Combining data from different tables and sources
  • Using non-suppressed information to infer suppressed values
  • Much more difficult to check
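A small worked illustration of secondary disclosure by solving from totals. The cell values a, b, c and the published row total T are hypothetical; the point is only that one suppressed cell in a line with a published total can be recovered by subtraction.

```latex
T = a + b + c, \qquad \text{published: } T,\ b,\ c \quad (a \text{ suppressed})
\;\Longrightarrow\; a = T - b - c
```

If a second cell in the same row (say b) is also suppressed, this row yields only a + b = T − c, so neither value can be solved from the row alone; that is the role of complementary suppression.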
Primary disclosure
1) Threshold rule
  • No cell may contain fewer than 3 units (enterprises)
2) Dominance rule
  • C, the sum of the observations in the cell minus the two largest observations, must exceed 60% of the largest observation U: C > (3/5) × U
[Figure: observations U, V, W, X, Y, Z, with U and V marked as the two largest observations; C is the sum of the rest]
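A minimal Python sketch of the two primary rules as stated on the slide, treating C > (3/5) × U as the passing condition. It assumes each cell arrives as a list of nonnegative unit-level contributions (one value per farm); the function name `violates_primary` is hypothetical, and how the actual ARMS tool applies survey weights inside these rules is not described in the deck.

```python
from typing import Sequence

MIN_UNITS = 3  # threshold rule: at least 3 contributing units (enterprises)


def violates_primary(values: Sequence[float]) -> bool:
    """Return True if a cell built from these unit-level values must be suppressed.

    Threshold rule: the cell has fewer than MIN_UNITS contributors.
    Dominance rule: with U the largest value and C the cell total minus its two
    largest values, the cell passes only if C > (3/5) * U.
    """
    if len(values) < MIN_UNITS:
        return True
    ordered = sorted(values, reverse=True)
    largest = ordered[0]          # U
    remainder = sum(ordered[2:])  # C: everything except the two largest values
    # C > (3/5) * U, rewritten as 5 * C > 3 * U to avoid fractional arithmetic
    return not (5 * remainder > 3 * largest)


print(violates_primary([100, 10, 5, 5]))  # True: C = 10 is not above 60
```

Writing the comparison as 5 × C > 3 × U keeps the test exact for integer data, so cells sitting right at the 60% boundary are flagged consistently.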
Secondary disclosure
1) An algorithm (equation check) selects additional cells to suppress so that primary-suppressed values cannot be recovered
2) It accounted for solving from totals and across cells, using the relationships among data within a single table
3) It could not prevent cross-table searches
Primary and complementary cell suppression algorithm
[Flowchart] Data request made → apply primary disclosure rules to the data (plus statistical reliability) → collect list of violating variables (candidate list) → equation check on key variables → identify the method for selecting complementary cells → final list → build table for display → done
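The complementary-cell step can be sketched with a simple greedy heuristic. This is an assumption, not the ERS method, which the deck does not describe: it supposes a two-way table whose row and column totals are always published, and enforces only the most basic equation check (no row or column may contain exactly one suppressed cell). That condition is necessary but not sufficient against solving from totals, and, like the original tool, the sketch does nothing about cross-table searches. The function name `complementary_suppression` is hypothetical.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]  # (row index, column index)


def complementary_suppression(table: List[List[float]], primary: Set[Cell]) -> Set[Cell]:
    """Greedily add complementary suppressions to a two-way table.

    Assumes every row total and column total is published. A row or column with
    exactly one hidden cell lets a user solve for it (total minus visible cells),
    so we keep hiding the smallest visible cell in such a line until none remain.
    """
    n_rows, n_cols = len(table), len(table[0])
    if n_rows < 2 or n_cols < 2:
        raise ValueError("table needs at least two rows and two columns")
    # Each "line" is one row or one column of interior cells.
    lines = [[(r, c) for c in range(n_cols)] for r in range(n_rows)]
    lines += [[(r, c) for r in range(n_rows)] for c in range(n_cols)]
    suppressed = set(primary)
    changed = True
    while changed:
        changed = False
        for line in lines:
            hidden = [cell for cell in line if cell in suppressed]
            if len(hidden) == 1:
                visible = [cell for cell in line if cell not in suppressed]
                # Hide the smallest visible value to limit information loss.
                pick = min(visible, key=lambda cell: table[cell[0]][cell[1]])
                suppressed.add(pick)
                changed = True
    return suppressed


table = [[1, 2, 9], [3, 4, 9], [9, 9, 9]]
print(sorted(complementary_suppression(table, {(0, 0)})))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Each pass adds at least one cell or stops, so the loop terminates; choosing the smallest visible cell is one common way to keep the published table as informative as possible, though production methods typically solve an optimization problem instead.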
Final implementation
• Prototype built in 2003 and presented to a peer review panel
  • Highlighted the need to disseminate data more widely and illustrated the associated risks
  • Resulted in approval of a weighting scheme for all data that, in theory, eliminates the need for secondary suppression
• Pre-generated every data point
  • Faster response time, which allowed greater graphical capabilities
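The deck does not spell out the approved weighting scheme beyond noting that noise was implemented (6/21/04). A minimal sketch of one generic approach, multiplicative noise on survey weights, is shown below under loudly stated assumptions: the noise bounds (5% to 15% away from 1), the fixed seed, and the names `noisy_weights` and `weighted_total` are all hypothetical, not ERS parameters.

```python
import random
from typing import Dict

# Hypothetical noise bounds: each factor lies 5% to 15% away from 1, on either side.
NOISE_MIN, NOISE_MAX = 0.05, 0.15


def noisy_weights(weights: Dict[str, float], seed: int = 2004) -> Dict[str, float]:
    """Multiply each record's survey weight by a random factor that is never close to 1.

    Factors are drawn once per record with a fixed seed, so every pre-generated
    table is computed from the same perturbed weights.
    """
    rng = random.Random(seed)
    perturbed = {}
    for record_id in sorted(weights):  # sorted so draws are reproducible
        magnitude = rng.uniform(NOISE_MIN, NOISE_MAX)
        sign = rng.choice((-1, 1))
        perturbed[record_id] = weights[record_id] * (1 + sign * magnitude)
    return perturbed


def weighted_total(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Estimate a population total as the weighted sum of record values."""
    return sum(values[rid] * weights[rid] for rid in values)


base = {"farm_1": 100.0, "farm_2": 250.0}
print(weighted_total({"farm_1": 2.0, "farm_2": 1.0}, noisy_weights(base)))
```

Drawing the factors once per record, rather than once per query, is the design choice that fits pre-generating every data point: repeated or overlapping requests see the same perturbed values instead of fresh noise that could be averaged away.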
Project Timeline and Milestones
• 4/21/03 Kickoff: team charter; identify goals, project plan, and resources
• 11/25/03 First prototype presented
• 3/26/04 V-1 release: intranet, by IP address
• 4/29/04 V-2 release: limited, secure extranet, by IP address
• 5/10/04 Peer review
• 6/21/04 Noise implemented
• 6/30/04 V-3 release: limited, secure extranet open outside of ERS
• 8/5/04 Security evaluation
• 8/21/04 Audit logging
• 9/24/04 Extranet tool released
• 11/9/04 Tailored Reporting Tool public release; public website overhauled
Work streams: design; application functionality; data preparation; testing and evaluation; eGov integration
Enclave Basics
• Mission
  • To promote access to sensitive microdata
  • To protect confidentiality
  • To archive, index, and curate microdata
• Background
  • Started by NIST/ATP
  • Went live in July 2007
  • Current participants/data producers: NIST/ATP, USDA/ERS (pilot), Kauffman Foundation
• Innovations
  • Secure remote access
  • Collaboratory: a collaborative environment where researchers work, share code and ideas, and use online discovery tools
  • Standardized metadata documentation techniques (IHSN’s Microdata Management Toolkit; DDI compliance)
NORC Data Enclave: Mechanics of the Portfolio Approach to Protection
Provision of access:
• Technical protection (IT and operational)
• Agency-specific data protection requirements (legal)
• Statistical protection (statistical)
• Researcher training (educational)