350 likes | 529 Views
Structuring Data to Facilitate Analysis. Jerry J. Vaske Jay Beaman Colorado State University Warner College of Natural Resources Human Dimensions of Natural Resources Fort Collins, CO. Workshop at the 2008 Pathways to Success Conference:
E N D
Structuring Data to Facilitate Analysis Jerry J. Vaske Jay Beaman Colorado State University Warner College of Natural Resources Human Dimensions of Natural Resources Fort Collins, CO Workshop at the 2008 Pathways to Success Conference: Integrating Human Dimensions into Fish & Wildlife Mgmt.
Workshop Objectives • Illustrate strategy for: • Facilitating analysis of 2006 National Survey ofFishing, Hunting, and Wildlife-Associated Recreation (FHWAR) • Increasing the usability of FHWAR data formanagement, planning & policy • Compare two types of data structures: • Flat files • Relational Entities
Traditional Flat File Rows = Respondents Columns = Variables
Flat Files – Journal Article Example Every journal article has: • One or more authors • Title • Journal name • Specifics about date of publication: Year Volume number Issue number Page numbers • Potentially keywords
Potential Issues with Flat Files • Problem • Diefenbach et al (2005) article had 7 co-authors • 7 columns (variables) necessary to accommodate all authors’ last names • 19 of 26 articles in flat file had only 1 or 2 authors • 67% of author fields empty • If first names included – more empty fields • Solution – Relational database
Relational Databases • Definition • Set of tables containing data for predefined categories • Data stored in separate files (tables) that are linked • Terminology • Table = Entity (E) • Rows (tuples) in table = information about an object(e.g., journal article or respondent) • Columns (attributes) = variables • Two types of relations (R) • Set of tuples – a table with attributes (these R’s store data) • Algebraic (Person ID in Table A = Person ID in Table B)(these relations use data stored in entities)
(R2)RelationJournal ID (R1)Relational TableArticle IDAuthor ID (R3)Relational TableArticle IDKeyword ID Relational Data Structure for Journal Articles Article Entity Article IDJournal IDArticle titleYear, Issue, Pages Journal Entity Journal ID Journal name Publisher info Author Entity Author ID Last, First nameContact info Keyword Entity Keyword ID(attitudes, norms)
Comparison Flat File vs. Relational Database Flat file Relational Database R1 = table multiple authors (AuthorID) linked to given article (ArticleID) Author entity and R1 (author–article relation)can have any number of rowsso all authors of an articlecan be identified
FHWAR Flat File Example • Fishing, Hunting & Wildlife-Associated Recreation (FHWAR) • National Survey – Conducted about every 5 years • 1955 – first survey • 2006 – most recent survey • Data on hunters, anglers, wildlife watchers: • Sportsperson expenditures • Species sought in different states • Data collection costs (1991–2006) $55 million (in 2008 $) • 1991-2006 data comparable within limits but not integrated
2006 – FHWAR Flat File Data Data distributed on CD containing 3 ASCII text files: • Screening data • Sportsperson (hunting & fishing) data • Wildlife Watcher data
FHWAR Flat File – Analytical Issues • Important issues • 4,500+ vars with obtuse variable names (e.g., NCU_STD1) • 200 pages of documentation • Census conversion programs do not create variable labels or value labels • Major issues • Data compression • Conceptual complexity
Conceptual complexity • When uncompressed to blocks of 50 states ≈ 20,000 variables • Difficult to visualize analysis strategy • Flat FHWAR files hide data structure Analytical Issues Affecting Use • Data compression • No hunters hunt in all 50 states (at most 8 in 2006 data) • To avoid numerous empty cells data are compressed(e.g., the values for 3 vars are combined into a single var) • For example: “days” of participation is combined with an “activity” (e.g., big game or small game hunting) in a given “state” (in the order states are mentioned) • Compressed vars cannot be directly analyzed by SAS or SPSS
Entity PERSON SPORTSPERSON HUNTING_ACTIVITY TRIP_EXPENDITURES Based on flat file: Screening Sportsperson Sportsperson Sportsperson Relational File Structure Illustration Four entities ≈ half of the 2006 FHWAR flat file data
PERSON Entity • 6 control variables (e.g., Person_Weight) • 10 demographic variables (e.g., Age, Sex) • 8 hunting variables (e.g., Hunted_2005) • 8 fishing variable (e.g., Fished_2005) • 6 residential wildlife watching variables(e.g., Home_Observe_2005) • 5 non-residential wildlife watching variables(e.g., Trip_Watch_2005)
SPORTSPERSON Entity • 6 control variables(e.g., Person_ID, Sportsperson_Weight) • 11 demographic variables (e.g., Age, Sex) • 15 national summary variables (e.g., Hunted_2006) PERSON variables in SPORTSPERSON could be “obtained” from PERSON but also included SPORTSPERSON to simplify analyses
TRIP_EXPENDITURES Entity TRIP_EXPENDITURES entity reduces 844 compressed vars to 10 vars
HUNTING_ACTIVITY Entity HUNTING_ACTIVITY entity reduces 840 compressed vars to 12 vars
Variable: Sub_Table_ID HUNTING_ACTIVITY entity A collection of state-level sub-tables to facilitate analysis
Sportsperson(Sportsperson data) Control Variables Person ID Sportsperson Weight Demographics (11) National summary “species” variables (15) Hunted 2006 Big game hunted Days big game hunted Trips hunting big game Hunting Activity(Sportsperson data) Person IDSportsperson WeightSub Table IDResponse UnitResponse Trip Expenditure(Sportsperson data) Person IDSportsperson WeightSpending categoriesDollars Visualizing the 4 Entities Person(Screening data) Control Variables Person ID Person Weight Demographics (11) Hunting (8) Hunted Ever Hunt Intentions Fishing (8) Wildlife Watching Residential (6) Trips (5)
Summary • About 1,750 flat file variables reduced to < 60 • Obtuse variable names replaced with intuitive names • Compressed flat file variables cannot be directly used in SPSS or SASVariables in relational entities can be used in analysis • Details in Beaman & Vaske (2008)
To simplify analyses 2 additional entities: Hunting_Activity_and_Demographics Trip_Expenditures_and_Demographics Entity Data Files (http://welcome.warnercnr.colostate.edu/~jerryv/)
SAS Code & SPSS Syntax Figure numbers based on Beaman & Vaske (2008) http://welcome.warnercnr.colostate.edu/~jerryv/
Example – Hypothesis Average days of elk huntingvaries between Colorado vs. Wyomingand by hunter’s sex
Flat File to Entity for Hypothesis Data FHWAR6.Hunt_BGspecies_States ; Length Person_ID 5 Sportsperson_Weight 4 Sex 3 State_of_Residence Activity_Location Fish_Hunt_Type Response_Unit Response 4 ; Set FHWAR6.fh3 (rename = (sex = xsex)) ; Keep Person_ID Sportsperson_Weight Sex State_of_Residence In_State Response Activity_Location Fish_Hunt_Type Response_Unit ; Person_ID = PersonID ; Sportsperson_Weight = spwgt ; Sex = Xsex ; State_of_Residence = put (resstate, $st2num2.) ; * Array stores info to identify state when decompressing ;Array a1( 2, 8 ) HUNTSTD1-HUNTSTD8 STDAYSHD1-STDAYSHD8 ;
* Array stores info to associate species with variables ; Array gam1( 9) g1-g9 ;Retain g1 1 g2 2 g3 3 g4 4 g5 5 g6 6 g7 7 g8 40 g9 41 ;Array a7( 2, 9 , 8 ) bgame1d1--bgdifday9d8 ;Do m = 1 To 2 ; Do j=1 To 9 ; Do k=1 To 8 ;If a1( 1, k) = ' ' Then Goto End7 ; Fish_Hunt_Type = gam1(j) ;If m = 1Then Do ; Response_Unit = 1 ; End ;Else Do ; Response_Unit = 2 ; End ; Response = a7(m, j, k) ; Activity_Location = put(a1( 1, k), $st2num2. ) ; If Activity_Location = State_of_Residence Then In_State = 1 ;Else In_State = 0 ; * Outputs data for hypothesis; If Response > 0Then Output ;End7: End ; End ; End ;run ;
SAS Entity to SPSS Entity Get SAS Data = ‘C:\Hunt_BGspecies_States.sas7bdat’. Add Value labels Save Outfile = ‘C\Hunt_BGspecies_States.sav’.
GET File = 'C:\Hunt_BGspecies_States.sav'. WEIGHT BY Sportsperson_Weight. Select if (Activity_Location = 8 or Activity_Location = 56). Select if (Fish_Hunt_Type = 2). Select if (Response_Unit = 2). UNIANOVA Response BY Sex Activity_Location. Opens data Weights data CO huntersWY hunters Elk hunters Days of participation ANOVA Testing Hypothesis with Relational Entity GET File = 'C:\FHWAR\Hunting_Activity.sav'.Select if (Sub_Table_ID = 10). WEIGHT BY Sportsperson_Weight. Select if (Activity_Location = 8 or Activity_Location = 56). Select if (Fish_Hunt_Type = 2). Select if (Response_Unit = 2). UNIANOVA Response BY Sex Activity_Location.
Conclusions • Analyses that are difficult to perform with flat file data are possible with relational structure • Restructuring all of 2006 FHWAR data as well as data from 1991, 1996, & 2001 would: • Yield similar analysis capabilities • Allow for trend analysis • New practical opportunities for state agencies
Practical Opportunity • State agencies have accurate records of license sales (e.g., hunting only, fishing only, combos) • With potentially 100s of licenses, permits, & stamps sold, not practical to ask about specific licenses in a flat file • Moving to relational structure for obtaining license data has advantages …
Advantages of Relational License Data • Can ask about actual state license salesAll state license info can be “pre-stored” in one entitySize of entity would not impact other data entities • Questions about specific license cost not necessary; correct information pre-stored • Establishing relationship between state specific license sales & FHWAR dataprovides foundation for benchmarking / calibratingmeaningful estimates based on FHWAR
From Analysis to Data Collection • Entity based models: • facilitate analyses • can also enhance data collection • Currently working with software company Techneos (www.techenos.com) toimplement pilot models that yield: • more consistent and • accurate data collection