DLI Boot Camp 2011
Finding Statistics: Tools and Techniques
SDA
Jean Blackburn
Vancouver Island University Library
jean.blackburn@viu.ca
SDA@CHASS: Microdata Analysis & Subsetting
• The University of Toronto CHASS Data Centre runs a web-based statistical package called SDA (Survey Documentation & Analysis), developed at UC Berkeley.
• The SDA@CHASS service links SDA to microdata files from Statistics Canada surveys and more.
• IP-authenticated access to SDA@CHASS is available to DLI institutions for an annual fee.
Key Features
• Variable-level searching
• Extraction / subsetting capabilities
• Access to codebooks
• Web-based statistical analysis capabilities
• Ability to recode and compute new variables
There are both French and English language versions of SDA@CHASS, but the French language data catalogue is limited as of the time of this presentation…
The SDA@CHASS English language data catalogue contains not only DLI microdata sets, but also open-access data and those restricted to University of Toronto users.
You can choose a particular data set or cluster of data sets to search, and whether to search survey-level OR variable-level metadata (but not both). Click the + to expand the data file clusters and select specific data sets to search.
The View button shows you the codebook definition for the variable; “Study” titles link to the SDA interface...
…the SDA interface (we’ll come back to it in more detail later).
As well as searching, you can browse the SDA@CHASS data catalogues…
The French language data catalogue has 3 survey titles available; there are several data sets available for the Recensement de la population de Canada.
You can access the codebook for this data set from the menu bar…
Use the Variable Selection tool to browse variables in the data set by category.
Expand variable categories by clicking the +. Click on variables to select them.
You can view the codebook definitions for selected variables by clicking the View button beside the “Selected” field.
Codebook definition for the “immstat” variable. Codebooks include numeric values and labels for responses, record layouts, and unweighted frequencies.
Follow the “Search Techniques Help” link for tips on wildcards and field searching. N.B. The SDA help files are available in English only. Click the variable button to select the variable for manipulation in SDA. For example, we can use SDA to get a weighted frequency distribution for the “immstat” variable…
To run a weighted frequency for the “Statut d’immigrant” variable (“immstat”)…
1. Copy the selected variable “immstat” to the Row field in the Frequencies/Crosstab analysis tool, using the “Row” button…
2. Adjust table and display options as desired. SDA defaults to weighted cases. I have selected a pie chart type.
3. Click the “Run the Table” button.
The weighted frequency distribution table and chart will open in a new browser window or tab…
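For readers who want to see what a weighted frequency amounts to, here is a minimal Python sketch. It is not how SDA itself is implemented; the records, the “immstat” codes and the "weight" field name are all invented for illustration, and real values would come from the codebook.

```python
from collections import defaultdict

# Hypothetical microdata records. The "immstat" codes and the "weight"
# field name are made up; real codes and weights come from the codebook.
records = [
    {"immstat": 1, "weight": 30.0},
    {"immstat": 1, "weight": 50.0},
    {"immstat": 2, "weight": 15.0},
    {"immstat": 3, "weight": 5.0},
]


def weighted_frequency(rows, variable, weight="weight"):
    """Return {value: (weighted count, percent of weighted total)}."""
    totals = defaultdict(float)
    for row in rows:
        # Each case counts for its weight rather than for 1.
        totals[row[variable]] += row[weight]
    grand_total = sum(totals.values())
    return {
        value: (count, 100.0 * count / grand_total)
        for value, count in sorted(totals.items())
    }


table = weighted_frequency(records, "immstat")
for value, (count, percent) in table.items():
    print(f"immstat={value}: N={count:.1f} ({percent:.1f}%)")
```

An unweighted frequency would simply add 1 per case instead of the case weight.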
Charts can be saved as image files to insert into Word documents, etc.
What if we want to see immigration status…
• for a particular province?
• for people of a particular age?
• for men and women separately?
SDA allows you to apply filter and control variables to your analysis.
To run a weighted frequency for the “Statut d’immigrant” variable (“immstat”), for British Columbia only, we’ll use the Province variable (“pr”) as a filter… Select the “pr” variable and click the View button to see the codebook definition… (Note that the “immstat” variable remains in the Frequencies/Crosstab program “Row” field.)
The codebook definition for the Province variable (“pr”) tells us that the value for British Columbia is 59…
Click the Filter button to move the “pr” variable to the Selection Filter(s) field in the SDA Frequencies/Crosstab program. Enter the value for British Columbia, 59, within the Selection Filter parentheses, and run the table…
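Conceptually, a selection filter just restricts the cases that go into the table. A rough sketch in Python, using invented records (only the value 59 for British Columbia comes from the “pr” codebook definition; the other province code, the “immstat” codes and the "weight" field are placeholders):

```python
# Invented records for illustration only.
records = [
    {"pr": 59, "immstat": 1, "weight": 40.0},
    {"pr": 59, "immstat": 2, "weight": 20.0},
    {"pr": 35, "immstat": 1, "weight": 70.0},
    {"pr": 35, "immstat": 2, "weight": 30.0},
]

BRITISH_COLUMBIA = 59  # value taken from the "pr" codebook definition

# Step 1: keep only the cases that pass the filter.
bc_rows = [row for row in records if row["pr"] == BRITISH_COLUMBIA]

# Step 2: run the same weighted frequency on the filtered cases.
weighted_counts = {}
for row in bc_rows:
    value = row["immstat"]
    weighted_counts[value] = weighted_counts.get(value, 0.0) + row["weight"]

bc_total = sum(weighted_counts.values())
bc_percents = {value: 100.0 * n / bc_total for value, n in weighted_counts.items()}

for value in sorted(bc_percents):
    print(f"immstat={value}: {bc_percents[value]:.1f}% of filtered cases")
```

Percentages are computed against the filtered total, which is why the filtered distribution can differ from the national one.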
Filtering for British Columbia results in a different distribution of immigrant status. Let’s try filtering further – for children under 15 years…
Select the Age Group variable (“agegrp”) and click the View button to see the codebook definition… In the codebook, values 1 to 5 represent children under 15 years…
Because we want to filter for both Province and Age Group, it’s important to select “Append” rather than “Replace” before clicking the Filter button to move the variable to the Selection Filter(s) field. For the agegrp filter, we need to enter a range of values representing children under 15 years (1-5).
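To make the logic of appended filters concrete, here is a small Python sketch that reads a filter string such as pr(59) agegrp(1-5) and keeps only cases that satisfy every filter. The variable(value) and variable(low-high) forms follow the slides; separating multiple filters with a space, and the parse_filters and matches helpers themselves, are assumptions of this sketch and cover only a simplified subset of what SDA accepts.

```python
import re

# Matches name(spec), e.g. "pr(59)" or "agegrp(1-5)".
FILTER_PATTERN = re.compile(r"(\w+)\(([^)]*)\)")


def parse_filters(expression):
    """Parse text like 'pr(59) agegrp(1-5)' into {name: (low, high)}.

    Simplified: one non-negative value or one low-high range per variable.
    """
    filters = {}
    for name, spec in FILTER_PATTERN.findall(expression):
        low, _, high = spec.strip().partition("-")
        # A single value such as "59" becomes the range (59, 59).
        filters[name] = (int(low), int(high or low))
    return filters


def matches(row, filters):
    """A case is kept only if it passes ALL filters (Append behaviour)."""
    return all(low <= row[name] <= high for name, (low, high) in filters.items())


filters = parse_filters("pr(59) agegrp(1-5)")

# Invented cases for illustration.
rows = [
    {"pr": 59, "agegrp": 3},   # B.C., under 15: kept
    {"pr": 59, "agegrp": 9},   # B.C., older: dropped
    {"pr": 35, "agegrp": 2},   # other province: dropped
]
selected = [row for row in rows if matches(row, filters)]
print(filters)
print(selected)
```

Choosing “Replace” instead of “Append” would correspond to discarding the existing entry for “pr” and keeping only the “agegrp” filter.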
Here’s the distribution for immigrant status, filtered for province and age. Now let’s remove the age group filter, and instead compare the immigrant status distribution in British Columbia for men and women…
To see distributions for each value of a particular variable (e.g. sex), select the variable, click the “Ctrl” (Control) button to move it to the Control field in the SDA Frequencies/Crosstab program, and run the table…
Controlling for sex, we get three different frequency distributions: one for women, one for men and one for all valid cases. There is not much variation between these distributions!
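A control variable can be thought of as running the same table once per value of that variable, plus once for all cases. The following Python sketch illustrates the idea; the records, the codes for “sex” and “immstat”, and the "weight" field are invented, and the numbers are not survey results.

```python
from collections import defaultdict

# Invented records; all codes and weights are placeholders.
records = [
    {"sex": 1, "immstat": 1, "weight": 30.0},
    {"sex": 1, "immstat": 2, "weight": 10.0},
    {"sex": 2, "immstat": 1, "weight": 45.0},
    {"sex": 2, "immstat": 2, "weight": 15.0},
]


def percent_distribution(rows, variable, weight="weight"):
    """Weighted percent distribution of one variable."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[variable]] += row[weight]
    grand_total = sum(totals.values())
    return {value: 100.0 * n / grand_total for value, n in sorted(totals.items())}


def controlled_tables(rows, variable, control):
    """One distribution per value of the control variable, plus one for all cases."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[control]].append(row)
    tables = {
        value: percent_distribution(group, variable)
        for value, group in sorted(groups.items())
    }
    tables["all"] = percent_distribution(rows, variable)
    return tables


tables = controlled_tables(records, "immstat", "sex")
for label, distribution in tables.items():
    print(label, distribution)
```

With two values of the control variable, this yields three tables, matching the three distributions described on the slide.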
Crosstabs in SDA
To look for a relationship between two variables, you can run a crosstabulation in SDA: copy the dependent variable to the Row field and the independent variable to the Column field, then run the table. Let’s use the 2006 Aboriginal Peoples Survey: Adults to see whether living in urban or rural environments affects ability to speak an Aboriginal language…
Ability to speak an Aboriginal language (“bg01”) is the dependent variable and gets copied to the Row field. Geography (“geo”) is the independent variable and gets copied to the Column field. I’ve chosen the Stacked Bar Chart type.
The crosstabulation table and chart show that Aboriginal peoples living in Census Metropolitan Areas are less likely to speak an Aboriginal language than those living in rural areas. The stacked bar chart effectively portrays the dramatic difference between the Arctic and the other Geography values.
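The comparison across geography values relies on percentaging within each column, so that each category of the independent variable sums to 100%. Here is a minimal Python sketch of that calculation; it assumes column percentages are what the table reports, and the records, the “bg01” and “geo” codes and the "weight" field are invented placeholders rather than Aboriginal Peoples Survey results.

```python
from collections import defaultdict

# Invented records; codes and weights are placeholders, not survey data.
records = [
    {"bg01": 1, "geo": 1, "weight": 20.0},
    {"bg01": 2, "geo": 1, "weight": 80.0},
    {"bg01": 1, "geo": 2, "weight": 30.0},
    {"bg01": 2, "geo": 2, "weight": 20.0},
]


def crosstab_column_percent(rows, row_var, col_var, weight="weight"):
    """Return {(row value, column value): percent of that column's weighted total}."""
    cells = defaultdict(float)
    column_totals = defaultdict(float)
    for record in rows:
        key = (record[row_var], record[col_var])
        cells[key] += record[weight]
        column_totals[record[col_var]] += record[weight]
    return {
        (row_value, col_value): 100.0 * n / column_totals[col_value]
        for (row_value, col_value), n in cells.items()
    }


table = crosstab_column_percent(records, "bg01", "geo")
for (row_value, col_value), percent in sorted(table.items()):
    print(f"bg01={row_value}, geo={col_value}: {percent:.1f}%")
```

Reading across a row then compares the share of the dependent variable between categories of the independent variable.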
Other analysis methods are available in SDA, under the Analysis menu.
In SDA, you can download custom data sets with the Download Customized Subset command.
Specify the data and syntax (e.g. SPSS) file formats you want… Add filtering criteria for cases to include, if desired… Select All, Some or None of the variable categories to be included in your custom data set. If you select Some for any category, you’ll be able to select specific variables on the next screen, after clicking the Continue button…
Hold down the Shift or Ctrl key to select contiguous or non-contiguous variables from the lists provided, and click Continue… Check over your selections and click the “Create the Files” button…
Right-click the links to the files to download them to your local computer or network!
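In effect, a customized subset is a selection of cases (the filtering criteria) combined with a selection of variables. A rough Python sketch of that idea follows; the make_subset helper, the records and the CSV output are illustrative assumptions, and SDA's actual data and syntax file formats are not reproduced here.

```python
import csv
import io

# Invented records; codes are placeholders except pr = 59 (British Columbia).
records = [
    {"pr": 59, "agegrp": 3, "sex": 1, "immstat": 1},
    {"pr": 35, "agegrp": 7, "sex": 2, "immstat": 2},
    {"pr": 59, "agegrp": 10, "sex": 2, "immstat": 2},
]


def make_subset(rows, keep_variables, case_filter):
    """Keep cases that pass case_filter, and only the chosen variables."""
    return [
        {name: row[name] for name in keep_variables}
        for row in rows
        if case_filter(row)
    ]


keep = ["sex", "immstat"]
subset = make_subset(records, keep, lambda row: row["pr"] == 59)

# Write the subset as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=keep, lineterminator="\n")
writer.writeheader()
writer.writerows(subset)
csv_text = buffer.getvalue()
print(csv_text)
```

The filter variable (“pr”) does not have to be among the variables kept in the output.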