450 likes | 496 Views
Uexplore/Dexter Tutorial. Part 2 More Advanced Topics And Exercises Rev. 5-16-07, jgb. Generate a Report Showing …. Population change (estimated) for Missouri cities (places) from 2000 to 2004. Show change and pct change sorted by change in population, descending.
E N D
Uexplore/Dexter Tutorial Part 2 More Advanced Topics And Exercises Rev. 5-16-07, jgb
Generate a Report Showing … • Population change (estimated) for Missouri cities (places) from 2000 to 2004. • Show change and pct change sorted by change in population, descending. • Only show cities that had growth of at least 100 people and 5% over the period. • Use variable labels (rather than names) as column headings in the report.
Navigate to popests Filetype • Pop Estimates is a major category with (currently) 4 filetypes. By far the most important of these is popests. • Choose the Current version of popests (as opposed to the older estimates in popests2).
Finding the Relevant Dataset • The Census Bureau does estimates at various levels, including nation, state, county and “subcounty”. • The latter includes cities (“places”) and other sub-county governmental units. • Find the relevant dataset by scanning the Datasets.html page in the popests dir.
Determine SumLev Value for Filter • Turns out that this set has some non subcounty data as well (i.e. state & county summaries). • We want complete places – level 162. • Level 157 would give us place-within-county.
Create the Filter • First row selects complete-place summaries. • 2nd row says only places with growth of 100 or more persons. • 3rd row says the change must be at least 5%.
Dexter Query Files • Are simple text files used to encapsulate a query (i.e. save all the specs so that the query can be rerun.) • Written to a temporary file which goes away within 48 hrs. • For now, most users cannot replay a saved query. • But authorized MCDC personnel can use these to create public queries.
Invoking a Saved Query • Can be done via a URL with a parm spec as we see in the current example. • The name of a stored query can also be entered on the Dexter query form near the bottom of the page. • For techies who care: Dexter looks for the file named &query.txt in the Queries subdirectory of the &path data directory.
Saved Queries • Are relatively new to Dexter and not yet fully implemented. • Have good potential for creating “virtual” data products. You save the query file that generates the report (and/or csv file) rather than the files. • We are experimenting with adding “run-time parms” and generating query front-ends to allow customizing the query. • For example, think of turning the growing cities query into one where you could specify the state rather than having it always be Missouri.
Saved Queries and xsamples • We are experimenting with a new kind of documentation for using Dexter. • Sample dexter queries are documented and stored in shtml files in an xsamples dir. • These sample pages include links to let you view and/or invoke the saved query file. • See these at http://mcdc.missouri.edu/xsamples/
V. Advanced Options • A new section on the Dexter input form, targeted at more sophisticated user who wants more control over output. • As with all Dexter sections, click on section header to see online documentation. • Easily ignored. Pretend it’s not there if you want. • Many of the new features are things we want to do when we build public queries.
Advanced Features Example • We will use several advanced features here including data aggregation. • The dataset we shall access is the latest (thru 2004) county level estimates with components of change since 2000 for the entire country. • We have added cbsa (metropolitan and micropolitan area) codes to this dataset.
Query Specifications • We have a dataset that has county-level data but also has CBSA (core based statistical area) codes identifying the metro area. • We want to aggregate (sum up) the pop data to get cbsa-level summaries. • To further complicate matters, we have a variable, cbsatype, that tells us whether it is a Metro or Micro (-politan) area. We not only want summaries for each cbsa within state, but we also want a summary for all the cbsa’s of a type (metro/micro) within each state. • The states of interest are Illinois, Kansas and Mo. • We want HTML output in a custom style.
Define Filter • County level summaries only. • Code of 99999 indicates not in a CBSA; exclude. • Choose states using the postal abbreviations.
Aggregation Specs • Aggby: stab cbsatype cbsa indicates that you want Dexter to combine all rows that have the same value for these 3 category variables, summing all the numeric variables from these rows. • Agglvl: 2 indicates that you want summaries for the rightmost 2 aggby variables. A summary will be generated for all stab/cbsatype combinations, regardless of the value of cbsa.
More Aggregation Specs • Grand Totals? – No means you do not want Dexter to add a summary row at the end of the file with totals for all rows in the entire dataset. • Means or Percents – you specify here cols. that cannot be just summed up. They have to be specially processed using something called a weighted average.
Aggregation Specs 3 • Weights for Means/Pcts – this is a list of columns to correspond with the list specified just above for Means or Percents. • We are saying here that we want the program to weight the value of pctchang using the value of pop00c. Each pctchang value is multiplied by pop00c (“weighted”) prior to aggregation. • During the agg step the weighted values are summed. • In a post-agg step the sum of the weighted values is divided by the sum of the weights.
Just Know That … • Whenever you are aggregating and you have a column/variable that is a percentage, you need to specify it in your Means or Percents list, and the corresponding col/var to use in the “Weights for ..” list is a variable containing the value of which it is the percentage. (We call this the “universe variable”). • E.g. if the variable is PctAsian (Asians as a pct of total persons) then the weight variable is TotPop – the total persons.
If you misspell a Variable Name • When entering variable names in any of the boxes in Section V be extra careful to spell the name exactly. • Also be certain that you select the variable in Section III – you cannot aggregated by State if you have not selected State as 1 or the variables to keep. • You will not get a specific error message about this but instead it will just say that ther were no observations selected.
Variables to Drop • Not important (99% of the time). • The program generates 2 extra variables, named _lvl_ and _nag_ , that occasionally may be useful. • _lvl_ indicates the summary level (in our example it would have value 1 or 2). • _nag_ keeps a count of how many rows/observations were used to form the output summary row/observation.
Advanced Report Formatting • We check the option to use variable labels as column headers in the report. Not very advanced, but it did not fit elsewhere. • By variables for report allows specifying one of more variables that are listed on a separate “by line” instead of as a column. • ID variables for report (not specified here) are variables listed at the far left of each row to identify the observation (instead of having “Obs”, the observation #, used.)
Style to Use for html/pdf Output • You get to pick from a menu of 14 or so. Those followed by ** are recommended. • The default (sasweb) is minimal blue & white. • In this example we chose brick, one of our favorites. • Names are not very mnemonic; you just have to try them to see what they look like.
Exercise 1 • Access 2000 census long-form (sample) data for census tracts in state of Nebraska. • Create a csv file where each record corresponds to a census tract and the variables/columns tell us what metro area and county the tract is in and reports the total population and the number and percent of persons who were poor.
Exercise 2 • Access filetype stf903x2 (under 1990 census data). • Create a file (sas, dbf, or Excel – whichever you prefer) that has the number of hispanics and pct hispanics for all census tracts in Greene and Christian counties (MO). Use 2000 tract geography. • Do similar query using filetype sf32000x to get comparable data for 2000. • (Not part of exercise, but we hope you will have tools to merge these 2 results).
Exercises 3 • Access the beareis filetype. • Pull data for all counties in the new Jefferson City metro area (CBSA). • The id variables for your output should be county, LineCd and LineCdMeaning. • The numeric variables here are a time series; select data for 1993 and 2003. • In the Advanced Report Formatting section specify county as a by variable and linecd as an ID variable. Select electronics as the style.
Exercise 4 • In the beareis filetype we have datasets that report on total transfer payments. Locate the dataset with this data for Mo. • Print a report showing total transfer payments for the state of Missouri for each of the most recently-available 10 years. • Extra credit option: specify that you want all the trf variables displayed using a dollar12. format.
Exercise 5 • Navigate to the filetype georefunder the Geography/GIS major category. • Access the mocogeos dataset. (Missouri county geocodes). • Do a plain text report showing all counties in Mo, their FIPS codes & names along with the DED-Region, RPC and dot (MoDOT) region in which they are contained. • Extra credit opt: the format code $rpcname. Can be used to display names for rpc codes. Specify that you want to see rpc names instead of codes in your report.
Exercise 6 • Find the 25 wealthiest counties in the United States per the 2000 census, using Median Household Income as the measure of wealth. • Print them out in descending order (highest income first) – just the top 25. • Hint: run Dexter twice, the 2nd time applying a filter based on medhhinc.
Exercise 7 • Filetype sf12000x contains the standard extract of data from Summary File 1, 2000 census. • As a bonus, we added the 1990 pop for the corresponding geography to this otherwise all-2k dataset. • Access the block level dataset for Mo and generate a tab-delimited file showing the 1990 & 2000 pops along with change & % change for all blocks in Adair county.
Exercise 8 • The MCDC has done many custom geographic aggregations of 2k census data -- including data for school districts and school districts/counties. • In filetype sf32000x, access dataset moschlcos. • Print a report showing total pop, rural pop and % rural pop for all the districts within St. Charles county.
Want More Exercises? • Take the MCDC Trivia challenge at http://mcdc.missouri.edu/trivia/popests1.shtml • Has ten rather challenging exercises, all involving data in the popests filetype (current population estimates). • Answers included (sort of).
Thank You As usual, questions and comments are encouraged. E-mail preferred: blodgettj@missouri.edu