Statistical Software Packages: How do I get this into that? Gillian Byrne Memorial University of Newfoundland
The Basics • Data is often available in flat ASCII text files
Data Definition Files • Statistical software programs need to know what to do with the data. • Data definition files “explain” the text file to the software program • For example, a data definition file can format the pile of numbers into cases and variables, provide variable labels, define missing values, and more • Data definition files differ between software packages
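To make the "pile of numbers into cases and variables" idea concrete, here is a minimal Python sketch of what a data definition file asks the software to do with a fixed-width ASCII file. The layout, variable names, and records are all invented for illustration; a real layout would come from the codebook.

```python
# Hypothetical layout: variable name -> (start column, end column), 1-based
# and inclusive, the way codebooks usually list column positions.
LAYOUT = {
    "id": (1, 4),
    "age": (5, 6),
    "sex": (7, 7),
    "income": (8, 13),
}

def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of named variables."""
    return {
        name: line[start - 1:end].strip()
        for name, (start, end) in layout.items()
    }

# Two made-up records: without the layout they are just a pile of digits.
raw_lines = [
    "0001342045000",
    "0002291999999",
]

cases = [parse_record(line) for line in raw_lines]
for case in cases:
    print(case)
```

Each line of the file becomes one case, and each column range becomes one named variable, which is the same job the location and variable sections of a definition file perform.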
SPSS Syntax File • Location of the data • Variables in the data file • Variable labels (as seen in the SPSS Variable View) • Value labels assign descriptions to the values of variables • Missing values for each variable
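The value-label and missing-value parts of a syntax file can be sketched in Python as two lookup tables applied to each case. This is only an illustration of the idea, not SPSS syntax; the codes, labels, and the sample case are assumptions made up for the example.

```python
# Hypothetical codebook entries: which codes carry labels, and which codes
# mean "missing" for each variable.
VALUE_LABELS = {"sex": {1: "Male", 2: "Female"}}
MISSING_VALUES = {"age": {99}, "income": {999999}}

def apply_codebook(case):
    """Return a copy of a case with missing codes set to None and coded values labelled."""
    cleaned = {}
    for name, value in case.items():
        if value in MISSING_VALUES.get(name, set()):
            cleaned[name] = None  # declared missing for this variable
        else:
            # Use the label when one exists, otherwise keep the raw value.
            cleaned[name] = VALUE_LABELS.get(name, {}).get(value, value)
    return cleaned

case = {"id": 2, "age": 29, "sex": 1, "income": 999999}
labelled = apply_codebook(case)
print(labelled)  # {'id': 2, 'age': 29, 'sex': 'Male', 'income': None}
```

Without the missing-value declaration, a code such as 999999 would be treated as a real income and distort any average computed from the variable.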
Data Definition Files and the Codebook • Where do the data definition files derive from? • …the Codebook!
Other Statistical Software Packages SAS • Geared towards power users: one of the most powerful statistical packages, but also has the steepest learning curve • Relies more on programming than on a point-and-click interface
Other Statistical Software Packages Stata • Combination of command language and point-and-click interface • Used by economics departments and other social science disciplines • Known for its strong graphing capabilities
Other Statistical Software Packages Shazam • Canadian product • Used widely in economics/econometrics • Not as powerful as other statistical programs • Runs on DOS, Windows, Mac, Unix platforms
Other Statistical Software Packages MS Excel • Not a dependable statistical package, but… • Widely available • Easy to understand & use
Tips for Successful Interoperability • Data definition files • By far the easiest way to format raw data • SPSS, SAS, and Stata data definition files (with commenting!) are available in IDLS • Troubleshooting tips: • Ensure you correctly identify the file path to the data • Make sure that commands don’t include breaks (carriage returns) • Check to make sure the correct symbol is used to end commands (in SPSS it’s a period, in SAS a semicolon, and in Stata a semicolon when the file sets #delimit ;)
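Two of the troubleshooting tips above (file path and command terminator) can be checked mechanically before running a definition file. The Python sketch below is a crude heuristic, not a parser for any package: it assumes paths appear in single quotes with a .txt, .dat, or .csv extension and that commands are separated by blank lines. The function name and the sample text are hypothetical.

```python
import os
import re

def check_definition_file(text, terminator="."):
    """Return a list of human-readable problems found in a definition file.

    Assumptions (not taken from any package's documentation): data file
    paths are single-quoted, and commands are separated by blank lines.
    """
    problems = []

    # Tip: every quoted data path should point at a file that exists.
    for path in re.findall(r"'([^']+\.(?:txt|dat|csv))'", text):
        if not os.path.exists(path):
            problems.append(f"data file not found: {path}")

    # Tip: every command block should end with the expected terminator.
    for block in re.split(r"\n\s*\n", text.strip()):
        if not block.rstrip().endswith(terminator):
            first_line = block.strip().splitlines()[0]
            problems.append(f"missing '{terminator}' after: {first_line}")

    return problems

# Illustrative SPSS-style text with a bad path and a missing period.
sample = """DATA LIST FILE='no_such_dir/survey.txt'
  /id 1-4 age 5-6.

VARIABLE LABELS age 'Age of respondent'
"""

for problem in check_definition_file(sample):
    print(problem)
```

For a SAS-style file the same check could be called with `terminator=";"`.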
Tips for Successful Interoperability • Comma-Separated Values (csv) files: • Text files (with the extension .csv) with commas separating the data • Often csv files imported into statistical software will require tweaking (variable labels, layout, etc.) • csv files can be imported by most programs: • SPSS, SAS, Stata, Excel • csv files are available in ESTAT and CANSIM II through CHASS • b2020 files can also be converted to csv for use in another program
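The "tweaking" that imported csv files often need can be sketched with Python's standard csv module: read the header row, rename the columns to usable variable names, and convert numeric text to numbers. The column names, renames, and values below are made up for illustration and do not reflect the layout of any particular ESTAT or CANSIM II download.

```python
import csv
import io

# Made-up CSV content standing in for a downloaded file; with a real file,
# open(path, newline="") would take the place of io.StringIO.
csv_text = "GEO,Ref_Date,Value\nRegion A,2001,1000\nRegion B,2001,250\n"

# Hypothetical renames: short, lowercase variable names for later analysis.
RENAMES = {"GEO": "region", "Ref_Date": "year", "Value": "population"}

rows = []
for record in csv.DictReader(io.StringIO(csv_text)):
    # Rename each column, leaving unknown columns untouched.
    row = {RENAMES.get(key, key): value for key, value in record.items()}
    # Everything in a csv file arrives as text; convert the numeric columns.
    row["year"] = int(row["year"])
    row["population"] = int(row["population"])
    rows.append(row)

print(rows[0])  # {'region': 'Region A', 'year': 2001, 'population': 1000}
```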
File Input Chart Adapted from: http://www.chass.utoronto.ca/datalib/caq/format.htm
Conversion Software • Conversion software allows you to seamlessly transport data from one statistical program to another • Stat/Transfer • Supports over 30 software programs, including SAS, SPSS and Stata • Approx. $150 USD for single-user license • DBMS/Copy • Supports over 80 software programs, including databases and spreadsheets • Approx. $500 USD for single-user license
Roundup • There is a proliferation of statistical software packages, all of them with different strengths and weaknesses • Concentrate on getting the data into the software – often users can take it from there • CANSIM II at CHASS, ESTAT, IDLS, and the DLI website all offer different file type options – it can be worthwhile checking different sources to find the file type you’re looking for