Data management Methods & Software

Data management Methods & Software. PEER Session 02/04/15. Data Management. Multiple good data management software options exist – quantitative (e.g., SPSS), qualitative (e.g., ATLAS.ti), mixed (e.g., Excel)

zimmerman

Presentation Transcript


  1. Data management Methods & Software PEER Session 02/04/15

  2. Data Management • Multiple good data management software options exist – quantitative (e.g., SPSS), qualitative (e.g., ATLAS.ti), mixed (e.g., Excel) • The goal is to convert any research data from its ‘raw’ form to a form that can be readily analyzed

  3. Hard Way vs. Easy Way in Excel • You can fight Excel by formatting your spreadsheets in a traditional report style • You can work with Excel by formatting your spreadsheets the way Excel prefers

  4. Traditional Way

  5. The Way Excel Likes Things

  6. The Paste Function Provides Numerous Statistical Operations

  7. The Statistical Function Category

  8. Data Analysis functions

  9. The Excel Rules • One value per cell • One type of data in each column • No blank rows • One row of column headings • Be consistent: pick one spelling and avoid variants such as Windhoek, WHK, WNDHK, Whoek, The Hoek… • Avoid separating similar data across tabs
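Two of the rules above ("no blank rows" and "be consistent") are easy to check mechanically once a sheet has been exported. The following is a minimal Python sketch, not part of the original slides; the alias table, the `city` column name, and the sample rows are all hypothetical.

```python
# Hypothetical alias table: every spelling seen in the sheet maps to one canonical label.
CITY_ALIASES = {
    "windhoek": "Windhoek",
    "whk": "Windhoek",
    "wndhk": "Windhoek",
    "whoek": "Windhoek",
    "the hoek": "Windhoek",
}


def normalize_city(raw):
    """Return the canonical city name, or None if the spelling is unknown."""
    key = " ".join(raw.split()).lower()  # trim, collapse whitespace, ignore case
    return CITY_ALIASES.get(key)


def check_rows(rows):
    """Return (row_number, problem) pairs for blank rows and unknown city spellings."""
    problems = []
    for number, row in enumerate(rows, start=1):
        if all(cell.strip() == "" for cell in row.values()):
            problems.append((number, "blank row"))
        elif normalize_city(row["city"]) is None:
            problems.append((number, "unknown city spelling"))
    return problems


# Made-up sample rows, as they might come out of a spreadsheet export.
rows = [
    {"id": "1", "city": "Windhoek"},
    {"id": "2", "city": " WHK "},
    {"id": "", "city": ""},
    {"id": "4", "city": "Windhuk"},
]
print(check_rows(rows))
```

Unknown spellings are reported rather than guessed at, so a person decides whether a new variant belongs in the alias table.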

  10. Features of SPSS • Originally developed for social scientists; no heavy programming background required • Designed to be user-friendly, with pull-down menus to execute statistical commands • Can do data management and manipulation • Can store programs and produce reports/graphs

  11. SPSS Data Flow (diagram): Raw Data (Direct Entry) or Outside Data Source (Importing) → SPSS Data File → Data Modification/Transformation (Data Steps) → Data Analysis (Analysis Steps), run from the Pull-Down Menu OR the Syntax Menu

  12. Data View Window - Data Entry Site (Columns = Variables, Rows = Cases). Screenshot labels: Title bar, Pull-down Menu bar, Help Menu, Tool bar, Information bar, Action bar, Variable Names, Active cell, Data View window

  13. Variable View Window - Data Definition Site. Screenshot labels: variable name (64 characters max, no spaces; begins with a letter, @, #, or $); type (Numeric, String, & Others); Length; # of Decimals; Variable Description; Value Code Description; Missing Value Description; “Click here to see this view”

  14. Before we see Examples… OK vs. Paste buttons: 1. OK - the results/action will be executed <Output File>

  15. Paste button: 1. Hit Paste to obtain the Syntax Window <Syntax File> 2. Run the syntax to obtain the results in the Output Window

  16. Example - School Data: Raw Data

          Subject 1: Subject # (1), Female (1), Intensive (1), Reading (90), Math (67)
          Subject 2: Subject # (2), Female (1), Moderate (2), Reading (72), Math (46)
          Subject 3: Subject # (3), Male (0), Basic (3), Reading (41), Math (73)

  17. School Data: Variable View (Variable View Activated)

  18. School Data: Completed Dataset – Data View

  19. School Data: Completed Dataset – Variable View

  20. Importing an Excel data file into SPSS 1. Open the SPSS data file 2. Go to the File menu 3. Click “Read Text Data” 4. Set Files of type to Excel and choose the Excel file 5. Hit Open 6. Check the worksheet #, confirm that variable names are on the 1st row, and hit OK

  21. School Data: Completed Dataset – Data View

  22. How to enter data in SPSS 1.1 Introduction to SPSS 1.2 Data Entry 1.3 Data Cleaning using SPSS

  23. 1.2 Data Entry into SPSS There are 2 ways to enter data into SPSS: 1. Enter it directly into SPSS by typing in Data View 2. Enter it into other software such as Excel, then import it into SPSS. Let’s start with the second option, using data in Excel.

  24. Figure 1. Data from Hell

  25. Data from Heaven

  26. General guidelines for data entry
      1. Give each variable a valid name: 8 characters or fewer, with no spaces or punctuation, beginning with a letter rather than a digit. Use short, easy-to-remember words. Avoid the following variable names: TEST, ALL, BY, EQ, GE, GT, LE, LT, NE, NOT, OR, TO, WITH. These are used in SPSS syntax; if they were permitted, the software could not distinguish a command from a variable. Each variable name must be unique; duplication is not allowed. Variable names are not case sensitive: NEWVAR, NewVar, and newvar are all considered identical.
      2. Encode categorical variables: convert letters and words to numbers.
      3. Avoid mixing symbols with data; convert them to numbers.
      4. Give each case a unique, sequential case number (ID). Place this ID number in the first column on the left.
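Guideline 1 can be checked before a spreadsheet ever reaches SPSS. Below is a minimal Python sketch under stated assumptions: the reserved-word list is copied from the slide, the default length limit of 8 follows this slide (slide 13 quotes 64 characters, so it is left as a parameter), and the letters-and-digits pattern is deliberately stricter than what SPSS itself accepts. The function name and sample names are hypothetical.

```python
import re

# Names the slide says to avoid because SPSS syntax uses them.
RESERVED = {"TEST", "ALL", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}


def check_variable_names(names, max_len=8):
    """Map each problematic name to the list of guideline-1 rules it breaks."""
    problems = {}
    seen = set()  # upper-cased names already encountered
    for name in names:
        issues = []
        if len(name) > max_len:
            issues.append("too long")
        if not re.fullmatch(r"[A-Za-z][A-Za-z0-9]*", name):
            issues.append("must start with a letter; letters and digits only")
        if name.upper() in RESERVED:
            issues.append("reserved word")
        if name.upper() in seen:
            issues.append("duplicate (names are not case sensitive)")
        seen.add(name.upper())
        if issues:
            problems[name] = issues
    return problems


# Made-up column headings for illustration.
print(check_variable_names(["id", "NewVar", "newvar", "2ndvisit", "by", "systolicbp"]))
```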

  27. 5. Each variable should be in its own column. Do not combine variables in one column. It is recommended to use 0/1 for 2 groups, with 0 as the reference group.

      Avoid this:

          Animal
          Control1
          Control2
          Experiment1
          Experiment2

      Change to:

          Animal   Group
          1        0
          2        0
          3        1
          4        1

      6. All data for a project should be in one spreadsheet. Do not include graphs or summary statistics in the spreadsheet.

  28. 7. Each case should be entered on a single line or row. Do not copy a patient’s information to another row to perform subgroup analysis. 8. However, when data are collected repeatedly on a patient, it is recommended to enter each patient-day observation on a single line to ease data management. SPSS has a feature to convert from the longitudinal format to the horizontal format. When there are only a few repeats (2 or 3), the horizontal format may be preferred for simplicity.

      Longitudinal data entry:

          Date       ID   SYSBP
          1/2/2005   1    130
          1/3/2005   1    120
          1/4/2005   1    120
          3/1/2005   2    110
          3/2/2005   2    140

      Horizontal data entry:

          ID   SYSBP1   SYSBP2   SYSBP3
          1    130      120      120
          2    110      140
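The longitudinal-to-horizontal conversion in guideline 8 is done in SPSS itself on the slide; the Python sketch below only illustrates the reshaping logic, using the slide's blood-pressure values. It assumes the rows are already sorted by date within each patient, and the function name `long_to_wide` is hypothetical.

```python
from collections import defaultdict

# Longitudinal rows from the slide: (date, patient ID, systolic BP).
long_rows = [
    ("1/2/2005", 1, 130),
    ("1/3/2005", 1, 120),
    ("1/4/2005", 1, 120),
    ("3/1/2005", 2, 110),
    ("3/2/2005", 2, 140),
]


def long_to_wide(rows, prefix="SYSBP"):
    """Collect each patient's readings in order of appearance; pad short series with None."""
    by_id = defaultdict(list)
    for _date, patient_id, value in rows:
        by_id[patient_id].append(value)
    width = max((len(values) for values in by_id.values()), default=0)
    header = ["ID"] + [f"{prefix}{i}" for i in range(1, width + 1)]
    table = [[pid] + values + [None] * (width - len(values))
             for pid, values in by_id.items()]
    return header, table


header, table = long_to_wide(long_rows)
print(header)
for row in table:
    print(row)
```

Patient 2 has only two readings, so the third column is filled with `None`, matching the empty SYSBP3 cell in the horizontal layout.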

  29. 9. For yes/no questions, enter “0” for no and “1” for yes. Do not leave blanks for no. Do not enter “?”, “*”, or “NA” for missing data, because this indicates to the statistical program that the variable is a string variable. String variables cannot be used for any arithmetic computation. 10. Put ordinal variables into one column if they are mutually exclusive.

      Preferred:

          Pain
          1
          2
          3

      Avoid:

          Pain
          Mild   Moderate   Severe
          1      0          0
          0      1          0
          0      0          1

      11. Do not make columns wider than 8 characters unless absolutely essential.
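Guidelines 9 and 10 amount to two small encoding rules. Here is a minimal Python sketch of both, assuming answers arrive as text typed into a spreadsheet; the function names are hypothetical, and treating a blank as missing (rather than "no") is this sketch's reading of the guideline.

```python
# Markers the slide says not to type for missing data, plus the empty cell.
MISSING_MARKERS = {"", "?", "*", "NA"}
YES_NO = {"yes": 1, "no": 0}


def encode_yes_no(raw):
    """Return 1 for yes, 0 for no, None for missing; reject anything else."""
    text = raw.strip()
    if text.upper() in MISSING_MARKERS:
        return None  # stays missing; never silently treated as "no"
    try:
        return YES_NO[text.lower()]
    except KeyError:
        raise ValueError(f"unexpected yes/no value: {raw!r}") from None


def pain_code(mild, moderate, severe):
    """Collapse three mutually exclusive 0/1 columns into one code: 1, 2, or 3."""
    flags = [mild, moderate, severe]
    if sum(flags) != 1:
        raise ValueError("exactly one pain level must be flagged")
    return flags.index(1) + 1


print([encode_yes_no(answer) for answer in ["Yes", "no", "NA", ""]])
print(pain_code(0, 1, 0))
```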

  30. Importing data from an Excel spreadsheet into SPSS. In SPSS, go to: File, Open, Data. Select the type of file (for example, Excel) you want to open. Select the name of the file you want to open.

  31. Exporting data from SPSS to Excel. In SPSS, go to: File, Save As. Select the type of file (for example, Excel) you want to save to. Give the name of the file you want to save to.

  32. 1.3 Data Cleaning in SPSS 1. Re-coding existing variables 2. Creating new variables 3. Creating new variable from existing variables 4. Data labeling and formatting

  33. Data cleaning in SPSS (1): Recoding existing variables (1) We want to use numeric coding for Group instead of A and B.

          ID   Old Group   New Group
          1    A           0
          2    A           0
          3    B           1
          4    B           1

  34. Data cleaning in SPSS (1): Recoding existing variables (2) From the SPSS menu, go to: Transform, Recode, Into Same Variables

  35. Data cleaning in SPSS (1): Recoding existing variables (3) 1. Move Group from the variable list into the String Variables box 2. Click on Old and New Values to proceed

  36. Data cleaning in SPSS (1): Recoding existing variables (4) 1. Type the old value and the new value you want to convert it into 2. Click on Add (to change or remove an entry, click on Change or Remove) 3. Enter all values in the Old → New box, then click Continue 4. Click OK to execute the commands.
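The recode in slides 33 to 36 is carried out through SPSS dialogs. For readers who want to see the underlying mapping, here is a minimal Python sketch of the same A/B to 0/1 recode on the four records from slide 33; it is an illustration outside SPSS, and the function name and dictionary layout are hypothetical.

```python
# Old value -> new value, as entered in the Old and New Values dialog.
GROUP_CODES = {"A": 0, "B": 1}


def recode_group(rows, codes=GROUP_CODES):
    """Return copies of the rows with Group replaced by its numeric code."""
    recoded = []
    for row in rows:
        label = row["Group"].strip().upper()
        if label not in codes:
            raise ValueError(
                f"unexpected group value {row['Group']!r} for ID {row['ID']}"
            )
        recoded.append({**row, "Group": codes[label]})
    return recoded


# The four records from slide 33.
rows = [
    {"ID": 1, "Group": "A"},
    {"ID": 2, "Group": "A"},
    {"ID": 3, "Group": "B"},
    {"ID": 4, "Group": "B"},
]
for row in recode_group(rows):
    print(row["ID"], row["Group"])
```

The original rows are left untouched, so the old coding remains available for checking the result.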

  37. Data Cleaning in SPSS (3): Computing a patient’s age from birthday and date enrolled into the study.
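Slide 37 names the age calculation without showing it. One common way to compute age in completed years is sketched below in Python; this is an assumption about the intended calculation, not the SPSS steps from the presentation, and the dates are made up.

```python
from datetime import date


def age_at_enrollment(birthday, enrolled):
    """Age in completed years on the enrollment date."""
    if enrolled < birthday:
        raise ValueError("enrollment date precedes birthday")
    # Has this year's birthday already happened by the enrollment date?
    had_birthday = (enrolled.month, enrolled.day) >= (birthday.month, birthday.day)
    return enrolled.year - birthday.year - (0 if had_birthday else 1)


# Hypothetical patient born 15 June 1960, enrolled 2 January 2005.
print(age_at_enrollment(date(1960, 6, 15), date(2005, 1, 2)))  # 44
```

Comparing (month, day) pairs avoids the drift that comes from dividing a day count by 365.25.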

  38. Data Cleaning in SPSS (4): Data labeling and formatting (2) Data Labeling

  39. Value Code Information

  40. Key Concepts • Run frequencies and descriptives to get the ‘lay’ of the data • Ensure all values are in bounds and variables are valid • Conduct descriptive analyses (univariate, bivariate, multivariate) • Conduct testing for differences (t-test, ANOVA, etc.)
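The first two key concepts (frequencies and bounds checks) can be sketched in a few lines of Python. This is a minimal illustration rather than SPSS output: the first three records echo the School Data example, the fourth is a made-up record with a typing error, and the 0 to 100 score range is an assumption.

```python
from collections import Counter
from statistics import mean


def frequencies(values):
    """Count how often each value occurs."""
    return dict(Counter(values))


def out_of_bounds(values, low, high):
    """Return 1-based (position, value) pairs outside [low, high]; None (missing) is skipped."""
    return [(i, v) for i, v in enumerate(values, start=1)
            if v is not None and not (low <= v <= high)]


female = [1, 1, 0, 1]
reading = [90, 72, 41, 720]  # 720 is a hypothetical typo for 72

print(frequencies(female))
flagged = out_of_bounds(reading, 0, 100)
print(flagged)

# Descriptives on the values that passed the bounds check.
valid = [v for v in reading if 0 <= v <= 100]
print(round(mean(valid), 1))
```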
