Social Statistics
S519: Evaluation of Information Systems
Social Statistics
• Statistics is a set of tools and techniques for describing, organizing, and interpreting information or data.
• Do we need statistics? When, and why?
Why we need statistics
• Everybody relies on data in one way or another:
  • corporate presidents decide company policy based on quarterly sales figures
  • politicians decide on campaign strategy based on polls
  • teachers decide grading curves based on a bell curve
  • you and I decide whether to smoke or not based on health records of other people
• Therefore, we need a comprehensive and understandable way to deal with data:
  • Statistics is the study of making sense of data.
Descriptive statistics
• Used to organize and describe the characteristics of a collection of data
Descriptive statistics
• How can you describe this table?
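As a rough illustration of what "describing" a collection of data can mean, the Python sketch below summarizes a small list of hypothetical test scores (made-up values, not the data from the slide's table) with common measures from the standard `statistics` module. Which measures to report is an assumption; the helper `describe` is a hypothetical name.

```python
import statistics

# Hypothetical test scores (illustrative values only).
scores = [72, 85, 90, 66, 85, 78, 95, 88]


def describe(values):
    """Return a dictionary of common descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "stdev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
    }


for name, value in describe(scores).items():
    print(f"{name:>6}: {value}")
```

For these scores the mean is 82.375 and the median and mode are both 85, which already says more about the group than the raw list does.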
Inferential statistics
• Used to make inferences from a smaller group of data about a larger group it may represent
• Sample: the smaller group of data that is actually observed
• Population: the entire group of interest for a given question
Population & Sample
• Population
  • the set of all photographs of Mars
  • the set of heights of people in the US Army
  • the set of all measurements of water quality taken from the Hudson River
  • the set of all problems that can be solved using statistics
• Sample
  • the pictures selected from a specific region of Mars
  • the heights of people in a particular division of the US Army
  • the set of water measurements of the Hudson River taken on 7/24/2009
  • the statistical problems we are solving in this class
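The population/sample distinction can be simulated in a few lines. This is a minimal sketch, assuming a hypothetical population of 10,000 heights generated at random (not real US Army data); it draws a simple random sample of 100 individuals and compares the sample mean with the population mean. In a real study only the sample would be observed, and the population mean would be the unknown quantity we try to infer.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated heights in cm (not real data).
rng = random.Random(519)  # fixed seed so the run is reproducible
population = [rng.gauss(175, 7) for _ in range(10_000)]

# Simple random sample of 100 individuals, drawn without replacement.
sample = rng.sample(population, k=100)

population_mean = statistics.mean(population)  # normally unknown in practice
sample_mean = statistics.mean(sample)          # what we can actually compute

print(f"population mean: {population_mean:.2f} cm")
print(f"sample mean:     {sample_mean:.2f} cm")
```

The two means are usually close but not identical; that gap is the sampling error that inferential statistics has to account for.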
Steps for statistical analysis
• Problem definition: identify the population of interest and the variables to be investigated
• Data collection: describe and select the sample from the population
• Data analysis: make statistical inferences from the sample about the population
• Analysis reporting: report the inference together with a measure of its reliability
Here, the term variable means a characteristic or property of the individuals in a population whose value can vary from one observation to another.
An example
• Example: A tax auditor is responsible for 25,000 accounts. How many of them are in error?
• Defining the problem: The population consists of all 25,000 accounts. Our goal is to obtain a reasonable estimate of the number of accounts that are, in all likelihood, in error. Our variable x indicates whether an account is in error.
• Data collection and summary: The auditor selects 2,000 accounts at random, tests each of them, and finds that 84 are in error.
• Data analysis: Here the analysis consists of computing the sample proportion of accounts in error: 84/2,000 = 0.042, or 4.2%.
• Analysis reporting: Based on this analysis, we infer that approximately 4.2% of all accounts, or about 0.042 × 25,000 = 1,050 accounts, are in error.
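The auditor's arithmetic can be sketched in Python. The analysis-reporting step calls for a measure of reliability, which the example does not state; as one possible choice (an assumption, not part of the original example), this sketch adds an approximate 95% confidence interval based on the normal approximation to the binomial and ignores the finite population correction. The function name `estimate_errors` is hypothetical.

```python
import math


def estimate_errors(population_size, sample_size, errors_in_sample, z=1.96):
    """Estimate the error rate and number of erroneous accounts from a sample.

    Assumes a simple random sample. The interval uses the normal
    approximation to the binomial (z = 1.96 for roughly 95% confidence)
    and ignores the finite population correction, so it is slightly wider
    than strictly necessary when sampling without replacement.
    """
    p_hat = errors_in_sample / sample_size  # sample proportion in error
    # Scale the sample count up to the whole population.
    estimated_count = errors_in_sample * population_size / sample_size
    standard_error = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    low = p_hat - z * standard_error
    high = p_hat + z * standard_error
    return p_hat, estimated_count, (low, high)


p_hat, count, (low, high) = estimate_errors(25_000, 2_000, 84)
print(f"sample error rate: {p_hat:.1%}")
print(f"estimated accounts in error: {count:.0f}")
print(f"approx. 95% interval for the rate: {low:.1%} to {high:.1%}")
```

With these inputs the sketch reports a 4.2% error rate, about 1,050 accounts, and an interval of roughly 3.3% to 5.1% (about 830 to 1,270 accounts), which is the kind of reliability statement the reporting step asks for.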
Tools
• Excel
• Excel Analysis ToolPak
• SPSS/PASW
Excel ToolPak (1)
• Click the Microsoft Office Button, and then click Excel Options. (This applies to Excel 2007; in later versions, click File > Options.)
• Click Add-Ins, and then in the Manage box, select Excel Add-ins.
• Click Go.
• In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
• If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.
• After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.
Excel ToolPak (2)
• Powerful, reliable, accessible, and easy to use; included with Excel at no extra cost
Formulas
How do formulas work in Excel?
Basics of a Spreadsheet
• So let's get started digging into what makes a spreadsheet work. Spreadsheets are made up of:
  • columns
  • rows
  • cells
• Each cell may contain one of the following types of data:
  • text (labels)
  • numbers (constants)
  • formulas (calculations)
Types of Data
• ALL formulas MUST begin with an equal sign (=).
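The three data types and the equal-sign rule can be mimicked with a small classifier. This is a simplified sketch, and `classify_cell` is a hypothetical function: real spreadsheet software also recognizes dates, percentages, TRUE/FALSE, and other special entries that this code does not model.

```python
def classify_cell(entry: str) -> str:
    """Classify a typed cell entry as 'formula', 'number', 'text', or 'empty'.

    Simplified rules: an entry starting with '=' is a formula, anything
    Python can parse as a float is a number, and everything else is text.
    """
    if entry == "":
        return "empty"
    if entry.startswith("="):
        return "formula"
    try:
        float(entry)
    except ValueError:
        return "text"
    return "number"


for typed in ["=SUM(A1:A3)", "SUM(A1:A3)", "42", "3.5", "Total", ""]:
    print(repr(typed), "->", classify_cell(typed))
```

Without the leading `=`, `SUM(A1:A3)` is classified as plain text, which is why the rule matters: the spreadsheet would display the characters instead of calculating anything.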
Formulas – SUM
• The SUM function adds up the values in all of the specified cells. The syntax is: =SUM(first value, second value, ...), for example =SUM(A1:A10) to total the cells A1 through A10.
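To see what =SUM does with a range that mixes numbers and labels, here is a Python sketch of the same behavior. It is an approximation for illustration only: `excel_like_sum` is a hypothetical helper, `None` stands in for a blank cell, and, like Excel with a referenced range, it skips text, blank, and TRUE/FALSE cells instead of raising an error.

```python
def excel_like_sum(cells):
    """Total the numeric cells in a range, roughly as =SUM(range) does."""
    return sum(
        value
        for value in cells
        # bool is a subclass of int in Python, so exclude TRUE/FALSE explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    )


# Hypothetical column A1:A5: a text label, three numbers, and a blank cell.
column_a = ["Sales", 120, 80.5, 99.5, None]
print(excel_like_sum(column_a))  # 300.0, the same total =SUM(A1:A5) would give
```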
Formulas – AVERAGE
• The AVERAGE function returns the average (arithmetic mean) of the specified data. The syntax is: =AVERAGE(first value, second value, ...), for example =AVERAGE(A1:A10).
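A subtle point with AVERAGE is which cells count toward the denominator. In this Python sketch (the helper `excel_like_average` is hypothetical, and `None` stands in for a blank cell), text and blank cells are left out of both the total and the count, while a cell containing 0 is a real value and is included; with no numeric cells at all, Excel shows #DIV/0!, which the sketch mirrors by raising `ZeroDivisionError`.

```python
def excel_like_average(cells):
    """Mean of the numeric cells in a range, roughly as =AVERAGE(range) does."""
    numbers = [
        value
        for value in cells
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]
    if not numbers:
        raise ZeroDivisionError("no numeric cells (Excel shows #DIV/0!)")
    return sum(numbers) / len(numbers)


# Hypothetical scores in B1:B6: a header, a blank cell, and a genuine zero.
column_b = ["Score", 90, None, 70, 0, 80]
print(excel_like_average(column_b))  # 60.0, i.e. (90 + 70 + 0 + 80) / 4
```

Treating the blank as 0 would give 240 / 5 = 48 instead, so a blank cell and a cell holding 0 are not interchangeable.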
Formulas – MAX/MIN
• MAX returns the largest value in the selected range of cells, e.g., =MAX(A1:A10).
• MIN returns the smallest value in the selected range of cells, e.g., =MIN(A1:A10).
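MAX and MIN follow the same "numbers only" rule for a referenced range. The Python sketch below (hypothetical helpers `excel_like_max` and `excel_like_min`, with `None` as a blank cell) ignores non-numeric cells and returns 0 when the range holds no numbers at all, which is how Excel's MAX and MIN behave in that case.

```python
def numeric_cells(cells):
    """Keep only cells holding numbers (TRUE/FALSE are excluded)."""
    return [
        value
        for value in cells
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]


def excel_like_max(cells):
    """Largest numeric value, like =MAX(range); 0 if there are no numbers."""
    return max(numeric_cells(cells), default=0)


def excel_like_min(cells):
    """Smallest numeric value, like =MIN(range); 0 if there are no numbers."""
    return min(numeric_cells(cells), default=0)


# Hypothetical temperature readings with a header and a blank cell.
temperatures = ["Temp", 18.5, -3, None, 27]
print(excel_like_max(temperatures), excel_like_min(temperatures))  # 27 -3
```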
Formulas – COUNT
• COUNT returns the number of cells in the selected range that contain numbers; cells holding text or nothing are not counted. For example, =COUNT(A1:A10).
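Because COUNT only counts numeric cells, its result can be smaller than the number of filled cells in a range. This Python sketch (hypothetical helper `excel_like_count`, `None` as a blank cell) counts numbers and skips text, blanks, and TRUE/FALSE, as Excel does for a referenced range; it does not model dates, which Excel stores as numbers and therefore counts.

```python
def excel_like_count(cells):
    """Number of cells holding numbers, roughly as =COUNT(range) does."""
    return sum(
        1
        for value in cells
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    )


# Hypothetical survey column: header, three numeric IDs, "n/a", a blank, and TRUE.
responses = ["ID", 101, 102, "n/a", None, 105, True]
print(excel_like_count(responses))  # 3
```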