
Find a chair. Any chair. Sit down. Relax.


pabla

Presentation Transcript


  1. Find a chair. Any chair. Sit down. Relax.

  2. WELCOME TO SADISTICS 101 ∑x/n-1

  3. Actually this is Geography 161 Intro to Analytical Techniques ∑x²/n

  4. And this is Lecture #1 on the course outline.

  5. The course outline is on the course website. WEBSITE ADDRESS: www.geography.ryerson.ca/coppack/geo161 ALERT! THIS IS NOT A BLACKBOARD SITE!

  6. What the website looks like… www.geography.ryerson.ca/coppack/geo161

  7. www.geography.ryerson.ca/coppack/geo161 RYERSON UNIVERSITY Department of Geography GEO 161: INTRODUCTORY ANALYTICAL TECHNIQUES FALL 2014 (Also known as “Yippee! It’s statistics!”) Instructor: Dr. Philip Coppack. Office: JOR 608. Phone: (416) 979-5000 ex. (I don’t respond well to phone calls but…) E-mail: pcoppack@arts.ryerson.ca (e-mails – within reason – I will answer). Office Hours: Posted – by chance or appointment.

  8. www.geography.ryerson.ca/coppack/geo161 COURSE DESCRIPTION - The fine print: • Welcome to Analytical Techniques I, a one-semester professional course within the Geographic Analysis program. You’ll be happy to note that no familiarity with the fundamental elements of statistics is assumed (even though you started doing this in grade 4), though some keyboarding and operating-system experience with microcomputers is, which basically means that if you can log on, navigate Windows and have heard of Excel, you’ll be fine. The larger context for this course is the material on geographic research you’re getting in GEO 141. The current course, GEO 161, also sets the stage for GEO 361, inferential statistics, which you will take next term. My goal is to provide you with the fundamentals of data and information extraction, descriptive statistics, picturing your data and sampling distributions, and to expose you to computer programmes commonly used in geographic research. While you will take courses in many different aspects of geography, your principal career path will be as a research analyst able to gather, order and analyze data, extract information from those data, and present your findings in a workplace environment using common computer software. Thus, this course provides the groundwork for all future courses you will take. My approach in this course is to ease the fears students usually have about “statistics” – or, more to the point, numbers. My basic premise is that statistics can be considered as a set of related concepts driven by some fairly simple arithmetic. Knowledge of the mathematical derivations that underpin statistics is not required for this course. If you can add, subtract, multiply and divide, then you have all the numeracy skills you need.

  9. www.geography.ryerson.ca/coppack/geo161 COURSE EVALUATION: • Lab Assignments (5 x 10%) 50% (see schedule below) • In-class multiple choice quizzes (5 x 10%) 50% (see schedule below). THERE IS NO FINAL EXAM, SO YOU CANNOT MISS QUIZZES. • NOTE: Quizzes will be run in the first 40 minutes of the lecture period and will consist of 40 multiple choice questions. • If you miss a quiz, you lose the grade for it – no exceptions, including begging; otherwise you will be swamped. The labs are designed to be fairly short and self-contained, and most of the calculation work can be done during the lab session in which they are distributed, when I am around. You are expected to stay on schedule. If you don’t, you’re going to fail – this is a challenging course! If you miss a lab, you lose the grade for it – no exceptions, including begging; otherwise you will be swamped.

  10. REQUIRED TEXT • Miethe, Terance, and Jane Florence Gauthier (2008). Simple Statistics. Oxford University Press. • You should also use my course PowerPoint shows, available at: www.geography.ryerson.ca/coppack/geo161

  11. www.geography.ryerson.ca/coppack/geo161 RATING SCHEME FOR TOPIC DIFFICULTY: • easy • fairly easy • moderate • fairly difficult • difficult. It’s all relative, after all. You started stats in grade 4. For some of you, all of this will be easy. For others, none of it will be.

  12. Here’s where you look for your assignment handout and due dates. The colours of the cells refer to the topics covered by each assignment.

  13. Here’s when your first assignment is due: IN THE CLASS OF THAT WEEK.

  14. Here’s where you look for your quiz dates. Each quiz will cover all topics since the last quiz.

  15. This is a challenging course, but you will most likely pass it if you: Attend lectures and labs. Do your readings. Ask questions in the lab, in class or in my office if you don’t understand something. Hand in all the assignments. It is hard for me to give you anything but an ‘F’ if I have nothing to grade. Be good – don’t cheat (working together is not cheating; handing in someone else’s work is).

  16. Goal… Of the course: To teach you the basic toolkit of the research analyst – statistics. Of today: To alleviate your abject terror of having to deal with… numbers!

  17. What is Statistics?

  18. What Is (Are?) Statistics? Statistics (the discipline) is a way of reasoning plus a collection of tools and methods designed to help us understand the world. Statistics (plural) are particular calculations made from data. Data are values within a variable. Their purpose is to help provide information about research questions. Information allows us to answer a specific research question.

  19. Descriptive and Inferential Statistics Descriptive Statistics (this course). This type of statistics describes a dataset. They tell you only about that specific dataset. They are a crucial first step in analysis, often the only step. You cannot have good inference without good description.

  20. Descriptive and Inferential Statistics Inferential Statistics (GEO 361 next semester) This type of statistics allows you to infer from a sample dataset. They allow you to say things about a population using only a sample of that population. They require very rigorous rules about sampling and data distributions. They require that the margins of error and the confidence in the size of that error between the sample and the population be quantified.

  21. How Does Statistics Work?

  22. What is Statistics Really About? Statistics is really about measuring variation. First, we collect data that answer a research question: e.g. which political party is most popular. But measurements are always imperfect, for reasons we’ll look at in a moment, so… Second, we try to estimate how imperfect the measurements are, that is, how far the picture the data are painting is from reality. These variations between reality and the statistical picture of reality are called the margins of error, and we must quantify them using statistical tools. We must measure how big the error is and how sure we can be about that measurement.

  23. From where do the imperfections come? Imperfections come from two principal places: Measurement: They come from the fact that we live in a world about which we do not or cannot know everything, or they come from imperfections in the way we measure. Sampling: Most often we are dealing with data collected from a sample of the world and not from all of the world.

  24. #1: Incomplete/imperfect data & measurement The data we have are only a small part of what is actually out there. Some data we choose to count; most we do not. Some data we can count; some we cannot. Counting some things can make other things uncountable. We choose one spatial scale and ignore others. We choose one time period and ignore others.

  25. #2: Sampling – Why we do it. It is an essential part of statistics. We need to do it because: The population may be so large that it is too expensive to count in its entirety (e.g. Canada, the galaxy). The population may be too large to count at all (e.g. the quantity of water in the Great Lakes). The population size may be unknown (e.g. the number of naked mole rats in the world). Collecting millions of samples would allow human error to creep in anyway.

  26. Differences between a population & a sample A sample is only a small part of the population, so: The chance of population and sample statistics (e.g. the mean and standard deviation) being exactly the same is very small. The chance of population and sample statistics (mean and standard deviation) being close can be very high, depending on the size of the sample. That’s how statistics works.
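A minimal Python sketch of this idea, assuming a made-up population of 100,000 incomes drawn from a normal distribution with mean $50,000 and standard deviation $20,000 (both numbers are hypothetical, not from the slides). It draws repeated samples of three sizes and reports how far, on average, each sample mean lands from the population mean; the helper name `average_sampling_error` is invented for illustration.

```python
import random
import statistics

rng = random.Random(161)  # fixed seed so the run is repeatable

# Hypothetical population: 100,000 incomes (invented values for illustration only).
population = [rng.gauss(50_000, 20_000) for _ in range(100_000)]
pop_mean = statistics.fmean(population)

def average_sampling_error(sample_size, trials=200):
    """Average absolute gap between a sample mean and the population mean."""
    gaps = []
    for _ in range(trials):
        sample = rng.sample(population, sample_size)  # sampling without replacement
        gaps.append(abs(statistics.fmean(sample) - pop_mean))
    return statistics.fmean(gaps)

errors = {n: average_sampling_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"n = {n:>4}: sample mean misses the population mean by about ${err:,.0f}")
```

The sample means almost never match the population mean exactly, but the typical miss shrinks as the sample grows, which is the point of the slide.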

  27. Differences between a population & a sample But to make it work we need to know two things: What is the quantitative difference between the population and sample statistics? This is called the margin of error. and How sure can you be in those margins of error? This is called the confidence limit. These are explained as…

  28. Margins of Error & Confidence Limits Margin of Error This number tells you where the unknown population statistic’s value will lie in relation to your known sample statistic’s value. It is always in the same units as the data it measures and it is always a plus or minus value (hence “margins” of error). For example, assume that a sample of Toronto’s population tells you that Toronto’s average income is $50,000 ± $7,000. This means that the actual population average income is somewhere between $43,000 and $57,000. Note that you never know where it is exactly.
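The arithmetic behind the Toronto example is just the sample estimate minus and plus the margin of error. A tiny Python sketch (the function name `error_interval` is made up for illustration):

```python
def error_interval(estimate, margin):
    """Return the (low, high) range implied by estimate ± margin."""
    return estimate - margin, estimate + margin

# Toronto example from the slide: $50,000 ± $7,000
low, high = error_interval(50_000, 7_000)
print(f"Population average income lies between ${low:,} and ${high:,}")
```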

  29. Margins of Error & Confidence Limits Confidence Limits This number tells you how certain you can be that the margin of error you calculated is correct. It is always expressed as a percentage (most often quoted as an “alpha” value of .05 or .01, where alpha is 1 minus the confidence, so .05 means 95% and .01 means 99%), and it is a number you decide upon yourself (though it is rarely less than 95% (.05) and is often 99% (.01) or more). For example, when you calculated the ± $7,000 margin of error you would have also added in a confidence limit as part of the formula (say 95%). This means that you can be 95% sure that the real population average income is somewhere between $43,000 and $57,000.

  30. Margins of Error & Confidence Limits Some Final Points Statisticians are always very conservative about their margins of error, so they use very high confidence limits such as 95% or 99%. But there is a trade-off between how sure you can be of the numbers you get and how wide their margins of error are: Rule #1: the higher the confidence limit you use, the larger the margins of error become. You can compensate for this by increasing your sample size, so another rule is: Rule #2: the larger the sample you use, the smaller the margins of error become.
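To see both rules in numbers, here is a sketch assuming the common normal-approximation formula for the margin of error of a mean, z × s / √n, where s is the sample standard deviation, n the sample size and z the critical value for the chosen confidence limit. The slides do not give this formula at this point, and the $20,000 standard deviation is a made-up value.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(std_dev, n, confidence):
    """Normal-approximation margin of error for a mean: z * s / sqrt(n)."""
    alpha = 1 - confidence                   # e.g. 0.05 for 95%
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * std_dev / sqrt(n)

s = 20_000  # hypothetical sample standard deviation of incomes, in dollars
for confidence in (0.95, 0.99):
    for n in (100, 400):
        moe = margin_of_error(s, n, confidence)
        print(f"{confidence:.0%} confidence, n = {n}: ± ${moe:,.0f}")
```

Raising the confidence from 95% to 99% widens the margin (Rule #1), while quadrupling the sample from 100 to 400 halves it (Rule #2), because n sits under a square root.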

  31. BUT… BECAUSE YOU ARE BASING YOUR STATISTICS ON A DATA SAMPLE, THE RESULTS YOU GET ARE ONLY AS GOOD AS THE DATA YOU COLLECTED, HENCE THE GIGO RULE… GARBAGE IN, GARBAGE OUT

  32. Margins of error arise from... Where margins of error come from…
     Research question: do you think possession of a concealed firearm should merit a life sentence?
     Canadian population: 34,911,537. Response: 70% yes, 30% no.
     Thus we would be 100% sure that 70% of Canadians agreed with the question and 30% did not. (Not quite, as we will see in a minute, but for now…)
     …is the population really representative of the population? … in time? … in space?
     Research question: do you think possession of a concealed firearm should merit a life sentence?
     Canadian population: 34,911,537. Sample of Canadian population: 2,500. Response: 70% yes, 30% no.
     BUT the likelihood of the sample proportion response rate being exactly the same as the population proportion response rate is very small. That is why we have to have the margins of error calculations.
     Thus we could be ≥95% sure that 70% ± ‘x’% of Canadians agreed with the question and 30% ± ‘x’% did not.
     …is the sample representative of the population? …is the sample size large enough? …is the sample data accurate, precise and truthful? …how sure can we be about the margins of error on the sample response rates?
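A sketch of how the “± ‘x’%” might be filled in, assuming the standard normal-approximation formula for a proportion, z × √(p(1 − p)/n), which the slides do not state. It uses the slide’s sample of 2,500 and its 70% “yes” response, and it assumes a simple random sample, so it says nothing about the representativeness or truthfulness questions raised above. The function name `proportion_margin` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_margin(p, n, confidence=0.95):
    """Normal-approximation margin of error for a sample proportion p."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

yes = 0.70   # sample proportion answering "yes"
n = 2_500    # sample size from the slide
moe = proportion_margin(yes, n)
print(f"{yes:.0%} ± {moe:.1%} yes, {1 - yes:.0%} ± {moe:.1%} no (95% confidence)")
```

The “yes” and “no” margins come out the same because p(1 − p) is unchanged when p and 1 − p swap places.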

  33. In Summary Statistics can be conceptually tricky but it is not numerically difficult. Statistics gives us a way to work with the variability in the world around us and answer research questions. Statistics is an essential part of your skill set as a research analyst and your life skills as an individual.

  34. What Isn’t Statistics?

  35. It is NOT mathematics – at least the way we will use it. It is… Arithmetic + Symbols + Concepts/logic … that will make you appear very smart.

  36. Consider the following… (1+2+3+4+5) / 5 = ?

  37. Stats as Arithmetic (1+2+3+4+5) / 5 = ? 15 / 5 = 3. You add. You divide. You get your answer. How easy is that? No surprises here: you did this in grade 4, remember?

  38. Stats as Concept (1+2+3+4+5) / 5 = ? This is a concept. It says that you can take a series of numbers (data values) and calculate a single number that represents them all. This type of statistic is called a measure of central tendency.

  39. Stats as Symbols (don’t panic – it’s only Greek & Latin) They simplify, standardise and generalise the representation of long forms of arithmetic so as to: • take up less room • mean the same thing to everyone • be useful for all series of numbers. And conventionally the symbols used are…

  40. Accepted Convention For sample statistics the Latin alphabet is mostly used – e.g.: a, b, c, x, n. For population statistics the Greek alphabet is mostly used – e.g.: α, β, λ, δ. Operators such as ∑ and √ are used with both. Note that science and engineering may use the same letters to mean different things.

  41. For example: (1+2+3+4+5) / 5 = ? What do you have here? 1, 2, 3, 4, 5: the individual data values. +, the division bar and =: the arithmetic operators. 5 (below the bar): the number of cases. ?: the answer.

  42. So what’s it saying? If you want a number that best represents the data values in the dataset, then you add up the data values and divide the sum of all data values by the number of data values. But to shorten this long-winded way of stating the operation we can…

  43. Use a shorthand set of symbols thus:
     $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
     Where: x̄: the arithmetic mean (‘ex’ bar) – your answer; ∑: the sum of all x’s; x_i: a value in the dataset; n: the number of cases in the dataset. So 1/n = 1/5 = 0.2, the sum of all x’s = 15, and 0.2 * 15 = 3. This is the most common expression for the arithmetic mean of a dataset.

  44. …and you can get the tee shirt. But we will use an easier one…

  45. Use the same shorthand set of symbols thus:
     $\bar{x} = \frac{\sum x}{n}$
     where: x̄: the arithmetic mean (‘ex’ bar) – your answer; ∑: the sum of all x’s; x: a value in the dataset; n: the number of cases in the dataset. This is the arithmetic mean of a dataset…
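Both ways of writing the mean, (1/n)∑x and ∑x/n, do the same arithmetic. A short Python sketch applying each to the 1 to 5 dataset from the slides (the function names are made up for illustration):

```python
def mean_sum_over_n(values):
    """Arithmetic mean written as (sum of all x) / n."""
    return sum(values) / len(values)

def mean_one_over_n(values):
    """The same mean written as (1/n) * (sum of all x)."""
    return (1 / len(values)) * sum(values)

data = [1, 2, 3, 4, 5]
print(mean_sum_over_n(data))  # 3.0
print(mean_one_over_n(data))
```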

  46. Then there’s the standard deviation:
     $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
     Wow! This is impressive! In words… The standard deviation of a dataset is equal to the square root of the following: the sum of the squared differences between each data value (x) and the mean (x̄) of the dataset, divided by the number of data values in the dataset (n) minus 1. Phew! So much easier with the formula. It measures deviation among the data values, that is, roughly how much each data value typically varies from the arithmetic mean, but lots more on that later.
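Here is the same formula worked step by step in Python on the 1 to 5 dataset, as a sketch; `sample_std_dev` is a hypothetical name, and the standard library’s `statistics.stdev` (which also divides by n − 1) is used only as a cross-check. By hand: the mean is 3, the squared differences are 4, 1, 0, 1, 4, their sum is 10, 10 / 4 = 2.5, and √2.5 ≈ 1.581.

```python
from math import sqrt
from statistics import stdev

def sample_std_dev(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]  # (x - x̄)² for each value
    return sqrt(sum(squared_diffs) / (n - 1))

data = [1, 2, 3, 4, 5]
print(sample_std_dev(data))  # about 1.581
print(stdev(data))           # standard-library check, same value
```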
