Class questionnaire 1 sent to your email this morning • Fill it out after this class please!
Learn to collect valid data on a topic you want to research • Learn to analyse with some statistics software and make conclusions based on your data, and predict behaviour validly • Learn to communicate what the data is showing you and how it was collected, both in pictures and in words • Learn how to recognise bad data, bad analysis methods or bad reporting of results in publications
Some basic terminology and concepts for describing data sets • A small bit of maths • A gentle introduction to Statistics
Descriptive statistics describes a set of data. • Inferential statistics seeks to make a decision or prediction based on the data. • Let's take a look at an example.
Number of graduates in ITU
                              2005   2006   2007
Bachelor graduates               0      0      0
MSc graduates                  265    208    295
Master and Diploma graduates    29     18     25
PhD graduates                    7     12      8
TOTAL                          301    238    328
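As a first descriptive step, the table can be typed into a small data structure and the column totals recomputed. This is only a sketch in Python; the variable names are illustrative, and the numbers are copied from the table above.

```python
# Graduate counts copied from the table, ordered 2005, 2006, 2007.
years = [2005, 2006, 2007]
graduates = {
    "Bachelor": [0, 0, 0],
    "MSc": [265, 208, 295],
    "Master and Diploma": [29, 18, 25],
    "PhD": [7, 12, 8],
}

# Sum each year's column across all degree types.
totals = [
    sum(counts[i] for counts in graduates.values())
    for i in range(len(years))
]

for year, total in zip(years, totals):
    print(year, total)
```

The recomputed totals should match the TOTAL row of the table, which is a quick check that the data was entered correctly.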
So we have a bunch of numbers. Where do we start? How does a statistician look at these numbers? • What kind of values do we take from datasets for working with statistical measures? • What do we do with all these numbers? • What do those values actually mean? • Why do we do that?
Means, medians, and modes. • The mean (average) is the sum of the values divided by the number of values. • The median is the middle number when all the numbers are arranged in order of value (with an even number of values, it is the average of the two middle numbers). • The mode is the most frequently occurring number. If no number appears twice, there is no mode; if several numbers appear equally frequently, the data set has more than one mode.
Calculate the mean, mode, and median of the numbers: • 5, 7, 2, 11, 18, 20, 2, 2, 7, 5, 8, 13, 16, 9, 10. • Too easy? Do it in your head!
Sorted values: 2, 2, 2, 5, 5, 7, 7, 8, 9, 10, 11, 13, 16, 18, 20. • Positions: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. • The median lies in the centre (position 8 of 15) = 8 • The mean = the sum of values divided by the number of values = 135 / 15 = 9 • The mode is 2.
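The same answers can be checked with statistics software. A minimal sketch using Python's standard `statistics` module (the choice of Python is an assumption; the class may use a different package):

```python
import statistics

# The fifteen numbers from the exercise.
values = [5, 7, 2, 11, 18, 20, 2, 2, 7, 5, 8, 13, 16, 9, 10]

mean = statistics.mean(values)      # sum of values / number of values
median = statistics.median(values)  # middle value of the sorted list
mode = statistics.mode(values)      # most frequently occurring value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

The results agree with the hand calculation: mean 9, median 8, mode 2.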
Variable: A characteristic or property of an individual unit in the population. • Representative Sample: A selection of data chosen from the target population which exhibits characteristics typical of the population. A representative sample should be unbiased. • The most commonly used sampling method for obtaining a representative and unbiased sample is Random Sampling.
A Random Sample is a sample selected so that each different possible sample of the desired size has an equal chance of being the one chosen. • This implies that each member of the original population has an equal chance of being selected in any random sample.
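A simple random sample can be drawn in a few lines. The sketch below assumes a hypothetical population of 100 individuals identified by the numbers 1 to 100; `random.sample` draws without replacement, so every possible sample of the requested size is equally likely.

```python
import random

# Hypothetical population: ID numbers 1..100 stand in for 100 individuals.
population = list(range(1, 101))

# A fixed seed is used only so the example is repeatable.
rng = random.Random(42)

# Draw 10 distinct members; each 10-member subset has the same chance.
sample = rng.sample(population, k=10)
print(sample)
```

In a real study the list would hold the actual sampling frame (for example, student ID numbers), and the seed would normally be left unset.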
Random - means by chance, right? • Mathematically, this means it has a certain probability distribution, defined by a probability density function (PDF). • But yes, it means by chance: in the simplest case (a uniform distribution), every possible measurement value has an equal chance of occurring.
3 groups: small, medium, large. • Each person flips a coin 20 times • Record each pair of flips as 2H, 2T, or 1H1T • Sum the results for the group, then describe the mean, median and mode number of heads and tails.
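The coin exercise can also be simulated. This is a sketch under the assumption that 20 flips are grouped into 10 consecutive pairs; the function name `flip_pairs` is made up for illustration.

```python
import random
from collections import Counter


def flip_pairs(n_flips, rng):
    """Flip a fair coin n_flips times and tally consecutive pairs."""
    flips = [rng.choice("HT") for _ in range(n_flips)]
    tally = Counter({"2H": 0, "2T": 0, "1H1T": 0})
    # Pair flip 1 with flip 2, flip 3 with flip 4, and so on.
    for first, second in zip(flips[0::2], flips[1::2]):
        if first == second:
            tally["2" + first] += 1  # two heads or two tails
        else:
            tally["1H1T"] += 1       # one of each, in either order
    return tally


rng = random.Random(1)  # fixed seed only so the run is repeatable
result = flip_pairs(20, rng)
print(result)
```

For a fair coin, 1H1T is expected in about half of the pairs and 2H and 2T in about a quarter each, but any single run of 10 pairs can differ noticeably from that.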
Perfect randomness never happens in any specific set of data • But data looks more like what the PDF predicts when there is a large amount of it, as in a population • A population is the entire set of existing values.
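The effect of sample size can be seen by simulating coin flips. The sketch below (function name and seed are illustrative assumptions) prints the proportion of heads for increasingly large numbers of flips; the expected proportion for a fair coin is 0.5.

```python
import random


def proportion_heads(n_flips, rng):
    """Return the fraction of heads in n_flips simulated fair coin flips."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


rng = random.Random(7)  # fixed seed only so the run is repeatable
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_heads(n, rng))
```

Small runs can stray well away from 0.5, while the larger runs tend to land close to it.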
Because we don’t know if the differences between the PDF and the dataset are due to some random factor, or if these differences (or variance) in the data are systematic. • Because we don’t know if we can expect the whole population of data to look like our sample, i.e. we don’t know if we can generalise our results.
If data is sampled, collected and analysed correctly, we expect that the results can be generalised to the population. • We expect that we can make predictions. • So, we need statistical analysis to tell us whether everyone will have more energy to work after taking the pill, or whether it was just some random or uncontrolled factors in our sample which made it look that way • (e.g. the company manufacturing the pill tested it on their own staff)