Statistics – The Science of Data
Statistics – The Science of Data. Descriptive Statistics: summarises and displays information from a dataset. Inferential Statistics: uses sample data to make decisions or predictions about a larger population of data.
Presentation Transcript
Statistics – The Science of Data
Descriptive Statistics: Summarises and displays information from a dataset.
Inferential Statistics: Uses sample data to make decisions or predictions about a larger population of data.
Population: The entire collection of individuals or objects about which information is desired.
Sample: A part (subset) of the population selected in some prescribed manner.
Collecting data involves selecting a sampling method, designing an experiment or questionnaire/survey, and deciding who collects the data, what they know about the aims of the research, and where, when and how the data are collected.
Confounding factors!
Women don’t speed as much as men in their cars.
What possible confounding factors can you think of?
Are women statistically more likely to have smaller cars?
Do more men driving mean more speeders?
Graphs
Central Tendency
The Range
The smallest score subtracted from the largest.
Example: Number of friends of 11 Facebook users: 22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252
Range = 252 – 22 = 230
Heavily influenced by outliers: a single extreme score (here 252) changes the range dramatically.
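The range calculation above can be sketched in a few lines of Python. This is only an illustration using the Facebook example from the slide; the function name `value_range` is made up for this sketch.

```python
def value_range(values):
    """Return the largest value minus the smallest value."""
    if not values:
        raise ValueError("the range of an empty dataset is undefined")
    return max(values) - min(values)

# Number of friends of 11 Facebook users (data from the slide)
friends = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]

print(value_range(friends))       # 252 - 22 = 230
# Dropping the single outlier (252) shrinks the range a lot:
print(value_range(friends[:-1]))  # 121 - 22 = 99
```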
Quartiles
The three values that split the sorted data into four equal parts.
Second quartile = median.
Lower quartile = median of the lower half of the data.
Upper quartile = median of the upper half of the data.
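A minimal sketch of the median-of-halves method described above, applied to the Facebook data from the range example. One assumption is labelled here because the slide does not settle it: when the number of scores is odd, the median itself is left out of both halves. Other conventions exist and give slightly different quartiles. The function name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (lower quartile, median, upper quartile).

    Assumption: for an odd number of scores the median is excluded
    from both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values to form two halves")
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

friends = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]
q1, q2, q3 = quartiles(friends)
print(q1, q2, q3)  # 53 98 116
```

Unlike the range, the distance between the upper and lower quartile (116 – 53 = 63 here) is not affected by the single extreme score of 252.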
...available online
Think of this as the average distance from the mean, except that, for technical reasons, we actually measure the average squared distance from the mean. Call any data point in the set $x_i$. The mean of the set is $\bar{x}$. The squared distance of the point from the mean is $(x_i - \bar{x})^2$.
An example of an online probability calculator: http://davidmlane.com/hyperstat/z_table.html
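For readers without access to the link, the sketch below shows the kind of lookup a z-table performs, assuming (from the URL only) that the linked calculator works with the standard normal distribution. It uses `statistics.NormalDist` from the Python standard library (Python 3.8+); the function name `prob_below` is made up for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def prob_below(z):
    """Probability that a standard normal score falls at or below z."""
    return standard_normal.cdf(z)

print(round(prob_below(0.0), 3))   # 0.5, half the area lies below the mean
print(round(prob_below(1.96), 3))  # 0.975
```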
A Bimodal Distribution
One way to think about variability is in terms of the spread of data. Are all the values close to the mean or average, or are they more spread out across a wide range? We could look at the spread through percentiles, e.g. the average score of the top 10%. Or we could look at how far, on average, the cases differ from the average score. This is called the standard deviation.
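However, for technical reasons, we use n-1 instead of n in the denominator (the bottom part of the fraction) to define the sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, where $x_i$ is each data point, $\bar{x}$ is the mean and $n$ is the number of data points.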
To convert back into the same units as the dataset, we take the square root of the sample variance. The result is the standard deviation.
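The steps above (squared distances from the mean, division by n-1, then a square root) can be sketched as follows. The function names are made up for this illustration, and the data are the Facebook friend counts from the range example. The standard library's `statistics.variance` and `statistics.stdev` use the same n-1 definition, so they serve as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    squared_distances = [(x - mean) ** 2 for x in values]
    return sum(squared_distances) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance, in the units of the data."""
    return math.sqrt(sample_variance(values))

friends = [22, 40, 53, 57, 93, 98, 103, 108, 116, 121, 252]

print(sample_variance(friends))
print(sample_std_dev(friends))

# Cross-check against the standard library (also divides by n - 1)
print(math.isclose(sample_variance(friends), statistics.variance(friends)))
```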
Types of Data Analysis
Theory: A hypothesized general principle or set of principles that explains known findings about a topic and from which new hypotheses can be generated.
Hypothesis: A prediction from a theory. E.g. the number of people turning up for a Big Brother audition who have narcissistic personality disorder will be higher than the general level (1%) in the population.
Falsification: The act of disproving a theory or hypothesis.
Cause and Effect (Hume, 1748)
Cause and effect must occur close together in time (contiguity).
The cause must occur before the effect does.
The effect should never occur without the presence of the cause.
Confounding variables: the ‘Tertium Quid’
A variable (that we may or may not have measured) other than the predictor variables that potentially affects an outcome variable. E.g. the relationship between breast implants and suicide is confounded by self-esteem.
Ruling out confounds (Mill, 1865)
An effect should be present when the cause is present, and when the cause is absent the effect should be absent also.
Control conditions: the cause is absent.
Between-group/between-subject/independent: Different entities in each experimental condition.
Repeated measures (within-subject): The same entities take part in all experimental conditions.
Economical
Practice effects
Fatigue
Systematic Variation: Differences in performance created by a specific experimental manipulation.
Unsystematic Variation: Differences in performance created by unknown factors, e.g. age, gender, IQ, time of day, measurement error.
Randomization: Minimizes unsystematic variation.
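One common way to randomize is to shuffle the participants and deal them out to the experimental conditions in turn, so that unknown factors such as age or IQ are spread across conditions by chance and not by the experimenter's choice. The sketch below is only an illustration under that assumption; the function name, participant labels and condition names are all hypothetical.

```python
import random

def assign_to_conditions(participants, conditions, seed=None):
    """Randomly assign each participant to exactly one condition.

    Participants are shuffled and then dealt out in turn, so group
    sizes differ by at most one. A seed makes the result repeatable.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        condition = conditions[index % len(conditions)]
        groups[condition].append(participant)
    return groups

# Hypothetical participant IDs and condition names
participants = [f"P{i}" for i in range(1, 11)]
groups = assign_to_conditions(participants, ["control", "treatment"], seed=42)
for condition, members in groups.items():
    print(condition, members)
```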