Statistics in Theses Dr. John P. Abraham Professor University of Texas Pan American
Describe an egg • Students try to do this
Differences in description • Children’s view • Adults’ view • Shopper’s view • Seller’s view • Producer’s view • Chicken’s view • Biologist’s view • Dietician’s view • Chemist’s view
Measurements • You need to describe using some measurements • Errors in measurements
Descriptive statistics • summarizing a collection of data in a clear and understandable way. • Numerical • Graphical
Numerical descriptive statistics • Spread: range, semi-interquartile range, standard deviation • Central tendency: mean, median, mode
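As a minimal sketch of the measures listed above, the following uses Python's standard `statistics` module. The egg weights are made-up values for illustration only, and the quartiles use the module's default method; other quartile conventions give slightly different semi-interquartile ranges.

```python
import statistics

# Hypothetical sample: weights of ten eggs in grams (made-up values).
weights = [52, 55, 55, 58, 60, 61, 63, 63, 63, 70]

# Central tendency
mean = statistics.mean(weights)
median = statistics.median(weights)
mode = statistics.mode(weights)

# Spread
value_range = max(weights) - min(weights)
q1, _, q3 = statistics.quantiles(weights, n=4)  # quartile cut points (default method)
semi_iqr = (q3 - q1) / 2                        # semi-interquartile range
std_dev = statistics.stdev(weights)             # sample standard deviation

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", value_range, "semi-IQR:", semi_iqr, "std dev:", round(std_dev, 2))
```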
Inferential Statistics • Infer about a population based on a sample • Infer about the future based on the past
Hypothesis testing using variables • A variable is a characteristic of an object of study that can be measured. • The measurements will differ from object to object. • Can be quantitative or qualitative • Can be independent or dependent • Can be continuous or discrete (for example, discrete when we create a 1 to 5 ranking)
Necessity for control • What is a control group • A control group study uses a control group to compare to an experimental group in a test of a causal hypothesis. • The control and experimental groups must be identical in all relevant ways except for the introduction of a suspected causal agent into the experimental group. • For example, if 'C' causes 'E', when we introduce 'C' into the experimental group but not into the control group, we should find 'E' occurring in the experimental group at a significantly greater rate than in the control group. • Significance is measured by relation to chance: if an event is not likely due to chance, then its occurrence is significant.
Double blind study • a control group test where neither the evaluator nor the subject knows which items are controls • A randomized test is one that randomly assigns items to the control and the experimental groups. • The purpose of controls, double-blind, and randomized testing is to reduce error, self-deception and bias.
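Randomized assignment can be sketched as follows. This is only an illustration: the function name `assign_groups`, the integer subject IDs, and the fixed seed are assumptions, not part of the slides.

```python
import random

def assign_groups(subject_ids, seed=None):
    """Randomly split subjects into control and experimental groups.

    Hypothetical helper for illustration; the groups differ in size
    by at most one subject.
    """
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)          # random order removes selection bias
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "experimental": shuffled[half:]}

# Twenty hypothetical subjects numbered 1 to 20.
groups = assign_groups(range(1, 21), seed=42)
print(groups["control"])
print(groups["experimental"])
```

In a double-blind study the resulting list would be held by a third party, so that neither the subjects nor the evaluator can see it until the evaluation is complete.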
Placebo • Many control group studies use a placebo in control groups to keep the subjects in the dark as to whether they are being given the causal agent that is being tested. • For example, both the control and experimental groups will be given identical looking pills in a study testing the effectiveness of a new drug. Only one pill will contain the agent being tested; the other pill will be a placebo. • In a double-blind study, the evaluator of the results would not know which subjects got the placebo until his or her evaluation of observed results was completed. This is to avoid evaluator bias from influencing observations and measurements.
Inferential statistics • we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what's going on in our data. • we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study.
T-test • compare the average performance of two groups on a single measure to see if there is a difference. • You might want to know whether there is a difference between girls and boys in their math abilities. • Whenever you wish to compare the average performance between two groups you should consider the t-test for differences between groups.
T-test example • The Acme Company has developed a new battery. The engineer in charge claims that the new battery will operate continuously for at least 7 minutes longer than the old battery. • To test the claim, the company selects a simple random sample of 100 new batteries and 100 old batteries. The old batteries run continuously for 190 minutes with a standard deviation of 20 minutes; the new batteries, 200 minutes with a standard deviation of 40 minutes. • Test the engineer's claim that the new batteries run at least 7 minutes longer than the old. Use a 0.05 level of significance. (Assume that there are no outliers in either sample.) • See next slide
4 steps needed • (1) state the hypotheses, (2) formulate an analysis plan, (3) analyze sample data, and (4) interpret results • State the hypotheses. The first step is to state the null hypothesis and an alternative hypothesis, where μ1 is the mean running time of the new batteries and μ2 that of the old batteries. • Null hypothesis: μ1 - μ2 >= 7 • Alternative hypothesis: μ1 - μ2 < 7 • Formulate an analysis plan. For this analysis, the significance level is 0.05. Using sample data, we will conduct a two-sample t-test of the null hypothesis. • Analyze sample data. Using sample data, we compute the standard error (SE), degrees of freedom (DF), and the t-score test statistic (t). • SE = sqrt(s1²/n1 + s2²/n2) = sqrt(40²/100 + 20²/100) = sqrt(20) = 4.472 • t = [(x1 - x2) - d] / SE = [(200 - 190) - 7] / 4.472 = 3 / 4.472 = 0.67 • Interpret results. The alternative hypothesis is one-sided, so the P-value is the probability of a t-score below 0.67, which is about 0.75. Since the P-value (0.75) is greater than the significance level (0.05), we cannot reject the null hypothesis.
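The arithmetic in the battery example can be checked with a short sketch. It treats the new batteries as sample 1 and uses the unpooled (Welch) formulas for SE and DF. As an assumption made to stay within the standard library, the P-value is approximated with the normal distribution, which is close to the t distribution at roughly 145 degrees of freedom.

```python
import math
from statistics import NormalDist

# Sample summaries from the battery example (sample 1 = new, sample 2 = old).
n_new, mean_new, sd_new = 100, 200.0, 40.0
n_old, mean_old, sd_old = 100, 190.0, 20.0
d = 7.0  # hypothesized difference between the means

# Standard error of the difference between the two sample means.
var_new = sd_new ** 2 / n_new
var_old = sd_old ** 2 / n_old
se = math.sqrt(var_new + var_old)

# Degrees of freedom for unequal variances (Welch-Satterthwaite formula).
df = (var_new + var_old) ** 2 / (
    var_new ** 2 / (n_new - 1) + var_old ** 2 / (n_old - 1)
)

# t-score test statistic.
t = ((mean_new - mean_old) - d) / se

# Left-tailed P-value, approximated with the standard normal CDF
# (an assumption; the exact value comes from the t distribution).
p_approx = NormalDist().cdf(t)

print(f"SE = {se:.3f}, DF = {df:.1f}, t = {t:.2f}, P ~ {p_approx:.2f}")
```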
Standard Score • Problem • A national achievement test is administered annually to 3rd graders. The test has a mean score of 100 and a standard deviation of 15. If Jane's z-score is 1.20, what was her score on the test? • From the z-score equation, we know • z = (X - μ) / σ • where z is the z-score, X is the value of the element, μ is the mean of the population, and σ is the standard deviation. • Solving for Jane's test score (X), we get • X = (z * σ) + μ = (1.20 * 15) + 100 = 18 + 100 = 118
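The same conversion can be written as two small functions. The names `z_score` and `raw_score` are illustrative choices, not from the slides.

```python
def z_score(x, mu, sigma):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """Invert the z-score equation to recover the original value X."""
    return z * sigma + mu

# Jane's score: z = 1.20 on a test with mean 100 and standard deviation 15.
jane = raw_score(1.20, mu=100, sigma=15)
print(round(jane, 2))

# Round trip: converting the score back gives the z-score we started with.
print(round(z_score(jane, mu=100, sigma=15), 2))
```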
Probability • Mathematically, the probability that an event will occur is expressed as a number between 0 and 1. • Notationally, the probability of event A is represented by P(A). • A coin is tossed three times. What is the probability that it lands on heads exactly one time? • If you toss a coin three times, there are a total of eight possible outcomes. They are: HHH, HHT, HTH, THH, HTT, THT, TTH, and TTT. Of the eight possible outcomes, three have exactly one head. They are: HTT, THT, and TTH. Therefore, the probability that three flips of a coin will produce exactly one head is 3/8 or 0.375.
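The coin-toss count can be reproduced by enumerating every outcome. This sketch assumes a fair coin, so that all eight outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# All sequences of three tosses: HHH, HHT, ..., TTT.
outcomes = list(product("HT", repeat=3))

# Outcomes with exactly one head.
favorable = [o for o in outcomes if o.count("H") == 1]

# With equally likely outcomes, P(A) = favorable / total.
p_one_head = Fraction(len(favorable), len(outcomes))

print(len(outcomes), len(favorable), p_one_head, float(p_one_head))
```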
ANOVA (Analysis of Variance) • gives a statistical test of whether the means of several groups are all equal • MANOVA (multivariate analysis of variance) • Multivariate analysis of variance (MANOVA) is used when there is more than one dependent variable.
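To make the idea of a one-way ANOVA concrete, here is a minimal sketch that computes the F statistic by hand. The three groups are made-up numbers, and the function name `one_way_anova_f` is illustrative. Turning F into a P-value requires the F distribution, which this sketch leaves out.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores for three groups (made-up data).
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f_stat, df_b, df_w)
```

A large F means the group means differ by more than the spread inside the groups would suggest.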
Correlation • Statistical correlation is a statistical technique which tells us if two variables are related. If the change in one variable is accompanied by a change in the other, then the variables are said to be correlated. We can therefore say that family income and family expenditure, or price and demand, are correlated. • You should measure manipulated variables rather than arbitrary pairs: for example, one could compute 'r' between shoe size and intelligence, or between height and income. Irrespective of the value of 'r', such a result makes no sense and is hence termed a chance or nonsense correlation.
r Value • In general, r > 0 indicates a positive relationship, r < 0 indicates a negative relationship, and r = 0 indicates no linear relationship. Here r = +1.0 describes a perfect positive correlation and r = -1.0 describes a perfect negative correlation. • Value of r and strength of relationship: • -1.0 to -0.5 or 0.5 to 1.0: Strong • -0.5 to -0.3 or 0.3 to 0.5: Moderate • -0.3 to -0.1 or 0.1 to 0.3: Weak • -0.1 to 0.1: None or very weak
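A minimal sketch of computing Pearson's r and labelling it with the strength bands above. The income and expenditure figures are made-up, the function names are illustrative, and assigning each boundary value (0.5, 0.3, 0.1) to the stronger band is an assumption, since the bands overlap at their endpoints.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def strength(r):
    """Map r to the strength bands; boundary handling is an assumption."""
    a = abs(r)
    if a >= 0.5:
        return "strong"
    if a >= 0.3:
        return "moderate"
    if a >= 0.1:
        return "weak"
    return "none or very weak"

# Hypothetical family income and expenditure, in thousands (made-up data).
income = [30, 40, 50, 60, 70]
expenditure = [25, 33, 41, 46, 55]

r = pearson_r(income, expenditure)
print(round(r, 3), strength(r))
```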
Analysis of Covariance • ANOVA combined with regression analysis • ANCOVA tests whether certain factors have an effect on the outcome variable after removing the variance for which quantitative predictors (covariates) account. • Suppose you analyze the results of a clinical trial of three types of treatment of a disease - "Placebo", "Drug 1", and "Drug 2". The results are three sets of survival times, corresponding to patients from the three treatment groups. The question of interest is whether there is a difference between the three types of treatment in the average survival time.
ANCOVA cont. • You might use analysis of variance to answer this question. But, if you have supplementary information, for example, each patient's age, then analysis of covariance allows you to adjust the treatment effect (survival time, in this case) to a particular age, say, the mean age of all patients. Age in this case is a "covariate" - it is not related to treatment, but can affect the survival time. This adjustment allows you to reduce the observed variation between the three groups caused not by the treatment itself but by variation of age.
Regression Analysis • Regression analysis provides a "best-fit" mathematical equation for the relationship between the dependent variable (response) and independent variable(s) (covariates). • In linear regression, the function is a linear (straight-line) equation. For example, if we assume the value of an automobile decreases by a constant amount each year after its purchase, and for each mile it is driven, we can create a formula to find the value.
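A simplified sketch of the automobile example, fitting a least-squares line with a single predictor (age in years). Mileage is left out to keep the example short, and the prices are made-up values chosen to fall exactly on a line. The function name `fit_line` is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical car values by age in years (made-up data).
ages = [0, 1, 2, 3, 4]
values = [20000, 18500, 17000, 15500, 14000]

slope, intercept = fit_line(ages, values)

# Use the fitted formula to estimate the value at six years.
predicted_at_6 = intercept + slope * 6
print(slope, intercept, predicted_at_6)
```

Here the slope is the estimated loss in value per year and the intercept is the estimated purchase price. A second predictor such as miles driven would call for multiple regression.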
Why use SharePoint services? • You will have several faculty members on your committee • All will have to comment on and correct your thesis • Best way to make appointments with many people • One central repository for all your files • Different versions are kept; in case of a mistaken edit you can go back
Summarize the course
Why review different theses • Discussed style • Discussed chapters • Discussed content • How to get ideas for your research from suggestions
References • Discussed different types of references and what is acceptable and what is not. • Discussed plagiarism at length • Discussed how to quote and how to cite
Theses and Project • Differences • Similarities • Report writing
Formal research studies • Hypothesis formulation • Collect raw data • Conduct statistical analysis • Draw conclusions • Report