class 7 10/20/08 intro to statistical methods cont.

all researchers must learn the trick and avoid the mistake • the trick to doing research is to begin with the question and then to figure out the best way to answer that question • the mistake is to begin with the method and fit the question to the method

research using measurement description and statistical analysis • critical for answering certain kinds of important questions

strengths of measurement description • precise descriptions • often efficient: one can make confident predictions from relatively small samples, if the samples are good • increasingly sophisticated ways of analyzing measurement data • powerful stat packages now available for desktop computers, e.g., Systat, SPSS, SAS

cautions • measure only what can be measured • “to replace the unmeasureable with the unmeaningful is not progress” (Achen, 1977) • value precision but realize that a precise description may not be an accurate one • scientific method (drawing inferences from observations) comprises many specific methods—its strength does not come from any one specific method

my personal recommendations • whatever your Ph.D. research specialization, take at least one stat course • whatever your methodological expertise, find people with similar interests but different methodological expertise and work with them: the best research often uses multiple methods

The statistician knows, for example, that in nature there never was a normal distribution, there never was a straight line, yet with normal and linear assumptions, known to be false, he can often derive results which match, to a useful approximation, those found in the real world. (Box, 1976, p. 792) • All models are false, but some are useful. • Box, George E. P. (1976). Science and statistics. Journal of the American Statistical Association, 71, 791-799.

a caution • Statistics today is in a conceptual and theoretical mess. The discipline is divided into two rival camps, the frequentists and the Bayesians, and neither camp offers the tools that science needs for objectively representing and interpreting statistical data as evidence. (Royall, 2004, pp. 127-128) • Royall, Richard (2004). The likelihood paradigm for statistical evidence. In M. L. Taper & S. R. Lele (Eds.), The nature of scientific evidence: Statistical, philosophical, and empirical considerations (pp. 119-152). Chicago: University of Chicago Press.

K ch 19: inferential statistics • inferential statistics allow one to infer the characteristics of a population from a representative sample • from a sample, one can estimate characteristics of the population within a determined range with a given probability • one can determine, with a given probability, whether an effect beyond sampling and chance error exists in a study

parameters: refer to the population • statistics: refer to the sample • sampling distribution: the distribution of a descriptive statistic (e.g., the mean) calculated over repeated samples • confidence intervals: a range that includes the population value with a given probability (based on the standard error of the statistic, i.e., the standard deviation of its sampling distribution)
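
A minimal simulation of a sampling distribution, assuming a hypothetical normally distributed population of 10,000 scores (mean 500, SD 100) and samples of size 25 drawn with replacement. The helper name sample_means is illustrative only. The spread of the 2,000 sample means should come out close to sigma / sqrt(n), which is the standard error that confidence intervals are built on.

```python
import random
import statistics

def sample_means(population, n, reps, seed=0):
    """Draw `reps` random samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Hypothetical population: 10,000 scores drawn from a normal distribution (mean 500, SD 100)
rng = random.Random(42)
population = [rng.gauss(500, 100) for _ in range(10_000)]
sigma = statistics.pstdev(population)

# Sampling distribution of the mean for samples of size 25
means = sample_means(population, n=25, reps=2_000)
print(f"population mean       {statistics.fmean(population):7.1f}")
print(f"mean of sample means  {statistics.fmean(means):7.1f}")
print(f"SD of sample means    {statistics.stdev(means):7.1f}")
print(f"sigma / sqrt(25)      {sigma / 25 ** 0.5:7.1f}")
```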

confidence level: • the probability that the interval will contain the population value (conventionally 68, 95, and 99%, or roughly 2 to 1, 19 to 1, and 99 to 1, respectively) • the wider the interval, the more certain it is to contain the population value (and the less valuable the information becomes)
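
A sketch of how the conventional confidence levels translate into critical values, odds, and interval widths, assuming a large-sample normal approximation. The sample scores and the helper names z_for and mean_ci are hypothetical. For a sample this small a t critical value would give a somewhat wider interval; the normal version is used here only to keep the example to Python's standard library.

```python
from statistics import NormalDist, fmean, stdev

def z_for(confidence):
    """Two-sided critical z value for a confidence level (e.g. 0.95 -> about 1.96)."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def mean_ci(data, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    m = fmean(data)
    se = stdev(data) / len(data) ** 0.5   # standard error of the mean
    half_width = z_for(confidence) * se
    return m - half_width, m + half_width

# Conventional levels expressed as critical values and odds
for level in (0.68, 0.95, 0.99):
    odds = level / (1 - level)
    print(f"{level:.0%}: z = {z_for(level):.2f}, odds about {odds:.0f} to 1")

# Hypothetical sample: higher confidence means a wider interval
scores = [512, 478, 530, 495, 503, 520, 488, 499, 515, 507]
for level in (0.68, 0.95, 0.99):
    lo, hi = mean_ci(scores, level)
    print(f"{level:.0%} CI: {lo:.1f} to {hi:.1f} (width {hi - lo:.1f})")
```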

hypothesis testing (traditionally takes the form of rejecting the null hypothesis, i.e., that there is no effect beyond sampling and chance error) • alpha level: the risk of concluding there is an effect when the result is actually due to chance (rejecting a true null hypothesis); set by the researcher in advance, traditionally .10, .05, .01, .001 • p-level: the probability actually found (how likely a result at least this extreme would be if the null hypothesis were true), which is then compared to the alpha level

two-tailed test: • non-directional; splits the alpha level between both ends of the distribution. Used when one does not predict the direction of the result • one-tailed test: • directional; puts the whole alpha level at one end (chosen by the researcher). Increases the probability of finding a statistically significant result in the predicted direction
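
A small sketch comparing a p-level with a preset alpha for one- and two-tailed tests, assuming a hypothetical observed z statistic of 1.80 and a normal sampling distribution. The helper p_value is illustrative. With the same data, the one-tailed test (all of alpha in the predicted tail) reaches significance at .05 while the two-tailed test does not.

```python
from statistics import NormalDist

def p_value(z, tails=2):
    """p-value for an observed z statistic under the null hypothesis.

    tails=2: non-directional test (alpha split between both ends).
    tails=1: directional test predicting z > 0 (all of alpha in the upper tail).
    """
    upper = 1 - NormalDist().cdf(z)
    if tails == 2:
        return 2 * min(upper, 1 - upper)
    return upper

alpha = 0.05          # set in advance by the researcher
z_observed = 1.80     # hypothetical test statistic
for tails in (2, 1):
    p = p_value(z_observed, tails)
    verdict = "reject" if p < alpha else "do not reject"
    print(f"{tails}-tailed: p = {p:.3f} -> {verdict} the null at alpha = {alpha}")
```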

common statistical tests • t test of difference between means: common and simple test for differences between the means of two groups • chi-square: common test for categorical data and frequencies; are cell values different from what would be expected by chance?
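
A minimal sketch of the two test statistics, assuming a pooled-variance (equal-variance) t test and Pearson's chi-square for a contingency table. All data are hypothetical, and the helper names are illustrative. The statistics are compared with standard tabled .05 critical values (2.179 for t with df = 12, two-tailed; 3.841 for chi-square with df = 1) rather than computing exact p-values, which would need a library such as SciPy.

```python
import math
from statistics import fmean, variance

def pooled_t(a, b):
    """Student's t for the difference between two independent means
    (assumes equal population variances); returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (fmean(a) - fmean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))
    return t, na + nb - 2

def chi_square(table):
    """Pearson chi-square for a contingency table given as a list of rows;
    returns (statistic, degrees of freedom)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n   # count expected if rows and columns are unrelated
            stat += (observed - expected) ** 2 / expected
    return stat, (len(table) - 1) * (len(col_tot) - 1)

# Hypothetical scores for two small groups
treatment = [78, 85, 82, 90, 88, 76, 84]
control = [72, 80, 75, 79, 70, 77, 74]
t, df = pooled_t(treatment, control)
print(f"t = {t:.2f}, df = {df} (two-tailed .05 critical value is about 2.179)")

# Hypothetical 2 x 2 table of counts: rows = groups, columns = yes / no
chi2, chi_df = chi_square([[30, 20], [18, 32]])
print(f"chi-square = {chi2:.2f}, df = {chi_df} (.05 critical value is about 3.841)")
```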

ANOVA (analysis of variance) • commonly used in experimental designs where two or more groups or multiple conditions are being compared (thus common in psychology and ed psych, and in educational research in general) • powerful: more accurate measure of error variance, tests the significance of each variable as well as their combined effect, and avoids the inflated probability of a Type I error that comes from running many separate t tests
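
A sketch of a one-way ANOVA F statistic, assuming independent groups (the data are hypothetical and the helper name one_way_anova is illustrative). One F test compares all the group means at once instead of running a separate t test for every pair of groups.

```python
from statistics import fmean

def one_way_anova(*groups):
    """F statistic for a one-way ANOVA; returns (F, df_between, df_within)."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of group means around the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation of scores around their own group mean (error variance)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical scores under three conditions
print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

With exactly two groups, this F equals the square of the pooled two-sample t statistic.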

(not in K) Regression analysis • explains (predicts) the variability of a dependent variable using information about one or more independent variables • predicts the expected change in the dependent variable given specific changes in the independent variable(s) • traditionally not used in educational research as much as ANOVA, but more useful for policy purposes
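
A minimal ordinary least-squares sketch for one independent variable, assuming hypothetical data on tutoring hours and test scores; the helper names fit_line and r_squared are illustrative. The slope is the predicted change in the dependent variable per one-unit change in the independent variable, and R² is the share of its variability the line explains.

```python
from statistics import fmean

def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def r_squared(x, y, intercept, slope):
    """Proportion of the variance in y explained by the fitted line."""
    my = fmean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: hours of tutoring (x) and test score (y)
hours = [0, 1, 2, 3, 4, 5]
score = [61, 64, 70, 71, 77, 80]
b0, b1 = fit_line(hours, score)
print(f"predicted score = {b0:.1f} + {b1:.2f} * hours, R^2 = {r_squared(hours, score, b0, b1):.2f}")
```

Python 3.10 and later also provide statistics.linear_regression for the single-predictor case.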

errors of inference • type I error (alpha error): a concern when theory testing (K, “when validating a finding”) • type II error (beta error): a concern when theory building (K: “when exploring”) • decreasing the probability of one type increases the probability of the other • pointless to talk about Type I or II error absent discussion of what is at stake
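
A sketch of the trade-off between the two error types, assuming a one-tailed, one-sample z test with a known SD, a hypothetical standardized effect of 0.3, and n = 50 (the helper beta_error is illustrative). Holding the data fixed, lowering alpha (fewer Type I errors) raises beta (more Type II errors).

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def beta_error(alpha, effect_size, n):
    """Type II error rate of a one-tailed, one-sample z test.

    effect_size: true mean difference divided by the (known) population SD.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)   # rejection cutoff under the null
    shift = effect_size * n ** 0.5           # center of the test statistic if the effect is real
    return STD_NORMAL.cdf(z_crit - shift)    # chance the statistic still falls below the cutoff

for alpha in (0.10, 0.05, 0.01, 0.001):
    print(f"alpha = {alpha:<5}  beta = {beta_error(alpha, 0.3, 50):.3f}")
```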

cost of type I error in theory testing • dominant theory not challenged • knowledge production stopped cost of type II error in theory building • possibly important explanations etc. ignored • knowledge production stopped (one of the many challenges the late and great Lee Cronbach (1916-2001) made to the accepted wisdom of the day)

statistical power: 1-beta • increasing statistical power: • increase size of effect (stronger treatment) • increase sample size • reduce variability
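
A sketch of the three ways to increase power listed above, assuming a one-tailed, one-sample z test with known SD and hypothetical numbers (a mean difference of 5 points, SD of 20, n = 50); the helper power is illustrative. Each change raises power relative to the baseline.

```python
from statistics import NormalDist

def power(mean_diff, sd, n, alpha=0.05):
    """Power (1 - beta) of a one-tailed, one-sample z test with known SD."""
    std_normal = NormalDist()
    shift = (mean_diff / sd) * n ** 0.5   # standardized effect times sqrt(n)
    return 1 - std_normal.cdf(std_normal.inv_cdf(1 - alpha) - shift)

base = power(mean_diff=5, sd=20, n=50)
print(f"baseline                      {base:.2f}")
print(f"stronger treatment (diff 10)  {power(10, 20, 50):.2f}")
print(f"larger sample (n = 200)       {power(5, 20, 200):.2f}")
print(f"less variability (sd = 10)    {power(5, 10, 50):.2f}")
```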

statistical & practical significance • statistical: confidence at a given probability that the result is not due to chance • practical: is the result important enough, big enough, feasible, affordable—all value judgments • if one apple a day keeps the doctor away, but it takes three grapefruit, then…?

no statistic or statistical test can make a practical decision • whether one risks being wrong cautiously (type I) or wrong incautiously (type II) cannot be decided absent cost and risk, needs, what’s at stake, etc. • no statistical analysis is better than the numbers (descriptions) fed into it: garbage in, garbage out

statistical significance refers only to samples from population • it does not refer to size of effect—ceteris paribus larger effects are more likely to be statistically significant, but with large samples very small effects will be • if you have the population, then any effects are real, no matter the size
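
A sketch of how sample size alone can make a tiny effect statistically significant, assuming a one-sample z test with known SD and a hypothetical observed standardized effect of 0.02 (the helper two_tailed_p is illustrative). The effect never changes; only n does.

```python
from statistics import NormalDist

def two_tailed_p(effect_size, n):
    """Two-tailed p-value of a one-sample z test when the observed
    standardized effect (mean difference / known SD) equals effect_size."""
    z = effect_size * n ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

for n in (100, 1_000, 10_000, 100_000):
    print(f"d = 0.02, n = {n:>7,}: p = {two_tailed_p(0.02, n):.4f}")
```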

no proof in science: • a statistically significant result (assuming appropriate analysis etc) does not prove that the hypothesis is true, only that it has escaped disconfirmation • the more often an hypothesis passes the test and the more demanding the tests it passes, the more certain we can be that we know something—the more we have reduced uncertainty

other terms • parametric: assumes random sampling from a distribution of known form, often the normal distribution • nonparametric: used when data do not come from a known distribution, often with nominal or ordinal data • robust test: accurate even when its assumptions are violated • effect size: for too long and too often ignored; journals now require estimates of effect size
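
One common effect-size estimate is Cohen's d, the difference between two means divided by their pooled standard deviation. A minimal sketch, assuming two independent groups of hypothetical scores; Cohen's rough benchmarks are 0.2 (small), 0.5 (medium), and 0.8 (large), though what counts as important remains a practical judgment.

```python
import math
from statistics import fmean, variance

def cohens_d(a, b):
    """Standardized mean difference between two groups, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / math.sqrt(pooled_var)

# Hypothetical scores for two groups
treatment = [78, 85, 82, 90, 88, 76, 84]
control = [72, 80, 75, 79, 70, 77, 74]
print(f"d = {cohens_d(treatment, control):.2f}")
```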

Vogt • regression toward the mean • reliability • sample space • sampling frame • scatter plot • self-selection bias • sleeper effect • sociogram • spurious relation (or correlation) • suppressor variable

Sieber ch 6: Strategies for Assuring Confidentiality 6.1 Confidentiality refers to agreements with people about what can be done with data • states steps that will be taken to ensure privacy • states legal limitations to assurances of confidentiality

6.2 why an issue (be able to discuss the cases) 6.3 confidentiality or anonymity 6.4 procedural approaches to assuring confidentiality 6.4.1 cross-sectional research • anonymity • temporarily identified responses • separately identified responses

6.4.2 longitudinal data (requires links) • aliases 6.4.3 interfile linkage 6.5 statistical strategies for assuring confidentiality (coin flip example) 6.6 certificates of confidentiality • researchers do NOT have testimonial privilege unless they have certificate of confidentiality from Dept of Health and Human Services

6.7 confidentiality and consent: • consent statement must specify promises of confidentiality researcher cannot make—be aware of state reporting laws, e.g., on child abuse 6.8 data sharing • when data shared publicly, all identifiers must be removed and researcher must ensure no way to deduce identity • techniques

thinking: a simple statistical way to find out what people may not be willing to admit • ask people to flip a coin • if heads, answer “head: no answer” • if tails and have done X, answer “head: no answer” • if tails and have not done X, answer “no” • thus, the number of “no’s” estimates half of those who have not done X • thus, N minus twice the number of “no’s” gives an estimate of those who have done X
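
A simulation of the coin-flip procedure, assuming a hypothetical group of 10,000 respondents of whom 30% have done X; the helper names are illustrative. No individual "head: no answer" reveals anything, yet the group estimate N minus twice the number of "no's" recovers the true rate closely.

```python
import random

def simulate_responses(n, true_rate, seed=0):
    """Simulate the coin-flip protocol and return the number of "no" answers.

    heads -> "head: no answer"; tails and has done X -> "head: no answer";
    tails and has not done X -> "no".
    """
    rng = random.Random(seed)
    no_count = 0
    for _ in range(n):
        did_x = rng.random() < true_rate
        tails = rng.random() < 0.5
        if tails and not did_x:
            no_count += 1
    return no_count

def estimate_done_x(n, no_count):
    """About half of those who have not done X flip tails, so
    not-done is about 2 * no_count and done is about n - 2 * no_count."""
    return n - 2 * no_count

n = 10_000
no_count = simulate_responses(n, true_rate=0.30)
print(f"estimated share who have done X: {estimate_done_x(n, no_count) / n:.3f}")
```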

case 3 • what did you learn from reading this case? • how would you write this case differently? • do you think that this case is realistic? • what should our hero do?

lit review guidelines • cover page • abstract (APA pp. 12-15) • intro section (2-3 pp) • describe area of interest • specific question or problem review addresses • detailed description of: • "data base" and its parameters • strategies for searching—note possible limitations • how review section is organized

review section (15-20 pp) • review lit, follow explicit and logical scheme. • 3-5 sections, with subsections if useful • end sections and subsections with a discussion • discussion section (2-3 pp) • synthesize the review (discussion of discussions)

conclusion section (< 1 p) • address original question(s) • personal reflections section (1 p) • discuss briefly what you learned in the process of doing the lit review • references • make sure all citations in references • make sure all references cited

general style rules and tips use first person to talk about yourself • I interviewed Liam. (good) • The researcher interviewed Liam. (bad) do not begin sentences with “there is” or “it is” etc. • There were three kids who answered… (bad) • Three kids answered the questions. (good)

use who for people, that for things • I interviewed the kids, who all agreed….(good) • I interviewed the teacher that was in…. (bad) pronouns must refer to nouns • I entered the room and found the kids running across the table tops and throwing erasers at each other. That made me nervous. (bad: not clear what made you nervous)

introductory adjectival phrases must modify the subject • Rushing into the room, the class had already begun. (bad) • Rushing into the room, I discovered that the class had already begun. (good) find the right word • Mark Twain observed that the difference between the right word and the almost right word is the difference between lightning and a lightning bug.

avoid beginning sentences with “There is (are),” or “It is” • avoid beginning sentences with “however” • avoid “throat-clearings” to begin sentences, e.g., furthermore, therefore, also, additionally • ragged right • use a serif font, either Times Roman or Courier (12 pt) (APA, p. 285) • single quotation marks only within double quotation marks: “Smith quoted Chung saying, ‘I believe . . . .’” • use “we” to refer only to you and your co-authors (APA, p. 39)

more bests • fish sandwiches on a Friday night. Philo Tavern. 7 miles south of Urbana on 130 (High Cross Rd).

this week: Tues: Healthy College Cooking, 7-9, Instructional Kitchen, ARC Wed: Convo table, improve Spanish and English speaking skills thru conversation, games etc. La Casa Cultural Latina, 5:30-7:30pm, 1203 W Nevada, U (Adele) Thur-Sat: Hamlet, Krannert, 7:30, $13 Fri: soccer vs Penn State, 7pm, soccer stadium Sun: salsa dance lessons, La Casa Cultural Latina, 7pm (Adele)
