Graduate Research Methods and Scholarly Writing in the Social Sciences: Government and History Harvard Summer School: SSCI S-100b Section 2 (32761) Joe Bond 6/24/2013. Introduction to the Course, Social Science Approach, ALM Context and Proseminar Objectives
Introduction to the Course, Social Science Approach, ALM Context and Proseminar Objectives
Course Requirements and Grading
• Facilitation (1 minimum) & participation in class discussions: 7%
• In-class exercises (8 of 10 required, but NOT graded): 8%
• Argument writing assignment (1st paper): 10%
• Book review writing assignment (2nd paper): 15%
• Literature review writing assignment (3rd paper): 25%
• Mid-term exam: 10%
• Research design writing assignment (4th paper): 20%
• Final class presentation: 5%
Harvard Extension School is not a traditional graduate program. Explain.
Volunteers for next week’s facilitation
Basics – will be brief but the .ppt will be posted on the course website Qualitative vs. Quantitative vs. Mixed Methods • Largely a moot debate (more and more studies utilize mixed methods) • Your questions should always determine your methodological approach, not the reverse • Which approach to use, and when, depends on the question: • Is there a relationship between regime type and violent conflict? • What are the odds that the nation of Fester will fail in the next 5 years? • What is the role of political culture as it relates to negotiations? • What would have happened if Germany had refrained from invading Poland in 1939? • Your choice of methods depends on how you operationalize your variables (e.g. how do you intend to measure political culture, etc.?)
Variables • Independent variables (IVs) are those variables that help explain a dependent variable • Independent variables must be antecedent to dependent variables (e.g. relationship between education and income) • Dependent variables (DVs) are the things you are trying to explain • Example: Relationship between SAT scores (IV) and success in college (DV) • The dependent variable should always be labeled along the y axis of a graph
Level of Measurement • Why is it important? • Nominal (measures not ranked: gender, religion, etc.) • Ordinal (measures rank ordered: economic class) • Interval (measures with equal intervals between values: income) • Ratio as characterized in the social sciences (the measure has an absolute zero: mass, length, time) • Think Nominal, Ordinal, Interval, Ratio (NOIR)!
Association • An association between two variables: the values of one variable tend to coincide (vary or covary) with the values of another • Example 1: the relationship between sex education and teen pregnancy. • Teen pregnancy as the DV, sex education as an IV (note: in this example we treat the latter as antecedent to the former) • We might hypothesize that increased exposure to sex education programs helps reduce the incidence of teen pregnancy (i.e. they vary: as X goes up, Y goes down) • Example 2: the relationship between education (IV) and income (DV) • We might hypothesize that the more education one has, the higher one’s future income will be (i.e. they covary: as X goes up, Y goes up) An anomaly. Wrong career trajectory.
Correlation • A statistical term that indicates the strength and direction of a linear relationship between variables (e.g. the relationship between education and income) • IMPORTANT! Association or correlation DOES NOT imply causation • Example 1: drowning (DV) and consumption of ice cream (IV) – they covary (as ice cream consumption goes up, incidents of drowning increase) • Example 2: children’s shoe size (IV) and math performance (DV) – they covary (as shoe size gets bigger, math skills go up) • Example 2 also highlights the importance of definitions, operationalization and transparency • Example 1: ice cream consumption is a proxy for temperature • Example 2: shoe size is a proxy for age
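The ice cream example can be made concrete with a few numbers. The sketch below is only an illustration: the temperature, ice cream, and drowning values are invented, and the partial correlation it computes is a technique not covered on the slide, used here as one possible way to hold the shared driver (temperature) constant.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r for two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def partial_r(xs, ys, zs):
    """Correlation between xs and ys after controlling for zs."""
    r_xy = pearson_r(xs, ys)
    r_xz = pearson_r(xs, zs)
    r_yz = pearson_r(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical values for four periods (illustration only, not real data).
temperature = [10, 20, 30, 40]   # the lurking variable
ice_cream = [25, 35, 55, 85]     # "IV": rises with temperature
drownings = [3, 1, 9, 7]         # "DV": also rises with temperature

raw = pearson_r(ice_cream, drownings)
controlled = partial_r(ice_cream, drownings, temperature)
print(f"raw r = {raw:.2f}; r controlling for temperature = {abs(controlled):.2f}")
```

With these toy numbers the raw correlation between ice cream and drownings is clearly positive (about 0.69), while the correlation that remains after controlling for temperature is essentially zero.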
More on Correlation • Correlation is a measure of the direction and strength of the relationship between two variables • A correlation coefficient (r or Pearson’s r) is a numerical index of that relationship • The magnitude of the correlation coefficient, which ranges from -1 to +1, indicates the strength of the relationship between variables • +1 means a perfect positive correlation (covary) while -1 means a perfect negative correlation (vary) • The closer the correlation coefficient is to +1 or -1, the stronger the relationship • But even a strong [negative or positive] correlation is meaningless if the level of error (significance) is large (e.g. p < 0.5 vs. p < 0.01)
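For readers who want to see how the coefficient is obtained, here is a minimal sketch of Pearson's r computed from its textbook definition (covariance divided by the product of the standard deviations). The education and income figures are hypothetical, and the function name `pearson_r` is chosen only for this illustration. The sketch does not compute a p-value.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data: years of education (IV) and income in $1,000s (DV).
education = [10, 12, 14, 16, 18]
income = [25, 32, 41, 48, 60]

r = pearson_r(education, income)
print(f"r = {r:.3f}")
```

A value of r near +1, as these invented numbers produce, describes only the strength and direction of the linear relationship; whether it is statistically meaningful is a separate question of significance.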
Hypotheses & Null Hypotheses • H1: as education increases, likelihood of voting increases • H0: education has no effect (≠) on the likelihood of a person voting • Why do we “test” the null hypothesis? • the strongest “proof” is the inability to “disprove” • error cannot be eliminated • like it or not, “facts” change • Avoid words like “this proves…” or “this is irrefutable proof…”; instead, use “supports,” “lends support to,” etc.
Types of Analysis • Analysis may have exploratory, descriptive, explanatory, and predictive objectives or some combination of these aims • Evaluation research is a fifth type that is not discussed here, although it is no less important
Exploratory Research • Undertaken when very little is known about a phenomenon • Forms the foundation for subsequent descriptive and explanatory research • In the early 1980s we did not have a good handle on how many Americans were infected with HIV/AIDS or even what caused it. • This sort of research is often linked with activism
Descriptive Research • Serves to identify important areas of inquiry • Often serves as the first step in explanatory inquiry • Addresses whether a phenomenon is a common occurrence or a rare event • Describe the U.S. electorate and electoral behavior: • Jewish Americans tend to vote for Democrats • Catholics tended to vote Democratic but the abortion issue has created a rift • Latinos tended to vote overwhelmingly Democratic but this began to change in 1999 and swung back again in 2008 • Examples: Observational Research, Historical Research, etc.
Explanatory Research • Scientific inquiry usually does not end with description but proceeds to explanation • Descriptive findings are likely to lead to the investigation of the factors associated with the outcome and to attempts to understand how these factors contribute to the occurrence of the outcome • Understanding how something works allows us to better predict the future (applies to both qualitative and quantitative research) • Examples: Lessons Learned, Counterfactual Thought Experiments, Regression Analysis, etc.
Prediction: “optimistic/happy” pop hits predict a bull market six months in advance
Émile Durkheim’s Suicide (1897): An Example of the Research Process
Durkheim’s Variables • Inductive Approach or Theory Building • Dependent Variable(s) (what is he trying to explain): RATES of SUICIDE in Europe (1800s) • Independent Variables (those things that help “explain” the Dependent Variable(s)): CLIMATE, AGE, GENDER, POLITICAL TURMOIL, RELIGION (limited to Christianity), MARITAL STATUS, DEPENDENTS, ETC. • Recall Levels of Measurement (NOIR) • Nominal (can’t be ranked) • Ordinal (ranked with unequal or arbitrary intervals) • Interval (equal intervals) • Ratio (as interval with “true” zero)
Some of Durkheim’s Descriptive Findings • Suicide rates are higher for widowed, single and divorced men than for married men • Suicide rates are higher for people without children than for those with children • Suicide is more pronounced in colder climates • Suicide rates are higher among Protestants than Catholics
Differences between Protestants and Catholics • Suicide is [more of] a sin for Catholics • Role of coroners • if no suicide note is left, it comes down to the coroner's interpretation (circa 1897) • Differences in social integration • Catholics tend to have higher levels of social integration • think of the movie My Big Fat Greek Wedding.
The Notion of Integration: Going Beyond Religion • Catholic countries tend to be more integrated than Protestant countries, with closer family ties • this is why people who are married and/or have children are less likely to commit suicide • simply put, they have more to live for • This is even reflected in physical proximity when speaking with others • Social bonds are composed of two factors: • social integration: attachment to other individuals within society • social regulation: attachment to society's norms • Suicide rates may increase when these factors reach extremes
Building a Theory: Social Integration • abnormally high or low levels of social integration may result in increased suicide rates; • low levels of social integration result in disorganized society (chaos); • high levels of social integration drive some to suicide in order to avoid becoming burdens on society
Durkheim’s Suicide Typology • Egoistic suicide • Ties attaching the individual to society are weak • Few social ties to keep the individual from taking his or her own life (Why not?) • Altruistic suicide • Individuals are extremely attached to society and have no life of their own (self-immolation) • They believe their death can bring about a benefit to the society • Anomic suicide • Weak social regulation between the society's norms and the individual (life becomes too unpredictable and uncertain) • Often brought on by dramatic changes in economic and/or social circumstances (e.g. wars, recessions and other turmoil, etc.) • Fatalistic suicide • Social regulation is completely instilled in the individual (suicide bombers) • No hope of change against an oppressive society
Research Cycle as an Iterative Process • Durkheim used an inductive approach, moving from steps #2 & #3 to build step #1 (observation → theory) • Most quantitative research involves deductive research (i.e. theory → empirical testing)
Group Exercise (groups of 2 or 3) • Form groups of 3 or 4 • For each group, • define one of the four concepts, below • operationalize the concept (i.e. how would you measure the concept in your research?) • Reconvene in 5 – 8 minutes (max) • Attractiveness • Democracy • Leadership • Love
State Failures • State Failure project (1994) • objective: then-VP Gore asked the CIA to predict which states would fail 5 years out • analyzed thousands of [structural] variables • found that 3 variables could predict failures 85% of the time looking out 5 years • infant mortality (a proxy? for what?) • level of democratization • openness to trade • other salient factors: youth bulge, religious distributions, etc. • We will return to this later on in the semester
Africa Prospects: Predicting State Failure with Structural Data
Africa Prospects • Purpose: to assess the vulnerability of each country to conflict escalation based on its profile, or set of structural indicators. • Overall Accuracy: defined as the ratio of correct classifications (C) to all classifications (A). Accuracy = C/A * 100%. • Recall: defined as the ratio of correct classifications (C) to the observed classifications (O). Recall = C/O * 100%. It represents the ability of the algorithm to classify the conflicts as they were observed. • Precision: defined as the ratio of correct classifications (C) to correct (C) plus incorrect classifications (I). Precision = C/(C+I) * 100%. It illuminates the algorithm’s false positives; specifically, the higher the ratio, the lower the false positives.
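The three measures can be computed directly from paired forecasts and outcomes. The following is a minimal sketch, not the project's own code: it assumes binary forecasts (conflict or no conflict), reads C in the recall and precision formulas as correctly forecast conflicts, and uses a hypothetical function name, `forecast_metrics`, and invented example data.

```python
def forecast_metrics(predicted, observed):
    """Return accuracy, recall, and precision (in percent) for conflict forecasts.

    predicted, observed: equal-length lists of booleans (True = conflict).
    A metric is None when its denominator is zero.
    """
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(predicted, observed))
    hits = sum(1 for p, o in pairs if p and o)      # conflicts correctly forecast
    correct = sum(1 for p, o in pairs if p == o)    # all correct classifications
    forecast = sum(1 for p in predicted if p)       # conflicts forecast to occur
    occurred = sum(1 for o in observed if o)        # conflicts that occurred

    def pct(numerator, denominator):
        return 100.0 * numerator / denominator if denominator else None

    return {
        "accuracy": pct(correct, len(pairs)),
        "recall": pct(hits, occurred),
        "precision": pct(hits, forecast),
    }

# Hypothetical case: the model flags all 10 countries, but only 2 become unstable.
predicted = [True] * 10
observed = [True, True] + [False] * 8
metrics = forecast_metrics(predicted, observed)
print(metrics)
```

In this invented case the model catches every conflict (recall of 100%) but most of its warnings are false alarms (precision and accuracy of 20%), which is why the three measures need to be read together.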
Forecasting Performance Metrics: Definitions and Illustrations
• Recall = # of correctly predicted conflicts / # of conflicts that occurred
• Precision = # of correctly predicted conflicts / # of conflicts predicted to occur
• Overall Accuracy = # of correct predictions / # of predictions made
• Bad forecast model #1 (almost every country will be unstable): high recall = 99% (1% miss rate); low precision = 5% (95% false positive rate); low accuracy = 40-50%. NET IS CAST TOO BROADLY
• Bad forecast model #2 (few countries will be unstable): low recall = 5% (95% miss rate); high precision = 100% (0% false positives); low accuracy = 40-50%. NET IS CAST TOO NARROWLY
• Near-perfect forecast model: high recall = 99%; high precision = 100%; high accuracy = 99.5%
[Figure labels: Countries forecast to be unstable at some level of intensity; False positives; Countries that experience instability; “Misses”; Countries that DO NOT experience instability]
5-15 Year Validation of Forecasting
Average Performance Scores For Different Training Sets / Forecast Periods
[Table columns: Forecast Period | Accuracy | Recall | Precision]
However, high recall scores indicate the net is cast wide enough to correctly forecast conflicts that DO occur (errors fall on the side of caution).
Low precision scores (high false positive rates) in the out years indicate that the world was more stable than would have been expected given macro-structural conditions.
Independent Variables • Caloric Intake: Estimate of the average number of calories consumed per person, per day. • GDP per Capita: Annual gross domestic product per person measured in constant 1995 U.S. dollars. • Male/Female Infant Mortality: Number of deaths of male and female children under 1 year of age per 1,000 live births. • Life Expectancy: Average life expectancy (males and females combined). • Youth Bulge: Ratio of population aged 15-29 to those aged 30-69. • Among others……..
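Two of these indicators are simple ratios, and a short sketch shows how they would be calculated from raw counts. The function names and the population figures are hypothetical, chosen only to illustrate the definitions given above.

```python
def youth_bulge(pop_15_29, pop_30_69):
    """Ratio of the population aged 15-29 to the population aged 30-69."""
    if pop_30_69 <= 0:
        raise ValueError("population aged 30-69 must be positive")
    return pop_15_29 / pop_30_69

def infant_mortality_rate(infant_deaths, live_births):
    """Deaths of children under 1 year of age per 1,000 live births."""
    if live_births <= 0:
        raise ValueError("live births must be positive")
    return 1000.0 * infant_deaths / live_births

# Hypothetical country (illustration only, not real data).
bulge = youth_bulge(pop_15_29=3_000_000, pop_30_69=4_000_000)
mortality = infant_mortality_rate(infant_deaths=450, live_births=10_000)
print(f"youth bulge = {bulge:.2f}, infant mortality = {mortality:.1f} per 1,000")
```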
Dependent Variable: Index of Instability
• Maximum level/intensity of conflict per country-year; source: KOSIMO Data Project, Heidelberg Institute of International Conflict Research (HIIK), 1975-2003. http://www.hiik.de/de/index_d.htm
• Represents a high threshold of instability
3 Levels of instability intensity
Instability Levels
• High intensity (if combined probability > 67%)
• Moderate intensity (if combined probability > 67%)
• None/Low intensity (if combined probability > 67%)
Key Assumption: a country is unstable if (and only if) the government or its opponent(s) threatens or initiates a conflict to restore equilibrium or harmony with respect to its internal or external relations.
Steps • 1. Compile a time series data set of the selected target variable’s intensity • 2. Compile a time series data set of candidate indicators associated with the target’s intensity • 3. Train an algorithm that explains the historical target intensities with the candidate indicators • 4. Calculate performance measures of the explanation from a time series of historical test data • 5. Generate vulnerability scores based on projections from the current value of the indicators • 6. Calculate a confidence level for the forecasts using the likelihood of occurrence at each intensity • 7. Iterate from step #1 for experimentation with alternative target variables • 8. Iterate from step #2 for experimentation with alternative explanatory variables or indicators
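Steps 1 through 5 can be sketched end to end on a toy data set. Everything below is an assumption made for illustration: the country-year rows are invented, the two indicators are picked arbitrarily, and the nearest-centroid classifier is a simple stand-in, not the algorithm the project used. Step 6 (confidence levels) is not covered.

```python
# Steps 1-2: hypothetical country-year rows (illustration only, not KOSIMO data).
# Each row: (year, infant_mortality, youth_bulge, unstable)
ROWS = [
    (1990, 120, 0.95, True), (1990, 30, 0.50, False),
    (1991, 110, 0.90, True), (1991, 25, 0.45, False),
    (1992, 100, 0.85, True), (1992, 35, 0.55, False),
    (1993, 105, 0.88, True), (1993, 28, 0.48, False),
]

def split_by_year(rows, first_test_year):
    """Hold out later years as test data so the model never sees the future."""
    train = [r for r in rows if r[0] < first_test_year]
    test = [r for r in rows if r[0] >= first_test_year]
    return train, test

def train_centroids(rows):
    """Step 3: learn the average indicator profile for each outcome."""
    grouped = {}
    for _, mortality, bulge, unstable in rows:
        grouped.setdefault(unstable, []).append((mortality, bulge))
    return {
        label: (sum(m for m, _ in pts) / len(pts), sum(b for _, b in pts) / len(pts))
        for label, pts in grouped.items()
    }

def predict(centroids, mortality, bulge):
    """Assign the outcome whose average profile is closest.

    Distances are unscaled here, so infant mortality dominates; a real
    analysis would standardize the indicators first.
    """
    def squared_distance(center):
        return (mortality - center[0]) ** 2 + (bulge - center[1]) ** 2
    return min(centroids, key=lambda label: squared_distance(centroids[label]))

def accuracy(centroids, rows):
    """Step 4: percent of held-out country-years classified correctly."""
    correct = sum(1 for _, m, b, label in rows if predict(centroids, m, b) == label)
    return 100.0 * correct / len(rows)

train, test = split_by_year(ROWS, first_test_year=1993)
model = train_centroids(train)
test_accuracy = accuracy(model, test)

# Step 5: score a current (hypothetical) country profile.
forecast = predict(model, mortality=90, bulge=0.80)
print(f"test accuracy = {test_accuracy:.0f}%, forecast unstable = {forecast}")
```

Splitting by year rather than at random mirrors the forecasting setting: the model is trained on earlier country-years and judged on later ones. Steps 7 and 8 would repeat the same loop with a different target variable or a different set of indicators.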
In-Class Writing Exercise 1
June 24, 2013
Educating Sergeant Pantzke (7:35)
Should take no longer than 10-15 minutes
On the opposite side of this paper only, take a position: The U.S. government should [should not] decide which schools can receive GI Bill funding. For example, veterans working their way through Harvard should be able to use GI Bill funds whereas vets working on a degree at the University of Phoenix should be prohibited from funding their education through the GI Bill. Include any evaluation criteria that come to mind if you take the position that some schools but not others should qualify for GI Bill funding.