Research Terminology for The Social Sciences
What is Data?
• Data is a collection of observations
• Observations have associated attributes
• These attributes are variables
• A collection of data is often called a “data set”
What are variables?
• A measure that takes different values for different observations
  • Across a population (cross-sectional)
  • Across time (cross-temporal)
  • Both! (Panel data)
• Independent/explanatory variables are variables we think have an effect on other variables
  • Control variables are a special category
• Dependent/outcome variables are the variables we are trying to explain or predict
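As a rough illustration of these terms, the sketch below stores a tiny data set as a list of observations and slices it two ways. The countries, years, and values are invented for the example, and the helper names `cross_section` and `time_series` are hypothetical, not part of any library.

```python
# Hypothetical toy data set: each observation is one country in one year.
# "country" and "year" identify the observation; the other keys are variables.
observations = [
    {"country": "A", "year": 2000, "gdp_growth": 2.1, "turnout": 61.0},
    {"country": "A", "year": 2001, "gdp_growth": 1.4, "turnout": 58.5},
    {"country": "B", "year": 2000, "gdp_growth": 3.0, "turnout": 70.2},
    {"country": "B", "year": 2001, "gdp_growth": 2.7, "turnout": 69.8},
]


def cross_section(data, year):
    """Cross-sectional view: every unit observed at a single point in time."""
    return [row for row in data if row["year"] == year]


def time_series(data, country):
    """Cross-temporal view: a single unit observed across time, ordered by year."""
    rows = [row for row in data if row["country"] == country]
    return sorted(rows, key=lambda row: row["year"])


cs_2000 = cross_section(observations, 2000)  # countries A and B in 2000
ts_a = time_series(observations, "A")        # country A in 2000 and 2001
```

Because the full list varies across both countries and years, it is a (very small) panel data set; each helper recovers one of the two simpler views.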
Unpacking Variables
• Features of variables
  • Take on some set of values
  • Different values have different meanings
• Could be numerical, meaning they have number values attached
  • Continuous
  • Discrete or Limited
• Could be categorical, meaning they have descriptive terms attached
  • Ordinal (the categories have numerical ranks associated with them)
  • Typological, also called nominal (the categories are descriptive and do not represent any ordering, ranking, or numerical value)
Research Design
• Determine what kind of data will be needed based upon your research question
• Quantitative?
  • Large-N
  • Measurable in a clear and consistent way
• Qualitative?
  • Case studies
  • Not easily quantifiable
• The Holy Grail of Social Science Research: Turning Qualitative Data into Quantitative Measures
Collecting Data
• Libraries have a large collection of data sets that are ready to be used, in common software formats
• Digital Centers have software suites for all steps of the data collection process
  • Bibliographic packages
  • Data management software
  • Data analysis software
• Reference librarians are useful resources for discovery
• Sometimes, you may need to collect original data
  • Field work: going out and gathering data from observations
  • Archival work: finding the data in other information sources and aggregating it into a data set
Operationalization
• Operationalization is the process of turning theoretical concepts into measurements
• Matching theory with variables
  • Ideological framework
  • The type of problem should suggest an appropriate measure
• Matching levels
  • Macro vs. micro, and everything in between
• Matching observations
  • Individuals? Pairs? Groups?
• Matching meanings
  • This is the hardest
Using Variables
• Models are statements about the way variables relate to one another
• Two basic types in social science: analytical and formal
• Analytical Models
  • Describe the causal relationships between variables
  • Rely upon probability and statistics
• Formal Models
  • Describe a simplified version of reality
  • Variables become elements of this simplified reality
  • Rely upon theoretical frameworks
• Both types of models can be tested with data
Research Methods
• Mixed methods analysis is the “gold standard”
  • Combination of quantitative and qualitative data
• Formal models
  • Mathematical representations of decisions
  • Game theory
• Matching the research design to the hypothesis under investigation is critical
  • How questions are asked and answered
  • What counts as evidence?
Discovering the Data
• Descriptive Statistics
  • These are measures designed to help you “picture” your data
  • Means, Medians, Modes
  • Standard Deviations, Variances
• Exploratory Visualization
  • These are graphs that visually depict the information contained in descriptive statistics
  • Distribution plots
    • Histograms
    • Density plots
  • Simple correlation plots
    • Graphing two variables, one on each axis (i.e., X & Y)
    • You can get more complicated later!
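The descriptive statistics listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on an invented sample (`scores` is a hypothetical variable); the text histogram at the end is just one simple way to "picture" the distribution without a plotting library.

```python
import statistics
from collections import Counter

# Hypothetical sample of six survey scores (invented for illustration).
scores = [2, 4, 4, 5, 7, 8]

mean = statistics.mean(scores)          # arithmetic average
median = statistics.median(scores)      # middle value (average of the two middle values here)
mode = statistics.mode(scores)          # most frequent value
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation (square root of the variance)

# A crude text histogram: one '#' per observation at each value.
counts = Counter(scores)
for value in sorted(counts):
    print(f"{value:>2} | {'#' * counts[value]}")
```

Note that `statistics.variance` and `statistics.stdev` use the sample formulas; `statistics.pvariance` and `statistics.pstdev` are the population versions.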
Analyzing the Data
• Simple inferences
  • Correlations/covariances
    • These measures show the relationships between and among variables
  • ANOVA (ANalysis Of VAriance)
    • ANOVA is about comparing two (or more) samples, groups, or populations
• Basic Linear Models
  • These models explore linear relationships between variables
  • Simple regression: one dependent variable, one independent variable
    • With a single independent variable, this is closely related to a correlation
  • Multiple regression (often loosely called multivariate regression): one dependent variable, many independent variables
    • This technique looks at the relationships between the dependent variable and several independent variables simultaneously
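To make the link between correlation and simple regression concrete, here is a small sketch that computes both by hand on invented data. The helper names (`mean`, `covariance`, `std_dev`) are defined here for illustration; the point is that the least-squares slope equals the correlation rescaled by the ratio of the two standard deviations.

```python
import math

# Hypothetical paired observations (invented for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]  # independent variable
y = [2.0, 4.0, 5.0, 4.0, 5.0]  # dependent variable


def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Sample covariance (divides by n - 1)."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (len(a) - 1)


def std_dev(values):
    """Sample standard deviation: square root of the variable's covariance with itself."""
    return math.sqrt(covariance(values, values))


# Pearson correlation: covariance scaled by both standard deviations.
r = covariance(x, y) / (std_dev(x) * std_dev(y))

# Simple (one-predictor) least-squares regression: y = intercept + slope * x.
slope = covariance(x, y) / covariance(x, x)
intercept = mean(y) - slope * mean(x)

# The slope is the correlation rescaled into the units of y per unit of x.
slope_from_r = r * std_dev(y) / std_dev(x)
```

The correlation is unit-free, while the slope carries units, which is why the two numbers generally differ even though they describe the same linear relationship.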
Advanced Data Analysis
• Models for non-continuous/limited/discrete variables
  • Logit and probit models: the dependent variable can take two values
  • Tobit models: the dependent variable is censored, so it is observed only within a limited range of values
  • Ordered logit, ordered probit, and multinomial logit models: the dependent variable can take a small and discrete set of values
• Models for complex data
  • Simultaneous equations models (SEMs): the dependent variable can also affect the independent variable
    • Instrumental variables are a technique used to deal with this issue
  • Time-series and panel data models
    • The data cover multiple years and may have serial correlations (i.e., the values for one year are highly correlated with values from the previous year)
  • Non-linear models
    • The relationships between the variables are not of the form Y = mX + B
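As a sketch of why a two-valued dependent variable calls for something other than a straight line, the snippet below shows the logistic function that a logit model uses to turn a linear predictor into a probability. The coefficient values are invented, not estimated from any data, and `predicted_probability` is a hypothetical helper name.

```python
import math


def logistic(z):
    """Map any real number to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(x, intercept, slope):
    """Logit model prediction: P(Y = 1 | x) = logistic(intercept + slope * x)."""
    return logistic(intercept + slope * x)


# Hypothetical coefficients, chosen for illustration rather than estimated.
intercept, slope = -2.0, 0.5

# Predicted probabilities for x = 0, 2, 4, 6, 8.
probabilities = [predicted_probability(x, intercept, slope) for x in range(0, 9, 2)]
```

The linear predictor `intercept + slope * x` is still of the form Y = mX + B, but the model's prediction is the logistic transform of it, so the predicted probability rises with x without ever leaving the 0 to 1 range.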