Religion and Economic Change over a Century: Linking Diverse Historical Data New Technologies and Interdisciplinary Research on Religion Harvard, 2010 Robert D. Woodberry Juan Carlos Esparza University of Texas at Austin Sociology Department and Population Research Center.
The challenge: • Roots of current differences may go back decades, even centuries. How can this be tested? • Religious records: valuable information, but seldom used • Linking diverse sources over time
Problems: • Gathering complete data • Digitizing data & maps • Normalizing and linking data from different sources • Dealing with missing data • Creating database for geo-spatial statistical modeling
Complete data • Locating and evaluating “the universe” of sources • Temporal coverage • Spatial coverage • Data Quality • Variables included
Complete data • Complete data often only available in archives: e.g., “Vatican Secret Archives,” & “Archives of Propaganda Fide” • Negotiating access • Locating, copying and digitizing sources
Spatial Linking Issues: 1) Data given for different spatial units 2) Spatial units change over time 3) Accuracy of base map
Spatial Linking 1) Data given for different spatial units • Protestant: points • Catholic: polygons • Censuses, surveys, geo-climatic data: different polygons and grids of cells
Spatial Linking 2) Spatial units change over time • Cities’ & towns’ names change • Catholic ecclesiastical jurisdictions (EJs) evolve • National, provincial, and other state boundaries change
Spatial Linking: Why important? • Connecting data to the proper geographic referent (e.g., EJs & provinces in 1913) • Linking data over time - For statistical analysis - For imputation (how do data in 1892 relate to data in 1934 and 2009?)
Spatial Linking 3) Historic maps are inaccurate (limited usefulness) • Points: why accuracy matters: 1) tracking change over time 2) linking to the proper polygon 3) linking to the proper geo-climatic conditions • Find each place in a modern gazetteer • Link locations between sources using - known alternative names - consistent institutions
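The gazetteer step can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the gazetteer IDs (`g001` and so on), the helper names, and the normalization rules are all assumptions made for the example. It treats a place name as matched when, after stripping accents, case, and punctuation, it equals any known modern or alternative name.

```python
import unicodedata
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Lowercase and strip accents/punctuation so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() or c.isspace() else " " for c in no_accents)
    return " ".join(kept.lower().split())


def build_index(gazetteer: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every normalized name (modern or alternative) to its gazetteer ID."""
    index: Dict[str, str] = {}
    for place_id, names in gazetteer.items():
        for name in names:
            index[normalize(name)] = place_id
    return index


def match(name: str, index: Dict[str, str]) -> Optional[str]:
    """Return the gazetteer ID for a name found in a source, or None."""
    return index.get(normalize(name))


# Hypothetical gazetteer: the IDs are invented; each list holds the modern
# name first, followed by known alternative names.
gazetteer = {
    "g001": ["Mumbai", "Bombay"],
    "g002": ["Chennai", "Madras"],
    "g003": ["São Paulo"],
}
index = build_index(gazetteer)

matched_old_name = match("Bombay.", index)     # historical spelling with stray punctuation
matched_no_accent = match("SAO PAULO", index)  # accents and case dropped in the source
unmatched = match("Atlantis", index)           # not in the gazetteer
```

Names that come back as `None` would still need manual review, for example by checking whether the same institutions appear at a differently named place in another source.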
Spatial Linking Historic maps are inaccurate (limited usefulness) • Territories: map spaghetti. Why it matters: 1) Arbitrarily linking borders 2) Imputing data to artificial slivers 3) How to link data when there are no maps?
Spatial Linking Improving accuracy: • Start with accurate modern maps • Reconstruct border change from legal documents • Reconstruct border overlap from legal documents (e.g., Catholic and state jurisdiction borders) • Bring modern borders back through time
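One way to picture "bringing modern borders back through time" is to start from today's assignment of small units to jurisdictions and undo each documented change that happened after the year of interest. The sketch below assumes a deliberately simple data model (a `Transfer` record with one unit moving between two jurisdictions); the record layout, unit IDs, and diocese names are hypothetical and not taken from the slides.

```python
from typing import Dict, List, NamedTuple


class Transfer(NamedTuple):
    year: int     # year the change took effect
    unit: str     # smallest unit that moved
    source: str   # jurisdiction it left
    target: str   # jurisdiction it joined


def membership_in(year: int, modern: Dict[str, str],
                  transfers: List[Transfer]) -> Dict[str, str]:
    """Reconstruct unit -> jurisdiction for `year` by undoing later changes.

    Changes dated in `year` itself are treated as already in effect.
    """
    state = dict(modern)
    # Walk backwards from the present, reversing each later change.
    for t in sorted(transfers, key=lambda t: t.year, reverse=True):
        if t.year > year:
            if state.get(t.unit) != t.target:
                raise ValueError(f"change log inconsistent for unit {t.unit} in {t.year}")
            state[t.unit] = t.source
    return state


# Hypothetical modern assignment and change log.
modern = {"m1": "Diocese X", "m2": "Diocese Y", "m3": "Diocese Y"}
transfers = [
    Transfer(1900, "m2", "Diocese X", "Diocese Y"),
    Transfer(1950, "m3", "Diocese X", "Diocese Y"),
]

in_1913 = membership_in(1913, modern, transfers)
in_1890 = membership_in(1890, modern, transfers)
```

The consistency check raises an error when the change log contradicts the modern map, which is one cheap way to catch a missing or misdated document.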
Linking (cont.) • Accurate base maps: - Current world maps are insufficiently accurate (e.g., mission stations fall in the ocean or in the wrong country) - Improve coastlines, islands, borders, and maritime boundaries - Remove slivers • Allows automatic linking of point and polygon data
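Automatic point-to-polygon linking can be illustrated with a standard ray-casting test. The sketch below is a generic, dependency-free version with invented coordinates and IDs; it is not the authors' GIS workflow, and a production system would more likely use a GIS library. A station that falls in no polygon (the "mission station in the ocean" case) comes back as `None`.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # (longitude, latitude)
Polygon = List[Point]         # ring of vertices; last vertex joins the first


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: count how many edges a ray heading east from pt crosses."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through pt?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def link_points(points: Dict[str, Point],
                polygons: Dict[str, Polygon]) -> Dict[str, Optional[str]]:
    """Map each point ID to the first polygon that contains it, or None."""
    return {
        pid: next((gid for gid, poly in polygons.items()
                   if point_in_polygon(pt, poly)), None)
        for pid, pt in points.items()
    }


# Hypothetical data: two adjacent square units and three mission stations.
polygons = {
    "unit_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "unit_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
stations = {"s1": (1.0, 1.0), "s2": (3.0, 0.5), "s3": (5.0, 1.0)}
links = link_points(stations, polygons)
```

Points exactly on a shared border are assigned by an arbitrary edge rule here, which is one reason sliver-free, accurate base maps matter for this step.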
Reconstructing historic borders: • Papal decrees document changes in EJs & identify corresponding government borders
Linking (cont.) Reconstructing historic borders: • Check accuracy with country & empire records • Smallest unit in legal sources determines the size of MCGUs (Maximum Consistent Geographic Units) and the precision of data linking • When possible, use modern borders; when not, digitize borders from relatively accurate historical maps
Linking (cont.) • Determine the Maximum Consistent Geographic Unit (MCGU) before creating digital maps • MCGUs are the foundation for all linking and imputation • Only one base map (easy to update) • All other geographic units are unions of MCGUs
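Because every larger unit is a union of MCGUs, re-aggregating data reduces to a lookup and a sum. The following sketch uses invented MCGU IDs, counts, and groupings purely for illustration; it shows the same MCGU-level values rolled up to two different sets of larger units for the same year.

```python
from collections import defaultdict
from typing import Dict


def aggregate(values: Dict[str, float], membership: Dict[str, str]) -> Dict[str, float]:
    """Sum MCGU-level values into the larger units named in `membership`."""
    totals: Dict[str, float] = defaultdict(float)
    for mcgu, value in values.items():
        totals[membership[mcgu]] += value
    return dict(totals)


# Hypothetical MCGU-level counts (e.g., schools) for a single year.
schools_1913 = {"m1": 4, "m2": 10, "m3": 6}

# The same MCGUs grouped two different ways in that year.
to_ej = {"m1": "EJ_north", "m2": "EJ_north", "m3": "EJ_south"}
to_province = {"m1": "prov_1", "m2": "prov_2", "m3": "prov_2"}

by_ej = aggregate(schools_1913, to_ej)
by_province = aggregate(schools_1913, to_province)
```

Only the membership tables change from year to year; the base map of MCGUs stays fixed, which is what makes a single base map sufficient.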
Linking (cont.) Maximum Consistent Geographic Unit (MCGU) • All point and cell data link to MCGUs: - Protestant data - Geo-climatic data - Missionary mortality data • Also allows contextual analysis (spatial autocorrelation, etc.) • Minimizes over-aggregation of data
Linking (cont.) Linking geo-climatic data (endogeneity) • Aggregate as a grid of cells: - Grid of boxes covering the world - Assign unique IDs and vectorize raster data - Normalize so boxes overlap perfectly and IDs match between layers (very hard and time-consuming) • Aggregate for MCGUs
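The grid idea can be sketched as below, assuming a regular latitude/longitude grid with row-major cell IDs. The cell size, the ID scheme, the layer values, and the cell-to-MCGU table are all assumptions for illustration; the slides do not specify them. The point is that once every layer uses the same cell IDs, layers join by ID and can be averaged per MCGU.

```python
import math
from collections import defaultdict
from typing import Dict


def cell_id(lon: float, lat: float, size: float = 1.0) -> int:
    """Row-major ID of the grid cell containing (lon, lat); cells are size x size degrees."""
    n_cols = math.ceil(360.0 / size)
    n_rows = math.ceil(180.0 / size)
    # Clamp so lon = 180 and lat = 90 fall in the last column/row.
    col = min(math.floor((lon + 180.0) / size), n_cols - 1)
    row = min(math.floor((lat + 90.0) / size), n_rows - 1)
    return row * n_cols + col


def mean_by_mcgu(layer: Dict[int, float], cell_to_mcgu: Dict[int, str]) -> Dict[str, float]:
    """Average a gridded layer over the cells assigned to each MCGU."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for cid, value in layer.items():
        mcgu = cell_to_mcgu.get(cid)
        if mcgu is not None:          # cells outside every MCGU are skipped
            sums[mcgu] += value
            counts[mcgu] += 1
    return {m: sums[m] / counts[m] for m in sums}


# Hypothetical cells, layer values, and cell-to-MCGU assignment.
c1, c2, c3 = cell_id(-99.1, 19.4), cell_id(-98.2, 19.0), cell_id(-96.7, 17.1)
rainfall = {c1: 700.0, c2: 900.0, c3: 1500.0}      # same IDs in every layer
elevation = {c1: 2200.0, c2: 2100.0, c3: 1500.0}
cell_to_mcgu = {c1: "m1", c2: "m1", c3: "m2"}

rain_by_mcgu = mean_by_mcgu(rainfall, cell_to_mcgu)
elev_by_mcgu = mean_by_mcgu(elevation, cell_to_mcgu)
```

A simple mean ignores cells that only partly overlap an MCGU; an area-weighted mean would be the natural refinement.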
Linking (cont.) Linking mortality data (endogeneity) • Data on over 100,000 missionary lives • Calculate comparative mortality estimates by linking lives to 1) points (mission stations) 2) polygons (countries, EJs & MCGUs) • Can be generalized to other areas based on geo-climatic conditions, etc.
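A comparative mortality estimate could take many forms; one common and simple choice is deaths per 1,000 person-years of exposure at each location. The sketch below assumes that measure and a made-up record layout (`Service`), with invented records; the slides do not say which estimator the authors use.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Service(NamedTuple):
    location: str   # station, country, EJ, or MCGU identifier
    years: float    # years of exposure at that location
    died: bool      # True if the person died while serving there


def mortality_per_1000(records: Iterable[Service]) -> Dict[str, float]:
    """Deaths per 1,000 person-years, by location."""
    years: Dict[str, float] = defaultdict(float)
    deaths: Dict[str, int] = defaultdict(int)
    for r in records:
        years[r.location] += r.years
        deaths[r.location] += int(r.died)
    return {loc: 1000.0 * deaths[loc] / years[loc]
            for loc in years if years[loc] > 0}


# Hypothetical service records at two locations.
records = [
    Service("station_A", 10, True),
    Service("station_A", 20, False),
    Service("station_A", 10, True),
    Service("station_B", 25, False),
    Service("station_B", 25, True),
]
rates = mortality_per_1000(records)
relative_risk = rates["station_A"] / rates["station_B"]
```

The same function works at any level of aggregation: replace the station ID with the linked country, EJ, or MCGU before calling it.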
Missing Data Problems: • Changing categories between sources/years • Inconsistent categories within same source • Missing places in source • Inconsistent years between sources
Missing Data (cont.) • Strategies: finding missing data • Letters of bishops to the Pope • Triangulating between sources - To identify missing institutions & organizations - To identify estimates from inconsistencies - To fill in missing data
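Triangulation between two sources can be sketched as a set comparison plus a discrepancy check. Everything specific here is assumed for illustration: the source names, the place IDs, the counts, and the 10% tolerance are not from the slides.

```python
from typing import Dict, List, Tuple


def triangulate(source_a: Dict[str, int], source_b: Dict[str, int],
                tolerance: float = 0.10) -> Tuple[List[str], List[str], List[str]]:
    """Compare two sources keyed by place.

    Returns (missing_from_a, missing_from_b, inconsistent), where
    `inconsistent` lists places whose counts differ by more than
    `tolerance` as a share of the larger count.
    """
    missing_from_a = sorted(set(source_b) - set(source_a))
    missing_from_b = sorted(set(source_a) - set(source_b))
    inconsistent = []
    for place in sorted(set(source_a) & set(source_b)):
        a, b = source_a[place], source_b[place]
        larger = max(a, b)
        if larger > 0 and abs(a - b) / larger > tolerance:
            inconsistent.append(place)
    return missing_from_a, missing_from_b, inconsistent


# Hypothetical counts of schools per place in two sources.
directory = {"place_1": 12, "place_2": 40, "place_3": 7}
bishop_letters = {"place_1": 12, "place_2": 25, "place_4": 3}

missing_dir, missing_letters, flagged = triangulate(directory, bishop_letters)
```

Places missing from one source are candidates for filling in from the other; flagged places are candidates for closer reading, since a large gap may mean one figure is an estimate.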
Missing Data (cont.) • Strategies: imputing missing data (multiple imputation) using: 1) trend over time in MCGUs - e.g., using linked MCGUs in 1913 & 1932 to estimate 1923 2) patterns in neighboring units • Can compare results with and without imputed data
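The 1913/1932 example can be made concrete with linear interpolation, followed by several noisy draws to mimic the "multiple" part of multiple imputation. This is a heavily simplified stand-in: a full multiple-imputation procedure would draw from a fitted model and pool estimates across completed datasets. The values, the assumed standard deviation, and the function names are hypothetical.

```python
import random
from statistics import mean
from typing import List


def interpolate(year: int, y0: int, v0: float, y1: int, v1: float) -> float:
    """Linear trend between two observed years, evaluated at `year`."""
    return v0 + (v1 - v0) * (year - y0) / (y1 - y0)


def imputation_draws(center: float, sd: float, m: int = 5, seed: int = 0) -> List[float]:
    """m plausible values around the trend estimate (sd is an assumed uncertainty)."""
    rng = random.Random(seed)
    return [rng.gauss(center, sd) for _ in range(m)]


# Hypothetical MCGU observed in 1913 and 1932 but missing in 1923.
estimate_1923 = interpolate(1923, 1913, 100.0, 1932, 290.0)
draws = imputation_draws(estimate_1923, sd=10.0, m=5, seed=42)
pooled = mean(draws)
```

Running the downstream analysis once per draw, and once with the imputed MCGUs dropped, gives the with-and-without comparison the slide mentions.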
An example: Mexico • Reconstruct all locality changes back to 1815 • Reconstruct all EJ changes from 1850 • Link historical censuses & modern surveys • Re-aggregate data according to any geographic unit (MCGU or larger)
Mexico (cont.) Once completed: • All census, Catholic, and Protestant data linked for about 120 years • Multiple current surveys linked so that modern consequences can be analyzed • Longitudinal database of MCGUs
Mexico (cont.) Interrupted Time Series: • impact of introducing Protestant missions on Catholic church behavior • impact of Catholic and Protestant interventions on the change in literacy between censuses Cumulative Influence: Endogeneity: test correlates of when and where Protestants and Catholics invest in particular areas.
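The slides name interrupted time series but do not give a model, so the following is only a bare-bones illustration: fit one straight line before the intervention and one after, then compare slopes and compare the fitted level at the first post-intervention observation with what the earlier trend would have predicted. The literacy figures and the intervention year are invented, and three points per segment is far too few for real inference.

```python
from typing import Dict, List, Tuple


def fit_line(xs: Tuple[float, ...], ys: Tuple[float, ...]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def interrupted_series(years: List[int], values: List[float], start: int) -> Dict[str, float]:
    """Two-segment comparison around an intervention beginning in `start`.

    Needs at least two observations on each side of `start`.
    """
    pre = [(x, y) for x, y in zip(years, values) if x < start]
    post = [(x, y) for x, y in zip(years, values) if x >= start]
    b_pre, a_pre = fit_line(*zip(*pre))
    b_post, a_post = fit_line(*zip(*post))
    first_post = post[0][0]
    counterfactual = a_pre + b_pre * first_post   # earlier trend projected forward
    fitted = a_post + b_post * first_post         # later trend at the same year
    return {"slope_change": b_post - b_pre, "level_change": fitted - counterfactual}


# Hypothetical literacy rates (%) at census years; missions assumed to arrive in 1915.
years = [1890, 1900, 1910, 1920, 1930, 1940]
literacy = [18.0, 20.0, 22.0, 28.0, 33.0, 38.0]
result = interrupted_series(years, literacy, start=1915)
```

In this toy series the slope rises from 0.2 to 0.5 points per year and the level at 1920 sits 4 points above the projected earlier trend; whether such a shift reflects the intervention is exactly the endogeneity question raised on the slide.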
Thank You!