460 likes | 683 Views
The SACS Toolkit: E-Social Science from a Systems Perspective Brian Castellani Frederic Hafferty Michael Ball Kent State University, Kent, Ohio USA. Overview.
E N D
The SACS Toolkit: E-Social Science from a Systems Perspective Brian Castellani Frederic Hafferty Michael Ball Kent State University, Kent, Ohio USA
Overview • Our paper is about e-social science and the intermediary role it serves in the new digital (web-based) age of social scientific inquiry. • After a brief introduction to the new field of e-social science, we turn to a review of the SACS Toolkit. • The SACS Toolkit is a new e-scientific method we created in 2008 to help social scientists model complex social systems using digital data. • To demonstrate the e-scientific value of the SACS Toolkit, we apply it to a web-based, community health science digital database we are currently studying. • Our goal is to show how we used the SACS Toolkit to solve several ontological and methodological challenges this database presented us. We end suggesting how others may use the SACS Toolkit in their own work.
E-Social Science: A Quick Review • E-science is a new area of study—emerging in the late 1990s—that seeks to develop, capitalize upon and employ the latest advances in cyberinfrastructure to help scholars make the most of doing research in a digital world. • John Taylor, who coined the term, specifically defines e-science as: “global collaboration in key areas of science and the next generation of infrastructure that will enable it” (www.e-science.clrc.ac.uk/). • One of the leading programs in e-science is the UK e-Science Programme (www.rcuk.ac.uk/escience/ default.htm).
(Go to www.art-sciencefactory.com/complexity-map_feb09.html).
E-Social Science: A Quick Review • A subfield of e-science is called e-social science. Its purpose is to use cyber-infrastructure to develop social scientific inquiry in the digital age. • One of the leading centers is the National Centre for e-Social Science (www.esrcsocietytoday.ac.uk/ESRCInfoCentre/index.aspx). • Of the various definitions available, we follow that of Paul Tennent (http://www.mrl.nott.ac.uk/~pxt/Site/Home.html) • In his paper, titled appropriately enough, “Defining the Role of the e-Social Scientist,” he argues that the real purpose of e-social science is intermediary. The goal is to act as a go-between, interpreter, integrator, liaison, conciliator and link between the fields of social and computer/information science.
E-Social Science: A Quick Review • The intermediary work of the e-social scientist revolves around four main interconnected and interdependent areas: • digital data • cyberinfrastructure • ontology • method
E-Social Science: A Quick Review • Digital Data: • Work on digital data has to do with issues of size and complexity. • When scholars use the term digital data (or any of its synonyms, such as web-based data, digital databases, or grid-data), they are referring to the databases typically encountered on the web, internet or grid. • The defining feature of digital databases are their complexity. • It is also often the case that this data are located on different servers, in different formats, and tend to require different methods of retrieval. • Finally, and very important to our paper, digital databases are typically assembled according to an ontological system of classification or organization that may not be user-friendly for social scientists conducting research.
E-Social Science: A Quick Review • Cyberinfrastructure: • Work on cyberinfrastructure has to do with what, how and where digital data is housed. • Cyber-infrastructure (and its related terms, such as the grid) refers to any and all research environments designed to support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet or web. • For an excellent overview, visit the National Science Foundation’s Cyberinfrastructure Vision for 21st Century Discovery http://www.nsf.gov/pubs/2007/nsf0728/index.jsp
E-Social Science: A Quick Review • Ontology: • Ontology concerns itself with the underlying conceptual framework upon which digital databases and their supporting cyber-infrastructure are organized http://www.shirky.com/writings/ontology_overrated.html • More specifically, it helps designers determine: • (1) what things belong within the domain of an information system (i.e., parts, groups, components, catalogues, classification schemes, servers, databases, storage retrieval mechanisms, computers, software, etc) and • (2) what relationships exist amongst these things. • The guiding question of ontology is: When considering the development or usage of some database and supporting cyberinfrastructure, what kind of framework or classification system will ensure that people, computers and data are connected in the most efficient and effective manner?
E-Social Science: A Quick Review • Methodology: • The final area of e-social science is method. • In terms of methodological innovation, the focus of e-social scientists is the same as scholars in the fields of complexity science, data mining, web-science and computational modeling. • Their job is to develop the computationally-based tools social scientists need to study the massive, multi-dimensional, multi-platform, complex databases regularly housed and analyzed on the web today.
E-Social Science: A Complexity Science Perspective While e-social science has made many important advances in the few short years during which it has existed, much remains to be done. One particular area is the development of intermediary toolkits that sociologists and other scholars can use to model complex social systems with digital data. At present, no such toolkit exists. Enter the SACS Toolkit.
SACS Toolkit • The SACS Toolkit is a new framework for modeling complex social systems. • SACS stands for sociology and complexity science. • The SACS Toolkit is part of the burgeoning literature in complexity science and e-social science (See Castellani and Hafferty 2009).
SACS Toolkit • The SACS Toolkit was specifically created to handle the massive, multi-dimensional, complex databases typically found on the web today. • The SACS Toolkit can handle digital data because of its unique, user-based ontological and methodological organization. • In what follows, we explore the ontological and methodological strengths of the SACS Toolkit for using, organizing and analyzing digital data.
SACS Toolkit • The SACS Toolkit is comprised of three basic parts: • 1. A user-based, ontological and theoretical framework (including related vocabulary) researchers can use to organize their analysis of digital data. This framework is called social complexity theory.
SACS Toolkit • The SACS Toolkit is comprised of three basic parts: • 2. A theoretically and ontologically grounded algorithm, called assemblage, which researchers can use to analyze and assemble, from the “ground up,” a working model of a social system using web-based data. • Assemblage is also highly visual, relying upon a rather extensive repertoire of techniques taken from social network analysis, the new science of networks, social simulation, fractal geometry, cluster analysis, grounded theory, and the self-organizing map literature. • Integrating these techniques, the SACS Toolkit provides a novel approach to visualizing social systems.
SACS Toolkit • The SACS Toolkit is comprised of three basic parts: • 3. A recommended toolset of techniques and methods for modeling with digital data. • While the SACS Toolkit can be used with just about any sociological method or technique, our work finds the following techniques indispensible when it comes to analyzing digital data: • cluster analysis, neural networking (specifically, the self-organizing map), social network analysis, grounded theory method, Foucault’s genealogical method, fractal geometry, chaos theory, computational modeling, and data mining.
SACS Toolkit • Social Complexity Theory • As a framework, social complexity theory is less interested in explaining things and more interested in providing researchers an effective way to organize, coordinate, categorize, sort, connect, link and manage their data regarding some topic of study. • It does this by providing researchers a theoretical filing system and an associated vocabulary that they can use to create their own model of a social system. • Social complexity theory’s user-driven filing system is comprised of five organizational folders. In terms of ontology, the most important is the first, the field of relations.
Field of Relations: • As shown in this figure, the field of relations is the intellectual arrangement and bracketing of all information necessary to construct a model of a complex social system. • We borrow the term from Michel Foucault (Dreyfus and Rabinow 1983). • For us, this term has three ontological functions: conceptual, organizational and methodological.
The purpose of the field of relations is to articulate the domain in which all the elements of a social system of study, and their relationships, can be located and coaxed into coming together. • What makes the field of relations unique in terms of e-science is that it is highly flexible and user-driven. The field of relations is flexible because it changes according to the topic of study; and it is user-driven because the researcher defines what the domain of relations will be, what to include within it and what relationships are the most important. • The user-driven nature of the field of relations is very important. The SACS Toolkit’s utility comes from its ability to act as a supra-ontological framework, which can be placed upon any existing framework found on the web. As such, regardless of the databases, servers, data formats being used, the researcher has a guiding ontological framework that helps to organize, analyze and model some topic of study in complex systems terms.
2. In terms of digital data, the second ontological purpose of the field of relations is organizational. Social complexity theory is a rigorous framework of classification. Social complexity theory provides a way for researchers to make sense of the chaos of digital data, which is does by giving the researcher a set of conceptual folders, sub-folders, a filing system, and so forth for organizing everything in a set of predetermined format. • 3. The third ontological purpose of the field of relations is methodological. The strength of using the field of relations is that it can be directly applied to the management of one’s database, as well as the analysis of empirical data. This is of particular importance when working with digital data because there is no loss of information as the researcher moves from theory to data collection to analysis.
The other Four Folders: • What makes social complexity theory so rigorous and yet flexible when it comes to organizing digital data is that its filing system is designed to form a complex social system. Said more specifically, the four major folders within the field of relations—(1) the web of subsystems, (2) the network of attracting clusters, (3) environment, and (4) system dynamics—represent each of the major domains of a complex social system.
Assemblage: Assemblage is a case-based, system-clustering algorithm for modeling social systems. It is built on the organizational framework of social complexity theory and represents the procedural component of the SACS Toolkit.
The goal of assemblage is to move researchers through a six-step algorithm: • STEP 1: Help the researcher define a set of research questions in systems terms. • STEPS 2-4: Establish the social system’s field of relations and begin to “file and fill in” the information for all of the major folders (web of social practices, network of attracting clusters, etc). • Examine the internal structure and dynamics of model this network for a particular moment in time-space—a snapshot of the model, if you will—including its interactions with key environmental forces and its trajectory within key environmental systems. • Assemble these discrete, cross-sectional snapshots of the system into a moving model, concluding this some overall sense of the system as a whole. • STEPS 5-6: Once done, researchers can “data mine” this model to answer the initial study questions or to generate new questions or models.
WHAT MAKES ASSEMBLAGE UNIQUE: • 1. Assemblage was designed to address the unique challenges associated with modeling complex social systems. • 2. Assemblage is ontologically grounded in social complexity theory. Few methods in e-social science or complexity science come with their own user-based ontology. Assemblage does. • 3. Assemblage has no data preference. Unlike the majority of e-social science or complexity science methods, which tend to focus on numerical data, assemblage works equally well with any and all data types—from numerical to visual to historical.
WHAT MAKES ASSEMBLAGE UNIQUE: 4. Assemblage works with just about any statistical, qualitative, historical or computational technique. The reason assemblage can be used with such a wide variety of tools and toolsets is because these tools do not drive the model building process. Instead, the six-step algorithm of assemblage, along with the theoretical framework upon which it is grounded, drives model building. Any tool can be used as long as the researcher uses it in service of modeling a social system.
WHAT MAKES ASSEMBLAGE UNIQUE: • 5. Assemblage employs a case-based, constant comparative approach to modeling complex social systems. • A case-based, constant comparative approach to digital data treats a social system as a set of cases, each of which represents one of the multiple ways that a complex social system is practiced by the agents of which it is comprised. • Said another way, a “case-based” approach is useful because it allow us to build a social system from the ground-up, by exploring and comparing cases, one or several at a time, to profile and catalogue the various ways that a web of social practices is expressed. Once this process is complete, you are ready to move to the next major step in the assemblage process.
WHAT MAKES ASSEMBLAGE UNIQUE: • 6. Assemblage is a data-compressing, system-clustering method. The ultimate goal of assemblage is to help the researcher cluster the social system into its key attractor points. In this way—and here we draw directly from Kohonen (2001) and his self-organizing map technique—assemblage is a data reduction technique. It tries to reduce and compress the complexity of a social system into a simpler and more understandable form. The product of this simplifying process we call the network of attracting clusters. • 7. Finally, assemblage provides a novel approach to visualizing social systems. As a data compression technique, the goal of assemblage is to help the researcher create low-dimensional picture of high-dimensional data.
Applying the SACS Toolkit to the Healthy Summit 2010 Website: A Case Study in E-Social Science from a Systems Perspective
Summit 2010 • Our example comes from a study we are conducting on the relationship between community health and persistent poverty—or, what the complexity science literature refers to as poverty traps. • The database for our study is entirely web-based. It was put together by the Summit County Combined Health District. The website is called the Healthy Summit 2010 Quality of Life Project(www.healthysummit.org/). • It was designed to bring together the activities, concerns, data, and research agendas of all the health providers in Summit County, including its various public health centers. • In terms of our research, we chose the Summit 2010 website because of the wealth of data it provides. The database is organized into two major types of data: maps and reports
Summit 2010 Maps • The maps show how various social and health factors are spatially distributed across the 20 major census clusters in Summit County. Together, these maps provide a fantastic overview of the economic and health inequalities that exist within Summit County. http://www.healthysummit.org/QOL/QOL_Maps.cfm
Summit 2010 Reports • The reports provide a fantastic amount of data, including • (1) listings of all the health agencies in Summit County; • (2) historical narratives; • (3) in-depth neighborhood studies of three of the poorest communities in Summit County; • (4) statistical summaries of the county as a whole, as well as the 20 clusters, including a long list of economic (e.g., household income, job growth, etc), institutional (e.g., immunizations, education levels, etc), and health outcome indicators (e.g., mortality rates, morbidity rates, etc). http://www.healthysummit.org/QOL/qol_about.cfm
Ontological Challenge of Summit 2010 • The primary challenge we faced researching this database was how to manage and organize all of it. • The ontology used to organize the data is conventional and top-down. • First, the Summit County Combined Health District manages and controls the data and do not provide first-hand access to any of it—most notably the large county-wide database used to generate their statistical reports. • Second, they also decide what to release to the public and how. • Third, the audience for this website is narrowly defined as health care providers, primarily, and then a general audience. • Finally, all of the data are presented as summary reports. • While this ontology works wonderfully for the Summit County Combined Health District, it does not work for public researchers interested in analyzing this data.
Solving our Ontological Challenge • First, because the SACS Toolkit’s filing system (social complexity theory) is fundamentally spatial and relational we capitalized on the Summit 2010 maps. • We began with a spatial analysis of Summit County, exploring how the various economic, institutional and health outcome indicators were distributed across it. • We then used this spatial mapping to conceptualize Summit County as a complex social system comprised of 20 census bureau clusters. http://www.healthysummit.org/QOL/QOL_Maps.cfm
Solving our Ontological Challenge • We then converted the folders and filing system of the SACS Toolkit into a spatially distributed map of Summit County. This conversion allowed us to easily organize all of the visual, statistical and narrative data found on the Summit 2010 website.
Here is what our conceptual map of Summit County as a Complex System.
The best example of the speed and success of our conversion was the statistical database we created for Summit County and its 20 communities. We went through all the reports, copying and pasting information about each of the 20 communities into an SPSS database. The result was a typical vector matrix.
Methodological Challenge of Summit 2010 • The methodological challenge we faced had to do with how the website analyzed the data. • Put simply, the researchers who wrote the Summit 2010 reports employed a very traditional approach to community health science data analysis—one that is out of step with current trends in the community health science literature, which is beginning to treat communities as complex systems (See Cummins, Curtis, Diez-Roux and Macintyre 2007). • In the last ten years the growing concern amongst community health scientists is that communities need to be studied as systems. Reducing communities to dependent variables or little more than social context is static and limiting, yielding less and less meaningful results. Said another way, the traditional approaches to community health science are not teaching us things we do not already know. And yet, most community health websites continue to report data using the traditional causal pathways model of analysis. https://www.policypress.org.uk/catalog/product_info.php?cPath=&products_id=861
Here, for example, is the editorial overview of the special edition of Social Science & Medicine, which, in a few paragraphs, summarizes what we have said in our presentation. In terms of the importance of the complexity science model, Look to the first full paragraph in the second column The issue begins with a critical analysis of the notions of space and place that have been predominant in this field of work. In a careful argument, Cummins, Curtis, Diez-Roux, & Macintyre (2007) adopt a relational perspective to place and health, but go well beyond previous attempts to articulate such a perspective. Specifically, they argue for the importance of considering place ‘on the ground’ as being produced by multiple scales of contextual influence, thinking of places as dynamic and fluid, with different social meanings and importance for different people and most importantly, maintained by power relations at varying spatial scales. While many of these issues are well established in human geography, they have been underdeveloped in most of the recent flurry of research on the effects of places on health.
Solving our Methodological Challenge • The constraints of the current paper unfortunately do not allow us the luxury to review the details of our model building process. We can, however, offer a few highlights. • First, as we discussed above, using the filing system of social complexity theory, we were able to create a statistical database that contained all of the economic, institutional and health outcome indicators for all 20 communities at various points in time between 1990 and 2000. • In the language of matrix algebra, we had a database of 20 multidimensional row vectors, including a row vector for Summit County as a whole. • To this database we applied to two computational techniques: k-means cluster analysis and the self-organizing map (SOM). The SOM is a neural net created by TeuvoKohonenfor visualizing low-dimensional views of high-dimensional data (Castellani, Castellani and Spray 2002).
Solving our Methodological Challenge • We used k-means cluster analysis and the SOM to accomplish several goals. • First, we created a case-based, composite for each of the 20 communities in Summit County—into which our qualitative data could be integrated to create thick descriptions of the 20 communities. We did this for two points in time: 1990 and 2000. • Next, we used these composites to identify the major attractor basins in Summit County around which the 20 communities clustered—again, for both 1990 and 2000. • With our attractor points identified, we compared the 20 communities and their respective attractor basins to one another to determine their relative relationships—again, at two different times, 1990 and 2000.
Sagamore/Macedonia/ Northfield D=888 Copley/Bath/ Fairlawn D=1538 Twinsburg D=225 Richfield/Boston/ Peninsula D=1538 Stow/Silver Lake D=1538 C5 C1 Hudson D=00 Norton D=582 Cuyahoga Falls D=2784 Springfield D=3366 Franklin D=1254 C4 North Akron D=1449 C2 C7 Coventry/Green D=387 West Akron D=1277 C6 Munroe Falls/ Tallmadge D=2823 NW Akron D=1956 South Akron D=2218 C3 Barberton D=53 SW Akron D=2648 SE Akron D=542 Central Akron D=2648 Map of our 20 Clusters; Their 7 attractor points; and the Network They Form.
Solving our Methodological Challenge • Fourth, with our maps for both 1990 and 2000, we explored how Summit County and its network of 20 communities changed over time. This was important for our study of persistent poverty (aka poverty traps). • We computed this relationship by examining the relative change in the 20 communities in relation to their aggregate health outcomes for the years 1990 to 2000.
C1 and C5 moved slightly away from C4, but also moved further away from C3, C6 and C2 C6 and C2 moved closer to C3, which moved them downward in status 00 90 90 00 90 90 00 00 00 90 90 C4 moved upward in status and further away from the other clusters 00 C7 moved away from C4 toward C2 and therefore slightly downward in status C3 stayed the same Comparing 2000 to 1990, although the average per capita income increasedin the county, the location of the clusters in relation to each other remained, for the most part the same, or got worse, as in the case of the poorer communities.
Solving our Methodological Challenge • Why does this happen? • What we found is that the micro-level behaviors of the various communities, particularly the most affluent, result in an emergent, system-level effect wherein the residential mobility patterns of the rich and middle-class communities keep the poor communities poor. • Finally, we used all of this information to construct a simulation of Summit County, which we are now in the process of completing, to explore further the relationship between residential mobility patterns and community health—to see our simulation, visit www.personal.kent.edu/~bcastel3. • NetLogo Model
Conclusion • As we hope this brief introduction has shown, when it comes to modeling complex systems using digital data, the SACS Toolkit is an effective e-social scientific method. It is effective because of: • (1) its explicit, complex systems approach; • (2) its user-based ontology; • (3) its rigorous yet flexibly filing system, which is designed to function as a complex system; • (4) its case-based, data- compression, visual algorithm for modeling complex systems from the bottom-up; and • (5) its tremendous flexibility with all types of data and methods. • Given this list of qualifications, researchers may find the SACS Toolkit likewise effective in those instances where they seek to model a topic as a complex social system using digital data.