Content analysis and grounded theory • Dr Ayaz Afsar
Introduction • This topic addresses two main forms of qualitative data analysis: content analysis and grounded theory. Many qualitative data analysts undertake forms of content analysis. One of the enduring problems of qualitative data analysis is the reduction of copious amounts of written data to manageable and comprehensible proportions. • Data reduction is a key element of qualitative analysis, performed in a way that attempts to respect the quality of the qualitative data. One common procedure for achieving this is content analysis, a process by which the ‘many words of texts are classified into much fewer categories’.
The goal is to reduce the material in different ways. • Categories are usually derived from theoretical constructs or areas of interest devised in advance of the analysis (pre-ordinate categorization) rather than developed from the material itself, though these may be modified, of course, by reference to the empirical data.
What is content analysis? • The term ‘content analysis’ is often used sloppily. In effect, it simply defines the process of summarizing and reporting written data – the main contents of data and their messages. More strictly speaking, it defines a strict and systematic set of procedures for the rigorous analysis, examination and verification of the contents of written data. • Krippendorp (2004: 18) defines it as ‘a research technique for making replicable and valid inferences from texts (or other meaningful matter) to the contexts of their use’. Texts are defined as any written communicative materials which are intended to be read, interpreted and understood by people other than the analysts.
Originally deriving from analysis of mass media and public speeches, the use of content analysis has spread to examination of any form of communicative material, both structured and unstructured. It may be ‘applied to substantive problems at the intersection of culture, social structure, and social interaction; used to generate dependent variables in experimental designs; and used to study groups as microcosms of society’. • Content analysis can be undertaken with any written material, from documents to interview transcriptions, from media products to personal interviews. It is often used to analyse large quantities of text, facilitated by the systematic, rule-governed nature of content analysis, not least because this enables computer-assisted analysis to be undertaken.
Content analysis has several attractions. It is an unobtrusive technique, in that one can observe without being observed. It focuses on language and linguistic features and on meaning in context, and it is systematic and verifiable (e.g. in its use of codes and categories), as the rules for analysis are explicit, transparent and public. • Further, as the data are in a permanent form (texts), verification through reanalysis and replication is possible.
Weber (1990: 9) sees the purposes of content analysis as including the coding of open-ended questions in surveys, the revealing of the focus of individual, group, institutional and societal matters, and the description of patterns and trends in communicative content. • The latter suggestion indicates the role of statistical techniques in content analysis; indeed Weber suggests that the highest quality content-analytic studies use both quantitative and qualitative analysis of texts (texts defined as any form of written communication). • Content analysis takes texts and analyses, reduces and interrogates them into summary form through the use of both pre-existing categories and emergent themes in order to generate or test a theory. It uses systematic, replicable, observable and rule-governed forms of analysis in a theory-dependent system for the application of those categories.
Krippendorp (2004: 22–4) suggests that there are several features of texts that relate to a definition of content analysis, including the fact that texts have no objective, reader-independent qualities; rather they have multiple meanings and can sustain multiple readings and interpretations. There is no one meaning waiting to be discovered or described in them. Indeed, the meanings in texts may be personal and are located in specific contexts, discourses and purposes, and, hence, meanings have to be drawn in context. Content analysis, then, describes the manifest characteristics of communication (asking who is saying what to whom, and how), infers the antecedents of the communication (the reasons for, and purposes behind, the communication, and the context of communication) and infers the consequences of the communication (its effects). • Krippendorp suggests that content analysis is at its most successful when it can break down ‘linguistically constituted facts’ into four classes: attributions, social relationships, public behaviours and institutional realities.
How does content analysis work? • Ezzy (2002: 83) suggests that content analysis starts with a sample of texts (the units), defines the units of analysis (e.g. words, sentences) and the categories to be used for analysis, reviews the texts in order to code them and places them into categories, and then counts and logs the occurrences of words, codes and categories. • From here statistical analysis and quantitative methods are applied, leading to an interpretation of the results. Put simply, content analysis involves coding, categorizing (creating meaningful categories into which the units of analysis – words, phrases, sentences etc. – can be placed), comparing (relating categories and making links between them), and concluding – drawing theoretical conclusions from the text.
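A minimal sketch of this coding-and-counting sequence, in Python, is given below. The sample sentences, the categories and their indicator words are hypothetical illustrations rather than material from any actual study; in practice the researcher, not a keyword match, decides which code a unit receives.

# A sketch of the sequence described above: define units, code them into categories, count occurrences.
# Units, categories and indicator words are hypothetical.
from collections import Counter

# Units of analysis: here, sentences from an invented interview transcript.
units = [
    "I find marking in the evenings very stressful.",
    "Talking to colleagues helps me cope with the workload.",
    "The workload has grown every year.",
]

# Pre-ordinate categories, each defined by a few indicator words.
categories = {
    "STRESS": {"stressful", "stress", "pressure"},
    "COPING": {"cope", "helps", "support"},
    "WORKLOAD": {"workload", "marking"},
}

# Coding: ascribe every category whose indicator words appear in a unit.
coded = []
for unit in units:
    words = set(unit.lower().replace(".", "").split())
    codes = [cat for cat, indicators in categories.items() if words & indicators]
    coded.append((unit, codes))

# Counting and logging the occurrences of each category across all units.
counts = Counter(code for _, codes in coded for code in codes)
for unit, codes in coded:
    print(codes, "<-", unit)
print(counts)   # these frequencies feed the subsequent statistical analysis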
Content analysis involves ‘counting concepts, words or occurrences in documents and reporting them in tabular form’. This indicates essential features of the process of content analysis: • breaking down text into units of analysis • undertaking statistical analysis of the units • presenting the analysis in as economical a form as possible. This masks some other important features of content analysis, including, for example, examination of the interconnectedness of units of analysis (categories), the emergent nature of themes and the testing, development and generation of theory. The whole process of content analysis can follow eleven steps.
Steps Step 1: Define the research questions to be addressed by the content analysis • This will also include what one wants from the texts to be content-analysed. The research questions will be informed by, indeed may be derived from, the theory to be tested. Step 2: Define the population from which units of text are to be sampled. • The population here refers not only to people but also, and mainly, to text – the domains of the analysis. For example, is it to be newspapers, programmes, interview transcripts, textbooks, conversations, public domain documents, examination scripts, emails, online conversations and so on? Step 3: Define the sample to be included • Here the rules for sampling people can apply equally well to documents. One has to decide whether to opt for a probability or non-probability sample of documents, a stratified sample (and, if so, the kind of strata to be used), random sampling, convenience sampling, domain sampling, cluster sampling, purposive, systematic, time sampling, snowball and so on.
Robson (1993: 275–9) indicates the need for careful delineation of the sampling strategy here, for example, such-and-such a set of documents, such-and-such a time frame (e.g. of newspapers), such-and-such a number of television programmes or interviews. • The key issues of sampling apply to the sampling of texts: representativeness, access, size of the sample and generalizability of the results. • Krippendorp (2004: 145) indicates that there may be ‘nested recording units’, where one unit is nested within another; for example, with regard to newspapers that have been sampled it may be thus:
• the issues of a newspaper sampled
• the articles in an issue of a newspaper sampled
• the paragraphs in an article in an issue of a newspaper sampled
• the propositions constituting a paragraph in an article in an issue of a newspaper sampled (a data-structure sketch of this nesting is given after Step 4 below).
Step 4: Define the context of the generation of the document. • This will examine, for example: how the material was generated; who was involved; who was present; where the documents come from; how the material was recorded and/or edited; whether the person was willing to, able to, and did tell the truth; whether the data are accurately reported; whether the data are corroborated; the authenticity and credibility of the documents; the context of the generation of the document; the selection and evaluation of the evidence contained in the document.
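To make the nesting concrete, here is a minimal data-structure sketch in Python. The class and field names (Issue, Article, Paragraph) and the sample content are assumptions made purely for illustration; they simply mirror the newspaper example, with issues containing articles, articles containing paragraphs, and paragraphs containing propositions.

# A sketch of nested recording units for a newspaper sample.
# Class names and sample content are illustrative assumptions, not from the source.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Paragraph:
    propositions: List[str] = field(default_factory=list)   # smallest units

@dataclass
class Article:
    title: str
    paragraphs: List[Paragraph] = field(default_factory=list)

@dataclass
class Issue:
    date: str
    articles: List[Article] = field(default_factory=list)

# One sampled issue, nesting articles, paragraphs and propositions.
issue = Issue(
    date="2004-05-01",
    articles=[
        Article(
            title="School funding debate",
            paragraphs=[
                Paragraph(propositions=[
                    "Funding per pupil has fallen.",
                    "Teachers report larger classes.",
                ])
            ],
        )
    ],
)

# Traversing the hierarchy mirrors the sampling: issue -> article -> paragraph -> proposition.
for article in issue.articles:
    for paragraph in article.paragraphs:
        for proposition in paragraph.propositions:
            print(article.title, "->", proposition)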
Step 5: Define the units of analysis • This can be at very many levels, for example, a word, phrase, sentence, paragraph, whole text, people and themes. Robson (1993: 276) includes here, for newspaper analysis, the number of stories on a topic, column inches, size of headline, number of stories on a page, position of stories within a newspaper, and the number and type of pictures. His suggestions indicate the careful thought that needs to go into the selection of the units of analysis. Different levels of analysis will raise different issues of reliability, and these are discussed later. • It is assumed that the units of analysis will be classifiable into the same category as text with the same or similar meaning in the context of the text itself (semantic validity), although this can be problematic (discussed later). • The description of units of analysis will also include the units of measurement and enumeration.
Steps (continued) • The coding unit defines the smallest element of material that can be analysed, while the contextual unit defines the largest textual unit that may appear in a single category. • Krippendorp distinguishes three kinds of units. Sampling units are those units that are included in, or excluded from, an analysis; they are units of selection. Recording/coding units are units that are contained within sampling units and are smaller than sampling units, thereby avoiding the complexity that characterizes sampling units; they are units of description. • Context units are ‘units of textual matter that set limits on the information to be considered in the description of recording units’; they are units that ‘delineate the scope of information that coders need to consult in characterising the recording units’.
Krippendorp (2004) continues by suggesting a further five kinds of sampling units: physical (e.g. time, place, size); syntactical (words, grammar, sentences, paragraphs, chapters, series etc.); categorical (members of a category have something in common); propositional (delineating particular constructions or propositions); and thematic (putting texts into themes and combinations of categories). The issue of categories signals the next step. The criterion here is that each unit of analysis (category – conceptual, actual, classification element, cluster, issue) should be as discrete as possible while retaining fidelity to the integrity of the whole, i.e. that each unit must be a fair rather than a distorted representation of the context and other data. The creation of units of analysis can be done by ascribing codes to the data.
Step 6: Decide the codes to be used in the analysis • Codes can be at different levels of specificity and generality when defining content and concepts. There may be some codes which subsume others, thereby creating a hierarchy of subsumption – subordination and superordination – in effect creating a tree diagram of codes. Some codes are very general; others are more specific. They keep words as words; they maintain context specificity. • Codes may be descriptive and might include: situation codes; perspectives held by subjects; ways of thinking about people and objects; process codes; activity codes; event codes; strategy codes; relationship and social structure codes; methods codes. However, to be faithful to the data, the codes themselves derive from the data responsively rather than being created pre-ordinately. Hence the researcher will go through the data ascribing codes to each piece of datum.
A code is a word or abbreviation sufficiently close to that which it is describing for the researcher to see at a glance what it means (in this respect it is unlike a number). For example, the code ‘trust’ might refer to a person’s trustworthiness; the code ‘power’ might refer to the status or power of the person in the group. • Miles and Huberman (1984) advise that codes should be kept as discrete as possible and that coding should start earlier rather than later, as late coding enfeebles the analysis, although there is a risk that early coding might influence too strongly any later codes. • It is possible, they suggest, for as many as ninety codes to be held in the working memory while going through data. Clearly, though, there is a process of iteration and reiteration whereby codes used in the early stages might be modified subsequently and vice versa, requiring the researcher to go through a data set more than once to ensure consistency, refinement, modification and exhaustiveness of coding (some codes might become redundant, others might need to be broken down into finer codes). By coding up the data the researcher is able to detect frequencies (which codes are occurring most commonly) and patterns (which codes occur together), as sketched below.
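The detection of frequencies and co-occurrence patterns can be sketched briefly. The coded units below are hypothetical, reusing the illustrative codes ‘trust’ and ‘power’ from above; the point is only that, once the data have been coded, counting which codes occur most often and which occur together is mechanical.

# A sketch of frequency and co-occurrence counts over hand-coded units.
# The coded units are hypothetical.
from collections import Counter
from itertools import combinations

# Each entry lists the codes ascribed to one piece of datum.
coded_units = [
    ["trust", "power"],
    ["trust"],
    ["power", "conflict"],
    ["trust", "power"],
]

# Frequencies: which codes occur most commonly.
frequencies = Counter(code for codes in coded_units for code in codes)

# Patterns: which pairs of codes occur together within the same unit.
co_occurrence = Counter()
for codes in coded_units:
    for pair in combinations(sorted(set(codes)), 2):
        co_occurrence[pair] += 1

print(frequencies)     # e.g. Counter({'trust': 3, 'power': 3, 'conflict': 1})
print(co_occurrence)   # e.g. Counter({('power', 'trust'): 2, ('conflict', 'power'): 1})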
Hammersley and Atkinson propose that the first activity here is to read and reread the data to become thoroughly familiar with them, noting also any interesting patterns, any surprising, puzzling or unexpected features, any apparent inconsistencies or contradictions (e.g. between groups, within and between individuals and groups, between what people say and what they do). Step 7: Construct the categories for analysis • Categories are the main groupings of constructs or key features of the text, showing links between units of analysis. For example, a text concerning teacher stress could have groupings such as ‘causes of teacher stress’, ‘the nature of teacher stress’, ‘ways of coping with stress’ and ‘the effects of stress’.
Categories are inferred by the researcher, whereas specific words or units of analysis are less inferential; the more one moves towards inference, the more reliability may be compromised, and the more the researcher’s agenda may impose itself on the data. • Categories will need to be exhaustive in order to address content validity; indeed Robson (1993: 277) argues that a content analysis ‘is no better than its system of categories’ and that these can include: subject matter; direction (how a matter is treated – positively or negatively); values; goals; method used to achieve goals; traits (characteristics used to describe people); actors (who is being discussed); authority (in whose name the statements are being made); location; conflict (sources and levels); and endings (how conflicts are resolved).
This stage (i.e. constructing the categories) is sometimes termed the creation of a ‘domain analysis’. This involves grouping the units into clusters, groups, patterns, themes and coherent sets to form domains. A domain is any symbolic category that includes other categories. At this stage it might be useful for the researcher to recode the data into domain codes, or to review the codes used to see how they naturally fall into clusters, perhaps creating overarching codes for each cluster. • Unitization is the process of putting data into meaning units for analysis, examining data, and identifying what those units are.
A meaning unit is simply a piece of datum which the researcher considers to be important; it may be as small as a word or phrase, or as large as a paragraph, groups of paragraphs, or, indeed, a whole text, provided that it has meaning in itself. Spradley (1979) suggests that establishing domains can be achieved by four analytic tasks: • selecting a sample of verbatim interview and field notes • looking for the names of things • identifying possible terms from the sample • searching through additional notes for other items to include. He identifies six steps to achieve these tasks:
• select a single semantic relationship • prepare a domain analysis sheet • select a sample of statements from respondents • search for possible cover terms and include those that fit the semantic relationship identified • formulate structural questions for each domain identified • list all the hypothesized domains.
Domain analysis, then, strives to discover relationships between symbols. • Like codes, categories can be at different levels of specificity and generality. Some categories are general and overarching; others are less so. Typically codes are much more specific than categories. This indicates the difference between nodes and codes. A code is a label for a piece of text; a node is a category into which different codes fall or are collected. A node can be a concept, idea, process, group of people, place or, indeed, any other grouping that the researcher wishes it to be; it is an organizing category. Whereas codes describe specific textual moments, nodes draw together codes into a categorical framework, making connections between coded segments and concepts.
It is rather like saying that a text can be regarded as a book, with the chapters being the nodes and the paragraphs being the codes, or the content pages being the nodes and the index being the codes. Nodes can be related in several ways, for example: one concept can define another; they can be logically related; and they can be empirically related. Step 8: Conduct the coding and categorizing of the data. • Once the codes and categories have been decided, the analysis can be undertaken. This concerns the actual ascription of codes and categories to the text. Coding has been defined by Kerlinger (1970) as the translation of question responses and respondent information to specific categories for the purpose of analysis. Many questions are precoded, that is, each response can be immediately and directly converted into a score in an objective way. Rating scales and checklists are examples of precoded questions. Coding is the ascription of a category label to a piece of data, a label that is either decided in advance or in response to the data that have been collected.
Mayring suggests that summarizing content analysis reduces the material to manageable proportions while maintaining fidelity to essential contents, and that inductive category formation proceeds through summarizing content analysis by inductively generating categories from the text material. • This is in contrast to explicit content analysis, the opposite of summarizing content analysis, which seeks to add in further information in the search for intelligible text analysis and category location. The former reduces contextual detail, the latter retains it. Structuring content analysis filters out parts of the text in order to construct a cross-section of the material using specified pre-ordinate criteria.
It is important to decide whether to code simply for the existence or the incidence of the concept. • This is important, as it would mean that, in the case of the former – existence – the frequency of a concept would be lost, and frequency may give an indication of the significance of a concept in the text. • Further, the coding will need to decide whether it should code only the exact words or those with a similar meaning. The former will probably result in significant data loss, as words are not often repeated in comparison to the concepts that they signify; the latter may risk losing the nuanced sensitivity of particular words and phrases. • Indeed some speechmakers may deliberately use ambiguous words or those with more than one meaning. • In coding a piece of transcription the researcher goes through the data systematically, typically line by line, and writes a descriptive code by the side of each piece of datum, for example:
Text – Code
• The students will undertake problem-solving in science – PROB
• I prefer to teach mixed ability classes – MIXABIL
One can see that the codes here are abbreviations, enabling the researcher to understand immediately the issue that they denote because they resemble that issue (rather than, for example, ascribing a number as a code for each piece of datum, where the number provides no clue as to what the datum or category concerns). Where they are not abbreviations, Miles and Huberman (1994) suggest that the coding label should bear sufficient resemblance to the original data so that the researcher can know, by looking at the code, what the original piece of datum concerned.
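A short sketch of this line-by-line ascription of mnemonic codes follows. The keyword-to-code map is an illustrative assumption; in practice the researcher reads each piece of datum and ascribes codes responsively rather than mechanically.

# A sketch of line-by-line coding with mnemonic abbreviations, as in the example above.
# The keyword-to-code map is an assumption for illustration only.
transcript = [
    "The students will undertake problem-solving in science",
    "I prefer to teach mixed ability classes",
]

code_map = {
    "problem-solving": "PROB",
    "mixed ability": "MIXABIL",
}

# Write the descriptive code alongside each line of the transcript.
for line in transcript:
    codes = [code for phrase, code in code_map.items() if phrase in line.lower()]
    print(f"{line:<50} {' '.join(codes)}")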
Step 9: Conduct the data analysis • Once the data have been coded and categorized, the researcher can count the frequency of each code or word in the text, and the number of words in each category. This is the process of retrieval, which may be in multiple modes, for example words, codes, nodes and categories. Some words may be in more than one category, for example where one category is an overarching category and another is a subcategory. • To ensure reliability, Weber suggests that it is advisable at first to work on small samples of text rather than the whole text, to test out the coding and categorization, and make amendments where necessary. The complete texts should be analysed, as this preserves their semantic coherence.
Words and single codes on their own have limited power, and so it is important to move to associations between words and codes, i.e. to look at categories and relationships between categories. • Establishing relationships and linkages between the domains ensures that the data, their richness and ‘context-groundedness’ are retained. Linkages can be found by identifying confirming cases, by seeking ‘underlying associations’ and connections between data subsets. • Weber suggests that it is preferable to retrieve text based on categories rather than single words, as categories tend to retrieve more than single words, drawing on synonyms and conceptually close meanings. One can make category counts as well as word counts. Indeed, one can specify at what level the counting can be conducted, for example, words, phrases, codes, categories and themes.
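The following sketch illustrates category-based retrieval, in which a category draws on synonyms and conceptually close words rather than a single term, and category counts are made alongside the retrieved segments. The segments, categories and synonym lists are hypothetical.

# A sketch of retrieval by category rather than by single word.
# Segments, categories and synonym lists are hypothetical.
segments = [
    "Staff feel anxious before inspections.",
    "Colleagues offer support when deadlines loom.",
    "Worry about results keeps some teachers awake.",
]

categories = {
    "stress": {"anxious", "worry", "pressure", "stressful"},
    "coping": {"support", "help", "cope"},
}

def retrieve(category):
    # Return every segment containing any word conceptually close to the category.
    indicators = categories[category]
    return [s for s in segments if indicators & set(s.lower().strip(".").split())]

for cat in categories:
    hits = retrieve(cat)
    print(cat, len(hits), hits)   # a category count alongside the retrieved segments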
The implication here is that the frequency of words, codes, nodes and categories provides an indication of their significance. This may or may not be true, since subsequent mentions of a word or category may be difficult to track in certain texts (e.g. speeches). Frequency does not equal importance, and not saying something (withholding comment) may be as important as saying something. Content analysis analyses only what is present rather than what is missing or unsaid. Further, as Weber (1990) says: • pronouns may replace nouns the further one goes through a passage; continuing to raise the issue may produce counter-productive repetition; constraints on text length may inhibit reference to the theme; and some topics may require much more effort to raise than others.
The researcher can summarize the inferences from the text, look for patterns, regularities and relationships between segments of the text, and test hypotheses. The summarizing of categories and data is an explicit aim of statistical techniques, for these permit trends, frequencies, priorities and relationships to be calculated. At the stage of data analysis there are several approaches and methods that can be used. Krippendorp suggests that these can include: • extrapolations: trends, patterns and differences • standards: evaluations and judgements • indices: e.g. of relationships, frequencies of occurrence and co-occurrence, number of favourable and unfavourable items • linguistic re-presentations.
Once frequencies have been calculated, statistical analysis can proceed, using, for example: • factor analysis: to group the kinds of response • tabulation: of frequencies and percentages • cross-tabulation: presenting a matrix where the words or codes are the column headings and the nominal variables (e.g. the newspaper, the year, the gender) are the row headings (see the sketch below) • correlation: to identify the strength and direction of association between words, between codes and between categories • graphical representation: for example to report the incidence of particular words, concepts, categories over time or over texts • regression: to determine the value of one variable/word/code/category in relationship to another – a form of association that gives exact values and the gradient or slope of the goodness-of-fit line of the relationship – the regression line • multiple regression: to calculate the weighting of independents on dependent variables • structural equation modelling and LISREL analysis: to determine the multiple directions of causality and the weightings of different associations in a pathway analysis of causal relations • dendrograms: tree diagrams to show the relationship and connection between categories and codes, codes and nodes.
• While conducting qualitative data analysis using numerical approaches or paradigms may be criticized for being positivistic, one should note that one of the founders of grounded theory (Glaser 1996) is on record as saying not only that grounded theory developed out of a desire to apply a quantitative paradigm to qualitative data, but also that paradigmatic purity was unacceptable in the real world of qualitative data analysis, in which fitness for purpose should be the guide. Further, one can note that Miles and Huberman (1984) strongly advocate the graphic display of data as an economical means of reducing qualitative data. Such graphics might serve both to indicate causal relationships and to summarize data.
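As an illustration of the tabulation and cross-tabulation items in the list above, the following sketch uses pandas to build a matrix with codes as column headings and a nominal variable (the newspaper) as row headings. The newspapers, codes and counts are hypothetical.

# A sketch of cross-tabulating code occurrences against a nominal variable.
# The data are hypothetical; pandas is used only for the tabulation.
import pandas as pd

# One row per logged occurrence of a code in a sampled document.
occurrences = pd.DataFrame({
    "newspaper": ["Times", "Times", "Guardian", "Guardian", "Guardian"],
    "code":      ["STRESS", "COPING", "STRESS", "STRESS", "COPING"],
})

# Frequencies and the cross-tabulation matrix (rows: newspaper, columns: code).
table = pd.crosstab(occurrences["newspaper"], occurrences["code"])
print(table)

# The same matrix expressed as percentages within each newspaper.
print(pd.crosstab(occurrences["newspaper"], occurrences["code"], normalize="index") * 100)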
Step 10: Summarizing • By this stage the investigator will be in a position to write a summary of the main features of the situation that have been researched so far. The summary will identify key factors, key issues, key concepts and key areas for subsequent investigation. It is a watershed stage during the data collection, as it pinpoints major themes, issues and problems that have arisen, so far, from the data (responsively) and suggests avenues for further investigation. The concepts used will be a combination of those derived from the data themselves and those inferred by the researcher. At this point, the researcher will have gone through the preliminary stages of theory generation. Patton (1980) sets these out for qualitative data:
• finding a focus for the research and analysis
• organizing, processing, ordering and checking data
• writing a qualitative description or analysis
• inductively developing categories, typologies and labels
• analysing the categories to identify where further clarification and cross-clarification are needed
• expressing and typifying these categories through metaphors
• making inferences and speculations about relationships, causes and effects.
Bogdan and Biklen (1992: 154–63) identify several important factors that researchers need to address at this stage, including forcing oneself to take decisions that will focus and narrow the study and decide what kind of study it will be; developing analytical questions; using previous observational data to inform subsequent data collection; writing reflexive notes and memos about observations, ideas, what is being learned; trying out ideas with subjects; analysing relevant literature while conducting the field research; generating concepts, metaphors and analogies and visual devices to clarify the research. • Step 11: Making speculative inferences • This is an important stage, for it moves the research from description to inference. It requires the researcher, on the basis of the evidence, to posit some explanations for the situation, some key elements and possibly even their causes. It is the process of hypothesis generation or the setting of working hypotheses that feeds into theory generation.
The stage of theory generation is linked to grounded theory, and I will turn to this later in the lecture. Here I will provide an example of content analysis that does not use statistical analysis but which nevertheless demonstrates the systematic approach to analysing data that is at the heart of content analysis.
At a wider level, the limits of content analysis are suggested by Ezzy who argues that, due to the pre-ordinate nature of coding and categorizing, content analysis is useful for testing or confirming a pre-existing theory rather than for building a new one, though this perhaps understates the ways in which content analysis can be used to generate new theory, not least through a grounded theory approach (discussed later). In many cases content analysts know in advance what they are looking for in text, and perhaps what the categories for analysis will be. Ezzy (2002: 85) suggests that this restricts the extent to which the analytical categories can be responsive to the data, thereby confining the data analysis to the agenda of the researcher rather than the ‘other’. In this way it enables pre-existing theory to be tested. Indeed Mayring (2004: 269) argues that if the research question is very open or if the study is exploratory, then more open procedures than content analysis, e.g. grounded theory, may be preferable.
However, while inductive approaches may be ruled out of the early stages of a content analysis, this does not keep them out of the later stages, as themes and interpretations may emerge inductively from the data and the researcher, rather than only or necessarily from the categories or pre-existing theories themselves. Hence to suggest that content analysis denies induction or is confined to the testing of pre-existing theory is uncharitable; it is to misrepresent the flexibility of content analysis. Indeed Flick (1998) suggests that pre-existing categories may need to be modified if they do not fit the data.
Grounded theory • Theory generation in qualitative data can be emergent, and grounded theory is an important method of theory generation. It is more inductive than content analysis, as the theories emerge from, rather than exist before, the data. Strauss and Corbin (1994: 273) remark: ‘grounded theory is a general methodology for developing theory that is grounded in data systematically gathered and analysed’. There are several features of this definition: • Theory is emergent rather than predefined and tested. • Theory emerges from the data rather than vice versa. • Theory generation is a consequence of, and partner to, systematic data collection and analysis. • Patterns and theories are implicit in data, waiting to be discovered.
Glaser (1996) suggests that ‘grounded theory is the systematic generation of a theory from data’; it is an inductive process in which everything is integrated and in which data pattern themselves rather than having the researcher pattern them, as actions are integrated and interrelated with other actions. Glaser and Strauss’s (1967) seminal work rejects simple linear causality and the decontextualization of data, and argues that the world which participants inhabit is multivalent, multivariate and connected. Glaser (1996) says, ‘the world doesn’t occur in a vacuum’ and the researcher has to take account of the interconnectedness of actions. In everyday life, actions are interconnected and people make connections naturally; it is part of everyday living, and hence grounded theory catches the naturalistic element of research and formulates it into a systematic methodology. Grounded theory is faithful to how people act; it takes account of apparent inconsistencies, contradictions, discontinuities and relatedness in actions.
Grounded theory is a systematic methodology, using systematized methods (discussed below) of theoretical sampling, coding, constant comparison, the identification of a core variable, and saturation. • Grounded theory is not averse to quantitative methods; indeed, it arose out of them (Glaser 1996) in terms of trying to bring to qualitative data some of the analytic methods applied in statistical techniques (e.g. multivariate analysis). • In grounded theory the researcher discovers what is relevant; indeed, Glaser and Strauss’s (1967) work is entitled The Discovery of Grounded Theory. • However, where it parts company with much quantitative, positivist research is in its view of theory. In positivist research the theory pre-exists its testing and the researcher deduces from the data whether the theory is robust and can be confirmed. The data are ‘forced’ into a fit with the theory. Grounded theory, on the other hand, does not force data to fit a predetermined theory; indeed, the difference between inductive and deductive research is less clear than it appears to be at first sight. For example, before one can deduce, one has to generate theory and categories inductively.
Grounded theory starts with data, which are then analysed and reviewed to enable the theory to be generated from them; it is rooted in the data and little else. Here the theory derives from the data – it is grounded in the data and emerges from it. As Lincoln and Guba (1985: 205) argue, grounded theory must fit the situation that is being researched. • Glaser (1996) writes that ‘forcing methodologies were too ascendant’, not least in positivist research and that grounded theory had to reject forcing or constraining the nature of a research investigation by pre-existing theories. • As grounded theory sets aside any preconceived ideas, letting the data themselves give rise to the theory, certain abilities are required of the researcher, for example:
• tolerance and openness to data and what is emerging
• tolerance of confusion and regression (feeling stupid when the theory does not become immediately obvious)
• resistance to premature formulation of theory
• ability to pay close attention to data
• willingness to engage in the process of theory generation rather than theory testing; it is an experiential methodology
• ability to work with emergent categories rather than preconceived or received categories.
As theory is not predetermined, the role of targeted pre-reading is not as strong as in other kinds of research (e.g. using literature reviews to generate issues for the research); indeed, it may be dangerous, as it may prematurely close off or determine what one sees in data, causing one to read data through given lenses rather than anew. As one does not know what one will find, one cannot be sure what one should read before undertaking grounded theory. One should read widely, both within and outside the field, rather than narrowly and in too focused a direction. • There are several elements of grounded theory that contribute to its systematic nature, and it is to these that I now turn.
Theoretical sampling • In theoretical sampling, data are collected on an ongoing, iterative basis, and the researcher keeps on adding to the sample until there is enough data to describe what is going on in the context or situation under study and until ‘theoretical saturation’ is reached (discussed below). As one cannot know in advance when this point will be reached, one cannot determine the sample size or representativeness until one is actually doing the research. In theoretical sampling, data collection continues until sufficient data have been gathered to create a theoretical explanation of what is happening and what constitutes its key features. • It is not a question of representativeness, but, rather, a question of allowing the theory to emerge.
Theoretical sampling is the process of data collection for generating theory whereby the analyst jointly collects, codes, and analyses his data and decides what data to collect next and where to find them, in order to develop his theory as it emerges. This process of data collection is controlled by the emerging theory. • The basic criterion governing the selection of comparison groups for discovering theory is their theoretical relevance for furthering the development of emerging categories, rather than, for example, conventional sampling strategies.
Coding • Coding is the process of disassembling and reassembling the data. Data are disassembled when they are broken apart into lines, paragraphs or sections. These fragments are then rearranged, through coding, to produce a new understanding that explores similarities and differences across a number of different cases. The early part of coding may be confusing, with a mass of apparently unrelated material; however, as coding progresses and themes emerge, the analysis becomes more organized and structured.
In grounded theory there are three types of coding: open, axial and selective coding, the intention of which is to deconstruct the data into manageable chunks in order to facilitate an understanding of the phenomenon in question. • Open coding involves exploring the data and identifying units of analysis to code for meanings, feelings, actions, events and so on. The researcher codes up the data, creating new codes and categories and subcategories where necessary, and integrating codes where relevant until the coding is complete. • Axial coding seeks to make links between categories and codes, ‘to integrate codes around the axes of central categories’; the essence of axial coding is the interconnectedness of categories (Cresswell 1998: 57). Hence codes are explored, their interrelationships are examined, and codes and categories are compared to existing theory. • Selective coding involves identifying a core code; the relationship between that core code and other codes is made clear and the coding scheme is compared with pre-existing theory. Cresswell (1998: 57) writes that ‘in selective coding, the researcher identifies a ‘‘story line’’ and writes a story that integrates the categories in the axial coding model’.