ECitS Conference, 5-7 September 2012 University of Kent, Canterbury, UK

Evidentiality and Epistemicity in a Corpus of Scientific Biomedical Papers from the British Medical Journal. A focus on “evidence” and “cause/s”
*I. Riccioni, *R. Bongelli, *C. Canestrari, *C. Buldorini, **R. Pietrobon, *Andrzej Zuczkowski
* University of Macerata (Italy) ** Duke University, Durham, North Carolina (USA)

INTRODUCTION
As researchers, we analyse linguistic communication, mostly through a qualitative and quantitative analysis of the syntactic, semantic, and pragmatic levels. Our theoretical and methodological background integrates aspects from
• Conversational Analysis (interruptions, overlaps, negotiation, politeness, etc.);
• Discourse Analysis (speech acts, for example giving advice in trouble talk contexts, etc.);
• Text Theory (in particular, J. S. Petoefi’s structural model of communication).

We have been working for several years on different types of oral (recorded and transcribed) corpora (naturally occurring conversations, political discourses, humorous interactions, doctor-patient dialogues, psychotherapeutic sessions, etc.). We have also been working on the communication of certainty and uncertainty in different types of written texts (academic, biomedical, literary and so on).
In 2009 we got involved in the project titled A Corpus of Scientific Biomedical Texts Spanning over 168 years annotated for Uncertainty, with an American colleague from Duke University, North Carolina, Professor Ricardo Pietrobon, who is a surgeon interested in “research on research” and “scientific writing” (http://goo.gl/zTBPI, https://sites.google.com/site/biouncertainty/).

The communication of Uncertainty in a corpus of scientific biomedical texts spanning over 168 years
• Aims:
  • identify lexical and morphosyntactic markers of uncertainty and their linguistic scope in a corpus of 80 papers randomly selected from BMJ from 1840 to 2007 and
  • detect their trends over time.

LITERATURE BACKGROUND
• The topic of certainty/uncertainty in communication is related, more or less directly, to what in the linguistic literature is called epistemicity and evidentiality (and to related topics/concepts such as subjectivity, modality and hedging or mitigation).
• This area of study has attracted a great deal of interest over the past three decades or so, inevitably resulting in a multitude of terms and conflicting definitions (see Dendale and Tasmowski 2001).

EPISTEMICITY
• It refers to those linguistic markers that, according to different authors, reveal the speaker’s/writer’s:
  • attitude regarding the reliability of the information (e.g. Dendale and Tasmowski 2001, González 2005)
  • judgment of the likelihood of the proposition (e.g. Nuyts 2001b, Plungian 2001, Cappelli 2007, Cornillie 2007)
  • commitment to the truth of the message (e.g. Sanders and Spooren 1996, De Haan 1999, González 2005)

A piece of information is communicated as certain when the speaker’s/writer’s commitment to its truth is at the maximum or a high level, as in the examples
(1) “These workers showed that there is an inverse correlation between the height of the hyperbilirubinaemia and the amount of bile excreted in the faeces” (Aethiology of physiological jaundice of the newborn, 1951)
(2) “All the ill effects of ruptured perineum and prolapsus uteri are relieved with certainty by a simple plastic operation” (Vesico-vaginal and rectovaginal fistula, 1861)

A piece of information is communicated as uncertain when the speaker’s/writer’s commitment to its truth is at the minimum or a low level, as in the examples
(3) “the evidence suggests that it is not likely to have been wrong in more than a small proportion” (Lung Cancer, 1956)
(4) “Perhaps, however, the strongest proof of the importance of local rest is furnished by those cases in which a pleural effusion has occurred on the affected side.” (On the importance of rest in the treatment of acute phthisis, )

EVIDENTIALITY
• With the term evidentiality, scholars generally refer to the coding of
  • sources of information and
  • modes of knowing
(Chafe 1986, Nuyts 2001a, 2001b, Plungian 2001, Cornillie 2007, Papafragou et al. 2007), i.e. the linguistic markers that reveal how speakers/writers gain access to the piece of information they communicate (Willett, 1988).

If a doctor says (5) “I see a cyst”, he explicitly communicates the information source; though in the sentence there is no epistemic marker, the verb I see is enough to implicitly communicate Certainty.

EVIDENTIALITY & EPISTEMICITY
• Evidentiality and epistemicity seem to be two sides of the same coin, in that:
  • When a piece of information is communicated as (if it were) certain (epistemicity) by writers, at the same time it is also communicated as (if it were) known (evidentiality) to them (and vice versa).
  • When a piece of information is communicated as (if it were) uncertain, at the same time it is also communicated as (if it were) believed by them (and vice versa).

KUB THEORY
The multitude of evidential and epistemic markers (lexical and morpho-syntactic) can be traced back and reduced to three main macro-markers:
• I know
• I do not know
• I do not know whether (believe)
These reflect the three basic evidential and epistemic territories of information (adapting Kamio’s terminology (1991, 1994)) of the Known, the Unknown, and the Believed (KUB).

The Known is all that writers say they know (perceive, remember etc.) in a broad sense. From an epistemic viewpoint such markers communicate Certainty. The Believed is all that writers say they do not know if/whether (impressions, opinions, suppositions etc.). From an epistemic viewpoint such markers communicate Uncertainty. The Unknown is when writers communicate what they do not know, i.e. when the information is unknown to them.
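The correspondence between the three territories, their macro-markers and the annotation labels used in this study can be sketched as a small lookup. This is only an illustrative sketch in Python: the names `Territory` and `annotation_label` are hypothetical and do not come from the project's own tooling.

```python
from enum import Enum


class Territory(Enum):
    """The three KUB territories, each paired with its macro-marker."""
    KNOWN = "I know"
    UNKNOWN = "I do not know"
    BELIEVED = "I do not know whether (believe)"


# Labels as they appear in the analysis criteria of the slides:
# the Known communicates Certainty, the Believed communicates Uncertainty,
# and the Unknown is annotated on its own.
_LABELS = {
    Territory.KNOWN: "Certain-Known",
    Territory.BELIEVED: "Uncertain-Believed",
    Territory.UNKNOWN: "Unknown",
}


def annotation_label(territory: Territory) -> str:
    """Return the annotation label for a KUB territory (hypothetical helper)."""
    return _LABELS[territory]


for territory in Territory:
    print(f"{territory.value!r} -> {annotation_label(territory)}")
```

The lookup keeps evidentiality (the territory) and epistemicity (the certainty value) in a single label, mirroring the claim that they are two sides of the same coin.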

THE PRESENT STUDY
For this conference we carried out the present pilot study on how evidence, causality and their relationships are communicated in BMJ papers (i.e. whether they are communicated as certain or uncertain, in declarative or hypothetical structures, etc.).
In particular, we focused on the terms
• Evidence
• Cause/causes
• Their relations
The method combined a qualitative analysis with a quantitative one, the latter performed using WordSmith Tools version 5 (Scott 2008).

EVIDENCE
Out of the 80 papers we extracted and analyzed all 102 fragments where the term “evidence” occurred in a sentence. Our analysis criteria included:
• types of sentence (affirmative, negative, interrogative);
• whether the sentence is communicated as Certain-Known, Uncertain-Believed, Unknown;
• types of evidence.
Affirmative - Certain - Direct observation
(6) “Auscultation of the chest revealed evidence of increased activity in the right upper lobe.” (The treatment of pulmonary tuberculosis by nitrogen compression, 1914)
Negative - Uncertain - Medical practice
(7) “This is simply a conjecture, however, which though possible does not seem probable, and has as yet, so far as my experience goes, no evidence to support it.” (The treatment of ringworm of the scalp by the x rays, 1905)
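The extraction of fragments containing a search term was performed with WordSmith Tools; a rough equivalent can be sketched in Python as follows. The function name `find_fragments`, the naive sentence splitter and the sample text are assumptions made for illustration and do not reproduce the authors' procedure.

```python
import re


def find_fragments(text: str, pattern: str) -> list[str]:
    """Return the sentences of `text` in which `pattern` matches as a whole word.

    `pattern` is a regular expression fragment, e.g. "evidence" or "causes?".
    """
    # Naive split after '.', '!' or '?' followed by whitespace.
    # This is an assumption: real corpora need a more robust sentence splitter.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    term = re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)
    return [sentence for sentence in sentences if term.search(sentence)]


# Sample text built from two sentences quoted in the slides.
sample = (
    "Auscultation of the chest revealed evidence of increased activity "
    "in the right upper lobe. Absolute rest is the best treatment."
)

fragments = find_fragments(sample, "evidence")
for fragment in fragments:
    print(fragment)
```

The word boundaries (`\b`) keep a pattern such as `causes?` from matching inside longer words like "because".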

Types of evidence:
• direct observation: 27 (23.5%);
• lab analysis, clinical exams, histological analysis: 25 (21.7%);
• statistical analysis: 14 (12.2%);
• literature review: 13 (11.3%);
• experimental data: 11 (9.6%);
• medical instruments: 6 (5.2%);
• others: 5 (4.3%);
• unqualified: 5 (4.3%);
• epidemiological data: 3 (2.6%);
• reasoning, inference: 3 (2.6%);
• medical practice: 3 (2.6%).
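The percentages above can be reproduced from the counts. The short Python check below is only a sketch; it assumes that each percentage is the count divided by the sum of all counts. That sum is 115, which is larger than the 102 fragments and suggests (as an inference, not a statement from the slides) that some fragments were assigned more than one type of evidence.

```python
# Counts per type of evidence, copied from the list above.
evidence_counts = {
    "direct observation": 27,
    "lab analysis, clinical exams, histological analysis": 25,
    "statistical analysis": 14,
    "literature review": 13,
    "experimental data": 11,
    "medical instruments": 6,
    "others": 5,
    "unqualified": 5,
    "epidemiological data": 3,
    "reasoning, inference": 3,
    "medical practice": 3,
}

# Assumed denominator: the total number of type assignments.
total = sum(evidence_counts.values())

# Percentage of each type, rounded to one decimal place as in the slides.
percentages = {
    name: round(100 * count / total, 1)
    for name, count in evidence_counts.items()
}

for name, pct in percentages.items():
    print(f"{name}: {evidence_counts[name]} ({pct}%)")
```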

CAUSE
Out of the 80 papers we extracted and analysed all 103 fragments where the term “cause/s” occurred in a sentence. The analysis criteria included:
• types of sentence (affirmative, negative, hypothetical, interrogative);
• whether the causal relations are communicated as Certain-Known, Uncertain-Believed, Unknown.
Affirmative - Certain
(8) “Koch has thus added to our conviction that the bacillus is the cause of the symptoms, seeing that, as he remarks, it is impossible to suppose that an organism can develop in such enormous numbers at the expense of the vital fluid, without exerting a serious influence upon the system.” (Remarks on micro-organisms, 1880)

Affirmative - Uncertain
(9) “When confronted with a case of this kind, we must avoid the administration of any drug likely to cause either undue contraction or relaxation of the organ. Absolute rest is the best treatment.” (The determinant of abortion and how to combat them, 1907)
Affirmative - Unknown
(10) “…the cause of this symptom is unknown, but sleep is an important factor” (Do asthmatics suffer bronchoconstriction during rapid eye movement sleep?, 1986)

EVIDENCE AND CAUSALITY
Out of the 80 papers we extracted 42 fragments where a relation between evidence and causality is explicit. We found 7 different types of relations:
Type 1. evidence is insufficient to establish a causal link: 14 (33%);
(11) “…Recently there has been much experimental data to show the causative relation of adrenalin to these degenerative changes, but it has not been definitely settled whether this is a direct effect or due to increased tension.” (An address of the treatment of chronic degenerative lesions of the heart and aorta, 1909)

Type 2. evidence establishes a causal link: 9 (21.4%);
(12) “…This was pretty conclusive evidence that the organism was the cause of the disease, and that it constituted the true infective element; because any other material that might be supposed to accompany it in the blood of the diseased animal must have been got rid of by the successive cultivations in chicken-broth.” (Remarks on micro-organisms, 1880)
• Type 3. there is no evidence of a causal link: 9 (21.4%);
• Type 4. evidence denies a causal link: 3 (7.1%);
• Type 5. evidence suggests the existence of a causal link: 3 (7.1%);
• Type 6. evidence shows a weak causal link: 3 (7.1%);
• Type 7. evidence suggests the non-existence of a causal link: 1 (2.4%).

MAIN RESULTS
Analysis of the terms “evidence” and “cause/s” shows that in the BMJ corpus they are mainly communicated
• as Certain-Known and
• in affirmative sentences.
• Out of the 11 different types of evidence we identified, the most common are:
  • direct observation: 27 (23.5%);
  • lab analysis, clinical exams, histological analysis: 25 (21.7%);
  • statistical analysis: 14 (12.2%).
• Out of the 7 different relations between evidence and causality we identified, the most common are:
  • evidence is insufficient to establish a causal link: 14 (33%);
  • evidence establishes a causal link: 9 (21.4%);
  • there is no evidence of a causal link: 9 (21.4%).

CONCLUSION & FUTURE STEPS
• At the end of the project we started in 2009, we will have made a significant improvement in our knowledge about the historical evolution of the communication of certainty, uncertainty, evidence, causality and their relationships in the writing of scientific papers within a 168-year span.
We now plan on:
• performing a qualitative analysis of the other terms related to evidence and causality (so far, we have only identified their numerical occurrences using WordSmith Tools);
• verifying the significance of a trend observed in the distribution of the terms related to evidence and causality during the period we consider.

Thanks for your attention!

Results:
• Preliminary results on the corpus data show that there isn’t a significant difference in the use of the different uncertainty markers over the years.
• The results of the NLP experiments show that most of the Uncertainty markers can be recognized with good accuracy (Bongelli et al. 2012a; Bongelli et al. 2012b).
At the moment, we are working on these results and on the identification of the scope of the Uncertainty markers. In their grammar, Quirk et al. (1985) define scope as “…the general term that we shall use to describe the semantic ‘influence’ which such words have on neighbouring parts of a sentence. It deserves attention because of its close connection with the ordering of elements.”
