Representing and coding the knowledge embedded in texts of Health Science Web-published articles
ElPub2007 Conference, Vienna, Austria, June 2007
Carlos Henrique Marcondes (marcon@vm.uff.br), Marília Alvarenga Rocha Mendonça, Luciana Reis Malheiros, Leonardo Cruz da Costa, Tatiana Cristina Paredes Santos, Luciana Guimarães Pereira
UFF - Universidade Federal Fluminense, Rio de Janeiro, Brazil
Keywords: electronic publishing, scientific methodology, scientific communication, knowledge representation, ontologies, Semantic Web
Context
• Scholarly electronic journals are still based on the print model and do not take full advantage of the facilities offered by the Web environment
• Semantic Web Initiative
• Web ontologies are becoming humanity's public knowledge bases, an alternative to the collections in libraries
Problem
• Knowledge is embedded in the text of scientific articles for human reading, in an unstructured format that is not suitable for program processing
• Scientific communication is a slow social process that depends on discourse, text production and reading/interpreting/inquiring until new knowledge is incorporated into the corpus of Science
• IT has been applied to bibliographic information systems to improve scientific communication, providing fast notification of, and access to, full-text documents. But IT is not yet used to directly process the knowledge embedded in the text of scientific articles
Question
• Is it feasible to develop a Web authoring/self-publishing tool that enables scientific articles to be published both as text and as the knowledge embedded in that text, extracted and recorded in a program-“understandable” format?
• Knowledge extracted and recorded in a program-“understandable” format will enable inferences by software agents:
  • consistency checking and validation of new contributions to Science
  • rich semantic retrieval, etc.
  • identification of scientific discoveries
Research objectives
• Extract, represent and code, in a program-“understandable” format, the knowledge embedded in texts of Health Science Web-published articles
  • Step 1 – develop a model of the scientific reasoning procedure and of the knowledge content of a scientific article in a program-“understandable” format √
  • Step 2 – empirically test, validate and enhance the model by analyzing articles in Health Science √
  • Step 3 – develop a Web authoring/self-publishing tool that enables the extraction, mark-up and recording of knowledge as a by-product of a scholar writing and publishing a scientific article
  • Step 4 – use the model to identify discoveries in Science
Hypotheses
• Scientific articles are highly structured pieces of text reflecting reasoning procedures established by the Scientific Method
  • “The text of observational and experimental articles is usually… divided into sections with the headings IMRAD - Introduction, Methods, Results, and Discussion. This structure is not simply an arbitrary publication format, but rather a direct reflection of the process of scientific discovery”, Uniform Requirements for Manuscripts Submitted to Biomedical Journals (http://www.icmje.org)
• Knowledge embedded in the text of scientific articles has the form of relations between phenomena
  • for example: “smoking causes lung carcinoma”
• “A hypothesis (from Greek ὑπόθεσις) is a suggested explanation of a phenomenon or reasoned proposal suggesting a possible correlation between multiple phenomena”, Wikipedia, http://en.wikipedia.org/wiki/Hypothesis
Methodology
• An initial model was proposed, based on the semantic elements of the scientific method, such as Problem, Hypotheses, Methodology, Results and Conclusion
• The model was tested with 60 journal articles:
  • 20 from Memorias do Instituto Oswaldo Cruz, http://www.scielo.br/revistas/mioc
  • 20 from Brazilian Journal of Medical and Biological Research, http://www.scielo.br/revistas/bjmbr
  • 20 about stem cells in international journals (in progress)
• Test results were used to enhance the model
The Proposed Model
• Model of the authoring/self-publishing Web environment
• Model of the scientific reasoning procedure and of the knowledge content of a scientific article, as an ontology
… and the future development of a tool to mark up/record this knowledge in a program-“understandable” format
Authoring/Self-Publishing Web environment
[Figure: diagram of the environment. Components shown: Author/scholar; Authoring tool; Scientific article (text); Knowledge represented in program-readable format; Semantic citations; Semantic relations; Scientific literature in a domain, Web published; Web ontology (like UMLS); Semantic retrieval, validation and consistency-checking tools; Researcher, reader.]
Reasoning Procedures in scientific articles
• Experimental-inductive articles
• Experimental-deductive articles
• Theoretical-abductive articles
Reasoning Procedures in scientific articles
• Experimental-inductive articles
  • a PROBLEM is identified, with the following aspects and data;
  • a possible solution to this PROBLEM can be based on the following new HYPOTHESIS;
  • on the basis of this original HYPOTHESIS, the PROBLEM has the following empirical manifestation;
  • we developed an EXPERIMENT to test this manifestation, and it yielded the following RESULTS.
Reasoning Procedures in scientific articles
• Experimental-deductive articles
  • a PROBLEM is identified, with the following aspects and data;
  • in the literature, previous authors have proposed the following HYPOTHESES;
  • we choose the following previous HYPOTHESIS and test, enlarge and re-contextualize it with the following EXPERIMENT;
  • the test shows the following RESULTS in this new CONTEXT.
Reasoning Procedures in scientific articles
• Theoretical-abductive articles
  • a PROBLEM is identified, with the following aspects and data;
  • the previous authors/HYPOTHESES are not satisfactory for solving the PROBLEM, owing to the following criticism… ;
  • so we propose this original HYPOTHESIS, which we consider a new pathway to solving the PROBLEM.
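The three reasoning procedures can be read as ordered templates that an authoring tool might walk an author through. The sketch below is a minimal illustration under that assumption: the `Reasoning` enum, the `TEMPLATES` element lists (named after the capitalised words on the slides) and the `missing_elements` helper are hypothetical names, not part of the authors' model.

```python
from enum import Enum


class Reasoning(Enum):
    """The three reasoning procedures described on the slides."""
    EXPERIMENTAL_INDUCTIVE = "experimental-inductive"
    EXPERIMENTAL_DEDUCTIVE = "experimental-deductive"
    THEORETICAL_ABDUCTIVE = "theoretical-abductive"


# Ordered elements per reasoning type, named after the capitalised words on
# the slides. The exact element lists are an assumption of this sketch.
TEMPLATES: dict[Reasoning, list[str]] = {
    Reasoning.EXPERIMENTAL_INDUCTIVE: ["PROBLEM", "HYPOTHESIS", "EXPERIMENT", "RESULTS"],
    Reasoning.EXPERIMENTAL_DEDUCTIVE: ["PROBLEM", "PREVIOUS HYPOTHESES", "EXPERIMENT",
                                       "RESULTS", "CONTEXT"],
    Reasoning.THEORETICAL_ABDUCTIVE: ["PROBLEM", "PREVIOUS HYPOTHESES", "HYPOTHESIS"],
}


def missing_elements(reasoning: Reasoning, filled: dict[str, str]) -> list[str]:
    """Return, in template order, the elements the author has not filled in yet.

    An element counts as missing if it is absent or contains only whitespace.
    """
    return [e for e in TEMPLATES[reasoning] if not filled.get(e, "").strip()]
```

For instance, `missing_elements(Reasoning.THEORETICAL_ABDUCTIVE, {"PROBLEM": "..."})` returns the previous hypotheses and the new hypothesis as still to be written. Keeping each template as a plain ordered list lets a tool prompt for elements in the sequence the reasoning procedure follows.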
Analysis procedure (simulating the authoring/self-publishing tool)
CAMARA, Geni NL, CERQUEIRA, Daniela M, OLIVEIRA, Ana PG et al. Prevalence of human papillomavirus types in women with pre-neoplastic and neoplastic cervical lesions in the Federal District of Brazil. Mem. Inst. Oswaldo Cruz [online]. Oct. 2003, vol.98, no.7
Three steps:
• Step 1 – The type of reasoning is identified: experimental-deductive
• Step 2 – “Elements of knowledge” are identified in the text, such as the main hypothesis stated by the authors: “HPV causes pre-neoplastic and neoplastic cervical lesions”
  • Knowledge as a relation: Antecedent: HPV; Type of Relation: causes; Consequent: pre-neoplastic and neoplastic cervical lesions
• Step 3 – Each of these elements is mapped to “Public Knowledge”: UMLS, UMLS Semantic Network*
  • Papillomavirus, Human
  • “Causes”, UMLS Semantic Network relation T147
  • Colonic Neoplasms
  • Tumor Virus Infections/pathology, Tumor Virus Infections/virology
*We used DeCS, the Portuguese version of MeSH (Medical Subject Headings), the main vocabulary in UMLS
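Steps 2 and 3 can be sketched as a record with three slots plus a lookup against a controlled vocabulary. This is a minimal sketch, assuming a hypothetical in-memory `VOCABULARY` dictionary that simply copies the mappings listed on the slide; a real tool would query DeCS/MeSH and the UMLS Semantic Network rather than a hard-coded table. `KnowledgeRelation` and `map_relation` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeRelation:
    """Step 2: an element of knowledge expressed as a relation."""
    antecedent: str
    relation: str
    consequent: str


# Step 3, hypothetical stand-in for DeCS/MeSH and the UMLS Semantic Network.
# The entries copy the mappings listed on the slide.
VOCABULARY: dict[str, list[str]] = {
    "HPV": ["Papillomavirus, Human"],
    "causes": ["causes (UMLS Semantic Network relation T147)"],
    "pre-neoplastic and neoplastic cervical lesions": [
        "Colonic Neoplasms",
        "Tumor Virus Infections/pathology",
        "Tumor Virus Infections/virology",
    ],
}


def map_relation(rel: KnowledgeRelation) -> dict[str, list[str]]:
    """Map each slot to vocabulary terms; an empty list means it could not be mapped."""
    return {
        slot: list(VOCABULARY.get(getattr(rel, slot), []))
        for slot in ("antecedent", "relation", "consequent")
    }


hpv = KnowledgeRelation("HPV", "causes",
                        "pre-neoplastic and neoplastic cervical lesions")
print(map_relation(hpv)["antecedent"])  # prints ['Papillomavirus, Human']
```

Returning an empty list for an unmapped slot, instead of raising an error, keeps the failure visible as data, which is what a later novelty check would need.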
Ontology for knowledge in scientific articles
[Figure: ontology diagram. Classes shown: Reasoning (Experimental, Theoretical; Inductive, Deductive); Hypothesis; Prev-Hypoth; Problem; Conclusions; Experiment; Results; Measure; Context (Space, Time, Group); References; Title; URN; Type-of-Rel; Consequent; Antecedent.]
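One possible reading of the ontology's classes as Python dataclasses follows. It is an assumption-laden sketch: the links between the diagram's boxes were lost, so the nesting (a hypothesis made of Antecedent, Type-of-Rel and Consequent; Context made of Space, Time and Group) is inferred from the labels and from the Step 2 description of knowledge as a relation. The example instance uses the article analysed above, with a made-up URN.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Hypothesis:
    """Knowledge as a relation: Antecedent, Type-of-Rel, Consequent."""
    antecedent: str
    type_of_rel: str
    consequent: str


@dataclass
class Context:
    """Where, when and in which group the knowledge is claimed to hold."""
    space: Optional[str] = None
    time: Optional[str] = None
    group: Optional[str] = None


@dataclass
class Article:
    """One article described with the ontology's elements (assumed layout)."""
    title: str
    urn: str
    reasoning: str  # e.g. "experimental-deductive"
    hypothesis: Hypothesis
    problem: Optional[str] = None
    prev_hypotheses: list[Hypothesis] = field(default_factory=list)
    experiment: Optional[str] = None
    results: Optional[str] = None
    measure: Optional[str] = None
    context: Context = field(default_factory=Context)
    conclusions: Optional[str] = None
    references: list[str] = field(default_factory=list)


camara = Article(
    title=("Prevalence of human papillomavirus types in women with pre-neoplastic "
           "and neoplastic cervical lesions in the Federal District of Brazil"),
    urn="urn:example:camara-2003",  # hypothetical URN, not the article's identifier
    reasoning="experimental-deductive",
    hypothesis=Hypothesis("HPV", "causes",
                          "pre-neoplastic and neoplastic cervical lesions"),
    # Space and group are taken from the article title; time is left unknown.
    context=Context(space="Federal District of Brazil", group="women"),
)
```

Only the title, URN, reasoning type and main hypothesis are required here; the remaining elements default to empty so that a record can be built up incrementally while the author writes.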
Potential of the model – semantic retrieval
• Which other articles have hypotheses suggesting HPV as the cause of cervical neoplasias in women?
• Which articles have hypotheses suggesting causes other than HPV for cervical neoplasias in women?
• Which articles have hypotheses suggesting HPV as the cause of cervical neoplasias in groups other than women?
• Which articles have hypotheses suggesting HPV as the cause of pathologies other than neoplasias?
• Which articles have hypotheses suggesting HPV as the cause of cervical neoplasias in different contexts (not in women from the Federal District, Brazil)?
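The retrieval questions above amount to filters on the slots of stored hypotheses, some of them negated (“causes other than HPV”, “groups other than women”). A minimal sketch, assuming hypotheses are stored as flat `HypothesisRecord` rows (a hypothetical structure) and compared by exact string; apart from the CAMARA article, the records are invented placeholders, not real articles.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

Predicate = Callable[[Optional[str]], bool]


@dataclass(frozen=True)
class HypothesisRecord:
    """One hypothesis, flattened together with the context of its article."""
    article: str
    antecedent: str
    relation: str
    consequent: str
    group: Optional[str] = None
    space: Optional[str] = None


def eq(value: str) -> Predicate:
    """Predicate: the slot equals value."""
    return lambda v: v == value


def ne(value: str) -> Predicate:
    """Predicate: the slot differs from value (the 'other than' questions)."""
    return lambda v: v != value


def query(records: Iterable[HypothesisRecord], **conditions: Predicate) -> list[str]:
    """Return the articles whose hypothesis satisfies every slot predicate."""
    return [r.article for r in records
            if all(pred(getattr(r, name)) for name, pred in conditions.items())]


# Invented placeholder records; only the first corresponds to a real article.
records = [
    HypothesisRecord("camara-2003", "HPV", "causes", "cervical neoplasias",
                     "women", "Federal District, Brazil"),
    HypothesisRecord("article-B", "agent X", "causes", "cervical neoplasias", "women"),
    HypothesisRecord("article-C", "HPV", "causes", "cervical neoplasias",
                     "women", "region Y"),
]

# Causes other than HPV for cervical neoplasias in women.
print(query(records, antecedent=ne("HPV"), consequent=eq("cervical neoplasias"),
            group=eq("women")))  # prints ['article-B']
# HPV as a cause of cervical neoplasias outside the Federal District, Brazil.
print(query(records, antecedent=eq("HPV"),
            space=ne("Federal District, Brazil")))  # prints ['article-C']
```

Questions such as “pathologies other than neoplasias” need more than string comparison: a fuller implementation would compare the mapped vocabulary concepts and use the vocabulary's hierarchy to decide what counts as a neoplasia.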
Potential of the model
• Software agents can navigate a network of scientific articles published according to the model outlined here and make inferences…
  • for semantic knowledge retrieval
  • to validate and consistency-check new contributions to Science
    • Is the knowledge in an article consistent with the knowledge recorded in a public Web ontology?
  • to identify novelties in Science
    • A failure to map one or more elements of a “record of knowledge” may be a trace of a scientific discovery
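The validation and novelty checks can be sketched as looking up each element of a “record of knowledge” in a public ontology. This is a minimal sketch, assuming a hypothetical toy ontology made of a set of concept labels (`KNOWN_CONCEPTS`) and a set of recorded relations (`KNOWN_RELATIONS`); these are placeholder entries, not actual UMLS content. Unmapped elements are flagged as a possible trace of discovery, as the slide suggests, and a relation between known concepts that the ontology does not record is flagged for validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A 'record of knowledge': antecedent, type of relation, consequent."""
    antecedent: str
    relation: str
    consequent: str


# Hypothetical toy ontology (placeholder entries, not actual UMLS content).
KNOWN_CONCEPTS = {"Papillomavirus, Human", "Uterine Cervical Neoplasms", "causes"}
KNOWN_RELATIONS = {
    Relation("Papillomavirus, Human", "causes", "Uterine Cervical Neoplasms"),
}


def assess(rel: Relation) -> str:
    """Classify a new record against the toy ontology."""
    unmapped = [term for term in (rel.antecedent, rel.relation, rel.consequent)
                if term not in KNOWN_CONCEPTS]
    if unmapped:
        # Failure to map an element: a possible trace of a discovery.
        return "possible novelty: unmapped " + ", ".join(unmapped)
    if rel in KNOWN_RELATIONS:
        return "consistent: relation already recorded"
    # Every element maps, but the ontology does not record this relation.
    return "new relation between known concepts: needs validation"
```

A real consistency check would also have to detect contradictions with recorded knowledge, which this sketch does not attempt; it only separates already-recorded, new and unmappable records.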
Open questions and future research
• The feasibility of the model in scientific areas other than Health Sciences
• The need for a taxonomy of relations in Science
• Is an Sk-ML (Scientific Knowledge Markup Language) feasible?
• Guidelines for the development of a Web authoring/self-publishing interactive scientific editor to implement the model
Comments are welcome!
http://www.professores.uff.br/marcondes
marcon@vm.uff.br