A method to propagate permissions in biomedical data using a semantic web framework
Helena F. Deus and Jonas S. Almeida
hdeus@mathbiol.org
The University of Texas M. D. Anderson Cancer Center
History of the web
Web 1.0: Links -> Documents
Web 2.0: Links -> Data Structures -> Web Services
Web 3.0: Links -> Web Services -> Links -> Web Services -> Links -> Web Services ...
Evolution of data representation Nature Biotechnology. 2005 Vol 23 Nr 29
Data management in the life sciences
[Diagram: clinical/medical data (records MDAxxxx) from Electronic Health Records, held in an RDBMS. "Life is good!"]
Heterogeneous data management
[Diagram: clinical/medical data (records MDAxxxx) in an RDBMS, alongside core facilities data: DNA sequencing, microarrays, protein arrays, pulsed-field gel electrophoresis. "Data everywhere!"]
A data pyramid
[Pyramid diagram. Levels: Wisdom, Knowledge, Information, Data. Labels: W3C; OWL, OBO; RDF; SPARQL; XML; TEXT; Files]
Snapshots of interfaces using S3DB’s API (Application Programming Interface). These applications illustrate why semantic web designs can be particularly effective at enabling generic tools to help users explore data that document very specific and very complex relationships.
Snapshot A was taken from S3DB’s web interface, which is included in the downloadable package. This interface was developed to assist in managing the database model and is therefore centered on the visualization and manipulation of the domain of discourse: its Collections of Items and the Rules that define how their relations are documented.
Snapshots B-D show a document management tool, S3DBdoc, freely available as a Bioinformatics Station module (see Figure 6). Navigation starts from the Project (C), proceeds to the Collection (B), and ends with the editing of the Statements about an Item (D). Snapshot B illustrates an intermediate step in the navigation, where the list of Items (in this case samples assayed by tissue arrays, for which there is clinical information about the donor) is being trimmed according to a property of a distant entity: Age at Diagnosis, a property of the Clinical Information Collection associated with the sample that originated the array results. This interaction would have been difficult and computationally intensive to manage using a relational architecture.
The RDF-formatted query result produced by the API was also visualized using a commercial tool, Sentient Knowledge Explorer (IO-Informatics Inc), shown in snapshot E, and by Welkin (F), developed by the SIMILE digital interoperability project at the Massachusetts Institute of Technology. See text for discussion of graphic representations by these tools. To protect patient confidentiality, some values in snapshots B and D are scrambled, and numeric sample and patient identifiers elsewhere are altered.
PLoS ONE. 2008 Aug 13;3(8):e2946
Example: TCGA data structure http://tcga.s3db.org
S3DB Rule: http://tcga.s3db.org/R247
[Diagram: Sample ?? Patient; node labels: blood, Sample, Patient, tumor, Tissue]
S3DB Statement: http://tcga.s3db.org/S234
[Diagram: sampleX, patientY, R427]
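The distinction drawn on this slide, between a Rule (a relation declared between Collections in the domain of discourse) and a Statement (an instance of a Rule that links specific Items), can be sketched in Python. This is a minimal illustration under stated assumptions, not S3DB's implementation: the class names, the `related_to` placeholder standing in for the predicate shown as `??` on the slide, and the `conforms` helper are all hypothetical. Only the identifiers S234, sampleX and patientY and the Sample/Patient Collections come from the slide; the slide labels the Rule R247 but the Statement's rule R427, and the sketch assumes both refer to the same Rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A relation between two Collections (domain of discourse)."""
    rule_id: str
    subject_collection: str
    predicate: str
    object_collection: str


@dataclass(frozen=True)
class Statement:
    """An instance of a Rule, linking two specific Items."""
    statement_id: str
    rule_id: str
    subject_item: str
    object_item: str


def conforms(statement, rule, item_collections):
    """Check that a Statement instantiates a Rule.

    item_collections maps each Item id to the Collection it belongs to
    (a hypothetical lookup table for this example).
    """
    return (
        statement.rule_id == rule.rule_id
        and item_collections.get(statement.subject_item) == rule.subject_collection
        and item_collections.get(statement.object_item) == rule.object_collection
    )


# "related_to" is a placeholder: the slide leaves the predicate as "??".
rule = Rule("R247", "Sample", "related_to", "Patient")
item_collections = {"sampleX": "Sample", "patientY": "Patient"}
statement = Statement("S234", "R247", "sampleX", "patientY")

print(conforms(statement, rule, item_collections))
```

Keeping the Rule and the Statement as separate records is what lets generic tools edit the model (Rules) without touching the data (Statements), and check new data against the model.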
TCGA domain - instance PLoS ONE. 2008 Dec;3(12):e4076
Code portability and distributed data
[Diagram labels: API, API, API, SPARQL]
Permission management Markov Model
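The slide gives only this heading. One way to read it is that an entity's effective permission depends only on the entity directly above it, so permissions can be propagated one step at a time through the data hierarchy. The Python sketch below illustrates that reading; it is an assumption, not the authors' algorithm. The hierarchy (project, collection, item, statement), the permission labels `view` and `none`, and the function name `propagate` are all hypothetical.

```python
def propagate(parents, explicit, root_default="none"):
    """Resolve an effective permission for every entity.

    parents:  maps each entity id to its parent id (None for the root).
    explicit: maps entity ids to permissions set directly on them.
    An entity without an explicit permission inherits its parent's
    effective permission, so each value depends only on one step up.
    """
    effective = {}

    def resolve(entity):
        if entity in effective:
            return effective[entity]
        if entity in explicit:
            permission = explicit[entity]
        elif parents[entity] is None:
            permission = root_default
        else:
            permission = resolve(parents[entity])
        effective[entity] = permission
        return permission

    for entity in parents:
        resolve(entity)
    return effective


# Hypothetical hierarchy: project > collection > item > statement.
parents = {
    "project": None,
    "collection": "project",
    "item": "collection",
    "statement": "item",
}
# The user may view the project, but access to one item is withdrawn.
explicit = {"project": "view", "item": "none"}

print(propagate(parents, explicit))
```

In this sketch an explicit assignment overrides the inherited value and is itself inherited further down, which is why the statement under the restricted item ends up restricted as well.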
Experimental evolving ontologies
[Layered diagram, labels in order: Upper ontologies; Intermediate ontologies; Domain-specific ontologies (MGED and others); "Current entry level for computation"; Experimental, evolving data models; "Proposed entry level for computation"; Raw data]
S3DB.ORG
What is S3DB?
• It is a web service that manages semantic web content, distinguishing the domain of discourse from its instantiation. It was configured specifically for the needs of Biomedical Informatics projects where:
• Those who submit the data keep fine-tuned control over its access and use.
• The data model is deployed over a core ontology that allows its editing.
• It has a distributed deployment designed to deal with heterogeneous environments.
What is S3DB not?
• It is not a client application.
• It is not a “work in progress”: a SPARQL endpoint ensures that experimental data is not kept outside of the Linked Data Web until it matures.
In Conclusion
• Dissolution of boundaries between data structures is a good thing… but doing it without losing the role of each data element is even better.
• Some level of explicit granularity in the data is necessary to implement a permission model.
Acknowledgements
Jonas S. Almeida, Kadir Akdemir, Miriã Coelho, Cintia Palú, Pablo Freire
The Integrative Bioinformatics Lab at the University of Texas MD Anderson Cancer Center (Houston, TX)
Instituto de Tecnologia Quimica e Biologica, Universidade Nova de Lisboa (Lisbon, Portugal)
http://s3db.org