1 / 16

Addressing Health Disparity Through Linked Data Analysis

Explore how linked data analysis can help evaluate health-care disparity using WHO datasets converted from CSV to RDF format. Learn about interlinking data with SILK and validating via linked data querying to draw meaningful conclusions and identify research areas. Discover the limitations and future work for improving data quality and interlinking. Acknowledge Agile Knowledge Engineering & Semantic Web research group for their contributions.

gean
Download Presentation

Addressing Health Disparity Through Linked Data Analysis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Evaluating Health-Care Disparity Employing Linked DataandData -driven Discovery Amrapali Zaveri AKSW, Institut für Informatik 1

  2. Outline • Motivation • Methodology • Datasets • CSV to RDF Conversion • Interlinking using SILK • Validation by Linked Data Querying • Conclusions • Limitations • Future Work

  3. Motivation • According to the World Health Organization (WHO), more than one billion people (i.e. one sixth of the world’s population) suffer from one or more neglected tropical diseases. • This shows a significant imbalance between the research intensity invested for the investigation of certain diseases and their prevalence. • Reason • current absence of accurate, interlinked data and information

  4. Methodology

  5. Datasets

  6. CSV to RDF Conversion • WHO’s GHO dataset • Published as Excel sheets • Advantage • Readable by humans • Disadvantages • Cannot be queried efficiently • Difficult to integrate with other data (in different formats) • Our approach • Converting data into a single data model - RDF • Using SCOVO (Statistical Core Vocabulary)* • designed particularly to represent multidimensional statistical data using RDF.  *Michael Hausenblas,et.al. Scovo: Using statistics on the web of data. In ESWC, 2009. 6

  7. What is SCOVO?

  8. Semi-automated approach • Transforming CSV to RDF in a fully automated way is not feasible. • Dimensions may often be encoded in heading or label of a sheet • Our semi-automatic approach: • As a plug-in in OntoWiki# • a semantic collaboration platform developed by the AKSW research group. • A CSV file is converted into RDF using SCOVO # Sören Auer et.al.: OntoWiki: A Tool for Social Semantic Collaboration In: Proceedings of the Workshop on Social and Collaborative Construction of Structured Knowledge CKC 2007 at the 16th International WWW2007 Banff, Canada, 2007

  9. SCOVOfied GHO Data prefix ex:<http://example.org/who-data> prefix scv:<http://purl.org/NET/scovo> ex:Country rdfs:subClassOf scv:Dimension;                  rdf:type rdfs:Class;                dc:title "Country". ex:Disease rdfs:subClassOf scv:Dimension;                  rdf:type rdfs:Class;                dc:title "Disease". ex:CountryCode rdfs:subClassOf scv:Dimension; rdf:type rdfs:Class; dc:title "CountryCode". ex: Afghanistan rdf:type ex:Country;                       dc:title "Afghanistan" .ex:Tuberculosis rdf:type ex:Disease;                       dc:title "Tuberculosis" . ex:3010 rdf:type ex:CountryCode; dc:title “3010”. ex:c1-r6    rdf:type scv:Item;               rdf:value 127;             scv:dimension ex:Afghanistan;               scv:dimension ex:Tuberculosis . scv:dimension ex:3010 Result: 3 million triples

  10. Interlinking Datasets using SILK

  11. Interlinking Results Interlinks for: • Publications - already present • Disease - used SILK$ • Country - used SILK$ Number of interlinks obtained between datasets $ Julius Volz, Christian Bizer, Martin Gaedke, Georgi Kobilarov: Discovering and Maintaining Links on the Web of Data. International Semantic Web Conference (ISWC2009), Westfields, USA, October 2009. http://www4.wiwiss.fu-berlin.de/bizer/silk/spec/

  12. Validation by Linked Data Querying PREFIX who: <http://ghodata.org/> PREFIX ct: <http://data.linkedct.org/resource/linkedct/> PREFIX pubmed:<http://bio2rdf.org/ns/mesh#> SELECT DISTINCT ?disease ?incidence ?country WHERE { ?x who:country "India" . ?x who:incidence ?incidence . ?x who:disease ?disease . FILTER(?incidence>70) } SELECT DISTINCT ?disease ?country ?noOfTrials WHERE { ?disease who:disease "Tuberculosis" . ?y ct:disease ?disease . ?y ct:noOfTrials ?noOfTrials . ?y ct:country ?country . } SELECT ?country COUNT(?reference) WHERE { ?disease who:disease "Tuberculosis" . ?z ct:disease ?disease . ?z ct:country ?country . ?z pubmed:reference ?reference . }GROUP BY ?country

  13. Conclusions • Which disease has the highest percentage of health-care disparity with respect to the burden of disease and the clinical trials conducted in a particular country? • As a research policy maker, which research area would it be most beneficial to allocate funds? • Who are the key people doing most research for a particular disease? • What has been the trend, overtime, for the health-care disparity for a particular region?

  14. Limitations • Information Quality • Coverage • Interlinking Quality • Propagation of Errors

  15. Future Work • Improve Interlinking • Interlinking with other relevant datasets • Updating knowledge-base as new data is published • Creating a user interface

  16. Acknowledgements • Research group Agile Knowledge Engineering & Semantic Web (AKSW): http://aksw.org • Research on Research Group: http://researchonresearch.duhs.duke.edu/site

More Related