
Data Repositories & Linked Data

ARD Prasad, DRTC, Indian Statistical Institute, ard@drtc.isibang.ac.in


Presentation Transcript


  1. Data Repositories & Linked Data. ARD Prasad, DRTC, Indian Statistical Institute, ard@drtc.isibang.ac.in

  2. Looking Back. Open Access to Information (OAI) has been a fairly successful movement, resulting in Open Access repositories (> 2000) and Open Access journals (> 5000), and partially bridging the digital divide in the social, physical, and natural sciences and the humanities.

  3. Nature of Publications. Many publications use data, but the article itself may not include the complete data used: for lack of space; because the author overlooked the data; or because the author deliberately withheld the data so that others cannot verify it.

  4. For Example. Some suspect that Sigmund Freud's data concern fictitious persons, not just fictitious names.

  5. If data is available ... Others may draw conclusions that contradict the author's. Others may deal with other facets of the data. Data transparency supplements the objectivity and self-corrective character of science. If patients' case histories are openly available, they will contribute significantly to medical research.

  6. Digital Divide. The social sciences do not require laboratory infrastructure, but the physical and natural sciences do require expensive infrastructure. If experimental data is available to scientists who lack such infrastructure, it will significantly reduce the digital divide in the physical and natural sciences. ODA is a step toward transparency and quality in science.

  7. For Example. Human Genome data. Data from accelerator labs (CERN). The recent controversy about a particle moving faster than light. Not surprisingly, astronomy data was openly available even before the OA movement.

  8. Features of Open Data Repositories. Metadata: specify who the owner, creator, etc. are. License the data to waive your rights, to facilitate bulk download. Open data technology tools: automate data extraction, preferably on the cloud. Ontology: index the data.

  9. Licences. Creative Commons licences (apart from CCZero), GPL, BSD, etc. are NOT quite appropriate as open data licences.

  10. Open Data Licences. Open Data Commons Public Domain Dedication and Licence (PDDL): dedicate to the public domain (all rights waived). Open Data Commons Attribution License: attribution for data(bases). Open Data Commons Open Database License (ODbL): Attribution-ShareAlike for data(bases). Creative Commons CCZero: dedicate to the public domain (all rights waived).

  11. Amazon Web Services (AWS). Public Data Sets on AWS: Annotated Human Genome Data provided by Ensembl (the Ensembl project produces genome databases for human as well as almost 50 other species, and makes this information freely available). Various US Census databases from the US Census Bureau: demographic data from US censuses; summary information about business and industry; economic household profile data. UniGene, provided by the National Center for Biotechnology Information.

  12. Astronomy. Sloan Digital Sky Survey DR6 Subset.

  13. Biology. Influenza Virus (including updated Swine Flu sequence); Ensembl Annotated Human Genome Data for MySQL; GenBank.

  14. Chemistry. PubChem Library: a data set of information on the biological activities of small molecules. 3D Version of the PubChem Library. UGI Virtual Conformer Library: 500,000 molecules for virtual screening.

  15. Climate. Daily Global Weather Measurements, 1929-2009.

  16. Economics. Federal Reserve Economic Data; Transportation Databases; Labor Statistics Databases; US Census Business and Industry Summary Data.

  17. Digital Curation. Collecting verifiable digital assets. Providing digital asset search and retrieval. Certifying the trustworthiness and integrity of the collection content. Semantic and ontological continuity and comparability of the collection content. Use of open standards (formats) for long-term preservation and future-proofing by migration of data.
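
The integrity point on this slide is often supported in practice by a fixity check: a checksum is recorded for each asset at ingest and recomputed later to detect changes. The slides do not name a method, so the following is only a minimal sketch under that assumption; the function names, the in-memory `assets` dictionary, and the choice of SHA-256 are all illustrative.

```python
import hashlib


def checksum(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(assets):
    """Map each asset name to the checksum of its content at ingest time."""
    return {name: checksum(content) for name, content in assets.items()}


def verify(assets, manifest):
    """Return the sorted names of assets that are missing or have changed."""
    return sorted(
        name
        for name, digest in manifest.items()
        if name not in assets or checksum(assets[name]) != digest
    )


# Hypothetical collection: file names and contents are made up for illustration.
collection = {"weather.csv": b"1929,12.5\n", "census.csv": b"2000,281\n"}
manifest = build_manifest(collection)

collection["weather.csv"] = b"1929,99.9\n"  # simulate silent corruption
damaged = verify(collection, manifest)      # names that fail the check
```

A real repository would store the manifest separately from the assets, so that corruption of one does not silently affect the other.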

  18. Technology. Data repositories are much larger than OA repositories. Cloud computing is a good solution (as AWS uses). Semantic Web and Linked Data (linking data through various methods).

  19. Resource Description in terms of Metadata and Ontology. RDF: Resource Description Framework. SKOS: Simple Knowledge Organization System. OWL: Web Ontology Language. SPARQL: SPARQL Protocol and RDF Query Language.

  20. RDF Example. Title: Dil-E-Naddan; Artist: Talat Mahamed; Artist: Suraiya; Company: HMV; Country: India; Price: Rs.100; Year: 1955.

  21. <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:cd="http://www.recshop.fake/cd#">
        <rdf:Description rdf:about="http://www.hmv.com/cd/Dil-E-Naddan">
          <cd:artist>Talat Mahamed</cd:artist>
          <cd:artist>Suraiya</cd:artist>
          <cd:country>India</cd:country>
          <cd:company>HMV</cd:company>
          <cd:price>Rs. 100</cd:price>
          <cd:year>1955</cd:year>
        </rdf:Description>
      </rdf:RDF>
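
To make the link between this RDF/XML record and the triple model concrete, the sketch below reads the record with Python's standard `xml.etree.ElementTree` and lists it as (subject, predicate, object) triples. It is a simplification that handles only flat `rdf:Description` elements with literal values, not the full RDF/XML syntax. The helper names `extract_triples` and `match` are illustrative, and `match` is only loosely analogous to a single SPARQL triple pattern with wildcards; it is not a SPARQL engine.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The record from the slide, with the closing </rdf:RDF> tag supplied.
RDF_XML = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:cd="http://www.recshop.fake/cd#">
  <rdf:Description rdf:about="http://www.hmv.com/cd/Dil-E-Naddan">
    <cd:artist>Talat Mahamed</cd:artist>
    <cd:artist>Suraiya</cd:artist>
    <cd:country>India</cd:country>
    <cd:company>HMV</cd:company>
    <cd:price>Rs. 100</cd:price>
    <cd:year>1955</cd:year>
  </rdf:Description>
</rdf:RDF>"""


def extract_triples(rdf_xml):
    """Return (subject, predicate, object) tuples for flat rdf:Description blocks."""
    root = ET.fromstring(rdf_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree writes tags as "{namespace}local"; the predicate URI
            # is the namespace and the local name joined together.
            namespace, local_name = prop.tag[1:].split("}", 1)
            triples.append((subject, namespace + local_name, prop.text))
    return triples


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every given term; None is a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


triples = extract_triples(RDF_XML)
artists = [o for _, _, o in match(triples, p="http://www.recshop.fake/cd#artist")]
```

Here `artists` holds both values of `cd:artist`, which shows that a resource can carry the same property more than once.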

  22. SKOS Example. prefLabel: the preferred term. altLabel: the "See" references that point to this record. narrower: the related narrower term. broader: a sub-element for the authority type, containing the related broader term. related: a related term at the same level in the hierarchy. scopeNote: note information.
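
The SKOS properties listed on this slide can be pictured as a small in-memory thesaurus. The sketch below is illustrative only: the vocabulary, the concept identifiers, and the helper functions are hypothetical, and a real SKOS vocabulary would be expressed in RDF, not in a Python dictionary. Only `broader` is stored; `narrower` is derived as its inverse.

```python
# Hypothetical vocabulary: identifiers and labels are made up for illustration.
CONCEPTS = {
    "science": {"prefLabel": "Science", "altLabel": [], "broader": []},
    "natural-science": {
        "prefLabel": "Natural sciences",
        "altLabel": [],
        "broader": ["science"],
    },
    "chemistry": {
        "prefLabel": "Chemistry",
        "altLabel": ["Chemical science"],
        "broader": ["natural-science"],
    },
    "biology": {
        "prefLabel": "Biology",
        "altLabel": [],
        "broader": ["natural-science"],
    },
}


def lookup(label):
    """Resolve a preferred or alternative label to its concept id (a 'See' reference)."""
    for concept_id, concept in CONCEPTS.items():
        if label == concept["prefLabel"] or label in concept["altLabel"]:
            return concept_id
    return None


def narrower(concept_id):
    """Derive narrower concepts as the inverse of the stored broader links."""
    return [cid for cid, concept in CONCEPTS.items() if concept_id in concept["broader"]]


def broader_transitive(concept_id):
    """Collect all ancestors by following broader links upward, nearest first."""
    ancestors = []
    queue = list(CONCEPTS[concept_id]["broader"])
    while queue:
        current = queue.pop(0)
        if current not in ancestors:
            ancestors.append(current)
            queue.extend(CONCEPTS[current]["broader"])
    return ancestors
```

For instance, `lookup("Chemical science")` resolves the alternative label to the `chemistry` concept, and `broader_transitive("chemistry")` walks up through `natural-science` to `science`.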

  23. DBpedia Data Set. Multi-domain ontology derived from Wikipedia. 3.77 million "things" (entities - Entitypedia). 400 million "facts". Uses YAGO (Yet Another Great Ontology).

  24. Entitypedia. Multilingual controlled vocabulary; entity matching; data quality and type checking; entity-type-specific services; semantic or faceted search and navigation on entities; summarization of entities and concepts.

  25. DRTC Projects. Living Knowledge (EC-funded project). ITPAR: India-Trento Program for Advanced Research (work on Entitypedia). CHAIN-REDS (EC-funded project): Coordination and Harmonization of Advanced e-Infrastructures, Research & Education Data Sets.

  26. Thank You
