
Online tools and standards for Biodiversity data in the Semantic Web

Dr Dimitris Koureas, Biodiversity Informatics Group | Department of Life Sciences, The Natural History Museum London.

Presentation Transcript


  1. Online tools and standards for Biodiversity data in the Semantic Web Dr Dimitris Koureas Biodiversity Informatics Group | Department of Life Sciences The Natural History Museum London

  2. What is the semantic web? http://… http://… Slide adjusted from Page R. presentation in pro-iBiosphere

  3. What is the semantic web? http://… http://… (link) Slide adjusted from Page R. presentation in pro-iBiosphere

  4. What is the semantic web? http://… http://… http://… Slide adjusted from Page R. presentation in pro-iBiosphere

  5. What is the semantic web? http://… http://… http://… Diagram labels: person, book, "is a", "author of", Fred. Slide adjusted from Page R. presentation in pro-iBiosphere
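
The labels on this slide (person, book, is a, author of, Fred) read like subject-predicate-object statements linking web resources. A minimal sketch of that idea follows, assuming the diagram means "Fred is a person" and "Fred is author of a book". Every URI below is hypothetical, since the slide elides the real ones as http://….

```python
# Hypothetical URIs: the slide only shows "http://…", so these are placeholders.
FRED = "http://example.org/person/fred"
BOOK = "http://example.org/book/1"
PERSON = "http://example.org/type/Person"
IS_A = "http://example.org/rel/isA"
AUTHOR_OF = "http://example.org/rel/authorOf"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (FRED, IS_A, PERSON),
    (FRED, AUTHOR_OF, BOOK),
}


def objects(subject, predicate, graph):
    """Return every object linked to `subject` by `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


books_by_fred = objects(FRED, AUTHOR_OF, triples)
```

Because every node and every link is itself a URI, a machine can follow the statements without knowing in advance what "Fred" or "author of" means.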

  6. The Semantic web: What is the semantic web? “The future of the web …and always will be” – Peter Norvig (Google) Slide adjusted from Page R. presentation in pro-iBiosphere

  7. Biodiversity informatics: the study of the transformation and communication of information in Life and Earth sciences provides the means (generating and enhancing the necessary infrastructure)

  8. Research vs Infrastructure Slide adapted from Patterson D. 2013, Tempe, Arizona

  9. Research • Discovery • Ephemeral • Individualistic • Massive redundancy • Optional • Risk taking vs Infrastructure Slide adapted from Patterson D. 2013, Tempe, Arizona

  10. Research • Discovery • Ephemeral • Individualistic • Massive redundancy • Optional • Risk taking vs Infrastructure • Implementation • Communal / agreed • Essential • Persistent • Robust & reliable • Adaptable. Slide adapted from Patterson D. 2013, Tempe, Arizona

  11. What are the current challenges in Biodiversity informatics?

  12. Current taxonomic data production. Typically generated by small communities for “local” research projects. Publications based on countless specimens, images, maps, keys and datasets. Figure from Costello M.J. et al., 2013, doi: 10.1126/science.1230318

  13. Our current taxonomic data production • 15-20k new spp. described annually (2M total) [1] • 30k nomenclatural acts (12M total) [1] • 20k phylogenies (750k total) [2] • 31k taxa sequenced (360k taxa total) [3] • 800k BioMed papers (40M total pp. of taxonomy) [4] • Countless specimens, images, maps, keys and datasets. 1.8M described spp. (17M names); 300M pages (over last 250 years); 1.5-3B specimens. Figures from [1] Zhang, Zootaxa 2011 4, 1-4; [2] Web-of-Science; [3] GenBank and [4] PubMed.

  14. Now imagine that… Estimates of 7.5 million species still undescribed [1]. [1] Mora C et al., How Many Species Are There on Earth and in the Ocean? doi:10.1371/journal.pbio.1001127

  15. Biodiversity informatics landscape • Key problems • Landscape is complex, fragmented & hard to navigate • Many audiences (policy makers, scientists, amateurs, citizen scientists) • Many scales (global solutions to local problems) Figure adapted from Peterson et al, Syst. & Biodiv. 2010 doi: 10.1080/14772001003739369

  16. Science is global • It needs global standards • Global workflows • Cooperation of global players BUT Science is carried out “locally” • By local scientists • Being part of local infrastructures • Having local funders

  17. Expected volume of taxonomic and biodiversity data. Need for extracting, aggregating and linking data on a global level

  18. To achieve this… • This requires data, information & knowledge to be… • Digital • Not printed paper • Openly accessible • Not behind barriers (e.g. paywalls) • Linked-up • Not in silos “Link together evolutionary data… by developing analytical tools and proper documentation and then use this framework to conduct comparative analyses, studies of evolutionary process and biodiversity analyses” Cyndy Parr, Rob Guralnick, Nico Cellinese and Rod Page. TREE doi:10.1016/j.tree.2011.11.001

  19. Hour-glass motif for big data infrastructure Data re-use Data pool Data generation Slide adapted from Patterson D. 2013, Tempe, Arizona

  20. Big data world with re-use data • Re-use • Quality enhancement • Distribute • Make discoverable and actionable • Atomize • Standardize (metadata, ontology) • Use stable UUIDs to identify content • Preserve • Federate • Register • Make accessible • Normalize data • Structure data • Make data digital Visualization Analysis Aggregation Manipulation Data re-use Data pool Data generation Observations Experiments Models Processed

  21. Big data world with re-use data Visualization Analysis Aggregation Manipulation Data re-use Data pool Data generation Observations Experiments Models Processed

  22. Nodes interconnected • Dynamically interconnected • Nodes with sub-discipline specific responsibilities • Standard Exchange formats • Using UUIDs to identify content • Ontologies • Nodes are the essence of infrastructure Slide adapted from Patterson D. 2013, Tempe, Arizona
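
As an illustration of "Using UUIDs to identify content", here is a minimal sketch with Python's standard `uuid` module. The record URL is hypothetical, and the choice between random and name-based UUIDs is an assumption for illustration, not something the slides prescribe.

```python
import uuid

# Hypothetical record URL, used only to illustrate name-based UUIDs.
RECORD_URL = "http://example.org/specimen/12345"


def random_id():
    """Mint a fresh random (version 4) UUID for a new piece of content."""
    return uuid.uuid4()


def stable_id(url):
    """Derive a repeatable (version 5) UUID from a URL.

    Any node that applies the same rule to the same URL gets the same
    identifier, without consulting a central registry.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, url)


first = stable_id(RECORD_URL)
second = stable_id(RECORD_URL)
```

A random UUID suits content that has no prior identity; a name-based one may be preferable when independent nodes must agree on the identifier for the same source record.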

  23. But how many biodiversity informatics projects are out there?

  24. But how many biodiversity informatics projects are out there? At least 679! Categories: • Data Aggregator - a web site that collates data from a variety of sources (digital and hardcopy) and presents it in one form • Data Indexer - a web site that provides lists or indexes of other sites that provide data • Data Provider - a web site that provides data directly from research or other studies • Data Standards - a web site that contributes to formulating or developing standards for data • Facilitator - a web site that facilitates the provision of data by other projects or web sites. Sources: EDIT, TDWG & ViBRANT 2013

  25. Aggregators GBIF: Our global leader in occurrence data

  26. Aggregators http://www.eu-nomen.eu/portal/ EU-NOMEN - PESI

  27. Aggregators Making taxonomy digital, open & linked

  28. Scratchpads are an integrated system to Enter, Curate, Mark-up, Link and Publish data: the taxonomic workflow in a single virtual environment

  29. The Scratchpads concept External data & services Your data A Scratchpad is a website that holds data for you and your community

  30. 580 Scratchpads Communities by 8,185 active registered users covering 55,607 taxa in 653,274 pages. In total more than 1,300,000 visitors. Unique visitors to Scratchpads sites: 65,000 per month

  31. Facilitators BOLD Barcode of Life Data Systems Researchers can assemble, test, and analyse their data records in BOLD before uploading them to: International Nucleotide Sequence Database Collaboration (DDBJ, ENA, GenBank)

  32. Providers Biodiversity Heritage Library BHL http://www.biodiversitylibrary.org/ Biodiversity literature openly available to the world as part of a global biodiversity community > 40 M pages of legacy literature

  33. Standard Exchange formats

  34. Standard Exchange formats http://rs.tdwg.org/dwc/index.htm Darwin Core (DwC) Primarily used as a specimen records metadata standard
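
To make the idea of a specimen-record metadata standard concrete, here is a minimal sketch of one record keyed by Darwin Core term names. The terms used (`occurrenceID`, `basisOfRecord`, `scientificName`, `institutionCode`, `catalogNumber`, `eventDate`) and the `http://rs.tdwg.org/dwc/terms/` namespace belong to the standard; the record values and the `qualify` helper are invented for illustration.

```python
# Namespace under which Darwin Core terms are published.
DWC_NS = "http://rs.tdwg.org/dwc/terms/"

# Invented specimen record: the values are placeholders, not real collection data.
record = {
    "occurrenceID": "urn:catalog:EXAMPLE:Herb:0001",
    "basisOfRecord": "PreservedSpecimen",
    "scientificName": "Thymus vulgaris L.",
    "institutionCode": "EXAMPLE",
    "catalogNumber": "0001",
    "eventDate": "2013-05-01",
}


def qualify(rec):
    """Expand short term names into full Darwin Core term URIs."""
    return {DWC_NS + term: value for term, value in rec.items()}


qualified = qualify(record)
```

Two databases that both label a column with the same term URI can exchange records without negotiating what each field means.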

  35. Standard Exchange formats http://www.tdwg.org/standards/115/ Access to Biological Collection Data (ABCD) highly detailed and aims to provide a complete set of data elements for natural history collection items

  36. Standard Exchange formats http://www.tdwg.org/standards/638/ Audubon Core Multimedia Resources Metadata Schema The Audubon Core metadata schema ("AC") is a representation-neutral metadata vocabulary for describing biodiversity-related multimedia resources and collections.

  37. Standard Exchange formats Taxonomic Concept Transfer Schema (TCS)  http://tdwg.napier.ac.uk/index.php?pagename=HomePage  Mechanism to exchange data concerning the names of organisms

  38. Standards facilitate systems interoperability

  39. We need Unique Identifiers UPIDs to identify content Identifiers A key to find something in a database.

  40. We need Unique Identifiers 10.4289/0013-8797.115.1.75

  41. We need Unique Identifiers http://hdl.handle.net/10.4289/0013-8797.115.1.75 http://dx.doi.org/10.4289/0013-8797.115.1.75 http://www.google.co.uk/search?q=10.4289/0013-8797.115.1.75 http://zoobank.org/10.4289/0013-8797.115.1.75
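
The four URLs on this slide are one identifier appended to four different service prefixes. A small sketch of that pattern follows, using only the prefixes shown on the slide; the function and dictionary names are hypothetical.

```python
# Service prefixes copied from the slide; the dictionary keys are illustrative labels.
RESOLVER_PREFIXES = {
    "handle": "http://hdl.handle.net/",
    "doi": "http://dx.doi.org/",
    "google": "http://www.google.co.uk/search?q=",
    "zoobank": "http://zoobank.org/",
}


def resolver_urls(identifier):
    """Return one URL per service for the same bare identifier."""
    return {name: prefix + identifier for name, prefix in RESOLVER_PREFIXES.items()}


urls = resolver_urls("10.4289/0013-8797.115.1.75")
```

The identifier stays constant while the services around it can change, which is why the bare string, rather than any one URL, is the thing worth storing.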

  42. We need Unique Identifiers. Can a taxonomic name be used as a UPID? Are taxonomic names enough for communication between scientists? YES. Are taxonomic names enough for communication between machines? CAN BE, IF… Is it Unique? Is it Persistent? Is it an Identifier?

  43. We need Unique Identifiers For example: Page R., Brief Bioinform (2008) 9 (5): 345-354. doi: 10.1093/bib/bbn022

  44. We need Unique Identifiers ONLY IF Name reconciliation Patterson, D. J. et al. 2010. Names are key to the big new biology. TREE 25: 686-691 doi: 10.1016/j.tree.2010.09.004
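
Name reconciliation means recognising that differently written name strings refer to the same name. The sketch below is a deliberately crude illustration, assuming that the first two words of a string form the binomial; real reconciliation services also have to handle synonyms, homonyms, misspellings and ranks below species, none of which is attempted here. All function names are hypothetical.

```python
from collections import defaultdict


def canonical_binomial(name_string):
    """Reduce a name string to 'Genus epithet', dropping authorship and extra spaces."""
    parts = name_string.split()
    if len(parts) < 2:
        raise ValueError(f"not a binomial: {name_string!r}")
    return f"{parts[0].capitalize()} {parts[1].lower()}"


def reconcile(name_strings):
    """Group name strings that share the same canonical binomial."""
    groups = defaultdict(list)
    for name in name_strings:
        groups[canonical_binomial(name)].append(name)
    return dict(groups)


variants = [
    "Homo sapiens Linnaeus, 1758",
    "homo  sapiens",
    "Homo sapiens L.",
    "Thymus vulgaris L.",
]
groups = reconcile(variants)
```

Once variants are grouped, each group can be assigned a single persistent identifier, which is what lets machines treat the name as a key.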

  45. Ontologies Knowledge Organisation Systems The need for Controlled Vocabularies and Ontologies Google has done it: http://googleblog.blogspot.co.uk/2012/05/introducing-knowledge-graph-things-not.html Plant anatomical and structural development Ontology http://www.plantontology.org/

  46. Example of ontology usage • Deans A. et al. Time to change how we describe biodiversity, Trends in Ecology & Evolution 2012 • doi:10.1016/j.tree.2011.11.007

  47. Examples of integrated projects http://protectedplanet.net http://thymus.myspecies.info

  48. How is all this relevant to my work? What should I take home?

  49. Providers Community Data silos Repositories #bigdata

  50. The four nodes of data workflow 1. We collect and generate data 2. We curate, link and structure data 3. We analyse data 4. We publish data
