eScience at:

eScience at: Barend Mons. WHAT I WILL NOT TALK ABOUT. Goal: in silico knowledge discovery via pattern recognition in big data. Nature, June 6, 2012. Nature, June 10, 2012. Cell, March 16, 2012. Nature, June 21, 2012. Genetics in Medicine, March 2011.


Presentation Transcript


  1. eScience at: Barend Mons WHAT I WILL NOT TALK ABOUT:

  2. Goal: in silico knowledge discovery via pattern recognition in big data. Nature, June 6, 2012; Nature, June 10, 2012; Cell, March 16, 2012; Nature, June 21, 2012; Genetics in Medicine, March 2011.

  3. The Data Challenge: no option to go on like this
     • Computer speed and storage capacity are doubling every 18 months, and this rate is steady.
     • DNA sequence data has been doubling every 6-8 months over the last 3 years and looks set to continue for this decade.
     Guy Cochrane, ENA, EMBL-EBI
     Soon enough, data stewardship and analysis will be THE limiting factor in eScience.

  4. [Diagram labels: All Legacy information; New dataset; New Insights; User]

  5. Nanopublications & Cardinal Assertions
     A Nanopublication is the smallest unit of publishable information, containing:
     • Assertion: a statement of concepts in terms of one or more ‘subject -> predicate -> object’ (triple) relationships.
     • Provenance:
       Attribution: who made this assertion, when and where?
       Supporting information: any other information which is relevant to the assertion (e.g. this assertion is only valid in humans under 18).
     A Cardinal Assertion aggregates all ‘n’ Nanopublications making the same assertion. It therefore has 1 assertion and ‘n’ provenances, eliminating redundancy.
     [Diagram labels: Nanopublication; Cardinal Assertion: 1 identical assertion, ‘n’ different provenances]

  6. Managing volume & complexity: combining Cardinal Assertions with Concept Profiles reduces the amount of data by ≈99.999996%. Individual Nanopublications: > 10^14. Individual Cardinal Assertions: > 10^11. Individual Concept Profiles: ≈4 x 10^6.

  7. Information compression: from 10^14 nanopublications, to 10^11 cardinal assertions, to a concept web of only 10^6 Knowlets: from a 1.5 M euro machine to my local server > Data reduction for KD!

  8. Traditional Publishing >>>> eScience Publishing: Computer Reasoning (takes charge); Giga size of datasets (beyond narrative); Collaborative Intelligence (calls for million minds); Irreversible movement (to OA)

  9. Data and information interoperability in the digital science era. [Diagram with layers numbered 1 to 6; labels: New Insights; Confirmational reading (full provenance); Research Community; Publication; App-GUI (x4); Multiple Analytics-Enabling Environments (any format); Interoperable Exchange Environment (RDF-Open PHACTS-type); Any Format Databases (curated); Patient Blogs; Clinical Data (TranSmart); Data Sets (TranSmart)]

  10. The Safe Data Harbour Ground Plan. [Diagram labels: Public Terminal; Published Commons; Data and Analysis Market Place; Freemium Terminal; Today; Mixed Public/secure Terminal (patient data); High Security Terminal]
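
The aggregation described on slide 5 and the reduction figure quoted on slide 6 can be illustrated with a minimal Python sketch. It is not the author's implementation: the class names, field names, and the `aggregate` and `reduction_percent` helpers are hypothetical, the example triples and provenances are made-up placeholders, and the slide's counts ("> 10^14" nanopublications, "≈4 x 10^6" concept profiles) are treated as exact values purely for the arithmetic.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    """One 'subject -> predicate -> object' triple (hypothetical model)."""
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Provenance:
    """Attribution (who, when, where) plus optional supporting information."""
    author: str
    date: str
    source: str
    supporting_info: str = ""


@dataclass(frozen=True)
class Nanopublication:
    """Smallest publishable unit: one assertion with its provenance."""
    assertion: Assertion
    provenance: Provenance


@dataclass
class CardinalAssertion:
    """One assertion plus the provenances of all nanopublications making it."""
    assertion: Assertion
    provenances: list = field(default_factory=list)


def aggregate(nanopubs):
    """Group nanopublications that make an identical assertion."""
    grouped = defaultdict(list)
    for nanopub in nanopubs:
        grouped[nanopub.assertion].append(nanopub.provenance)
    return [CardinalAssertion(a, provs) for a, provs in grouped.items()]


def reduction_percent(before, after):
    """Percentage by which the number of items shrinks from before to after."""
    return 100.0 * (1.0 - after / before)


# Placeholder data, invented only for illustration.
shared = Assertion("gene_A", "is_associated_with", "disease_X")
nanopubs = [
    Nanopublication(shared, Provenance("author_1", "2011-03", "paper_1")),
    Nanopublication(shared, Provenance("author_2", "2012-06", "paper_2")),
    Nanopublication(
        Assertion("drug_B", "inhibits", "protein_C"),
        Provenance("author_3", "2012-06", "paper_3", "only valid in humans under 18"),
    ),
]

cardinal = aggregate(nanopubs)
print(len(nanopubs), "nanopublications ->", len(cardinal), "cardinal assertions")

# Slide 6 figures, taken as exact: 10^14 nanopublications, 4 x 10^6 concept profiles.
print(f"{reduction_percent(1e14, 4e6):.6f}%")
```

With the slide's own numbers, 4 x 10^6 / 10^14 = 4 x 10^-8, so the reduction is 100 x (1 - 4 x 10^-8) = 99.999996%, which matches the percentage quoted on slide 6. Grouping on the frozen `Assertion` dataclass works because identical triples compare and hash as equal; a real system would first need to map synonymous concepts to shared identifiers, which this sketch does not attempt.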
