This article discusses the challenges of data curation in the digital age and proposes the use of structured data entry, nanopublications, and intellectual networking as potential solutions. It emphasizes the need to remove ambiguity and redundancy, involve multiple minds in curation, and shift towards knowledge-centric approaches. The article also highlights the importance of data publication, citation metrics, and bottom-up standard setting.
“The curation challenge for the next decade: Digital overlap strategy or collective brains?” Barend Mons 07-12-10
Measure ANYTHING in the –omics age…. BIGNORANCE Driven Research
The problem (also with community annotation) in a nutshell
• Structured data entry: nobody likes it
• Structured data: everybody wants it
• Free text/cut and paste: everybody likes it
Remove redundancy: 10^14 to 10^11
We should learn how to reason with the essence of available knowledge
Subject (S), Predicate (P), Object (O) = Assertion
[Diagram labels: a concept record]
• Preferred Term
• UUID
• ARTA table
• Authority
• Preferred URI: http://www.uniprot.org/uniprot/Q25190 (in this case from UniProt)
• Access to all nano-publications with this concept as SUBJECT
• TS ? predicate Object
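The concept record on this slide could be sketched as a small data structure. This is a minimal illustration, not the system described in the talk: the class name `Concept`, its field names, and the choice to derive the UUID deterministically from the preferred URI with `uuid.uuid5` are all assumptions made for the example. The slide does not say how UUIDs are assigned.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical concept record: one preferred term, one authority, one URI."""
    preferred_term: str
    authority: str
    preferred_uri: str

    @property
    def concept_id(self) -> uuid.UUID:
        # Assumption: the identifier is derived from the preferred URI, so the
        # same URI always yields the same UUID regardless of the label used.
        return uuid.uuid5(uuid.NAMESPACE_URL, self.preferred_uri)


# The URI comes from the slide; the labels are placeholders.
protein = Concept(
    preferred_term="placeholder label",
    authority="UniProt",
    preferred_uri="http://www.uniprot.org/uniprot/Q25190",
)
same_protein = Concept(
    preferred_term="a synonym",
    authority="UniProt",
    preferred_uri="http://www.uniprot.org/uniprot/Q25190",
)

print(protein.concept_id == same_protein.concept_id)
```

Under this assumption, two records that differ only in their label resolve to the same identifier, which is one way to remove ambiguity between synonyms.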
Three concepts can form an Assertion
• Concepts (UUID) 1, 2 and 3 take the roles S, P and O
• Triple (S-P-O) = Assertion A1
• Assertion A1 + Annotation = Nano-publication
• Nano-publications NP1 (A1), NP2 (A1), NP3 (A1) and NP4 (A1) share one Cardinal Assertion, CA1
[Diagram labels: CA1, NP1, A1, √e]
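The relationship between nano-publications and a Cardinal Assertion can be sketched as a grouping step: several nano-publications that carry the same S-P-O triple collapse into one entry. The names `Assertion`, `Nanopublication`, `source` and `group_by_cardinal_assertion` are hypothetical, and the random UUIDs stand in for real concept identifiers.

```python
import uuid
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    """An S-P-O triple in which each position is a concept UUID."""
    subject: uuid.UUID
    predicate: uuid.UUID
    obj: uuid.UUID


class Nanopublication(NamedTuple):
    """An assertion plus an annotation (here reduced to a source label)."""
    assertion: Assertion
    source: str


def group_by_cardinal_assertion(nanopubs):
    """Collect nano-publications that make the same assertion."""
    groups = defaultdict(list)
    for nanopub in nanopubs:
        groups[nanopub.assertion].append(nanopub)
    return dict(groups)


# Four nano-publications (NP1..NP4) that all state assertion A1.
s, p, o = (uuid.uuid4() for _ in range(3))
a1 = Assertion(s, p, o)
nanopubs = [Nanopublication(a1, f"NP{i}") for i in range(1, 5)]

cardinal = group_by_cardinal_assertion(nanopubs)
print(len(cardinal), len(cardinal[a1]))
```

Keeping the individual nano-publications under each key preserves the provenance of every statement while the reasoning layer only needs to see the single shared triple.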
Each Cardinal Assertion maps to three UUIDs
• Therefore each new assertion can be checked for uniqueness
• Novel combinations of 3 UUIDs are new Cardinal Assertions
• Cardinal Assertions are sent for Community Review
• Daily reasoning with new ‘credible’ Cardinal Assertions
[Diagram labels: Tweets, blogs; Wikis; Publications; Curations; Associations; Data publication (CA4*); Nano-publication (CA4*); Inferencing; Association A1; CA1, CA2, CA3; Impact Metrics: Beyond the article]
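The uniqueness check described in the bullets above can be sketched with a set of UUID triples. This is an illustration under assumptions: `CardinalAssertionRegistry`, `submit` and `review_queue` are hypothetical names, and the review queue is only a list standing in for whatever community review process is actually used.

```python
import uuid


class CardinalAssertionRegistry:
    """Hypothetical registry that tracks which UUID triples are already known."""

    def __init__(self):
        self._known = set()      # every (S, P, O) triple seen so far
        self.review_queue = []   # novel triples awaiting community review

    def submit(self, subject, predicate, obj):
        """Return True if the triple is a new Cardinal Assertion, else False."""
        triple = (subject, predicate, obj)
        if triple in self._known:
            return False         # redundant: this combination already exists
        self._known.add(triple)
        self.review_queue.append(triple)
        return True


registry = CardinalAssertionRegistry()
s, p, o, other_o = (uuid.uuid4() for _ in range(4))

first = registry.submit(s, p, o)         # novel combination
repeat = registry.submit(s, p, o)        # same three UUIDs again
novel = registry.submit(s, p, other_o)   # same S and P, different O

print(first, repeat, novel, len(registry.review_queue))
```

Because a triple of UUIDs is hashable, the check is a constant-time set lookup on average, which matters if many new assertions arrive each day.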
The IMI Open PHACTS consortium and beyond: IMI + ESFRI + NCBO + VIVO + ORCID, well over 150 partners
In Summary
• We need to remove ambiguity
• We need to remove redundancy
• We need computer-reasonable data to involve a ‘million CPUs’
• We need to involve a ‘million minds’ in curation (preferably at the source)
• We need data PUBLICATION
• We need to do the data citation metrics
• We need bottom-up standard setting by best practice in critical mass (OPS)
Current: article-centric. Future: knowledge-centric.
[Diagram labels: Minutes: Rhetoric, Arguments; Data Sets; Suppl. data; SDA etc.; Nanopubs; TEXT > TRIPLES; DOI; Images; References; article; Nanopubs (Assertions); Curated ontologies; Concept maps; In silico reasoning]