Which Gene did you mean (again?) Barend Mons, HGVS, 12 May 2010.
2005
Measure ANYTHING in the -omics age… BIGNORANCE-driven research
The problem (also with community annotation) in a nutshell
• Structured data entry: nobody likes
• Structured data: everybody wants
• Free text/cut and paste: everybody likes
Ambiguity Redundancy
Redundancy?
• No reviewer would accept the exact same paper twice, let alone 1000 times!
• Still, the same assertion is published over and over again in traditional publishing…
[Diagram: S-P-O (subject, predicate, object) Assertion]
[Diagram: concept page showing Preferred Term • UUID • ARTA table • Authority • Preferred URI (http://www.uniprot.org/uniprot/Q25190) • Access to all nano-publications with this concept as SUBJECT, predicate or object]
Three concepts can form an Assertion…
[Diagram: Concepts (UUID) 1, 2, 3 → Triple (S-P-O) → Assertion A1 → Annotation + A1 = Nano-publication → Cardinal Assertion: NP1 (A1), NP2 (A1), NP3 (A1), NP4 (A1) collapse into CA1 with evidence score √e]
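The chain on this slide (three concepts, one triple, one assertion, many nano-publications) can be sketched as a small data model. This is a minimal illustration only: the class names `Assertion` and `NanoPublication`, the field names, and the randomly generated UUIDs are assumptions made for the example, not a published schema.

```python
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Assertion:
    """A triple of concept UUIDs: subject, predicate, object."""
    subject: UUID
    predicate: UUID
    obj: UUID


@dataclass(frozen=True)
class NanoPublication:
    """An assertion plus the annotation saying who stated it and where."""
    assertion: Assertion
    annotation: str


# Hypothetical concept identifiers, generated at random for the example.
s, p, o = uuid4(), uuid4(), uuid4()
a1 = Assertion(s, p, o)

# Two nano-publications from different (made-up) sources, same assertion.
np1 = NanoPublication(a1, "stated in publication X")
np2 = NanoPublication(Assertion(s, p, o), "stated in curated database Y")

same_assertion = np1.assertion == np2.assertion   # the triple is identical
distinct_nanopubs = np1 != np2                    # the annotations differ
```

Because the assertion is compared by value, the two nano-publications are recognisably about the same statement even though they come from different sources.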
Each Cardinal Assertion is supported by 1-n nano-publications
• Provenance is linked but separated from Cardinal Assertions
• Evidence Factor (computed by Open Algorithm) associated
[Diagram: Nano-publications with Provenance, NP1 (A1) to NP4 (A1), NP1 (A2), NP2 (A2) and NP3 (A3), map to Cardinal Assertions with Evidence score: CA1 √e, CA2 √e, CA3. Subject concepts carry an Also Referred To As (ARTA) table.]
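A rough sketch of how many nano-publications could collapse into one Cardinal Assertion each, with provenance kept alongside but separate. The slide does not specify the Open Algorithm behind the Evidence Factor, so `evidence_factor` below is a placeholder assumption (a count of distinct sources), and the assertion and source labels are invented for the example.

```python
from collections import defaultdict


def collapse(nanopubs):
    """Group (assertion, provenance) pairs by assertion.

    Returns a dict mapping each assertion to the list of provenance
    records that support it: one entry per Cardinal Assertion.
    """
    support = defaultdict(list)
    for assertion, provenance in nanopubs:
        support[assertion].append(provenance)
    return dict(support)


def evidence_factor(provenances):
    # ASSUMPTION: stand-in scoring, not the algorithm from the talk.
    # Here the score is simply the number of distinct sources.
    return len(set(provenances))


# Illustrative input mirroring the slide: four nano-publications for A1,
# two for A2, one for A3. Source names are made up.
nanopubs = [
    ("A1", "paper-1"), ("A1", "paper-2"), ("A1", "blog-1"), ("A1", "curation-1"),
    ("A2", "paper-3"), ("A2", "paper-4"),
    ("A3", "wiki-1"),
]

cardinal = collapse(nanopubs)
scores = {ca: evidence_factor(prov) for ca, prov in cardinal.items()}
```

Reasoning would then operate on the three keys of `cardinal` and their scores, not on the seven individual nano-publications.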
Each Cardinal Assertion maps to three UUIDs
• ARTA table enables mapping across URIs
• Three UUIDs (subject-predicate-object) form a new UUID for the CA
• Computer Reasoning uses Cardinal Assertions + evidence weight
[Diagram: S, P and O concepts, each with an Also Referred To As (ARTA) table; Inferencing and Association over CA1, CA2, CA3]
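One way to picture the two mechanisms above, under stated assumptions: an ARTA table is modelled as a plain dictionary from alternative identifiers to one concept UUID, and the UUID of a Cardinal Assertion is derived deterministically from its three concept UUIDs with a name-based UUID (version 5). The slide does not say how the new UUID is formed, so the `uuid5` derivation, the `urn:example:` names and the `uniprot:Q25190` alias are all hypothetical.

```python
import uuid


def concept_uuid(preferred_uri):
    """Derive a stable concept UUID from a preferred URI (assumption)."""
    return uuid.uuid5(uuid.NAMESPACE_URL, preferred_uri)


subject = concept_uuid("http://www.uniprot.org/uniprot/Q25190")
predicate = concept_uuid("urn:example:predicate")   # hypothetical concept
obj = concept_uuid("urn:example:object")            # hypothetical concept

# ARTA ("Also Referred To As") table: several identifiers, one concept.
ARTA = {
    "http://www.uniprot.org/uniprot/Q25190": subject,
    "uniprot:Q25190": subject,   # hypothetical alternative identifier
}

# Hypothetical namespace for Cardinal Assertion identifiers.
CA_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "urn:example:cardinal-assertion")


def cardinal_assertion_uuid(s, p, o):
    """Combine three concept UUIDs, in S-P-O order, into one UUID."""
    return uuid.uuid5(CA_NAMESPACE, f"{s}|{p}|{o}")


ca_via_uri = cardinal_assertion_uuid(
    ARTA["http://www.uniprot.org/uniprot/Q25190"], predicate, obj)
ca_via_alias = cardinal_assertion_uuid(ARTA["uniprot:Q25190"], predicate, obj)
ca_reversed = cardinal_assertion_uuid(obj, predicate, subject)
```

Whichever identifier an author used, the lookup lands on the same concept and therefore on the same Cardinal Assertion UUID; swapping subject and object yields a different one.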
Each Cardinal Assertion maps to three UUIDs
• Therefore each new assertion can be checked for uniqueness
• Novel combinations of 3 UUIDs are new Cardinal Assertions
• Cardinal Assertions are sent for Community Review
• Daily reasoning with new ‘credible’ Cardinal Assertions
[Diagram: sources (tweets, blogs, wikis, publications, curations, associations) yield a candidate CA4*; Inferencing and Association run over CA1, CA2, CA3]
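The uniqueness check and review step can be sketched as a small registry. This is an illustration under assumptions: the class name, method names and return strings are invented here, and concept identifiers are shown as short strings in place of real UUIDs.

```python
class CardinalAssertionRegistry:
    """Tracks accepted Cardinal Assertions and candidates awaiting review."""

    def __init__(self):
        self.known = set()        # accepted (subject, predicate, object) triples
        self.review_queue = []    # novel triples waiting for community review

    def submit(self, s, p, o):
        """Check a triple for uniqueness; queue it for review if novel."""
        triple = (s, p, o)
        if triple in self.known or triple in self.review_queue:
            return "redundant"
        self.review_queue.append(triple)
        return "sent for review"

    def accept(self, triple):
        """Move a reviewed triple into the set used for daily reasoning."""
        self.review_queue.remove(triple)
        self.known.add(triple)


registry = CardinalAssertionRegistry()
first = registry.submit("uuid-s", "uuid-p", "uuid-o")    # novel combination
second = registry.submit("uuid-s", "uuid-p", "uuid-o")   # same triple again
registry.accept(("uuid-s", "uuid-p", "uuid-o"))
third = registry.submit("uuid-s", "uuid-p", "uuid-o")    # already accepted
```

Only the first submission reaches reviewers; every later restatement of the same triple is recognised as redundant.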
Conceptwiki: People are concepts too…
• People make nano-publications
• People have concepts of interest
• People have Smart Phones
[Diagram: all nano-publications with this concept as SUBJECT or generated by this Person]
Triple detection / Redundancy check
Quertle (28-12-09) on PubMed:
• CLCN2-generalized epilepsy (C=19, R=5), causes: R=0! Human annotation of triple > no evidence… nano-publication
• PNPLA3-Alcoholic liver disease (0) > new triple reported > reviewed > nano-publication
• NRAS-Noonan syndrome (C=5, R=1), causes: R=0 > new triple reported > reviewed > nano-publication
• DLK1-type 1 diabetes (C=3, R=0), causes: R=0 > new triple reported > reviewed > nano-publication
• IFN2-focal segmental glomerulosclerosis (C=0, R=0) > new triple reported > reviewed > nano-publication
• PARK2-Glioblastoma (C=4, R=0) > new triple reported > reviewed > nano-publication
Wikipedia Scholar/Commontology (projects starting as I speak)
The statement "phosphorylation of p21 promotes cell survival" is first run through a tagger (for instance Peregrine) and shows up to the author/editor as: Phosphorylation of p21 promotes cell survival (red means ambiguous).
When the user clicks on p21, which is a highly ambiguous term, the ‘which gene did you mean’ approach is fired and asks the user (in a pop-up) to choose from many meanings, among which the highest ranking are: [pop-up list of candidate meanings]
In case the Homo sapiens protein is chosen, the concept UUID is inserted in the ‘SMW+’ syntax:
phosphorylation(UUID-Prpv) of p21(UUID-S) promotes(UUID-P) cell survival(UUID-O)
[Diagram: S P O Assertion A1]
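The tag, ask, insert loop described above can be sketched in a few lines. This is not the Peregrine API or actual SMW+ markup: the lexicon, its candidate labels, the function names and the `term(ID)` output form (copied from the slide's own notation) are assumptions made for illustration.

```python
# Hypothetical lexicon: each term maps to its candidate concepts.
LEXICON = {
    "p21": [
        "p21 (Homo sapiens protein)",
        "p21 (protein in another species)",
        "p21 (other meaning)",
    ],
    "cell survival": ["cell survival"],
}


def tag(text, lexicon):
    """Return the lexicon terms found in the text with their candidates."""
    return {term: cands for term, cands in lexicon.items() if term in text}


def ambiguous_terms(tags):
    """Terms with more than one candidate concept (shown in red on the slide)."""
    return [term for term, cands in tags.items() if len(cands) > 1]


def insert_concept_id(text, term, concept_id):
    """Append the chosen concept identifier to the first occurrence of the term."""
    return text.replace(term, f"{term}({concept_id})", 1)


statement = "phosphorylation of p21 promotes cell survival"
tags = tag(statement, LEXICON)
needs_choice = ambiguous_terms(tags)

# The user picks the Homo sapiens protein in the pop-up; "UUID-S" is the
# placeholder identifier used on the slide, not a real UUID.
annotated = insert_concept_id(statement, "p21", "UUID-S")
```

Only `p21` triggers the question to the user; `cell survival` has a single candidate and could be resolved without interaction.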