On our way to information overload? Or can we prevent it by appropriate use of technology?
Collexis Fingerprints (CFPs): consolidated knowledge • Example fingerprint (concept code, weight): C19881 0.99 • C92992 0.67 • C02002 0.66 • C99229 0.44 • C00392 0.33 • C93939 0.21
Cross-language networking
People (medical researchers around the world) describe their activities in electronic text: projects, publications, Medline abstracts, and so on. A multilingual thesaurus indexer matches the keywords, translates them to identical numbers and ranks them by their relevance. The result is a common language in which each activity is represented as a set of keyword numbers ranked by relevance, for example #17345: 1.0, #3627: 0.8, #19994: 0.5, #28746: 0.3, #32874: 0.1.
The same concept receives the same number in every language:
• English Disease, French Maladie and Spanish Enfermedad all map to #12674.
• English Malaria, French Paludisme and Spanish Paludismo all map to #24530.
• English Hospital, French Hôpital and Spanish Hospital all map to #19994.
You submit your own activity as text, and it is indexed to keyword numbers. The system then finds similar activities in the "Collexion" of activities, together with the people behind them.
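The indexing-and-matching idea can be sketched in a few lines. This is a minimal sketch built on several assumptions. The mini-thesaurus (`THESAURUS`) is hypothetical, although its concept numbers are taken from the slide. Relevance is approximated by term frequency scaled so the top concept scores 1.0. Fingerprints are compared with cosine similarity, a common choice that is not necessarily the Collexis matching method.

```python
from math import sqrt

# Hypothetical mini-thesaurus: the concept numbers come from the slide, but this
# lookup table and the word-by-word matching are illustrative assumptions.
THESAURUS = {
    "disease": 12674, "maladie": 12674, "enfermedad": 12674,
    "malaria": 24530, "paludisme": 24530, "paludismo": 24530,
    "hospital": 19994, "hôpital": 19994,  # "hospital" is also the Spanish term
}


def fingerprint(text):
    """Index text to concept numbers; weight = frequency scaled so the top concept is 1.0."""
    counts = {}
    for word in text.lower().split():
        concept = THESAURUS.get(word.strip(".,;:!?"))
        if concept is not None:
            counts[concept] = counts.get(concept, 0) + 1
    if not counts:
        return {}
    top = max(counts.values())
    return {concept: n / top for concept, n in counts.items()}


def similarity(fp_a, fp_b):
    """Cosine similarity between two sparse fingerprints (concept number -> weight)."""
    dot = sum(w * fp_b.get(c, 0.0) for c, w in fp_a.items())
    norm_a = sqrt(sum(w * w for w in fp_a.values()))
    norm_b = sqrt(sum(w * w for w in fp_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    english = fingerprint("Malaria is a disease treated in hospital")
    french = fingerprint("Paludisme, maladie, hôpital")
    print(english, french, similarity(english, french))
```

The English and French descriptions index to the same three concept numbers, so their similarity comes out at 1.0, up to floating-point rounding. The words themselves share nothing.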
The early evolution of fingerprint manipulation • People fingerprints (from e-mails, Word documents, RFPs, jobs, CVs and skills) • Contents fingerprints (from articles and books) • The people and contents fingerprints are added into an organization fingerprint
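The "add" operation on fingerprints can be sketched as follows. The slide does not say how fingerprints are combined. This sketch assumes per-concept summation followed by rescaling so the strongest concept is 1.0 again. The people fingerprints (`alice`, `bob`) are hypothetical.

```python
def add_fingerprints(*fingerprints):
    """Add fingerprints concept by concept, then rescale so the top weight is 1.0.

    Summation plus max-rescaling is an assumption of this sketch; the slide only says "add".
    """
    total = {}
    for fp in fingerprints:
        for concept, weight in fp.items():
            total[concept] = total.get(concept, 0.0) + weight
    if not total:
        return {}
    top = max(total.values())
    return {concept: weight / top for concept, weight in total.items()}


def ranked(fp):
    """List (concept, weight) pairs from most to least relevant."""
    return sorted(fp.items(), key=lambda item: -item[1])


if __name__ == "__main__":
    # Hypothetical people fingerprints, e.g. derived from CVs and e-mails.
    alice = {24530: 1.0, 12674: 0.5}
    bob = {24530: 0.5, 19994: 1.0}
    organization = add_fingerprints(alice, bob)
    print(ranked(organization))
```

Rescaling keeps an organization fingerprint on the same 0 to 1 scale as an individual one, so both can be compared with the same similarity measure.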
BIOSEMANTICS • "Cellese": the language that cells use to communicate internally and externally • The molecular language and its biological MEANING • The group: Jan Kors PhD • Erik van Mulligen PhD • Bob Schijvenaars PhD • Marc Weeber PhD • Christiaan v.d. Eyck MSc • Rob Jelier PhD • Barend Mons PhD • Johan van der Lei PhD
A consortium to combine state-of-the-art information and knowledge mining technologies • To support: thesaurus and ontology enrichment; disambiguation of concepts; semantic meta-analysis of massive amounts of information • To enable: information-based discovery; evidence-based policy making
Thesaurus and Ontology Enrichment • New concepts • Synonyms • Homonyms • Genes, Proteins • Pictures
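Synonyms and homonyms can both be read off a thesaurus stored as (term, concept) pairs. A concept with several terms has synonyms. A term linked to several concepts is a homonym, like the abbreviation AGS1, which names both Aicardi-Goutières syndrome 1 and the protein Activator of G-protein signaling 1. This is a sketch; the concept identifiers used below are hypothetical.

```python
from collections import defaultdict


def synonyms_and_homonyms(entries):
    """entries: iterable of (term, concept_id) pairs from a thesaurus.

    Returns (synonyms, homonyms):
      synonyms: concept_id -> sorted terms, for concepts with more than one term
      homonyms: term -> sorted concept_ids, for terms linked to more than one concept
    """
    terms_by_concept = defaultdict(set)
    concepts_by_term = defaultdict(set)
    for term, concept in entries:
        key = term.lower()  # case-insensitive matching of terms
        terms_by_concept[concept].add(key)
        concepts_by_term[key].add(concept)
    synonyms = {c: sorted(t) for c, t in terms_by_concept.items() if len(t) > 1}
    homonyms = {t: sorted(c) for t, c in concepts_by_term.items() if len(c) > 1}
    return synonyms, homonyms


if __name__ == "__main__":
    # Hypothetical concept IDs for illustration only.
    entries = [
        ("AGS1", "C_AGS_SYNDROME"),
        ("Aicardi-Goutieres syndrome 1", "C_AGS_SYNDROME"),
        ("AGS1", "C_AGS1_PROTEIN"),
        ("Activator of G-protein signaling 1", "C_AGS1_PROTEIN"),
    ]
    syn, hom = synonyms_and_homonyms(entries)
    print(syn)
    print(hom)
```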
Diagram labels • Organizations: Elsevier, EMBO, E-BioSci partners, TNO, UVA, Genebio, EUR, HUGO NC, AMC, LUMC, SERENDIP • Thesauri: MeSH, HUGO, SwissProt, SAGE, others • Free text • Unexplained text (XML) • Potential concepts • Numbered steps: 1 FUA, 2 NLP, 3 Validation, 4 Fingerprints (known concepts)
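The diagram suggests that text not explained by the existing thesauri yields potential concepts, which are then validated. The sketch below illustrates that idea, with two assumptions. First, known terms are found by greedy longest match, a technique chosen here for illustration and not necessarily the consortium's NLP step. Second, leftover word runs that recur across documents are proposed as candidates. The stop-word list and the known-term set are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a curated one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "is", "was", "with", "to", "for", "by"}


def unexplained_spans(text, known_terms, max_len=4):
    """Greedy longest-match of known thesaurus terms; return the leftover word runs."""
    words = re.findall(r"[\w-]+", text.lower())
    spans, current, i = [], [], 0
    while i < len(words):
        match = 0
        for n in range(min(max_len, len(words) - i), 0, -1):
            if " ".join(words[i:i + n]) in known_terms:
                match = n
                break
        if match or words[i] in STOPWORDS:
            # A known term or stop word closes the current unexplained run.
            if current:
                spans.append(" ".join(current))
                current = []
            i += match or 1
        else:
            current.append(words[i])
            i += 1
    if current:
        spans.append(" ".join(current))
    return spans


def potential_concepts(texts, known_terms, min_count=2):
    """Unexplained spans seen at least min_count times, most frequent first."""
    counts = Counter(s for t in texts for s in unexplained_spans(t, known_terms))
    return [span for span, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    known = {"basal ganglia", "calcium deposition", "encephalopathy"}
    texts = [
        "Calcium deposition in the basal ganglia with ferrocalcinotic deposition",
        "Encephalopathy and ferrocalcinotic deposition",
    ]
    print(potential_concepts(texts, known))
```

In this toy run the recurring phrase "ferrocalcinotic deposition" is the only candidate returned, because it is absent from the hypothetical known-term set. It would then go to the validation step.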
Too much to read: major trends foreseen: • From Reading to Consulting • From Reading to Meta-analysis • From Text to Knowledge Representations
The first step: towards the conceptual semantic network • Semantic types • Co-occurrence data
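A conceptual semantic network can be seeded from co-occurrence data. In this sketch, each document (for example an abstract) is a set of concepts, and every pair of concepts found together adds one to the weight of the edge between them. Semantic types are used here only as an optional filter. That filter, the type labels and the toy documents are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents, semantic_type=None, allowed_types=None):
    """documents: iterable of concept collections, one per document.

    Returns a Counter mapping a sorted (a, b) concept pair to the number of
    documents in which both concepts occur. If semantic_type (concept -> type)
    and allowed_types are given, only concepts of an allowed type are kept.
    """
    edges = Counter()
    for concepts in documents:
        kept = set(concepts)
        if semantic_type is not None and allowed_types is not None:
            kept = {c for c in kept if semantic_type.get(c) in allowed_types}
        for a, b in combinations(sorted(kept), 2):
            edges[(a, b)] += 1
    return edges


if __name__ == "__main__":
    # Toy documents; not real Medline data.
    docs = [
        {"AGS1", "encephalopathy", "calcium deposition"},
        {"AGS1", "encephalopathy"},
        {"encephalopathy", "interferon-alpha"},
    ]
    print(cooccurrence_network(docs).most_common(2))
```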
Diagram: conceptual semantic network around Aicardi-Goutières syndrome (AGS1; OMIM *225750). Numbered concepts: 1 Heterogeneity, 2 AGS1, 3 Encephalopathy, 4 Cadmium, 5 Interferon-alpha. Other concepts shown include linkage (genetics), lod score, genetic heterogeneity analysis, clinical diagnosis, family, parents, calcium deposition, ferrocalcinotic deposition, basal ganglia, cerebrospinal fluid abnormalities, pleocytosis, X-ray computed tomography, microcephaly, spastic quadriplegia, Fahr disease, toxoplasmosis, human cytomegalovirus, G-protein coupled receptors, Xenopus oocyte, and the SwissProt entry Activator of G-protein signaling 1 (AGS1).
META-ANALYSIS • Fingerprinting • Disambiguation: ACS
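Fingerprint-based disambiguation of an ambiguous abbreviation such as ACS can be sketched as follows. The fingerprint of the surrounding text is compared with a fingerprint for each candidate sense, and the closest sense wins. The senses, their fingerprints and the context below are hypothetical. Cosine similarity is an assumed choice of measure.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse fingerprints (concept -> weight)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def disambiguate(context_fp, sense_fps):
    """Return the sense whose fingerprint is most similar to the context fingerprint."""
    return max(sense_fps, key=lambda sense: cosine(context_fp, sense_fps[sense]))


if __name__ == "__main__":
    # Hypothetical sense fingerprints for the abbreviation "ACS".
    senses = {
        "acute coronary syndrome": {"myocardial infarction": 1.0, "troponin": 0.8, "chest pain": 0.6},
        "American Chemical Society": {"chemistry": 1.0, "journal": 0.7, "publishing": 0.5},
    }
    context = {"troponin": 1.0, "chest pain": 0.5, "hospital": 0.3}
    print(disambiguate(context, senses))
```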
Applications • Cross-language, jargon and cross-system matching (implemented): www.sharingpoint.shared-global.org • Information-based discovery (research) • Community building (experts, policy making) • Trend watching and indicators (policy making)
Seed-term-based conceptual semantic networks: BRCA1
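A seed-term-based network starts from one concept, such as BRCA1, and keeps the concepts most strongly associated with it. This sketch assumes a precomputed edge table of association weights, for instance co-occurrence counts. It returns the seed's top-k neighbours. The edge weights below are hypothetical numbers, not Medline counts.

```python
def seed_network(edges, seed, top_k=5):
    """edges: dict mapping (concept_a, concept_b) -> association weight.

    Return up to top_k (neighbour, weight) pairs linked to the seed,
    strongest first; ties are broken alphabetically.
    """
    neighbours = {}
    for (a, b), weight in edges.items():
        if a == seed:
            neighbours[b] = max(neighbours.get(b, 0), weight)
        elif b == seed:
            neighbours[a] = max(neighbours.get(a, 0), weight)
    return sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    # Hypothetical association weights for illustration only.
    edges = {
        ("BRCA1", "breast neoplasms"): 120,
        ("BRCA1", "DNA repair"): 80,
        ("RAD51", "BRCA1"): 45,
        ("TP53", "DNA repair"): 200,
    }
    print(seed_network(edges, "BRCA1", top_k=2))
```

Edges that do not touch the seed, such as the TP53 edge, are ignored. Applying the same function to each returned neighbour would grow the network one ring further.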
Clustering of genes on the fly? (diagram: gene A, gene B)
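One way to cluster genes on the fly is to group them by the similarity of their concept fingerprints. This sketch assumes cosine similarity and single-linkage grouping. Any two genes at or above a threshold end up in the same cluster, tracked with a small union-find structure. The gene fingerprints and the threshold are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse fingerprints (concept -> weight)."""
    dot = sum(w * b.get(c, 0.0) for c, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(fingerprints, threshold=0.5):
    """Single-linkage clustering: genes with cosine >= threshold share a cluster."""
    names = list(fingerprints)
    parent = {n: n for n in names}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(fingerprints[a], fingerprints[b]) >= threshold:
                parent[find(a)] = find(b)
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical gene fingerprints.
    genes = {
        "gene A": {"DNA repair": 1.0, "breast neoplasms": 0.6},
        "gene B": {"DNA repair": 0.9, "breast neoplasms": 0.7},
        "gene C": {"insulin": 1.0, "glucose": 0.8},
    }
    print(cluster(genes))
```

Comparing all pairs costs O(n²) similarity computations. That is acceptable for the small, query-specific gene sets that on-the-fly clustering implies.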
Panel III: distribution over distance categories of concept pairs that do not co-occur in the learning set. Panel IV: distance categories of concept pairs, related to the probability that there is no explicit relationship or co-occurrence in Medline (zero ratio). A ratio of 0 means that an automatic Medline query with the two concepts joined by "AND" returns 0 hits.
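Panel IV can be tabulated roughly as follows, under two assumptions. First, each concept pair has a precomputed distance and a hit count for its "A AND B" Medline query. Second, a category's zero ratio is the share of its pairs with no hits. The bin edges and toy pairs are hypothetical. The query builder only formats the string; it does not call Medline.

```python
from collections import defaultdict


def and_query(concept_a, concept_b):
    """Build the boolean query string for a concept pair."""
    return f'"{concept_a}" AND "{concept_b}"'


def zero_ratio_by_category(pairs, bin_edges):
    """pairs: iterable of (distance, hit_count); bin_edges: ascending upper bounds.

    A pair falls in the first category whose upper bound is >= its distance
    (or in an overflow category past the last edge). Returns
    {category_index: fraction of that category's pairs with zero hits}.
    """
    totals = defaultdict(int)
    zeros = defaultdict(int)
    for distance, hits in pairs:
        category = next((i for i, edge in enumerate(bin_edges) if distance <= edge), len(bin_edges))
        totals[category] += 1
        if hits == 0:
            zeros[category] += 1
    return {c: zeros[c] / totals[c] for c in sorted(totals)}


if __name__ == "__main__":
    # Toy (distance, hit_count) pairs for illustration only.
    pairs = [(0.1, 15), (0.2, 0), (0.6, 0), (0.7, 0), (0.9, 3)]
    print(and_query("BRCA1", "RAD51"))
    print(zero_ratio_by_category(pairs, [0.5, 1.0]))
```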
Diagram: E-BioSci • Pharma etc. • ORIEL • SERENDIP • FP6 etc. • Private research • Public I-Research • Ministries • WHO, FAO etc. • SHARED • BIREME/VHL • EDCTP • Oxford initiative etc. • DC