This text explores the challenge of information overload and the need for effective knowledge organization and visualization. It proposes new approaches and tools for maximizing the usefulness of information and increasing actionable knowledge.
“Please observe. In the space of one hundred and seventy-six years the Lower Mississippi has shortened itself two hundred and forty-two miles. This is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the Old Oolitic Silurian Period, just a million years ago next November, the Lower Mississippi River was upward of one million three hundred thousand miles long, and stuck out over the Gulf of Mexico like a fishing-rod. And by the same token any person can see that seven hundred and forty-two years from now the Lower Mississippi will be only a mile and three-quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding comfortably along under a single mayor and a mutual board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.”
Beyond Open Access Jan Velterop UKSG, Torquay, March 30, 2009
“There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact.”
Mark Twain, Life on the Mississippi
What’s wrong?
• We have far too few returns in terms of actionable knowledge out of such overwhelming investment of fact!
• The reason is that a lot of fact is deeply hidden!
Current Knowledge Transfer An analogy Needle transport
Information overload? Too much knowledge? Stop acquiring it? Just filtering it? Or organisation underload? Lack of conceptual structure? Unprecedented opportunity?
Analogy: What is the use of water?
Navigate Drink (take in) H2O
Navigate Read (take in) Age to Know
Publish triples Publish articles Preparing content for the future! Maximizing usefulness
obesity diabetes body composition And visualizations Publish articles
Publishing triples New skills Change in Culture Change in workflow Experienced in publishing articles e.g. providing author proofs with proposed triples and asking them to verify those
Living not on detail alone Getting the big picture – too
Bio commontology (diagram labels)
• Community Annotation (a posteriori)
• <ID1><edge><ID2>
• Community Annotation (a priori)
• Triplet construction (unsupervised)
• Peregrine Concept Mapping
• Direct feed: Blogs, etc.
• MRS Index, virtual concepts
• Sources: UniProt, PubMed, Nextprot, CALIPHO, BioBanks (e.g. LOVD), InWeb, WikiPro, SERMO, GEO, GWA
• Tools, RDF, OWL, OBO, Protégé
• Harmonized data
• Daily feed
• Information silos
<Source concept> <Relation (edge)> <Target concept>
(node 1, unique ID) (node 2, unique ID)
Attributes: class, date, value, owner, condition, etc.
Multiple Triples (edge labels in the diagram: F+, C+, A+, C+, A+, A+)
Graph Building (e.g. WikiPathways)
Diagram labels: T-Cell Development; Cancer Promoting Genes; Interleukin-7; Unique to 101668678; Unique to Springer; Unique to Plectix
Triple types:
• <Type F1> Database facts (multiple attributes)
• <Type F2> Community Annotations
• <Type C1> Co-occurrence sentence (abstracts, e.g. PubMed)
• <Type C2> Co-occurrence Full Text (publisher, e.g. Springer)
• <Type A1> Concept Profile Match
• <Type A3> Co-expression (gene expression databases)
• <Type A4> Modelling hypothesis (e.g. Plectix, InWeb)
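The triple structure on this slide can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `Triple`, the field names, the concept IDs (`C:IL7`, `C:TCELL_DEV`) and the relation name `associated_with` are hypothetical and do not come from the presentation. Only the attribute list and the type codes are taken from the slide.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Type codes as listed on the slide.
EVIDENCE_TYPES = {
    "F1": "Database facts",
    "F2": "Community annotations",
    "C1": "Co-occurrence, sentence level",
    "C2": "Co-occurrence, full text",
    "A1": "Concept profile match",
    "A3": "Co-expression",
    "A4": "Modelling hypothesis",
}


@dataclass(frozen=True)
class Triple:
    """One <source concept> <relation> <target concept> statement (hypothetical layout)."""
    source_id: str                  # node 1, unique ID
    relation: str                   # edge
    target_id: str                  # node 2, unique ID
    evidence_type: str              # the slide's "class", e.g. "F1" or "C2"
    owner: Optional[str] = None
    date: Optional[str] = None
    value: Optional[float] = None
    condition: Optional[str] = None

    def __post_init__(self):
        if self.evidence_type not in EVIDENCE_TYPES:
            raise ValueError(f"unknown evidence type: {self.evidence_type}")


def evidence_by_edge(triples):
    """Group multiple triples about the same edge and collect their type codes."""
    edges = defaultdict(set)
    for t in triples:
        edges[(t.source_id, t.relation, t.target_id)].add(t.evidence_type)
    return dict(edges)


# Hypothetical example data: three sources asserting the same edge.
triples = [
    Triple("C:IL7", "associated_with", "C:TCELL_DEV", "F1", owner="curated database"),
    Triple("C:IL7", "associated_with", "C:TCELL_DEV", "C1", owner="PubMed"),
    Triple("C:IL7", "associated_with", "C:TCELL_DEV", "C2", owner="Springer"),
]
support = evidence_by_edge(triples)
```

Keeping one record per assertion, rather than one per edge, preserves who said what, so an edge supported by several types can be told apart from one that rests on a single co-occurrence.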
<Source concept> <Relation (edge)> <Target concept>
(node 1, unique ID) (node 2, unique ID)
Attributes: class, date, value, owner, condition, etc.
Edge labels in the diagram: curated, curated, curated, Co-occ
All Triples; Smart Triples; Curated
Remove Ambiguity and Redundancy
Observational; Inferred
Knowledge Space
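One way to read "Remove Ambiguity and Redundancy" is as two steps: map each term to a unique concept ID, then merge statements that turn out to describe the same edge. The sketch below assumes that reading. The synonym table, the concept IDs and the function names are hypothetical, and the presentation does not describe how the step is actually implemented.

```python
# Hypothetical synonym table: surface term (lower case) -> unique concept ID.
SYNONYMS = {
    "il-7": "C:IL7",
    "interleukin-7": "C:IL7",
    "interleukin 7": "C:IL7",
    "t-cell development": "C:TCELL_DEV",
    "t cell development": "C:TCELL_DEV",
}


def normalize(term):
    """Resolve a surface term to a concept ID, or None if it is unknown."""
    return SYNONYMS.get(term.strip().lower())


def merge_triples(raw):
    """Collapse raw (source, relation, target, origin) statements.

    Statements whose terms resolve to the same concept IDs become a single
    edge; the origins ("curated", "co-occurrence", ...) are kept as a set.
    """
    merged = {}
    for source, relation, target, origin in raw:
        s, t = normalize(source), normalize(target)
        if s is None or t is None:
            continue  # unresolved terms are simply skipped in this sketch
        merged.setdefault((s, relation, t), set()).add(origin)
    return merged


raw = [
    ("IL-7", "associated_with", "T-cell development", "curated"),
    ("Interleukin-7", "associated_with", "T cell development", "co-occurrence"),
    ("interleukin 7", "associated_with", "T-Cell Development", "curated"),
]
edges = merge_triples(raw)
```

Here three differently worded statements reduce to one edge that still records both kinds of origin.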
sustainability sell Literature (peer reviewed) Curated databases ‘Grey’ literature Raw data Community-generated • SwissProt • Gene Ontology • NCBO (ontologies) • Peroxisome • InWeb • STRING • HAPMAP • LOVD • Reactome • IHop • SIB-lab • PatientsLikeMe • Sermo • Plectix • NBIC • WikiProfessional • WikiPathways • OWW • Alert • BioBanks • Blogs • SEED • EURORDIS • UPPMD • (NORD) • SPARC • Research CR • SOUHL • Elsevier • Springer • Wiley • BMC • PubMed • SciELO • PLoS • etc. • GEO • Express • Many NBIC data • Many NGI center data • Many public data.
Download Concept Web
Includes edges from:
• PubMed (400,000,000 sentences, 5,000,000,000 concept co-occurrences) (from public data)
• Protein databases (UniProt, IntAct, PDB, HPRD – 75,000 human-curated PPIs) (from public data)
• Gene co-expression databases (GEO, Express… – 25 square genes) (from public data)
• STRING edges (200,000 gene-gene edges) (from semi-public data)
• InWeb edges (240,000 unique edges from 17 species) (from proprietary data)
• Reactome edges (240,000 unique edges from 17 species) (from proprietary data)
• ChemSpider edges (25,000,000 chemicals) (from semi-public data)
• Wiki edges (WikEdge = WikiPathways, WikiProfessionals, Omegawiki, Wikigene)
• Plectix edges (5,000 extra edges, PPI modeling) (from proprietary data)
• Private expression data (3000 extra edges, by Merck) (from proprietary data)
• Et cetera
What one can do to make scientific literature even more useful: Helping users find what is appropriate
Slide by Carl Lagoze (Cornell) – from this presentation: http://journal.webscience.org/112/3/orechem.pdf
An example ‘mash-up’: On the basis of Semantic Highlighting
• Using Knewco’s freely available functionality*, scientific publishers can add semantic functionality to their material by highlighting concepts and linking each one to additional pertinent information about that concept, as well as to further search possibilities in which the search argument is automatically expanded with synonyms.
*Knewco is a Concept Web Alliance member
Button changes when clicked
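The two ideas on this slide, highlighting known concepts in a text and expanding a search argument with synonyms, can be sketched in a few lines. This is not Knewco's implementation or API. The concept table, the `[[...]]` marker and both function names are assumptions made purely for illustration.

```python
import re

# Hypothetical concept table: concept term -> synonyms (illustrative only).
CONCEPTS = {
    "obesity": ["adiposity"],
    "diabetes": ["diabetes mellitus"],
}

# Match any known concept term as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in CONCEPTS) + r")\b",
    re.IGNORECASE,
)


def highlight(text):
    """Wrap each recognised concept in [[...]] as a stand-in for a highlight link."""
    return _PATTERN.sub(lambda m: f"[[{m.group(0)}]]", text)


def expand_query(term):
    """Expand a search term with its synonyms, joined with OR."""
    synonyms = CONCEPTS.get(term.lower(), [])
    return " OR ".join(f'"{t}"' for t in [term, *synonyms])


marked = highlight("Obesity raises the risk of diabetes.")
query = expand_query("obesity")
```

In a real page the `[[...]]` marker would be replaced by a link or button that leads to more information about the concept.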
Concept Web Alliance
Inaugural Meeting
May 8th, New York Hall of Science
Info: http://conceptweblog.wordpress.com
“…an important and critically necessary meeting”
Credits
• Thinking Neanderthal man (after Rodin): http://blogs.sundaymercury.net/weirdscience
• Needle transport: http://fisherwy.blogspot.com
• Cupped hands: www.goldcoast.qld.gov.au
• Ship at sea: http://vikingeskibsmuseet.dk
• Scientist: www.drugdevelopment-technology.com
• Jungle (detail): Henri Rousseau
• Jungle (aerial – 2x): http://passporttoknowledge.com
• Triples etc.: Barend Mons
Fin
Demo: http://demo.knewco.com
Wikimore: http://wikimore.org
Concept Web: http://conceptweblog.wordpress.com