Tools for exploring the biomedical information landscape
Les Grivell, EMBO Electronic Information Programme
EAHIL 2004, Santander
Electronic information programme
• Online research information environment for the life sciences
• A next generation information service for the life sciences
• Communities@embo
• Life Sciences Mobility Portal
But first, let me take you back, not to Altamira, but to the … early days of scientific publishing (pre-impact factor)
When libraries were comfortable places that had everything you needed …
… and it was possible to keep track of the literature (more or less) …
Where are we now?
– Publishing is big business
• STM publishing is a multi-billion EUR activity (in the UK alone, GBP 22 billion in 2000)
• Estimated 164,000 scientific periodicals worldwide; around 16% of these are online
– Core science; core journals
• PubMed lists some 4600 journals in biomedical disciplines
• As of 19 Sept 2004, 4429 of these are online
• The PubMed database provides access to circa 15 million abstracts (but if you can’t be found, you won’t be read …)
• The Science Citation Index lists 5876 journals with impact factors ranging from 54.45 to 0.00 (you’ve been found, but are you worth reading? …)
Another information explosion: genomics
[Figure: sequence entries in the EMBL DNA database; y-axis: base pairs (billions), 0 to 35; x-axis: year, 1980 to 2005; plot annotation: Morowitz]
The nice thing about biological information resources is that there are so many …
• Hundreds of different databases, many in flat-file format
• A variety of user interfaces
• General lack of interoperability
Wouldn’t it be nice to … find all published literature references for a large set of gene symbols and explore their relationships?
[Diagram labels: Micro-array chip; Co-regulated genes; Database lookup; Find literature; Discover relationships]
“I don’t want there to be endless searching in the library! This searching costs nerves, and they must not be wasted on such stupidities …”
Fritz Saxl (1890–1948)
‘Ich will nicht, dass in der Bibliothek ewig gesucht wird! Dieses Suchen kostet Nerven und die dürfen nicht verschwendet werden an solche Dummheiten ...’
Aby Warburg (1866–1929)
Some text search engines
• Biosis
• Bibliographic databases
• Full text / web pages
PubMed
• Text-based!
• Searches only title, authors, abstract
• Boolean keyword search (AND / OR)
• Search language is English
• No ranking by relevance to the query!
• No direct linkage to other datasets
• All documents stored and indexed in one location
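To make the limitation concrete, here is a minimal sketch of a Boolean keyword search of the kind described above. It is not PubMed's implementation: the `RECORDS` list, the field names and the `boolean_search` function are invented for illustration. Only title, authors and abstract are searched, terms must match exactly, and hits come back in stored order with no relevance ranking.

```python
import re

# Toy records, invented for illustration only.
RECORDS = [
    {"id": 1, "title": "Yeast gene expression", "authors": "Smith J",
     "abstract": "Microarray study of co-regulated genes."},
    {"id": 2, "title": "Protein interaction networks", "authors": "Jones A",
     "abstract": "A survey of yeast protein interactions."},
    {"id": 3, "title": "Library searching", "authors": "Brown K",
     "abstract": "Keyword search in bibliographic databases."},
]


def tokens(record):
    """Lower-cased word set drawn from title, authors and abstract only."""
    text = " ".join(record[field] for field in ("title", "authors", "abstract"))
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def boolean_search(records, terms, mode="AND"):
    """Return ids of matching records in stored order (no ranking)."""
    wanted = {term.lower() for term in terms}
    hits = []
    for record in records:
        words = tokens(record)
        if mode == "AND":
            matched = wanted <= words        # every term must be present
        elif mode == "OR":
            matched = bool(wanted & words)   # at least one term present
        else:
            raise ValueError("mode must be 'AND' or 'OR'")
        if matched:
            hits.append(record["id"])
    return hits


print(boolean_search(RECORDS, ["yeast", "protein"], mode="AND"))
print(boolean_search(RECORDS, ["yeast", "protein"], mode="OR"))
```

Note that a query for "gene" does not find a record that only says "genes": exact keyword matching knows nothing about concepts, which is the gap the following slides address.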
main features
A discovery tool
• Ability to interconnect literature articles with different types of molecular data, including images
• Ability to search through and retrieve journal articles and other full text documents, even when in different physical locations
• Ability to support multi-lingual documents and queries
• Services free to the academic community
Features implemented via conceptual fingerprinting
conceptual fingerprints
[Diagram: Full text document → Index and link index terms to (multi-lingual) thesauri → Fingerprint database]
• 1 conceptual fingerprint (CFP) = 400 bytes
• Abstraction: 250,000 pages/PC/day
• Matching: 500,000 CFPs in 40 milliseconds
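The slide does not spell out how a fingerprint is computed or matched, so the following is only a sketch of the general idea under stated assumptions: index terms are mapped to language-independent concept identifiers through a thesaurus, a document becomes a normalised vector of concept weights, and matching is done by cosine similarity. The `THESAURUS` entries, the concept ids and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-thesaurus: terms in several languages -> concept id.
THESAURUS = {
    "protein": "C001", "protéine": "C001",
    "gene": "C002", "gène": "C002", "gen": "C002",
    "disease": "C003", "maladie": "C003", "krankheit": "C003",
}


def fingerprint(text):
    """Map text to a unit-length vector of concept weights."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter(THESAURUS[w] for w in words if w in THESAURUS)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {concept: c / norm for concept, c in counts.items()}


def similarity(fp_a, fp_b):
    """Cosine similarity of two unit-length fingerprints."""
    return sum(weight * fp_b.get(concept, 0.0) for concept, weight in fp_a.items())


def search(query, database):
    """Fingerprint the query and rank documents by similarity, best first."""
    query_fp = fingerprint(query)
    scored = [(similarity(query_fp, fp), doc_id) for doc_id, fp in database.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


database = {
    "en": fingerprint("A gene and its protein"),
    "fr": fingerprint("La maladie et le gène"),
    "other": fingerprint("Libraries were comfortable places"),
}
print(search("Gen Krankheit", database))  # German query over English and French documents
```

Because the query and the documents meet at the concept level, a German query can retrieve French and English documents, and the results are ranked. For scale, the slide's own figures imply that 500,000 fingerprints at 400 bytes each occupy about 200 MB.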
prototypes
• Initial prototypes in September 2002 and July 2003
• Current prototype online since 1st March 2004
• Next launch due mid-October 2004
E-BioSci
[Screenshot annotations]
• Content selection: abstracts + full text
• Choose search focus
• Full-text query in English, French or German; the query is fingerprinted for search
… and now a word about
• 8 partners (DE, ES, FR, UK) (Platform)
• 13 partners (ES, FR, IT, NL, UK) (Research project)
www.bioimage.org (Dr David Shotton, Univ. Oxford)
Wouldn’t it be nice to be able to navigate from an image to literature and molecular databases?
Gene symbol identification in text Text containing symbols
Improved literature – molecular dataset linkage
PEO1, GUCY2C, TYRO3, CD44
Twinkle, twinkle, little star, / How I wonder what you are. / Up above the world so high, / Like a diamond in the sky. / Twinkle, twinkle, little star, / How I wonder what you are.
Problems in gene symbol recognition
• Many gene symbols are indistinguishable from everyday words or abbreviations
• Synonyms
• Homonyms
• Homonym synonyms (ELK1 = SAP1; CAR1 = SAP1; BD-2 = SAP1; RIP1_SAPOF = SAP1)
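The problems listed above can be illustrated with a naive dictionary lookup. This is a sketch, not the method used by any of the tools mentioned in this talk. The `ALIASES` table takes its SAP1 entry from the slide; the `STAR` entry and the `EVERYDAY_WORDS` list are assumptions added to show the everyday-word problem, and the case-based heuristic is deliberately simplistic.

```python
import re

# Alias -> genes it may refer to. The SAP1 entry follows the slide;
# the STAR entry is an illustrative assumption.
ALIASES = {
    "SAP1": {"ELK1", "CAR1", "BD-2", "RIP1_SAPOF"},
    "CD44": {"CD44"},
    "PEO1": {"PEO1"},
    "STAR": {"STAR"},
}

# Hypothetical list of everyday words that collide with symbols.
EVERYDAY_WORDS = {"star"}


def find_symbols(text):
    """Return (token, candidate genes, status) for each dictionary hit."""
    results = []
    for token in re.findall(r"[A-Za-z0-9_-]+", text):
        key = token.upper()
        if key not in ALIASES:
            continue
        genes = sorted(ALIASES[key])
        if token.lower() in EVERYDAY_WORDS and not token.isupper():
            status = "possible everyday word"   # e.g. "star" in running text
        elif len(genes) > 1:
            status = "ambiguous"                # one alias, several genes
        else:
            status = "unique"
        results.append((token, genes, status))
    return results


for hit in find_symbols("SAP1 and CD44 under a little star"):
    print(hit)
```

A lookup like this finds candidate symbols but cannot decide between them; resolving SAP1 to one of its four possible genes, or deciding whether "star" is a gene at all, needs the surrounding context.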
Word-“processing”
[Diagram labels: gene, FRDA, protein, depletion, disease, frataxin, Yah1p, required, activates]
Protein interaction networks
[Diagram labels: ataxia, Yfh1, requires, regulates, Ssc1, Isu1, interacts, activates, Oct1]
Some web-addresses
• http://www.e-biosci.org
• http://www.oriel.org
• http://www.bioimage.org
• http://www.pdg.cnb.uam.es/UniPub/iHOP/