Web 2.0 and Grids for Scholarly Research • Peking University, July 27 2006 • Geoffrey Fox • Computer Science, Informatics, Physics • Pervasive Technology Laboratories, Indiana University, Bloomington IN 47401 • gcf@indiana.edu • http://www.infomall.org
Application Drivers • Science Informatics for document analysis, as in the case of chemistry, which has very precise naming rules for compounds that allow accurate searches in documents • Suggesting how to tag scientific documents, either while writing them or after the fact • Journal web site of the future, as illustrated by Nature building the social bookmarking tool Connotea • Conference support tools, which can benefit from features needed by journals • This gives document-enhanced Cyberinfrastructure (CI)
Community Tools • E-mail and listservs are the oldest and most used • Kazaa, Instant Messengers, Skype, Napster, BitTorrent for P2P Collaboration: text, audio-video conferencing, files • del.icio.us, Connotea, CiteULike, Bibsonomy, Biolicious manage shared bookmarks • MySpace, Bebo, Hotornot, Facebook, or similar sites allow you to create (upload) community resources and share them; Friendster, LinkedIn create networks • http://en.wikipedia.org/wiki/List_of_social_networking_websites • Writely, Wikis and Blogs are powerful specialized shared document systems • ConferenceXP and WebEx share general applications • Google Scholar tells you who has cited your papers while publisher sites tell you about co-authors • Windows Live Academic Search has similar goals • Note that sharing resources creates (implicit) communities • Social network tools study graphs to both define communities and extract their properties
How to use Web 2.0 Community tools in CI • Nearly all of them have “profiles”, “users”, “groups”, “friends” etc. • Need to integrate these • P2P File Sharing: Maybe this is useful for sharing files in research groups (virtual organizations) • Will modify Maze (http://maze.pku.edu.cn), a popular Chinese social P2P system with 2.5 million users • BitTorrent: more popular than FTP, so why not use it for higher performance, fault tolerant, cached file sharing? • MySpace etc.: Could consider MyGridSpace or MyScienceSpace that supports a similar document sharing model with users uploading pictures, papers and even data/services of interest • Could include uploaded material in workflows • Social Bookmarking and linking: discuss later • http://gf6.ucs.indiana.edu:48990/SemanticResearchGrid/
[Figure: architecture diagram of the Document-enhanced Cyberinfrastructure. Labels recovered from the diagram: MyResearchDatabase; Bibliographic Database; Web service Wrappers; Document-enhanced Cyberinfrastructure; Traditional Cyberinfrastructure; Export: RSS, Bibtex, Endnote etc.; Generic Document Tools; Community Tools; Integration/Enhancement User Interface etc.; Existing User Interface; New Document-enhanced Research Tools; Existing Document-based Research Tools. Tools and services shown: Del.icio.us, Windows Live Academic Search, CiteULike, Google Scholar, Connotea, Citeseer, Bibsonomy, Science.gov, Biolicious, PubChem, PubMed, CMT Conference Management, Manuscript Central.]
Strategy • Doesn’t seem useful to build the 251st community tool • In fact a major barrier to use of existing tools is the question of what happens when a better tool comes along and/or the chosen tool disappears (unsupported/removed from Web) • So assume we use existing tools but wrap them all as web services, so that we can transfer information to new tools and integrate information between tools • Need some “glue” logic, a “unification” database and a minimal user interface • Bookmarking tools: del.icio.us, Connotea, CiteULike (includes plug-ins to major publisher sites) • Document: Google Scholar, Windows Live, Citeseer tools, OSCAR3 for Chemistry, Science.gov (later) • Journals: Manuscript Central • Conferences: CMT from Microsoft or ?
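The wrap-and-unify strategy on the slide above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Record`, `ToolWrapper`, `InMemoryTool`, `unify` and `migrate` are hypothetical names, and the in-memory tool stands in for a real wrapper that would call a tool's web API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Tool-independent bookmark record (hypothetical unification schema)."""
    url: str
    title: str
    tags: frozenset = frozenset()


class ToolWrapper(ABC):
    """Hypothetical common interface exposed by every wrapped tool."""

    @abstractmethod
    def export_records(self):
        """Return all records held by the tool as a list of Record."""

    @abstractmethod
    def import_record(self, record):
        """Store one Record in the tool."""


class InMemoryTool(ToolWrapper):
    """Stand-in for a real tool; a real wrapper would call the tool's web API."""

    def __init__(self, name):
        self.name = name
        self._records = {}

    def export_records(self):
        return list(self._records.values())

    def import_record(self, record):
        self._records[record.url] = record


def unify(wrappers):
    """Glue logic: merge records from all tools by URL, taking the union of tags."""
    merged = {}
    for wrapper in wrappers:
        for rec in wrapper.export_records():
            old = merged.get(rec.url)
            if old is None:
                merged[rec.url] = rec
            else:
                merged[rec.url] = Record(rec.url, old.title or rec.title,
                                         old.tags | rec.tags)
    return merged


def migrate(source, target):
    """Copy every record to a new tool, e.g. when the chosen tool disappears."""
    for rec in source.export_records():
        target.import_record(rec)
```

Because every tool sits behind the same two methods, replacing a tool means writing one new wrapper and calling `migrate`, while the unification database is just the dictionary that `unify` returns.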
Delicious Semantic Web/Grid • http://del.icio.us purchased by Yahoo for ~$30M • http://www.CiteULike.org • http://www.connotea.org (Nature) • Associate metadata with Bookmarks specified by URLs or DOIs (Digital Object Identifiers) • Users add comments and keywords (called tags) • Users are linked together into groups (communities) • Information such as title and authors is extracted automatically from some sites (PubMed, ACM, IEEE, Wiley etc.) • BibTeX-like additional information in CiteULike • This is perhaps the de facto Semantic Web, remarkable for its simplicity
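The bookmark model described above (an identifier plus tags, comments and users) is simple enough to sketch directly. The following is a minimal illustration, not the data model of any of the named sites; `Bookmark`, `index_by_tag` and `implicit_community` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Bookmark:
    user: str
    identifier: str   # a URL or a DOI
    tags: tuple = ()  # free-form keywords chosen by the user
    comment: str = ""


def index_by_tag(bookmarks):
    """Map each tag to the set of identifiers that carry it."""
    index = defaultdict(set)
    for bm in bookmarks:
        for tag in bm.tags:
            index[tag].add(bm.identifier)
    return index


def implicit_community(bookmarks, identifier):
    """Users who bookmarked the same resource form an implicit community."""
    return {bm.user for bm in bookmarks if bm.identifier == identifier}
```

The tag index answers "what is tagged X?", while `implicit_community` shows how sharing a resource links users even without an explicit group.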
Document-enhanced Cyberinfrastructure aka Semantic Scholar Grid I • Citeseer and Google Scholar scour the Internet and analyze documents for incidental metadata • Title, author and institution of documents • Citations with their own metadata allowing one to match to other documents • Science.gov extracts metadata from lots of US Government databases • These capabilities are sure to become more powerful and to be extended • Give “Citation Index” in real time • Tell you all authors of all papers that cite a paper that cites you etc. (Note it’s a small world so don’t go too far in link analysis) • Tell you all citations of all papers in a workshop
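The "authors of all papers that cite a paper that cites you" query is a depth-limited walk over the citation graph. A minimal sketch follows, assuming the graph is already available as two hypothetical dictionaries: `cited_by` (paper to the papers citing it) and `authors` (paper to its author names). The `max_depth` bound reflects the small-world caution on the slide.

```python
from collections import deque


def citing_authors(paper, cited_by, authors, max_depth=2):
    """Collect authors of papers that reach `paper` in at most max_depth citation links."""
    seen = {paper}
    found = set()
    queue = deque([(paper, 0)])
    while queue:
        current, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for citer in cited_by.get(current, ()):
            if citer not in seen:
                seen.add(citer)
                found.update(authors.get(citer, ()))
                queue.append((citer, depth + 1))
    return found
```

With `max_depth=1` this returns the authors who cite the paper directly; `max_depth=2` adds the authors who cite those citing papers.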
Document-enhanced Cyberinfrastructure aka Semantic Scholar Grid II • It is natural to develop core document Services such as those used in Citeseer/Google Scholar but applied to “your” documents of interest that may not have been processed yet • As just submitted to a conference perhaps • These tools can help form useful lists such as authors of all cited or submitted papers to a journal • OSCAR2/3 (from Peter Murray-Rust’s group at Cambridge) augment the application-independent “core” metadata (Title, authors, institutions, Citations) with a list of all chemical terms • This tool is a Service that can be applied to “your” document or to a set of documents harvested in some fashion • Other fields have natural application-specific metadata and OSCAR-like tools can be developed for them • Such high value tools could appear on “publisher” sites of future (or else publishers will disappear)
OSCAR3 Service from Cambridge UK • Oscar3 is a tool for shallow, chemistry-specific natural language parsing of chemical documents (i.e. journal articles). • It identifies (or attempts to identify): • Chemical names: singular nouns, plurals, verbs etc., also formulae and acronyms. • Chemical data: Spectra, melting/boiling point, yield etc. in experimental sections. • Other entities: Things like N(5)-C(3) and so on. • Uses SMILES, InChI and CML • There is a larger effort, SciBorg, in this area • http://www.cl.cam.ac.uk/~aac10/escience/sciborg.html • http://wwmm.ch.cam.ac.uk/wikis/wwmm/index.php/Oscar3
OSCAR2 Chemistry Document analysis • It detects “magic” chemical strings in text and then • Stores them as metadata associated with document • Queries ChemInformatics repositories to tell you lots of information about identified compounds • Tells you which other documents have this compound
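The detect, store and cross-reference pipeline above can be illustrated with a toy example. This is not how OSCAR works: it only matches formula-like tokens with a regular expression and checks them against a small, assumed subset of element symbols. `ELEMENTS`, `find_formulae` and `build_index` are hypothetical names, and no external repository is queried.

```python
import re
from collections import defaultdict

# Toy subset of element symbols (assumption); a real tool would use far richer knowledge.
ELEMENTS = {"H", "C", "N", "O", "S", "Na", "Cl"}

# A token made of two or more "symbol plus optional count" units, e.g. H2O or NaCl.
TOKEN = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")
UNIT = re.compile(r"([A-Z][a-z]?)\d*")


def find_formulae(text):
    """Return formula-like tokens whose symbols are all in ELEMENTS."""
    hits = set()
    for match in TOKEN.finditer(text):
        token = match.group(0)
        if all(symbol in ELEMENTS for symbol in UNIT.findall(token)):
            hits.add(token)
    return hits


def build_index(docs):
    """Store detected formulae as metadata: map each formula to the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for formula in find_formulae(text):
            index[formula].add(doc_id)
    return index
```

The inverted index answers the last bullet ("which other documents have this compound"); the repository lookup step is left out because it depends on the service being queried.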
Provenance and Delicious CI • We can use a del.icio.us-style interface to annotate Application Data with (extra) provenance and user comments of any type (describing quality of data or a keyword relating different data etc.) • All data should be labeled by a URI to enable this • One has in addition Citeseer/OSCAR metadata • Current major tagging systems support only a flat list of tags, without name=value (RDF triple) or schema organization • Tradeoff between features and pervasive deployment • Some extra features are easy to add as a custom service • Features not supported by del.icio.us can be uploaded as comments
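One way a custom service could layer name=value structure on top of a flat tag list is to adopt a convention such as writing `name=value` inside an ordinary tag. The sketch below assumes that convention (it is not a feature of del.icio.us) and turns such tags into (URI, name, value) triples; `to_triples` is a hypothetical helper.

```python
def to_triples(uri, tags):
    """Split flat tags into (uri, name, value) triples and remaining plain tags."""
    triples = []
    plain = []
    for tag in tags:
        name, sep, value = tag.partition("=")
        if sep and name and value:
            triples.append((uri, name, value))
        else:
            plain.append(tag)  # ordinary keyword, or a malformed pair kept as-is
    return triples, plain
```

For example, tags `["grid", "quality=high"]` on a data URI yield one triple with predicate `quality` and leave `grid` as a plain keyword, so the tagging site still sees only flat tags.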
Current Status • Google Scholar, Windows Live Academic Search, del.icio.us, Connotea, CiteULike, OSCAR3 are Web Services • Debugging on 500 presentations and papers from my CGL research group • Experiment with GGF Presentations, a broad collection of Chemical Informatics resources (explore science document CI link) and the Concurrency and Computation: Practice and Experience Web site (business model for journals?)