Trends in Scholarly Communication Alex D. Wade Director, Scholarly Communication Microsoft Research
Microsoft Research Labs External Research Groups Technology Learning Labs Collaborative Institutes and Centers
Microsoft External Research • Organization within Microsoft Research that engages in strong partnerships with academia, industry and government to advance computer science, education, and research in fields that rely heavily upon advanced computing • Initiatives that focus on the research process and its role in the innovation ecosystem, including support for open access, open tools, open technology, and interoperability • Developers of advanced technologies and services to support every stage of the research process
External Research Global Themes Advanced Research Tools and Services
External Research Global Themes • Data Intelligence: Understanding web-scale data challenges • Cloud Computing: Researching cloud-service technologies • Device Oriented Computing: Recognizing the reality is Cloud+Client • Many/Multicore: Understanding how to best exploit the emerging trends in chip design/architecture
External Research Global Themes • Visualizing and Experiencing E3 Data + Information: Provide a unique experience to reduce time to insight and knowledge through visualizing data and information • Accessible Data: Ensure E3 data (remote and local sensing) is easily accessible and consumable in the scientist's domain
External Research Global Themes • Devices, Sensors and Mobility: Cellphone as a platform for healthcare in 2009; Proof points for the value of new modes of interaction with health data • Genomics in Healthcare: Bring the bioinformatics community to the Windows platform; Apply Microsoft research and tools to challenges in genomics • Modeling living systems (MSRC-led): Long-term healthcare impact in predictive/preventative medicine.
External Research Global Themes • Scholarly Communication: Developing software tools for academics on top of MS technology to facilitate the full lifecycle of their day-to-day research workflow. Evolving modes of academic collaboration & dissemination to speed discovery • Education: Transform education through exploitation of novel uses of MS hardware/software (Gaming, Tablet PC, Surface, etc.)
Mission • Tailor Microsoft software to meet the specific needs of the academic research community • Our approach: • Conduct applied projects to enhance academic productivity by evolving Microsoft’s scholarly communication offerings
Why? • Listen and learn • Increase our relevance in academia • Near-term + long-term • Anticipate evolving requirements • Help researchers spend less time doing computer science and more time doing research
Our Challenges • Audiences • Multi-audience problem • File formats & interoperability • Microsoft software not optimized for the specialized needs of academic researchers • IT Community in Academia • No rich ISV community filling the gap • DIY culture in academia • Open Source Software
Open Access Open Source Open Data Open Science “In order to help catalyze and facilitate the growth of advanced CI, a critical component is the adoption of open access policy for data, publications and software.” NSF Advisory Committee on Cyberinfrastructure (ACCI) • Microsoft Interoperability Principles • Open Connections to Microsoft Products • Support for Standards • Data Portability • Open Engagement http://www.microsoft.com/interop/
“I [want] to clarify a common confusion that I hear from many colleagues: open source vs. open access. Although the terms are related in some ways (indeed, they derive from a very similar philosophy), they refer to two discrete concepts. • Open Access: Focuses on the unrestricted sharing of research results, typically through open access journals (PLoS ONE, Palaeontologia Electronica, etc.). • Open Source: Computer software, typically (but not always) freely distributed, in which the source code is freely available.” http://openpaleo.blogspot.com/2009/10/its-open-access-week.html
CodePlex Foundation http://www.codeplex.org/
OSI Approved Open Source Licenses • Microsoft Public License (Ms-PL) http://opensource.org/licenses/ms-pl.html • Microsoft Reciprocal License (Ms-RL) http://opensource.org/licenses/ms-rl.html
Themes • ‘Traditional’ Scholarly Communication • Original goals • Reactions to current state • Trends in Computing & Software • A New Paradigm in Research • Trends in Scholarly Communication
The needs of academic researchers? • Discover and digest existing research • Conduct research • Experimentation (lots of domain specific stuff) • Data Collection & Analysis (computer programming) • Collaboration • Communicate research • Author • Disseminate • Measure Impact
For the sake of Scientific Progress • Contract with Researchers • If you share • Then you get credit; science progresses more rapidly • Advantages • Registration • Validation & Quality • Discovery and Access • Aggregation • Dissemination • Reach, Speed, Efficiency
Quality Control • Validation takes time & expertise • Is it fulfilling the promise with respect to the science? • Can this process keep pace with the growing volume and complexity of research? • Tension between timeliness and quality • Preprints in physics
Discovery & Access • Impediments: Business models and IP ownership have led to barriers to access • Fragmentation: Information is channeled into silos and gated communities, making aggregation and analysis cumbersome (if not impossible) • Timeliness: Research cycles are slowed by the discovery-to-publication time lag • Products: System still anchored to the 'article' container to the exclusion of all else
A Jumble “The truth is that [legal information is] an impossible jumble of materials: in various different formats - Word, pdf, HTML, text; some freely available, some only available for a fee; some only available through private vendors; some reachable by search engines, some not; some available in authenticated formats, some not; some timely, some not; some available in bulk downloadable formats for re-use, some not. The problem is not that legal information isn't available; it's that access to it is highly unstandardized.” https://clients.outsellinc.com/insights/?p=11087
Registration and Measurement • Publish or Perish • “My Data” & “My Research” protectionism • Dark data and negative correlations • Citation Analysis and Impact Factor
A Sea Change in Computing • Massive Data Sets: Federation, Integration & Collaboration. There will be more scientific data generated in the next five years than in the history of humankind. • Evolution of Many-core & Multicore: Parallelism everywhere. What will you do with 100 times more computing power? • The Power of the Client + Cloud: Access anywhere, any time. Distributed, loosely-coupled applications at scale across all devices will be the norm.
A Digital Data Deluge in Research • Data collection • Sensor networks, satellite surveys, high throughput laboratory instruments, observation devices, supercomputers, LHC … • Data processing, analysis, visualization • Legacy codes, workflows, data mining, indexing, searching, graphics … • Archiving • Digital repositories, libraries, preservation, … • SensorMap • Functionality: Map navigation • Data: sensor-generated temperature, video camera feed, traffic feeds, etc. • Scientific visualizations • NSF Cyberinfrastructure report, March 2007
Wireless Sensor Networks • Uses 200 wireless (Intel) computers, with 10 sensors each, monitoring • Air temperature, moisture • Soil temperature, moisture, at two or more depths (5 cm, 20 cm) • Light (intensity, composition) • Soon gases (CO2, O2, CH4, …) • Long-term continuous data • Small (hidden) and affordable (many) • Less disturbance • >200 million measurements/year • Complex database of sensor data and samples With K. Szlavecz and A. Terzis at Johns Hopkins http://lifeunderyourfeet.org
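The data rate above can be sanity-checked with a quick back-of-envelope calculation. The node count, sensors-per-node, and yearly measurement total come from the slide; the per-sensor sampling interval is derived from them.

```python
# Back-of-envelope check of the sensor network's data rate.
# Node and measurement counts are from the slide; the sampling
# interval per sensor is derived from them.
computers = 200
sensors_per_computer = 10
measurements_per_year = 200_000_000

total_sensors = computers * sensors_per_computer      # 2,000 sensors in the field
per_sensor = measurements_per_year / total_sensors    # 100,000 readings per sensor per year
seconds_per_year = 365 * 24 * 3600
interval_s = seconds_per_year / per_sensor            # roughly one reading every 5 minutes

print(f"{total_sensors} sensors, one reading every {interval_s / 60:.1f} minutes")
```

So "200 million measurements/year" corresponds to each sensor reporting about every five minutes, which is consistent with long-term continuous environmental monitoring.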
Joe Hellerstein—UC Berkeley Blog: “The Commoditization of Massive Data Analysis” • We’re not even to the Industrial Revolution of Data yet… • “…since most of the digital information available today is still individually "handmade": prose on web pages, data entered into forms, videos and music edited and uploaded to servers. But we are starting to see the rise of automatic data generation "factories" such as software logs, UPC scanners, RFID, GPS transceivers, video and audio feeds. These automated processes can stamp out data at volumes that will quickly dwarf the collective productivity of content authors worldwide. Meanwhile, disk capacities are growing exponentially, so the cost of archiving this data remains modest. And there are plenty of reasons to believe that this data has value in a wide variety of settings. The last step of the revolution is the commoditization of data analysis software, to serve a broad class of users.” • How will this interact with the push toward data-centric web services and cloud computing? • Will users stage massive datasets of proprietary information within the cloud? • How will they get petabytes of data shipped and installed at a hosting facility? • Given the number of computers required for massive-scale analytics, what kinds of access will service providers be able to economically offer?
The Future: an Explosion of Data • Sources: Experiments, Simulations, Archives, Literature, Instruments (petabytes, doubling every 2 years) • The Challenge: Enable discovery. Deliver the capability to mine, search and analyze this data in near real time. • Enhance our Lives: Participate in our own health care. Augment experience with deeper understanding.
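The "doubling every 2 years" figure compounds quickly. A short sketch makes the scale concrete; the doubling period is from the slide, while the ten-year horizon and 1-petabyte starting point are illustrative assumptions.

```python
# If archived data doubles every 2 years (figure from the slide),
# total volume grows by a factor of 2^(years / doubling_period).
def growth_factor(years, doubling_period=2):
    return 2 ** (years / doubling_period)

# Illustrative: a 1 PB archive today becomes a 32 PB archive in a decade.
print(growth_factor(10))  # 32.0
```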
The Cloud • A model of computation and data storage based on “pay as you go” access to “unlimited” remote data center capabilities • A cloud infrastructure provides a framework to manage scalable, reliable, on-demand access to applications • A cloud is the “invisible” backend to many of our mobile applications • Historical roots in today’s Internet apps • Search, email, social networks • File storage (Live Mesh, MobileMe, Flickr, …)
Types of Cloud Computing • Utility computing [infrastructure] • Amazon's success in providing virtual machine instances, storage, and computation at pay-as-you-go utility pricing was the breakthrough in this category, and now everyone wants to play. Developers, not end-users, are the target of this kind of cloud computing. • Platform as a Service [platform] • One step up from pure utility computing are platforms like Google AppEngine and Salesforce's force.com, which hide machine instances behind higher-level APIs. Porting an application from one of these platforms to another is more like porting from Mac to Windows than from one Linux distribution to another. • End-user applications [software] • Any web application is a cloud application in the sense that it resides in the cloud. Google, Amazon, Facebook, Twitter, Flickr, and virtually every other Web 2.0 application is a cloud application in this sense. From: Tim O'Reilly, O'Reilly Radar (10/26/08)—”Web 2.0 and Cloud Computing”
The Rationale for Cloud Computing in eResearch • We can expect research environments will follow similar trends to the commercial sector • Leverage computing and data storage in the cloud • Small organizations need access to large-scale resources • Scientists already experimenting with Amazon S3 and EC2 services • For many of the same reasons • Small, siloed research teams • Little/no resource-sharing across labs • High storage costs • Physical space limitations • Low resource utilization • Excess capacity • The high cost of acquiring, operating and reliably maintaining machines is prohibitive • Little support for developers, system operators
Cloud Landscape Still Developing • Tools are available • Flickr, SmugMug, and many others for photos • YouTube, SciVee, Viddler, Bioscreencast for video • Slideshare for presentations • Google Docs for word processing and spreadsheets • Data Hosting Services & Compute Services • Amazon’s S3 and EC2 offerings • Archiving / Preservation • “DuraCloud” Project (in planning by DuraSpace organization) • Developing business models • Service-provision (sustainability) • NSF’s “DataNet” – developing a culture, new organizations
Why Semantic Computing http://cacm.acm.org/magazines/2009/12/52840-a-smart-cyberinfrastructure-for-research
“Semantics-based computing” vs. “Semantic web” • There is a distinction between the general approach of computing based on semantic technologies (e.g. machine learning, neural networks, ontologies, inference, etc.) and the semantic web – used to refer to a specific ecosystem of technologies, like RDF and OWL • The semantic web is just one of the many tools at our disposal when building semantics-based solutions
Towards a smart cyberinfrastructure? • Leveraging Collective Intelligence • If last.fm can recommend what song to broadcast to me based on what my friends are listening to, shouldn't the cyberinfrastructure of the future recommend articles of potential interest based on what the experts I respect in the field are reading? • Examples are emerging but the process is presently more manual – e.g. Connotea, Faculty of 1000, etc. • Semantic Computing • Automatic correlation of scientific data • Smart composition of services and functionality • Leverage cloud computing to aggregate, process, analyze and visualize data
A world where all data is linked… • Important/key considerations • Formats or “well-known” representations of data/information • Pervasive access protocols are key (e.g. HTTP) • Data/information is uniquely identified (e.g. URIs) • Links/associations between data/information • Data/information is inter-connected through machine-interpretable information (e.g. paper X is about star Y) • Social networks are a special case of ‘data networks’ Attribution: Richard Cyganiak; http://linkeddata.org/
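A minimal sketch of the linked-data model described above: resources are identified by URIs and connected by machine-interpretable (subject, predicate, object) statements, so that "paper X is about star Y" becomes a link a machine can follow. The URIs and predicate names below are hypothetical, chosen for illustration only.

```python
# Hypothetical linked-data graph: each statement is a
# (subject, predicate, object) triple, with resources named by URIs.
triples = [
    ("http://example.org/paper/X", "isAbout",    "http://example.org/star/Y"),
    ("http://example.org/paper/X", "citedBy",    "http://example.org/paper/Z"),
    ("http://example.org/star/Y",  "observedBy", "http://example.org/survey/S"),
]

def objects(subject, predicate):
    """Follow links: everything `subject` points to via `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# The statement "paper X is about star Y" is now machine-interpretable:
print(objects("http://example.org/paper/X", "isAbout"))
```

Real deployments use RDF for the triples and HTTP for the pervasive access protocol, exactly as the bullet points above suggest; the dictionary-of-triples idea is the same.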
…and stored/processed/analyzed in the cloud • Vision of a future research environment with both software + services: search, books, citations, blogs & social networking, reference management, instant messaging, identity, mail, project management, notification, document store, storage/data services, compute services, virtualization, knowledge management, knowledge discovery, visualization and analysis services, domain-specific services, scholarly communications • The Microsoft Technical Computing mission to reduce time to scientific insights is exemplified by the June 13, 2007 release of a set of four free software tools designed to advance AIDS vaccine research. The code for the tools is available via CodePlex, an online portal created by Microsoft in 2006 to foster collaborative software development projects and host shared source code. Microsoft researchers hope that the tools will help the worldwide scientific community take new strides toward an AIDS vaccine.
A New Research Paradigm • Digital technologies are completely revolutionizing the way that researchers work… in all subject areas. Sarah Porter, JISC http://www.jisc.ac.uk/whatwedo/campaigns/res3/video.aspx
“Digital technologies are also facilitating collaboration between researchers; which means knowledge can be shared more quickly and more effectively, and the potential for progress is massively advanced.” JISC report http://www.jisc.ac.uk/whatwedo/campaigns/res3/video.aspx
eResearch: data is easily shareable Sloan Digital Sky Survey / SkyServer http://cas.sdss.org/dr5/en/