C21st Scholarship: Data as an Agent for Change Dr Liz Lyon, Director, UKOLN, University of Bath, UK Associate Director, UK Digital Curation Centre 3rd Bloomsbury Conference, London, June 2009.

  2. Perspectives • The 21stC Scholar : Team Science in the Cloud • Chemical Crystallography : Data Publishing Showcase • The Future : a Transformational Agenda

  3. The 21stC Scholar : Team Science in the Cloud http://www.flickr.com/photos/wwarby/3632317031/

  4. What does the C21st research(er) look like? • “From users to choosers” (Yanosky) • Pro-sumers (Toffler) • Digital nomads • Work on the Webtop http://www.flickr.com/photos/shankrad/2905938179/ • Multi-scale & complex • Highly data-intensive • Increasingly “open” http://www.flickr.com/photos/stormsriver/2286011597/

  5. “Continuum of Openness”? OPEN CLOSED

  6. What do we mean by Team Science? • Science as a social activity • Tweet • Blog • Comment • Rate • Vote • Recommend • Tag • Share • Mash • Trust is key • Inter-institutional collaboration – better science (Brian Uzzi, 2008) • Highly collaborative • Multi-disciplinary • Core team skills

  7. A new digital economy? • Data is: • On demand • A utility • Commoditised • Un-differentiated • “Publish then filter” (Shirky) • Traded • “Cloud” model? • Brokers & aggregators are key roles • Free, pay per use, pay as you grow….. http://www.flickr.com/photos/will-lion/2738252562/ • Economies of scale • Network effects • New data publishing business models

  8. Chemical Crystallography : Data Publishing Showcase http://www.flickr.com/photos/thomasreichart/2130018485/sizes/l/

  9. Data Deluge • “40 years ago a PhD student would determine about 3 crystal structures for their thesis – this can now be easily achieved in a day” • Figure labels: 0.5 million; ‘Few thousand’; 2.5 million; 35 million • A bottleneck: the primary cause is the current data publication process, which is tied to journal articles and peer review • Slide: Dr Simon Coles, Univ Soton

  10. Multi-scale : from Diamond Light Source …..

  11. …..to the Laboratory bench

  12. eCrystals Team • Domain (Chemists): Simon Coles, Mike Hursthouse, Jeremy Frey, Cameron Neylon, Andrew Milsted, Richard Stephenson, Jamie Robinson, Steven Wilson, Andrew Bailey, Mark Borkum • Computer science: Dave DeRoure, Les Carr, Monica Schraefel, Chris Gutteridge, Tim Myles-Board, Arouna Woukei, Dave Tarrant, Stuart Middleton • Informatics: Liz Lyon, Manjula Patel, Rachel Heery, Monica Duke, Michael Day, Traugott Koch, Pete Cliff

  13. eCrystals Data Repository • Quick & simple to deposit • Software tools • Laboratory archive • Community involvement • ‘Embargo’ facility • Structured foundations • Discoverable & harvestable http://ecrystals.chem.soton.ac.uk
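
To make "discoverable & harvestable" concrete, the sketch below builds a harvesting request URL. The slide does not name a protocol; OAI-PMH is assumed here because it is a widely used repository harvesting standard. The endpoint `BASE_URL` and the helper `list_records_url` are hypothetical and are not taken from the eCrystals service.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical endpoint, used only for illustration (not the eCrystals URL).
BASE_URL = "http://repository.example.org/oai"


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL (assumed protocol)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting by set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
```

A harvester would fetch this URL and page through the XML responses; that step is left out so the sketch runs without network access.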

  14. Data sustainability • Trust • Standards • Audit and certification tools • TRAC • DRAMBORA • PLATTER • NESTOR • Data Seal of Approval • eCrystals Curation Reports (3) • Preservation metadata • PREMIS Data Dictionary • OAIS • Representation Information • Registry/Repository RRORI

  15. Data Discovery & Access • “Community Criteria for Interoperability” (Scaling Up Report 2008) • Domain data format standard: CIF • Domain data validation standard: CheckCIF • Metadata schema: eCrystals Application Profile • http://www.ukoln.ac.uk/projects/ebank-uk/schemas/ • Crystallography Data Commons: TIDCC Data Model in development • Embargo & Rights http://ecrystals.chem.soton.ac.uk/rights.html • Domain identifier: International Chemical Identifier • Citation & linking: DOI http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145
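
As a small illustration of the citation and linking entry, the DOI behind the resolver URL on this slide can be split into its registrant prefix and item suffix. This is only a sketch: the helper name `split_doi` is hypothetical, and it assumes the URL follows the `http://dx.doi.org/<prefix>/<suffix>` pattern shown above.

```python
from urllib.parse import urlparse


def split_doi(url: str) -> tuple[str, str]:
    """Split a DOI resolver URL into (prefix, suffix).

    Hypothetical helper for illustration; assumes the DOI is the URL path.
    """
    doi = urlparse(url).path.lstrip("/")
    # The prefix ends at the first slash; the suffix may contain more slashes.
    prefix, _, suffix = doi.partition("/")
    if not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI resolver URL: {url!r}")
    return prefix, suffix


prefix, suffix = split_doi(
    "http://dx.doi.org/10.1594/ecrystals.chem.soton.ac.uk/145"
)
print(prefix)
print(suffix)
```

Partitioning on the first slash only keeps the record number `145` attached to the suffix, which is what makes the identifier point at a single dataset.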

  16. Paris, March 2009

  17. Memorandum of Understanding

  18. Dr Simon Coles, Univ Southampton http://wiki.ecrystals.chem.soton.ac.uk/index.php/Main_Page

  19. Slide of data services: CrystalEye, Crystal Web, ChemXSeer etc. • Search structures • Aggregate, syndicate, filter etc. • (Note: check PMR stuff) • New Web service to aggregate published crystallography data ...

  20. ... federated search.....

  21. structure search...

  22. Data casts: Lab Blogs • Tools • Machines • Sensors • Original slide: Dr Simon Coles, Univ Soton

  23. Publishing and sharing methodologies ...

  24. ... and workflows ... • ... data for re-use, mash-ups, mining, computation, models, simulations ...

  25. oreChem – The Chemical Semantic Web • At-source capture of chemistry data • Chemical structure search • Compound object authoring • Retrospective harvesting of chemistry data • Reuse through common ORE data model • Semantic authoring • Virtualized triple storage • University of Cambridge • Cornell University • Indiana University • Penn State University • University of Queensland • University of Southampton • Diagram labels: Mash-up (reuse); Semantic Graph (storage); experiments, scientists, documents, molecules, text, data; Data (capture); molecules, data, measurements • Slide: Dr Simon Coles, Univ Soton
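
The "common ORE data model" and "triple storage" bullets can be pictured with a toy example. The sketch below is not the oreChem implementation: `TripleStore` is a hypothetical in-memory class, and every `example.org` URI is invented. Only the two predicate names, `describes` and `aggregates`, come from the OAI-ORE vocabulary.

```python
# Minimal in-memory triple store sketch; every example.org URI is hypothetical.
from typing import Optional

ORE = "http://www.openarchives.org/ore/terms/"
Triple = tuple[str, str, str]


class TripleStore:
    """Holds (subject, predicate, object) triples and answers pattern queries."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self,
              subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> list[Triple]:
        """Return sorted triples matching the pattern; None is a wildcard."""
        pattern = (subject, predicate, obj)
        return sorted(
            t for t in self._triples
            if all(want is None or want == got for want, got in zip(pattern, t))
        )


store = TripleStore()
resource_map = "http://example.org/rem/145"
aggregation = "http://example.org/aggregation/145"

# A resource map describes an aggregation, which aggregates the data files.
store.add(resource_map, ORE + "describes", aggregation)
store.add(aggregation, ORE + "aggregates", "http://example.org/145/structure.cif")
store.add(aggregation, ORE + "aggregates", "http://example.org/145/report.html")

for _, _, resource in store.match(subject=aggregation,
                                  predicate=ORE + "aggregates"):
    print(resource)
```

Because every statement has the same three-part shape, a mash-up can reuse the data by querying on any position, for example asking which aggregation contains a given file.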

  26. The Future : a Transformational Agenda? http://www.flickr.com/photos/cyber_chof/1246303241/sizes/m/

  27. We need to understand the value and benefits of data publishing and associated data curation / management ... and articulate them clearly • Values & benefits may be: • political • economic • societal ... • DCC Research Data Management Forum 3 • Some issues and challenges ...

  28. 1. Research quality • Publications based on closed peer review • Maintain reputation • Demonstrate provenance • Open pilots – Nature • Use collective intelligence • Ratings, polls, recommender systems • Data publishing policy?

  29. 2. Research sustainability • Ensure curation & preservation of long term scientific record including the data • Requires significant investment in infrastructure • Assure data security • Demonstrate resilience & robustness • Establish trust • New business models • Understand full costs

  30. 3. Research capacity & capability • Multi-disciplinary team • Hybrid skills • New field - data informatics • New roles for information professionals?

  31. IJDC 2009 (in press) • Increase capacity & capability • Embed skills in LIS curriculum • Develop career paths, incentivise

  32. Take homes • Team science is a social activity • We need to advocate the value & benefits of data publishing • Data informatics underpins C21st scholarship

  33. Moving to Multi-Scale Science: Managing Complexity & Diversity • Thank you • Slides will be available at: http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html • http://www.dcc.ac.uk/

