Changing methods of data sharing in crystallography. Editor-in-Chief Acta Crystallographica & Chair of the IUCr Journals Commission 1996-2005; IUCr Delegate to ICSTI 2005-. Professor John R Helliwell Imperial College, June 28th, 2006.
Changing methods of data sharing in crystallography
Professor John R Helliwell, The University of Manchester, john.helliwell@manchester.ac.uk
Editor-in-Chief Acta Crystallographica & Chair of the IUCr Journals Commission 1996-2005; IUCr Delegate to ICSTI 2005-
Imperial College, June 28th, 2006
Content of presentation
• Data description standards
• Quality control in publication
• Responsibility for quality control
• Data quality standards
• Data publication at source
Crystal structures ‘published’
• Curated databases
  • Cambridge Structural Database: small organic/metal-organic structures, 335,280 entries, 29,000 added per year
  • Protein Data Bank: biological macromolecules, 34,506 entries, 5,500 added per year
  • Inorganic Crystal Structure Database (82,676), CrystMet (99,893), Powder Diffraction File (240,050)
• IUCr journals
  • Acta Crystallographica Sections C, E: small-molecule and inorganic structures, 2357 articles/year
  • Acta Crystallographica Sections D, F: biological macromolecules, ~120+ structural articles/year
Standard description of data
• Crystallographic Information Framework
  • International Tables for Crystallography (2005). Vol. G, Definition and exchange of crystallographic data, edited by S. R. Hall & B. McMahon, 1st ed. Berlin: Springer.
• CIF file structure
  • Hall, S. R., Allen, F. H. & Brown, I. D. (1991). The Crystallographic Information File (CIF): a new standard archive file for crystallography. Acta Cryst. A47, 655-685.
• Dictionary definition language
  • Hall, S. R. & Cook, A. P. F. (1995). STAR dictionary definition language: initial specification. J. Chem. Inf. Comput. Sci. 35, 819-825.
• Data dictionaries
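To make the CIF file structure concrete, here is a minimal sketch in Python of reading simple data items, where a data name beginning with an underscore is followed by its value on the same line. It is not a full CIF parser: it assumes single-line items only and ignores loops, multi-line text fields, and trailing comments. The function name `parse_simple_cif` and the sample values are hypothetical and chosen for illustration.

```python
from typing import Dict

# Illustrative CIF fragment; the numeric values are made up for this sketch.
SAMPLE_CIF = """\
data_example
# unit cell and refinement summary
_cell_length_a 10.234(2)
_refine_ls_R_factor_gt 0.0412
_chemical_name_common 'example compound'
"""


def parse_simple_cif(text: str) -> Dict[str, str]:
    """Collect single-line '_name value' items from CIF-like text.

    Assumption: every item of interest fits on one line. Loops (loop_),
    semicolon-delimited text fields, and inline comments are not handled.
    """
    items: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and the data block header.
        if not line or line.startswith("#") or not line.startswith("_"):
            continue
        parts = line.split(None, 1)  # split the name from the rest of the line
        if len(parts) == 2:
            name, value = parts
            # Remove surrounding single or double quotes, if any.
            items[name] = value.strip().strip("'\"")
    return items


items = parse_simple_cif(SAMPLE_CIF)
print(items["_cell_length_a"])
```

Because every data name is defined in a data dictionary, a reader like this can look up the meaning, type, and permitted range of each item it finds.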
Data dictionary definition: _refine_ls_R_Fsqd_factor
Name: '_refine_ls_R_Fsqd_factor'
Definition: Residual factor R(Fsqd), calculated on the squared amplitudes of the observed and calculated structure factors, for significantly intense reflections (satisfying _reflns_threshold_expression) and included in the refinement. The reflections also satisfy the resolution limits established by _refine_ls_d_res_high and _refine_ls_d_res_low.

$$R(\mathrm{Fsqd}) = \frac{\sum \left| F_{\mathrm{obs}}^{2} - F_{\mathrm{calc}}^{2} \right|}{\sum F_{\mathrm{obs}}^{2}}$$

where $F_{\mathrm{obs}}^{2}$ = squares of the observed structure-factor amplitudes, $F_{\mathrm{calc}}^{2}$ = squares of the calculated structure-factor amplitudes, and the sum is taken over the specified reflections.
The permitted range is 0.0 to infinity.
Type: numb
Category: refine
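The definition above translates directly into a few lines of arithmetic. The following Python sketch computes R(Fsqd) from two equal-length lists of squared amplitudes. It assumes the reflections have already been filtered by the threshold and resolution limits named in the definition. The function name `r_fsqd` and the sample numbers are hypothetical.

```python
from typing import Sequence


def r_fsqd(f_obs_sq: Sequence[float], f_calc_sq: Sequence[float]) -> float:
    """Residual factor on squared amplitudes:
    sum(|F(obs)^2 - F(calc)^2|) / sum(F(obs)^2).

    Assumption: both sequences list the same reflections in the same order,
    already restricted to those included in the refinement.
    """
    if len(f_obs_sq) != len(f_calc_sq):
        raise ValueError("observed and calculated lists must have equal length")
    denominator = sum(f_obs_sq)
    if denominator == 0:
        raise ValueError("sum of observed squared amplitudes is zero")
    numerator = sum(abs(obs - calc) for obs, calc in zip(f_obs_sq, f_calc_sq))
    return numerator / denominator


# Made-up squared amplitudes for three reflections.
observed = [100.0, 50.0, 50.0]
calculated = [90.0, 55.0, 45.0]
print(r_fsqd(observed, calculated))  # (10 + 5 + 5) / 200
```

The result is never negative, which matches the permitted range stated in the dictionary entry.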
Quality control at source
checkCIF: http://checkcif.iucr.org
• Free public service
• Sponsored by publishers and databases
• Over 340 separate tests
Described at http://journals.iucr.org/services/cif/datavalidation.html
Data publication increasingly ‘at source’
• Small-molecule crystallography is often ‘high throughput’, so only a subset of results reaches the literature (perhaps 5 to 10%)
• Local and national laboratory ‘data repositories’ are on the rise
• Examples: eBank (Southampton, UK + 5 other sites); Reciprocal Net (Indiana, USA + 18 other sites)
eBank
• ePrints repository
• OAI-PMH
• Standard metadata
• All data
• Links to publication
• Rights
• Quality
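OAI-PMH requests are plain HTTP queries in which a `verb` argument selects the operation. As a rough illustration of how a harvester might ask a repository for its records, the Python sketch below builds a `ListRecords` request for Dublin Core metadata (`oai_dc`). The base URL is a placeholder, not the eBank endpoint, and the helper name `oai_pmh_request` is hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint for illustration only; not a real repository address.
BASE_URL = "https://repository.example.org/oai"


def oai_pmh_request(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH request URL from a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Ask for all records in the Dublin Core metadata format.
url = oai_pmh_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

The response to such a request is an XML document, which the harvester would then parse for the metadata fields listed on the slide.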
Online Dictionary Project
• Use a wiki approach (à la Wikipedia) to realise community-agreed dictionary terms
• Pilot stage started September 2005
• Led by Emeritus Professor Andre Authier, Chair of the IUCr Nomenclature Commission
Summary
• Quality of scientific argument depends on
  • Quality of data
  • Critical appraisal
  • Accessibility of relevant data
  • Precision of definitions
  • Rigorous analysis
• IUCr publications strive to provide the highest quality in all these areas, so as to inform the editorial process, including peer review
Acknowledgements
• Peter Strickland, Managing Editor at IUCr, Chester.
• Brian McMahon, R&D Technical Development Officer at IUCr, Chester.