CIFFOLD Managing Long Lines in CIF

by Kostadin Mitev, Georgi Todorov, Herbert J. Bernstein, Department of Mathematics and Computer Science, Dowling College, Oakdale, NY 11769 USA. Work funded in part by a grant from the IUCr.

Presentation Transcript


  1. CIFFOLD: Managing Long Lines in CIF
      by Kostadin Mitev, Georgi Todorov, Herbert J. Bernstein
      Department of Mathematics and Computer Science
      Dowling College, Oakdale, NY 11769 USA
      Work funded in part by a grant from the IUCr

  2. Management of Data • Critical Issues in using computers in Biology Manage Raw Data Calculate Data Visualize Data Journal Publication Web Publication

  3. Many Data Representations
      • CIF [Hall et al 1991]
          • Simple tag-value representation
          • Free field, limited line length
          • Oriented to present tables
          • IUCr project
      • XML [Bray et al 1998]
          • Opening tag - Data - Closing tag
          • Free field, no limit on line length
          • Oriented to represent text
          • W3C project

  4. The Crystallographic Information File • History • 1990 – universal exchange file called Crystallographic Information File – CIF • 1991 – comprehensive Dictionary of crystallographic data items • CIF v1.0 • limit of 80 characters per line • limit of 32 characters per data tag name and data block name • one level of loop_

  5. The Crystallographic Information File (Continued)
      • CIF v1.1
          • limit of 2048 characters per line
          • limit of 75 characters per data tag name
          • #\#CIF_1.1 – version identification
      Example of a long text-field line:
          ;SHELX97 FOLLOWED BY MOLLY (N.K.HANSEN & P.COPPENS ACTA CRYSTALLOGR. A34, 909-921, 1978) REFINEMENT OF ELECTRON DENSITY.
          ;

  6. CIF Needed to Adapt
      • CIF 1.0: 80-column format
      • People work with, and need, longer lines
      • CIF 1.1 increased the line length to 2048

  7. IUCr Project at Dowling
      • 2-year project
      • To adapt CIFtbx [Hall 1993] [Hall_Bernstein 1996], vcif [McMahon] and CIFTEST [McMahon] to CIF 1.1
      • Add CIFFOLD, a utility to fold and unfold long lines

  8. Project Issues Support lines up to 2048 characters Translate between CIF 1.0 and CIF 1.1 Validate against syntax Validate against dictionaries Improve interoperability with XML Open source software C and Fortran

  9. Interoperability with XML Increasingly important, e.g. PDBML Will need UTF-8 character set • Support for internationalization • Support for wider range of characters Entity and character references (&) • Line length after decoding references Tables in XML, nested trees in CIF

  10. CIF Software in General • See IUCR, RCSB PDB, etc • http://journals.iucr.org/iucr-top/cif/software/index.html • http://www.rcsb.org/pdb/software-list.html#mmcif • Not all CIF software is open source • All software from this project is

  11. Open Source Software
      • This project will release under the GPL
      • CIFFOLD -- in prerelease (0.5)
          K. Mitev, G. Todorov, H. Bernstein 2004/2005
          http://www.bernstein-plus-sons.com/software/ciffold/
      • CIFtbx3 -- 3.0.1 released
          S. R. Hall and H. J. Bernstein 2005
          http://www.bernstein-plus-sons.com/software/ciftbx/
      • vcif2 -- syntax validator -- in progress
          Original vcif: B. McMahon
          http://www.iucr.org/iucr-top/cif/software/vcif/index.html
      • CIFTEST
          Original CIFTEST: B. McMahon 2000
          http://journals.iucr.org/iucr-top/cif/developers/trip/

  12. Folding Protocol
      • Convert without loss of semantic information
      • Comments and text fields only
      • #\ - comment
      • ;\ - data
      Initial CIF:
          ;
          zinc dihydroxide divanadate dihydrate
          ;
      Transformed CIF:
          ;\
          zinc dihydroxide divan\
          adate dihydrate
          ;

  13. Unfolding Protocol
      • #\ as the last non-blank characters of a line marks the beginning of a folded comment
      • End of folded comment:
          • a line that does not start with #, or
          • a line that does not end in \
      • ;\ as the first characters on a line marks the beginning of a folded text field
      • End of folded text field:
          • \n; (a new line followed by a semicolon), which must be followed by white space
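The text-field rules on this slide can be sketched in a few lines of Python. This is an illustration only, not the CIFFOLD source: the names `FOLD_MARK`, `is_closing_delimiter` and `unfold_text_field` are hypothetical, and the sketch assumes the field has already been split into physical lines, starting at the `;\` line.

```python
import re

# A backslash followed only by optional blanks at the end of a physical
# line is treated as a fold point (assumption: blanks are space or tab).
FOLD_MARK = re.compile(r"\\[ \t]*$")


def is_closing_delimiter(line):
    """True if ';' is in column one and white space or end of line follows."""
    return line[:1] == ";" and line[1:2] in ("", " ", "\t")


def unfold_text_field(lines):
    """Return the unfolded value of a folded text field.

    `lines` holds the physical lines without line terminators; lines[0]
    must be the opening ';\\' line.
    """
    if not lines or lines[0].rstrip() != ";\\":
        raise ValueError("not a folded text field")
    parts = []
    ends_with_newline = False
    for line in lines[1:]:
        if is_closing_delimiter(line):
            # The newline just before the closing ';' is part of the
            # delimiter, not part of the value.
            if ends_with_newline:
                parts[-1] = parts[-1][:-1]
            return "".join(parts)
        mark = FOLD_MARK.search(line)
        if mark:
            # Folded line: drop the backslash and join with the next line.
            parts.append(line[:mark.start()])
            ends_with_newline = False
        else:
            parts.append(line + "\n")
            ends_with_newline = True
    raise ValueError("unterminated text field")
```

Applied to the transformed CIF of slide 12, this sketch returns the single line `zinc dihydroxide divanadate dihydrate`. A line that merely begins with several semicolons is not mistaken for the terminator, because the character after the first `;` is not white space.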

  14. CIFFOLD Folding • Lines are folded if they exceed the maximum line length • Removing of leading/trailing blank characters • In loops: • Tags are placed on a new line • If possible data tokens are in rows and columns

  15. CIFFOLD Unfolding • Trailing blanks are removed • Tag and data are on the same line as long as it is valid • In loops: • Tags are placed on a new line • If possible data tokens are in rows and columns

  16. CIFFOLD The MAP • #_M# - special comment • si - “i” spaces • ti - “i” tabs • d – data • n – new line • s7t2dsdn – 7 spaces, 2 tabs, data item, space, data item and new line
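To make the map notation concrete, here is a minimal Python sketch that decodes a map string such as `s7t2dsdn` and replays it over a list of data items. The slide gives only this one example, so the grammar used here (an optional count after `s` and `t`, no count after `d` and `n`) is an assumption, the function names `parse_map` and `apply_map` are hypothetical, and the enclosing `#_M#` comment is not modelled.

```python
import re

# One map token: 's' or 't' with an optional repeat count, or a bare 'd'/'n'.
TOKEN = re.compile(r"([st])(\d*)|([dn])")


def parse_map(spec):
    """Decode a map string into a list of (kind, count) pairs."""
    ops = []
    pos = 0
    while pos < len(spec):
        m = TOKEN.match(spec, pos)
        if m is None:
            raise ValueError(f"bad map token at offset {pos}")
        if m.group(1):
            # 's' or 't'; a missing count means one space or one tab.
            ops.append((m.group(1), int(m.group(2) or 1)))
        else:
            ops.append((m.group(3), 1))
        pos = m.end()
    return ops


def apply_map(spec, items):
    """Rebuild the original layout of `items` from a map string."""
    remaining = iter(items)
    out = []
    for kind, count in parse_map(spec):
        if kind == "s":
            out.append(" " * count)
        elif kind == "t":
            out.append("\t" * count)
        elif kind == "d":
            out.append(next(remaining))  # next data item, in order
        else:
            out.append("\n")
    return "".join(out)
```

With this reading, `apply_map("s7t2dsdn", ["_tag", "value"])` yields seven spaces, two tabs, `_tag`, one space, `value` and a newline, which matches the description on the slide.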

  17. CIFFOLD • Other options • Terse formatting • Preserve leading blanks • Left justification when unfolding line • Format only comments • Format everything except comments

  18. CIFFOLD • Other options • Terse formatting on large loops • What is a large loop • File version • Change • Insert • Perform formatting on user-specified chunks of the file

  19. CIFFOLD • GUI • Ncurses • User friendly • ciffold -g

  20. CIFFOLD Usage:
      ciffold [-i input_cif] [-o output_cif] [-x n-n,n-n] [-l n] [-m n] [-C n]
              [-p a[w][e]] [-v file_vers] [-c] [-d] [-e] [-g] [-w] [-u] [-L]
              [-t] [-h] [-M] [-V]
      input_cif defaults to stdin
      output_cif defaults to stdout
      input_cif of "-" is stdin, output_cif of "-" is stdout
      -p has character values:
          a - print warnings and error messages
          w - print only warnings
          e - print only error messages
      -v has string values 1.0 or 1.1
      -c format only the comments of input_cif
      -d indicate that input_cif is a dictionary file
      -e format everything except comments
      -g invoke the GUI interface
      -w wrap/fold input_cif
      -u unwrap/unfold input_cif
      -L preserve leading blanks
      -t format input_cif tersely
      -h print this help message
      -M use/generate a map
      -V print the ciffold version

  21. Folding/Unfolding Examples
      Example 1:
          'this is a string that will be folded ;;;;;;;;;;;;;;;;;;;;;;;;;; ; ;'
      Could be folded as:
          ;\
          this is a string that will be folded ;;;;;;;;;;;;\
          ;;;;;;;;;;;;;; ; ;
          ;
      If the unfolder looks only for a new line and a ; as the delimiter for the end of the text field, this will be unfolded as:
          'this is a string that will be folded ;;;;;;;;;;;;'
          ;;;;;;;;;;;;; ; ;
          ;
      which splits the string and adds an extra newline

  22. Folding/Unfolding Examples
      Example 1 (continued):
      It is important that the closing \n; be recognized as a closing delimiter only when followed by white space, and that strings without newlines have a trailing backslash on the last line when they are folded.
          'this is a string that will be folded ;;;;;;;;;;;;;;;;;;;;;;;;;; ; ;'
      Should be folded as, say:
          ;\
          this is a string that will be folded ;;;;;;;;;;;;\
          ;;;;;;;;;;;;;; ; ;\
          ;

  23. Folding/Unfolding Examples
      Example 2:
          'this line is exactly 63 chars long including the delimiters\ '
      If the maximum line length is, say, 61, then the string may be folded as:
          ;\
          this line is exactly 63 chars long including the delimiters\
          ;
      But then it would unfold as:
          'this line is exactly 63 chars long including the delimiters'
      which is not the same as the original, having lost the trailing backslash-blank. Such characters need to be protected.

  24. Folding/Unfolding Examples
      Example 2 (continued):
          'this line is exactly 63 chars long including the delimiters\ '
      If the maximum line length is, say, 61, then the string should be folded as:
          ;\
          this line is exactly 63 chars long including the \
          delimiters\ \
          ;
      Then it would unfold as:
          'this line is exactly 63 chars long including the delimiters\ '
      which agrees with the original string
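The protected fold of Example 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not CIFFOLD's actual algorithm: the function name `fold_string` is hypothetical, the value is assumed to contain no newlines, and breaking just after the last blank that fits is a choice made here for readability. Because every physical line, including the last, receives its own fold backslash, a trailing backslash-blank in the value is protected automatically.

```python
def fold_string(value, width):
    """Fold a newline-free string into a ';\\' text field.

    Returns the physical lines (without terminators), each at most
    `width` characters long.
    """
    if width < 2:
        raise ValueError("width must leave room for the fold backslash")
    step = width - 1          # one column is reserved for the backslash
    chunks = []
    rest = value
    while len(rest) > step:
        # Prefer to break just after the last blank that still fits;
        # fall back to a hard break when the window has no blank.
        cut = rest.rfind(" ", 0, step) + 1
        if cut == 0:
            cut = step
        chunks.append(rest[:cut])
        rest = rest[cut:]
    chunks.append(rest)
    # Every chunk, including the last, ends with a fold backslash, so the
    # unfolded value ends exactly where the original string ended.
    return [";\\"] + [chunk + "\\" for chunk in chunks] + [";"]
```

Called with the 61-character string of Example 2 and `width=61`, the sketch produces the same four lines shown on this slide.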

  25. Folding/Unfolding Examples
      Example 3:
          #this is a comment that will be folded\unfolded\
          #this is another comment
      Should be folded as:
          #\
          #this is a comment that will be folded\
          #\unfolded\\
          #
          #this is another comment
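Example 3 can be reproduced with a short Python sketch of comment folding and unfolding. The names `fold_comment` and `unfold_comment` are hypothetical, and two details are assumptions rather than statements about CIFFOLD: chunks are cut at a fixed width, and a comment that already fits within the limit is left untouched.

```python
import re

# A backslash followed only by optional blanks at the end of a line.
FOLD_MARK = re.compile(r"\\[ \t]*$")


def fold_comment(comment, width):
    """Fold one '#' comment line into physical lines of at most `width`."""
    if len(comment) <= width:
        return [comment]          # assumption: short comments stay as-is
    body = comment[1:]            # text after the leading '#'
    step = width - 2              # room for '#' and the fold backslash
    if step < 1:
        raise ValueError("width too small")
    chunks = [body[i:i + step] for i in range(0, len(body), step)]
    lines = ["#\\"] + ["#" + chunk + "\\" for chunk in chunks[:-1]]
    if FOLD_MARK.search(body):
        # The comment itself ends in a backslash: protect it with an extra
        # backslash and close the fold with an empty comment line.
        lines += ["#" + chunks[-1] + "\\", "#"]
    else:
        lines.append("#" + chunks[-1])
    return lines


def unfold_comment(lines):
    """Unfold a comment starting at a '#\\' line.

    Returns (comment, number_of_physical_lines_consumed).
    """
    if not lines or lines[0].rstrip() != "#\\":
        raise ValueError("not a folded comment")
    text = "#"
    used = 1
    for line in lines[1:]:
        if not line.startswith("#"):
            break                 # a non-comment line ends the fold
        used += 1
        mark = FOLD_MARK.search(line)
        if mark is None:
            text += line[1:]      # last line of the folded comment
            break
        text += line[1:mark.start()]
    return text, used
```

With `width=39`, `fold_comment` turns the first comment of Example 3 into the four folded lines shown on the slide, and `unfold_comment` restores it while leaving `#this is another comment` unconsumed.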

  26. CIFtbx3 CIFtbx Version 3.0.1 Release, April 2005 (S. R. Hall, H. J. Bernstein) A tool box of Fortran routines for manipulating CIF data CIFtbx3 Licence - GPL http://www.bernstein-plus-sons.com/software/ciftbx/

  27. vcif2 & CIFTEST • vcif2: • vcif was updated by Brian McMahon in December 2004 to use the GPL • New: long lines implemented • CIFTEST: • Original released by Brian McMahon 10 May 2000 • New vcif support and command line arguments support

  28. Open Source All latest versions use the GPL • GNU General Public License • Infectious: • Any software made from this software will be open source Free software foundation – Richard Stallman www.gnu.org This presentation is also under GPL

  29. References
      • [Bray et al 1998] T. Bray, J. Paoli and C. M. Sperberg-McQueen, “Extensible Markup Language (XML)”, W3C Recommendation 10-Feb-98, REC-xml-19980210, http://www.w3.org/TR/1998/REC-xml-19980210, W3C, 1998.
      • [Hall et al 1991] S. R. Hall, F. H. Allen and I. D. Brown, “The crystallographic information file (CIF): a new standard archive file for crystallography”, Acta Cryst., A47, 655-685, 1991. http://www.iucr.org/iucr-top/cif/
      • [Hall 1993] S. R. Hall, “CIF Appl. IV. CIFtbx: A Toolbox for Manipulating CIF”, J. Appl. Cryst. 26, 482-494, 1993.
      • [Hall_Bernstein 1996] S. R. Hall and H. J. Bernstein, “CIF Applications. CIFtbx2: Extended Tool Box for Manipulating CIFs”, J. Appl. Cryst. 29, 598-603, 1996.
      • [Mitev et al 2004] K. Mitev, G. Todorov, H. J. Bernstein, “CIFFOLD”, 2004. http://bernstein-plus-sons.com/software/ciffold/

  30. Acknowledgements
      • The support of the IUCr for this project is gratefully acknowledged. The work is done in HJB’s laboratory at Dowling College. The laboratory is funded by grants from NSF, DOE and the IUCr. The current projects in the laboratory have drawn on the skills and support of:
      • Isaac Awuah Asiamah, Ricky Chachra, Clarice Chigbo, Georgi Darakev, Nikolay Darakev, Niroshan Egodawatte, Stavros Louris, Kostadin Mitev and Georgi Todorov

  31. Thank You • http://arcib.dowling.edu/cifiucr
