
Generalized Atomic Systems: A Tool Kit for Atomistic Simulation Data Michael Waters Katie Sebeck


Presentation Transcript


  1. Generalized Atomic Systems: A Tool Kit for Atomistic Simulation Data Michael Waters Katie Sebeck 2/20/2013

  2. Overview • Traditional Workflow in Molecular Dynamics • Defining the Problem • An Interchangeable Approach • Aiding Analysis • Current Usage

  3. Basics of Atomistic Simulations • Atoms in boxes • Positions • Updated by iteratively solving F=ma according to empirical force fields • Velocity • Type, charge, etc. • System-wide data • Simulation box • Number of atoms • Temperature, energy, pair potentials…
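To make "iteratively solving F=ma" concrete, here is a minimal sketch of one common integration scheme, velocity Verlet, applied to a single particle in one dimension. The slides do not say which integrator FLX or any other code uses, so the scheme, the function name `velocity_verlet`, and the harmonic test force are all assumptions chosen for illustration.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v by n_steps of size dt.

    `force` is any callable returning the force at position x; in a real
    simulation it would come from an empirical force field.
    """
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update (a = F/m)
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
    return x, v


def spring(x):
    """Illustrative force only: a harmonic spring with k = 1."""
    return -x


# Start at x = 1 with zero velocity and integrate to t = 10.
x_end, v_end = velocity_verlet(1.0, 0.0, spring, mass=1.0, dt=0.01, n_steps=1000)
energy = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
print(x_end, v_end, energy)
```

For this test force the exact trajectory is x(t) = cos(t), so the printed position can be compared against `math.cos(10.0)` and the energy against its initial value of 0.5.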

  4. ALL molecular dynamics data can be contained in ASCII text files

  5. A Brief Guide to Atomistic File Types • pdb, xyz, mol, cfg, sfd, gro, mdl, LAMMPS read_data, ccm, xsd, cif, car…

  6. Through a Traditional Workflow • Control file • Structure file • Format depends on program

     Control file (LAMMPS input script):

       units real
       timestep 1.0
       atom_style bond
       dimension 3
       boundary p p p
       #---------------Coordinates and Bonds --------------
       lattice fcc 1.0
       region 1 block -9.025 -1.805 0 70.395 0 37.905
       #N=28
       read_data n28lat
       pair_style lj/cut 9.805
       pair_coeff 1 1 0.1431 3.923
       pair_coeff 2 2 0.1432 3.923
       pair_coeff 3 3 4.72 2.616
       pair_modify mix arithmetic
       bond_style harmonic
       bond_coeff 1 41.82 1.54
       group alkane type 1 2
       group copper type 3
       neighbor 1.0 bin
       thermo 1
       thermo_style custom step temp pe ke etotal
       #minimize 1.0e-4 1.0e-6 100 1000
       fix hope all nve
       run 100000

     Structure file (LAMMPS read_data format):

       n=16, 500 Chains, rho=0.7918

       8000 atoms
       3 atom types
       7500 bonds
       1 bond types
       0 angles
       0 dihedrals
       0 impropers

       0 92.055 xlo xhi
       0 70.395 ylo yhi
       0 37.905 zlo zhi

       Masses

       1 14.002
       2 14.002
       3 63.54

       Atoms

       1 1 2 1.80500000000000 1.80500000000000 1.80500000000000
       2 1 1 2.65313400000000 3.07841000000000 1.80500000000000

  7. Through a Traditional Workflow • Information about simulation run in control file • Hardware, software version metadata formatting depends on system configuration • Produces output of overall run statistics

       Loop time of 3515.13 on 32 procs for 50000 steps with 107008 atoms

       Pair time (%) = 1108.83 (31.5444)
       Bond time (%) = 78.4225 (2.231)
       Neigh time (%) = 162.274 (4.61645)
       Comm time (%) = 1270 (36.1294)
       Outpt time (%) = 523.248 (14.8856)
       Other time (%) = 372.363 (10.5931)

       Nlocal: 3344 ave 8049 max 0 min
       Histogram: 16 0 0 0 0 0 2 6 3 5
       Nghost: 7940.66 ave 15817 max 0 min
       Histogram: 8 4 4 0 0 0 0 0 8 8
       Neighs: 862976 ave 2.19776e+06 max 0 min
       Histogram: 16 0 0 0 0 2 2 6 2 4

  8. Through a Traditional Workflow • Output files generally dictated by control file • Final structure file • System properties log • Other run-time analysis outputs • HIGHLY VARIED FORMATTING! • Quantitative analysis of output by scripting, MATLAB or Excel

  9. Through a Traditional Workflow • Output structure file may or may not be in a format which can be fed into visualization software • Many software options available: • VMD • Avogadro • POVray • VESTA • … • Analysis output may or may not be in a format which can be parsed by plotting software

  10. An Endless Series of Parsing Problems • Input file • Convert from something you can manipulate/generate to something the code can read • Output analysis • Typically requires writing new parsing routines • Different codes require rewriting scripts • Visualizations • May require extracting data from other files manually • Most visualization code is already equipped to parse a variety of file types

  11. Data from Legacy Code • Locally developed molecular dynamics code, FLX • Trying to port data into another code, LAMMPS • Ctrl+C, Ctrl+V and lots of manual editing… • Very time consuming for each file

  12. Obstacles to Data Sharing and Reuse • Energy barrier of converting file formats • Example: A file downloaded directly from the Protein Data Bank (.pdb) may not be readable by an MD code (LAMMPS) • Extracting relevant quantities from available data sets • Parsing rules not always clear if unfamiliar with the format • Formats not always well documented

  13. Problem Statement • Too much redundant work • Too little documentation or code clarity • Too much time spent manipulating data formatting • How can we fix this?

  14. Our Approach: Interchangeable Libraries • We created a General Atomic System (GAS) class • All file read functions generate a GAS object • GAS objects are accepted by • Write file functions • Analysis functions • Manipulation functions
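The pattern on this slide (every reader returns one common object, and every writer or analysis function accepts it) can be sketched as follows. This is not the authors' GAS implementation, whose attributes and function names are not given in the slides; the class `GAS` below and the helpers `read_xyz`, `write_xyz`, and `center_of_geometry` are hypothetical stand-ins, and the simple xyz layout (atom count, comment line, then one `type x y z` line per atom) is the only file format handled.

```python
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class GAS:
    """Hypothetical stand-in for a General Atomic System object."""
    types: list                    # one type label per atom
    positions: list                # one (x, y, z) tuple per atom
    box: Optional[tuple] = None    # optional (lx, ly, lz) box lengths

    @property
    def n_atoms(self):
        return len(self.positions)


def read_xyz(stream):
    """Read function: turns an xyz text stream into a GAS object."""
    n = int(stream.readline())
    stream.readline()  # skip the comment line
    types, positions = [], []
    for _ in range(n):
        parts = stream.readline().split()
        types.append(parts[0])
        positions.append(tuple(float(value) for value in parts[1:4]))
    return GAS(types, positions)


def write_xyz(gas, stream, comment=""):
    """Write function: accepts any GAS object."""
    stream.write(f"{gas.n_atoms}\n{comment}\n")
    for label, (x, y, z) in zip(gas.types, gas.positions):
        stream.write(f"{label} {x:.6f} {y:.6f} {z:.6f}\n")


def center_of_geometry(gas):
    """Analysis function: also accepts any GAS object."""
    return tuple(sum(p[i] for p in gas.positions) / gas.n_atoms for i in range(3))


# Usage: read once, then hand the same object to analysis and to a writer.
system = read_xyz(io.StringIO("2\nexample\nC 0.0 0.0 0.0\nCu 2.0 4.0 6.0\n"))
print(center_of_geometry(system))
out = io.StringIO()
write_xyz(system, out, comment="round trip")
print(out.getvalue())
```

Because the analysis and write functions only see the common object, adding a new input format under this design means writing one new read function and nothing else.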

  15. Examining Existing Standards for Commonalities • Positions • Type • Number of atoms

  16. Examining Existing Standards for Commonalities • Positions • Type • Number of atoms

  17. Examining Existing Standards for Commonalities • Positions • Type • Number of atoms/ end of atoms section

  18. Creating a Common Data Structure • GAS class contains • System data • Internal functions • Trivial ontology • Simplicity in data structure is flexibility • Internal functions should be as reliable as possible • Obvious and explicit naming schemes

  19. Ontological Details

  20. User Time Savings • From read_data to xyz: timing comparisons • Manual copy-paste, eliminating excess columns: 2.15 minutes • Calling functions, including typing out calls: 1.05 minutes • Actual function timing: ~6 seconds
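The read_data-to-xyz conversion timed above amounts to pulling the Atoms section out of a LAMMPS data file and dropping the ID and molecule columns. A minimal sketch, assuming the `atom_style bond` column order (atom ID, molecule ID, atom type, x, y, z); the function names `parse_lammps_atoms` and `to_xyz` are illustrative and are not taken from the authors' library.

```python
def parse_lammps_atoms(text):
    """Return [(atom_type, x, y, z), ...] from the Atoms section, ordered by atom ID."""
    atoms = []
    in_atoms = False
    for line in text.splitlines():
        fields = line.split("#")[0].split()   # drop trailing comments
        if not fields:
            continue
        if fields[0] == "Atoms":
            in_atoms = True
            continue
        if in_atoms:
            if not fields[0].isdigit():
                break                          # reached the next section keyword
            atom_id, _molecule, atom_type = (int(value) for value in fields[:3])
            x, y, z = (float(value) for value in fields[3:6])
            atoms.append((atom_id, atom_type, x, y, z))
    atoms.sort()                               # data files need not be ordered by ID
    return [atom[1:] for atom in atoms]


def to_xyz(atoms, comment="converted from LAMMPS read_data"):
    """Format parsed atoms as xyz text, using the numeric type as the label."""
    lines = [str(len(atoms)), comment]
    lines += [f"{t} {x:.6f} {y:.6f} {z:.6f}" for t, x, y, z in atoms]
    return "\n".join(lines) + "\n"


# A two-atom excerpt in the same layout as the structure file on slide 6.
sample = """n=16, 500 Chains, rho=0.7918

2 atoms
3 atom types

0 92.055 xlo xhi
0 70.395 ylo yhi
0 37.905 zlo zhi

Masses

1 14.002
2 14.002
3 63.54

Atoms

1 1 2 1.80500000000000 1.80500000000000 1.80500000000000
2 1 1 2.65313400000000 3.07841000000000 1.80500000000000
"""

parsed = parse_lammps_atoms(sample)
print(to_xyz(parsed))
```

Other atom styles use different columns (for example, a charge column), so a real converter would need to know the style before indexing into each line.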

  21. Aiding Analysis • With all data in standard structure: • Write all analysis based on this format • Input format independent • Allows reuse of analysis functions • Reuse begs for optimization • Intended reuse encourages documentation • Nested analyses now possible • Modularization saves: • Time • Effort • Error

  22. Traditional Scripting Problems • Scripts typically used for: • Quantitative analysis • Modifying files to be parsed by various software • Rewriting input/output handling for each script • MATLAB, sed, awk and grep are not the friendliest or fastest parsing tools • Lack of commenting • Can only be applied to specific file types or a single file

  23. Examples of Scripting 2.5 seconds

  24. The Python Version… • Once a function is written, can be called in just a few lines by ANY GAS system containing sufficient information 0.4 seconds

  25. CC BY-NC-SA http://www.flickr.com/photos/katieharbath/

  26. User Time Savings • Open source and custom function libraries instead of MATLAB allow for brute force parallelization and shifting of load to external resources • Faster run times: • 2.5 seconds using bash versus 0.4 seconds in Python • Faster coding times • Reuse of functions without additional modifications needed • Eliminating redundant coding efforts • Use of common language promotes code reusability • Writing code for “future” self as well as others

  27. Ways We’re Using GAS • Polymerization • Analyze pair-pair distances • Alter system topology • Automatically generate system readable file • Iterative system analysis • Quantitative analysis of a series of files • Radial distribution functions • Density profile • Bond length distributions • Automatically generates easily parsed output files • Automatic movie rendering
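As an illustration of the pair-distance style of analysis listed above, the sketch below histograms all pair distances in a periodic, orthogonal box using the minimum-image convention. This is only the counting step that a radial distribution function builds on (no density normalization is applied), and the function names are assumptions rather than the authors' API. It assumes `r_max` is at most half the shortest box length.

```python
import math
from itertools import combinations


def minimum_image(delta, length):
    """Shift one separation component to its nearest periodic image."""
    return delta - length * round(delta / length)


def pair_distance_histogram(positions, box, r_max, n_bins):
    """Count pair distances below r_max in n_bins equal-width bins."""
    counts = [0] * n_bins
    width = r_max / n_bins
    for a, b in combinations(positions, 2):
        d = [minimum_image(b[i] - a[i], box[i]) for i in range(3)]
        r = math.sqrt(sum(c * c for c in d))
        if r < r_max:
            counts[int(r / width)] += 1
    return counts


# Two atoms on opposite sides of a 10 x 10 x 10 box are 1.5 apart
# through the periodic boundary, not 8.5 apart.
positions = [(0.5, 0.0, 0.0), (9.0, 0.0, 0.0)]
histogram = pair_distance_histogram(positions, (10.0, 10.0, 10.0), r_max=5.0, n_bins=5)
print(histogram)
```

The double loop scales with the square of the atom count, so for systems the size of those in the slides a neighbor list or an array library would be the practical choice.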

  28. Automatic Movie Rendering

  29. System Manipulation: Unwrapping Coordinates
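The slide shows unwrapping only as an image, so the following is one possible approach rather than the authors' method: walk along a bonded chain and rebuild each position from the previous one using the minimum-image bond vector. It assumes an orthogonal periodic box, atoms listed in bonded order, and bonds shorter than half the box length; the name `unwrap_chain` is hypothetical.

```python
def unwrap_chain(positions, box):
    """Return chain coordinates with periodic wrapping removed.

    positions: wrapped (x, y, z) tuples, consecutive entries bonded.
    box: (lx, ly, lz) lengths of the orthogonal periodic box.
    """
    unwrapped = [tuple(positions[0])]
    for previous, current in zip(positions, positions[1:]):
        step = []
        for i in range(3):
            d = current[i] - previous[i]
            d -= box[i] * round(d / box[i])   # nearest-image bond component
            step.append(d)
        last = unwrapped[-1]
        unwrapped.append(tuple(last[i] + step[i] for i in range(3)))
    return unwrapped


# A three-atom chain that crosses the +x face of a 10 x 10 x 10 box.
wrapped = [(9.0, 0.0, 0.0), (0.5, 0.0, 0.0), (2.0, 0.0, 0.0)]
chain = unwrap_chain(wrapped, (10.0, 10.0, 10.0))
print(chain)
```

The first atom stays where it is and the rest of the chain follows it out of the box, which is what makes bond lengths and end-to-end distances meaningful again.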

  30. Moving Forward • More file formats • More advanced analysis methods and functions • Density functional theory support • Non-spherical particles • Collaboration with other groups • Better metadata integration

  31. Final Thoughts • Our lives are much better • Our code is much more consistent • Future users have a hope of understanding what we did • If you want people to use it, it needs to be USEFUL and EASY
