Data Wrangling IS&T Scientific Visualization Tutorial – Summer 2010
Managing data • Programs • Scripts • Data • Documentation • Text • Images • Movies
Programs and scripts • Reproducibility: code snapshots, documentation • Archiving • Version control: RCS, Subversion • Feature creep: expansion vs modification
Data management • Size matters • What to keep? What is hard to reproduce; short vs long-term • Archiving: SCF archive system, back up to external drive
Back to the pipeline [Diagram: data passing through tools including Matlab, VTK, OpenGL, Maya, IDL, Paraview, OSG, Photoshop, Gnuplot, DAFFIE, Performer, Premiere, Xmgrace, Excel]
Your data -> sci-vis package • Minimal conversion, i.e., keep basic structure • Headers • Reformatting • ASCII vs binary • Data type (int, single, double) • Endian-ness • Example – exporting from Matlab to VTK
Array layout • 2-D example, Matlab
    >> a(1,1) = 11;
    >> a(1,2) = 12;
    >> a(2,1) = 21;
    >> a(2,2) = 22;
    >> a
    a =
        11    12
        21    22
    >> a1d = reshape(a,4,1)
    a1d =
        11
        21
        12
        22
Array layout • 2-D example, C
    #include <stdio.h>

    int main(void)
    {
        int m[2][2];
        int *pm = &m[0][0];   /* walk the 2x2 array as 4 contiguous ints */
        int i;

        m[0][0] = 11;
        m[0][1] = 12;
        m[1][0] = 21;
        m[1][1] = 22;

        for (i = 0; i < 4; i++)
            printf("%d\n", pm[i]);
        return 0;
    }
  Output:
    11
    12
    21
    22
Permuting dimensions • 2-D example, Matlab
    >> a
    a =
        11    12
        21    22
    >> b = permute(a, [2,1])
    b =
        11    21
        12    22
    >> b1d = reshape(b,4,1)
    b1d =
        11
        12
        21
        22
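The Matlab and C slides above differ only in which index varies fastest in memory. The following is a minimal pure-Python sketch of that difference; the helper names (`flatten_row_major`, `flatten_column_major`, `transpose`) are made up for illustration and do not come from the tutorial.

```python
# Flatten a 2-D array in column-major (Matlab/Fortran) and row-major (C) order.

def flatten_row_major(a):
    """C order: the last index varies fastest."""
    return [a[i][j] for i in range(len(a)) for j in range(len(a[0]))]


def flatten_column_major(a):
    """Matlab order: the first index varies fastest."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def transpose(a):
    """2-D equivalent of Matlab's permute(a, [2,1])."""
    return [list(col) for col in zip(*a)]


a = [[11, 12],
     [21, 22]]

print(flatten_column_major(a))             # order seen by Matlab's reshape(a,4,1)
print(flatten_row_major(a))                # order seen by the C pointer walk
print(flatten_column_major(transpose(a)))  # permute first, then reshape
```

Permuting before flattening in column-major order reproduces the row-major sequence, which is why a `permute` call is often all that is needed before writing Matlab data for a C-ordered reader.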
Endian-ness • Big endian • Little endian
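To make the two byte orders concrete, here is a small sketch using Python's standard `struct` module. The sample value `0x0A0B0C0D` and the helper `swap32` are illustrative choices, not part of the tutorial.

```python
import struct
import sys

value = 0x0A0B0C0D

# Pack the same 32-bit unsigned integer in both byte orders.
big = struct.pack(">I", value)     # most significant byte first
little = struct.pack("<I", value)  # least significant byte first

print("native order: ", sys.byteorder)
print("big endian:   ", big.hex())
print("little endian:", little.hex())

# Reading big-endian bytes as if they were little-endian silently
# yields a different number rather than an error.
misread = struct.unpack("<I", big)[0]
print("misread value:", hex(misread))


def swap32(raw):
    """Reverse the byte order of every 4-byte word in raw."""
    if len(raw) % 4:
        raise ValueError("length must be a multiple of 4")
    return b"".join(raw[i:i + 4][::-1] for i in range(0, len(raw), 4))
```

Because a wrong byte order produces plausible-looking garbage instead of a failure, it is worth checking a known value (a header field or a first data point) whenever binary data moves between machines or packages.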
VTK legacy format • Example
    # vtk DataFile Version 3.0
    output of gen_vtk_v3_loop.m
    BINARY
    DATASET STRUCTURED_POINTS
    ORIGIN 0.0 0.0 0.0
    SPACING 1.0 1.0 1.0
    DIMENSIONS 4 7 12
    POINT_DATA 336
    VECTORS v3 float
    (binary vector data follows)
Writing out a VTK legacy file • Example using Matlab
    % fid is assumed to be open for writing. VTK legacy binary data is
    % big-endian, so open it with, e.g., fopen(filename, 'w', 'ieee-be').
    fprintf(fid, '# vtk DataFile Version 3.0\n');
    fprintf(fid, 'output of gen_vtk_v3_loop.m\n');
    fprintf(fid, 'BINARY\n');
    fprintf(fid, 'DATASET STRUCTURED_POINTS\n');
    fprintf(fid, 'ORIGIN 0.0 0.0 0.0\n');
    fprintf(fid, 'SPACING 1.0 1.0 1.0\n');
    fprintf(fid, 'DIMENSIONS %s %s %s\n', int2str(nx), int2str(ny), int2str(nz));
    fprintf(fid, 'POINT_DATA %s\n', int2str(nx*ny*nz));
    fprintf(fid, 'VECTORS %s float\n', varname);
    fwrite(fid, dv3, 'single');
    fclose(fid);
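For comparison, the same file can be sketched in Python with only the standard library. This is an illustrative equivalent of the Matlab slide, not the tutorial's own code: the function name `write_vtk_vectors`, the title line, and the constant test field are assumptions. It writes the data big-endian, which is what VTK legacy binary files expect.

```python
import io
import struct


def write_vtk_vectors(stream, nx, ny, nz, vectors, name="v3"):
    """Write a binary VTK legacy STRUCTURED_POINTS file with one vector field.

    vectors is a flat sequence of 3*nx*ny*nz floats (x, y, z components
    per point), with the x grid index varying fastest.
    """
    npoints = nx * ny * nz
    if len(vectors) != 3 * npoints:
        raise ValueError("expected %d values, got %d" % (3 * npoints, len(vectors)))
    header = (
        "# vtk DataFile Version 3.0\n"
        "output of a Python sketch\n"
        "BINARY\n"
        "DATASET STRUCTURED_POINTS\n"
        "ORIGIN 0.0 0.0 0.0\n"
        "SPACING 1.0 1.0 1.0\n"
        f"DIMENSIONS {nx} {ny} {nz}\n"
        f"POINT_DATA {npoints}\n"
        f"VECTORS {name} float\n"
    )
    stream.write(header.encode("ascii"))
    # ">" forces big-endian 32-bit floats regardless of the host machine.
    stream.write(struct.pack(f">{len(vectors)}f", *vectors))


# Usage: a 4 x 7 x 12 grid with a constant (0, 0, 1) vector at each point.
nx, ny, nz = 4, 7, 12
field = [0.0, 0.0, 1.0] * (nx * ny * nz)
buf = io.BytesIO()
write_vtk_vectors(buf, nx, ny, nz, field)
data = buf.getvalue()
print(len(data), "bytes written")
```

Writing to an in-memory `BytesIO` keeps the sketch side-effect free; passing a file opened with `open(path, "wb")` works the same way.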
VTK XML format
    <VTKFile type="ImageData" version="0.1" byte_order="LittleEndian">
      <ImageData WholeExtent="0 128 0 32 0 32" Origin="0.0 0.0 0.0" Spacing="1.0 1.0 1.0">
        <Piece Extent="0 128 0 32 0 32">
          <PointData Vectors="velo">
            <DataArray Name="velo" type="Float32" format="ascii" NumberOfComponents="3">
              0.0 8.2 69.2
              0.0 1.2 68.8
              ...
              490.3 67.2 0.2
              497.3 77.2 -0.1
            </DataArray>
          </PointData>
        </Piece>
      </ImageData>
    </VTKFile>
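One detail of the XML form that is easy to miss: extents are inclusive point-index ranges, so `0 128 0 32 0 32` describes 129 x 33 x 33 = 140,481 points, not 128 x 32 x 32. The sketch below builds a tiny ImageData document with Python's standard `xml.etree.ElementTree` and checks the data length against the extent. The function name `image_data_xml` and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def image_data_xml(extent, vectors, name="velo"):
    """Build a VTK XML ImageData document with one ASCII 3-component array.

    extent is (x0, x1, y0, y1, z0, z1), inclusive point indices.
    vectors is a list of (u, v, w) tuples, one per point.
    """
    x0, x1, y0, y1, z0, z1 = extent
    npoints = (x1 - x0 + 1) * (y1 - y0 + 1) * (z1 - z0 + 1)
    if len(vectors) != npoints:
        raise ValueError("extent needs %d points, got %d" % (npoints, len(vectors)))
    ext = " ".join(str(e) for e in extent)
    root = ET.Element("VTKFile", type="ImageData", version="0.1",
                      byte_order="LittleEndian")
    image = ET.SubElement(root, "ImageData", WholeExtent=ext,
                          Origin="0.0 0.0 0.0", Spacing="1.0 1.0 1.0")
    piece = ET.SubElement(image, "Piece", Extent=ext)
    pdata = ET.SubElement(piece, "PointData", Vectors=name)
    array = ET.SubElement(pdata, "DataArray", Name=name, type="Float32",
                          format="ascii", NumberOfComponents="3")
    array.text = "\n" + "\n".join(f"{u} {v} {w}" for u, v, w in vectors) + "\n"
    return ET.tostring(root, encoding="unicode")


# Usage: a 2 x 2 x 1 grid of points (extent 0..1, 0..1, 0..0).
doc = image_data_xml((0, 1, 0, 1, 0, 0),
                     [(0.0, 8.2, 69.2), (0.0, 1.2, 68.8),
                      (490.3, 67.2, 0.2), (497.3, 77.2, -0.1)])
print(doc)
```

ASCII output like this is convenient for inspecting small test cases; for full-size data a binary or appended encoding is the more practical choice.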
Larger picture
Example – molecular dynamics • Simulation creates data files • Molecule x,y,z + type -> colored spheres (C program) • Electron density as volume data -> isosurfaces (IDL) -> .obj files • Rendered in Maya
Problem statement • Atoms: file with x,y,z, atom type (over time) • Electron density: file containing volume data (over time) • Desired output: animation of atoms as colored balls, electron density as isosurfaces
Decisions • Final display program: find an off-the-shelf solution? Write an OpenGL program? Produce models for generic display software? • How to represent the geometry: colored spheres? Colored isosurfaces? • How to get from input data to this representation • Electron density
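Digging down - geometry • Spheres, program:
    /* Draw a sphere of radius r centered at (x, y, z) as nlat latitude
       bands, each one a quad strip of nlong segments.
       Needs <math.h> and the OpenGL header (e.g. <GL/gl.h>). */
    #define PI 3.14159265358979

    void drawSphere(float x, float y, float z, float r, int nlat, int nlong)
    {
        int i, j;
        float s0, c0, s1, c1, s2, c2;

        for (i = 0; i < nlat; i++) {
            /* Latitude runs from -PI/2 to +PI/2. The (float) cast avoids
               integer division, which would truncate i/nlat to 0. */
            s0 = sin(PI * ((float)i / nlat - 0.5));
            c0 = cos(PI * ((float)i / nlat - 0.5));
            s1 = sin(PI * ((float)(i + 1) / nlat - 0.5));
            c1 = cos(PI * ((float)(i + 1) / nlat - 0.5));
            glBegin(GL_QUAD_STRIP);
            for (j = 0; j <= nlong; j++) {
                c2 = cos(2 * PI * j / nlong);
                s2 = sin(2 * PI * j / nlong);
                glNormal3f(c2 * c0, s2 * c0, s0);
                glVertex3f(x + r * c2 * c0, y + r * s2 * c0, z + r * s0);
                glNormal3f(c2 * c1, s2 * c1, s1);
                glVertex3f(x + r * c2 * c1, y + r * s2 * c1, z + r * s1);
            }
            glEnd();
        }
    }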
Digging down - geometry
    # OBJ file: sphere.obj
    # nvert = 512
    # nface = 128
    v 0.05257 0 -8.5065
    v 0.05257 0 8.5065
    v -0.05257 0 8.5065
    ...
    f 1 2 4 3
    f 3 4 6 5
    f 5 6 8 7
    ...
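The two geometry slides above show the same sphere twice: once as drawing calls, once as a model file. As a rough sketch of how one turns into the other, the following Python function emits a latitude/longitude sphere as OBJ text with quad faces. The function name `sphere_obj` and the vertex layout are assumptions for illustration; the tutorial's own sphere.obj was produced by a different program and has a different layout.

```python
import math


def sphere_obj(cx, cy, cz, r, nlat, nlong):
    """Return OBJ text for a latitude/longitude sphere built from quads.

    The sphere has nlat + 1 rings of nlong vertices each. The two pole
    rings are degenerate (all their vertices coincide), which keeps the
    indexing uniform at the cost of a few redundant vertices.
    """
    lines = ["# OBJ sketch: latitude/longitude sphere"]
    for i in range(nlat + 1):
        lat = math.pi * (i / nlat - 0.5)          # -pi/2 .. +pi/2
        for j in range(nlong):
            lon = 2.0 * math.pi * j / nlong       # 0 .. 2*pi
            x = cx + r * math.cos(lon) * math.cos(lat)
            y = cy + r * math.sin(lon) * math.cos(lat)
            z = cz + r * math.sin(lat)
            lines.append(f"v {x:.5f} {y:.5f} {z:.5f}")
    for i in range(nlat):
        for j in range(nlong):
            jn = (j + 1) % nlong                  # wrap around in longitude
            # OBJ vertex indices are 1-based.
            a = i * nlong + j + 1
            b = i * nlong + jn + 1
            c = (i + 1) * nlong + jn + 1
            d = (i + 1) * nlong + j + 1
            lines.append(f"f {a} {b} {c} {d}")
    return "\n".join(lines) + "\n"


# Usage: a unit sphere with 4 latitude bands and 8 longitude segments.
text = sphere_obj(0.0, 0.0, 0.0, 1.0, 4, 8)
print(text.splitlines()[1])
```

Generating a file like this once and instancing it per atom in the display package is usually simpler than issuing drawing calls for every atom.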
Surfaces: polygonal representation
Digging down - geometry • Sometimes special types of geometry
    #Inventor V2.1 ascii
    …
    DEF O_mat Material {
        ambientColor 0.05 0.20 0.40
        diffuseColor 0.05 0.20 0.50
        specularColor 0.05 0.20 0.20
        shininess 0.20
    }
    DEF atom_1187 Separator {
        USE O_mat
        Translation { translation -40.0 -60.0 0.0 }
        Sphere { radius 2.5 }
    }
    …
And the isosurface
    v[0] = ( 0.52, 1.01, 9.50)
    v[1] = ( 0.57, 0.99, 8.11)
    v[2] = (-0.67, 0.43, 7.23)
    ...
    f[0] = {1, 2, 4}
    f[1] = {3, 4, 6, 5}
    f[2] = {5, 6, 8, 7}
    ...
A variety of data structures for cells
3D file formats • What they represent • How they represent it • What software can read it • What software can write it • How complex is it • Human readable • ASCII vs binary • Proprietary vs open source • Cost
3D file formats - continuum • Simplest: explicit points, lines, planes, patches • Add color information, texture maps, bump maps • More complex: scene graph including lights, etc. • Cutover to programmatic paradigm • Conversions may not preserve all features
.obj files • Materials file: list of materials by name; contains surface reflectance properties; contains names of texture (image) files • Vertex list: v x y z • Normals list: vn x y z • Texture coordinate list: vt u v
.obj files • Faces as vertex lists • f v1 v2 v3 ... • f v1/vt1 v2/vt2 v3/vt3 ... • f v1/vt1/vn1 v2/vt2/vn2 v3/vt3/vn3 ... • f v1//vn1 v2//vn2 v3//vn3 ...
.obj file example
    mtllib ./alien.mtl
    v 0.0 0.0 0.0
    v 1.0 0.0 0.0
    v 1.3 0.6 0.0
    v 1.0 1.0 0.0
    v 0.0 1.0 0.0
    v 2.0 -0.3 0.0
    v 2.3 0.7 0.0
    vn 0.0 0.0 1.0
    vt 0.0 0.0
    vt 0.5 0.0
    vt 0.5 0.6
    vt 0.5 1.0
    vt 0.0 1.0
    vt 1.0 0.0
    vt 1.0 0.5
    usemtl alien
    f 1/1/1 2/2/1 3/3/1 4/4/1 5/5/1
    f 3/3/1 2/2/1 6/6/1 7/7/1
    [Figure: the two faces with vertices v1 through v7 labeled, alongside the texture square with corners (0,0), (1,0), (1,1), (0,1)]
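A "roll-your-own" reader for the subset of .obj shown on these slides is short. The sketch below handles only `v`, `vt`, `vn`, and `f` records, including the four face-corner forms (`v`, `v/vt`, `v/vt/vn`, `v//vn`); it ignores everything else and does not support negative (relative) indices. The function name `parse_obj` is illustrative.

```python
def parse_obj(text):
    """Parse v, vt, vn, and f records from OBJ text; ignore other records.

    Each face is a list of (v, vt, vn) index tuples, 1-based as in the
    file, with None where a corner omits the texture or normal index.
    """
    mesh = {"v": [], "vt": [], "vn": [], "f": []}
    for line in text.splitlines():
        parts = line.split("#", 1)[0].split()     # drop comments
        if not parts:
            continue
        key, args = parts[0], parts[1:]
        if key in ("v", "vt", "vn"):
            mesh[key].append(tuple(float(a) for a in args))
        elif key == "f":
            face = []
            for corner in args:
                # Pad to three fields so "3" and "3/4" parse like "3//" and "3/4/".
                fields = (corner.split("/") + ["", ""])[:3]
                face.append(tuple(int(f) if f else None for f in fields))
            mesh["f"].append(face)
    return mesh


# Usage: the example file from the slide above.
ALIEN = """\
mtllib ./alien.mtl
v 0.0 0.0 0.0
v 1.0 0.0 0.0
v 1.3 0.6 0.0
v 1.0 1.0 0.0
v 0.0 1.0 0.0
v 2.0 -0.3 0.0
v 2.3 0.7 0.0
vn 0.0 0.0 1.0
vt 0.0 0.0
vt 0.5 0.0
vt 0.5 0.6
vt 0.5 1.0
vt 0.0 1.0
vt 1.0 0.0
vt 1.0 0.5
usemtl alien
f 1/1/1 2/2/1 3/3/1 4/4/1 5/5/1
f 3/3/1 2/2/1 6/6/1 7/7/1
"""

mesh = parse_obj(ALIEN)
print(len(mesh["v"]), "vertices,", len(mesh["f"]), "faces")
```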
Tools for 3D format conversion • VTK import and export (freely available) • Okino NuGraf (1 copy in CGL) • Roll-your-own • MeshLab (not tried)
Example – data wrangling flow • For each time step: • atoms file -> obj file (dw) • E density file -> volume file (dw) • volume -> isosurface data file (IDL) • isosurface data file -> obj file (dw) • obj files -> tiff image file (Maya) • tiff file -> png file (ImageMagick) • Collect image files into movie (Premiere)
Work flow with lots of data • Cannot fit whole experiment in running program • Cannot fit all data onto disk • Requires staging / tracking dependencies • Requires deleting intermediate data • Requires queue management and etiquette
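The staging and cleanup points above can be sketched as a per-time-step loop that deletes each intermediate file once the next stage has consumed it, while keeping the original input and the final product. Everything here is a stand-in: `run_stage` only copies text, and the stage names and file naming scheme are hypothetical, loosely following the electron-density branch of the earlier flow.

```python
import os
import tempfile


def run_stage(src, dst, label):
    """Stand-in for a real converter: copy src to dst with a tag appended."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.write(fin.read() + f"|{label}")


def process_step(workdir, step, stages):
    """Run the stages in order for one time step.

    stages is a list of file extensions; stages[0] is the input.
    Intermediate files are removed as soon as they have been consumed;
    the input and the final output are kept.
    """
    base = os.path.join(workdir, f"step{step:04d}")
    source = f"{base}.{stages[0]}"
    current = source
    for ext in stages[1:]:
        nxt = f"{base}.{ext}"
        run_stage(current, nxt, ext)
        if current != source:
            os.remove(current)    # intermediate data no longer needed
        current = nxt
    return current


STAGES = ["vol", "iso", "obj", "tiff", "png"]

with tempfile.TemporaryDirectory() as workdir:
    for step in range(3):
        # Pretend the simulation has just written this time step.
        with open(os.path.join(workdir, f"step{step:04d}.vol"), "w") as f:
            f.write(f"volume {step}")
        process_step(workdir, step, STAGES)
    remaining = sorted(os.listdir(workdir))

print(remaining)
```

Processing one time step all the way through before starting the next keeps peak disk use at roughly one step's worth of intermediates, at the cost of rerunning the whole chain if a late stage changes.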
Example workflow – pressure on turbine • For each time step • Simulation produces plot3d file • plot3d -> obj file with color as texture (dw) • obj file -> tiff image file (Maya) • tiff image file -> ppm file (ImageMagick) • ppm file -> DVD wall (SCV movie player)
Conclusion - Tips • Don’t get consumed by tangents • Like, learning a new language • Or, developing a general matrix library • Use existing formats and software when possible • But don’t let them take over • Simple and open source • Make careful choices re ASCII vs binary • Take snapshots and make backups • Document, document, document
The end • Erik Brisson: ebrisson@bu.edu • Tutorial powerpoint slides: • http://www.bu.edu/tech/research/training/presentations/list/