The ISDA Tools: Computationally Scalable File Migration Services to Keep Your Files Current

Presentation Transcript


  1. The ISDA Tools: Computationally Scalable File Migration Services to Keep Your Files Current
     Kenton McHenry, Rob Kooper, Luigi Marini, Michal Ondrejcek

  2. The Problem
     • The abundance of file formats is a problem when preserving electronic records
     • Why?
       • Will there be software to load the file in the future?
       • If not, will the specification for the format still exist?
       • Was the specification ever available to begin with (closed/proprietary formats)?

  3. *.pdf (*.prc, *.u3d) *.k3d *.ma, *.mb, *.mp *.w3d *.lwo *.blend *.iam *.max, *.3ds *.c4d *.dwg *.vtk, *.vtp *.skp

  4. Available 3D File Formats…

  5. Converting Formats
     • To preserve content for future use, one option is to convert the file to an open/standardized format that is likely to be supported for some time.
     • Store both this file and the original for provenance
     • Ideally, with one file format for a particular content type, it will be easy for users to view/use the data.

  6. NCSA Polyglot (2009)
     • Conversion service built on any and all available third-party software
     • Imposed Code Reuse: Re-attaching a programmable interface to compiled software.
       • Scripted operations within software
       • GUI scripting (e.g. AutoHotkey)
     • Created a simple workflow referred to as an Input/Output Graph
     • Compared files before/after conversion to measure information loss
     • Distributed across multiple machines
     • Web access

  7. ISDA File Migration Tools
     • Conversion Software Registry
     • Software Servers
     • Polyglot
     • Versus

  8. Software that can Convert between Formats
     • There is a lot of software available, each with its own unique capabilities
     • A lot of it is not free
     • It would be expensive to buy a package just to check whether it truly can convert between a desired pair of formats
     • How can someone know what software to get for their needs?
     http://isda.ncsa.illinois.edu/NARA/CSR

  9. The Conversion Software Registry (Tool #1) [figure: Adobe 3D Reviewer]

  10. The Conversion Software Registry (Tool #1)

  11. The Conversion Software Registry (Tool #1)

  12. Input/Output Graphs [figure: Adobe 3D Reviewer]

  13. Input/Output Graphs [figure: 3DS Max, Adobe 3D Reviewer, AutoCAD, Blender, Cinema 4D, K-3D, LightWave 3D, Maya, Wings 3D]

  14. Input/Output Graphs: Shortest conversion path

  15. The CSR: I/O-Graphs

  16. The CSR: I/O-Graphs

  17. The CSR: I/O-Graphs

  18. The CSR: Searching for Software

  19. The CSR: File Formats

  20. CSR: Adding Software

  21. Software Servers (Tool #2)
      • Imposed Code Reuse: The process of attaching an API-like interface to software so that its functionality can be called from within new code.

  22. Software Servers (Tool #2)
      • Shares the functionality of software over the web
      • In contrast to services which share data: ftpd, nfsd, smbd, httpd
      • Similar to services such as: telnetd, sshd, VNC, rdesktop
      • The main difference is in the interface:
        • Uniform across all software
          http://host:8182/software/<Application>/<Task>/<Output Format>/<InputFile>
        • Simple
        • Widely accessible
        • Capable of being programmed against
      • Allows any desktop application to become a cloud-based web service*
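The uniform URL pattern on this slide can be illustrated with a small helper. This is only a sketch: `build_url` is a hypothetical function name, `localhost` and `model.stp` are placeholder values, and treating the input-file segment as optional is an assumption based on the script on slide 35, which uploads the file instead of naming it in the URL.

```shell
#!/bin/sh
# Sketch: assemble a Software Server URL following the slide's pattern
#   http://host:8182/software/<Application>/<Task>/<Output Format>/<InputFile>
# build_url is a hypothetical helper, not part of the ISDA tools.
build_url() {
  host="${1%/}"        # drop one trailing slash, if present
  application="$2"
  task="$3"
  output_format="$4"
  input_file="${5:-}"  # optional; omitted when the file is uploaded instead
  url="$host/software/$application/$task/$output_format"
  if [ -n "$input_file" ]; then
    url="$url/$input_file"
  fi
  printf '%s\n' "$url"
}

# Placeholder host and file name, for illustration only.
build_url "http://localhost:8182" "A3DReviewer" "convert" "igs" "model.stp"
# prints: http://localhost:8182/software/A3DReviewer/convert/igs/model.stp
```

Because every application is addressed through the same four path segments, a client only needs to vary the arguments to reach a different program or task.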

  23. Software Functionality Sharing

  24. Software Functionality Sharing

  25. Software Functionality Sharing

  26. Software Functionality Sharing

  27. Software Functionality Sharing

  28. Software Functionality Sharing

  29. Software Functionality Sharing

  30. Software Functionality Sharing

  31. Software Functionality Sharing

  32. Software Functionality Sharing

  33. Software Functionality Sharing

  34. Software Functionality Sharing

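  35. Software Functionality Sharing

      #!/bin/bash
      # Convert every *.stp file in the current directory to *.igs via a Software Server.
      host="http://141.142.224.231:8182"
      application="A3DReviewer"
      task="convert"
      output="igs"
      input="stp"
      url="$host/software/$application/$task/$output"

      for input_file in *."$input"; do
        [ -e "$input_file" ] || continue   # skip when no files match
        # Upload the file; the reply is used as the URL of the converted result.
        output_url=$(curl -s -H "Accept:text/plain" -F "file=@$input_file" "$url")
        output_file="${input_file%.*}.$output"
        echo "Converting: $input_file to $output_file"
        # Retry once per second until the result can be downloaded.
        while : ; do
          wget -q -O "$output_file" "$output_url"
          if [ $? -eq 0 ]; then
            break
          fi
          sleep 1
        done
      done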

  36. Software Server Robustness
      • Software:
        • 3D Studio Max, Adobe 3D Reviewer, Blender, Google SketchUp, ImageMagick, IrfanView, Microsoft Paint, Microsoft Word, ParaView, VTK
      • Measure throughput of software on a software server
        • TRY TO MAKE IT FAIL!!!
      • Results:
        • Ideal case: 1395 tasks/hour on a 1-core, 1 GB VM with an average wait of 4.42 s.
        • Less-than-ideal case: 945 tasks/hour with an average wait of 11.17 s.
        • Server did not crash!

  37. Software Server Robustness
      • We are using GUI-based software!
      • Consider command-line software as a baseline:
        • ImageMagick: 1871 tasks/hour
        • IrfanView: 3163 tasks/hour
      • vs. GUI software:
        • 3DS Max: 355 tasks/hour
        • Microsoft Word: 756 tasks/hour
      • How many people would it take using this software for the same throughput?
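The throughput figures above can be turned into per-task times, and the closing question into a rough head count, with simple arithmetic. In this sketch `seconds_per_task` and `people_needed` are hypothetical helper names, and the rate of 30 tasks/hour for a person operating the software by hand is an assumed value for illustration only; the slides do not give one.

```shell
#!/bin/sh
# Sketch: arithmetic on the tasks/hour figures quoted on the slide.

# Average seconds per task at a given throughput (tasks/hour).
seconds_per_task() {
  awk -v rate="$1" 'BEGIN { printf "%.2f\n", 3600 / rate }'
}

# People needed to match a server, rounded up to a whole person.
# $1 = server tasks/hour, $2 = tasks/hour one person manages (assumed).
people_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

seconds_per_task 355     # 3DS Max on a software server
seconds_per_task 756     # Microsoft Word on a software server
people_needed 756 30     # 30 tasks/hour per person is a hypothetical rate
```

With the assumed manual rate, matching the Microsoft Word server would take 26 people; a different assumption changes the count proportionally.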

  38. Polyglot (Tool #3)
      • Listens for Software Server broadcasts on the network
      • Catalogues available input/output operations and constructs an I/O-graph
      • Identifies conversion paths between input and output formats
      • Carries out CHAINED conversions
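Finding a conversion path in an I/O-graph can be sketched as a breadth-first search over format nodes, where each edge is a conversion some application can perform. This is an illustration of the idea, not Polyglot's implementation: `find_path` is a hypothetical helper, and the edge list is made-up sample data (only the stp-to-igs conversion by A3DReviewer appears in the slides; ConverterB, ConverterC, and ConverterD are invented names).

```shell
#!/bin/sh
# Sketch: shortest conversion path in an I/O-graph via breadth-first search.
# One edge per line: "<input format> <output format> <application>".
# Hypothetical sample data, for illustration only.
EDGES="stp igs A3DReviewer
stp pdf A3DReviewer
igs obj ConverterB
pdf u3d ConverterC
u3d obj ConverterD"

# find_path <source format> <target format>
# Prints the chain of conversions, or returns non-zero if none exists.
find_path() {
  printf '%s\n' "$EDGES" | awk -v src="$1" -v dst="$2" '
    NF >= 3 { n++; from[n] = $1; to[n] = $2; app[n] = $3 }
    END {
      head = 1; tail = 1; queue[1] = src; seen[src] = 1
      while (head <= tail) {
        cur = queue[head++]
        if (cur == dst) break
        for (i = 1; i <= n; i++) {
          if (from[i] == cur && !(to[i] in seen)) {
            seen[to[i]] = 1          # first visit is via a shortest path
            parent[to[i]] = cur
            via[to[i]] = app[i]
            queue[++tail] = to[i]
          }
        }
      }
      if (!(dst in seen)) exit 1
      # Walk back from the target to the source to rebuild the chain.
      path = dst
      for (node = dst; node != src; node = parent[node])
        path = parent[node] " -[" via[node] "]-> " path
      print path
    }'
}

find_path stp obj
# prints: stp -[A3DReviewer]-> igs -[ConverterB]-> obj
```

Breadth-first search returns a path with the fewest conversion steps, which matters because each chained conversion is another opportunity for information loss.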

  39. Polyglot (Tool #3)

  40. Polyglot (Tool #3)

  41. Polyglot (Tool #3)

  42. Polyglot (Tool #3)

  43. Polyglot (Tool #3)

  44. Versus (Tool #4)
      • Java library/framework for comparing file content
      • Under development:
        • Framework/API designed
        • Distributed architecture
        • RESTful Web Interface
          • http://<host>/versus/comparisons
          • dataset1, dataset2
          • adapter, extractor, measure
        • Adding extractors, measures

  45. Which conversion preserved the most?
      • Using the light fields measure:
        • Emphasizes shape through silhouettes
        • Adobe 3D Reviewer between *.pdf and *.stp (61.67)
      • Using the spin image measure:
        • Emphasizes shape through relative vertex positions
        • Adobe 3D Reviewer between *.obj and *.pdf (59.07)

  46. Which is the best format?
      Within the context of preservation, we can define this as the format that retains, on average, the most information when other formats are converted to it.
      • Using the light fields measure:
        • Emphasizes shape through silhouettes
        • *.stp (40.73)
      • Using the spin image measure:
        • Emphasizes shape through relative vertex positions
        • *.stl (34.89)
        • *.stp, being a CAD format, has more variability in vertex positions due to tessellation

  47. ISDA Tools
      • Conversion Software Registry
      • Software Servers
      • Polyglot
      • Versus
      • 3D Utilities
      • Image Utilities
      • CyberIntegrator

  48. Acknowledgements
      The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the National Science Foundation, the National Archives and Records Administration, or the U.S. government. This research was partially supported by a National Archives and Records Administration (NARA) supplement to NSF PACI cooperative agreement CA #SCI-9619019 and by NCSA Industrial Partners.

  49. The ISDA Tools (Free and Open Source)
      Image, Spatial, and Data Analysis Group
      http://isda.ncsa.illinois.edu
      Kenton McHenry, Rob Kooper, Michal Ondrejcek, Luigi Marini

  50. End