Processing 2.5 Terapixels of the Sky in 2 Days
George Fekete, JHU
DR7 Visual Images
• 1,393 rows
• 1,984 columns
• 427,853 fields
• 1,182,462,470,336 pixels
• 3,547,387,411,008 FITS pixels (3 bands)
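
The totals on this slide can be recomputed from the three counts. A minimal Python check, using only the row, column, field, and band counts quoted above (the variable names are illustrative):

```python
# Counts quoted on the slide
ROWS = 1_393
COLUMNS = 1_984
FIELDS = 427_853
BANDS = 3

pixels_per_field = ROWS * COLUMNS            # pixels in one field image
image_pixels = pixels_per_field * FIELDS     # pixels across all visual images
fits_pixels = image_pixels * BANDS           # pixels across the 3 FITS bands

print(f"{pixels_per_field:,} pixels per field")
print(f"{image_pixels:,} image pixels")
print(f"{fits_pixels:,} FITS pixels in {BANDS} bands")
```
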
Goals
• Pretty images
  • in the eyes of the beholder
• Ease of manipulation
  • store pixels in DB
  • on-demand cutout and mosaic initiated inside the DB
  • store one entire colour image in < ¾ MB
  • uncompressed TGA is 8 MB, good JPEG is 2¼ MB
• Important preconditions for good compressibility
  • background should have little or no salt-and-pepper noise
  • choose a good despeckler
Two Distinct Despecklers Which seems better?
Same Two, Laplacian: What about now?
Same Two, Laplacian: WINNER! (image labels: Magick, Photoshop)
Process Raw Color Images
• Despeckle
  • better visual experience
  • better compressibility
• Photoshop (!)
  • has the best despeckling filter we found
  • can do the JPEG 2000 codec
  • can do all other necessary tasks
• JPEG 2000?
  • compresses better than JPEG
  • produces fewer undesirable visual artifacts
  • a j2k file is 28% the size of the JPEG, or 8% of the TGA
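
The size figures quoted in the talk can be cross-checked against the "< ¾ MB per colour image" goal. The sketch below assumes a 24-bit TGA (3 bytes per pixel, header ignored) and 1 MB = 2^20 bytes; neither assumption is stated in the slides.

```python
MB = 1024 * 1024                  # assumption: binary megabytes

ROWS, COLUMNS = 1_393, 1_984      # field dimensions from the DR7 slide
BYTES_PER_PIXEL = 3               # assumption: 24-bit RGB TGA, header ignored

tga_bytes = ROWS * COLUMNS * BYTES_PER_PIXEL
jpeg_bytes = 2.25 * MB            # "good JPEG is 2 1/4 MB"

# Two independent estimates of the j2k size from the quoted ratios
j2k_from_tga = 0.08 * tga_bytes   # "8% of TGA"
j2k_from_jpeg = 0.28 * jpeg_bytes # "28% of JPEG"

print(f"TGA:             {tga_bytes / MB:.2f} MB")
print(f"j2k (from TGA):  {j2k_from_tga / MB:.2f} MB")
print(f"j2k (from JPEG): {j2k_from_jpeg / MB:.2f} MB")
```

Under these assumptions both estimates land near 0.63 MB, which is below the ¾ MB target.
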
What's The Big Deal?
• 500,000 images in 24 hours? Doesn't seem like a lot, especially if you can use a thousand-processor cluster.
• Two-step process
  • FITS to TGA (formerly fits2jpeg)
    • been there, done that
    • about 2 s per field (without optimization)
  • Use Photoshop
    • (cont...)
What's The Big Deal?
• Tasks for Photoshop
  • open a TGA
  • add a little noise cleaning
  • apply despeckle filter
  • save as JPEG 2000
  • reduce size by ½ to make ½ size image
  • save as JPEG 2000
  • reduce again to make ¼ size image
  • save as JPEG 2000
  • reduce again to make ⅛ size image
  • adjust contrast and brightness for small thumbnail
  • save as JPEG 2000
  • delete TGA
  • release all resources
• Do this about 500,000 times, robustly
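
The task list above produces four JPEG 2000 files per field: full size, ½, ¼, and ⅛. As a rough sketch of that output plan, the helper below lists the four levels for one TGA. The function name, the `_z<level>.jp2` file naming, and the floor rounding of odd dimensions are illustrative assumptions, not details from the talk.

```python
from pathlib import PurePosixPath

def plan_outputs(tga_path, rows=1393, columns=1984, levels=4):
    """List the j2k outputs for one TGA: full size, then successive halvings."""
    stem = PurePosixPath(tga_path).stem
    plan = []
    for level in range(levels):
        divisor = 2 ** level                      # 1, 2, 4, 8
        plan.append({
            "scale": f"1/{divisor}",
            "rows": rows // divisor,              # assumption: round down
            "columns": columns // divisor,
            "name": f"{stem}_z{level}.jp2",       # hypothetical naming scheme
        })
    return plan

# Hypothetical input path, for illustration only
for entry in plan_outputs("/data/tga/field_0001.tga"):
    print(entry)
```
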
Unsupervised Photoshopping
• NECESSARY
  • Photoshop runs under Windows XP
  • Windows XP runs under qemu (virtual PC thing)
  • qemu runs on the Linux cluster (HHPC)
  • Photoshop can be controlled by a custom .NET application
  • Therefore ... Photoshop runs on the Linux cluster
• SUFFICIENT
  • qemu/WinXP can see the file system
  • qemu/WinXP/Photoshop can run without a physical display
  • Therefore it is doable
Flow
FITS → [FITS to TGA] → TGA → [TGA to j2k] → j2k
Two Steps Decoupled
FITS → [FITS to TGA] → TGA
TGA → [TGA to j2k] → j2k
• Runs asynchronously. Available resources can be added or removed at any time.
FITS to TGA (diagram labels: jobtable, skydev/skyfits, WS, FITS, FITS to TGA, TGA)
TGA to JPEG 2000 (diagram labels: jobtable, skydev/skyfits, WS, TGA, TGA to j2k, j2k)
Image generation workflow (diagram labels: jobtable, skydev/skyfits, WS, TGA, TGAPoller, Photoshop, j2k)
• .NET app controls Photoshop through exposed methods
Image generation workflow (diagram labels: skydev/skyfits, WS, jobtable, edge node(s), TGAServer, TGA, TGAPoller, Photoshop, work nodes, j2k, shared file system)
"Scheduler" is a DB jobtable jobid, run, rerun, camcol, field, status (ready, working, done) TGA path, output directory, nodeid, grabbed(timestamp), finished(timestamp)
Framework
• HHPC Cluster
  • 154 nodes, 1232 processors
  • PBS job submission
  • Linux
• Windows + Photoshop is run as a qemu job
  • One time: make a C: disk image, install qemu
  • All processors use the same C: disk image
  • Each instance of qemu runs in snapshot mode
    • C: is read-only
    • incremental changes to the disk image are cached locally
    • can kill qemu instead of a graceful shutdown (PBS-proof)
  • qemu runs without a display window; pixels go to /dev/null
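
For orientation, a snapshot-mode, display-less qemu launch might be assembled as follows. The talk does not show the command actually used, so treat this as a sketch: `-snapshot`, `-display none`, `-m`, and `-hda` are standard QEMU options, while the binary name, the memory size, and the disk image path are assumptions.

```python
import shlex

def qemu_command(disk_image, memory_mb=512):
    """Build an argument list for a headless, snapshot-mode qemu instance."""
    return [
        "qemu-system-i386",        # assumption: 32-bit x86 system emulator
        "-snapshot",               # write changes to temporary files, not the image
        "-display", "none",        # no display window
        "-m", str(memory_mb),      # guest RAM in MB (assumed value)
        "-hda", disk_image,        # shared C: disk image
    ]

# Hypothetical path to the shared Windows XP + Photoshop image
cmd = qemu_command("/shared/winxp_photoshop.img")
print(shlex.join(cmd))
```

With `-snapshot`, every instance can boot from the same image file without modifying it, which is what makes killing the process safe.
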
Performance for DR7 images
• 427,853 fields (one job each)
• 140 seconds total per job (measured)
  • FITS to TGA: 2 s
  • TGA to j2k: 136 s
• 59,899,420 s ≈ 693 days (one processor)
• 0.56 day (1232 processors + leap of faith)
• add 60% fudge-factor penalty
• Still does it in a day, with two hours to spare
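
The estimate can be reproduced step by step. This short Python check uses only the figures quoted on the slide (field count, seconds per job, processor count, and the 60% penalty) and assumes perfect parallel scaling, which is the "leap of faith".

```python
FIELDS = 427_853            # one job per field
SECONDS_PER_JOB = 140       # measured total per job
PROCESSORS = 1_232
FUDGE = 0.60                # 60% penalty
SECONDS_PER_DAY = 86_400

total_seconds = FIELDS * SECONDS_PER_JOB
serial_days = total_seconds / SECONDS_PER_DAY      # one processor
parallel_days = serial_days / PROCESSORS           # ideal scaling
padded_days = parallel_days * (1 + FUDGE)          # with fudge factor
hours_to_spare = (1 - padded_days) * 24            # slack within one day

print(f"total:        {total_seconds:,} s")
print(f"serial:       {serial_days:.1f} days")
print(f"parallel:     {parallel_days:.2f} day")
print(f"with penalty: {padded_days:.2f} day")
print(f"spare:        {hours_to_spare:.1f} hours")
```

The serial total comes to 59,899,420 s, or about 693 days. With the penalty applied, the run takes roughly 0.90 day, leaving a little over two hours of slack.
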