1 / 27

Optimization Issues for Huge Datasets and Long Computation

Optimization Issues for Huge Datasets and Long Computation. Michael Ferris University of Wisconsin, Computer Sciences ferris@cs.wisc.edu Qun Chen, Jin-Ho Lim, Jeff Linderoth, Miron Livny, Todd Munson, Mary Vernon, Meta Voelker. Update on Gamma Knife. In use at U. Maryland Hospitals

jaguar
Download Presentation

Optimization Issues for Huge Datasets and Long Computation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Optimization Issues for Huge Datasets and Long Computation Michael Ferris University of Wisconsin, Computer Sciences ferris@cs.wisc.edu Qun Chen, Jin-Ho Lim, Jeff Linderoth, Miron Livny, Todd Munson, Mary Vernon, Meta Voelker

  2. Update on Gamma Knife • In use at U. Maryland Hospitals • Covered by Business Week (Apr 2001) • Better models, faster solution • Requires less user input • Skeletonization is key improvement

  3. a. Target area b. A single line skeleton of an image 10 10 20 20 30 30 40 40 10 20 30 40 10 20 30 40 c. 8 initial shots are identified d. An optimal solution: 8 shots 2 10 1.5 20 1 30 0.5 40 10 20 30 40 1-4mm, 2-8mm, 5-14mm 1-4mm, 2-8mm, 5-14mm Skeleton Starting Points 10 20 30 40 50 10 20 30 40 50

  4. Run Time Comparison

  5. Data Mining & Optimization Prediction, Categorization, Separation Equations, LP, QP, MIP, NLP Serial, Parallel, Condor GAMS, Matlab, so/dll

  6. Global Exact Constrained Stochastic Large scale Fast convergence CPU + Memory + Smarts Local Approximate Unconstrained Deterministic Small scale Termination Optimization

  7. MIP formulation minimize cTx subject to Ax  b l  x  u and some xj integer Problems are specified by application convenient format - GAMS, AMPL, or MPS

  8. Data delivery: pay-per-view • Optimization model for regional caches: minimize: Cremote+ PCregional over all possible cached objects/segments subject to: • Cregional Nchannels • regional storage Nsegments • regional server stores 0, k or K segments of each object • MIP (large number of objects/segments)

  9. The “Seymour Problem” • Set covering problem used in proof of four color theorem • CPLEX 6.0 and Condor (2 option files) • Running since June 23, 1999 • Currently >590 days CPU time per job • (13 million nodes; 2.4 million nodes)

  10. FAT COP • FAT - large # of processors • opportunistic environment (Condor) • COP - Master Worker control • fault tolerant: task exit, host suspend • portable parallel programming • Mixed Integer Program Solver • Branch and Bound: LP relaxations • MPS file, AMPL or GAMS input

  11. CPLEX OSL SOPLEX MINOS ... FATCOP Internet Protocol Condor-PVM PVM MW GAMS AMPL MPS Application Problem LPSOLVER INTERFACE

  12. MIP Technology • Each task is a subtree, time limit • Diving heuristic • Cutting planes (global) • Pseudocosts • Preprocessing • Master checkpoint • Worker has state, how to share info?

  13. FATCOP Daily Log Note machine reboot at approx 3:00 am (night)

  14. Back to Seymour • Schmieta, Pataki, Linderoth and MCF • explored to depth 8 in tree • applied cuts at each of these 256 nodes • solved in parallel, using whatever resources available (CPLEX, FATCOP,...) • Problem solved with over 1 year CPU • over 10 million nodes, 11,000 hours

  15. Seymour Node 319 • FATCOP • 47.0 hrs with 2,887,808 nodes • average number of machine used is 108 • CPLEX • 12 days, 10 hrs with 356,600 nodes • single machine, clique cuts useful

  16. Large datasets • Enormous computational resources can sometimes facilitate solution • X-validation, slice modeling • What about the data? • In particular, what if the problem does not fit in core?

  17. Definition: Example: Example: Componentwise definition: NCP functions

  18. Semismooth results

  19. How can you use these? • Specialized codes • Asynchronous I/O • Specialized platforms • Condor (executable per architecture) • Specific input formats • GAMS, Matlab • Handholding operation

  20. Model centric toolbox Specialized input Data warehouse GAMS optimization model Solvers LP,QP,MIP, NLP,MINLP Model data exchange Matlab programming environment Other model formats gms2xx Condor Resource Manager

More Related