BOINC: The Year in Review • David P. Anderson, Space Sciences Laboratory, U.C. Berkeley, 22 Oct 2009
Volunteer computing • Throughput is now 10 PetaFLOPS, mostly from Folding@home • Volunteer population is constant: 330K BOINC, 200K F@h • Volunteer computing is still unknown in the HPC world, the scientific computing world, and the general public
ExaFLOPS • [Figure: current PetaFLOPS breakdown] • Potential: ExaFLOPS by 2010 • 4M GPUs × 1 TFLOPS × 0.25 availability = 1 ExaFLOPS
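The slide's ExaFLOPS figure is the product of three round-number inputs. Multiplying them out confirms the total. The inputs are the slide's projections, not measurements:

```python
# Back-of-envelope estimate using the slide's round numbers.
gpus = 4_000_000        # 4M volunteer GPUs
flops_per_gpu = 1e12    # ~1 TFLOPS each
availability = 0.25     # fraction of time each GPU is available

aggregate_flops = gpus * flops_per_gpu * availability
print(f"{aggregate_flops:.1e} FLOPS = {aggregate_flops / 1e18:.1f} ExaFLOPS")
# 1.0e+18 FLOPS = 1.0 ExaFLOPS
```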
Projects • No significant new academic projects • but signs of life in Asia • No new umbrella projects • AQUA@home: D-Wave systems • Several hobbyist projects
BOINC funding • Funded into 2011 • New NSF proposal
Facebook apps • Progress thru Processors (Intel/GridRepublic): web-only registration process; lots of fans, not so many participants • BOINC Milestones • IBM World Community Grid (WCG)
Research • Host characterization • Scheduling policy analysis • EmBOINC: project emulator • Distributed applications • Volpex • Apps in VMs • Volunteer motivation study
Fundamental changes • App versions now have dynamically-determined processor usage attributes (#CPUs, #GPUs) • Server can have multiple app versions per (app, platform) pair • Client can have multiple versions per app • An issued job is linked to an app version
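To make these relationships concrete, here is a minimal data-model sketch. The class and field names (`AppVersion`, `plan_class`, `versions`, `add_version`) are hypothetical. They borrow loosely from BOINC's plan-class idea but are not BOINC's database schema:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppVersion:
    app: str
    platform: str
    plan_class: str   # hypothetical label separating e.g. CPU and CUDA builds
    ncpus: float      # processor usage; set per host when the version is chosen
    ngpus: float      # (constants here only for illustration)


@dataclass(frozen=True)
class Job:
    name: str
    version: AppVersion   # an issued job is linked to exactly one app version


# Server side: several versions may share one (app, platform) key.
versions: dict[tuple[str, str], list[AppVersion]] = {}


def add_version(v: AppVersion) -> None:
    versions.setdefault((v.app, v.platform), []).append(v)


cpu_build = AppVersion("example_app", "windows_x86_64", "", ncpus=1.0, ngpus=0.0)
gpu_build = AppVersion("example_app", "windows_x86_64", "cuda", ncpus=0.2, ngpus=1.0)
add_version(cpu_build)
add_version(gpu_build)

job = Job("wu_0001", gpu_build)
print(len(versions[("example_app", "windows_x86_64")]), job.version.plan_class)  # 2 cuda
```

In this sketch, the job stores the exact version it was issued with. The client therefore never has to guess which of several builds to run.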
Scheduler request • Old (CPU only): requested # seconds; current queue length • New, for each resource type (CPU, NVIDIA, ...): requested # seconds; current high-priority queue length; # of idle instances
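The sketch below shows the shift from one CPU-only set of quantities to a per-resource record. The field names are hypothetical, and this is not BOINC's XML wire format. It only mirrors the quantities listed on the slide:

```python
from dataclasses import dataclass


@dataclass
class OldRequest:
    """Pre-GPU request: a single set of numbers, implicitly for the CPU."""
    req_seconds: float      # seconds of work requested
    queue_seconds: float    # current queue length


@dataclass
class ResourceRequest:
    """New request: one of these per resource type (CPU, NVIDIA, ...)."""
    req_seconds: float        # seconds of work requested for this resource
    hp_queue_seconds: float   # current high-priority queue length
    idle_instances: int       # number of idle instances (cores, GPUs)


def resources_wanting_work(request: dict[str, ResourceRequest]) -> list[str]:
    """Resource types for which the client is asking for work."""
    return [rsc for rsc, r in request.items() if r.req_seconds > 0]


request = {
    "CPU": ResourceRequest(req_seconds=0.0, hp_queue_seconds=0.0, idle_instances=0),
    "NVIDIA": ResourceRequest(req_seconds=7200.0, hp_queue_seconds=0.0, idle_instances=1),
}
print(resources_wanting_work(request))  # ['NVIDIA']
```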
Scheduler reply • Application versions include resource usage (# CPUs, # GPUs) and a FLOPS estimate • Each job specifies an app version • A single reply can include both CPU and GPU jobs for the same application
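The sketch below models a reply like the one on this slide. Jobs refer to app versions, and those versions carry the resource usage. It assumes a hypothetical `version_id` link and hypothetical class names. Grouping by resource shows how one reply can carry CPU and GPU jobs for the same app:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplyAppVersion:
    app: str
    version_id: int   # hypothetical identifier that jobs refer to
    ncpus: float      # resource usage
    ngpus: float
    flops: float      # server's FLOPS estimate for this version


@dataclass(frozen=True)
class ReplyJob:
    name: str
    version_id: int   # the app version this job must run with


def jobs_by_resource(versions: list[ReplyAppVersion],
                     jobs: list[ReplyJob]) -> dict[str, list[str]]:
    """Split a reply's jobs into GPU and CPU-only groups via their app versions."""
    by_id = {v.version_id: v for v in versions}
    groups: dict[str, list[str]] = {"CPU": [], "GPU": []}
    for job in jobs:
        kind = "GPU" if by_id[job.version_id].ngpus > 0 else "CPU"
        groups[kind].append(job.name)
    return groups


versions = [
    ReplyAppVersion("example_app", 1, ncpus=1.0, ngpus=0.0, flops=2e9),
    ReplyAppVersion("example_app", 2, ncpus=0.2, ngpus=1.0, flops=5e10),
]
jobs = [ReplyJob("j1", 1), ReplyJob("j2", 2), ReplyJob("j3", 2)]
print(jobs_by_resource(versions, jobs))  # {'CPU': ['j1'], 'GPU': ['j2', 'j3']}
```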
Client: work fetch policy • When? From which project? How much? • Goals: maintain enough work; minimize scheduler requests; honor resource shares (via per-project “debt”) • [Figure: per-CPU work queues (CPUs 0 to 3) with min and max buffer levels]
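One simple way to meet the first two goals is hysteresis between the min and max buffer levels shown on the slide, with debt deciding which project to ask. The Python sketch below assumes that policy. The thresholds, the exact trigger rule, and the function names are illustrative, not the BOINC client's actual code:

```python
def work_request_seconds(queued: list[float], min_buf: float, max_buf: float) -> float:
    """Seconds of work to request, given per-instance queued work (one entry per CPU).

    Hysteresis: do nothing while every instance has at least min_buf seconds queued;
    once any instance drops below min_buf, ask for enough to fill all of them to max_buf.
    This keeps work on hand while keeping scheduler requests infrequent.
    """
    if min(queued) >= min_buf:
        return 0.0
    return sum(max(0.0, max_buf - q) for q in queued)


def pick_project(debts: dict[str, float]) -> str:
    """Ask the project with the highest debt, i.e. the one furthest below its share."""
    return max(debts, key=debts.get)


print(work_request_seconds([5000.0, 3000.0, 800.0, 4000.0], min_buf=1000.0, max_buf=6000.0))  # 11200.0
print(work_request_seconds([5000.0, 3000.0, 2000.0, 4000.0], min_buf=1000.0, max_buf=6000.0))  # 0.0
print(pick_project({"A": 120.0, "B": -40.0}))  # A
```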
Work fetch for GPUs: goals • Queue work separately for different resource types • Resource shares apply to the aggregate • Example: projects A and B have the same resource share; A has CPU and GPU jobs, B has only GPU jobs • [Figure: GPU used by A and B; CPU used only by A]
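A small worked example shows why applying shares to the aggregate matters. The throughput numbers are hypothetical, chosen only to keep the arithmetic easy:

```python
# Hypothetical throughputs for one host, in GFLOPS (illustrative only).
cpu = 10.0
gpu = 90.0
total = cpu + gpu

# Equal resource shares, applied to the aggregate: 50 GFLOPS each.
target = total / 2

# B can only use the GPU, so A takes all of the CPU and tops up on the GPU.
a_cpu = cpu
a_gpu = target - a_cpu          # 40 GFLOPS of GPU for A
b_gpu = gpu - a_gpu             # 50 GFLOPS of GPU for B

# Per-resource shares would instead split the GPU 45/45,
# giving A 10 + 45 = 55 GFLOPS versus B's 45.
print(a_cpu + a_gpu, b_gpu)  # 50.0 50.0
```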
Work fetch for GPUs • For each resource type: per-project backoff; per-project debt, accumulated only while the project is not backed off • A project’s overall debt is a weighted average of its resource debts • Get work from the project with the highest overall debt
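The debt bookkeeping can be sketched in Python as follows. Two parts are assumptions made for illustration: the accrual rule (fair share of instance-seconds minus actual use) and the choice of weights for the average. The slide says only that the average is weighted. All names are hypothetical:

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    share: float
    debt: dict[str, float] = field(default_factory=dict)  # resource type -> debt (seconds)
    backed_off: set[str] = field(default_factory=set)     # resource types currently backed off


def accrue_debt(projects: dict[str, Project], rsc: str, dt: float,
                ninstances: int, used: dict[str, float]) -> None:
    """Over an interval dt, add (fair share of instance-seconds) minus
    (instance-seconds actually used) to each project's debt for `rsc`.
    Projects backed off for `rsc` do not accumulate debt."""
    eligible = {n: p for n, p in projects.items() if rsc not in p.backed_off}
    total_share = sum(p.share for p in eligible.values())
    if total_share == 0:
        return
    for name, p in eligible.items():
        fair = p.share / total_share * dt * ninstances
        p.debt[rsc] = p.debt.get(rsc, 0.0) + fair - used.get(name, 0.0)


def overall_debt(p: Project, weights: dict[str, float]) -> float:
    """Weighted average of a project's per-resource debts."""
    return sum(w * p.debt.get(r, 0.0) for r, w in weights.items()) / sum(weights.values())


def choose_project(projects: dict[str, Project], weights: dict[str, float]) -> str:
    """Fetch from the project with the highest overall debt."""
    return max(projects, key=lambda n: overall_debt(projects[n], weights))


# A has CPU and GPU jobs; B has returned no CPU work, so here it is backed off for CPU.
projects = {"A": Project(share=1.0), "B": Project(share=1.0, backed_off={"CPU"})}
accrue_debt(projects, "GPU", dt=100.0, ninstances=1, used={"A": 100.0})
accrue_debt(projects, "CPU", dt=100.0, ninstances=4, used={"A": 400.0})
weights = {"CPU": 1.0, "GPU": 9.0}  # assumed: proportional to each resource's speed
print(choose_project(projects, weights))  # B
```

Because B is backed off for CPU, it accrues no CPU debt for time it could not have used. Its GPU debt alone then makes it the next project to ask.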
Client: job scheduling • GPU job scheduling: client allocates GPUs; GPU prefs • Multi-thread job scheduling: handle a mix of single- and multi-thread jobs; don’t overcommit CPUs
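A deliberately simple greedy pass can illustrate both concerns on this slide: handing specific GPUs to jobs, and not overcommitting CPUs when single- and multi-thread jobs mix. This is a hypothetical sketch, not the BOINC client's scheduler, and GPU preferences are omitted:

```python
def schedule(jobs: list[tuple[str, float, int]], ncpus: int,
             gpu_ids: list[int]) -> dict[str, list[int]]:
    """Greedy pass over jobs in priority order.

    Each job is (name, cpus_needed, gpus_needed). A job runs only if enough
    GPUs and CPUs are free, so CPUs are never overcommitted. Returns the jobs
    chosen to run, mapped to the specific GPU ids allocated to them.
    """
    free_cpus = float(ncpus)
    free_gpus = list(gpu_ids)
    running: dict[str, list[int]] = {}
    for name, cpus, ngpus in jobs:
        if ngpus > len(free_gpus) or cpus > free_cpus:
            continue  # would need more GPUs or CPUs than are currently free
        running[name] = free_gpus[:ngpus]   # client assigns specific GPU devices
        del free_gpus[:ngpus]
        free_cpus -= cpus
    return running


jobs = [
    ("gpu_job", 0.5, 1),   # GPU job that also uses half a CPU
    ("mt_job", 4.0, 0),    # 4-thread job
    ("st_1", 1.0, 0),
    ("st_2", 1.0, 0),
    ("st_3", 1.0, 0),
    ("st_4", 1.0, 0),
]
print(schedule(jobs, ncpus=4, gpu_ids=[0, 1]))
# {'gpu_job': [0], 'st_1': [], 'st_2': [], 'st_3': []}
```

A greedy pass like this can leave a wide multi-thread job waiting behind single-thread ones (here `mt_job` never starts). That is one reason a mix of single- and multi-thread jobs needs an explicit policy.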
GPU odds and ends • Default install is non-service • Dealing with sporadic GPU usability (e.g. Remote Desktop) • Multiple non-identical GPUs • GPUs and anonymous platform
Other client changes • Proxy auto-detection • Exclusive app feature • Don’t write state file on each checkpoint
Screensaver • Screensaver coordinator • configurable • New default screensaver • Intel screensaver
Scheduler/feeder • Handle multiple app versions per platform • Handle requests for multiple resources: app selection; completion estimate and deadline check • Show specific messages to users, e.g. “no work because you need driver version N” • Project-customized job check, e.g. jobs need different # of GPU processors • Mixed locality and non-locality scheduling
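Choosing among several app versions for one request might look like the sketch below. It assumes the scheduler prefers the fastest usable version. It also assumes runtime is estimated as the job's size in floating-point operations divided by the version's FLOPS estimate. The names, the driver-version rule, and all numbers are hypothetical:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    name: str
    ngpus: int
    flops: float          # estimated FLOPS of this version on the requesting host
    min_driver: int = 0   # hypothetical minimum GPU driver version


def select_version(versions: list[Version], host_ngpus: int, host_driver: int,
                   messages: list[str]) -> Optional[Version]:
    """Pick the fastest version this host can run; explain why others were skipped."""
    usable = []
    for v in versions:
        if v.ngpus > host_ngpus:
            continue
        if v.ngpus > 0 and host_driver < v.min_driver:
            messages.append(f"no work for {v.name} because you need driver version {v.min_driver}")
            continue
        usable.append(v)
    return max(usable, key=lambda v: v.flops, default=None)


def meets_deadline(job_fpops: float, v: Version, queued_seconds: float,
                   delay_bound: float) -> bool:
    """Completion estimate: queued work plus this job's runtime must fit before the deadline."""
    est_runtime = job_fpops / v.flops
    return queued_seconds + est_runtime <= delay_bound


versions = [
    Version("cpu", ngpus=0, flops=2e9),
    Version("cuda", ngpus=1, flops=5e10, min_driver=190),
]
msgs: list[str] = []
chosen = select_version(versions, host_ngpus=1, host_driver=185, messages=msgs)
print(chosen.name, msgs)
# cpu ['no work for cuda because you need driver version 190']
print(meets_deadline(1e13, chosen, queued_seconds=3600.0, delay_bound=86400.0))  # True
```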
Server • Automated DB update • Protect admin web interface
Manager • Terms of use feature • Show only projects supporting the host's platform (needs extending for GPUs) • Advanced view is keyboard-navigable • Manager can read cookies (Firefox, IE) for web-only install
Apps • Enhanced wrapper • checkpointing, fraction done • PyMW: master/worker Python system
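The master/worker pattern that PyMW provides can be illustrated with the standard library alone. This sketch does not use PyMW's API. Threads stand in for the remote workers a real distributed system would use, and the task itself (a sum of squares) is a placeholder:

```python
from concurrent.futures import ThreadPoolExecutor


def worker(task: tuple[int, int]) -> int:
    """Placeholder unit of work: sum of squares over [lo, hi)."""
    lo, hi = task
    return sum(i * i for i in range(lo, hi))


def master(n: int, chunk: int) -> int:
    """Split the problem into tasks, hand them to workers, and combine the results."""
    tasks = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(worker, tasks))


print(master(1000, 100))  # 332833500
```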
Community contributions • Translation: Pootle-based system (projects can use it too) • Testing: alpha test project • Packaging: Linux client and server packages • Programming: lots of flames, little code
What didn’t get done • Replace runtime system • Installer: deal with “standby after X minutes” • Global shutdown switch
Things on hold • BOINC on mobile devices • Replace Simple GUI
Important things to do • New system for credit and runtime estimation • we have a design! • Keep track of GPU availability separately • Steer computers with GPUs towards projects with GPU apps • Sample CUDA app
BOINC development • Let us know if you want something • If you make changes of general utility: • document them • add them to trunk