new frontiers in formal software verification
Gerard J. Holzmann
gholzmann@acm.org
Exploring the Solar System: 23 spacecraft and 10 science instruments

[figure: montage of missions — JASON 1, JASON 2, KEPLER, DAWN, WISE, CLOUDSAT, AQUARIUS, MRO, JUNO, VOYAGER 1, VOYAGER 2, CASSINI, EPOXI/DEEP IMPACT, STARDUST, ACRIMSAT, MARS ODYSSEY, GRAIL, SPITZER, GRACE, OPPORTUNITY, MSL, GALEX]

INSTRUMENTS
• Earth Science: ASTER, MISR, TES, MLS, AIRS
• Planetary: MIRO, Diviner, MARSIS
• Astrophysics: Herschel, Planck
formal software verification
• after some ~30 years of development, formal software verification is still rarely used on industrial software
• primary reasons:
  • it is (perceived to be) too difficult
  • it takes too long (months to years)
• even in safety-critical applications, software verification is often restricted to the verification of models of the software, instead of the software itself
• goal: make software verification as simple as testing and as fast as compilation
verification of the PathStar switch (flashback to 1999)
• a commercial data/phone switch designed at Bell Labs research (for Lucent Technologies)
• newly written code for the core call processing engine
• the first commercial call processing code that was formally verified
• 20 versions of the code were verified with model checking during development (1998-2000), with a fully automated procedure
software structure
• PathStar code (C): basic call processing plus a long list of features (call waiting, call forwarding, three-way calling, etc.)
• call processing is ~30 KLOC, roughly 10% of the code, on top of the control kernel
• traditional hurdles:
  • feature interaction and feature breakage
  • concurrency problems: race conditions, deadlock scenarios
  • non-compliance with legal requirements, etc.
complex feature precedence relations
• detecting undesired feature interaction is a serious problem
the verification was automated in 5 steps
1. code conversion of the PathStar call processing code
2. defining an abstraction map
3. defining the context
4. defining the feature requirements
5. verification with Spin: bug reports and message sequence charts (msc) for property violations
1. code conversion

[figure: the implied state machine extracted from the code, with control states dial, dial1, dial2, and error; transitions are labeled with input events (Crdtmf, Crconn, Cronhook) and else]

    ...
    @dial:  switch(op) {
            default:        /* unexpected input */
                    goto error;
            case Crdtmf:    /* digit collector ready */
                    x->drv->progress(x, Tdial);
                    time = MSEC(16000);     /* set timer */
    @:      switch(op) {
            default:        /* unexpected input */
                    goto error;
            case Crconn:
                    goto B@1b;
            case Cronhook:  /* caller hangs up */
                    x->drv->disconnect(x);
    @:      if (op != Crconn && op != Crdis)
                    goto Aidle;
    ...etc...

in the PathStar implied state machine, each @ label marks a control state; the statements between labels perform a state change
2. defining abstractions
• to verify the code we convert it into an automaton: a labeled transition system
• the labels (transitions) are the basic statements from C
• each statement is converted via an abstraction, encoded as a lookup table that supports three possible conversions:
  • relevant: keep (~60%)
  • partially relevant: map/abstract (~10%)
  • irrelevant to the requirements: hide (~30%)
• the generated transition system is then checked against the formal requirements (in linear temporal logic) with the model checker (a sketch of such a transition system follows below)
• the program and the negated requirements are converted into ω-automata, and the model checker computes their intersection
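What the extracted labeled transition system can look like, written out as Promela. This is a minimal sketch, not actual Modex output: the control states from the code become labels, and the keep/map/hide classification from the lookup table is shown as comments. The variable dialtone and the event values are illustrative assumptions, not the real PathStar interface:

    mtype = { Crdtmf, Crconn, Cronhook };
    mtype op;           /* current input event; in a full model it is
                           set by an environment process (see step 3) */
    bool dialtone;      /* abstract variable standing in for driver calls */

    active proctype call_processing()
    {
    dial:
        if
        :: op == Crdtmf ->      /* relevant: kept */
            dialtone = true;    /* partially relevant: x->drv->progress(x, Tdial)
                                   mapped to an abstract action */
            /* irrelevant: the timer setup is hidden */
            goto dial1
        :: else -> goto error   /* unexpected input */
        fi;
    dial1:
        if
        :: op == Crconn -> goto done
        :: op == Cronhook ->    /* caller hangs up */
            dialtone = false    /* mapped: x->drv->disconnect(x) */
        :: else -> goto error   /* unexpected input */
        fi;
    error:
    done:
        skip
    }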
3. defining the context

[figure: tool flow diagram — the C code passes through model extraction, guided by the abstraction map, to produce a Spin model; the Spin model is combined with an environment model and a database of feature requirements, and verification results feed back into bug reporting; a sketch of an environment model follows below]
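The environment model closes the otherwise open call-processing code by supplying everything a subscriber could do. A hypothetical minimal sketch in Promela; the channel name, event names, and the stub process are illustrative assumptions, not the actual PathStar interface:

    mtype = { Croffhook, Crdtmf, Cronhook };
    chan to_switch = [1] of { mtype };

    /* a nondeterministic subscriber: at any moment it may go
       offhook, dial a digit, or hang up, so the verifier is
       forced to explore every possible sequence of these events */
    active proctype subscriber()
    {
        do
        :: to_switch ! Croffhook
        :: to_switch ! Crdtmf
        :: to_switch ! Cronhook
        od
    }

    /* stub that stands in for the extracted call-processing
       model, only here so the sketch runs on its own */
    active proctype switch_stub()
    {
        mtype ev;
        do
        :: to_switch ? ev
        od
    }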
4. defining feature requirements
a sample property: "always when the subscriber goes offhook, a dialtone is generated"
failure to satisfy this requirement: <> eventually the subscriber goes offhook, /\ and X thereafter no dialtone is U generated until the next onhook
in Linear Temporal Logic this is written:

    <> (offhook /\ X (!dialtone U onhook))

the formula is mechanically converted into an ω-automaton (a Spin version follows below)
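In current versions of Spin, the positive form of such a requirement can be stated as an inline ltl property, which Spin negates and converts to a Büchi automaton itself. A minimal hedged sketch, assuming offhook, dialtone, and onhook are boolean propositions maintained by the extracted model; the toy process is only a stand-in to make the sketch self-contained:

    bool offhook, dialtone, onhook;

    /* toy stand-in for the extracted call-processing model:
       it always raises dialtone right after offhook and never
       hangs up, so the requirement below holds trivially */
    active proctype toy_model()
    {
        do
        :: offhook = true;
           dialtone = true;
           offhook = false;
           dialtone = false
        od
    }

    /* positive form of the requirement; V is LTL release, the
       dual of U, so this is exactly the negation of the failure
       scenario on the slide:
           <> (offhook && X (!dialtone U onhook))
       formulas that use X are not stutter-invariant, so the pan
       verifier should be compiled with -DNOREDUCE */
    ltl dialtone_req { [] (offhook -> X (dialtone V !onhook)) }

Running spin -a on this file, compiling the generated pan.c (with -DNOREDUCE because of the X operator), and executing ./pan -a performs the acceptance-cycle check; equivalently, spin -f can generate a never claim directly from the negated formula.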
5. verification

[figure: the C code and the abstraction map feed the model extractor; together with the environment model and the logical negation of the LTL requirement, the extracted model is checked, producing a bug report, e.g. "* no dialtone generated"]
hardware support (1999)

[figure: a network of verification machines coordinated through client/server sockets, driven by code scripts]
iterative search refinement
each verification task is run multiple times, with increasing accuracy (successive passes bounded at t=5 min., t=15 min., and t=40 min.)
performance (1999): "bugs per minute"

[chart: percentage (and number) of bugs reported vs. minutes since the start of the check, rising from 25% (15 bugs) to 100% (60 bugs)]
• first bug report in 2 minutes
• 15 bug reports (25% of all bugs) in 3 minutes
• 50% of all bugs reported in 8 minutes
that was 1999, can we do better now?
1999:
• 16 networked computers running the Plan 9 operating system
• 500 MHz clock speed (~8 GHz equivalent)
• 16 x 128 MB of RAM (~2 GB equivalent)
2012:
• 32-core off-the-shelf system, running standard Ubuntu Linux (~$4K USD)
• 2.5 GHz clock speed (~80 GHz equivalent)
• 64 GB of shared RAM
difference:
• approx. 10x faster, and 32x more RAM
• does this change the usefulness of the approach?
performance in 2012: "bugs per second"

[chart: number of bugs found vs. seconds since the start of the check; the 2012 run (one 32-core PC, 64 GB RAM, 2.5 GHz per core, Ubuntu Linux) reaches in seconds what the 1999 run (16 PCs, 128 MB per PC, 500 MHz, Plan 9 OS) reached in 10 seconds to 10 minutes]
• 11 bug reports after 1 second
• 50% of all bugs in 7 seconds (38 bugs)
side-by-side comparison
2012 (1 desktop PC):
• 15% of all bugs reported in 1 second (11 bugs)
• 50% of all bugs reported in 7 seconds (38 bugs)
1999 (16-CPU networked system):
• 25% of all bugs reported in 180 seconds (15 bugs)
• 50% of all bugs reported in 480 seconds (30 bugs)
generalization: swarm verification
• goal: leverage the availability of large numbers of CPUs and/or CPU cores
• if an application is too large to verify exhaustively, we can define a swarm of verifiers that each try to check a randomly different part of the code
  • using different hash polynomials in bitstate hashing, and different numbers of polynomials
  • using different search algorithms and/or search orders
• use iterative search refinement to dramatically speed up error reporting ("bugs per second")
swarm configuration file:

    # range
    k       1  4            # min and max nr of hash functions
    # limits
    depth   10000           # max search depth
    cpus    128             # nr available cpus
    memory  64MB            # max memory to be used; recognizes MB, GB
    time    1h              # max time to be used; h=hr, m=min, s=sec
    vector  512             # bytes per state, used for estimates
    speed   250000          # states per second processed
    file    model.pml       # the spin model

    # compilation options (each line defines a search mode)
    -DBITSTATE                      # standard dfs
    -DBITSTATE -DREVERSE            # reversed process ordering
    -DBITSTATE -DT_REVERSE          # reversed transition ordering
    -DBITSTATE -DP_RAND             # randomized process ordering
    -DBITSTATE -DT_RAND             # randomized transition ordering
    -DBITSTATE -DP_RAND -DT_RAND    # both

    # runtime options
    -c1 -x -n

the user specifies the number of cpus to use, the memory per cpu, and the maximum time to be used; swarm is a front-end to spin (http://spinroot.com/swarm/):

    $ swarm -F config.lib -c6 > script
    swarm: 456 runs, avg time per cpu 3599.2 sec
    $ sh ./script
many small jobs do the work of one large job, but much faster and in less memory

[chart: states reached vs. number of processes (log scale) for the DEOS O/S model, showing linear scaling of the swarm search up to 100% coverage]
• 100% coverage with a swarm of 100 jobs using 0.06% of the RAM each (8 MB), compared to a single exhaustive run (13 GB)
tools mentioned:
• Spin: http://spinroot.com (1989)
  • parallel depth-first search added in 2007
  • parallel breadth-first search added in 2012
• Modex: http://spinroot.com/modex (1999)
• Swarm: http://spinroot.com/swarm (2008)

thank you!