Model Checking with User-Definable Abstraction for PGAS Languages
Tatsuya Abe, Toshiyuki Maeda, Mitsuhisa Sato
RIKEN AICS
What is model checking?
• An approach to program verification that tries to prove that a given program satisfies a given property
• E.g.
  • Memory safety
  • Race freedom
  • Deadlock freedom
  • …
A very simple example of model checking
Target program:
    x := 1; y := 1; z := 1;
Initial state: x = 0, y = 0, z = 0
Target property: x >= y >= z
Explored states (x, y, z):
    (0, 0, 0) → (1, 0, 0) → (1, 1, 0) → (1, 1, 1)
The target property always holds.
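The walk through the states above can be sketched in a few lines of Python. This is only an illustration of the idea (run the program, check the property in every state), not the algorithm of any particular model checker; `run_and_check` and the tuple encoding of assignments are names made up for this sketch.

```python
def run_and_check(program, initial_state, prop):
    """Execute the assignments in order and check `prop` in every visited state."""
    state = dict(initial_state)
    trace = [dict(state)]          # the initial state is checked too
    for var, value in program:
        state[var] = value         # one assignment = one state transition
        trace.append(dict(state))
    violations = [s for s in trace if not prop(s)]
    return trace, violations


# Hypothetical encoding of "x := 1; y := 1; z := 1;"
program = [("x", 1), ("y", 1), ("z", 1)]
initial_state = {"x": 0, "y": 0, "z": 0}
prop = lambda s: s["x"] >= s["y"] >= s["z"]

trace, violations = run_and_check(program, initial_state, prop)
for s in trace:
    print(s, "ok" if prop(s) else "VIOLATION")
```

A sequential program has a single execution path, so four states are enough here; the interesting cases arise once several processes interleave.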
A big problem in model checking PGAS languages
• The state explosion problem = the number of states to be explored increases dramatically
Why does the state explosion occur?
• PGAS languages are concurrent, shared-memory languages
• The number of states increases exponentially/combinatorially as the size of programs and the number of processes increase
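A rough back-of-the-envelope calculation shows the growth. The sketch below assumes straight-line processes that each execute a fixed number of atomic steps, and it counts only program-counter combinations and execution orders, ignoring data values; both function names are made up for this illustration.

```python
from math import factorial


def interleavings(num_procs, steps_per_proc):
    """Number of distinct execution orders: (k*m)! / (m!)^k."""
    total = factorial(num_procs * steps_per_proc)
    return total // factorial(steps_per_proc) ** num_procs


def control_states(num_procs, steps_per_proc):
    """Number of program-counter combinations: (m+1)^k."""
    return (steps_per_proc + 1) ** num_procs


for procs, steps in [(3, 1), (3, 2), (3, 4), (6, 4)]:
    print(procs, steps, control_states(procs, steps), interleavings(procs, steps))
```

With three processes of one step each this gives 8 control states and 6 execution orders; adding steps or processes makes both numbers grow quickly.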
An example of the state explosion
Target program:
    Proc. 1: x := 1;
    Proc. 2: y := 1;
    Proc. 3: z := 1;
Initial state: x = 0, y = 0, z = 0
Target property: x >= y >= z
Reachable states (x, y, z), depending on the interleaving:
    (0, 0, 0)
    (1, 0, 0)  (0, 1, 0)  (0, 0, 1)
    (1, 1, 0)  (1, 0, 1)  (0, 1, 1)
    (1, 1, 1)
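The eight states can be enumerated mechanically by exploring every interleaving. The following is a minimal sketch of explicit-state exploration, assuming each assignment is atomic; `explore` and the encoding of processes as lists of `(variable, value)` steps are hypothetical names for this illustration, not part of SPIN or any tool mentioned in these slides.

```python
from collections import deque


def explore(processes, initial_vars, prop):
    """Breadth-first search over all interleavings.

    A state is (program counters, variable values). Returns the set of
    reachable states, the number of transitions, and the violating states.
    """
    names = sorted(initial_vars)
    start = (tuple(0 for _ in processes),
             tuple(initial_vars[n] for n in names))
    seen = {start}
    queue = deque([start])
    transitions = 0
    violations = []
    while queue:
        pcs, values = queue.popleft()
        env = dict(zip(names, values))
        if not prop(env):
            violations.append(env)
        for p, steps in enumerate(processes):
            if pcs[p] == len(steps):
                continue                      # process p has finished
            var, val = steps[pcs[p]]
            new_env = dict(env)
            new_env[var] = val
            nxt = (pcs[:p] + (pcs[p] + 1,) + pcs[p + 1:],
                   tuple(new_env[n] for n in names))
            transitions += 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen, transitions, violations


processes = [[("x", 1)], [("y", 1)], [("z", 1)]]   # Proc. 1, 2, 3
seen, transitions, violations = explore(
    processes, {"x": 0, "y": 0, "z": 0},
    lambda s: s["x"] >= s["y"] >= s["z"])
print(len(seen), transitions, len(violations))
```

Under this interleaving semantics the search visits 8 states over 12 transitions, and four of the states (for example x = 0, y = 1, z = 0) do not satisfy x >= y >= z, whereas the sequential version needed only four states.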
A (slightly) more realistic example
• Consider the following stencil code:
    for (i = 1; i < 9; i++) {
      …
      t’[i] = (t[i - 1] + t[i] + t[i + 1]) / 3.0
      …
    }
[Figure: a 10-element array with cells labeled 0 to 9]
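For readers who want to run the averaging step, here is a minimal Python rendering of just the line shown in the loop. The elided parts of the original loop are not reconstructed, `t_new` plays the role of t’, and the input values are made up for the illustration.

```python
def stencil_step(t):
    """Three-point average of every interior cell, written to a new array."""
    t_new = list(t)                      # boundary cells are copied unchanged
    for i in range(1, len(t) - 1):       # i = 1 .. 8 for a 10-element array
        t_new[i] = (t[i - 1] + t[i] + t[i + 1]) / 3.0
    return t_new


t = [0.0] * 10
t[4] = 9.0                               # a single spike in the middle
result = stencil_step(t)
print(result)
```

Because every read goes to the old array and every write to the new one, the order in which the cells are processed does not matter in this version.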
A (slightly) more realistic example
• Now, further consider parallelizing the code
• A race condition may occur without proper synchronization
[Figure: the 10-element array, cells labeled 0 to 9, divided among threads]
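Why synchronization matters can be seen even without real threads. The sketch below assumes an in-place update and a hypothetical partition in which cell 4 belongs to thread A and cell 5 to thread B; it simply runs the two updates in both orders. Real races happen at a finer granularity than whole statements, so this is a simplification.

```python
def update_in_place(t, i):
    """One stencil update that overwrites t[i] directly."""
    t[i] = (t[i - 1] + t[i] + t[i + 1]) / 3.0


def run(schedule, t):
    """Apply the updates in the given order to a copy of t."""
    t = list(t)
    for i in schedule:
        update_in_place(t, i)
    return t


t = [0.0, 0.0, 0.0, 0.0, 9.0, 0.0, 0.0, 0.0, 0.0, 0.0]
a_first = run([4, 5], t)   # thread A updates cell 4, then thread B cell 5
b_first = run([5, 4], t)   # the opposite order
print(a_first[4], a_first[5])
print(b_first[4], b_first[5])
```

The two schedules leave different values in cells 4 and 5, so the result depends on timing unless the accesses to the neighbouring cells are synchronized.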
The state explosion in model checking race freedom in the previous example
[Figure: number of state transitions (vertical axis) against the size of the subarrays assigned to each thread (horizontal axis)]
The number of states increases exponentially
One solution to the state explosion
• Program abstraction
  • Extract and/or translate the part of a program related to the properties to be verified
  • A good abstraction can dramatically reduce the number of states to be explored
Example of abstraction
• Consider the previous (parallelized) example again
• A race condition is possible only in the following configuration
[Figure: the 10-element array, cells labeled 0 to 9, with the relevant configuration highlighted]
• Therefore, we can ignore these elements
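The highlighted configuration itself is lost in this transcript, but one plausible reading can be checked by computing which cells are touched by more than one thread. The sketch assumes an in-place stencil and a block distribution in which thread k owns `block` consecutive cells; both functions are hypothetical helpers for this illustration.

```python
def accessed_by(thread, block, n):
    """Return (reads, writes) index sets for the thread owning one block."""
    lo = max(thread * block, 1)              # cell 0 is never updated
    hi = min((thread + 1) * block, n - 1)    # nor is the last cell
    writes = set(range(lo, hi))
    reads = {j for i in writes for j in (i - 1, i, i + 1)}
    return reads, writes


def conflicting_cells(num_threads, block):
    """Cells written by one thread and read or written by another."""
    n = num_threads * block
    access = [accessed_by(k, block, n) for k in range(num_threads)]
    conflicts = set()
    for a in range(num_threads):
        for b in range(num_threads):
            if a != b:
                conflicts |= access[a][1] & (access[b][0] | access[b][1])
    return sorted(conflicts)


for block in (5, 50, 500):
    print(block, conflicting_cells(2, block))
```

Under these assumptions only the two cells on either side of a block boundary can be involved in a race, however large the blocks are, which is why the remaining cells can be dropped from the model.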
Effect of the abstraction
[Figure: number of state transitions (vertical axis) against the size of the subarrays assigned to each thread (horizontal axis)]
The number of states remains constant
Conventional abstraction approaches and their problems
• Automatic abstraction
  • Automatically infers "good" abstractions
  • Problem
    • There is a large body of work, but no silver bullet
    • It sometimes fails to infer a proper abstraction that is apparent to users
• Manual abstraction
  • Lets users specify abstractions by hand
  • Problem
    • More flexible than automatic abstraction, but abstractions are hard to specify concisely and correctly
Example of a previous manual abstraction: UPC-SPIN [Ebnenasir 2011]
• UPC-SPIN is a model checker for UPC (Unified Parallel C)
• Users can specify their own abstractions as text-based pattern matching and rewriting rules
Example of abstraction specification in UPC-SPIN

Target UPC program (about 40 lines):

    shared [MAXN] double t[MAXN*THREADS];
    upc_lock_t * shared lk[MAXN*THREADS];
    ...
    int i, base;
    double e;
    double tmp[2];

    base=MYTHREAD*MAXN;

    if (MYTHREAD==0) {
      tmp[0]=t[0];
      e=0.0;
    } else {
      upc_lock(lk[base-1]);
      upc_lock(lk[base]);
      tmp[0]=(t[base-1]+t[base]+t[base+1])/3.0;
      upc_unlock(lk[base]);
      upc_unlock(lk[base-1]);
      e=fabs(t[base]-tmp[0]);
    }

    for (i=base+1; i<base+MAXN-1; i++) {
      upc_lock(lk[i-1]);
      tmp[1]=(t[i-1]+t[i]+t[i+1])/3.0;
      t[i-1]=tmp[0];
      tmp[0]=tmp[1];
      upc_unlock(lk[i-1]);
    }

    if (MYTHREAD<THREADS-1) {
      upc_lock(lk[base+MAXN-1]);
      upc_lock(lk[base+MAXN]);
      tmp[1]=(t[base+MAXN-2]+t[base+MAXN-1]+
              t[base+MAXN])/3.0;
      upc_unlock(lk[base+MAXN]);
      upc_unlock(lk[base+MAXN-1]);
      upc_lock(lk[base+MAXN-1]);
      t[base+MAXN-1]=tmp[1];
      upc_unlock(lk[base+MAXN-1]);
    }

    …

User-defined abstraction (about 60 lines, longer than the program):

    ...
    bit lk[15];
    bit read_t[15];
    bit write_t[15];

    ...
    upc_lock(lk[base-1]) \
      atomic{ lk[base-1]==0 -> lk[base-1]=1 }
    upc_lock(lk[base]) \
      atomic{ lk[base]==0 -> lk[base]=1 }
    upc_lock(lk[i-1]) \
      atomic{ lk[i-1]==0 -> lk[i-1]=1 }
    upc_lock(lk[base+MAXN-1]) \
      atomic{ lk[base+MAXN-1]==0 -> \
              lk[base+MAXN-1]=1 }
    upc_lock(lk[base+MAXN]) \
      atomic{ lk[base+MAXN]==0 -> \
              lk[base+MAXN]=1 }
    upc_unlock(lk[base-1]) lk[base-1]=0;
    upc_unlock(lk[base]) lk[base]=0;
    upc_unlock(lk[i-1]) lk[i-1]=0;
    upc_unlock(lk[base+MAXN-1]) \
      lk[base+MAXN-1]=0;
    upc_unlock(lk[base+MAXN]) lk[base+MAXN]=0;
    tmp[0]=t[0] read_t[0]=1; \
      read_t[0]=0;
    tmp[0]=(((t[(base-1)]+t[base])+\
      t[(base+1)])/3) \
      read_t[base-1]=1; \
      read_t[base]=1; \
      read_t[base+1]=1; \
      atomic{ read_t[base-1]=0; \
              read_t[base]=0; \
              read_t[base+1]=0; }
    e=fabs((t[base]-tmp[0])) read_t[base]=1; \
      read_t[base]=0;
    tmp[1]=(((t[(i-1)]+t[i])+t[(i+1)])/3) \
      read_t[i-1]=1; \
      read_t[i]=1; \
      read_t[i+1]=1; \
      atomic{ read_t[i-1]=0; \
              read_t[i]=0; \
              read_t[i+1]=0; }
    t[(i-1)]=tmp[0] write_t[i-1] = 1; \
      write_t[i-1] = 0;
    tmp[0]=t[1] read_t[1]=1; \
      read_t[1]=0;
    tmp[1]=(((t[((base+MAXN)-2)]+\
      t[((base+MAXN)-1)])+\
      t[(base+MAXN)])/3) \
      read_t[((base+MAXN)-2)]=1; \
      read_t[((base+MAXN)-1)]=1; \
      read_t[(base+MAXN)]=1; \
      atomic{ read_t[((base+MAXN)-2)]=0; \
              read_t[((base+MAXN)-1)]=0; \
              read_t[(base+MAXN)]=0; }
    t[((base+MAXN)-1)]=tmp[1] \
      write_t[((base+MAXN)-1)]=1; \
      write_t[((base+MAXN)-1)]=0;
    t[((base+MAXN)-2)]=tmp[0] \
      write_t[base+MAXN-2]=1; \
      write_t[base+MAXN-2]=0;
    ...
Our approach
• Let users specify their abstractions by writing tree translators for the abstract syntax trees of their programs
• Translators of abstract syntax trees can be written more concisely and flexibly than text-based pattern matching/rewriting rules
Comparison of abstraction specifications in UPC-SPIN and our approach

UPC-SPIN’s specification: the roughly 60-line specification listed earlier under "Example of abstraction specification in UPC-SPIN"

Our specification (much smaller):

    import Caf2Pml.Caf2Pml
    ...
    global d@(Base (Coarray (Array _ n)) s)
      = [d, Base (Coarray (Array Bit n)) ("read_"++s),
         Base (Coarray (Array Bit n)) ("write_"++s)]
    global _ = []

    ...
    lut s = visit s (refer "write") (refer "read")
    ...
    refer s (CoarrayRef (ArrayRef (VarRef t) szi) ci)
      = [Assign (CoarrayRef (ArrayRef
          (VarRef (s++"_"++t)) szi) ci) (Constant 1),
         Assign (CoarrayRef (ArrayRef
          (VarRef (s++"_"++t)) szi) ci) (Constant 0)]
    refer _ _ = []

Not only small, but also reusable even if the target code is modified
Other examples of simple abstractions in our approach
• Abstraction that ignores assignments to v and preserves everything except them:
    func (Assign (VarRef v) _) = []
    func s = [s]
• Abstraction that preserves assignments to v and ignores everything except them:
    func s@(Assign (VarRef v) _) = [s]
    func _ = []
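The same two abstractions can be mimicked in Python to show what a statement-level tree translator does: it maps each statement to a list of replacement statements. This is an analogue written for illustration only, not CAF-SPIN's API; the node classes, the `Call` node, and `translate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VarRef:
    name: str


@dataclass(frozen=True)
class Constant:
    value: int


@dataclass(frozen=True)
class Assign:
    target: VarRef
    value: object


@dataclass(frozen=True)
class Call:            # hypothetical "any other statement" node
    func: str


def drop_assignments_to(v):
    """Ignore assignments to v, preserve everything else."""
    def func(stmt):
        if isinstance(stmt, Assign) and stmt.target == VarRef(v):
            return []
        return [stmt]
    return func


def keep_assignments_to(v):
    """Preserve assignments to v, ignore everything else."""
    def func(stmt):
        if isinstance(stmt, Assign) and stmt.target == VarRef(v):
            return [stmt]
        return []
    return func


def translate(program, func):
    """Apply a statement translator to every statement and flatten the result."""
    return [out for stmt in program for out in func(stmt)]


program = [Assign(VarRef("v"), Constant(1)),
           Call("sync"),
           Assign(VarRef("w"), Constant(2))]
dropped = translate(program, drop_assignments_to("v"))
kept = translate(program, keep_assignments_to("v"))
print(dropped)
print(kept)
```

Returning a list rather than a single statement is what lets one rule delete a statement (empty list), keep it (one element), or expand it into several, as the read/write flag rules in the coarray specification do.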
CAF-SPIN: a model checker for Coarray Fortran with user-definable abstractions
• Our approach:
  • Translate a Fortran source program into the intermediate representation of CAF-SPIN
  • Translate the intermediate representation into Promela code (the input for the existing model checker, SPIN)
  • Check the specified properties by running SPIN
[Figure: Fortran program → Intermediate representation of CAF-SPIN → Promela code]
CAF-SPIN: a model checker for Coarray Fortran with user-definable abstractions
• Users can specify their own abstractions by manipulating the intermediate representation
[Figure: Fortran program → Intermediate representation of CAF-SPIN → Promela code]
Overview of CAF-SPIN
[Figure: Fortran program → Translator (Parser → Intermediate representation → Code Generator) → Abstracted Promela code → SPIN model checker]
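The shape of this pipeline (parse, abstract the intermediate representation, generate code) can be sketched as three small functions. Everything here is a toy made up for illustration: the parser accepts only lines of the form `name = integer` rather than Fortran, the intermediate representation is a list of tuples, and the output is Promela-like text that has not been checked against SPIN.

```python
def parse(source):
    """Toy parser: one `name = integer` assignment per line (not real Fortran)."""
    ir = []
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        name, value = (part.strip() for part in line.split("="))
        ir.append(("assign", name, int(value)))
    return ir


def abstract(ir, keep):
    """User-supplied abstraction: keep only assignments to the named variables."""
    return [stmt for stmt in ir if stmt[1] in keep]


def generate(ir):
    """Emit Promela-like text for the (abstracted) intermediate representation."""
    variables = sorted({name for _, name, _ in ir})
    lines = [f"int {name};" for name in variables]
    lines.append("active proctype P() {")
    lines += [f"  {name} = {value};" for _, name, value in ir]
    lines.append("}")
    return "\n".join(lines)


ir = parse("x = 1\ny = 2\n")
abstracted = abstract(ir, {"x"})
code = generate(abstracted)
print(code)
```

The point of the middle step is that the abstraction operates on structured data, so it keeps working when the source text changes in ways that do not affect the tree shape it matches.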
XMP-SPIN: a model checker for XcalableMP (Fortran ver.)
• XcalableMP (XMP) = a PGAS language developed by RIKEN AICS, University of Tsukuba, etc. (Prof. Mitsuhisa Sato’s team)
• XMP-SPIN is implemented by extending CAF-SPIN
Overview of XMP-SPIN
[Figure: XMP program → Translator (Parser → Intermediate representation → Code Generator) → Abstracted Promela code → SPIN model checker]
• The parser is extended to parse XMP source programs
• Pre-defined translators specialized for XMP primitives (shadow/reflect) are available to users
Preliminary experiments: model checking with XMP-SPIN
• Targets: parallel stencil computation programs
  • 19 small example programs in the XMP tutorial
  • Himeno benchmark (XMP ver.)
    • 60 lines of code
  • SCALE-LES (XMP ver.)
    • 1442 lines of code
• Results:
  • 4 bugs were found in SCALE-LES within 6.5 minutes
  • The bugs were introduced when porting to XMP