The Code Validation Tool (CVT). Paper by: A. Pnueli, O. Shtrichman, M. Siegel. Course: 236814. Presented to: Professor Orna Grumberg. Presented by: Yehonatan Rubin.
Before we start • This article is from 1998, so I would like you all to join me on a trip to the past. • I would like to introduce you to a language called DC+.
DC+ • An intermediate language for multiple synchronous languages. • Advantages: • Portability. • Multiple source languages need only one “optimizer”. • Models written in different languages can be combined.
Single “optimizer” • A single intermediate language serves many popular languages. • Very complex, efficient code generators were written for it. • Can we even trust those code generators?
Two “feasible” solutions • Formal validation of the code generator. (Let’s start with why this is a bad idea.) • Validating semantic equivalence between the original DC+ code and the resulting C code. (Today’s topic.)
Formal validation of the code generator • Meaning: full formal verification of the code generator. • Problems: • It is extremely hard to do for industrial-size code generators. • Once finished, it freezes the design. (Who would dare to change something?)
Code Validation • Instead of validating the code generator once, validate each run. • Meaning: make sure that the resulting C code is semantically equivalent to the initial DC+ code. • It has to be automatic.
CVT in one slide • [Diagram: the DC+ code and the C code are both fed into CVT, which answers either “Approved” or “Not Approved”.]
What is CVT good for? • Production of safety-critical systems. • It enables the use of code generation tools in such high-quality systems. • “The combination of automatic code generation and validation …” “…eliminating the need for hand-coding the target code...”
When does C code correctly implement the DC+ source? • A hard question. • First, we need to understand DC+’s semantics.
Semantics of DC+ programs • Synchronous (no interrupts). • A DC+ program describes a reactive system whose behavior over time is observable as an infinite sequence of states. • State changes are triggered by the arrival of new values for the input variables.
Semantics of DC+ programs II • A DC+ program is a list of constraints on the program variables. • The list of constraints determines the transition relation of the system. • When new values arrive at the input variables, the values of the other variables are determined according to the constraint list. • At each instant in time, all constraints have to be satisfied by the values that the variables have at that instant.
Semantics of DC+ programs III • There are four kinds of variables: • Input variables: the trigger for state changes. • Output variables: observable variables. • Internal variables: for internal use. • Register variables: store information about the history of the current computation.
Comparing the C program and the DC+ program • Both the DC+ program and the generated C program need to be translated into a common semantic domain. • Synchronous transition systems (STS) will be used.
This is how STS works • An STS is a triple S = (V, Θ, ρ). • V: a finite set of typed variables (all original program variables). • Θ: a satisfiable assertion characterizing the initial states of system S. • ρ: the transition relation, obtained by a one-to-one translation of the list of constraints into logical formulas.
This is how STS works II • The observable behavior of such a system can be understood in the following way: • The computation starts in an initial state of the system. • Solutions of ρ for given values of the input variables determine the values of the remaining variables, which yields the next legal state along the computation.
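To make the triple concrete, here is a minimal sketch of a toy STS in Python. The variables `i`, `o`, `r`, the constraints inside `rho`, and the helper names are all invented for illustration; they are not taken from the paper or from any DC+ program.

```python
from itertools import product

# Toy STS S = (V, Theta, rho). Every name below is an illustrative assumption.
V = {"i": (0, 1), "o": (0, 1), "r": (0, 1)}   # variable -> finite type

def theta(s):
    # Initial condition: the register starts at 0.
    return s["r"] == 0

def rho(s, t):
    # Transition relation: a conjunction of constraints relating the
    # current state s to the next state t.
    return t["o"] == (t["i"] ^ s["r"]) and t["r"] == t["i"]

def successors(s, new_input):
    """All next states that satisfy rho once the input value is fixed."""
    names = [v for v in V if v != "i"]
    result = []
    for values in product(*(V[v] for v in names)):
        t = dict(zip(names, values), i=new_input)
        if rho(s, t):
            result.append(t)
    return result

def run(inputs):
    s = {"i": 0, "o": 0, "r": 0}
    assert theta(s)
    trace = []
    for x in inputs:
        (s,) = successors(s, x)        # this toy rho has exactly one solution
        trace.append((s["i"], s["o"]))  # record the observable part
    return trace

print(run([1, 1, 0]))
```

In this toy system the output is 1 exactly when the input differs from the previous input, and each new input value has exactly one solution of `rho`, which is what "solutions of ρ determine the remaining variables" means here.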
This is how STS works III • Reminder: DC+ is synchronous. • According to the synchrony hypothesis: • There is no time delay between the reception of new values for the input variables and the generation of the corresponding output values. • All variables are updated simultaneously. • Each step is therefore atomic, deterministic and bounded.
The C code • The result of the code generator is C code. • It has the following structure: • ANSI C. • One control loop; each iteration corresponds to one step of the DC+ program.
The C code II • Unlike in DC+, here the variables are not updated simultaneously. • The control loop consumes new values for the input variables and successively computes (one by one) the values of the remaining variables.
The C code III • States marked with a bullet correspond to the beginning (and end) of the control loop. • Those states match the states of the original DC+ program. • Intermediate states, where only some variables have been updated, are not depicted since they do not correspond to any state of the DC+ program.
The C code IV • For the purpose of semantic comparison, the C program is also translated into an STS representation.
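The loop structure can be sketched as follows. This is a Python stand-in for the shape of the generated C code, not actual generator output; the variables `i`, `o`, `r` and the two update statements are hypothetical.

```python
def control_loop(inputs):
    """Hypothetical shape of the generated code: one iteration per DC+ step."""
    r = 0                       # register, initialised once before the loop
    boundary_states = []
    for i in inputs:            # consume a new value for the input variable
        o = i ^ r               # the other variables are computed one by one
        # At this point o is already new while r is still old: an
        # intermediate state that matches no state of the DC+ program.
        r = i
        # End of the iteration: every variable is updated, so this state
        # is one of the "bullet" states that match the DC+ program.
        boundary_states.append({"i": i, "o": o, "r": r})
    return boundary_states

print(control_loop([1, 1, 0]))
```

Only the states collected at the loop boundary are compared with DC+ states; the half-updated state in the middle of the body is deliberately never recorded.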
Correct Implementation • Now we have a common semantic domain for both the C and DC+ programs. • We’ll say that: “Program C implements DC+ if for every computation σ of C there exists a computation τ of DC+ such that σ and τ agree state-wise on the values of observable variables, i.e., input and output variables…”
One last thing: a mapping • Now we’ll need a mapping from DC+ to C (abstract to concrete). • The code generator in the use case applies more than 100 optimizations. • Thus, the mapping domain will be the observable variables (I/O). • The mapping will assign a term over the concrete variables to every abstract variable.
Finally: a logical rule • If both of the rule’s proof obligations are found to be valid, we can conclude that C is a correct refinement of the corresponding DC+ program.
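The "agree state-wise on observable variables" condition can be illustrated on finite trace prefixes. This is only a sketch: the real definition speaks about infinite computations and an existentially chosen τ, and the variable names (`i`, `o`, `r`, `tmp`) and traces below are made up.

```python
OBSERVABLE = ("i", "o")   # assumed: input and output variables only

def project(trace):
    """Keep only the observable variables of every state."""
    return [tuple(state[v] for v in OBSERVABLE) for state in trace]

def agree_statewise(sigma, tau):
    """True if the two finite traces agree on observables, state by state."""
    return project(sigma) == project(tau)

# sigma: states of the C program at loop boundaries (hypothetical values).
sigma = [{"i": 1, "o": 1, "tmp": 7}, {"i": 0, "o": 1, "tmp": 3}]
# tau: states of the DC+ program; it has different internal variables.
tau = [{"i": 1, "o": 1, "r": 1}, {"i": 0, "o": 1, "r": 0}]
# A trace whose second output differs.
bad = [{"i": 1, "o": 1, "r": 1}, {"i": 0, "o": 0, "r": 0}]

print(agree_statewise(sigma, tau), agree_statewise(sigma, bad))
```

Internal and register variables are free to differ, which is what leaves room for the code generator's optimizations.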
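The rule itself did not survive in the slide text. As a hedged illustration only, refinement rules of this kind typically consist of one obligation for the initial condition and one for the transition relation. The notation below is an assumption: subscript $C$ marks the STS of the C program, subscript $A$ the STS of the DC+ (abstract) program, $\alpha$ is the mapping from the previous slide, and primed variables denote next-state values. It is not necessarily the exact rule shown on the original slide.

```latex
\begin{aligned}
\text{(1)}\quad & \Theta_C \;\wedge\; V_A = \alpha(V_C) \;\rightarrow\; \Theta_A \\
\text{(2)}\quad & \rho_C \;\wedge\; V_A = \alpha(V_C) \;\wedge\; V_A' = \alpha(V_C') \;\rightarrow\; \rho_A
\end{aligned}
```

Under this reading, the right-hand side of (2) is the DC+ transition relation, i.e. a conjunction of constraints.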
So? Now we’ll move to the practical part. CVT’s Architecture.
CVT’s Architecture What we talked about so far
Auto Decomposition • The right-hand side of the implication is in the form of a conjunction. • Since the time it takes to verify a program using BDD-based tools is worst-case exponential in the size and complexity of the formula, it is the size of the single formula that has to be verified that determines the bottleneck of the validity checking. (Before practical SAT solvers?)
Auto Decomposition II • The auto-decomposition module breaks the program into n separate files to be validated. • Each of these files represents a smaller verification task. • In most cases, each file will be bigger than […]. (The paper claims to “soon explain why”; I couldn’t find where this “soon” is.)
Auto Decomposition III • Why decomposition? • Verifying each formula is exponential in its size. • Decomposition causes a linear increase in the number of validation tasks and a linear decrease in the size of each task, which means an exponential decrease in the verification time of each formula. • This is why decomposition is so important.
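The argument can be checked with a back-of-the-envelope calculation. The cost model below (time doubles with every unit of formula size, and decomposition yields n tasks of exactly 1/n the size) is an idealised assumption for illustration; real tasks are larger than that because each one carries its own cone of influence.

```python
def cost(size):
    # Assumed worst-case model: verification time is 2**size.
    return 2 ** size

def monolithic(total_size):
    """Cost of verifying the whole formula in one piece."""
    return cost(total_size)

def decomposed(total_size, n_tasks):
    """Cost of n_tasks independent pieces, each 1/n_tasks of the size."""
    return n_tasks * cost(total_size // n_tasks)

print(monolithic(40))      # one formula of size 40
print(decomposed(40, 4))   # four formulas of size 10
```

With these made-up numbers the single task costs 2^40 steps while the four smaller tasks together cost 4 · 2^10, and the smaller tasks could in addition be run independently.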
Cone of Influence • After breaking up the right-hand side, the module returns to the left-hand side of the implication and calculates the “Cone of Influence” (COI). • COI: the portion of the formula on the left-hand side that is needed for proving the selected conjuncts on the right-hand side.
Cone of Influence: how to calculate • CVT makes a list of all the variables that are used in the right-hand side of the implication. • For each such variable, CVT looks for its definition on the left-hand side; this will be an expression over other variables. • It then removes the variable from the list and adds each of those other variables that was not in the list before. • This procedure is repeated until the list contains only input variables. • At the end, the only conjuncts retained on the left-hand side are the defining equations of the variables that were considered throughout the computation.
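The procedure above is a worklist computation and can be sketched as follows. The representation (a dictionary from each defined variable to the set of variables its defining equation mentions) and all variable names are assumptions made for the example, not CVT's actual data structures.

```python
def cone_of_influence(definitions, inputs, rhs_vars):
    """Return the defining equations needed for the variables in rhs_vars.

    definitions: var -> set of variables used in its defining equation.
    inputs:      set of input variables (they have no defining equation).
    """
    worklist = list(rhs_vars)
    seen = set(rhs_vars)
    kept = {}                          # equations retained on the left side
    while worklist:
        var = worklist.pop()
        if var in inputs:
            continue                   # nothing to expand for an input
        kept[var] = definitions[var]
        for dep in definitions[var]:
            if dep not in seen:        # add only variables not listed before
                seen.add(dep)
                worklist.append(dep)
    return kept

# Hypothetical left-hand side: a depends on i1 and b, b on i2, and so on.
defs = {"a": {"i1", "b"}, "b": {"i2"}, "c": {"i1"}, "d": {"c"}}
inputs = {"i1", "i2"}

print(sorted(cone_of_influence(defs, inputs, {"a"})))   # equations for a, b
print(sorted(cone_of_influence(defs, inputs, {"d"})))   # equations for c, d
```

The `seen` set also guarantees termination when definitions refer to each other cyclically, which the list-based description on the slide leaves implicit.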
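Cone of Influence: example • [The example formulas and the names of the input variables were shown as figures and are not recoverable here.] • The decomposition module splits the right-hand side into two files, and the COI is then calculated for each of them.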
Recap • COI: the portion of the formula on the left-hand side that is needed for proving the selected conjuncts on the right-hand side. • Now we have many pairs of files to be verified (possibly even in parallel). • Each pair consists of: • A conjunct from the right-hand side. • That conjunct’s COI from the left-hand side.
CVT’s Architecture What we talked about so far
The Abstraction Module • Abstraction is needed because we are trying to verify a formula that contains integer and float variables, as well as functions over these variables, using a BDD-based decision procedure for finite-state models. • The abstraction module treats these functions as uninterpreted functions, replacing them with new symbols.
The Abstraction Module II • The faithfulness of this technique depends on two things: • The way that the compiler manipulates these functions. • The kind of functions we leave uninterpreted. • Should we interpret more functions? • The more we interpret, the more faithful the model is (but complex functions are hard to interpret). • The less we interpret, the smaller the model is.
The Abstraction Module III • The abstraction works in an incremental manner. • CVT begins with maximum abstraction: all functions except equalities, Boolean operators and if-then-else are left uninterpreted. • If the proof fails, CVT invokes the next level of abstraction, in which comparison operators on integers (<, >, etc.) are additionally interpreted. • There are no more levels.
Example • An example of why this is necessary: • Suppose the compiler reads “a<b” and for some reason decides to turn it into “b>a”. • The first level of abstraction will result in a false negative. • The second level of abstraction will result in a true positive.
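The effect of the two levels on this example can be reproduced by brute force. In the sketch below, "uninterpreted" is modelled by letting `<` and `>` be two arbitrary, unrelated binary predicates `P` and `Q` over a tiny domain; the domain and the function names are assumptions chosen to keep the enumeration small, and this is not how CVT itself is implemented.

```python
from itertools import product

DOMAIN = (0, 1)
PAIRS = list(product(DOMAIN, repeat=2))

def all_predicates():
    """Every possible truth table for a binary predicate over DOMAIN."""
    for bits in product((False, True), repeat=len(PAIRS)):
        yield dict(zip(PAIRS, bits))

def valid_uninterpreted():
    # First level: '<' and '>' are just two unrelated symbols P and Q,
    # so "a<b" == "b>a" must hold for every choice of P and Q.
    return all(P[(a, b)] == Q[(b, a)]
               for P in all_predicates()
               for Q in all_predicates()
               for a, b in PAIRS)

def valid_interpreted():
    # Second level: the comparison operators keep their real meaning.
    return all((a < b) == (b > a) for a, b in PAIRS)

print(valid_uninterpreted(), valid_interpreted())
```

The first check fails because nothing connects `P` and `Q`, even though the compiler's rewrite is harmless; the second check succeeds once the operators are interpreted.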
This leaves us with a quantifier-free first-order logic formula which enjoys the small model property (i.e., it is satisfiable iff it is satisfiable over a finite domain). Therefore the next issue is the calculation of a finite domain such that the formula is valid if and only if it is valid over all interpretations into this domain.
Once we have a valid domain, checking whether the formula is satisfiable or not is a relatively easy thing to do (BDDs). So, which domain should we use?
Choosing a domain • Which functions do we interpret? • Level 0: equalities, Boolean operators and if-then-else. • Level 1: comparison operators on integers (<, >, etc.). • Only the order is important. • If there are n variables, the domain [1..n] contains all possible orderings. • Hence the domain [1..n] is valid.
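A brute-force sketch shows what "valid over the domain [1..n]" means in practice and why a smaller domain is not enough. The three example formulas and the function name are invented for illustration; CVT uses BDDs rather than explicit enumeration.

```python
from itertools import product

def valid_over_range(formula, n_vars, size=None):
    """Check a formula over every assignment of n_vars variables in [1..size].

    By default size equals n_vars, i.e. the domain [1..n] from the slide.
    """
    if size is None:
        size = n_vars
    domain = range(1, size + 1)
    return all(formula(*vals) for vals in product(domain, repeat=n_vars))

def transitivity(a, b, c):
    return not (a < b and b < c) or a < c      # valid

def bogus(a, b, c):
    return not (a < b) or b < c                # not valid: a=1, b=2, c=1

def pigeonhole(a, b, c):
    return a == b or b == c or a == c          # not valid: a, b, c distinct

print(valid_over_range(transitivity, 3))
print(valid_over_range(bogus, 3))
print(valid_over_range(pigeonhole, 3, size=2))   # domain too small
print(valid_over_range(pigeonhole, 3))
```

Over [1..2] the last formula wrongly looks valid because three variables cannot all be distinct there; [1..3] has room for every ordering of three variables and exposes the counterexample.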
CVT’s Architecture What we talked about so far
The Range Minimization Module • The size of the state space imposed by the [1..n] domain is n^n (n variables, each ranging over n values). • Along comes the Range Minimization Module!
Range minimization • Goal: map each integer variable to a small finite set of integers (that’s R), such that φ is satisfiable iff it is satisfiable over some R-interpretation. • This reduces the state space from n^n to the product of the sizes of the sets that R assigns to the variables.
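The size of the saving is simple arithmetic. In the sketch below the allocation `R` is an arbitrary, hypothetical example; finding an allocation that is small and still adequate for a given formula is the actual job of the module and is not shown.

```python
from math import prod

def full_state_space(n_vars):
    """Every one of the n variables ranges over [1..n]."""
    return n_vars ** n_vars

def allocated_state_space(R):
    """Each variable ranges only over the set that R assigns to it."""
    return prod(len(values) for values in R.values())

# Hypothetical allocation for four integer variables.
R = {"x": {1}, "y": {1, 2}, "z": {1, 2}, "w": {1}}

print(full_state_space(len(R)))    # 4 ** 4
print(allocated_state_space(R))    # 1 * 2 * 2 * 1
```

Even in this tiny case the explicit state space shrinks from 256 to 4 assignments, and the gap widens quickly as the number of variables grows.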
TLV • TLV: the verifier module. • An SMV-based tool. • Invoked for each pair of files (as created by the COI calculation). • If the equivalence proof fails, it is possible to isolate the conjunct that caused the failure.
Evaluation • Case study: a turbine from the SACRES project. • 5 units (manually separated). • The DC+ source is a few thousand lines of code with over 1000 variables.
Evaluation: Results • Unverified conjuncts had a very large cone of influence.
The End Thank you