230 likes | 422 Views
Validating High-Level Synthesis. Sudipta Kundu , Sorin Lerner, Rajesh Gupta Department of Computer Science and Engineering, University of California, San Diego. …. x = a * b; c = a < b; if (c) then a = b – x; else a = b + x; a = a + x; b = b * x; …. C/C++, SystemC
E N D
Validating High-Level Synthesis Sudipta Kundu, Sorin Lerner, Rajesh Gupta Department of Computer Science and Engineering, University of California, San Diego
…. x = a * b; c = a < b; if (c) then a = b – x; else a = b + x; a = a + x; b = b * x; …. • C/C++, SystemC < 100 – 10K lines > Functionally Equivalent Data path Functionally Equivalent S0 Controller S1 !f f S3 S2 • Verilog, VHDL < 1K – 100K lines > S4 High-Level Synthesis Algorithmic Design Functionally Equivalent High-Level Synthesis (HLS) Scheduled Design Register Transfer Level Design (RTL)
Equivalence Checker • Once and for all • HLS tool always produce correct results • For each translation • Does not guarantee • => HLS tool is bug free Checker The Problem Transformed Program (Implementation) • Translation Validation (TV) • Optimizing Compiler [Pnueli et al. 98] [Necula 00] [Zuck et al. 05] Input Program (Specification) Transformations Is the Specification “functionally equivalent” to the Implementation? Does guarantee => any errors in translation will be caught when tool runs
Contributions • Developed TV techniques for a new setting: HLS • Algorithm uses a bisimulation relation approach. • Implemented TV for a realistic HLS tool: SPARK • Widely used: 4,000 downloads, over 100 active users. • Large Software: around 125, 000 LoC. • Ran it on 12 benchmarks • Modular: works on one procedure at a time. • Practical: took on average 6 secs to run per procedure. • Easy to Implement: only a fraction of development cost of SPARK. • Useful: found 2 previously unknown bugs.
Outline • Motivation and Problem definition • Our Approach using an Example • Definition of Equivalence • Translation Validation Algorithm • Generate Constraints • Solve Constraints • Experiments and Results • SPARK: Parallelizing HLS Framework • Conclusion
+ + < a0 a0 a2 i3: (k < 10) i6: ¬ (k < 10) a3 a5 i7: return sum a6 a4 i1: sum = 0 a1 i1: sum = 0 i2: k = p Resource Allocation: a1 i2: k = p a2 i3: (k < 10) i6: ¬ (k < 10) i4: k = k + 1 a3 a5 i7: return sum i4: k = k + 1 i5: sum = sum + k a6 a4 i5: sum = sum + k An Example of HLS Specification: Original Program: intSumTo10 (int p) int sum = 0, k = p; while (k < 10) sum += ++k; return sum; intSumTo10 (int p) int sum = 0, k = p; while (k < 10) sum += ++k; return sum; 10 sum = ∑k p+1
+ + < a0 a0 a0 a2 i3: (k < 10) i6: ¬ (k < 10) i1: sum = 0 a1 a3 a5 i2: k = p i7: return sum a6 a4 i1: sum = 0 a1 i1: sum = 0 i2: k = p Resource Allocation: a1 i2: k = p a2 i3: (k < 10) i6: ¬ (k < 10) a3 a5 i7: return sum i4: k = k + 1 a6 a4 i5: sum = sum + k An Example of HLS Specification: Loop Pipelining: intSumTo10 (int p) int sum = 0, k = p; while (k < 10) sum += ++k; return sum; t = k + 1 i4: k = k + 1 i4: k = t i5: sum = sum + k t = k + 1
+ + < a0 a0 a2 i3: (k < 10) i6: ¬ (k < 10) i1: sum = 0 a1 a3 a5 i2: k = p i7: return sum a6 a4 i1: sum = 0 Resource Allocation: a1 i2: k = p a2 i3: (k < 10) i6: ¬ (k < 10) a3 a5 i7: return sum i4: k = k + 1 a6 a4 i5: sum = sum + k An Example of HLS Specification: Copy Propagation: intSumTo10 (int p) int sum = 0, k = p; while (k < 10) sum += ++k; return sum; t = p + 1 t = k + 1 i4: k = t i5: sum = sum + t i5: sum = sum + k t = k + 1 t = t + 1
+ + < a0 a0 a0 i1: sum = 0 a1 i1: sum = 0 i2: k = p i41: t = p + 1 i2: k = p i1: sum = 0 t = p + 1 Resource Allocation: a1 a2 i3: (k < 10) i6: ¬ (k < 10) i2: k = p a3 a5 a2 i3: (k < 10) i6: ¬ (k < 10) i7: return sum i4: k = t a3 a5 a6 a4 i7: return sum i4: k = k + 1 i5: sum = sum + t a6 a4 Read After Write dependency i5: sum = sum + k t = t + 1 i4: k = t i5: sum = sum + t i42: t = t + 1 An Example of HLS Specification: Scheduling: intSumTo10 (int p) int sum = 0, k = p; while (k < 10) sum += ++k; return sum;
a0 b0 j1: sum = 0 j2: k = p j41: t = p + 1 Re-labeled i1: sum = 0 a1 b1 j3: (k < 10) j6: ¬ (k < 10) i2: k = p b2 b3 a2 i3: (k < 10) i6: ¬ (k < 10) j7: return sum a3 a5 b4 i7: return sum i4: k = k + 1 a6 a4 i5: sum = sum + k j4: k = t j5: sum = sum + t j42: t = t + 1 An Example of HLS Specification: Implementation: intSumTo10 (int p) int sum = 0, k = p; while (k < 10) sum += ++k; return sum; ≡
Definition of Equivalence • Specification ≡ Implementation => They have the same set of execution sequences of visible instructions. • Visible instructions are: • Function call and return statements. • Two function calls are equivalent if the state of globals and the arguments are the same. • Two returns are equivalent if the state of the globals and the returned values are the same.
Specification Implementation s’1 s1 i i s2 s’2 Technique for Proving Equivalence • Bisimulation Relation: relates a given program state in the implementation with the corresponding state in the specification and vice versa. • It satisfies the following properties: • The start states are related. Visible Instruction • Theorem: If there exists a bisimulationrelation between the specification and the implementation, then they are equivalent.
Our Approach • Split program state space in two parts: • control flow state, which is finite. => explored by traversing the CFGs. • dataflow state, which may be infinite. => explored using Automated Theorem Prover (ATP). • Bisimulation relation: set of entries of the form (p1, p2, Ф). • p1 – program point in Specification. • p2 – program point in Implementation. • Ф – formula that relates the data.
a0 Specification ps = pi Implementation b0 i1: sum = 0 j1: sum = 0 j2: k = p j41: t = p + 1 a1 ks = kiΛ sums = sumi Λ (ks + 1) = ti i2: k = p a2 b1 i3: (k < 10) j3: (k < 10) i6: ¬ (k < 10) j6: ¬ (k < 10) a3 b2 sums = sumi a5 i4: k = k + 1 b3 j7: return sum i7: return sum j4: k = t j5: sum = sum + t j42: t = t + 1 a4 a6 b4 i5: sum = sum + k Bisimulation Relation • Bisimulation relation: set of entries of the form (p1, p2, Ф). • p1 – program point in Specification. • p2 – program point in Implementation. • Ф – formula that relates the data.
Translation Validation Algorithm • Two step approach. • Generate Constraints: traverses the CFGs simultaneously and generates the constraints required for the visible instructions to be matched. • Solve Constraints: solves the constraints using a fixpoint algorithm. • For loops: iterate to a fixed point. • May not terminate in general. • But in practice can find the bisimulation relation.
a0 Ψ(a0, b0) b0 i1: sum = 0 j1: sum = 0 j2: k = p j41: t = p + 1 a1 Ψ(a2, b1) i2: k = p a2 b1 i3: (k < 10) j3: (k < 10) i6: ¬ (k < 10) j6: ¬ (k < 10) Ψ(a5, b3) a3 b2 a5 i4: k = k + 1 b3 j7: return sum i7: return sum j4: k = t j5: sum = sum + t j42: t = t + 1 a4 a6 b4 i5: sum = sum + k Ψ(a2, b1) Ψ(a0, b0) a2 b1 Ψ(a2, b1) a0 b0 i3 a2 b1 j3 i4 j4 j5 j42 i6 j6 i5 a5 b3 j1 j2 j41 i1 Ψ(a5, b3) a2 b1 a2 b1 Ψ(a2, b1) i2 Ψ(a2, b1) Generate Constraints Specification Implementation Constraints: Constraint Variable • Branch Correlation • Strongest Post Condition • Structure of the code Ψ(a2, b1) ⇒ks = ki Ψ(a5, b3) ⇒ sums = sumi
a0 Ψ(a0, b0) Specification Implementation Ψ(a2, b1) b0 i1: sum = 0 j1: sum = 0 j2: k = p j41: t = p + 1 ATP[ Ψ(a2, b1)⇒ WP( Ψ(a2, b1)) ] a1 Ψ(a2, b1) i2: k = p a2 Ψ(a2, b1): ks = ki b1 i3: (k < 10) j3: (k < 10) i6: ¬ (k < 10) j6: ¬ (k < 10) a2 b1 Ψ(a5, b3) a3 b2 j3: (ki < 10) j4: ki = ti j5: sumi = sumi + ti j42: ti = ti + 1 a5 i4: k = k + 1 b3 i3: (ks < 10) i4: ks = ks + 1 i5: sums = sums + ks j7: return sum i7: return sum j4: k = t j5: sum = sum + t j42: t = t + 1 a4 a6 b4 i5: sum = sum + k Ψ(a2, b1) Ψ(a0, b0) a2 b1 a2 b1 Ψ(a2, b1) a0 b0 Ψ(a2, b1): ks = ki i3 a2 b1 j3 i4 j4 j5 j42 i6 j6 i5 a5 b3 j1 j2 j41 i1 Ψ(a5, b3) a2 b1 a2 b1 Ψ(a2, b1) i2 Ψ(a2, b1) Solve Constraints Ψ(a0, b0): ps = pi : ks = ki ∧ WP(Ψ(a2, b1) ) : ks = ki∧ (ks + 1) = ti Ψ(a2, b1) : ks = ki Λ (ks + 1) = ti Ψ(a2, b1) : ks = kiΛ (ks + 1) = ti Λ sums = sumi Ψ(a2, b1): ks = ki Ψ(a2, b1): True WP( Ψ(a2, b1)) : (ki + 1) ⇒ (ks + 1) ⇒ (ks + 1) = ti Ψ(a5, b3) : True Ψ(a5, b3) : sums = sumi Constraints: Ψ(a2, b1) ⇒ks = ki X Ψ(a5, b3) ⇒ sums = sumi X X X X
Validation Engine SPARK: Parallelizing HLS Framework SPARK C Program Intermediate Representation (IR) No Pointer No Recursion No goto Binding Scheduled IR High Level Synthesis RTL
Read After Write dependency Did not copy array index Bugs Found in SPARK • Array Copy Propagation • Code motion
Related Work • Translation Validation • Sequential Programs [Pnueli et al. 98] [Necula 00] [Zuck et al. 05] • CSP Programs [Kundu et al. 07] • HLS Verification • Scheduling Step • Correctness preserving transformation [Eveking 99] • Symbolic Simulation [Ashar 99] • Formal assertions [Narasimhan 01] • Relational approaches for Equivalence of FSMDs [Kim 04, Karfa 06]
Conclusion and Future Directions • Presented an automated algorithm for translation validation of the HLS process. • Implemented it for a HLS tool called SPARK. • Modular: works on one procedure at a time. • Practical: took on average 6 secs to run per procedure. • Easy to Implement: only a fraction of development cost of SPARK. • Useful: found 2 previously unknown bugs. • In future, • Remaining phases of SPARK: parsing, binding and code generation.