Shared Memory Consistency Models
A review and some fun exercises
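
Example (courtesy of Qadeer/Rajamani)
• A simple Read/Write program is run on eight processors, and the values fetched by the Reads are shown in blue.
• Notation: WC1 writes the value 1 to location C; RC0 is a read of location C that returned 0.
[Figure: program diagram with one column per processor, P1 through P8. Row 1: WC1, RC0, RC1, RB0, RB1, WB0, WB1, WC0. Row 2: WA0, RA0, RA1, WA1. Row 3: RB0, RB1, RC0, RC1.]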

Is this execution Sequentially Consistent?
[Figure: the same eight-processor program diagram.]

What does SC mean?
• The program must APPEAR to execute in such a manner that:
• One can build a total order containing each (value-annotated) instruction exactly once
• This total order must be consistent with the Per-Processor Order
• In this total order, every read must return the value of the most recent write to the same location
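
The three conditions above can be checked mechanically by searching for a witness total order. The following is a minimal Python sketch, not the tool used for these slides. It rests on several assumptions: the per-processor grouping in `PROGRAM` is read off the flattened diagram (three-operation columns for four processors, single writes for the other four) and the processor numbering is a guess; memory starts out "unwritten" (`None`), so every read must see some write; and the names `PROGRAM`, `LOCATIONS`, and `is_sequentially_consistent` are hypothetical.

```python
from functools import lru_cache

LOCATIONS = ("A", "B", "C")

# Assumed per-processor programs: (kind, location, value).
# For a read, "value" is the value the read returned in the execution.
PROGRAM = (
    (("W", "C", 1),),
    (("R", "C", 0), ("W", "A", 0), ("R", "B", 0)),
    (("R", "C", 1), ("R", "A", 0), ("R", "B", 1)),
    (("R", "B", 0), ("R", "A", 1), ("R", "C", 0)),
    (("R", "B", 1), ("W", "A", 1), ("R", "C", 1)),
    (("W", "B", 0),),
    (("W", "B", 1),),
    (("W", "C", 0),),
)


def is_sequentially_consistent(program, locations=LOCATIONS):
    """Return True if some interleaving respects program order and
    lets every read return the most recent write to its location."""

    @lru_cache(maxsize=None)
    def search(pcs, memory):
        # pcs[t] is the index of the next instruction of processor t.
        if all(pc == len(thread) for pc, thread in zip(pcs, program)):
            return True  # every instruction placed exactly once
        for t, pc in enumerate(pcs):
            if pc == len(program[t]):
                continue
            kind, loc, val = program[t][pc]
            slot = locations.index(loc)
            if kind == "R":
                if memory[slot] != val:
                    continue  # read would not see the most recent write
                next_memory = memory
            else:
                next_memory = memory[:slot] + (val,) + memory[slot + 1:]
            next_pcs = pcs[:t] + (pc + 1,) + pcs[t + 1:]
            if search(next_pcs, next_memory):
                return True
        return False

    return search((0,) * len(program), (None,) * len(locations))


if __name__ == "__main__":
    print(is_sequentially_consistent(PROGRAM))
```

Memoizing on the pair (program counters, memory contents) keeps the search small even though the number of raw interleavings of sixteen instructions is enormous. Under the assumed grouping the sketch reports that no witness order exists.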

One example of trying to build such a total order
• Attempted order: WC0, RC0, WA0, RA0, WA1, RA1, RC0, WC1, RC1
• Oops, this violates P3's program order.
[Figure: the attempted order drawn as a chain over the program diagram; its edges are annotated "arb", "Val", or with a processor name (P3, P4, P5).]

A more systematic attempt
• Let's choose WC0 before WC1 and WB0 before WB1 and see what happens.
• This violates no read-value outcomes, so such an ordering has to be attempted.
• If it works, we are done; if not, try all 4! = 24 permutations.
• The diagram incorporates the orderings WC0 -> WC1 and WB0 -> WB1.
[Figure: the program diagram with the edges WC0 -> WC1 and WB0 -> WB1 added.]

A more systematic attempt (contd)
• Now, any ordering imposed on A (an arbitrary choice again) causes trouble.
[Figure: the same diagram with an additional edge labeled "WA edge".]
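
The edge-drawing argument can be sketched as a cycle check. For a chosen write order per location, the Python sketch below adds program-order edges, write-to-read edges (a read follows the write whose value it returned), read-to-overwrite edges (a read of the first-written value precedes the second write), and the chosen write-order edge; an ordering "causes trouble" exactly when the resulting graph has a cycle. This is an illustration under assumptions, not the slides' tool: the per-processor grouping in `PROGRAM` is inferred from the flattened diagram, each location is assumed to be written exactly once with 0 and once with 1, and all function names are hypothetical.

```python
from itertools import product

# Assumed per-processor programs: (kind, location, value).
PROGRAM = [
    [("W", "C", 1)],
    [("R", "C", 0), ("W", "A", 0), ("R", "B", 0)],
    [("R", "C", 1), ("R", "A", 0), ("R", "B", 1)],
    [("R", "B", 0), ("R", "A", 1), ("R", "C", 0)],
    [("R", "B", 1), ("W", "A", 1), ("R", "C", 1)],
    [("W", "B", 0)],
    [("W", "B", 1)],
    [("W", "C", 0)],
]


def build_edges(program, first_value):
    """first_value maps each location to the value assumed to be written first."""
    writes = {}
    for t, thread in enumerate(program):
        for i, (kind, loc, val) in enumerate(thread):
            if kind == "W":
                writes[(loc, val)] = (t, i)
    edges = set()
    for t, thread in enumerate(program):
        for i, (kind, loc, val) in enumerate(thread):
            if i + 1 < len(thread):
                edges.add(((t, i), (t, i + 1)))  # program order
            if kind == "R":
                edges.add((writes[(loc, val)], (t, i)))  # write, then its read
                if val == first_value[loc]:
                    # the read must come before the value is overwritten
                    edges.add(((t, i), writes[(loc, 1 - val)]))
    for loc, val in first_value.items():
        edges.add((writes[(loc, val)], writes[(loc, 1 - val)]))  # write order
    return edges


def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    state = {}  # 1 = on the current DFS path, 2 = fully explored

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state.get(nxt) == 1 or (nxt not in state and visit(nxt)):
                return True
        state[node] = 2
        return False

    return any(node not in state and visit(node) for node in graph)


def consistent_write_orders(program, locations="ABC"):
    """Return every per-location write order whose edge graph is acyclic."""
    found = []
    for firsts in product((0, 1), repeat=len(locations)):
        first_value = dict(zip(locations, firsts))
        if not has_cycle(build_edges(program, first_value)):
            found.append(first_value)
    return found


if __name__ == "__main__":
    print(consistent_write_orders(PROGRAM))
```

Because each value is written only once per location, only the relative order of the two writes to each location matters, so the sketch tries 2 x 2 x 2 = 8 combinations rather than all permutations. Under the assumed grouping every combination produces a cycle, which matches the observation that either ordering of the writes to A causes trouble.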

Simulating this example in Itanium: Fence everywhere

    ld.acq r1 = [b] <1>
    mf
    st.rel [a] = 1
    mf
    ld r2 = [c] <1>

    ld.acq r1 = [c] <2>
    mf
    st.rel [a] = 2
    mf
    ld r2 = [b] <2>

    ld.acq r1 = [c] <1>
    mf
    ld.acq r2 = [a] <2>
    mf
    ld r3 = [b] <1>

    ld.acq r1 = [b] <2>
    mf
    ld.acq r2 = [a] <1>
    mf
    ld r3 = [c] <2>

    st.rel [c] = 1

    st.rel [b] = 2

    st.rel [b] = 1

    st.rel [c] = 2

Simulating this example in Itanium: Forgotten StRel

    ld.acq r1 = [b] <1>
    st [a] = 1
    mf
    ld r2 = [c] <1>

    ld.acq r1 = [c] <2>
    st [a] = 2
    mf
    ld r2 = [b] <2>

    ld.acq r1 = [c] <1>
    ld.acq r2 = [a] <2>
    ld r3 = [b] <1>

    ld.acq r1 = [b] <2>
    ld.acq r2 = [a] <1>
    ld r3 = [c] <2>

    st.rel [c] = 1

    st.rel [b] = 2

    st.rel [b] = 1

    st.rel [c] = 2

Simulating this example in Itanium: Min MF

    ld.acq r1 = [b] <1>
    st.rel [a] = 1
    mf
    ld r2 = [c] <1>

    ld.acq r1 = [c] <2>
    st.rel [a] = 2
    mf
    ld r2 = [b] <2>

    ld.acq r1 = [c] <1>
    ld.acq r2 = [a] <2>
    ld r3 = [b] <1>

    ld.acq r1 = [b] <2>
    ld.acq r2 = [a] <1>
    ld r3 = [c] <2>

    st.rel [c] = 1

    st.rel [b] = 2

    st.rel [b] = 1

    st.rel [c] = 2

Conclusions
• (Tool available; ported to UPC also)
• Tricky!
• Need formal tools
• Mistakes get re-discovered (in perpetuity)
• Those without a formal basis in their minds will "reel" each time they read something in this area
• I can help you with literature in this area
• MPEC paper will be written with this prep!