This paper from the University of Wisconsin-Madison addresses challenges in hardware signatures, proposing solutions to reduce overhead and conflicts, including utilization of entropy analysis and a privatization interface. The study explores the effectiveness of different hashing functions and the impact of entropy on signature performance. Results demonstrate improvements in signature hashing efficiency and conflict detection accuracy. The proposed methodology offers insights into optimizing signature systems for various workloads, enhancing overall signature reliability and performance.
Notary: Hardware Techniques to Enhance Signatures • Luke Yen • Collaborator: Prof. Stark C. Draper • Advisor: Prof. Mark D. Hill • University of Wisconsin, Madison • MICRO-41 - November 11, 2008 • www.cs.wisc.edu/multifacet/papers/micro08_notary.pdf
Executive Summary • Tackle 2 problems with hardware signatures: • Problem 1: Best signature hashing (i.e., H3) has high area & power overheads • Solution 1: Use entropy analysis to guide lower-cost hashing (Page-Block-XOR, PBX) that performs similarly to H3 • Ex: 160 gates for H3 vs 20 gates for PBX • Problem 2: Spurious signature conflicts caused by signature bits set by private memory addrs • Solution 2: Avoid inserting private stack addrs, propose privatization interface for higher performance
Outline • Signature background • Entropy • Entropy results & PBX • Privatization • Methodology & workloads • Results • Conclusions & Future Work
Signature background • Signatures (hardware Bloom filters) used to summarize and detect conflicts with a transaction’s read- and write-sets • Inspired by Bulk system [Ceze,ISCA’06] • Implemented in LogTM-SE [Yen,HPCA’07] • Can have false positives, but never false negatives • Also proposed for non-TM purposes (e.g., SC violation detection, atomicity violation detection, race recording) • Ex: Use k Bloom filters of size m/k, with independent hash functions
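The "k Bloom filters of size m/k" organization can be sketched in software as follows. This is a minimal illustration, not the LogTM-SE hardware: the class name `Signature`, the two hash functions `h_low` and `h_mix`, and the 64-byte block size are all assumptions made for the example.

```python
class Signature:
    """Parallel Bloom filter: k banks of m/k bits, one hash function per bank."""

    def __init__(self, m_bits, hash_fns):
        self.k = len(hash_fns)
        self.bank_bits = m_bits // self.k
        self.hash_fns = hash_fns
        self.banks = [0] * self.k  # each bank is kept as an integer bitmask

    def insert(self, addr):
        # Set one bit per bank, chosen by that bank's hash function.
        for i, h in enumerate(self.hash_fns):
            self.banks[i] |= 1 << (h(addr) % self.bank_bits)

    def may_contain(self, addr):
        # A hit requires the selected bit to be set in every bank.
        # False positives are possible; false negatives are not.
        return all(
            (self.banks[i] >> (h(addr) % self.bank_bits)) & 1
            for i, h in enumerate(self.hash_fns)
        )


# Hypothetical hash functions, for illustration only.
def h_low(addr):
    return addr >> 6  # drop an assumed 64-byte block offset


def h_mix(addr):
    return (addr >> 6) ^ (addr >> 16)


read_sig = Signature(2048, [h_low, h_mix])  # 2kb signature, k=2
for a in (0x1000, 0x2040, 0x3080):
    read_sig.insert(a)
```

An incoming coherence request would call `may_contain` on the read and write signatures; a hit signals a possible conflict, which may be spurious if the address merely aliases with set bits.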
Signature hash functions • Which hash function is best? [Sanchez, MICRO’07] • Bit-selection? Hash simply decodes some number of input bits • H3? Each bit of a hash value is an XOR of (on avg.) half of the input address bits • LogTM-SE w/ 2kb signatures • Result: H3 better with >=2 hash functions • However, H3 uses many multi-level XOR trees • Can we improve this?
H3 implementation • Num XOR Ex: 2kb signatures, k=2, c=10, 32-bit addr = 160 XOR gates per signature • Can we reduce the total gate count?
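The two hash families compared above can be sketched in a few lines. This is an illustrative software model under stated assumptions: `make_bit_select`, `make_h3`, the seed, and the 6-bit block offset are hypothetical names and parameters, and the random masks stand in for the fixed XOR trees of a hardware H3 implementation.

```python
import random


def make_bit_select(c, shift=6):
    """Bit-selection: use c address bits directly, starting at bit `shift`."""
    return lambda addr: (addr >> shift) & ((1 << c) - 1)


def make_h3(c, n_bits=32, seed=0):
    """H3: each of the c hash bits is the XOR (parity) of a random subset
    of the n_bits address bits; on average half of the bits are selected."""
    rng = random.Random(seed)
    masks = [rng.getrandbits(n_bits) for _ in range(c)]

    def h3(addr):
        value = 0
        for j, mask in enumerate(masks):
            parity = bin(addr & mask).count("1") & 1
            value |= parity << j
        return value

    return h3


bit_select = make_bit_select(10)
h3 = make_h3(10)
```

Each mask with roughly n/2 set bits corresponds to a multi-input XOR tree in hardware, which is where the H3 gate count comes from.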
Entropy overview • Not all address bits have equal randomness • Ex: High-level address bits unlikely to change if working set size is small • Key insight: If input bits are random and those bits are used as inputs to hash functions, random hash values result • Use entropy to measure bit randomness • Entropy – measure of the uncertainty of a random variable x
Entropy formally defined • Entropy: $H(x) = -\sum_{i=1}^{N} p(x_i) \log_2 p(x_i)$ • p(xi) = the probability of the occurrence of value xi • N = number of sample values random variable x can take on • Entropy = amount of information required on average to describe outcome of variable x (in bits) • Ex: What is the best possible lossless compression? • Entropy value of n-bit field: max (n bits) when all bit patterns in the n-bit field are equally likely; min (0 bits) when the n-bit field has a constant value; other cases fall in between
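The definition can be evaluated directly on a list of observed values. The sketch below uses the standard Shannon entropy with empirical frequencies as the probabilities p(xi); the function name `entropy` is chosen here for illustration.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy in bits: sum over distinct values of p * log2(1/p),
    with p estimated as the value's relative frequency in `samples`."""
    counts = Counter(samples)
    total = len(samples)
    return sum((c / total) * log2(total / c) for c in counts.values())


uniform_4bit = entropy(list(range(16)))  # all 4-bit patterns equally likely
constant = entropy([7] * 10)             # field never changes
```

The two calls exercise the extreme cases from the slide: a uniformly distributed 4-bit field reaches the maximum of 4 bits, and a constant field has zero entropy.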
Our measures of entropy • For our workloads, we care about: • Q1: What is the best achievable entropy? • Global entropy – upper bound on entropy of address • Q2: How does entropy change within an address? • Local entropy – entropy of bit-field within the address • [Figure: address (Addr) bits 6 to 31, showing the field used for global entropy and the bit-window used for local entropy, offset by NSkip]
Entropy results • Workloads to be described later • Global entropy is at most 16 bits • Bit-window for local entropy is 16 bits wide (NSkip from 0-10) • Smaller windows (<16b) may not reach global entropy value • Larger windows (>16b) hide some fine-grain info
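A rough sketch of how global and local entropy could be measured over an address trace follows. It assumes that global entropy is taken over address bits 6 to 31 and that NSkip counts the low-order bits skipped above bit 6 before the 16-bit window starts; both readings are assumptions, and the trace is synthetic rather than taken from the workloads.

```python
from collections import Counter
from math import log2


def entropy(samples):
    """Shannon entropy in bits, using empirical frequencies."""
    counts = Counter(samples)
    total = len(samples)
    return sum((c / total) * log2(total / c) for c in counts.values())


def bit_field(addr, low, width):
    """Extract `width` bits of addr starting at bit position `low`."""
    return (addr >> low) & ((1 << width) - 1)


def global_entropy(addrs, low=6, high=31):
    return entropy([bit_field(a, low, high - low + 1) for a in addrs])


def local_entropy(addrs, nskip, width=16, low=6):
    return entropy([bit_field(a, low + nskip, width) for a in addrs])


# Synthetic trace: 1024 consecutive 64-byte blocks (illustration only).
trace = [0x10000000 + 64 * i for i in range(1024)]
per_window = [local_entropy(trace, nskip) for nskip in range(0, 11)]
```

On this synthetic trace the local entropy falls as the window slides toward the high-order bits, which mirrors the trend the slides report for the real workloads.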
Entropy results summary • More entropy results in our MICRO paper • In summary, for our workloads entropy monotonically decreases when moving towards high-order bits • We calculate the average entropy across the entire workload’s execution • May miss entropy changes due to program phase behavior • Our Page-Block-XOR (PBX) hash takes advantage of this overall trend
Page-Block-XOR (PBX) • Motivated by 3 findings: • (1) Lower-order bits have most entropy • Follows from our entropy results • (2) XORing two bit-fields produces random hash values • From prior work on XOR hashing (e.g., data placement in caches, DRAM) • (3) Bit-field overlaps can lead to higher false positives • Correlation between the two bit-fields can reduce the range of hash values produced (worse for larger signatures)
PBX implementation • PPN and Cache-index fields not tied to system params: • Use entropy to find two non-overlapping bit-fields with high randomness • For 2kb signatures with 2 hash functions: • 20 XOR gates for PBX vs 160 XOR gates for H3!
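The idea of XORing two non-overlapping bit-fields can be sketched as below. The field positions (`low_a=6`, `low_b=16`) are hypothetical defaults; in the slides they would be chosen from the entropy results. The gate-count helper assumes one two-input XOR gate per hash bit, which is an inference that happens to match the 20 gates quoted for k=2, c=10, not a formula taken from the slides.

```python
def make_pbx(c, low_a=6, low_b=16):
    """PBX-style hash: XOR two c-bit address fields that do not overlap."""
    if low_b < low_a + c:
        raise ValueError("bit-fields must not overlap")
    mask = (1 << c) - 1
    return lambda addr: ((addr >> low_a) & mask) ^ ((addr >> low_b) & mask)


def pbx_xor_gates(k, c):
    """Assumed cost model: one 2-input XOR per hash bit, for k hash functions."""
    return k * c


pbx = make_pbx(10)
example = pbx((0x3 << 6) | (0x5 << 16))  # fields hold 0x3 and 0x5
```

Rejecting overlapping fields reflects finding (3): correlated inputs shrink the range of hash values and raise the false-positive rate.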
Summary thus far • Problem 1: H3 has high area & power overheads • Solution 1: Use entropy analysis to guide lower-cost PBX • Ex: 160 gates for H3 vs 20 gates for PBX • Problem 2: Spurious signature conflicts caused by signature bits set by private memory addrs • Solution 2: To be described
Motivation • False conflicts caused by thread-private addrs • Avoid conflicts if addrs not inserted in thread’s signatures
Privatization solutions • Two solutions proposed: • (1) Remove private stack references from sigs. • Very little work for programmer/compiler • Benefits depend on fraction of stack addresses versus all transactional references • (2) Language-level interface (e.g., private_malloc(), shared_malloc()) • Even higher performance boost • For skilled programmer • WARNING: Incorrectly marking shared objects as private can lead to program errors!
Page-based implementation • Each page is assigned a status, private or shared • Invariant: Page is shared if any object is shared • If stack is private, library marks stack pages as private • If using privatization heap functions, mark heap pages accordingly
OS support • OS allocates different physical page frames for shared and private pages • Sets a per-frame bit in translation entry if shared • Reduce number of page frames used by packing objects with same status together • Signatures insert memory addresses of transactional references to shared pages • Query page sharing bit in HW TLB & current transactional status
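The filtering rule on this slide (insert an address only if the reference is transactional and the page is marked shared) can be sketched as follows. Everything here is a toy model: the `Tlb` class, `record_reference`, the 8 kB page size, and the Python set standing in for a hardware signature are assumptions for illustration.

```python
PAGE_SHIFT = 13   # assumption: 8 kB pages
BLOCK_SHIFT = 6   # assumption: 64-byte cache blocks


class Tlb:
    """Toy TLB that holds only the per-page shared bit."""

    def __init__(self):
        self.shared_bit = {}

    def map_page(self, page_number, shared):
        self.shared_bit[page_number] = shared

    def is_shared(self, addr):
        return self.shared_bit[addr >> PAGE_SHIFT]


def record_reference(signature, tlb, addr, in_transaction):
    """Insert the block address only for transactional references to shared pages."""
    if in_transaction and tlb.is_shared(addr):
        signature.add(addr >> BLOCK_SHIFT)
        return True
    return False


tlb = Tlb()
tlb.map_page(0, shared=True)    # e.g. a heap page holding shared objects
tlb.map_page(1, shared=False)   # e.g. a private stack page
sig = set()                     # stand-in for a hardware signature
record_reference(sig, tlb, 0x0040, in_transaction=True)   # shared page: inserted
record_reference(sig, tlb, 0x2040, in_transaction=True)   # private page: filtered out
record_reference(sig, tlb, 0x0080, in_transaction=False)  # non-transactional: ignored
```

Only the first reference reaches the signature, so private and non-transactional addresses cannot set bits that later alias with another thread's accesses.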
Methodology • Full-system simulation using Simics and Wisconsin GEMS timing modules • Transistor-level design for area & power of XOR gates • CACTI for Bloom filter bit array area & power • Simulated system • Single-chip CMP • 16 single-threaded, in-order cores • 32kB, 4-way private L1 I & D, write-back • 8MB, 8-way shared L2 cache • MESI directory protocol • Signatures from 64b-64kb (8B-8kB) & “Perfect”
Workloads • Micro-benchmarks • BTree – read and write ops on shared tree • Sparse Matrix – algorithm from dense column vector multiplication kernel • SPLASH-2 apps • Barnes & Raytrace – exert most signature pressure • Stanford STAMP apps • Vacation, Genome, Delaunay, Bayes, Labyrinth • DNS server • BIND
PBX vs H3 area & power • Area & power overheads (2kb, k=4):
PBX vs H3 execution time • PBX performs similarly to H3 • Additional workload results in paper
Privatization results summary • Removing private stack references from signatures did not help much • Most addr references not to stack • Most likely because running with SPARC ISA. Other ISAs (e.g., x86) likely benefit more • Privatization interface helps four workloads • Remainder either do not have private heap structures or do not have high transactional duty cycle
Privatization interface results
Conclusions • Tackle 2 problems with signature designs: • (1) Area and power overheads of H3 hashing • E.g., 160 XOR gates for H3, 20 for PBX • (2) False conflicts due to signature bits set by private memory references • Our solutions: • (1) Use entropy analysis to guide hashing function (PBX), a low-cost alternative that performs similarly to H3 • (2) Prevent private stack references from entering signatures, and propose a privatization interface for heap allocations • Notary can be applied to non-TM uses: • PBX hashing can directly transfer • Privatization may transfer if addr filtering applies
Future Work • Dynamic entropy calculation: • How to adapt PBX hashing to entropy changes over time? • Dynamic privatization characteristics: • How common is it for objects to change sharing status (i.e., from private to shared, and vice versa)?
BACKUP SLIDES
Privatization interface
Dynamic privatization • Dynamically switch from private to shared, and vice versa • If transitioning from private -> shared, safe to mark page as shared (at cost of performance) • If transitioning from shared -> private, default policy is to disallow if there exist other shared objects on same page • Otherwise, trap to user software and let programmer call shared_free(), followed by private_malloc() on object
Bit-field overlaps harmful for PBX
Removing stack refs doesn’t help significantly
Entropy of commercial workloads
Signature Operation Example • Program: xbegin, LD A, ST B, LD C, LD D, ST C, … • External ST E, External ST F • [Figure: addresses A, B, C, D pass through the hash function(s) into the read (R) and write (W) signatures; one external store reports NO CONFLICT, the other aliases with set signature bits and reports FALSE POSITIVE: CONFLICT!]
Type of Hash Functions • In real programs, addresses are neither independent nor uniformly distributed (key assumptions to derive PFP(n)) • But can generate hash values that are almost uniformly distributed and uncorrelated with good (universal/almost universal) hash functions • Hash functions considered: • Bit-selection (inexpensive, low quality) • H3 [Carter, CSS79] (moderate, higher quality)