Accusation probabilities in Tardos codes

Antonino Simone and Boris Škorić, Eindhoven University of Technology. CWG, Dec 2010

Outline
• Introduction to forensic watermarking
• Collusion attacks
• Aim
• Attack models
• Tardos scheme
• Code length history
• q-ary version
• Properties
• New parameterization
• Majority voting effect
• Performance of the Tardos scheme
• False accusation probability
• Results & Summary

Forensic Watermarking
[Diagram: the Embedder combines the original content, the payload and the WM secrets into content with a hidden payload; after the ATTACK, the Detector uses the original content and the WM secrets to recover the payload.]
Payload = some secret code identifying the recipient

Collusion attacks
"Coalition of pirates"; positions where the pirates' symbols differ = "detectable positions"

pirate #1         1   1   1   0   1   0   1   0   0   0   0   1
pirate #2         1   0   1   0   1   0   1   0   1   0   1   1
pirate #3         1   0   1   0   1   0   1   0   0   0   1   1
pirate #4         1   1   1   0   0   0   1   1   0   0   0   1
Attacked content  1   0/1 1   0   0/1 0   1   0/1 0/1 0   0/1 1

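The collusion example can be checked mechanically. The sketch below takes the four 12-bit pirate rows from the slide (the assignment of rows to pirate numbers is an assumption, and the function names are hypothetical) and recomputes which positions are detectable and what the attacked content may contain.

```python
# The four pirate codewords from the collusion slide (row order is assumed).
pirates = [
    [1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1],
]


def detectable_positions(codewords):
    """Return the 0-based positions where the colluders see different symbols."""
    return [i for i, column in enumerate(zip(*codewords)) if len(set(column)) > 1]


def attacked_content(codewords):
    """Per position: the forced symbol, or "0/1" where the pirates have a choice."""
    return ["0/1" if len(set(column)) > 1 else str(column[0])
            for column in zip(*codewords)]


print(detectable_positions(pirates))
print(" ".join(attacked_content(pirates)))
```

The second printed line reproduces the "Attacked content" row of the slide.
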
Aim
Trace at least one pirate from detected watermark
BUT
• Resist large coalition ⇒ longer code
• Low probability of innocent accusation (FP) (critical!) ⇒ longer code
• Low probability of missing all pirates (FN) (not critical) ⇒ longer code
AND
• Limited bandwidth available for watermarking code

Attack models
Once pirates detect watermark positions, what can they do?
• Restricted digit model
  • Choice from available symbols only
• Unreadable digit model
  • Erasure allowed
• Arbitrary digit model
  • Arbitrary symbol (but not erasure)
• General digit model
Example alphabet = {A,B,C,D}
[Slide annotation: equivalent for binary symbols]
• More realistic scenario
• Simpler to analyze
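
A minimal sketch of one reading of the four models, expressed as the set of outputs the pirates may produce in a single position. The function name, the erasure marker "?" and the exact treatment of the general digit model (arbitrary symbol or erasure) are assumptions, not taken from the slides. In undetectable positions the pirates are assumed to be forced to output the one symbol they all see.

```python
ERASURE = "?"  # hypothetical marker for an erased (unreadable) symbol


def allowed_outputs(model, seen, alphabet):
    """Set of symbols the coalition may output in one position.

    model    -- "restricted", "unreadable", "arbitrary" or "general"
    seen     -- symbols the colluders hold in this position
    alphabet -- the full symbol alphabet
    """
    seen = set(seen)
    if len(seen) == 1:
        # Undetectable position: the pirates cannot change it.
        return seen
    if model == "restricted":
        return seen
    if model == "unreadable":
        return seen | {ERASURE}
    if model == "arbitrary":
        return set(alphabet)
    if model == "general":
        return set(alphabet) | {ERASURE}
    raise ValueError(f"unknown model: {model}")


print(allowed_outputs("restricted", "AAB", "ABCD"))
print(allowed_outputs("arbitrary", "AAB", "ABCD"))
```

Under this reading, the restricted and arbitrary digit models coincide for a binary alphabet, because a detectable binary position already contains both symbols.
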

Code length history
[Table of code length results with rows "Lower bound" and "Construction"; the formulas are not preserved. Works cited: Boneh and Shaw 1998; Chor et al 2000; Staddon et al 2001; Tardos 2003; Huang + Moulin; Amiri + Tardos 2009.]
c0 = #pirates
n = #users
m = code length in symbols
q = alphabet size
ε1 = Prob[accuse specific innocent]
η = Prob[not all accused are guilty]
ε2 = False Negative prob.

q-ary Tardos scheme (2008)
• Arbitrary alphabet size q
• Dirichlet distribution F
• Symbol-symmetric
[Diagram: matrix of embedded symbols with n users (rows) and m content segments (columns); the symbol biases of each segment are drawn from distribution F; the rows of the c pirates determine the symbols allowed; y = watermark after attack.]

Tardos scheme continued
• Accusation:
  • Every user gets a score
  • User is accused if score > threshold
  • Sum of scores per content segment
  • Given that pirates have y in segment i: [score formula in terms of g1(p) and g0(p)]
  • Symbol-symmetric
[Plots: g1(p) and g0(p) versus p]
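
The score formula itself did not survive in the transcript. The sketch below therefore assumes the score functions usually quoted for the symmetric q-ary Tardos scheme, g1(p) = sqrt((1-p)/p) and g0(p) = -sqrt(p/(1-p)), and a symmetric Dirichlet distribution with a hypothetical concentration parameter kappa. All function names are illustrative, and no cutoff on very small biases is applied.

```python
import math
import random


def draw_bias(q, kappa, rng):
    """Draw one segment's bias vector from a symmetric Dirichlet(kappa, ..., kappa)."""
    gammas = [rng.gammavariate(kappa, 1.0) for _ in range(q)]
    total = sum(gammas)
    return [g / total for g in gammas]


def g1(p):
    # Assumed score when the user's symbol matches the pirate output y.
    return math.sqrt((1.0 - p) / p)


def g0(p):
    # Assumed score when the user's symbol differs from y.
    return -math.sqrt(p / (1.0 - p))


def user_score(codeword, y, biases):
    """Sum the per-segment scores; p is always the bias of the output symbol y_i."""
    score = 0.0
    for x_i, y_i, p in zip(codeword, y, biases):
        p_y = p[y_i]
        score += g1(p_y) if x_i == y_i else g0(p_y)
    return score


def accuse(codeword, y, biases, threshold):
    """A user is accused when the total score exceeds the threshold."""
    return user_score(codeword, y, biases) > threshold


rng = random.Random(1)
biases = [draw_bias(3, 1.0, rng) for _ in range(5)]
print(user_score([0, 1, 2, 0, 1], [0, 1, 1, 2, 1], biases))
```

With these assumed functions, an innocent user's score per segment has mean p·g1(p) + (1-p)·g0(p) = 0 and variance 1, so the total innocent score has variance m.
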

Properties of the Tardos scheme
• Asymptotically optimal
• Random code book
• No framing
  • No risk of accusing innocent users if the coalition is larger than anticipated
• F, g0 and g1 chosen ‘ad hoc’ (can still be improved)

Accusation probabilities
m = code length
c = #pirates
μ̃ = expected coalition score per segment
Pirates want to minimize μ̃ and lengthen the innocent tail
[Plot: probability curves of the (scaled) total score for innocent and guilty users, with the accusation threshold marked]
• Curve shapes depend on:
  • F, g0, g1 (fixed ‘a priori’)
  • Code length
  • # pirates
  • Pirate strategy
Central Limit Theorem ⇒ asymptotically Gaussian shape (how fast?)
2003 → 2010: innocent accusation curve shape unknown… till now!
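
As a rough illustration of the Gaussian shape the Central Limit Theorem suggests, the sketch below evaluates the Gaussian tail estimate of the false accusation probability. It assumes the innocent score has mean 0 and variance m (true when each segment score has unit variance, which is an assumption here), so the estimate depends only on the scaled threshold z = threshold/√m. Function names are hypothetical.

```python
import math


def gaussian_fp(z):
    """Gaussian tail estimate of FP at scaled threshold z = threshold / sqrt(m)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def scaled_threshold_for(target_fp, lo=0.0, hi=40.0, iterations=100):
    """Bisection for the scaled threshold whose Gaussian tail equals target_fp."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if gaussian_fp(mid) > target_fp:
            lo = mid  # tail still too heavy: raise the threshold
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(gaussian_fp(3.0))
print(scaled_threshold_for(1e-10))
```

This is only the Gaussian estimate; the slides ask how far the true innocent tail deviates from it.
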

New parameterization
A new parameterization is necessary!
• K_b = quantity that depends on the pirate strategy
• K_b can be pre-computed
Which strategy minimizes μ̃?
Symbol-symmetric ⇒ we care only about the symbol occurrences
• σ = pirate occurrence vector
• σ_α = # of symbol α in the segment among the c pirates ⇒ Σ_α σ_α = c
[Plot: W(b) versus b]

Some attack definitions
• Majority voting
  • y_i = symbol that occurs most in segment i
• Interleaving attack
  • Prob[y_i = α] = σ_α / c, where σ_α is the number of pirates holding symbol α in segment i
[Example figure]
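
The two strategies can be sketched for a single segment as follows. Function names are hypothetical, and the tie-break in majority voting (smallest symbol among the most frequent) is an assumption, since the slides do not specify one.

```python
import random
from collections import Counter


def majority_voting(segment_symbols):
    """Output the symbol held by the most pirates in this segment."""
    counts = Counter(segment_symbols)
    best = max(counts.values())
    # Assumed tie-break: take the smallest of the most frequent symbols.
    return min(symbol for symbol, n in counts.items() if n == best)


def interleaving(segment_symbols, rng):
    """Output the symbol of a uniformly chosen pirate.

    A symbol held by sigma_alpha of the c pirates is then chosen
    with probability sigma_alpha / c.
    """
    return rng.choice(list(segment_symbols))


segment = "AAABC"  # c = 5 pirates: three hold A, one holds B, one holds C
print(majority_voting(segment))
print(interleaving(segment, random.Random(0)))
```

Picking a random pirate and copying that pirate's symbol is one simple way to realize the interleaving probabilities without computing the occurrence counts explicitly.
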

Majority voting
• Theorem: Majority voting strategy minimizes μ̃
• Proof (intuitive):
  • Case 1:
    • only 2 symbols detected
[Plot: W(b) versus b for c=19, with the best choice of b marked]

Majority voting
• Theorem: Majority voting strategy minimizes μ̃
• Proof (intuitive):
  • Case 2:
    • more than two symbols detected
    • one symbol occurs more than c/2 times
[Plot: W(b) versus b for c=19, with the best choice of b marked]

Majority voting
• Theorem: Majority voting strategy minimizes μ̃
• Proof (intuitive):
  • Case 3:
    • more than two symbols detected
    • all symbols occur less than c/2 times
[Plot: W(b) versus b for c=19, with the best choice of b marked]

Innocent curve behaviour
• Motivations:
  • Most critical part in the Tardos scheme (FP ≈ 10^-10)
  • Still unknown
  • Unknown innocent curve ⇒ unknown real code length
  • Is Gaussian approximation good?

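Approach
Fourier transform property: the pdf of a sum of independent scores is the convolution of their pdfs, so its Fourier transform is the product of their Fourier transforms.
• Steps:
  • S = Σ_i S_i
  • Each S_i follows the single-segment score pdf
  • pdf of total score S = InverseFourier[(Fourier transform of the single-segment pdf)^m]
  • Compute the single-segment pdf
    • Depends on strategy
    • New parameterization for attack strategy
  • Compute the inverse transform, using Taylor expansions
Trouble doing numerics (integral does not converge)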
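
The product property can be checked numerically in a simplified setting. The sketch below fixes a single bias p instead of averaging over F, and assumes the score functions g1(p) = sqrt((1-p)/p) and g0(p) = -sqrt(p/(1-p)); both simplifications are assumptions, and the function names are hypothetical. It compares the m-th power of the single-segment characteristic function with a direct enumeration over all possible total scores of an innocent user.

```python
import cmath
import math


def segment_cf(k, p):
    """Characteristic function of one innocent segment score at frequency k."""
    g1 = math.sqrt((1.0 - p) / p)
    g0 = -math.sqrt(p / (1.0 - p))
    return p * cmath.exp(1j * k * g1) + (1.0 - p) * cmath.exp(1j * k * g0)


def total_cf_by_product(k, p, m):
    """Fourier property: the transform of the sum is the m-th power."""
    return segment_cf(k, p) ** m


def total_cf_by_enumeration(k, p, m):
    """Direct computation: j matching segments occur with binomial probability."""
    g1 = math.sqrt((1.0 - p) / p)
    g0 = -math.sqrt(p / (1.0 - p))
    total = 0j
    for j in range(m + 1):
        prob = math.comb(m, j) * p ** j * (1.0 - p) ** (m - j)
        total += prob * cmath.exp(1j * k * (j * g1 + (m - j) * g0))
    return total


print(total_cf_by_product(0.7, 0.3, 20))
print(total_cf_by_enumeration(0.7, 0.3, 20))
```

The two printed values agree up to rounding, which is the identity the inverse-Fourier step relies on.
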

Main result: false accusation probability curve
Example: interleaving attack
[Plot: log10 FP versus threshold/√m, comparing the exact FP with the result from the Gaussian approximation]

Main result: false accusation probability curve
Example: interleaving attack
[Plot annotation: Better than Gaussian!]
Conclusion: Gaussian approximation is worse for larger q

Main result: false accusation probability curve
Example: majority voting attack
[Plot: log10 FP versus threshold/√m, comparing the exact FP with the result from the Gaussian approximation]
FP is 70 times smaller than the Gaussian approximation in this example
But ⇒ code 2-5% shorter than predicted by the Gaussian approximation

Summary
Results:
• introduced a new parameterization of the attack strategy
• majority voting minimizes μ̃
• first to compute the innocent score pdf
• quantified how close FP probability is to Gaussian
  • sometimes better than Gaussian!
  • safe to use Gaussian approx
• larger q ⇒ Gaussian approximation less good
Future work:
• study more general attacks
• different parameter choices
Thank you for your attention!
