Randomness Extraction: A Survey

David Zuckerman, University of Texas at Austin

Randomness in Computer Science • Many uses of randomness in CS. • Randomized algorithms • Cryptography • Distributed computing • But: high-quality randomness expensive. • Can low-quality (weak) randomness suffice?

Models for Weak Randomness • Independent bits with same, unknown bias [von Neumann ’51] • Semirandom sources [Santha-Vazirani ’84]: δ < Pr[X_i = 1 | X_1 = x_1, …, X_{i-1} = x_{i-1}] < 1-δ • Block sources [Chor-Goldreich ’85] • Bit-fixing sources [CFGHRS ’85, …]: k uniform bits; others set by adversary.
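
For the first model (independent bits with the same unknown bias), the classical von Neumann procedure can be sketched as follows. This is a minimal illustration; the function name is chosen here and does not come from the slides.

```python
from typing import Iterable, List


def von_neumann_extract(bits: Iterable[int]) -> List[int]:
    """Read the input in consecutive pairs: 01 -> 0, 10 -> 1, discard 00 and 11.

    If the input bits are independent with the same bias, the pairs 01 and 10
    are equally likely, so every output bit is unbiased.
    """
    out = []
    it = iter(bits)
    # zip(it, it) yields non-overlapping pairs; a trailing unpaired bit is dropped.
    for first, second in zip(it, it):
        if first != second:
            out.append(first)
    return out
```

The output length is random: pairs of equal bits are thrown away, so a heavily biased source yields few output bits.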

General Weak Random Source [Z ’90] • Random variable X on {0,1}^n. • General model: min-entropy, H∞(X) = min_x log2(1/Pr[X = x]). • Flat source: uniform on a set A ⊆ {0,1}^n with |A| ≥ 2^k.
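
A small sketch of the min-entropy computation, using the standard definition H∞(X) = -log2(max_x Pr[X = x]). The dictionary representation and the helper names are illustrative assumptions.

```python
import math
from typing import Dict, Iterable


def min_entropy(dist: Dict[str, float]) -> float:
    """Min-entropy in bits: determined entirely by the most likely outcome."""
    return -math.log2(max(dist.values()))


def flat_source(support: Iterable[str]) -> Dict[str, float]:
    """Uniform distribution on the given set A."""
    support = list(support)
    p = 1.0 / len(support)
    return {x: p for x in support}
```

A flat source on a set of size 2^k has min-entropy exactly k, which is why flat sources are the canonical example of a k-source.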

General Weak Random Source [Z ‘90] • Can arise in different ways: • Physical source of randomness. • Cryptography: condition on adversary’s information, e.g. bounded storage model. • Pseudorandom generators (for space s machines): condition on TM configuration.

Goal: Extract Randomness • [Figure: Ext maps an n-bit weak source to m nearly uniform bits, with statistical error ε.] • Problem: Impossible, even for k=n-1, m=1, ε<1/2.

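Impossibility Proof • Suppose f: {0,1}^n → {0,1} satisfies: for all sources X with H∞(X) ≥ n-1, f(X) ≈ U. • One of f^{-1}(0), f^{-1}(1) has size at least 2^{n-1}; say f^{-1}(0). • Take X uniform on f^{-1}(0): then H∞(X) ≥ n-1, yet f(X) is constant, a contradiction.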
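
The argument can be checked by brute force for small n. The sketch below takes any candidate one-bit function f and returns the larger preimage, a flat source of min-entropy at least n-1 on which f is constant. The function name and the string encoding of inputs are assumptions made for illustration.

```python
from itertools import product


def constant_source_for(f, n):
    """Return (b, A) where A = f^{-1}(b) is the larger preimage of f on {0,1}^n.

    |A| >= 2^(n-1), so the flat source on A has min-entropy >= n-1,
    and f outputs the constant b on it.
    """
    preimage = {0: [], 1: []}
    for bits in product("01", repeat=n):
        x = "".join(bits)
        preimage[f(x)].append(x)
    b = 0 if len(preimage[0]) >= len(preimage[1]) else 1
    return b, preimage[b]
```

For example, with f the parity of the input bits and n = 3, the returned set has 4 elements and f is 0 on all of them.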

Randomness Extractor: Short Seed [Nisan-Z ’93, …, Guruswami-Umans-Vadhan ’07] • Ext takes an n-bit weak source X and a uniform random seed Y of d = O(log(n/ε)) bits. • Output: m = .99k bits, with statistical error ε. • Strong extractor: (Ext(X,Y),Y) ≈ Uniform.
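
To make the strong-extractor condition concrete, the sketch below computes the statistical distance between (Ext(X,Y),Y) and (U,Y) exactly, for a flat source X and a one-bit output. The inner-product function used here is only a toy stand-in: its seed is as long as the source, so it is not one of the short-seed constructions cited on the slide. All names are illustrative.

```python
from itertools import product


def statistical_distance(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in keys)


def inner_product_ext(x, y):
    """Toy 1-bit extractor: inner product mod 2 of two bit tuples."""
    return sum(a & b for a, b in zip(x, y)) % 2


def strong_extractor_error(ext, support, seed_bits):
    """Distance of (Ext(X,Y), Y) from (U, Y) for X flat on support, Y uniform.

    Because Y is part of both tuples, this equals the average over seeds y
    of the distance of Ext(X, y) from a uniform bit.
    """
    seeds = list(product((0, 1), repeat=seed_bits))
    total = 0.0
    for y in seeds:
        ones = sum(ext(x, y) for x in support) / len(support)
        total += statistical_distance({0: 1 - ones, 1: ones}, {0: 0.5, 1: 0.5})
    return total / len(seeds)
```

For the flat source on 4-bit strings whose first bit is 0 (so k = 3), only the two seeds that vanish outside the first position give a constant output, and the error works out to 1/16.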

Outline • Seeded Extractors • Basic applications • Alternate view with applications • Sketch of two constructions • Seedless Extractors for Structured Sources • Algebraic sources: independent, affine, … • Applications in cryptography • Complexity-theoretic sources • Crypto-tailored Extractors

Simulating Randomized Algorithms • Randomized algorithm R using m random bits. • Assume the only available random bits X have H∞(X) ≥ k > m. • No high-quality randomness available. • Given Ext for H∞(X) ≥ k, with seed length d and output length m. • Simulation with factor 2^d blowup: • Run R with each of the random strings Ext(x,y_1), …, Ext(x,y_{2^d}), where y_1, …, y_{2^d} are all possible seeds. • Take majority vote or median.
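
The simulation loop itself is short. The sketch below assumes R and ext are supplied as Python callables and that seeds are encoded as the integers 0, …, 2^d - 1; both conventions are assumptions for illustration, and any real extractor would be plugged in as ext.

```python
from collections import Counter


def simulate_with_weak_source(R, ext, x, d):
    """Run R on Ext(x, y) for every d-bit seed y and return the majority answer.

    No truly random seed is needed: all 2^d seeds are enumerated, which is
    where the factor 2^d blowup in running time comes from.
    """
    votes = Counter(R(ext(x, y)) for y in range(2 ** d))
    return votes.most_common(1)[0][0]
```

The approach is only practical when d = O(log n), so that 2^d is polynomial; this is one reason short seeds matter.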

Use in Privacy Amplification [Bennett, Brassard, Robert 1985] • Goal: convert weak shared secret X to uniform secret. • Unbounded passive adversary. • One party picks a uniform seed Y and sends it over the public channel. • Shared secret = Ext(X,Y). • Correct by strong extractor definition: the output stays close to uniform even given Y.

PRGs for Space-Bounded Machines • Basic PRG: G(x,y) = (x, Ext(x,y)) [Nisan-Z] • Condition on configuration v after reading x. • Whp over v, X still has high min-entropy given v. • Hence whp Ext(X,Y) close to uniform. • G: {0,1}^{O(s)} → {0,1}^{poly(s)} fools space s TMs [Nisan-Z] • Sometimes can avoid union bound! • O(log n log log n) bit seed fools read-once polylog-width “regular” BPs [BRRY ’10, BV ’10] • O(log n) bit seed fools read-once O(1)-width permutation BPs [KNP].

PRGs from Shrinkage • Hardness vs. Randomness paradigm: • Lower bounds give PRGs [Nisan-Wigderson, …]. • But: need superpolynomial lower bounds. • Known: polynomial lower bounds for restricted models. • E.g., formulas Ω(n^3/polylog n) [Andreev, Hastad]. • [Impagliazzo, Meka, Z 2012]: polynomial lower bounds proved via shrinkage give PRGs. • E.g., seed length s^{1/3+o(1)} fools size s formulas.

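Graph-Theoretic View: “Expansion” • [Figure: bipartite graph with N = 2^n left vertices, M = 2^m right vertices, and left degree D = 2^d; vertex x has an edge to Ext(x,y) for each seed y.] • Output ≈ uniform implies every left set of size K = 2^k has at least (1-ε)M neighbors. • Can use this to construct expanders beating eigenvalue bound [WZ].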

K-Expanding Graphs • Graph on N vertices in which every set A with |A| ≥ K has |Γ(A)| > N-K. • Useful for sorting, networks. • Goal: minimize degree D; D > N/K. • Random graphs: D = O((N/K) log(N/K)). • 2nd eigenvalue: D ≥ (N/K)^2/2. • Extractors: D = N^{1+o(1)}/K [Wigderson-Z ’93].

Extractors K-Expanding Graphs N K M  (1-)M  (1-)M K K-Expanding Graph: V=[N] E=Paths of length 2 in Ext

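Alternate View • [Figure: bipartite graph with N = 2^n left vertices, M = 2^m right vertices, and degree D = 2^d; for a set S on the right, BAD_S is the set of left vertices x whose neighbors land in S with frequency far from |S|/M.] • Other direction: Error_S ≤ |BAD_S| 2^{-k} + ε.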

Averaging Sampler via Alternate View [Z ’96] • Goal: Estimate mean μ of f: {0,1}^m → [0,1]. • Black box access to f. • Algorithm: Pick x randomly in {0,1}^n. Sample f at Γ(x) = {x_1, …, x_D}. Output the sample average μ_f = (1/D) Σ_i f(x_i). • Pr[error] = |BAD_f|/2^n. • Can use 1.01m random bits with Pr[error] = 2^{-Ω(m)}.
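
The sampler algorithm can be sketched as below, taking Γ(x) = {Ext(x,y) : y a seed}. The callable ext, the integer encoding of x and of seeds, and the function names are assumptions made for illustration.

```python
import random


def averaging_sampler(f, ext, x, d):
    """Average f over the neighborhood Gamma(x) = {ext(x, y) : y in [2^d]}."""
    neighbors = [ext(x, y) for y in range(2 ** d)]
    return sum(f(z) for z in neighbors) / len(neighbors)


def estimate_mean(f, ext, n, d, rng):
    """Pick x uniformly from {0,1}^n (the only randomness used), then sample."""
    x = rng.getrandbits(n)
    return averaging_sampler(f, ext, x, d)
```

All randomness goes into the single choice of x; the D = 2^d sample points are then determined, which is how the number of random bits is kept near m.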

Extractor Perspective Helps • Proposition: Sampler using O(m) random bits implies sampler using 1.01m random bits. • Equivalent Statement: Extractor outputting Ω(k) bits implies extractor outputting .99k bits. • Ext(x,(y_1,y_2)) = Ext(x,y_1) ∘ Ext(x,y_2) (concatenation) [Wigderson-Z] • Conditioned on Ext(X,y_1) of length m, still ≈ k-m bits of entropy in X.

Extractor Codes via Alt-View [Ta-Shma-Z 2001] • List recovery: generalizes list decoding. • S = (S_1, …, S_D), agreement = |{i | x_i ∈ S_i}|. • |{Codewords with agreement ≥ (μ(S) + ε)D}| ≤ |BAD_S|. • Extractor codes with efficient decoding give hardcore bits Ext(x,y) wrt 1-way (f(x),y). • Codes ⇒ Extractors [Tre, TZS, SU, GUV].

Max Clique and Chromatic Number • [FGLSS, …, Hastad]: Max Clique inapproximable to n^{1-ε}, any ε>0, assuming NP ≠ ZPP. • [LY, …, FK]: Same for Chromatic Number. • Derandomize with linear degree extractors: Thm [Z]: Both inapproximable to n^{1-ε}, any ε>0, assuming NP ≠ P.

Constructions of Strong Extractors

Pseudorandom Generators • [Figure: PRG stretches a short random seed into a long pseudorandom string.] • Cryptographically secure PRGs: • Run in time less than adversary. • Exist iff one-way functions exist [HILL]. • PRGs for derandomization: • Can take slightly more time than adversary. • Exist iff “hard” functions exist [Nisan-Wigderson, …]

PRGs from Hard Functions [Nisan-Wigderson 1988, …] • [Figure: PRG takes a hard function and a random seed; its output has computational error ε.]

NW-Style PRGs Give Extractors [Trevisan 1999] • View x as hard function f: {0,1}^{lg n} → {0,1} • Most functions hard • Set Ext(x,y) = NW-PRG(f,y) • Better: Ext(x,y) = NW-PRG(Code(f),y) • [Figure: Ext takes the n-bit source and a seed; output has statistical error ε.]

Linear Degree Extractor [Z] (Sketch) • Condense: rate δ to rate .9, using O(1) random bits. • Extract: rate .9 to ≈ uniform, using lg n + O(1) random bits.

Condensing via Incidence Graph • 1-Bit Somewhere Condenser: • Input: edge • Output: random endpoint • Condenses rate δ to rate (1+α)δ, some α > 0. • Proof uses bound on incidences [BKT] + probabilistic lemma. • Combine with technique of [Raz] to get actual condenser. • Incidence graph: lines on one side, points P = F_q^2 on the other; (L,P) an edge iff P on L; |P|^{3/2} edges.

High Entropy Extractor • Chernoff bound for random walks on expanders [Gillman,Kahale] • Implies Sampler • Implies Extractor.

Seeded Extractor Techniques/History • Hashing based: Z ’90-91, Nisan-Z ’93, Wigderson-Z ’93, Srinivasan-Z ’94, Z ’96, Ta-Shma ’96, Raz-Reingold-Vadhan ’99, Reingold-Shaltiel-Wigderson ’00 • NW-PRG based: Trevisan ’99, Raz-Reingold-Vadhan ’99, Impagliazzo-Shaltiel-Wigderson ’99-00, Ta-Shma-Umans-Z ’01 • Algebraic/coding theory based: Ta-Shma-Z-Safra ’01, Shaltiel-Umans ’01, Lu-Reingold-Vadhan-Wigderson ’03, Guruswami-Umans-Vadhan ’07, Ta-Shma-Umans ’12 • Additive combinatorics based: Barak-Kindler-Shaltiel-Sudakov-Wigderson ’05, Raz ’05, Z ’07, Dvir-Wigderson ’08, Dvir-Kopparty-Saraf-Sudan ’09

Seedless (Deterministic) Extractors for Structured Sources • Probabilistic Method: for any class of not too many sources of min-entropy k, can deterministically extract m = (1-α)k bits with error 2^{-αk/3}. • Algebraic sources: • Bit-fixing, affine. • Independent sources. • Complexity-theoretic sources: • AC0 sources, small-space sources.

Oblivious Bit-Fixing Sources • Example: ?0010?111??11. • ? = uniform on {0,1}. • (n-k) bits fixed by adversary; k uniform bits. • Parity extracts 1 bit. • For k ≥ log^c n, can extract k-o(k) bits [GRS, Rao]. • Application: Exposure Resilient Cryptography. • Adversary learns many bits of secret key. • Can still do cryptography.
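
The claim that parity extracts one bit can be verified exactly by enumerating every setting of the free positions. The pattern syntax ('?' for a uniform bit, '0'/'1' for a fixed bit) follows the slide; the function name is illustrative.

```python
from itertools import product


def parity_distribution(pattern):
    """Exact distribution of the parity of a bit-fixing source.

    Returns (Pr[parity = 0], Pr[parity = 1]) with every '?' uniform
    and independent, and all other positions fixed as written.
    """
    free = [i for i, c in enumerate(pattern) if c == "?"]
    counts = [0, 0]
    for fill in product((0, 1), repeat=len(free)):
        bits = [0 if c == "?" else int(c) for c in pattern]
        for i, b in zip(free, fill):
            bits[i] = b
        counts[sum(bits) % 2] += 1
    total = sum(counts)
    return counts[0] / total, counts[1] / total
```

As long as at least one position is free, the parity is exactly uniform no matter how the adversary fixes the rest; with no free positions it is constant.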

Affine Extractors • X = random element from affine subspace. • Generalizes bit-fixing sources. • Extractor for min-entropy αn, any α>0 [Bourgain]. • 1-bit disperser for min-entropy exp(log^{.9} n) [Shaltiel]. • Large fields: any k>0 [Gabizon-Raz].

Independent Sources • [Figure: Ext takes two independent n-bit weak sources and outputs m = Ω(k) bits, with statistical error ε.]

Classical: entropy rate > 1/2 • Lindsey’s Lemma: H∞(X) + H∞(Y) > n+t implies X·Y ≈ U, error 2^{-t/2}, where X·Y is the inner product mod 2.
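
For flat sources on sets A and B, Lindsey's lemma bounds the bias of the inner product by sqrt(2^n / (|A||B|)), which is below 2^{-t/2} when the min-entropies sum to more than n+t. The sketch below computes the exact bias by enumeration and the bound next to it. Encoding n-bit strings as Python integers is an assumption made for brevity.

```python
import math


def dot(x, y):
    """Inner product mod 2 of two bit strings encoded as integers."""
    return bin(x & y).count("1") % 2


def inner_product_bias(A, B):
    """|Pr[X.Y = 0] - Pr[X.Y = 1]| for X flat on A and Y flat on B, independent."""
    signed_sum = sum((-1) ** dot(x, y) for x in A for y in B)
    return abs(signed_sum) / (len(A) * len(B))


def lindsey_bound(n, A, B):
    """Upper bound on the bias from Lindsey's lemma."""
    return math.sqrt(2 ** n / (len(A) * len(B)))
```

The statistical distance of a single bit from uniform is half its bias, so the bound on the bias also bounds the extraction error.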


Cryptography with Weak Sources • Players have independent weak sources. • Allow Byzantine faults. • For 2 players, impossible [DOPS]. • For more players, possible!

Network Extractor Protocol [Goldwasser-Sudan-Vaikuntanathan ’05, Dodis-Oliveira ’03] • Input: x_1, …, x_p ∈ {0,1}^n from independent weak random sources. • Byzantine faults: can send arbitrary messages. • Output: z_1, …, z_p ∈ {0,1}^m, private nearly-uniform random strings (for honest parties).

Network Extractor Protocols • After running network extractor protocol, run standard protocol, e.g., Byzantine Agreement. • Naïve idea to design protocol: • A few players broadcast sources. • Remaining players apply independent-source extractor to those sources and own source. • Problem: what if only malicious players broadcast?

Network Extractor Constructions • Information-theoretic setting [Kalai-Li-Rao-Z]: • For k ≥ exp(log^α n), can still tolerate linear number of faults in BA and leader election, any α>0. • Computational setting [Kalai-Li-Rao]: • Under certain crypto assumptions, for k = αn, secure multiparty computation if ≥ 2 honest players. • Under certain crypto assumptions, 2-source extractors for k = αn, any α>0.

Complexity-Theoretic Sources • X=f(U), complexity(f) small. • Deterministic extraction possible under assumptions [Trevisan-Vadhan ‘00]. • No assumptions: • NC0 [De-Watson ‘11, Viola ‘11] • AC0 [Viola ‘11] • Proofs reduce to low-weight affine extractors [Rao ‘09].

Small Space Sources • Space s source: min-entropy k source generated by width 2^s branching program. • [Figure: branching program with n+1 layers and width 2^s; each edge is labeled with a transition probability and an output bit, and a path outputs a string such as 1101001.]

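Bit-Fixing Sources Can Be Modelled by Space 0 Sources • [Figure: width-1 branching program for the source ?1??01; a free bit has two edges labeled (0.5,1) and (0.5,0), a fixed bit has a single edge labeled (1,1) or (1,0).]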
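
A minimal sketch of a small-space source as a layered branching program, with the bit-fixing case built as a width-1 (space 0) program. The data layout, a list of layers mapping each state to its outgoing edges (probability, output bit, next state), is an assumption chosen for illustration.

```python
import random


def sample_branching_program(layers, rng):
    """Walk the program from state 0, emitting one bit per layer.

    layers[i][state] is a list of edges (probability, output_bit, next_state)
    whose probabilities sum to 1.
    """
    state, out = 0, []
    for layer in layers:
        r, acc = rng.random(), 0.0
        for prob, bit, nxt in layer[state]:
            acc += prob
            if r < acc:
                break
        # If rounding left r >= acc, the last edge is used.
        out.append(bit)
        state = nxt
    return out


def bit_fixing_as_space0(pattern):
    """Width-1 program: '?' gives two edges of probability 0.5, a fixed bit one edge."""
    layers = []
    for c in pattern:
        if c == "?":
            layers.append({0: [(0.5, 0, 0), (0.5, 1, 0)]})
        else:
            layers.append({0: [(1.0, int(c), 0)]})
    return layers
```

With a single state per layer the program carries no memory between positions, which is exactly the independence of the free bits in a bit-fixing source.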

Extractors for Small Space Sources • For k ≥ αn, any α>0, space αβn, β>0 sufficiently small, can extract k-o(k) bits [Kamp-Rao-Vadhan-Z ‘06]. • Proof reduces to variants of independent sources by conditioning on intermediate states.

Crypto-Tailored Extractors • Fuzzy extractors • Noise tolerant [Dodis-Ostrovsky-Reyzin-Smith ‘04] • Correlation extractors • [Ishai-Kushilevitz-Ostrovsky-Sahai ‘09]. • Non-malleable extractors [Dodis-Wichs’09]

Privacy Amplification With Active Adversary • One party picks a seed Y and sends it over the public channel; shared secret = Ext(X,Y). • Problem: Active adversary could change Y to Y’.

Active Adversary • Can arbitrarily insert, delete, modify, and reorder messages. • E.g., can run several rounds with one party before resuming execution with other party.

Non-Malleable Extractor [Dodis-Wichs 2009] • Strong extractor: (Ext(X,Y),Y) ≈ (U,Y). • nmExt is a non-malleable extractor if for arbitrary A: {0,1}^d → {0,1}^d with y’ = A(y) ≠ y: (nmExt(X,Y), nmExt(X,Y’), Y) ≈ (U, nmExt(X,Y’), Y). • Can’t ignore a bit of the seed. • Existence: k > log log n + c, d = log n + O(1), m = (k-log d)/2.01. • Gives privacy amplification with active adversary in 2 rounds with optimal entropy loss.

Explicit Non-Malleable Extractor • Even k=n-1, m=1 nontrivial. • E.g., Ext(x,y) = x·y fails: if X = 0??…? and y’ = A(y) flips the first bit of y, then x·y’ = x·y. • Dodis-Li-Wooley-Z 2011: H∞(X) > n/2. • Cohen-Raz-Segev 2012: Seed length O(log n). • Li 2012: H∞(X) > .499n. • Connection with 2-source extractors.
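
The counterexample for the inner product can be checked by brute force: on the source whose first bit is fixed to 0, flipping the first seed bit never changes the output, so the adversary learns Ext(X,Y) from Ext(X,Y’). The function names below are illustrative.

```python
from itertools import product


def dot(x, y):
    """Inner product mod 2 of two bit tuples."""
    return sum(a & b for a, b in zip(x, y)) % 2


def flip_first_bit(y):
    """The adversary's map A: differs from y on every input."""
    return (1 - y[0],) + tuple(y[1:])


def outputs_always_agree(n):
    """True if x.y == x.A(y) for every seed y and every x of the form 0??...?."""
    source = [(0,) + rest for rest in product((0, 1), repeat=n - 1)]
    seeds = product((0, 1), repeat=n)
    return all(dot(x, y) == dot(x, flip_first_bit(y)) for y in seeds for x in source)
```

The source here has min-entropy n-1, so the inner product is a good strong extractor for it and still fails non-malleability.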

A Simple 1-Bit Construction [Li] • Sidon set: set S with all pairwise sums s+t (s ≠ t in S) distinct. • Example: S = {(x, x^3) | x ∈ F_{2^{n/2}}}. • Thm [Li]: f(x,y) = x·y, y uniform from S, is a non-malleable extractor for H∞(X) > n/2. • Proof: H∞(Y) = n/2, so X·Y ≈ U (Lindsey’s lemma). • Suffices to show X·Y + X·A(Y) ≈ U (XOR lemma). • X·Y + X·A(Y) = X·(Y+A(Y)). • H∞(Y+A(Y)) = H∞(Y) = n/2.
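
The Sidon property of S = {(x, x^3)} can be verified exhaustively over small fields of characteristic 2. The sketch below represents elements of GF(2^k) as k-bit integers, with addition as XOR and multiplication reduced modulo a caller-supplied irreducible polynomial; here k plays the role of n/2. The helper names and this representation are assumptions made for illustration.

```python
from itertools import combinations


def gf_mul(a, b, k, modulus):
    """Multiply a and b in GF(2^k).

    modulus is the irreducible polynomial as an integer, including its
    leading x^k term (for example 0b1011 for x^3 + x + 1).
    """
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a >> k:          # degree reached k: reduce
            a ^= modulus
    return result


def cube_set(k, modulus):
    """S = {(x, x^3) : x in GF(2^k)}."""
    return [(x, gf_mul(x, gf_mul(x, x, k, modulus), k, modulus)) for x in range(2 ** k)]


def is_sidon(S):
    """True if the sums s+t over unordered pairs of distinct elements are all distinct."""
    sums = [(s[0] ^ t[0], s[1] ^ t[1]) for s, t in combinations(S, 2)]
    return len(sums) == len(set(sums))
```

Replacing the cube by the square breaks the property, since squaring is linear in characteristic 2 and many pairs then share the same sum.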

Conclusions • Extractors connect to crypto, expanders, coding theory, inapproximability, and PRGs. • Interesting mathematics used in constructions: additive combinatorics, coding theory, random walks on expander graphs, hashing, …
