Explore the duality in information geometry through reference and representational aspects, including Bregman divergence, generalized metrics, and connections in Banach space. Learn about canonical divergence, statistical manifold structures, and α-Hessian geometry. Understand the extension to infinite-dimensional function space and monotone scaling. Discover examples of $D^{(\alpha)}$ divergences, monotone embeddings, and submanifolds under convex functions.
Information Geometry: Duality, Convexity, and Divergences
Jun Zhang*
University of Michigan, Ann Arbor, Michigan 48104
junz@umich.edu
*Currently on leave to AFOSR under IPA
Clarify two senses of duality in information geometry:
• Reference duality: choice of the reference vs. comparison point on the manifold;
• Representational duality: choice of a monotonic scaling of the density function.
Lecture Plan
1) A revisit to Bregman divergence
2) Generalization (α-divergence on $\mathbb{R}^n$) and α-Hessian geometry
3) Embedding into infinite-dimensional function space
4) Generalized Fisher metric and α-connection on Banach space
Bregman Divergence
i) Quadrilateral relation:
Triangular relation (generalized cosine) as a special case:
ii) Reference-representation biduality:
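The formulas for this slide did not survive in the transcript. As a minimal sketch, assuming the standard definition B(x, y) = Φ(x) − Φ(y) − ⟨x − y, ∇Φ(y)⟩ and the usual three-point form of the triangular relation, B(x, y) + B(y, z) − B(x, z) = ⟨x − y, ∇Φ(z) − ∇Φ(y)⟩, the check below confirms both numerically. The potential Φ(x) = Σ exp(x_i) and all function names are illustrative assumptions, not taken from the slides.

```python
import math

def phi(x):
    """Illustrative strictly convex potential (an assumption): sum of exp(x_i)."""
    return sum(math.exp(xi) for xi in x)

def grad_phi(x):
    """Gradient of phi; for this choice it is exp applied componentwise."""
    return [math.exp(xi) for xi in x]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def bregman(x, y):
    """B(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return phi(x) - phi(y) - dot(diff, grad_phi(y))

def three_point_gap(x, y, z):
    """Left side minus right side of the three-point identity; should be ~0."""
    lhs = bregman(x, y) + bregman(y, z) - bregman(x, z)
    diff = [xi - yi for xi, yi in zip(x, y)]
    gdiff = [gz - gy for gz, gy in zip(grad_phi(z), grad_phi(y))]
    return lhs - dot(diff, gdiff)

x, y, z = [0.3, -1.0], [1.2, 0.5], [-0.4, 0.8]
print("B(x, y) =", bregman(x, y))                 # non-negative
print("identity gap =", three_point_gap(x, y, z)) # zero up to rounding
```

The identity is exact algebra for any differentiable Φ, so the gap should vanish up to floating-point rounding regardless of the potential chosen.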
Canonical Divergence and Fenchel Inequality
An alternative expression of Bregman divergence is canonical divergence
or explicitly:
That A is non-negative is a direct consequence of the Fenchel inequality for a strictly convex function:
where equality holds if and only if
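A small numerical sketch of this slide, assuming the usual form of the canonical divergence, A(x, u) = Φ(x) + Φ*(u) − ⟨x, u⟩, where Φ* is the Fenchel conjugate and u plays the role of ∇Φ(y). The potential Φ(x) = Σ exp(x_i), whose conjugate is Φ*(u) = Σ (u_i log u_i − u_i), is an illustrative choice and not from the slides. The Bregman form is assumed to be B(x, y) = Φ(x) − Φ(y) − ⟨x − y, ∇Φ(y)⟩.

```python
import math

def phi(x):
    """Illustrative strictly convex potential (an assumption)."""
    return sum(math.exp(xi) for xi in x)

def phi_conj(u):
    """Fenchel conjugate of phi above; valid for u_i > 0."""
    return sum(ui * math.log(ui) - ui for ui in u)

def grad_phi(x):
    return [math.exp(xi) for xi in x]

def canonical(x, u):
    """A(x, u) = phi(x) + phi*(u) - <x, u>."""
    return phi(x) + phi_conj(u) - sum(xi * ui for xi, ui in zip(x, u))

def bregman(x, y):
    g = grad_phi(y)
    return phi(x) - phi(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))

x, y = [0.3, -1.0], [1.2, 0.5]
u = grad_phi(y)
print(canonical(x, u), bregman(x, y))  # the two values should agree
print(canonical(x, grad_phi(x)))       # Fenchel equality case: zero up to rounding
```

The second print illustrates the equality condition of the Fenchel inequality: the canonical divergence vanishes when u = ∇Φ(x).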
Convex Inequality and α-Divergence Induced by It
By the definition of a strictly convex function Φ,
It is easy to show that the following is non-negative for all α,
Conjugate-symmetry:
Easily verifiable:
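The divergence on this slide is not preserved in the transcript. The sketch below assumes the form given in Zhang (2004), D(x, y; α) = 4/(1 − α²) · [ (1 − α)/2 · Φ(x) + (1 + α)/2 · Φ(y) − Φ((1 − α)/2 · x + (1 + α)/2 · y) ], and checks three properties numerically: non-negativity (including |α| > 1), conjugate symmetry D(x, y; α) = D(y, x; −α), and the Bregman divergence as the limit α → ±1. The potential Φ and all function names are illustrative assumptions.

```python
import math

def phi(x):
    """Illustrative strictly convex potential (an assumption): sum of exp(x_i)."""
    return sum(math.exp(xi) for xi in x)

def bregman(x, y):
    """B(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>, with grad phi(y) = exp(y)."""
    return phi(x) - phi(y) - sum((xi - yi) * math.exp(yi) for xi, yi in zip(x, y))

def alpha_div(x, y, alpha):
    """Assumed alpha-divergence; requires alpha != +1 and alpha != -1."""
    wx, wy = (1 - alpha) / 2, (1 + alpha) / 2
    mix = [wx * xi + wy * yi for xi, yi in zip(x, y)]
    return 4 / (1 - alpha**2) * (wx * phi(x) + wy * phi(y) - phi(mix))

x, y = [0.3, -1.0], [1.2, 0.5]
for a in (-3.0, -0.5, 0.0, 0.5, 3.0):
    # Each row: alpha, D(x, y; alpha), D(y, x; -alpha); the last two should match.
    print(a, alpha_div(x, y, a), alpha_div(y, x, -a))
# As alpha approaches 1 the value approaches B(x, y) under the convention above.
print(alpha_div(x, y, 1 - 1e-6), bregman(x, y))
```

For |α| > 1 the weights fall outside [0, 1], the bracket changes sign, and so does the prefactor, which is why the divergence stays non-negative.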
Significance of Bregman Divergence Among α-Divergence Family
Proposition: For a smooth function Φ: $\mathbb{R}^n \to \mathbb{R}$, the following are equivalent:
Statistical Manifold Structure Induced From Divergence Function (Eguchi, 1983)
Given a divergence D(x,y) with D(x,x)=0, one can derive the Riemannian metric and a pair of conjugate connections.
Expanding D(x,y) around x=y:
i) 2nd order: one (and the same) metric
ii) 3rd order: a pair of conjugated connections
In essence, is satisfied by such identification of derivatives of D.
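The equations for this slide are missing from the transcript. A sketch of the identification usually attributed to Eguchi (1983) is given below; the sign and index conventions are assumptions, and the subscript on each ∂ indicates which argument of D is differentiated. The last line is the conjugacy condition that the two connections satisfy with respect to the metric, obtained by differentiating the first line along $x^k$.

```latex
\begin{aligned}
g_{ij}(x) &= -\,\partial_{x^i}\partial_{y^j} D(x,y)\,\big|_{y=x}, \\
\Gamma_{ij,k}(x) &= -\,\partial_{x^i}\partial_{x^j}\partial_{y^k} D(x,y)\,\big|_{y=x}, \\
\Gamma^{*}_{ij,k}(x) &= -\,\partial_{y^i}\partial_{y^j}\partial_{x^k} D(x,y)\,\big|_{y=x}, \\
\partial_k\, g_{ij} &= \Gamma_{ki,j} + \Gamma^{*}_{kj,i}.
\end{aligned}
```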
α-Hessian Geometry (of Finite-Dimensional Vector Space)
Theorem. $D^{(\alpha)}$ induces the α-Hessian manifold, i.e.
i) The metric and conjugate affine connections are given by:
ii) Riemann curvature is given by:
iii) The manifold is equi-affine, with the Tchebychev potential given by:
and α-parallel volume form given by
iv) There exist biorthogonal coordinates:
with
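Part i) of the theorem can be probed numerically. The sketch below assumes the α-divergence of Zhang (2004), D(x, y; α) = 4/(1 − α²) · [ (1 − α)/2 · Φ(x) + (1 + α)/2 · Φ(y) − Φ((1 − α)/2 · x + (1 + α)/2 · y) ], together with Eguchi-style identifications g_ij = −∂_{x^i}∂_{y^j} D and Γ_{ij,k} = −∂_{x^i}∂_{x^j}∂_{y^k} D on the diagonal. Under those assumptions, differentiating by hand gives g_ij = Φ_ij (independent of α) and Γ_{ij,k} = (1 − α)/2 · Φ_ijk; the code compares finite differences against these closed forms. The potential Φ(x) = log(1 + Σ exp(x_i)) and every function name are illustrative choices.

```python
import math

def phi(x):
    """Illustrative strictly convex potential (an assumption): log(1 + sum exp(x_i))."""
    return math.log(1.0 + sum(math.exp(xi) for xi in x))

def probs(x):
    """p_i = exp(x_i) / (1 + sum exp(x_k)); derivatives of phi are polynomials in p."""
    z = 1.0 + sum(math.exp(xi) for xi in x)
    return [math.exp(xi) / z for xi in x]

def alpha_div(x, y, alpha):
    """Assumed alpha-divergence; requires alpha != +1 and alpha != -1."""
    wx, wy = (1 - alpha) / 2, (1 + alpha) / 2
    mix = [wx * xi + wy * yi for xi, yi in zip(x, y)]
    return 4 / (1 - alpha**2) * (wx * phi(x) + wy * phi(y) - phi(mix))

def shifted(x, i, h):
    """Copy of x with coordinate i moved by h."""
    out = list(x)
    out[i] += h
    return out

def metric_fd(x, alpha, i, j, h=1e-3):
    """g_ij = -d/dx^i d/dy^j D(x, y) at y = x, by central differences."""
    def d(sx, sy):
        return alpha_div(shifted(x, i, sx), shifted(x, j, sy), alpha)
    return -(d(h, h) - d(h, -h) - d(-h, h) + d(-h, -h)) / (4 * h * h)

def gamma000_fd(x, alpha, h=1e-2):
    """Gamma_{00,0} = -d^2/d(x^0)^2 d/dy^0 D(x, y) at y = x, by central differences."""
    def d(sx, sy):
        return alpha_div(shifted(x, 0, sx), shifted(x, 0, sy), alpha)
    def second_in_x(sy):
        return (d(h, sy) - 2 * d(0.0, sy) + d(-h, sy)) / (h * h)
    return -(second_in_x(h) - second_in_x(-h)) / (2 * h)

x = [0.2, -0.7]
p = probs(x)
hess_01 = -p[0] * p[1]                        # closed-form Phi_01
phi_000 = p[0] * (1 - p[0]) * (1 - 2 * p[0])  # closed-form Phi_000
for a in (-0.5, 0.0, 0.5):
    # Columns: alpha, numeric g_01, Phi_01, numeric Gamma_000, (1 - alpha)/2 * Phi_000
    print(a, metric_fd(x, a, 0, 1), hess_01, gamma000_fd(x, a), (1 - a) / 2 * phi_000)
```

The metric column should not change with α, while the connection column scales with (1 − α)/2, which is the pattern the slide title "α-Hessian" refers to.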
From Vector Space to Function Space
Question: How to extend the above analysis to infinite-dimensional function space?
A General Divergence Function(al)
for any two functions in some function space, and an arbitrary, strictly increasing function.
Remark: Induced by convex inequality
A Special Case of $D^{(\alpha)}$: Classic α-Divergence
For parameterized pdfs, such divergence induces an α-independent metric, but α-dependent dual connections:
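To make the special case concrete, the sketch below works with discrete distributions in place of densities. It assumes the classical α-divergence in the form 4/(1 − α²) · (1 − Σ p^((1−α)/2) q^((1+α)/2)), and assumes the general functional has the form 4/(1 − α²) · Σ [ (1 − α)/2 · f(ρ(p)) + (1 + α)/2 · f(ρ(q)) − f((1 − α)/2 · ρ(p) + (1 + α)/2 · ρ(q)) ] as in Zhang (2004). With f = exp and ρ = log the two coincide for normalized p and q, and the Kullback-Leibler divergence appears in the limits α → ±1. Function names and the example distributions are hypothetical.

```python
import math

def classic_alpha_div(p, q, alpha):
    """Assumed classical alpha-divergence for normalized discrete p, q (alpha != +-1)."""
    wp, wq = (1 - alpha) / 2, (1 + alpha) / 2
    overlap = sum(pi**wp * qi**wq for pi, qi in zip(p, q))
    return 4 / (1 - alpha**2) * (1 - overlap)

def general_div(p, q, alpha, f=math.exp, rho=math.log):
    """Assumed general divergence built from a convex f and an increasing rho."""
    wp, wq = (1 - alpha) / 2, (1 + alpha) / 2
    total = 0.0
    for pi, qi in zip(p, q):
        rp, rq = rho(pi), rho(qi)
        total += wp * f(rp) + wq * f(rq) - f(wp * rp + wq * rq)
    return 4 / (1 - alpha**2) * total

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive p, q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]
print(classic_alpha_div(p, q, 0.3), general_div(p, q, 0.3))  # should agree
print(classic_alpha_div(p, q, 1 - 1e-6), kl(q, p))           # alpha -> 1 limit
print(classic_alpha_div(p, q, -1 + 1e-6), kl(p, q))          # alpha -> -1 limit
```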
Other Examples of $D^{(\alpha)}$
Jensen Difference
U-Divergence (α=1)
A Short Detour: Monotone Scaling
Define monotone embedding (“scaling”) of a measurable function p as the transformation ρ(p), where ρ is a strictly monotone function.
Observe:
i) ρ is strictly monotone iff $\rho^{-1}$ is strictly monotone;
ii) ρ(t) = t serves as the identity element;
iii) if $\rho_1$, $\rho_2$ are strictly monotone, so is $\rho_1 \circ \rho_2$.
Therefore, monotone embeddings of a given probability density function form a group, with functional composition as the group operation.
We recall that for a strictly convex function f:
DEFINITION: ρ-embedding is said to be conjugated to τ-embedding with respect to a strictly convex function f (whose conjugate is f*) if:
Example: α-embedding
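The defining condition is missing from the transcript. The sketch below assumes it reads τ(t) = f′(ρ(t)), as in Zhang (2004), and checks it for the α-embedding example, taking ρ(t) = 2/(1 − α) · t^((1−α)/2) and τ to be the same embedding with α replaced by −α. The convex function f used here, f(s) = 2/(1 + α) · ((1 − α)s/2)^(2/(1−α)), is derived by hand from that assumption for −1 < α < 1; none of these formulas appear in the surviving slide text.

```python
import math

def alpha_embedding(t, alpha):
    """l(t) = 2/(1-alpha) * t^((1-alpha)/2) for alpha != 1, and log t for alpha = 1."""
    if alpha == 1:
        return math.log(t)
    return 2 / (1 - alpha) * t ** ((1 - alpha) / 2)

def f(s, alpha):
    """Convex function assumed to link the two embeddings; needs -1 < alpha < 1, s > 0."""
    return 2 / (1 + alpha) * ((1 - alpha) * s / 2) ** (2 / (1 - alpha))

def f_prime(s, alpha, h=1e-6):
    """Central-difference approximation to the derivative of f in s."""
    return (f(s + h, alpha) - f(s - h, alpha)) / (2 * h)

def conjugacy_gap(t, alpha):
    """tau(t) - f'(rho(t)) with rho = l^(alpha), tau = l^(-alpha); should be ~0."""
    return alpha_embedding(t, -alpha) - f_prime(alpha_embedding(t, alpha), alpha)

for t, a in ((0.7, 0.4), (2.0, -0.3)):
    print(t, a, conjugacy_gap(t, a))  # near zero, limited by the finite-difference step
```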
Parameterized Functions as Forming a Submanifold under Monotone Scaling
A submanifold is said to be ρ-affine if there exists a countable set of linearly independent functions $\lambda_i(\zeta)$ over a measurable space such that:
Here, θ is called the “natural parameter”. The “expectation parameter” is defined by projecting the conjugated τ-embedding onto the $\lambda_i(\zeta)$:
Example: For log-linear model (exponential family)
The expectation parameter is:
i) The following potential function is strictly convex:
Φ(θ) is called the generating (partition) functional.
ii) Define, under the conjugate representations
then Φ* is the Fenchel conjugate of Φ.
Φ*(η) is called the generalized entropy functional.
Proposition. For the ρ-affine submanifold:
Theorem. The ρ-affine submanifold is an α-Hessian manifold.
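The relation between the natural parameter, the potential, and the expectation parameter can be sketched on a finite sample space. The code assumes, following Zhang (2004), that a ρ-affine family satisfies ρ(p(ζ|θ)) = Σ θ^i λ_i(ζ), that the potential is Φ(θ) = Σ_ζ f(ρ(p(ζ|θ))), and that the expectation parameter is η_i = Σ_ζ τ(p) λ_i(ζ) with τ = f′∘ρ. Under those assumptions η is the gradient of Φ. The choice f = exp (so ρ = log, the log-linear case), the three-point sample space, and the basis functions in `LAMBDAS` are all hypothetical.

```python
import math

# Hypothetical finite sample space: three points, two basis functions lambda_i(zeta).
LAMBDAS = [[1.0, 0.0, -1.0],
           [0.5, -1.0, 0.5]]

def rho_of_p(theta):
    """rho(p(zeta | theta)) = sum_i theta^i lambda_i(zeta), one value per sample point."""
    n = len(LAMBDAS[0])
    return [sum(t * lam[z] for t, lam in zip(theta, LAMBDAS)) for z in range(n)]

def potential(theta, f=math.exp):
    """Assumed potential: Phi(theta) = sum over zeta of f(rho(p(zeta | theta)))."""
    return sum(f(r) for r in rho_of_p(theta))

def expectation_params(theta, f_prime=math.exp):
    """eta_i = sum over zeta of tau(p) * lambda_i(zeta), with tau(p) = f'(rho(p))."""
    tau = [f_prime(r) for r in rho_of_p(theta)]
    return [sum(t * l for t, l in zip(tau, lam)) for lam in LAMBDAS]

def grad_potential(theta, h=1e-6):
    """Central-difference gradient of the potential in theta."""
    grads = []
    for i in range(len(theta)):
        up = list(theta)
        dn = list(theta)
        up[i] += h
        dn[i] -= h
        grads.append((potential(up) - potential(dn)) / (2 * h))
    return grads

theta = [0.3, -0.2]
print(expectation_params(theta))  # eta computed from its definition
print(grad_potential(theta))      # numerical gradient of Phi; should match eta
```

In this unnormalized setting the family is not constrained to sum to one; normalization would add a constraint on θ rather than change the gradient relation.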
An Application: the (α,β)-Divergence
Take f=ρ-(β), where: called “alpha-embedding”, now denoted by β.
α: parameter reflecting reference duality
β: parameter reflecting representation duality
They reduce to α-divergence proper $A^{(\alpha)}$ and to Jensen difference $E^{(\alpha)}$:
Information Geometry on Banach Space
Proposition 1. Denote tangent vector fields which are, at given p on the manifold, themselves functions in Banach space. The metric and dual connections induced by take the forms:
Written in dually symmetric form:
Corollary 1a. For a finite-dimensional submanifold (parametric model), with
The metric and dual connections associated with are given by:
with
Remark: Choosing reduces to the forms of Fisher metric and the α-connections in classical parametric information geometry, where
Remark: The ambient space B is flat, so it embeds, as proper submanifolds,
• the manifold $M_\mu$ of probability density functions (constrained to be positive-valued and normalized to unit measure);
• the finite-dimensional manifold $M_\theta$ of parameterized probability models.
[Diagram labels: $M_\theta$, $M_\mu$, B (ambient manifold)]
Proposition 2. The curvature $R^{(\alpha)}$ and torsion $T^{(\alpha)}$ tensors associated with any α-connection on the infinite-dimensional function space B are identically zero.
CAVEAT: Topology? (G. Pistone and his colleagues)
Proposition 3. The (α,β)-divergence for the parametric models gives rise to the Fisher metric proper and alpha-connections proper:
Remark: The (α,β)-divergence is the homogeneous f-divergence
As such, it should reproduce the standard Fisher metric and the dual alpha-connections in their proper form. Again, it is the αβ that takes the role of the conventional “alpha” parameter.
Summary of Current Approach
Divergence
• α-divergence: equivalent to δ-divergence (Zhu & Rohwer, 1985); includes KL divergence as a special case
• f-divergence (Csiszar)
• Bregman divergence: equivalent to the canonical divergence
• U-divergence (Eguchi)
Geometry
• Riemannian metric: Fisher information
• Conjugate connections: α-connection family
• Equi-affine structure: cubic form, Tchebychev 1-form
• Curvature
Convex-based α-divergence for
• vector space of finite dimension
• function space of infinite dimension
Generalized expressions of
• Fisher metric
• α-connections
References
Zhang, J. (2004). Divergence function, duality, and convex analysis. Neural Computation, 16, 159-195.
Zhang, J. (2005). Referential duality and representational duality in the scaling of multidimensional and infinite-dimensional stimulus space. In Dzhafarov, E. and Colonius, H. (Eds.), Measurement and representation of sensations: Recent progress in psychological theory. Lawrence Erlbaum Associates, Mahwah, NJ.
Zhang, J. and Hästö, P. (2006). Statistical manifold as an affine space: A functional equation approach. Journal of Mathematical Psychology, 50, 60-65.
Zhang, J. (2006). Referential duality and representational duality on statistical manifolds. Proceedings of the Second International Symposium on Information Geometry and Its Applications, Tokyo (pp. 58-67).
Zhang, J. (2007). A note on curvature of α-connections of a statistical manifold. Annals of the Institute of Statistical Mathematics, 59, 161-170.
Zhang, J. and Matsuzoe, H. (in press). Dualistic differential geometry associated with a convex function. To appear in a special volume in the Springer series Advances in Mechanics and Mathematics.
Zhang, J. (under review). Nonparametric information geometry: Referential duality and representational duality on statistical manifolds.