Theory of Differentiation in Statistics Mohammed Nasser Department of Statistics
Monotone Function
A function f(x) is monotone if it is monotone increasing or monotone decreasing:
- Monotone increasing: non-decreasing (f(x) ≤ f(y) for x < y) or strictly increasing (f(x) < f(y) for x < y).
- Monotone decreasing: non-increasing (f(x) ≥ f(y) for x < y) or strictly decreasing (f(x) > f(y) for x < y).
Maximum/Minimum
Is there any sufficient condition that guarantees the existence of a global max, a global min, or both?
Some Results to Mention
- If a function is continuous and its domain is compact, the function attains its extrema. This is a very general result: it holds for any compact space, not only compact subsets of R^n.
- For a convex (concave) function, any local minimum (maximum) is a global minimum (maximum).
- Some functions attain a global min (max) without satisfying either of the above conditions.
- Calculation of an extremum: first prove that the extremum exists, then locate it.
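The first result can be illustrated numerically (a sketch assuming Python with numpy; f(x) = x² on the compact interval [−1, 2] is a hypothetical example): a continuous function on a compact set attains both its global minimum and maximum, so a dense grid search finds them.

```python
import numpy as np

# Extreme value theorem, numerically: a continuous function on the
# compact interval [-1, 2] attains its global min and max.
# f(x) = x**2 is a hypothetical example function.
def f(x):
    return x**2

xs = np.linspace(-1.0, 2.0, 100001)   # dense grid over the compact set
vals = f(xs)
print(xs[vals.argmin()], vals.min())  # global min near (0, 0)
print(xs[vals.argmax()], vals.max())  # global max at (2, 4)
```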
What Does f′ Say about f?
Fermat's Theorem: if f has a local maximum or minimum at c, and if f′(c) exists, then f′(c) = 0. The converse is not true: f′(c) = 0 does not imply an extremum at c (e.g. f(x) = x³ at c = 0).
Concavity
- If f″(x) < 0 for all x in (a, b), then the graph of f is concave on (a, b).
- If f″(x) > 0 for all x in (a, b), then the graph of f is convex on (a, b).
- If f″(c) = 0 and f″ changes sign at c, then f has a point of inflection at c.
Maximum/Minimum
Let f(x) be a differentiable function on an interval I.
- f has a local maximum at c if f′(c) = 0 and f″(c) < 0.
- f has a local minimum at c if f′(c) = 0 and f″(c) > 0.
- If f′(x) < 0 for all x in the interval, then f is maximum at the first end point of the interval if the left side is closed, and minimum at the last end point if the right side is closed.
- If f′(x) > 0 for all x in the interval, then f is minimum at the first end point of the interval if the left side is closed, and maximum at the last end point if the right side is closed.
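The second-derivative test above can be sketched in a few lines (assuming Python with sympy; f(x) = x³ − 3x is an illustrative choice, not from the slides):

```python
import sympy as sp

# Classify critical points with the second-derivative test.
x = sp.symbols('x')
f = x**3 - 3*x                       # illustrative function
fp, fpp = sp.diff(f, x), sp.diff(f, x, 2)

for c in sp.solve(fp, x):            # critical points: x = -1, x = 1
    kind = 'local max' if fpp.subs(x, c) < 0 else 'local min'
    print(c, kind)
```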
Normal Distribution
The probability density function is given as
f(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)),  x ∈ R.
- f is continuous on R
- f(x) ≥ 0
- f is differentiable on R
The curve is concave between its two points of inflection and convex outside them.
Normal Distribution
Now take the log of both sides:
log f(x) = −log(σ√(2π)) − (x − μ)²/(2σ²).
Differentiate and put the first derivative equal to zero:
f′(x)/f(x) = −(x − μ)/σ²,  so f′(x) = 0  ⇒  x = μ.
Normal Distribution
Since f′(x) > 0 for x < μ and f′(x) < 0 for x > μ, the critical point is a maximum: f is maximum at x = μ, with maximum value f(μ) = 1/(σ√(2π)).
Normal Distribution
Put the 2nd derivative equal to zero:
f″(x) = f(x) [(x − μ)²/σ⁴ − 1/σ²] = 0  ⇒  (x − μ)² = σ²  ⇒  x = μ ± σ.
Therefore f has points of inflection at x = μ − σ and x = μ + σ.
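These claims about the normal pdf can be verified symbolically (a sketch assuming Python with sympy):

```python
import sympy as sp

# Verify: the normal pdf has its unique critical point (the maximum) at
# x = mu, and its second derivative vanishes at x = mu - sigma, mu + sigma.
x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)
f = sp.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sp.sqrt(2 * sp.pi))

crit = sp.solve(sp.diff(f, x), x)     # critical points
infl = sp.solve(sp.diff(f, x, 2), x)  # zeros of the second derivative
print(crit)                            # [mu]
print(infl)                            # mu - sigma and mu + sigma
```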
Logistic Distribution
The distribution function is defined as
F(x) = 1 / (1 + e^(−x)),  x ∈ R.
The curve is convex to the left of the point of inflection and concave to the right of it.
Logistic Distribution
Take the first derivative with respect to x:
F′(x) = e^(−x) / (1 + e^(−x))² > 0 for all x.
Therefore F is strictly increasing.
Take the 2nd derivative and put it equal to zero:
F″(x) = e^(−x)(e^(−x) − 1) / (1 + e^(−x))³ = 0  ⇒  x = 0.
Therefore F has a point of inflection at x = 0.
Logistic Distribution
Now we note that F has no maximum or minimum: since F′(x) > 0 for every x, F never attains an extremum on R. Since F″(x) > 0 for x < 0 and F″(x) < 0 for x > 0, F is convex on (−∞, 0) and concave on (0, ∞).
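The same conclusions can be checked symbolically (a sketch assuming Python with sympy):

```python
import sympy as sp

# Verify the logistic cdf claims: F' > 0 (strictly increasing), the only
# zero of F'' is x = 0, and F is convex left of 0, concave right of 0.
x = sp.symbols('x', real=True)
F = 1 / (1 + sp.exp(-x))
Fp, Fpp = sp.diff(F, x), sp.diff(F, x, 2)

print(sp.solve(Fpp, x))             # [0]: the point of inflection
print(Fpp.subs(x, -1).evalf() > 0)  # True: convex on (-inf, 0)
print(Fpp.subs(x, 1).evalf() < 0)   # True: concave on (0, inf)
```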
Variance of a Function of a Poisson Variate Using Taylor's Theorem
We know that if X ~ Poisson(λ), then E(X) = Var(X) = λ. We are interested in finding the variance of a smooth function g(X) of X.
Variance of a Function of a Poisson Variate Using Taylor's Theorem
By Taylor's theorem, expanding g about λ,
g(X) ≈ g(λ) + g′(λ)(X − λ),
therefore the variance of g(X) is approximately
Var(g(X)) ≈ [g′(λ)]² Var(X) = [g′(λ)]² λ.
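A Monte Carlo sanity check of this delta-method approximation (a sketch assuming Python with numpy; the choice g(x) = √x is hypothetical, not from the slides — for it, [g′(λ)]²λ = 1/4 for every λ):

```python
import numpy as np

# Delta-method check: Var g(X) ≈ [g'(lam)]^2 * lam for X ~ Poisson(lam).
# With g(x) = sqrt(x), g'(lam) = 1/(2*sqrt(lam)), so the approximation
# equals 1/4 regardless of lam.
rng = np.random.default_rng(0)
lam = 25.0
x = rng.poisson(lam, size=200_000)

empirical = np.sqrt(x).var()
approx = (1.0 / (2.0 * np.sqrt(lam)))**2 * lam   # = 0.25
print(empirical, approx)                          # the two should be close
```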
Risk Functional
Risk functional: R_{L,P}(g) = E_P[L(X, Y, g(X))] = ∫ L(x, y, g(x)) dP(x, y).
Population regression functional/classifier: g* = argmin_g R_{L,P}(g).
- P is chosen by nature; L is chosen by the scientist.
- Both R_{L,P}(g*) and g* are unknown.
- From the sample D we will select g_D by a learning method (to be specified).
Empirical Risk Minimization
Empirical risk functional: R_emp(g) = (1/n) Σ_{i=1}^n L(x_i, y_i, g(x_i)).
- Problem of empirical risk minimization: minimized over all functions, the empirical risk can be driven to zero simply by interpolating the data, which says nothing about performance on new observations (overfitting).
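The empirical risk is simple to compute; the sketch below (assuming Python with numpy, squared-error loss, and a simulated dataset, all illustrative) also hints at the problem: a predictor that memorizes the data drives R_emp to zero without necessarily generalizing.

```python
import numpy as np

# Empirical risk R_emp(g) = (1/n) sum L(x_i, y_i, g(x_i)), squared loss.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 100)
y = 2 * x + rng.normal(0, 0.1, 100)          # simulated data, true slope 2

def emp_risk(g):
    return np.mean((y - g(x))**2)

print(emp_risk(lambda t: 2 * t))             # near the noise variance 0.01
print(emp_risk(lambda t: np.zeros_like(t)))  # constant-0 predictor: worse
print(emp_risk(lambda t: y))                 # "memorizing" y: risk exactly 0
```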
What Can We Do?
- We can restrict the set of functions over which we minimize the empirical risk functional (structural risk minimization), or
- modify the criterion to be minimized, e.g. by adding a penalty for 'complicated' functions (regularization).
We can also combine the two.
Regularized Error Function
In linear regression, we minimize the quadratic error function
Σ_n (y_n − f(x_n))².
In SVR we replace the quadratic error function by the ε-insensitive error function; an example of an ε-insensitive error function is
E_ε(r) = 0 if |r| ≤ ε,  |r| − ε otherwise,
and we minimize the regularized error
C Σ_n E_ε(f(x_n) − y_n) + (1/2)||w||².
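A small sketch of this loss (assuming Python with numpy; the residual values are illustrative):

```python
import numpy as np

# The eps-insensitive error function: zero inside the tube |r| <= eps,
# linear (|r| - eps) outside, versus the quadratic error of least squares.
def eps_insensitive(r, eps=0.1):
    return np.maximum(0.0, np.abs(r) - eps)

r = np.array([-0.3, -0.05, 0.0, 0.05, 0.3])
print(eps_insensitive(r))   # 0.2 at the ends, 0 inside the tube
print(r**2)                 # quadratic loss penalizes every nonzero residual
```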
Linear SVR: Derivation
Meaning of the constraints: each target y_n must lie within ε + ξ_n of f(x_n).
The objective balances two competing terms: the model complexity (1/2)||w||² and the sum of the errors ξ_n + ξ_n* of points outside the ε-"tube".
- Case I: a flatter function (lower complexity) lets more points fall outside the tube, increasing the error term.
- Case II: fitting the data closely (fewer tube violations) requires higher complexity.
The role of C: when C is big, the errors dominate the objective and the solution follows the data closely; when C is small, the complexity term dominates and a flatter function is preferred.
Linear SVR: derivation
Minimize: (1/2)||w||² + C Σ_n (ξ_n + ξ_n*)
Subject to:
y_n − <w, x_n> − b ≤ ε + ξ_n,
<w, x_n> + b − y_n ≤ ε + ξ_n*,
ξ_n, ξ_n* ≥ 0.
Lagrangian
Minimize:
L = (1/2)||w||² + C Σ_n (ξ_n + ξ_n*) − Σ_n (μ_n ξ_n + μ_n* ξ_n*) − Σ_n α_n (ε + ξ_n − y_n + <w, x_n> + b) − Σ_n α_n* (ε + ξ_n* + y_n − <w, x_n> − b)
with dual variables α_n, α_n*, μ_n, μ_n* ≥ 0. Setting ∂L/∂w = 0 gives w = Σ_n (α_n − α_n*) x_n, so
f(x) = <w, x> = Σ_n (α_n − α_n*) <x_n, x>.
Dual Form of the Lagrangian
Maximize:
L̃(α, α*) = −(1/2) Σ_n Σ_m (α_n − α_n*)(α_m − α_m*) <x_n, x_m> − ε Σ_n (α_n + α_n*) + Σ_n (α_n − α_n*) y_n
subject to Σ_n (α_n − α_n*) = 0 and 0 ≤ α_n, α_n* ≤ C. Prediction can be made using:
y(x) = Σ_n (α_n − α_n*) <x_n, x> + b.
How is b determined?
How to determine b?
The Karush-Kuhn-Tucker (KKT) conditions imply, at the optimal solution:
α_n (ε + ξ_n − y_n + <w, x_n> + b) = 0,
α_n* (ε + ξ_n* + y_n − <w, x_n> − b) = 0,
(C − α_n) ξ_n = 0,  (C − α_n*) ξ_n* = 0.
Support vectors are points that lie on the boundary of the tube or outside it. These equations imply many important things; in particular, for any point with 0 < α_n < C we have ξ_n = 0, so
b = y_n − ε − <w, x_n>.
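As a numerical illustration of these conditions (a sketch using scikit-learn's SVR, which is an assumption — the slides derive the optimization by hand): points with zero dual weight, i.e. the non-support vectors, must lie inside the ε-tube.

```python
import numpy as np
from sklearn.svm import SVR

# Fit an SVR on simulated data and check the KKT consequence that every
# non-support vector lies inside the eps-tube around the fitted function.
rng = np.random.default_rng(2)
X = np.sort(rng.uniform(-3, 3, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.1, 80)

eps = 0.2
model = SVR(kernel='rbf', C=10.0, epsilon=eps).fit(X, y)
resid = np.abs(y - model.predict(X))

non_sv = np.setdiff1d(np.arange(len(y)), model.support_)
print(np.all(resid[non_sv] <= eps + 1e-3))   # non-SVs inside the tube
print(len(model.support_), 'support vectors of', len(y), 'points')
```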
Dual Form of the Lagrangian (Nonlinear Case)
Maximize the same dual with every inner product <x_n, x_m> replaced by a kernel value k(x_n, x_m):
L̃(α, α*) = −(1/2) Σ_n Σ_m (α_n − α_n*)(α_m − α_m*) k(x_n, x_m) − ε Σ_n (α_n + α_n*) + Σ_n (α_n − α_n*) y_n.
Prediction can be made using:
y(x) = Σ_n (α_n − α_n*) k(x_n, x) + b.
Non-linear SVR: derivation
Minimize: (1/2)||w||² + C Σ_n (ξ_n + ξ_n*)
Subject to:
y_n − <w, φ(x_n)> − b ≤ ε + ξ_n,
<w, φ(x_n)> + b − y_n ≤ ε + ξ_n*,
ξ_n, ξ_n* ≥ 0,
where φ is the feature map associated with the kernel k.
Non-linear SVR: derivation
With the same constraints, a saddle point of the Lagrangian L has to be found: minimize with respect to the primal variables w, b, ξ_n, ξ_n*, and maximize with respect to the dual variables α_n, α_n*, μ_n, μ_n*.
What is Differentiation?
Let f be a nonlinear function from U, a Banach space, to V, another Banach space. Differentiation is nothing but local linearization: in differentiation we approximate a nonlinear function locally by a (continuous) linear function.
Fréchet Derivative
Definition 1: f: R^n → R^m is differentiable at x with derivative df(x) if df(x) is a linear map and f(x + h) − f(x) − df(x)(h) = o(||h||). This can easily be generalized to Banach-space-valued functions. It can be shown, however, that not every linear map between infinite-dimensional spaces is continuous, so continuity of df(x) must be required explicitly in that setting.
Fréchet Derivative
We have just mentioned that Fréchet recognized that Definition 1 could easily be generalized to normed spaces in the following way:
lim_{||h||→0} ||f(x + h) − f(x) − df(x)(h)|| / ||h|| = 0,   (2)
where df(x) ∈ L(B1, B2), the set of all continuous linear functions between B1 and B2. If we write Rem(x + h) for the remainder of f at x + h:
Rem(x + h) = f(x + h) − f(x) − df(x)(h).
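The defining limit ||Rem(x + h)|| / ||h|| → 0 can be checked numerically in a simple finite-dimensional case (a sketch assuming Python with numpy; f(x, y) = x² + xy is an illustrative function, whose derivative at (x, y) is the linear map h ↦ (2x + y)h₁ + x h₂):

```python
import numpy as np

# ||Rem(x+h)|| / ||h|| -> 0 as ||h|| -> 0 for f(x, y) = x^2 + x*y.
def f(v):
    return v[0]**2 + v[0] * v[1]

def df(v, h):
    # Frechet derivative of f at v, applied to h
    return (2 * v[0] + v[1]) * h[0] + v[0] * h[1]

v = np.array([1.0, 2.0])
direction = np.array([0.6, 0.8])          # unit vector
for scale in (1e-1, 1e-2, 1e-3):
    h = scale * direction
    rem = f(v + h) - f(v) - df(v, h)
    print(abs(rem) / np.linalg.norm(h))   # shrinks linearly with ||h||
```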
S-Derivative
Then (2) becomes ||Rem(x + h)|| / ||h|| → 0 as ||h|| → 0. The definition was soon generalized (S-differentiation) to general topological vector spaces in such a way that: i) a particular case of the definition becomes equivalent to the previous definition when the domain of f is a normed space; ii) the Gâteaux derivative remains the weakest derivative among all types of S-differentiation.
S-Derivatives
Definition 2: Let S be a collection of subsets of B1, and let t ∈ R. Then f is S-differentiable at x with derivative df(x) if, for each S ∈ S,
Rem(x + th)/t → 0 as t → 0, uniformly in h ∈ S.
Definition 3:
- When S = all singletons of B1, f is called Gâteaux differentiable, with Gâteaux derivative df(x).
- When S = all compact subsets of B1, f is called Hadamard or compactly differentiable, with Hadamard or compact derivative df(x).
- When S = all bounded subsets of B1, f is called Fréchet or boundedly differentiable, with Fréchet or bounded derivative df(x).
Equivalent Definitions of the Fréchet Derivative
(a) For each bounded set B ⊂ B1, Rem(x + th)/t → 0 as t → 0 in R, uniformly in h ∈ B.
(b) For each bounded sequence {h_n} in B1 and each real sequence t_n → 0, Rem(x + t_n h_n)/t_n → 0.
(c) ||Rem(x + h)|| / ||h|| → 0 as ||h|| → 0.
(d) Rem(x + th)/t → 0 as t → 0, uniformly in h over bounded sets.
(e) f(x + h) = f(x) + df(x)(h) + o(||h||), uniformly in h.
Statisticians generally use this last form, or some slight modification of it.
Relations among the Usual Forms of the Definitions
- The set of Gâteaux differentiable functions at x ⊇ the set of Hadamard differentiable functions at x ⊇ the set of Fréchet differentiable functions at x.
- In applications, to find a Fréchet or Hadamard derivative we should generally first try to determine the form of the derivative by deducing the Gâteaux derivative acting on h, df(x)(h), for a collection of directions h that span B1. This reduces to computing the ordinary derivative (with respect to t ∈ R) of the mapping t ↦ f(x + th), which is closely related to the influence function, one of the central concepts in robust statistics.
- It can easily be shown that when B1 = R with the usual norm, all three derivatives coincide; when B1 is a finite-dimensional Banach space, the Fréchet and Hadamard derivatives are equal, and both coincide with the familiar total derivative.
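The remark about the influence function can be made concrete (a sketch assuming Python with numpy; T = mean of an empirical distribution is an illustrative functional): the Gâteaux derivative of T at F in the direction of a point mass δ_x is the ordinary derivative of t ↦ T((1 − t)F + t δ_x) at t = 0, which for the mean equals IF(x) = x − mean(F).

```python
import numpy as np

# Gateaux derivative of the mean functional in the direction of a point
# mass at x_new, computed as an ordinary difference quotient in t.
sample = np.array([1.0, 2.0, 3.0, 4.0])   # empirical distribution F_n
x_new = 10.0
n = len(sample)

def T_mix(t):
    # mean of the mixture (1 - t) * F_n + t * delta_{x_new}
    pts = np.append(sample, x_new)
    w = np.append((1 - t) * np.ones(n) / n, t)
    return np.average(pts, weights=w)

t = 1e-6
gateaux = (T_mix(t) - T_mix(0.0)) / t
print(gateaux)          # ≈ x_new - mean(F_n) = 10 - 2.5 = 7.5
```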
Properties of the Fréchet Derivative
- Hadamard differentiability implies continuity, but Gâteaux differentiability does not.
- Hadamard differentiability satisfies the chain rule, but Gâteaux differentiability does not.
- Meaningful versions of the Mean Value Theorem, the Inverse Function Theorem, Taylor's Theorem and the Implicit Function Theorem have been proved for the Fréchet derivative.
Lebesgue Counting
Mathematical Foundations of Robust Statistics
Qualitative robustness of a statistical functional T is continuity: for suitable metrics d1 on distributions and d2 on values, d1(F, G) < δ implies d2(T(F), T(G)) < ε.
The von Mises expansion linearizes T around F:
T(G) ≈ T(F) + dT_F(G − F), so T(G) − T(F) ≈ ∫ IF(x; T, F) dG(x),
where IF is the influence function of T at F (and ∫ IF dF = 0).
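This expansion can be illustrated numerically (a sketch assuming Python with numpy; the variance functional, whose influence function is IF(x) = (x − μ_F)² − σ_F², is an illustrative choice):

```python
import numpy as np

# One-step von Mises approximation T(G) ≈ T(F) + ∫ IF(x; T, F) dG(x)
# for the variance functional T, using large samples from F and G.
rng = np.random.default_rng(3)
F = rng.normal(0.0, 1.0, 50_000)      # sample from F
G = rng.normal(0.1, 1.1, 50_000)      # sample from a nearby G

mu, sig2 = F.mean(), F.var()
IF = lambda z: (z - mu)**2 - sig2     # influence function of the variance
approx = sig2 + IF(G).mean()          # T(F) + mean of IF under G

print(G.var(), approx)                # the linearization tracks T(G)
```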