Estimation of Pareto Distribution Functions from Samples Contaminated by Measurement Errors

Presenter: Lwando Kondlo • Supervisor: Prof. C. Koen • SKA Postgrad Bursary Conference, December 5, 2009

Background • The model for a variable X measured with error is $Y = X + \varepsilon$, where Y is the observed value and $\varepsilon$ is the measurement error. • Estimation of the density or distribution function of X is often important. • This is a classical deconvolution problem. • The specific case where X has a Pareto form is discussed.

Pareto distribution • The Pareto distribution is a model for positive data. • Examples include: • the distribution of income and wealth among individuals • the masses of molecular clouds, etc.

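Pareto distribution • The Finite-Support Pareto distribution (FSPD) is $g(x) = \dfrac{a L^{a} x^{-(a+1)}}{1 - (L/U)^{a}}$ for $L \le x \le U$ and zero otherwise, with lower limit $L > 0$, upper limit $U > L$ and power-law exponent $a > 0$.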

Pareto distribution • Distributional parameters are usually estimated by fitting the FSPD directly to a set of data. • This is not appropriate if the data are contaminated by measurement errors.
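As a minimal sketch of the FSPD, the block below assumes the usual truncated-Pareto form $g(x) = a L^{a} x^{-(a+1)} / (1 - (L/U)^{a})$ on $[L, U]$. The function names are hypothetical, and the parameter values are an assumed reading of the "3; 6 and 1.5" quoted later in the talk as L, U and a.

```python
import random

def fspd_pdf(x, L, U, a):
    """Density of the finite-support (truncated) Pareto on [L, U]."""
    if x < L or x > U:
        return 0.0
    return a * L**a * x**(-(a + 1)) / (1.0 - (L / U)**a)

def fspd_cdf(x, L, U, a):
    """Distribution function of the FSPD."""
    if x <= L:
        return 0.0
    if x >= U:
        return 1.0
    return (1.0 - (L / x)**a) / (1.0 - (L / U)**a)

def fspd_quantile(p, L, U, a):
    """Inverse of fspd_cdf for 0 <= p <= 1."""
    return L * (1.0 - p * (1.0 - (L / U)**a))**(-1.0 / a)

def fspd_sample(n, L, U, a, rng):
    """Draw n error-free values by inverse-CDF sampling."""
    return [fspd_quantile(rng.random(), L, U, a) for _ in range(n)]

# Assumed illustration values (L, U, a), not confirmed by the slides.
L, U, a = 3.0, 6.0, 1.5
sample = fspd_sample(5, L, U, a, random.Random(0))
print(sample)
```

Inverse-CDF sampling is used here only because the FSPD distribution function has a closed-form inverse; any other sampler would do.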

Objective • To develop methodology for deconvolution when X is known to be of Pareto form. • To apply the methodology to real (radio astronomical) data.

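Convolution • If X has the PDF $g(\cdot)$ and the measurement error $\varepsilon$, independent of X, has the PDF $h(\cdot)$, then $Y = X + \varepsilon$ has the PDF $f(y) = \int g(x)\, h(y - x)\, dx$. • With $g$ the FSPD on $[L, U]$, the convolved PDF (CPDF) is $f(y) = \int_{L}^{U} g(x)\, h(y - x)\, dx$.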
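The convolution integral can be evaluated numerically. The sketch below assumes zero-mean Gaussian measurement errors with known standard deviation `sigma` (the error distribution is not stated in the transcript), the usual truncated-Pareto form for the FSPD, and composite Simpson integration over $[L, U]$. All names and parameter values are hypothetical.

```python
from statistics import NormalDist

def fspd_pdf(x, L, U, a):
    """FSPD density on [L, U] (assumed truncated-Pareto form)."""
    if x < L or x > U:
        return 0.0
    return a * L**a * x**(-(a + 1)) / (1.0 - (L / U)**a)

def cpdf(y, L, U, a, sigma, n_steps=400):
    """Convolved PDF f(y) = integral over [L, U] of g(x) h(y - x) dx.

    h is taken to be a N(0, sigma^2) density (an assumption).
    n_steps must be even for composite Simpson integration.
    """
    err = NormalDist(0.0, sigma)
    step = (U - L) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        x = min(U, L + i * step)  # guard against rounding past U
        weight = 1 if i in (0, n_steps) else (4 if i % 2 else 2)
        total += weight * fspd_pdf(x, L, U, a) * err.pdf(y - x)
    return total * step / 3.0

# Assumed illustration values; sigma in particular is made up.
L, U, a, sigma = 3.0, 6.0, 1.5, 0.4
for y in (2.6, 3.0, 4.0, 6.0, 6.4):
    print(y, fspd_pdf(y, L, U, a), cpdf(y, L, U, a, sigma))
```

Unlike the FSPD, the CPDF is positive outside $[L, U]$, which is why fitting the FSPD directly to contaminated data is problematic.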

Convolution • The CPDF can differ substantially from the FSPD. • Probability-probability (P-P) plots, which compare the observed and theoretical distribution functions, can be used to assess the fit.
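The coordinates of a P-P plot are straightforward to compute. The sketch below pairs empirical probabilities with theoretical ones for a sorted sample; the plotting positions $(i - 0.5)/n$ are one common convention, assumed here, and the helper names are hypothetical. The FSPD distribution function is used as the theoretical model, in its usual truncated-Pareto form.

```python
def fspd_cdf(x, L, U, a):
    """FSPD distribution function (assumed truncated-Pareto form)."""
    if x <= L:
        return 0.0
    if x >= U:
        return 1.0
    return (1.0 - (L / x)**a) / (1.0 - (L / U)**a)

def pp_points(sample, cdf):
    """Return (empirical, theoretical) probability pairs for a P-P plot."""
    xs = sorted(sample)
    n = len(xs)
    return [((i + 0.5) / n, cdf(x)) for i, x in enumerate(xs)]

def max_pp_deviation(points):
    """Largest vertical distance of the P-P points from the diagonal."""
    return max(abs(p_emp - p_theo) for p_emp, p_theo in points)

# Assumed illustration values; the sample is a small made-up data set.
L, U, a = 3.0, 6.0, 1.5
sample = [3.1, 3.3, 3.6, 4.0, 4.5, 5.2]
points = pp_points(sample, lambda x: fspd_cdf(x, L, U, a))
print(points)
print(max_pp_deviation(points))
```

Points close to the diagonal indicate that the theoretical distribution function describes the sample well; systematic curvature indicates a poor fit.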

Simulation Study • Simulated data with • are used.

Histograms

Effects • The contaminated data extend beyond the interval [L, U] over which the error-free data occur. • The shape of the distribution is changed. • Fitting the FSPD directly to such data will therefore lead to biased estimates of L, U and the power-law exponent a.
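These effects are easy to reproduce in a small simulation. The sketch below draws error-free FSPD values by inverse-CDF sampling and adds zero-mean Gaussian errors. The values L = 3, U = 6 and a = 1.5 are an assumed reading of the true parameter values quoted in the talk; the error standard deviation `sigma`, the sample size and the seed are made up for illustration.

```python
import random

def simulate_fspd(n, L, U, a, rng):
    """Error-free FSPD values via inverse-CDF sampling (truncated Pareto)."""
    c = 1.0 - (L / U)**a
    return [L * (1.0 - rng.random() * c)**(-1.0 / a) for _ in range(n)]

rng = random.Random(2009)
L, U, a = 3.0, 6.0, 1.5   # assumed reading of "3; 6 and 1.5"
sigma = 0.4               # hypothetical measurement-error spread

x = simulate_fspd(1000, L, U, a, rng)          # error-free data
y = [xi + rng.gauss(0.0, sigma) for xi in x]   # contaminated data

# The sample minimum and maximum are natural direct estimates of L and U.
print("error-free range:  ", min(x), max(x))
print("contaminated range:", min(y), max(y))

# Fraction of contaminated observations falling outside [L, U].
outside = sum(1 for yi in y if yi < L or yi > U) / len(y)
print("fraction outside [L, U]:", outside)
```

Because the contaminated sample spills outside [L, U], estimates of the limits based on the sample extremes are pulled outwards.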

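Maximum Likelihood Estimation • Based on maximising the likelihood (or log-likelihood) of the observed data given the model. • For independent observations $y_1, \dots, y_n$, the log-likelihood of the CPDF is $\ell(\theta) = \sum_{i=1}^{n} \log f(y_i; \theta)$, where $f$ is the CPDF and $\theta$ collects the unknown parameters.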
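A deliberately simplified sketch of this idea follows. It estimates only the exponent a, holding L, U and the error standard deviation fixed at known values, and it replaces a proper numerical optimiser with a coarse grid search; the talk itself estimates the full parameter set. Gaussian errors, the truncated-Pareto form of the FSPD, and all names and numerical values are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

def fspd_pdf(x, L, U, a):
    """FSPD density on [L, U] (assumed truncated-Pareto form)."""
    if x < L or x > U:
        return 0.0
    return a * L**a * x**(-(a + 1)) / (1.0 - (L / U)**a)

def make_loglik(ys, L, U, sigma, n_steps=100):
    """Return a -> CPDF log-likelihood, with L, U and sigma held fixed."""
    step = (U - L) / n_steps
    xs = [min(U, L + i * step) for i in range(n_steps + 1)]
    # Composite Simpson weights (n_steps must be even).
    ws = [1 if i in (0, n_steps) else (4 if i % 2 else 2)
          for i in range(n_steps + 1)]
    err = NormalDist(0.0, sigma)
    # Weighted error densities h(y - x) do not depend on a: compute once.
    hs = [[w * err.pdf(y - x) for x, w in zip(xs, ws)] for y in ys]

    def loglik(a):
        gs = [fspd_pdf(x, L, U, a) for x in xs]
        return sum(
            math.log(step / 3.0 * sum(g * h for g, h in zip(gs, row)))
            for row in hs
        )
    return loglik

def simulate(n, L, U, a, sigma, rng):
    """FSPD values plus Gaussian measurement error."""
    c = 1.0 - (L / U)**a
    return [L * (1.0 - rng.random() * c)**(-1.0 / a) + rng.gauss(0.0, sigma)
            for _ in range(n)]

# Assumed illustration values; sigma, n and the seed are made up.
L, U, a_true, sigma = 3.0, 6.0, 1.5, 0.4
ys = simulate(300, L, U, a_true, sigma, random.Random(1))
loglik = make_loglik(ys, L, U, sigma)

grid = [0.2 + 0.05 * k for k in range(77)]  # candidate exponents 0.2 to 4.0
a_hat = max(grid, key=loglik)               # grid-search stand-in for an optimiser
print("estimated exponent:", a_hat)
```

In a full analysis L, U and a would be estimated jointly with a general-purpose optimiser, and standard errors would be derived from the curvature of the log-likelihood at its maximum.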

Application • Application to the data in the histogram leads • N.B.: The CPDF fitted to the data with errors gives MLEs that compare favourably with the true parameter values 3, 6 and 1.5.

Application • The methodology is illustrated by fitting the CPDF to a sample of giant molecular cloud masses in the galaxy M33 (Engargiola et al., 2003).

Molecular Clouds (Fukui et al., 2008)

Results • Masses are in units of solar masses. • Good agreement with the Engargiola et al. (2003) estimates, in particular a = 1.6 ± 0.3.

Probability-probability plot • The linear form of the P-P plot indicates that the estimated distribution fits the sample of giant molecular clouds very well.

CONCLUSION • Deconvolution is a useful statistical method for recovering an unknown distribution of X in the presence of measurement errors. • A methodology for deconvolution when X is known to be of Pareto form was developed. • Satisfactory results were obtained with the MLE method. • The price paid is that the analysis is more complicated.

Acknowledgements • Everyone who contributed to the work presented. • Prof. C. Koen (Supervisor) • Funding: SKA SA (Kim, Anna and Daphne) • University of the Western Cape (Leslie and Rennet)

Thank You Ndiyabonga Obrigado Merci
