Estimation of Pareto Distribution Functions from Samples Contaminated by Measurement Errors. Presenter: Lwando Kondlo. Supervisor: Prof. C. Koen. SKA Postgrad Bursary Conference, December 5, 2009.
Background • The model for a variable X measured with error is Y = X + ε, where Y is the observed value and ε is the measurement error. • Estimation of the density/distribution function of X is often important. • This is a classical deconvolution problem. • The specific case where X has a Pareto form is discussed.
Pareto distribution • The Pareto distribution is a model for positive data. • Examples include: • The distribution of income and wealth among individuals • The masses of molecular clouds, etc.
Pareto distribution • The finite-support Pareto distribution (FSPD) restricts the Pareto form to a finite interval [L, U], with power-law exponent a.
Pareto distribution • Distributional parameters are usually estimated by fitting the FSPD to a set of data. • This is not appropriate if the data are contaminated by measurement errors.
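As a minimal sketch, the FSPD can be written in code assuming the standard truncated Pareto form, with density g(x) = a L^a x^(-a-1) / (1 - (L/U)^a) on [L, U]. The function names (`fspd_pdf`, `fspd_cdf`, `fspd_sample`) are illustrative and are not taken from the presentation.

```python
import random


def fspd_pdf(x, L, U, a):
    """Density of a Pareto law truncated to [L, U] with exponent a > 0."""
    if x < L or x > U:
        return 0.0
    return a * L**a * x ** (-a - 1) / (1.0 - (L / U) ** a)


def fspd_cdf(x, L, U, a):
    """Distribution function of the truncated Pareto law."""
    if x <= L:
        return 0.0
    if x >= U:
        return 1.0
    return (1.0 - (L / x) ** a) / (1.0 - (L / U) ** a)


def fspd_sample(n, L, U, a, rng):
    """Draw n values by inverting the CDF at uniform random numbers."""
    c = 1.0 - (L / U) ** a
    return [L * (1.0 - rng.random() * c) ** (-1.0 / a) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    xs = fspd_sample(5, L=3.0, U=6.0, a=1.5, rng=rng)
    print([round(x, 3) for x in xs])
```

Inverse-CDF sampling is used here only because the truncated Pareto CDF has a closed-form inverse; any other sampler would serve equally well.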
Objective • To develop methodology for deconvolution when X is known to be of Pareto form. • To apply the methodology to real (radio astronomical) data.
Convolution • If X has the PDF g(.) and the measurement error ε has the PDF h(.), then Y = X + ε has a PDF given by the convolution of g and h. • When g is the FSPD, the result is the convolved PDF (CPDF).
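The convolution described on this slide can be written out as follows. This is a sketch under two assumptions that are not confirmed by the surviving slide text: X and ε are independent, and the error density h is Gaussian with zero mean and known standard deviation σ. The symbols f_Y and σ are introduced here for illustration.

```latex
% General convolution of the signal density g and the error density h
f_Y(y) = \int_{-\infty}^{\infty} g(x)\, h(y - x)\, \mathrm{d}x

% With g the truncated (finite-support) Pareto density on [L, U]
g(x) = \frac{a L^{a} x^{-a-1}}{1 - (L/U)^{a}}, \qquad L \le x \le U

% and an assumed Gaussian error density
h(e) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{e^{2}}{2\sigma^{2}}\right)

% the convolved PDF (CPDF) becomes
f_Y(y) = \frac{a L^{a}}{1 - (L/U)^{a}}
         \int_{L}^{U} x^{-a-1}\,
         \frac{1}{\sigma\sqrt{2\pi}}
         \exp\!\left(-\frac{(y - x)^{2}}{2\sigma^{2}}\right) \mathrm{d}x
```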
Convolution • The CPDF can differ substantially from the FSPD. • Probability-probability (P-P) plots, which compare observed and theoretical distribution functions, can be used to assess the fit.
Simulation Study • Simulated data, contaminated by measurement errors, are used.
Effects • The contaminated data extend beyond the interval [L,U] over which the error-free data occur • The shape of the distribution is changed • This will lead to biased estimates of L, U and power-law exponent a.
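These effects can be reproduced with a small simulation. The sketch below assumes Gaussian measurement errors with standard deviation 0.4 and a sample size of 2000, neither of which is stated in the presentation, and it reads the true values 3, 6 and 1.5 quoted on the Application slide as L, U and a in that order, which is also an assumption.

```python
import random


def simulate_contaminated(n, L, U, a, sigma, seed=0):
    """Return (x, y): error-free FSPD draws and the same draws plus noise."""
    rng = random.Random(seed)
    c = 1.0 - (L / U) ** a
    # Inverse-CDF draws from the truncated Pareto law on [L, U].
    x = [L * (1.0 - rng.random() * c) ** (-1.0 / a) for _ in range(n)]
    # Additive Gaussian measurement error (assumed error model).
    y = [xi + rng.gauss(0.0, sigma) for xi in x]
    return x, y


L_TRUE, U_TRUE = 3.0, 6.0
x, y = simulate_contaminated(2000, L=L_TRUE, U=U_TRUE, a=1.5, sigma=0.4)

# Fraction of contaminated observations falling outside [L, U].
outside = sum(1 for yi in y if yi < L_TRUE or yi > U_TRUE) / len(y)

print("error-free range:  ", round(min(x), 3), round(max(x), 3))
print("contaminated range:", round(min(y), 3), round(max(y), 3))
print("fraction outside [L, U]:", round(outside, 3))
```

Taking the sample minimum and maximum of the contaminated data as estimates of L and U would therefore understate L and overstate U, which is the bias the slide refers to.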
Maximum Likelihood Estimation • Based on maximising the likelihood (or log-likelihood) of the observed data given the model. • The log-likelihood is the sum of the logarithms of the CPDF evaluated at each observation.
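A minimal sketch of this log-likelihood follows, assuming Gaussian errors with known σ and evaluating the convolution integral numerically with a midpoint rule. To keep the example short, only the exponent a is estimated, by a coarse grid search with L, U and σ held at their simulated values; this simplification is not the procedure of the presentation, which estimates the parameters by MLE. All names and numerical settings are illustrative.

```python
import math
import random


def cpdf(y, L, U, a, sigma, m=100):
    """Convolved PDF: truncated Pareto on [L, U] plus N(0, sigma^2) error."""
    norm = a * L**a / (1.0 - (L / U) ** a)
    dx = (U - L) / m
    total = 0.0
    for i in range(m):
        x = L + (i + 0.5) * dx  # midpoint of the i-th subinterval
        z = (y - x) / sigma
        total += x ** (-a - 1) * math.exp(-0.5 * z * z)
    return norm * total * dx / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(data, L, U, a, sigma):
    """Sum of log CPDF values; the floor guards against log(0)."""
    return sum(math.log(max(cpdf(y, L, U, a, sigma), 1e-300)) for y in data)


# Simulated contaminated sample (assumed settings, not from the slides).
rng = random.Random(42)
L, U, a_true, sigma = 3.0, 6.0, 1.5, 0.4
c = 1.0 - (L / U) ** a_true
data = [
    L * (1.0 - rng.random() * c) ** (-1.0 / a_true) + rng.gauss(0.0, sigma)
    for _ in range(200)
]

# Coarse grid search over the exponent a only.
grid = [0.5 + 0.1 * k for k in range(31)]
a_hat = max(grid, key=lambda a: log_likelihood(data, L, U, a, sigma))
print("grid-search estimate of a:", round(a_hat, 1))
```

In practice a numerical optimiser over all unknown parameters would replace the grid search; the grid is used here only to keep the sketch free of external dependencies.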
Application • Fitting the CPDF to the data with errors shown in the histogram gives MLEs that compare favourably with the true parameter values of 3, 6 and 1.5.
Application • The methodology is illustrated by fitting the CPDF to a sample of giant molecular cloud masses in the galaxy M33 (Engargiola et al., 2003).
Results Masses are in units of solar masses. The estimates agree well with those of Engargiola et al. (2003); in particular, a = 1.6 ± 0.3.
Probability-probability plot • The linear form of the P-P plot indicates that the estimated distribution fits the sample of giant molecular cloud masses very well.
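The points of such a P-P plot can be computed as sketched below: each sorted observation is paired with its empirical probability and with the fitted distribution function of Y at that value. The sketch assumes Gaussian errors and uses simulated data, since the M33 sample is not reproduced here; the function names and all settings are illustrative.

```python
import math
import random


def cpdf_cdf(y, L, U, a, sigma, m=200):
    """P(Y <= y): FSPD density weighted by the Gaussian CDF (midpoint rule)."""
    norm = a * L**a / (1.0 - (L / U) ** a)
    dx = (U - L) / m
    total = 0.0
    for i in range(m):
        x = L + (i + 0.5) * dx
        phi = 0.5 * (1.0 + math.erf((y - x) / (sigma * math.sqrt(2.0))))
        total += x ** (-a - 1) * phi
    return norm * total * dx


def pp_points(data, L, U, a, sigma):
    """Pairs (empirical probability, model probability) for sorted data."""
    ys = sorted(data)
    n = len(ys)
    return [((i + 0.5) / n, cpdf_cdf(y, L, U, a, sigma)) for i, y in enumerate(ys)]


# Simulated contaminated sample (assumed settings, not the M33 data).
rng = random.Random(7)
L, U, a, sigma = 3.0, 6.0, 1.5, 0.4
c = 1.0 - (L / U) ** a
sample = [
    L * (1.0 - rng.random() * c) ** (-1.0 / a) + rng.gauss(0.0, sigma)
    for _ in range(500)
]

points = pp_points(sample, L, U, a, sigma)
max_dev = max(abs(p_emp - p_mod) for p_emp, p_mod in points)
print("largest departure from the diagonal:", round(max_dev, 3))
```

Points lying close to the diagonal, that is a small largest departure, correspond to the linear form described on the slide.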
CONCLUSION • Deconvolution is a useful statistical method for recovering the unknown distribution of X in the presence of measurement errors. • A methodology for deconvolution when X is known to be of Pareto form has been developed. • The MLE method gave satisfactory results. • The price paid is that the analysis is more complicated.
Acknowledgements • Everyone who contributed to the work presented. • Prof. C. Koen (Supervisor) • Funding: SKA SA (Kim, Anna and Daphne) • University of the Western Cape (Leslie and Rennet)
Thank You Ndiyabonga Obrigado Merci