
Figure 1 – Population Distribution of hot DB white dwarfs described by Eisenstein et al. 2006

American Astronomical Society, January 6, 2010. Continuous Probability Distribution as an Alternative to Binning of Survey Data. David J. Corliss.


Presentation Transcript


  1. AMERICAN ASTRONOMICAL SOCIETY. Continuous Probability Distribution as an Alternative to Binning of Survey Data. January 6, 2010. David J. Corliss

  2. A Typical Example of Binned Data: Population of Hot DB White Dwarfs in the Sloan Digital Sky Survey. [Figure 1 – Population distribution of hot DB white dwarfs described by Eisenstein et al. 2006. Histogram with temperature bins < 30,000 K, 30 - 35,000 K, 35 - 40,000 K, 40 - 45,000 K, and > 45,000 K; count axis from 0 to 16.]

  3. Some Amount of Information Is Lost as All Points in a Given Bin Are Treated the Same. There Is Also Some Uncertainty as to Which Bin a Given Point Belongs In. [Figure 2A – Population distribution of hot DB white dwarfs described by Eisenstein et al. 2006b; regions labeled Lower DB Gap, Middle DB Gap, and Upper DB Gap.]
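The bin-membership uncertainty described on this slide can be made concrete with a minimal sketch. It assumes each measurement is normally distributed about its observed value; the temperature, its uncertainty, and the function names are hypothetical and are not taken from the survey data.

```python
import math

def normal_cdf(x, mu, sigma):
    """Cumulative distribution of a normal with mean mu and std sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bin_probability(mu, sigma, lo, hi):
    """Probability that the true value lies between lo and hi."""
    return normal_cdf(hi, mu, sigma) - normal_cdf(lo, mu, sigma)

# Hypothetical measurement: 34,500 K with a 1,000 K uncertainty,
# sitting just below the 35,000 K bin edge.
p_low = bin_probability(34500.0, 1000.0, 30000.0, 35000.0)
p_high = bin_probability(34500.0, 1000.0, 35000.0, 40000.0)
print(f"30-35,000 K: {p_low:.3f}   35-40,000 K: {p_high:.3f}")
```

Under these assumptions the point has roughly a 69% chance of belonging to the 30 - 35,000 K bin and a 31% chance of belonging to the next one, yet a histogram counts it entirely in the first.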

  4. Kernel Density Estimate (KDE) Process: Represent Each Point as a Normal Distribution and Sum. [Figure 2B – Population distribution of hot DB white dwarfs described by Eisenstein et al. 2006b.]
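A minimal sketch of the process named on this slide: each point becomes a normal curve and the curves are added. The temperatures and uncertainties below are hypothetical, and dividing the sum by the number of points (so the curve integrates to one) is an assumption of this sketch rather than something stated on the slide.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Normal density centered on mu with standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def density(x, values, sigmas):
    """Average of one Gaussian per data point, each with its own width."""
    total = sum(gaussian_pdf(x, m, s) for m, s in zip(values, sigmas))
    return total / len(values)

# Hypothetical temperatures (K) and per-point measurement uncertainties (K).
temps = [31000.0, 34800.0, 36500.0, 42000.0]
errs = [800.0, 1200.0, 500.0, 1500.0]

step = 100.0
grid = [25000.0 + step * i for i in range(251)]   # 25,000 K to 50,000 K
curve = [density(t, temps, errs) for t in grid]
area = sum(curve) * step                           # should be close to 1
```

Unlike a fixed-bandwidth KDE, the width of each kernel here comes from the point's own measurement error, so precise measurements contribute narrow peaks and uncertain ones contribute broad ones.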

  5. Summary and Conclusions: Kernel Density Estimation
     • Creates a continuous probability density distribution by summing over Gaussian distributions for each data point, where μ is the observed value and σ is the uncertainty (standard deviation) of the individual measurement.
     • Prevents loss of information from relatively accurate measurements being placed into larger bins.
     • Incorporates the uncertainty associated with measured values into population distributions.
     • Provides a viable alternative to binning in developing population distributions for survey and other data.
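The sum described in the first bullet can be written out explicitly. In this sketch, $\mu_i$ and $\sigma_i$ are the observed value and measurement uncertainty of point $i$; the index $i$, the count $N$, and the $1/N$ normalization (which makes the result integrate to one) are notation assumed here, not taken from the slides.

```latex
\hat{f}(x) \;=\; \frac{1}{N} \sum_{i=1}^{N}
  \frac{1}{\sigma_i \sqrt{2\pi}}
  \exp\!\left[ -\frac{(x - \mu_i)^2}{2\sigma_i^2} \right]
```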

  6. References
     • Babu, G. Jogesh, Summer School in Statistics for Astronomers V Lecture Notes, Pennsylvania State University, 2009
     • Barnes, George R., Cerrito, Patricia B., The Visualization of Continuous Data Using PROC KDE and PROC CAPABILITY, SUGI, 26, 2001
     • Corliss, David J., MS Thesis, Wayne State University, 2008
     • Eisenstein, D.J., et al., 2006, ApJS, 167, 40 (Eisenstein et al. 2006a)
     • Eisenstein, D.J., et al., 2006, AJ, 132, 676 (Eisenstein et al. 2006b)
     • Sall, John – Personal communication re. the SAS KDE procedure

  7. A Final Thought: “Essentially, all models are wrong, but some are useful.” George E. P. Box (Box, G. E. P., and Draper, N. R., 1987, Empirical Model-Building and Response Surfaces, p. 424, Wiley)

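  8. libname project 'C:\SAS\Conferences';

     /* Read the series; only the first eight records appear on the slide. */
     data work.kde;
       input month day year volume;   /* list input for space-delimited values */
       cards;
     1 1 1962 589
     2 1 1962 561
     3 1 1962 640
     4 1 1962 656
     5 1 1962 727
     6 1 1962 697
     7 1 1962 640
     8 1 1962 599
     ;
     run;

     /* Keep January records; T is the time index and Y the response.        */
     /* The slide reads WORK.CRYER, which is never created; WORK.KDE assumed. */
     DATA WORK.TSERIES;
       SET WORK.KDE;
       IF MONTH = 1;
       DUMMY = 1;
       ATTRIB T INFORMAT=8.0 FORMAT=8.0;
       T = YEAR;
       ATTRIB Y INFORMAT=8.0 FORMAT=8.1;
       Y = VOLUME;
     RUN;

     /* Default statistics (N, MIN, MAX, MEAN, STD), one row per _STAT_. */
     PROC MEANS DATA=WORK.TSERIES NOPRINT;
       VAR VOLUME;
       OUTPUT OUT=WORK.RANDOM_TERM;
     RUN;

     %GLOBAL LAMBDA SIGMA;

     /* CALL SYMPUTX stores the data value in the macro variable;   */
     /* %LET inside a DATA step would store the literal text VOLUME. */
     %MACRO ASSIGNMENT;
       DATA _NULL_;
         SET WORK.RANDOM_TERM;
         IF _STAT_ = 'MEAN' THEN CALL SYMPUTX('LAMBDA', VOLUME);
         IF _STAT_ = 'STD' THEN CALL SYMPUTX('SIGMA', VOLUME);
       RUN;
     %MEND ASSIGNMENT;

     %ASSIGNMENT;
     %PUT LAMBDA = &LAMBDA.;

     DATA WORK.TEST;
       SET WORK.TSERIES;
       LAMBDA = &LAMBDA.;
       SIGMA = &SIGMA.;
     RUN;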

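  9. /* Extend WORK.TSERIES by one point from a linear trend fitted to the N most recent observations. */
     %MACRO AC(N);
       PROC SORT DATA=WORK.TSERIES;
         BY DUMMY;
       RUN;

       /* Row number at which the most recent N observations begin. */
       DATA WORK.LAST;
         SET WORK.TSERIES;
         BY DUMMY;
         IF LAST.DUMMY;
         RECENT = _N_ - &N. + 1;
         KEEP DUMMY RECENT;
       RUN;

       DATA WORK.RECENT;
         MERGE WORK.TSERIES WORK.LAST;
         BY DUMMY;
         IF _N_ GE RECENT;
         DROP RECENT;
       RUN;

       /* Linear trend of Y on T over the recent window. */
       PROC REG DATA=WORK.RECENT NOPRINT;
         MODEL Y=T;
         OUTPUT OUT=WORK.TREND PREDICTED=FORECAST RESIDUAL=RESIDUAL;
       RUN;

       /* Carry the previous row's T and perturbed forecast forward.          */
       /* The slide calls RAND(SIGMA,LAMBDA), which is not a valid RAND call;  */
       /* a normal draw with mean &LAMBDA. and std &SIGMA. is assumed here.    */
       DATA WORK.TREND;
         SET WORK.TREND;
         OUTPUT;
         T_PREVIOUS = T;
         Y_PREVIOUS = FORECAST + RAND('NORMAL', &LAMBDA., &SIGMA.);
         RETAIN T_PREVIOUS Y_PREVIOUS;
       RUN;

       /* Step the last row forward by one interval to form the new point. */
       DATA WORK.NEW;
         SET WORK.TREND;
         BY DUMMY;
         IF LAST.DUMMY;
         DELTA_T = T - T_PREVIOUS;
         T = T + DELTA_T;
         DELTA_Y = Y - Y_PREVIOUS + 1;
         Y = Y + DELTA_Y;
         KEEP T Y DUMMY;
       RUN;

       DATA WORK.TSERIES;
         SET WORK.TSERIES WORK.NEW;
       RUN;
     %MEND AC;

     %AC(5);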
