220 likes | 235 Views
This toolkit provides tools for modeling parametric fit problems, including unbinned maximum likelihood fits, PDF parameters, and toy Monte Carlo simulations. It also allows for the definition of custom PDFs and generators, as well as transformations of variables.
E N D
A Toolkit for the modeling ofMulti-parametric fit problems Luca Lista INFN Napoli
Motivation • Initially developed while rewriting a fortran fitter for BaBar analysis • Simultaneous estimate of: • B(BJ/) / B(BJ/K) • direct CP asymmetry • More control on the code was needed to justify a bias appeared in the original fitter • As much as possible of the code has to be under control and testable separately
Requirements • Provide Tools for modeling parametric fit problems • Unbinned Maximum Likelihood (UML[*]) fit of: • PDF parameters • Yields of different sub-samples • Both, mixed • 2 fits • Toy Monte Carlo to study the fit properties • Fitted parameter distributions • Pulls, Bias, Confidence level of fit results [*] not Unified Modeling Language … …
Design issues • Trying to optimize as much as possible the PDF code • Gets called a large number of times • Yes, it can be done in C++: • Addressed with Template Metaprogramming • No need to use virtual functions • The underlying minimization engine is Minuit, as always • Wrapped in different flavours (ROOT, …)
class PdfFlat { public: typedef double type; enum { variables = 1 }; PdfFlat( double a, double b ) : min( a ), max( b ) { } double operator()( type * v ) const { type x = *v; return ( x < min || x > max ? 0 : 1 / ( max - min ) ); } double min, max; }; class PdfPoissonian { public: typedef int type; enum { variables = 1 }; PdfPoissonian( double m ) : mean( m ) { } double operator()( type * v ) const { type n = *v; return ( exp( - mean ) * pow( mean, n ) / TMath::Factorial( n ) ); } double mean; }; PDF interface Variable type Returns P(x) Variable set Returns dP(x)/dx The user can define its own pdfs with the above interface
template< class Pdf, class Generator = RootRandom > class RandomGenerator { public: typedef typename Pdf::type type; RandomGenerator( const Pdf& pdf ); void generate( type * v ) const; }; template< class Generator > class RandomGenerator< PdfFlat, Generator > { public: typedef PdfFlat Pdf; typedef typename Pdf::type type; RandomGenerator( const Pdf& pdf ) : _min( pdf.min ), _max( pdf.max ) { } void generate( double * v ) const { v[ 0 ] = Generator::shootFlat( _min, _max ); } private: const double& _min, &_max; }; Random number generators Partial specialization Generic Random engine: Root, CLHEP, … The user can define its own generators with the preferred method
Combining PDFs template<class Pdf1, class Pdf2> class PdfIndependent2 { public: typedef typename Pdf1::type type; enum { variables = Pdf1::variables + Pdf2::variables }; PdfIndependent2( const Pdf1& pdf1, const Pdf2& pdf2 ) : _pdf1( pdf1 ), _pdf2( pdf2 ) { } double operator()( double * val ) const { return _pdf1( val ) * _pdf2( val + Pdf1::variables ); } private: const Pdf1 &_pdf1; const Pdf2 &_pdf2; }; template<class Pdf1, class Pdf2, class Pdf3> class PdfIndependent3 { ... }; template<class Pdf1, class Pdf2, class Pdf3, class Pdf4> class PdfIndependent4 { ... }; n1 + n2 variables Product of pdfs
Transformation of variables The Jacobian must be 1! template<class Pdf, class Transformation> class PdfTransformed { public: typedef typename Pdf::type type; enum { variables = Pdf::variables }; PdfTransformed( const Pdf& pdf, const Transformation& trans ) : _pdf( pdf ), _trans( trans ) { } double operator()( double * val ) const { double x[ variables ]; copy( val, val + variables, x ); _trans( x ); return _pdf( x ); } private: const Pdf &_pdf; Transformation _trans; }; typedef PdfIndependent2<PdfGaussian, PdfGaussian> PdfBasic; typedef PdfTransformed<PdfBasic, Rotation2D> Pdf; PdfGaussian g1( 0, 0.1 ), g2( 0, 1 ); Pdf pdf( PdfBasic( g1, g2 ), Rotation2D( M_PI / 4 ) ); Client code example
A data sample to be fitted template< int n, class type = double > class Sample { public: typedef std::vector<type *> container; typedef typename container::size_type size_type; typedef typename container::iterator iterator; typedef typename container::const_iterator const_iterator; ~Sample() { for( iterator i = begin(); i != end(); i ++ ) delete [] *i; } size_type size() const { return _v.size(); } const type* operator[]( int i ) const { return _v[ i ]; } iterator begin() { return _v.begin(); } iterator end() { return _v.end(); } const_iterator begin() const { return _v.begin(); } const_iterator end() const { return _v.end(); } type * extend() { _v.push_back( new type[ n ] ); return _v.back(); } private: container _v; }; Fixed number of variables Basically, a vector of double*
UML PDF Parameter fit const int sig = 100; double mean = 0, sigma = 1; PdfConstant p( sig ); // alternative: PdfPoissonian PdfGaussian q( mean, sigma ); Experiment<PdfConstant, PdfGaussian> experiment( p, q ); PdfGaussian pdf( mean, sigma ); Likelihood<PdfGaussian> like( pdf ); UMLParameterFitter<Likelihood<PdfGaussian> > fitter( like ); fitter.addParameter( "mean", & pdf.mean ); fitter.addParameter( "sigma", & pdf.sigma ); for ( int i = 0; i < 5000; i++ ) { Sample< 1 > sample; experiment.generate( sample ); double par[ 2 ] = { mean, sigma }, err[ 2 ] = { 1, 1 }, logLike; logLike = fitter.fit( sample.begin(), sample.end(), par, err ); double pullm = ( par[ 0 ] - mean ) / err[ 0 ]; double pulls = ( par[ 1 ] - sigma ) / err[ 1 ]; } Fixed PDF for MC generation Variable PDF For fitting Parameters linked to the fitter
Parameter fit Results (Pulls) There is a bias (as expected): 2 = 1/ni(xi-)2 1/n-1i(xi-)2
UML Yield fit const int sig = 10, bkg = 5; typedef PdfPoissonian Fluctuation; // alternative: PdfConstant Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg ); typedef PdfIndependent2< PdfGaussian, PdfGaussian > PdfSig; typedef PdfIndependent2< PdfFlat, PdfFlat > PdfBkg; PdfSig pdfSig( PdfGaussian( 0, 1 ), PdfGaussian( 0, 0.5 ) ); PdfBkg pdfBkg( PdfFlat( -5, 5 ), PdfFlat( -5, 5 ) ); typedef Experiment< Fluctuation, PdfSig > ToySig; typedef Experiment< Fluctuation, PdfBkg > ToyBkg; ToySig toySig( fluctuationSig, pdfSig ); ToyBkg toyBkg( fluctuationBkg, pdfBkg ); Experiment2< ToySig, ToyBkg > toy( toySig, toyBkg ); typedef ExtendedLikelihood2< PdfSig, PdfBkg > Likelihood; Likelihood like( pdfSig, pdfBkg ); UMLYieldFitter< Likelihood > fitter( like ); for ( int i = 0; i < 5000; i++ ) { Sample< 2 > sample; toy.generate( sample ); double s[] = { sig, bkg }, err[] = { 1, 1 }; double logLike = fitter.fit( sample.begin(), sample.end(), s, err ); double pull1 = ( s[0] - sig ) / err[0] ), pull2 = ( ( s[1] - bkg ) / err[1] ); } In 2 dimensions: Flat background Gaussian signal
Yield fit Results (Pulls) <s> = 10 <b> = 5 Discrete structure because of low statistics Poisson fluctuation
const int sig = 10, bkg = 5; typedef PdfPoissonian Fluctuation; Fluctuation fluctuationSig( sig ), fluctuationBkg( bkg ); typedef PdfIndependent2< PdfGaussian, PdfGaussian > PdfSig; typedef PdfIndependent2< PdfFlat, PdfFlat > PdfBkg; PdfGaussian g1( 0, 1 ), g2( 0, 0.5 ); PdfFlat f1( -5, 5 ), f2( -5, 5 ); PdfSig pdfSig( g1, g2 ); PdfBkg pdfBkg( f1, f2 ); typedef Experiment<Fluctuation, PdfSig> ToySig; typedef Experiment<Fluctuation, PdfBkg> ToyBkg; ToySig toySig( fluctuationSig, pdfSig ); ToyBkg toyBkg( fluctuationBkg, pdfBkg ); Experiment2<ToySig, ToyBkg> toy( toySig, toyBkg ); typedef ExtendedLikelihood2<PdfSig, PdfBkg> Likelihood; PdfGaussian G1( 0, 1 ); PdfSig pdfSig1( G1, g2 ); Likelihood like( pdfSig1, pdfBkg ); UMLYieldAndParameterFitter<Likelihood> fitter( like ); fitter.addParameter( "mean", & G1.mean ); double pull1, pull2, pull3; for ( int i = 0; i < 5000; i++ ) { Sample< 2 > sample; toy.generate( sample ); double s[] = { sig, bkg, 0 }; double err[] = { 1, 1, 1 }; double logLike = fitter.fit( sample.begin(), sample.end(), s, err ); pull1 = ( s[ 0 ] - sig ) / err[ 0 ]; pull2 = ( s[ 1 ] - bkg ) / err[ 1 ]; pull3 = ( s[ 2 ] - 0 ) / err[ 2 ]; } Combined Yield and parameter fit
Combined fit Results (Pulls) <s> = 10 <b> = 5
Support for 2 fit class Line { public: Line( double A, double B ) : a( A ), b( B ) { } double operator()( double v ) const { return a + b * v; } double a, b; }; { Line line( 0, 1 ); Chi2<Line> chi2line( line, partition ); Chi2Fitter<Chi2<Line> > fitter1( chi2line ); fitter1.addParameter( "a", &line.a ); fitter1.addParameter( "b", &line.b ); Parabola para( 0, 1, 0 ); Chi2<Parabola> chi2para( para, partition ); Chi2Fitter<Chi2<Parabola> > fitter2( chi2para ); fitter2.addParameter( "a", ¶.a ); fitter2.addParameter( "b", ¶.b ); fitter2.addParameter( "c", ¶.c ); // ... } Still no support for Correlated errors!
Application to B(BJ/) / B(BJ/K) • Four variables: • B reconstructed mass • Beam - B energy in the mass hypothesis • Beam – B energy in the K mass hypothesis • B meson charge • Two samples: • J/ , J/ ee • Simultaneous fit of: • Total yield of BJ/, BJ/K and background • Charge asymmetry • Resolution and energy shitfs separately for J/ , J/ ee
B(BJ/) / B(BJ/K), cont. • “Peculiar” distribution of kinematical variables • Non trivial variable transformation to factorize the pdfs • Different samples factorize w.r.t. different variable combinations • Real experience: • Code much more manageable and under control • Different components are testable separately • Pdf, random generation • Different approaches to the fits • Getting confidence in the fit results require massive testing
Model for independent PDFs EK D D E
Dealing with kinematical pre-selection -120 MeV < E, EK < 120MeV B A C D A B C D The area is preserved after the trasformation
Extracting the signal B J/K B J/ J/y ee events Background Likelihood projection J/y mm events
Possible improvement • Tools for upper limit extraction based on Toy Monte Carlo • Adaptable code available (from B analysis) • Support for 2 fit with full covariance matrix • Provide more “standard” pdfs and random generators • Exponential, Argus, Crystal ball, … • Generic Hit-or-miss generator, etc. • Managing singular PDF • Delta-Dirac components • Factorize out dependence on ROOT, CLHEP, etc. • Done for random generators • Can be improved for Minuit interface • More thoughts about the interface • Is passing a double* as set of PDF variables suitable?