Understanding Validation and Hypothesis Testing in Statistics

Explore the concept of validity and ground truth in statistics, learn about different types of validity, and mechanisms for assessing validity. Recommended readings provided for in-depth knowledge.

gcedeno

Presentation Transcript


  1. Week 4. Validation and hypothesis testing MSc Methodology Seminar I Felipe Orihuela-Espina

  2. VALIDATION INAOE

  3. Contents • Ground truth • Gold standard • Types of validity • Mechanisms for showing validity

  4. Recommended readings • Wikipedia! • http://en.wikipedia.org/wiki/Validity_%28statistics%29 • For a gentle overview. • A bit more formal: • Cronbach, L. J.; Meehl, P. E. (1955). "Construct validity in psychological tests". Psychological Bulletin 52 (4): 281–302 • >6000 citations • Weiner H and Braun HI (1988) “Test validity” Routledge, 267 pgs. • http://www.socialresearchmethods.net/kb/introval.php

  5. Validity and validation • Validity and validation: • Validity is the degree to which a certain measurement or model represents what it is supposed to represent; its equivalent in the real world or in nature. [Own definition] • Validity is the best approximation possible to the truth of a given proposition, inference or conclusion [http://www.socialresearchmethods.net] • Validation is the process or mechanism for assessing validity.

  6. Validity and validation • Validity and validation: • Strictly: “Measures, samples and designs don't 'have' validity -- only propositions can be said to be valid. Technically, we should say that a measure leads to valid conclusions or that a sample enables valid inferences, and so on. It is a proposition, inference or conclusion that can 'have' validity.” • [http://www.socialresearchmethods.net/kb/introval.php] • Yet, for this unit I will set this precision aside and talk about validity as a property of measurements or models.

  7. GROUND TRUTH

  8. Ground truth • The ground truth refers to the real truth of the observable phenomenon or the reality occurring in nature. [Own definition] • Some alternative definitions: • “the classification truth of each voxel” [Zou KH et al (2002), MICCAI, LNCS 2488, pp. 315–322], • …that is, the truth in each experimental unit for the purposes of classification (positive or negative). • This is particularly nice for machine learning. • “the most useful representation of the information that one wishes to convey”. [Kolaczyk ED (2009) Statistical Analysis of Network Data: Methods and Models, Springer, pg. 76] • I like this one, but it requires knowledge of the construct (i.e. that “information that one wishes to convey”)

  9. Ground truth • Some more alternative definitions: • “information collected on location” [http://en.wikipedia.org/wiki/Ground_truth] • Personally I do not like this definition, as it assumes that what you measure in the field (“on location”) is measured without bias; if there is bias, that would imply that the ground truth incorporates those measurement errors • …plus, it leaves no room for synthetic datasets. • For different reasons, other authors also share some concerns about this definition: • “ground-truth data made by hand is usually inefficient and hard to compare from one’s work to another” [Jun-qiang et al (2011), The Journal of China Universities of Posts and Telecommunications, 18(Suppl. 1): 106-111] • NOTE: “made by hand” here refers to the information collected by the researcher on location, not to synthetic data • “The usual way to obtain the ground truth is fragile, inefficient and not directly comparable from one’s work to another” [Canini et al 2009, TMA]

  10. Ground truth • Even more alternative definitions: • “ancillary data” [https://www.fas.org/irp/imint/docs/rst/Sect13/Sect13_1.html] • The best Spanish translations of the word “ancillary” are auxiliar, subordinado or secundario (auxiliary, subordinate or secondary) • …I could not disagree more with this definition… • “a representation of the agreed correct result of the ideal layout analysis method (i.e. the result of the method that, if existed, would put an end to the research problem).” [Antonacopoulos et al (2006) Document Analysis Systems, 11 pgs] • This definition has 2 clear weaknesses: • It assumes that truth depends on what the annotators think (in total opposition to objectivism). • For instance, if a few clinicians agreed that a woman is not pregnant, then she is not pregnant, despite the obvious fact that this does not depend on their opinion. • It assumes that every method has its own ground truth, which for obvious reasons does not hold…

  11. Ground truth • Yet a few more alternative definitions: • “the actual facts of a situation and is used to determine, with certainty, whether information is accurate” [Vrij 2000 in Toma et al Pers Soc Psychol Bull 2008; 34; 1023] • Hot! The best to my taste and, if I may say so, close to my own definition

  12. Ground truth • Regardless of the definition: the availability of a ground truth permits objective evaluation of criteria. • If it is available, validation involves measuring how far our metric or model is from the ground truth (ground truth approach). • With synthetic data, ground truth is always available.
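
A minimal sketch of the ground truth approach with synthetic data, under assumptions that are not in the slides: the "phenomenon" is a straight line whose slope and intercept we choose ourselves, the "model" is an ordinary least-squares fit, and "how far" is measured as the absolute error on each parameter. The names `fit_line`, `TRUE_SLOPE` and `TRUE_INTERCEPT` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic ground truth: we pick the generating parameters, so we know them.
TRUE_SLOPE, TRUE_INTERCEPT = 2.0, -1.0

rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [i / 10 for i in range(200)]
ys = [TRUE_SLOPE * x + TRUE_INTERCEPT + rng.gauss(0.0, 0.5) for x in xs]

est_slope, est_intercept = fit_line(xs, ys)

# Validation: distance between the fitted model and the known ground truth.
slope_error = abs(est_slope - TRUE_SLOPE)
intercept_error = abs(est_intercept - TRUE_INTERCEPT)
print(f"slope error: {slope_error:.3f}, intercept error: {intercept_error:.3f}")
```

With real data the two `TRUE_*` constants would be unknown, which is exactly the problem the next slide raises.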

  13. Ground truth • Problem: • With real data, the ground truth is not always available, and often its acquisition is far from trivial. • Several methods have been developed for estimating the ground truth when it is not available: • Beware! No matter how good your estimate is, it is no longer the ground truth. • A few methods for estimating the ground truth [LiX 2010]: • Voting by several annotators or judges. • Often, experts in the field • Maximum posterior probability • Minimization of variance
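
The first method in the list, voting by several annotators, could be sketched as below. This is an illustration under my own assumptions, not the procedure from [LiX 2010]: a simple majority wins, ties are reported as no consensus, and the scan names and labels are invented.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label, or None when the top labels tie."""
    if not labels:
        raise ValueError("at least one annotation is required")
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # no consensus among annotators
    return counts[0][0]


# Hypothetical annotations: one list of expert labels per item.
annotations = {
    "scan_01": ["lesion", "lesion", "healthy"],
    "scan_02": ["healthy", "healthy", "healthy"],
    "scan_03": ["lesion", "healthy"],
}

# This is an ESTIMATE of the ground truth, not the ground truth itself.
estimated = {item: majority_vote(votes) for item, votes in annotations.items()}
print(estimated)
```

Returning `None` on a tie, rather than picking a label arbitrarily, keeps the disagreement visible instead of hiding it inside the estimate.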

  14. Ground truth • Synthetic ground truth: • Although being able to generate synthetic ground truth means that you are truly in possession of the ground truth, this is not free of problems: • “ground-truth data made by hand is usually inefficient and hard to compare from one’s work to another” [Jun-qiang et al (2011), The Journal of China Universities of Posts and Telecommunications, 18(Suppl. 1): 106-111] • NOTE: Here the authors are referring strictly to information collected on location by a researcher

  15. GOLD STANDARD

  16. Gold standard • The gold standard refers to a test, metric, or model that is widely accepted by the community as a currently valid representation of reality. [Own definition] • Other definitions: • “another measure that has been used and accepted in the field” [Young et al 1995, Arch Phys Med Rehabil 76:913-918] • Relatively close to mine.

  17. Gold standard • More alternative definitions: • “a relatively irrefutable standard that constitutes recognized and accepted evidence that a certain disease exists.” [Brown et al 1996, NEJM 335(14):1049-1053] • Very good! ...were it not that they use the word “standard” to define “standard”. • …yet, it is NEJM! • “a benchmark that is regarded as definitive” [Noel et al 2009, ATM] • No gold standard is definitive; it holds only until we get a more accurate one…

  18. Gold standard • A few more alternative definitions: • “the best available method, offering accuracy, reproducibility, feasibility and a justifiable cost-benefit interrelation” [Mariath et al (2007), JAOS 15(6):529-33] • I miss the need for community consensus • Also, I do not quite understand why the cost-benefit relation is included. • “based on the judgments of expert PIs and represent the carrier’s definition of what constitutes acceptable/unacceptable aircrew performance” [Baker and Dismukes (2003), NASA, NASA/TM—2003–212809] • A bit limited to its context, but reading between the lines, a very good definition.

  19. Gold standard • Yet a few more definitions: • “the most accurate method, procedure, or measurement that is known to represent the true value of what is being tested.” [Hudson, MSc thesis, Texas A&M University, pg 5] • Very close to mine. (NOTE: I gave mine well before knowing this one…) • “a diagnostic test or benchmark that is the best available under reasonable conditions. It does not have to be necessarily the best possible test for the condition in absolute terms.” [http://en.wikipedia.org/wiki/Gold_standard_(test)] • I miss the need for community consensus.

  20. Gold standard • When ground truth is not available, validation involves measuring how far our metric or model is from the gold standard (gold standard approach). • This often links to convergent validity and, to an extent, nomological validity • Note that you compare yourself not against the state of the art, but against what is accepted by the community. • Problem: Still, there isn't always a gold standard accepted by the community.

  21. TYPES OF VALIDITY

  22. Types of validity • Types of validity: • Construct validity • Convergent validity • Criterion validity • Concurrent validity • Predictive or empirical validity • Discriminant validity • Content validity • Face validity • Representation validity • Intrinsic validity • Internal validity • External validity • Logical validity • Statistical conclusion validity • Ecological validity • Diagnostic validity • Nomological validity

  23. Types of validity • Fidelity to the phenomenon • Representative of the population • Fidelity to other metrics

  24. Types of validity Figure from: [http://www.socialresearchmethods.net/kb/introval.php]

  25. Types of validity Figure from: [www.analytictech.com]

  26. Types of validity • Construct validity: • A construct is “some postulated attribute of people assumed to be reflected in the test performance” [Cronbach and Meehl, 1955]. • Cronbach and Meehl's seminal paper was written for psychology, so when they speak of: • people, they refer to the phenomenon under study. • test performance, they refer to a model or metric capturing some aspect of the phenomenon • Developing the construct is not easy. Make it too narrow or too broad and it may invalidate your experiment.

  27. Types of validity • Construct validity: • Construct: • Example: Research process and validation according to the positivist research tradition Figure from: [Johnston and Smith http://epress.anu.edu.au/apps/bookworm/view/Information+Systems+Foundations%3A+The+Role+of+Design+Science/5131/ch02.xhtml]

  28. Types of validity • Construct validity: • “…construct validity […] perceived as the most fundamental and embracing of all types of validity” [Weiner H and Braun HI (1988) “Test validity” Routledge, 267 pgs. (pg. 26)] • Capacity of a metric or model to measure or represent faithfully the phenomenon under study and its legitimate inferences. • In other words, that you are measuring or modelling what you should be measuring or modelling, and that you are free of bias (see also internal validity) • It becomes especially critical when a concomitant criterion or universe of content is lacking. [Cronbach and Meehl, 1955]

  29. Types of validity • Construct validity: • Construct validity should be equivalent to: “Tell the truth, the whole truth, and nothing but the truth” • [http://www.socialresearchmethods.net/kb/considea.php] • We want our metric or model to represent “the construct, the whole construct [content validity], and nothing but the construct” • [http://www.socialresearchmethods.net/kb/considea.php]

  30. Types of validity • Construct validity: • Construct validity is a process, not a method [Weiner H and Braun HI (1988) “Test validity” Routledge, 267 pgs. (pg. 26)] • It requires many lines of evidence • It cannot be expressed by a single figure or coefficient • It requires both quantitative and qualitative evidence

  31. Types of validity Figure from: [http://www.socialresearchmethods.net/kb/considea.php]

  32. Types of validity • Construct validity: • Construct validity is what allows us to build a universal truth. Consequently it can be compromised by many factors: • Inadequate or ambiguous definition of the construct • Altered behaviour of the construct (e.g. appearance of new treatments) • Bias, including that of the researcher • Confounding or contamination by uncontrolled or latent variables • Confounding or contamination by factor interaction (non-main effects) • Confounding or contamination by other constructs • Breaching of the blinding through hypothesis guessing by patients or co-researchers • Mono-operation bias (that is, measuring a single dependent variable) • A single dimension will be insufficient to express the construct • Obfuscation/evaluation apprehension (changes in the responses of the phenomenon just because it is being measured/observed) • Remember: you can't measure a phenomenon without distorting it!

  33. Types of validity Figure from: [http://cmapspublic.ihmc.us/rid=1148264198734_1533533750_4916/Validez%20de%20Constructo.cmap]

  34. Types of validity • Convergent validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon. • Watch out! There may be different constructs for the same phenomenon, and thus the models may be explaining slightly different things • It is a subtype of construct validity • …but note how similar it is to criterion validity. • All metrics/models of the same phenomenon must converge to a unique ground truth.

  35. Types of validity • Criterion validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon. • Yep! Apparently the same as convergent validity • Depending on the temporal relation of the sampling between the metrics or models, it may be: • Concurrent: All observations are taken at once • Predictive: Observations of the different metrics or models are acquired at different times • All metrics/models of the same phenomenon must agree (to an extent) with the gold standard.

  36. Types of validity • Criterion validity: • Example: Suppose that you administer a number of tests (salivary cortisol, heart rate variability, skin conductance, pupil dilation) to estimate a person's stress. Let's assume you get strong correlations between each pair of these. • It is reasonable to think that these metrics share some common factor. • Observation 1: Do not take for granted that the common factor is the one you originally intended to measure (the construct) • Observation 2: The metrics do not measure the same feature of the construct (stress). Indeed, they measure different constructs, yet all the evidence points in the same direction. Figure from [positivemed.com]
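
The stress example can be sketched numerically. Everything below is an assumption made for illustration: the four tests are simulated as noisy copies of one hypothetical latent factor, the sample size and noise level are arbitrary, and the Pearson coefficient stands in for the "(cor-)relation" of the slides.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical latent factor shared by all tests (we HOPE it is stress).
latent = [rng.gauss(0.0, 1.0) for _ in range(300)]


def noisy_copy(values, noise_sd=0.5):
    """Simulate one test as the latent factor plus measurement noise."""
    return [v + rng.gauss(0.0, noise_sd) for v in values]


tests = {
    "salivary_cortisol": noisy_copy(latent),
    "heart_rate_variability": noisy_copy(latent),
    "skin_conductance": noisy_copy(latent),
    "pupil_dilation": noisy_copy(latent),
}

# Correlation between each pair of tests, as in the example.
pairwise = {
    (a, b): pearson(tests[a], tests[b]) for a, b in combinations(tests, 2)
}
for (a, b), r in pairwise.items():
    print(f"{a} vs {b}: r = {r:.2f}")
```

Strong pairwise correlations here only show that a common factor exists; as Observation 1 warns, nothing in the numbers says that this factor is stress.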

  37. Types of validity • Predictive or empirical validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon, taken at different times. • A subtype of criterion validity. • Closely related to concurrent validity: • Predictive validity has to do with a priori predictions made by the model • On the other hand, concurrent validity has to do with the correlation between already existing observations (or those made at the same time) and the a posteriori estimation of the model

  38. Types of validity • Predictive or empirical validity: • Although most of the time predictive and empirical validity are defined as one [e.g. http://www.britannica.com/EBchecked/topic/186144/empirical-validity], personally I think they may be subtly different: • Predictive validity is often regarded in terms of observations made with other metrics/models • Empirical validity is often regarded in terms of observations made in different contexts (e.g. different experimental conditions), and in this view it is related to external validity.

  39. Types of validity • Concurrent validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon, acquired at the same time.

  40. Types of validity • Discriminant validity: • Degree to which observations obtained from our metric or model differ from those obtained from other metrics or models built for a different construct. • Implies a low correlation or high statistical independence with other metrics. • i.e. you should resemble those metrics built to measure your same construct but differ clearly from those built for different constructs. • This validity is critical to delimit the construct. • “discriminant validity is […] perhaps a stronger test […] than convergent validity, because it implies a challenge from a plausible rival hypothesis” [Weiner H and Braun HI (1988) “Test validity” Routledge, 267 pgs. (pg. 27)]
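
A rough numerical sketch of the contrast between convergent and discriminant evidence, under assumptions of my own: two independent simulated constructs, three hypothetical metrics (`our_metric`, `peer_metric`, `rival_metric`), and the Pearson coefficient as the measure of relation.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(2)  # fixed seed so the run is repeatable
n = 500

# Two hypothetical, statistically independent constructs.
construct_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
construct_b = [rng.gauss(0.0, 1.0) for _ in range(n)]


def measure(construct, noise_sd=0.5):
    """Simulate a metric as the construct plus measurement noise."""
    return [v + rng.gauss(0.0, noise_sd) for v in construct]


our_metric = measure(construct_a)    # built for construct A
peer_metric = measure(construct_a)   # another metric for construct A
rival_metric = measure(construct_b)  # a metric for a different construct

r_same = pearson(our_metric, peer_metric)        # convergent evidence
r_different = pearson(our_metric, rival_metric)  # discriminant evidence
print(f"same construct: r = {r_same:.2f}")
print(f"different construct: r = {r_different:.2f}")
```

The pattern to look for is a high `r_same` together with an `r_different` near zero; a high value for both would suggest the construct has not been delimited.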

  41. Types of validity • Content validity: • Capacity of the metric or model to represent the whole universe (or population) of the phenomenon. • Note that you may have a construct intended to describe the phenomenon only partially. • Maybe your model is valid only for a small portion of the sample space or universe. It is a partial, non-universal truth, although still valid within its boundaries. • This is often obtained by non-statistical methods, and it is not necessarily solved by the experimental paradigm/design. • Several experts decide whether the observations are representative of the target universe or population. • Watch out! Experts may still be wrong…

  42. Types of validity • Face validity: • Degree to which a metric or model appears to be measuring the construct or phenomenon. • Face validity is only the entry point to content validity. It does not guarantee that you are really measuring the phenomenon. • It often incorporates a subjective load from the expert(s) • NOTE: It has been suggested that face validity should be expressed or agreed specifically by non-experts rather than experts. • [Holden, Ronald B. (2010). "Face validity". In Weiner, Irving B.; Craighead, W. Edward. The Corsini Encyclopedia of Psychology (4th ed.). Hoboken, NJ: Wiley. pp. 637–638.]

  43. Types of validity • Representation validity: • Limits of the conversion from a theoretical construct to a specific practical metric or model. • Representation validity is a measure of abstraction: how feasible is the model as a surrogate of the theoretical construct?

  44. Types of validity • Intrinsic validity: • (Cor-)relation with a criterion (expert) that has been accepted as correct. [Gulliksen (1950) American Psychologist, 5(10):511-517] • Closely related to face validity, although here there seems to be a consensus that the relation must be with an expert. • …subjectivity has been reduced in the sense that it has been “accepted” by the community. • The expert may be a gold standard.

  45. Types of validity • Internal validity: • Quality of a metric or model to allow sampling free of bias. • Unlike construct validity, it does not imply that what you are measuring is related to the modelled phenomenon; it is just concerned with measurement or modelling bias. • Internal validity is fully achieved when there are irrefutable arguments showing that the intervention has had (or has not had) a certain effect. • More often than not, it requires a controlled experiment (with a control group) • Remember, there may be confounding, e.g. other alternative hypotheses, and thus there will be no construct validity, but you may still have internal validity. • It confirms that your experiment is correctly performed • It is concerned with causality (yet it does not represent causality!) • Example: • Every time you change A under conditions C, this leads to a change in B (internal validity). That is different from saying that B is caused by A (construct validity).

  46. Types of validity • Internal validity: • Internal validity guarantees that evidence can be communicated directly. • Internal validity may be at risk when: • The analysis does not support causal relations adequately • Groups being compared are not sufficiently homogeneous • Results may not reach statistical significance • [http://ec.europa.eu/europeaid/evaluation/methodology/methods/mth_vld_es.htm#05]

  47. Types of validity • External validity: • Quality of a metric or model to permit observations that can be generalized to other metrics, models, groups, areas, periods, etc. • External validity is fully achieved when it is demonstrated that a similar intervention will have similar effects in a different context but still under the same conditions. • Normally, it requires a large number of observations, multi-center studies, random effects models, different datasets, etc. • External validity permits the transfer of knowledge and scientific evidence

  48. Types of validity • Internal and external validity seem to be in conflict: • Internal validity requires you to control as much as you can (e.g. all intervening variables) • …but that reduces the generalization capabilities, i.e. the external validity. • (and sometimes, collaterally, the ecological validity) Figure from: [http://prpj.wordpress.com/2012/03/11/threats-to-validity-of-experimental-research/]

  49. Types of validity • Logical or deductive validity: • A metric or model has logical validity if and only if it can be deduced/abduced/induced by a logical system.

  50. Types of validity • Logical system: • A set of elements and objects allowing us to make decisions. • It is composed of: • An alphabet of symbols or primitives • A grammar with construction rules made from elements of the alphabet • A set of axioms • …which in turn are also well-formed rules • A set of inference rules • A formal interpretation • [Diagram labels: Syntactic; Semantic] Dr. Felipe Orihuela Espina
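
The syntactic components listed above can be made concrete with a toy system. This is a hypothetical illustration only: the three-symbol alphabet, the implication-only grammar, the particular axioms and the choice of modus ponens as the single inference rule are all my assumptions, and the formal interpretation (the semantic part) is left out.

```python
# Alphabet: the primitive symbols of the toy system.
ALPHABET = {"p", "q", "r"}


def is_well_formed(formula):
    """Grammar: an atom, or a tuple ("->", antecedent, consequent)."""
    if isinstance(formula, str):
        return formula in ALPHABET
    return (
        isinstance(formula, tuple)
        and len(formula) == 3
        and formula[0] == "->"
        and is_well_formed(formula[1])
        and is_well_formed(formula[2])
    )


# Axioms: formulas accepted without proof; they must be well formed too.
AXIOMS = {"p", ("->", "p", "q"), ("->", "q", "r")}


def modus_ponens_closure(axioms):
    """Inference rule: from A and A -> B derive B; repeat until stable."""
    derived = set(axioms)
    changed = True
    while changed:
        changed = False
        for formula in list(derived):
            if (
                isinstance(formula, tuple)
                and formula[1] in derived
                and formula[2] not in derived
            ):
                derived.add(formula[2])
                changed = True
    return derived


assert all(is_well_formed(axiom) for axiom in AXIOMS)
theorems = modus_ponens_closure(AXIOMS)
print("r" in theorems)  # prints True: r follows from p, p -> q and q -> r
```

In the terms of the previous slide, "r" would have logical validity within this toy system because it can be deduced from the axioms, while a formula that never enters `theorems` would not.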
