Week 4. Validation and hypothesis testing. MSc Methodology Seminar I. Felipe Orihuela-Espina
VALIDATION INAOE
Contents • Ground truth • Gold standard • Types of validity • Mechanisms for showing validity
Recommended readings • Wikipedia! • http://en.wikipedia.org/wiki/Validity_%28statistics%29 • For a gentle overview. • A bit more formal: • Cronbach, L. J.; Meehl, P. E. (1955). "Construct validity in psychological tests". Psychological Bulletin 52 (4): 281–302 • >6000 citations • Wainer H and Braun HI (1988) “Test validity” Routledge, 267 pgs. • http://www.socialresearchmethods.net/kb/introval.php
Validity and validation • Validity and validation: • Validity is the degree to which a certain measurement or model represents what it is supposed to represent: its equivalent in the real world or in nature. [self definition] • Validity is the best possible approximation to the truth of a given proposition, inference or conclusion [http://www.socialresearchmethods.net] • Validation is the process or mechanism for assessing validity.
Validity and validation • Validity and validation: • Strictly, “Measures, samples and designs don't 'have' validity -- only propositions can be said to be valid. Technically, we should say that a measure leads to valid conclusions or that a sample enables valid inferences, and so on. It is a proposition, inference or conclusion that can 'have' validity.” • [http://www.socialresearchmethods.net/kb/introval.php] • Yet, for this unit I will set this precision aside and talk about validity as a property of measurements or models.
GROUND TRUTH
Ground truth • The ground truth refers to the real truth of the observable phenomenon, or the reality occurring in nature. [Self definition] • Some alternative definitions: • “the classification truth of each voxel” [Zou KH et al (2002), MICCAI, LNCS 2488, pp. 315–322], • …that is, the truth in each experimental unit for the effects of classification (positive or negative). • This is particularly nice for machine learning. • “the most useful representation of the information that one wishes to convey”. [Kolaczyk ED (2009) Statistical Analysis of Network Data: Methods and Models, Springer, pg. 76] • I like this one, but it requires knowledge of the construct (i.e. that “information that one wishes to convey”)
Ground truth • Some more alternative definitions: • “information collected on location” [http://en.wikipedia.org/wiki/Ground_truth] • Personally, I do not like this definition, as it assumes that what you measure in the field (“on location”) is measured without bias; if there is bias, the ground truth would then incorporate those measurement errors • …plus, it does not allow room for synthetic datasets. • For different reasons, other authors also share some concerns about this definition: • “ground-truth data made by hand is usually inefficient and hard to compare from one’s work to another” [Jun-qiang et al (2011), The Journal of China Universities of Posts and Telecommunications, 18(Suppl. 1): 106-111] • NOTE: “made by hand” here refers to the information collected by the researcher on location, not to synthetic data • “The usual way to obtain the ground truth is fragile, inefficient and not directly comparable from one’s work to another” [Canini et al 2009, TMA]
Ground truth • Even more alternative definitions: • “ancillary data” [https://www.fas.org/irp/imint/docs/rst/Sect13/Sect13_1.html] • The best Spanish translations of the word “ancillary” are auxiliar, subordinado or secundario (auxiliary, subordinate or secondary) • …I could not disagree more with this definition… • “a representation of the agreed correct result of the ideal layout analysis method (i.e. the result of the method that, if existed, would put an end to the research problem).” [Antonacopoulos et al (2006) Document Analysis Systems, 11 pgs] • This definition has 2 clear weaknesses: • It assumes that truth depends on what the annotators think (in total opposition to objectivism). • For instance, if a few clinicians agreed that a woman is not pregnant, then she would not be pregnant, despite the obvious fact that this does not depend on their opinion. • It assumes that every method has its own ground truth, which for obvious reasons does not hold…
Ground truth • Yet a few more alternative definitions: • “the actual facts of a situation and is used to determine, with certainty, whether information is accurate” [Vrij 2000, in Toma et al, Pers Soc Psychol Bull 2008; 34; 1023] • Hot! The best to my taste and, if I'm allowed, close to my own definition
Ground truth • Regardless of the definition, the availability of a ground truth permits an objective evaluation of criteria. • If it is available, validation involves measuring how far our metric or model is from the ground truth (ground truth approach). • With synthetic data, the ground truth is always available.
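The ground truth approach can be sketched with synthetic data, where the truth is known by construction. This is an illustration only: the sine-wave "phenomenon", the Gaussian measurement noise, the moving-average "model" and the use of root-mean-square error (RMSE) as the distance to the ground truth are all assumptions, not part of the original slides.

```python
import numpy as np


def rmse(estimate, truth):
    """Root-mean-square distance between an estimate and the ground truth."""
    estimate = np.asarray(estimate, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.sqrt(np.mean((estimate - truth) ** 2)))


# Synthetic data: we generate the phenomenon ourselves, so its ground truth is known exactly.
rng = np.random.default_rng(seed=0)
t = np.linspace(0.0, 1.0, 200)
ground_truth = np.sin(2 * np.pi * 3 * t)                # hypothetical "true" phenomenon
observed = ground_truth + rng.normal(0.0, 0.3, t.size)  # noisy measurement of it

# Hypothetical model under validation: a 9-sample moving-average smoother.
window = 9
model_output = np.convolve(observed, np.ones(window) / window, mode="same")

# Ground truth approach: how far is each output from the truth?
print(f"RMSE, raw measurement vs ground truth: {rmse(observed, ground_truth):.3f}")
print(f"RMSE, smoothed model  vs ground truth: {rmse(model_output, ground_truth):.3f}")
```

Lower RMSE means closer to the truth; with real data the `ground_truth` array is usually what we do not have.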
Ground truth • Problem: • With real data, the ground truth is not always available, and its acquisition is often far from trivial. • Several methods have been developed for estimating the ground truth when it is not available: • Beware! No matter how good your estimation is, it is no longer the ground truth. • A few methods for estimating the ground truth [Li X 2010]: • Voting by several annotators or judges. • Often, experts in the field • Maximum posterior probability • Minimization of variance
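Two of the estimation strategies listed above can be sketched in a few lines. This is only an illustration under our own assumptions: majority voting is applied per item with ties broken by first occurrence, and "minimization of variance" is read here as inverse-variance weighting of independent, unbiased estimates, which may not be exactly what [Li X 2010] means. The annotator labels and instrument readings are made up.

```python
from collections import Counter


def majority_vote(labels):
    """Consensus label for one item from several annotators' labels.

    Ties go to the label seen first, because Counter.most_common keeps
    first-encountered order among equal counts.
    """
    if not labels:
        raise ValueError("need at least one label")
    return Counter(labels).most_common(1)[0][0]


def inverse_variance_estimate(values, variances):
    """Combine independent, unbiased estimates of the same quantity.

    Weighting each estimate by 1/variance gives the linear combination with the
    smallest possible variance; that variance is 1 / sum(weights).
    """
    if len(values) != len(variances) or not values:
        raise ValueError("values and variances must be non-empty and equally long")
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, values)) / total
    return estimate, 1.0 / total


# Hypothetical annotations of three images by three experts.
annotations = {
    "img_01": ["lesion", "lesion", "healthy"],
    "img_02": ["healthy", "healthy", "healthy"],
    "img_03": ["lesion", "healthy", "lesion"],
}
consensus = {item: majority_vote(labels) for item, labels in annotations.items()}
print(consensus)  # an *estimate* of the ground truth, not the ground truth itself

# Hypothetical readings of one quantity by three instruments with known noise variances.
print(inverse_variance_estimate([10.2, 9.8, 10.5], [0.04, 0.04, 0.25]))
```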
Ground truth • Synthetic ground truth: • Although being able to generate a synthetic ground truth means that you truly are in possession of the ground truth, this is not exempt from problems: • “ground-truth data made by hand is usually inefficient and hard to compare from one’s work to another” [Jun-qiang et al (2011), The Journal of China Universities of Posts and Telecommunications, 18(Suppl. 1): 106-111] • NOTE: Here the authors are referring strictly to information collected on location by a researcher
GOLD STANDARD
Gold standard • The gold standard refers to a test, metric, or model that is widely accepted by the community as a currently valid representation of reality. [Self definition] • Other definitions: • “another measure that has been used and accepted in the field” [Young et al 1995, Arch Phys Med Rehabil 76:913-918] • Relatively close to mine.
Gold standard • More alternative definitions: • “a relatively irrefutable standard that constitutes recognized and accepted evidence that a certain disease exists.” [Brown et al 1996, NEJM 335(14):1049-1053] • Very good! ...were it not that they use the word “standard” to define “standard”. • …yet, it is NEJM! • “a benchmark that is regarded as definitive” [Noel et al 2009, ATM] • No gold standard is definitive; it is only so until we get a more accurate one…
Gold standard • A few more alternative definitions: • “the best available method, offering accuracy, reproducibility, feasibility and a justifiable cost-benefit interrelation” [Mariath et al (2007), JAOS 15(6):529-33] • I miss the need for community consensus. • Also, I do not quite understand why the cost-benefit relation is included. • “based on the judgments of expert PIs and represent the carrier’s definition of what constitutes acceptable/unacceptable aircrew performance” [Baker and Dismukes (2003), NASA, NASA/TM—2003–212809] • A bit limited to the context but, reading between the lines, a very good definition.
Gold standard • Yet a few more definitions: • “the most accurate method, procedure, or measurement that is known to represent the true value of what is being tested.” [Hudson, MSc thesis, Texas A&M University, pg 5] • Very close to mine. (NOTE: I gave mine well before knowing this one…) • “a diagnostic test or benchmark that is the best available under reasonable conditions. It does not have to be necessarily the best possible test for the condition in absolute terms.” [http://en.wikipedia.org/wiki/Gold_standard_(test)] • I miss the need for community consensus.
Gold standard • When the ground truth is not available, validation involves measuring how far our metric or model is from the gold standard (gold standard approach). • This often links to convergent validity and, to an extent, to nomological validity • Note that you compare yourself not against the state of the art, but against what is accepted by the community. • Problem: still, there is not always a gold standard accepted by the community.
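For a binary decision (e.g. disease present or absent), the gold standard approach is often summarised by how well a new test agrees with the gold standard. The sketch below assumes boolean labels and uses sensitivity, specificity and accuracy as the agreement measures; both that choice and the example labels are illustrative assumptions. These figures quantify agreement with the gold standard, not with the (unavailable) ground truth, so any error in the gold standard is silently inherited.

```python
def agreement_with_gold_standard(predicted, gold):
    """Sensitivity, specificity and accuracy of binary decisions
    (True = positive) against gold-standard labels."""
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equally long")
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)
    tn = sum(1 for p, g in pairs if not p and not g)
    fp = sum(1 for p, g in pairs if p and not g)
    fn = sum(1 for p, g in pairs if not p and g)
    return {
        # nan when the gold standard contains no positives (or no negatives)
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical: gold-standard diagnoses vs. a new, cheaper test on 8 patients.
gold_standard = [True, True, True, False, False, False, False, True]
new_test = [True, True, False, False, False, True, False, True]
print(agreement_with_gold_standard(new_test, gold_standard))
# {'sensitivity': 0.75, 'specificity': 0.75, 'accuracy': 0.75}
```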
TYPES OF VALIDITY
Types of validity • Types of validity: • Construct validity • Convergent validity • Criterion validity • Concurrent validity • Predictive or empirical validity • Discriminant validity • Content validity • Face validity • Representation validity • Intrinsic validity • Internal validity • External validity • Logical validity • Statistical conclusion validity • Ecological validity • Diagnostic validity • Nomological validity
Types of validity [Figure panels: Fidelity to the phenomenon • Representative of the population • Fidelity to other metrics]
Types of validity Figure from: [http://www.socialresearchmethods.net/kb/introval.php]
Types of validity Figure from: [www.analytictech.com]
Types of validity • Construct validity: • A construct is “some postulated attribute of people assumed to be reflected in the test performance” [Cronbach and Meehl, 1955]. • Cronbach and Meehl's seminal paper was written for psychology, so when they speak of: • people, they refer to the phenomenon under study. • test performance, they refer to a model or metric capturing some aspect of the phenomenon • Developing the construct is not easy: if it is too narrow or too broad, it may nullify your experiment.
Types of validity • Construct validity: • Construct: • Example: Research process and validation according to the positivist research tradition. Figure from: [Johnston and Smith http://epress.anu.edu.au/apps/bookworm/view/Information+Systems+Foundations%3A+The+Role+of+Design+Science/5131/ch02.xhtml]
Types of validity • Construct validity: • “…construct validity […] perceived as the most fundamental and embracing of all types of validity” [Wainer H and Braun HI (1988) “Test validity” Routledge, 267 pgs. (pg. 26)] • Capacity of a metric or model to faithfully measure or represent the phenomenon under study and its legitimate inferences. • In other words, that you are measuring or modelling what you should be measuring or modelling, and that you are free of bias (see also internal validity) • It becomes especially critical when a concomitant criterion or universe of content is lacking. [Cronbach and Meehl, 1955]
Types of validity • Construct validity: • Construct validity should be equivalent to: “Tell the truth, all the truth, and nothing but the truth” • [http://www.socialresearchmethods.net/kb/considea.php] • We want our metric or model to represent “the construct, all the construct [content validity], and nothing but the construct” • [http://www.socialresearchmethods.net/kb/considea.php]
Types of validity • Construct validity: • Construct validity is a process, not a method [Wainer H and Braun HI (1988) “Test validity” Routledge, 267 pgs. (pg. 26)] • It requires many lines of evidence • It cannot be expressed by a single figure or coefficient • It requires both quantitative and qualitative evidence
Types of validity Figure from: [http://www.socialresearchmethods.net/kb/considea.php]
Types of validity • Construct validity: • Construct validity is what allows us to build a universal truth. Consequently, it can be compromised by many factors: • Inadequate or ambiguous definition of the construct • Altered behaviour of the construct (e.g. appearance of new treatments) • Bias, including that of the researcher • Confounding or contamination by non-controlled or latent variables • Confounding or contamination by factor interaction (non-main effects) • Confounding or contamination by other constructs • Breaching of the blinding by hypothesis guessing from the patient or co-researchers • Mono-operation bias (that is, measuring a single dependent variable) • A single dimension will be insufficient to express the construct • Obfuscation/apprehension by evaluation (changes in the responses of the phenomenon just because it is being measured/observed) • Remember: you can't measure a phenomenon without distorting it!
Types of validity Figure from: [http://cmapspublic.ihmc.us/rid=1148264198734_1533533750_4916/Validez%20de%20Constructo.cmap]
Types of validity • Convergent validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon. • Watch out! There may be different constructs of the same phenomenon, and thus the models may be explaining slightly different things • It is a subtype of construct validity • …but note how similar it is to criterion validity. • All metrics/models of the same phenomenon must converge to a unique ground truth.
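Convergent validity is usually quantified with a correlation coefficient. A minimal sketch with Pearson's r written out explicitly; the two score lists are hypothetical measurements of the same subjects with a new metric and with an established one. Pearson's r only captures linear association, so a rank correlation may suit monotone but non-linear relations better (a design choice, not something the slides prescribe).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical scores of 6 subjects: our new metric vs. an established metric
# developed for the same phenomenon.
new_metric = [12.1, 15.3, 9.8, 20.4, 17.2, 11.0]
established = [30.5, 36.0, 27.1, 45.9, 40.2, 29.8]
print(f"convergent r = {pearson_r(new_metric, established):.2f}")
```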
Types of validity • Criterion validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon. • Yep! Apparently the same as convergent validity • Depending on the temporal relation of the sampling between the metrics or models, it may be: • Concurrent: all observations are taken at once • Predictive: observations of the different metrics or models are acquired at different times • All metrics/models of the same phenomenon must agree (to an extent) with the gold standard.
Types of validity • Criterion validity: • Example: Suppose that you acquire a number of tests (salivary cortisol, heart rate variability, skin conductance, pupil dilation) to estimate a person's stress. Let's assume you get strong correlations between each pair of these. • It is reasonable to think that these metrics share some common factor. • Observation 1: Do not take for granted that the common factor is the one you originally intended to measure (construct) • Observation 2: The metrics do not measure the same feature of the construct (stress). Indeed, they measure different constructs, yet all the evidence points in the same direction. Figure from [positivemed.com]
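The situation on this slide can be simulated to see why strong pairwise correlations only reveal a shared factor. In the sketch below, four hypothetical measures are generated from one latent variable plus independent noise; the loadings, noise level and sample size are arbitrary assumptions, and the negative loading for heart rate variability is likewise just an illustrative choice. The correlation matrix and the share of variance captured by its largest eigenvalue show that a common factor exists, but nothing in these numbers says that the factor is stress (Observation 1).

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_subjects = 100
names = ["cortisol", "hrv", "skin_conductance", "pupil_dilation"]

# Hypothetical generative model: one latent factor drives all four measures.
latent = rng.normal(size=n_subjects)
loadings = np.array([0.8, -0.7, 0.9, 0.6])  # arbitrary illustrative loadings
noise = rng.normal(scale=0.5, size=(n_subjects, len(names)))
measures = latent[:, None] * loadings[None, :] + noise  # shape (subjects, measures)

# Pairwise Pearson correlations between the measures (columns are variables).
corr = np.corrcoef(measures, rowvar=False)
for name, row in zip(names, corr):
    print(f"{name:>17}: " + " ".join(f"{r:+.2f}" for r in row))

# Share of total variance explained by the largest eigenvalue of the correlation
# matrix: a high value suggests a single common factor, but not *which* factor.
eigenvalues = np.linalg.eigvalsh(corr)  # ascending order
print(f"first-factor share: {eigenvalues[-1] / eigenvalues.sum():.2f}")
```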
Types of validity • Predictive or empirical validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon, taken at different times. • A subtype of criterion validity. • Closely related to concurrent validity: • Predictive validity has to do with a priori predictions made by the model • Concurrent validity, on the other hand, has to do with the correlation between already existing observations (or those made at the same time) and the a posteriori estimation of the model
Types of validity • Predictive or empirical validity: • Although predictive and empirical validity are most often defined as one [e.g. http://www.britannica.com/EBchecked/topic/186144/empirical-validity], personally I think they may be subtly different: • Predictive validity is often regarded in terms of observations made with other metrics/models • Empirical validity is often regarded in terms of observations made in different contexts (e.g. different experimental conditions), and in this view is related to external validity.
Types of validity • Concurrent validity: • (Cor-)relation of the observations of our metric or model with those coming from other metrics or models developed for the same phenomenon, acquired at the same time.
Types of validity • Discriminant validity: • Degree to which observations obtained from our metric or model differ from those obtained from other metrics or models built for a different construct. • Implies a low correlation, or high statistical independence, with metrics of other constructs. • i.e. your metric should resemble those built to measure the same construct, but differ clearly from those built for different constructs. • This validity is critical to delimit the construct. • “discriminant validity is […] perhaps a stronger test […] than convergent validity, because it implies a challenge from a plausible rival hypothesis” [Wainer H and Braun HI (1988) “Test validity” Routledge, 267 pgs. (pg. 27)]
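Convergent and discriminant validity are best read together: the new metric should correlate strongly with a metric built for the same construct and weakly with one built for a different construct. The simulation below is a hypothetical sketch; the two latent constructs, the helper `noisy`, the noise level and the sample size are all assumptions made for illustration.

```python
import numpy as np


def noisy(signal, rng):
    """Hypothetical metric: the construct plus independent measurement noise."""
    return signal + rng.normal(scale=0.5, size=signal.size)


rng = np.random.default_rng(seed=2)
n = 200

# Two hypothetical, independent constructs.
construct_a = rng.normal(size=n)  # the construct our metric is meant to capture
construct_b = rng.normal(size=n)  # some other construct

new_metric = noisy(construct_a, rng)
same_construct_metric = noisy(construct_a, rng)   # built for the same construct
other_construct_metric = noisy(construct_b, rng)  # built for a different construct

r_convergent = np.corrcoef(new_metric, same_construct_metric)[0, 1]
r_discriminant = np.corrcoef(new_metric, other_construct_metric)[0, 1]
print(f"convergent r   = {r_convergent:+.2f}  (should be high)")
print(f"discriminant r = {r_discriminant:+.2f}  (should be close to 0)")
```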
Types of validity • Content validity: • Capacity of the metric or model to represent the whole universe (or population) of the phenomenon. • Note that you may have a construct intended to describe the phenomenon only partially. • Maybe your model is valid only for a small portion of the sample space or universe. It is a partial, non-universal truth, although still valid within its boundaries. • This is often established by non-statistical methods, and it is not necessarily solved by the experimental paradigm/design. • Several experts decide whether the observations are representative of the target universe or population. • Watch out! Experts may still be wrong…
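One common way of turning "several experts decide" into a number is Lawshe's content validity ratio (CVR), offered here only as an example since the slides do not name a specific method: for each item, N experts say whether it is essential, n_e of them say yes, and CVR = (n_e - N/2) / (N/2), ranging from -1 (nobody) to +1 (everybody). The item names and vote counts below are hypothetical. As the slide warns, a high CVR only means that the experts agree, not that they are right.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR: (n_e - N/2) / (N/2), in [-1, 1]."""
    if n_experts <= 0 or not 0 <= n_essential <= n_experts:
        raise ValueError("need 0 <= n_essential <= n_experts and n_experts > 0")
    half = n_experts / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 experts rating whether each item of a stress
# questionnaire is "essential" to the construct.
votes_essential = {"sleep problems": 9, "irritability": 8, "favourite colour": 1}
for item, n_e in votes_essential.items():
    print(f"{item:>16}: CVR = {content_validity_ratio(n_e, 10):+.1f}")
```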
Types of validity • Face validity: • Degree to which a metric or model appears to be measuring the construct or phenomenon. • Face validity is only the entry point to content validity. It does not guarantee that you are really measuring the phenomenon. • It often incorporates a subjective load from the expert(s) • NOTE: It has been suggested that face validity should be expressed or agreed specifically by non-experts rather than experts. • [Holden, Ronald B. (2010). "Face validity". In Weiner, Irving B.; Craighead, W. Edward. The Corsini Encyclopedia of Psychology (4th ed.). Hoboken, NJ: Wiley. pp. 637–638.]
Types of validity • Representation validity: • Limits of the conversion from a theoretical construct to a practical, specific metric or model. • Representation validity is a measure of abstraction: how feasible is the model as a surrogate of the theoretical construct?
Types of validity • Intrinsic validity: • (Cor-)relation with a criterion (expert) that has been accepted as correct. [Gulliksen (1950) American Psychologist, 5(10):511-517] • Closely related to face validity, although here there seems to be a consensus that the relation must be with an expert. • …subjectivity has been reduced in the sense that the expert has been “accepted” by the community. • The expert may be a gold standard.
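When both the metric and the accepted expert produce categorical judgements, their agreement is often summarised with Cohen's kappa, which corrects raw agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). Using kappa here is our own choice for illustration, not something Gulliksen prescribes; the ratings below are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("ratings must be non-empty and equally long")
    # Observed agreement: fraction of items assigned the same category.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_expected == 1.0:
        return float("nan")  # both raters used one identical category: kappa undefined
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical: our metric's stress level vs. the level assigned by an accepted expert.
metric = ["high", "high", "low", "low", "high", "low", "low", "high"]
expert = ["high", "low", "low", "low", "high", "low", "high", "high"]
print(f"kappa = {cohen_kappa(metric, expert):.2f}")  # kappa = 0.50
```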
Types of validity • Internal validity: • Quality of a metric or model that allows sampling free of bias. • Unlike construct validity, it does not imply that what you are measuring is related to the modelled phenomenon; it is only concerned with measurement or modelling bias. • Internal validity is fully achieved when there are irrefutable arguments showing that the intervention has (or has not) had a certain effect. • More often than not, it requires a controlled experiment (with a control group) • Remember, there may be confounding, e.g. other alternative hypotheses, and thus there will be no construct validity, but you may still have internal validity. • It confirms that your experiment is correctly performed • It is concerned with causality (yet it does not represent causality!) • Example: • Every time you change A under conditions C, B changes (internal validity). That is different from saying that B is caused by A (construct validity).
Types of validity • Internal validity: • Internal validity guarantees that evidence can be communicated directly. • Internal validity may be at risk when: • The analysis does not adequately support causal relations • The groups being compared are not sufficiently homogeneous • The results do not reach statistical significance • [http://ec.europa.eu/europeaid/evaluation/methodology/methods/mth_vld_es.htm#05]
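The last risk, lack of statistical significance, is usually checked with a hypothesis test comparing the intervention group with the control group. The sketch below uses a two-sided permutation test on the difference of means, chosen here because it needs no distributional assumptions; the data, the number of permutations and the seed are hypothetical. A small p-value only addresses this one risk: it says nothing about whether the groups were homogeneous or whether the analysis supports a causal reading.

```python
import random
from statistics import mean


def permutation_test(control, treated, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference treated - control, p-value).
    """
    rng = random.Random(seed)
    observed = mean(treated) - mean(control)
    pooled = list(control) + list(treated)
    n_treated = len(treated)
    n_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel subjects at random, as under the null hypothesis
        diff = mean(pooled[:n_treated]) - mean(pooled[n_treated:])
        if abs(diff) >= abs(observed):
            n_extreme += 1
    # Adding 1 to both counts includes the observed labelling and avoids p = 0.
    return observed, (n_extreme + 1) / (n_permutations + 1)


# Hypothetical outcome scores in a controlled experiment.
control = [5.1, 4.9, 5.3, 5.0, 4.8, 5.2]
treated = [5.9, 6.1, 5.7, 6.0, 5.8, 6.2]
diff, p = permutation_test(control, treated)
print(f"difference = {diff:.2f}, p = {p:.4f}")
```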
Types of validity • External validity: • Quality of a metric or model that permits observations that can be generalized to other metrics, models, groups, areas, periods, etc. • External validity is fully achieved when it is demonstrated that a similar intervention will get similar effects in a different context but still under the same conditions. • Normally, it requires a large number of observations, multi-center studies, random-effects models, different datasets, etc. • External validity permits the transfer of knowledge and scientific evidence
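Random-effects models, mentioned above, are a standard way of pooling results from several centres or datasets while allowing the true effect to vary between them. The sketch below uses the DerSimonian-Laird estimator, one common choice among several (ours, not the slides'); the per-centre effect sizes and variances are hypothetical. A between-centre variance tau² clearly above zero warns that the effect does not transfer unchanged from one context to another.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate across k studies/centres.

    effects:   per-centre effect estimates y_i
    variances: their within-centre sampling variances v_i
    Returns (pooled effect, its standard error, between-centre variance tau^2).
    """
    k = len(effects)
    if k < 2 or len(variances) != k:
        raise ValueError("need at least two centres with one variance each")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # moment estimate, truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    return pooled, math.sqrt(1.0 / sum_w_star), tau2


# Hypothetical effect of the same intervention measured in four centres.
effects = [0.2, 0.8, 0.5, 1.1]
variances = [0.01, 0.01, 0.02, 0.01]
pooled, se, tau2 = dersimonian_laird(effects, variances)
print(f"pooled effect = {pooled:.2f} +/- {1.96 * se:.2f} (95% CI half-width), tau^2 = {tau2:.3f}")
```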
Types of validity • Internal and external validity seem to be in conflict: • Internal validity requires you to control as much as you can (e.g. all intervening variables) • …but that reduces the generalization capabilities, i.e. the external validity. • (and sometimes, collaterally, the ecological validity) Figure from: [http://prpj.wordpress.com/2012/03/11/threats-to-validity-of-experimental-research/]
Types of validity • Logical or deductive validity: • A metric or model has logical validity if and only if it can be deduced/abduced/induced by a logical system.
Types of validity • Logical system: • A set of elements and objects allowing us to make decisions. • It is composed of: • An alphabet of symbols or primitives • A grammar with construction rules made from elements of the alphabet • A set of axioms • …which in turn are also well-formed formulas • A set of inference rules • A formal interpretation [Figure labels: Syntactic, Semantic]
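To make the components listed above concrete, here is a toy propositional system written purely for illustration (it is not a system from the slides): the alphabet is three hypothetical atoms plus implication, the grammar says a formula is an atom or a nested ("->", A, B) tuple, the axioms are a made-up set, and the single inference rule is modus ponens. The `evaluate` function plays the role of the formal (semantic) interpretation. With respect to this toy system, a conclusion is logically valid if it appears in the deductive closure of the axioms.

```python
# Syntax: alphabet and grammar (hypothetical toy system).
ATOMS = {"p", "q", "r"}  # alphabet of primitives; "->" is the only connective


def is_well_formed(f):
    """Grammar: an atom, or a tuple ("->", antecedent, consequent) of formulas."""
    if isinstance(f, str):
        return f in ATOMS
    return (isinstance(f, tuple) and len(f) == 3 and f[0] == "->"
            and is_well_formed(f[1]) and is_well_formed(f[2]))


def deductive_closure(axioms):
    """Everything derivable from the axioms with modus ponens (from A and A->B infer B)."""
    if not all(is_well_formed(a) for a in axioms):
        raise ValueError("every axiom must be a well-formed formula")
    theorems = set(axioms)
    changed = True
    while changed:  # terminates: modus ponens only yields sub-formulas of the axioms
        changed = False
        for f in list(theorems):
            if isinstance(f, tuple) and f[1] in theorems and f[2] not in theorems:
                theorems.add(f[2])
                changed = True
    return theorems


def evaluate(f, assignment):
    """Semantics: truth value of a formula under a truth assignment to the atoms."""
    if isinstance(f, str):
        return assignment[f]
    _, antecedent, consequent = f
    return (not evaluate(antecedent, assignment)) or evaluate(consequent, assignment)


axioms = {"p", ("->", "p", "q"), ("->", "q", "r")}
theorems = deductive_closure(axioms)
print("r derivable:", "r" in theorems)
# Under an interpretation that makes every axiom true, every theorem is true as well.
model = {"p": True, "q": True, "r": True}
print(all(evaluate(t, model) for t in theorems))
```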