Introduction to Microarray Data Analysis
Outline
• Introduction
• Microarrays Technology
• Types and Uses of Microarrays
• Microarrays for the Study of Gene Expression
• Fabrication
  1. Spotted microarrays
  2. Oligonucleotide microarrays
• Experiments with Microarrays
• Flowchart of an experiment with microarrays
• Software for Microarray Data Analysis
Introduction (1) Brief review of molecular biology...
• Most life forms are made of cells. Each individual has a very large number of cells.
• Each cell contains chromosomes (e.g. human cells contain 23 pairs of chromosomes). These organized structures of DNA and proteins are the carriers of inherited information.
• A chromosome is a single piece of coiled DNA containing many genes, regulatory elements and other nucleotide sequences.
Introduction (2) What else?
• The genome of an organism is inscribed in DNA (or in RNA, in some viruses).
• A gene is the basic unit of heredity in a living organism and is the portion of the DNA that codes for a protein or an RNA.
• Each protein-coding gene is transcribed into mRNA molecules, and the mRNA is in turn translated into at least one protein.
Introduction (3) The Central Dogma of Molecular Biology
• Information flow from DNA to RNA to protein occurs in four stages:
  1. Replication: the DNA replicates its information in a process that involves many enzymes.
  2. Transcription: the DNA codes for the production of messenger RNA (mRNA).
  3. Splicing: in eukaryotic cells, the mRNA is processed and migrates from the nucleus to the cytoplasm.
  4. Translation: messenger RNA carries coded information to ribosomes. The ribosomes read this information and use it for protein synthesis.
Introduction (4) Techniques in Molecular Biology
• Molecular biology has developed multiple techniques to measure levels of RNA, DNA, proteins or metabolites, such as
  – Southern blot
  – Northern blot
  – Differential display
  – SAGE
  – …
• The post-genomics era is characterized by its capability to perform and analyze data sets from large-scale experiments simultaneously.
Introduction (5) The paradigm shift
• With the same resources we obtain a picture with lower resolution but with a view of the whole context.
• Based on the "The paradigm shift" slide from J. Dopazo (CNIO).
Introduction (6) To draw an analogy with the pre-genomics era
• Biology used to "spy" on genes in depth and individually (i.e. gene by gene).
Introduction (7) To draw an analogy with the post-genomics era
• Nowadays, a lot of genes can be "spied" on at the same time... but...
• …How can we separate the wheat from the chaff?
Microarrays Technology (1) Broadly speaking...
• Microarrays are a variety of platforms in which high-density assays are performed in parallel on a solid support.
• This technology has changed the way biologists approach problems and introduces new challenges for statisticians because of the huge quantity of data generated in each experiment.
• They have been used for all kinds of biological problems.
• The literature contains almost 8000 papers using the word "microarray" in the title.
• [Figure: publications in PubMed with the word "microarray" in the title; x-axis: year (1998 to 2010), y-axis: number of publications]
Microarrays Technology (2) The biological principle involved in microarrays...
• It is the same one that allows DNA double helices to provide the basis for heredity.
• Sequences of DNA or RNA molecules containing complementary base pairs have a natural tendency to bind together.
  ...AAAAAGCTAGTCGATGCTAG...
  ...TTTTTCGATCAGCTACGATC...
• If we know the mRNA sequence, we can build a probe for it using the complementary sequence.
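The base-pairing rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the sequence is written with DNA letters (A, C, G, T) as on the slide, whereas a real mRNA would use U instead of T.

```python
# Watson-Crick pairing for DNA bases (A-T, G-C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


def reverse_complement(seq: str) -> str:
    """Return the complement written in the opposite direction."""
    return complement(seq)[::-1]


target = "AAAAAGCTAGTCGATGCTAG"  # fragment shown on the slide
probe = complement(target)       # strand that pairs with the target
print(probe)
```

Strands pair in antiparallel orientation, so when a probe is written in the conventional 5'→3' direction it is `reverse_complement(target)` that one would use.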
Microarrays Technology (3) But... What is a microarray?
• It consists of a large set (thousands to tens of thousands) of specific sequences (known as probes or features) attached in order (array) to microscopic spots on a solid support (nylon, silicon or glass, ...).
• A probe (that can be a gene, a protein, a metabolite, ...) is used to hybridize a molecule of a nucleic acid sample (called target) under high-stringency conditions.
• Probe-target hybridization is used to determine the relative abundance of nucleic acid sequences in the targets (e.g. to determine sequences, to detect variations in gene sequences, expression levels, gene mapping, ...).
• [Figure: schematic of a microarray; probes 1 to k arranged in spots on the support, hybridized with molecule samples 1 to r]
Types and Uses of Microarrays (1) Types of microarrays
• Microarrays spatially arranged on a solid surface are the most widely used.
• Bead arrays are created by
  – either impregnating beads with different concentrations of fluorescent dye,
  – or some type of barcoding technology.
• The beads are addressable and used to identify specific binding events that occur on their surface.
Types and Uses of Microarrays (2) Uses of Microarrays (1)
• Expression analysis
  – The process of measuring gene expression via RNA (or cDNA after reverse transcription) is called expression analysis or expression profiling.
  – In these experiments, the expression levels of thousands of genes are monitored simultaneously to study the effects of certain treatments, diseases, and developmental stages on gene expression.
• Comparative Genomic Hybridization
  – Comparative genomic hybridization (CGH), or Chromosomal Microarray Analysis (CMA), is used to analyze copy number changes (increases or decreases) of important chromosomal fragments harboring genes involved in diseases.
• Mutation analysis
  – A single-base difference between two sequences is known as a Single Nucleotide Polymorphism (SNP), and detecting such differences is known as SNP detection.
  – Using gDNA, this kind of array tries to detect genes that might differ from each other by as little as a single nucleotide base.
Types and Uses of Microarrays (2) Uses of Microarrays (2)
• [Figure: examples of array types]
  – Protein array
  – Tissue array
  – CGH arrays
  – SNP array (Affymetrix)
  – CNV array (Illumina)
  – Expression arrays: cDNA nylon membrane array, GeneChip Affymetrix array, cDNA Agilent array
Types and Uses of Microarrays (3) Application of Microarrays
• Gene discovery
  – Identifying new genes and learning about their function and expression levels under different conditions.
• Molecular classification of complex diseases
  – Classifying types of cancer on the basis of the patterns of gene activity in the tumor cells, in order to develop more effective drugs.
• Drug discovery
  – Comparative analysis of the genes from a diseased and a normal cell helps identify the biochemical constitution of the proteins synthesized by the diseased genes. This information can be used to synthesize drugs that combat these proteins and reduce their effect.
• Toxicological research
  – Microarray technology provides a robust platform for research into the impact of toxins on cells and how that impact is passed on to the progeny.
Microarrays for the Study of Gene Expression (1) What is gene expression?
• Gene expression is the presence of the products of a gene, in the form of mRNA (or protein), in a cell.
• To put it plainly: since cells contain the same genetic information, what makes brain cells different from heart cells is gene expression.
Microarrays for the Study of Gene Expression (2) Finding Differentially Expressed Genes (DEG)
• To find genes that display a large difference in gene expression between two conditions and are homogeneous within them
  – Typically, statistical tests (t-test, Wilcoxon test) are used
  – If there are more than two conditions, or if conditions are nested, the appropriate statistical method is ANOVA
  – p-values from these tests have to be corrected for multiple testing
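The slide states that p-values must be corrected for multiple testing but does not name a procedure. As one possible illustration, the sketch below applies the Benjamini-Hochberg false discovery rate adjustment; the choice of method, the function name and the example p-values are all assumptions, not part of the original slides.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # hypothetical per-gene p-values
adj = benjamini_hochberg(raw)
# Genes kept at a 5% false discovery rate (threshold is an assumption).
significant = [i for i, p in enumerate(adj) if p <= 0.05]
print(adj, significant)
```

With thousands of genes tested at once, an uncorrected 0.05 threshold would flag many genes by chance alone, which is why some adjustment of this kind is applied before a list of DEGs is reported.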
Microarrays for the Study of Gene Expression (3) Exploratory data analysis (1)
• To find groups that are not defined yet (e.g. novel disease subtypes)
• Methods from this field
  – were the first to be used for microarray data
  – should be used only if no prior knowledge exists that could be incorporated
  – find patterns in the data, but any patterns, whether they are meaningful or not
  – include
    • Clustering (hierarchical and partitioning)
    • Projection (PCA, MDS)
• Alizadeh et al. Nature 403:503–511 (2000)
Microarrays for the Study of Gene Expression (4) Time series, partitioning clustering and correlation
• Usually used to find patterns of co-expressed genes
• The meaning of "time series" is different for
  – biologists: 2-10 time points
  – statisticians: >200 time points
• "Non-optimal" solution: to use clustering methods to find such patterns
  – Note that they are by no means exhaustive, and that no significance measure can be attached to them
  – In contrast to Estimation of Distribution Methods (EDA), partitioning cluster methods are more popular (e.g. K-means or self-organizing maps)
• To seek genes whose expression profile is similar to that of a paradigmatic gene, correlations can be calculated and the genes sorted by them. There is no need for clustering.
• Special methods exist for periodic changes (⇒ cell cycle), e.g. Fourier analysis
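The correlation-based search mentioned on this slide can be sketched as follows. The slide does not say which correlation coefficient is meant; Pearson correlation is assumed here, and the gene names and expression values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Raises ZeroDivisionError if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_by_correlation(profiles, reference):
    """Sort all other genes by decreasing correlation with the reference gene."""
    ref = profiles[reference]
    scores = {g: pearson(p, ref) for g, p in profiles.items() if g != reference}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical expression profiles over four time points.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 4.0],
}
ranking = rank_by_correlation(profiles, "geneA")
for gene, r in ranking:
    print(gene, round(r, 3))
```

Genes at the top of the ranking follow the reference profile most closely; genes at the bottom, with correlation near -1, follow the opposite pattern.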
Microarrays for the Study of Gene Expression (5) Classification
• When information about the grouping of the samples is available, it can (and should) be used to get improved results
• Groupings may be:
  – Treatment and Control
  – Disease and Normal
  – Disease stage 1, 2, 3
  – Mutant and Wild Type
  – Good and Poor Outcome, Therapy success or failure
  – ...
• One learns characteristic patterns from a training set and evaluates them by predicting the classes of a test set
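The train-then-test workflow described on this slide can be illustrated with a very simple classifier. The slide does not prescribe any particular method; the nearest-centroid rule below is an assumption chosen for brevity, and the two-gene expression values and labels are invented.

```python
def centroid(samples):
    """Per-gene mean of a list of expression vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def fit(train_x, train_y):
    """Learn one centroid per class from the training set."""
    groups = {}
    for x, y in zip(train_x, train_y):
        groups.setdefault(y, []).append(x)
    return {label: centroid(rows) for label, rows in groups.items()}


def predict(model, x):
    """Assign a sample to the class with the closest centroid."""
    def squared_distance(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical expression of two genes in labeled samples.
train_x = [[1.0, 5.0], [1.2, 4.8], [4.0, 1.0], [4.2, 0.8]]
train_y = ["control", "control", "treatment", "treatment"]
test_x = [[0.9, 5.1], [3.9, 1.2]]
test_y = ["control", "treatment"]

model = fit(train_x, train_y)
predicted = [predict(model, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(predicted, test_y)) / len(test_y)
print(predicted, accuracy)
```

The key point is that the test samples play no part in `fit`; accuracy measured on the training samples themselves would overstate how well the learned patterns generalize.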
Microarrays for the Study of Gene Expression (6) Survival Analysis
• To find patterns that are associated with prolonged patient survival time
• Instead of treating outcome as a binary variable, one can use
  – the overall survival time, or
  – the event-free survival time
  as continuous variables, and try to estimate them by regression
• Since the risk of suffering a relapse decreases with time, linear regression models are almost always inappropriate; specialized models would be better:
  – Cox regression
  – Regression trees
Microarrays for the Study of Gene Expression (7) Pharmacogenomics
• To find molecular predictors that tell about the probable success (or failure) of a certain therapy, e.g.
  – estrogen receptor status for tamoxifen (antihormone) therapy
  – HER2/NEU status for herceptin therapy in breast cancer
• One may regard treatment outcome as a discrete variable and use classification methods
• Sometimes it is convenient not to wait for the final endpoint (which may be years away), but to use surrogate variables, e.g.
  – the drop in the blood level of a certain protein
  – reduction in tumor volume
Fabrication: Two main technologies
• There are many types of technologies, but the principles are the same
• The most used are spotted arrays and in situ arrays
• Spotted arrays (aka cDNA arrays or Stanford arrays)
  – Previously synthesized cDNAs or oligonucleotides are deposited on the chip
  – Based on "printing-like" technologies
• In situ arrays (aka oligo arrays or Affy arrays)
  – Probes are synthesized directly on the chip
  – Based on photolithographic techniques
  – Affymetrix arrays are the best known... but not the only ones!
Spotted Arrays (1) From the chips to the images
• Chip Design and Production
• Sample Preparation
• Hybridization
• Scanning and Capturing Images
• Image Analysis
• Quantification
Spotted Arrays (2) Chip design and construction
• Production begins with the selection of the "probes" to be printed on the array. In general, they are chosen from
  – GenBank (http://www.ncbi.nlm.nih.gov/)
  – dbEST (http://www.ncbi.nlm.nih.gov/UniGene/index.html)
• cDNAs are printed on the array
• Each spot can contain unique sequences
• "Printing" means adhering sequences to the spots
• A movie of the printing process is available here
Spotted Arrays (3) Sample preparation
  1. RNA is extracted from the samples
  2. This RNA is converted to fluorescently labeled cDNA by reverse transcription in the presence of fluorescently labeled nucleotide precursors
  3. RNA from each sample is labelled with a different fluorescent dye (e.g. Cy-3, Cy-5) to allow direct comparison
  4. After labeling, the samples are mixed and hybridized with the sequences (probes) on the array
Spotted Arrays (4) Hybridization with probes
• Targets are labeled and combined
• A movie of the hybridization process is available here
Spotted Arrays (5) Scanning and capturing image
• After hybridization, each DNA spot is illuminated and fluorescence measurements are taken for each dye separately
• These measurements will be used, after the appropriate quality controls, to determine the relative abundance of the sequence of each specific gene in the two mRNA or DNA samples
Spotted Arrays (6) Image analysis (1)
• TIFF images are processed by image analysis programs
  – SPOT,
  – GenePix
  – ...
  to acquire intensity values for each spot
• These measures will be used, after the appropriate quality controls, to determine the relative abundance of the sequence of each specific gene in the two mRNA or DNA samples
Spotted Arrays (7) Image analysis (2) Steps in Image Processing
  1. Addressing: estimate the location of spot centers
  2. Segmentation: classify the pixels of each spot as
     • foreground (signal)
     • background (noise)
  3. Information extraction (quantification): for each spot $g$ on the array, and each dye, obtain
     – Signal measurements ($R_g$, $G_g$)
     – Background measurements ($bgR_g$, $bgG_g$)
     – Quality indicators
Spotted Arrays (8) Quantification
• Gene expression is measured from the intensity measures as the relative (corrected) intensity of one dye vs the (corrected) relative intensity of the other:
  $M_g = \dfrac{R_g}{G_g}$, $\quad M_g^{\text{corrected}} = \dfrac{R_g - bgR_g}{G_g - bgG_g}$
• Background correction may be needed, or not, according to the array quality
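As a small worked example of the ratio on this slide, the sketch below computes the background-corrected relative intensity for one spot. The function name and the intensity values are hypothetical. Taking the base-2 logarithm of the ratio is a common convention in two-color analysis but is an addition here, not something the slide specifies.

```python
from math import log2


def corrected_ratio(r, g, bg_r=0.0, bg_g=0.0):
    """Return (R - bgR) / (G - bgG), or None if a corrected signal is not positive."""
    red = r - bg_r
    green = g - bg_g
    if red <= 0 or green <= 0:
        # Background exceeds foreground: the spot cannot be quantified this way.
        return None
    return red / green


# Hypothetical intensities for a single spot.
ratio = corrected_ratio(r=1200.0, g=400.0, bg_r=200.0, bg_g=150.0)
log_ratio = log2(ratio)  # assumption: log2 scale makes up- and down-regulation symmetric
print(ratio, log_ratio)
```

Leaving `bg_r` and `bg_g` at zero gives the uncorrected ratio, which matches the slide's remark that background correction may or may not be applied depending on array quality.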
Spotted Arrays (8) Overview of the process
• A movie of the whole process is available here
In situ Chips (1) From the chips to the images
• Main Concepts
• Synthesis of Oligos on the Chip
• Sample Preparation
• Hybridization Process
• Scanning Images
• Output Images
• Quantification and Expression Measures
In situ Chips (1) Main concepts (1)
• More advanced design than spotted cDNA arrays
  – They are NOT based on competitive hybridization. That is: one chip, one sample
  – Probes are NOT added to the chip after being synthesized in vitro
• Main idea: probes are synthesized in situ (on the chip)
• Sequences are built up on the chip surface by sequentially elongating a growing chain, one nucleotide at a time, using photolithography
• The chemical yield of the stepwise elongation is limited
  – Sequences can NOT grow beyond 25-mer length (oligo)
  – 16-20 different 25-mer sequences are needed to uniquely characterize a gene
    • Probe = individual 25-mer sequence
    • Probe set = set of 25-mers corresponding to a particular gene/EST
In situ Chips (2) Main concepts (2)
• Affymetrix (http://www.affymetrix.com) is the leading company for this kind of chip. They call them GeneChips
• Each gene is represented by a set of short sequences
• Some of these chips contain whole genomes, that is, >50,000 probe sets
• A probe set (usually written "probeset") is used to measure the mRNA levels of a unique gene
• Each probe set is made up of multiple probe cells
  – with millions of copies of one oligo (25 bp)
  – organized in probe pairs with
    • a Perfect Match (PM): matches perfectly with a piece of a gene
    • a Mismatch (MM): the same as the PM but with the central nucleotide changed to its complement
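The PM/MM relationship described above can be made concrete with a short sketch that derives a mismatch probe from a perfect-match probe by complementing its central base. The function name and the 25-mer used here are hypothetical; real probe sequences come from the array manufacturer's annotation.

```python
# Watson-Crick pairing for DNA bases (A-T, G-C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm: str) -> str:
    """Return the MM probe: the PM probe with its central base complemented."""
    if len(pm) % 2 == 0:
        raise ValueError("probe length must be odd to have a central base")
    mid = len(pm) // 2  # index 12 for a 25-mer
    return pm[:mid] + COMPLEMENT[pm[mid]] + pm[mid + 1:]


pm = "ATGCATGCATGCATGCATGCATGCA"  # hypothetical 25-mer perfect-match probe
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

Because the two probes differ at a single position, signal on the MM probe is meant to approximate non-specific hybridization, which is why PM and MM are read as a pair.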
In situ Chips (4) GeneChip® expression array design
In situ Chips (5) One gene, one probe set
• Probes are selected to be specific to the represented gene
• They must have good hybridization properties
• [Figure: probes selected along the gene sequence]
In situ Chips (6) Synthesis of oligos on the chip (1)
• GeneChip® probe arrays are manufactured through a unique and robust process, a combination of photolithography and combinatorial chemistry
• Image courtesy of Affymetrix
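In situ Chips (7) Synthesis of oligos on the chip (2)
• [Figure: successive photolithographic masks, each exposing selected positions on the GeneChip so that one nucleotide (A, C, G or T) is added to the growing oligos]
• Image from a course by Dan Nettleton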
In situ Chips (8) Synthesis of oligos on the chip (3)
• Several copies of a single feature are deposited in each cell
• Image courtesy of Affymetrix
In situ Chips (9) Sample preparation
In situ Chips (8) Hybridization process
• Once the oligos have been synthesized, hybridization is performed by adding mRNA from the tissue to be analyzed onto the chip
• Image courtesy of Affymetrix
In situ Chips (9) Scanning Images
• Scanning of tagged and un-tagged probes on an Affymetrix GeneChip® microarray
• Image courtesy of Affymetrix
In situ Chips (10) Output Image
• Data from an experiment showing the expression of thousands of genes on a single GeneChip® probe array
• Image courtesy of Affymetrix
In situ Chips (11) Quantification
• Intensities from each element are extracted
• Quantitative analysis of the hybridization results is performed by analyzing the hybridization pattern of the set of PM and MM probes of every gene
• In contrast with spotted chips, the expression measures used here are absolute ones. That is, each chip is hybridized with only one tissue at a time
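In situ Chips (12) Absolute expression measures
• Measures to determine the quantitative RNA abundance, i.e. the expression level, based on the average of the differences PM minus MM for each probe family:
  $\text{Avg.Diff} = \dfrac{1}{|A|} \sum_{j \in A} (PM_j - MM_j)$
  where $A$ denotes the set of probe pairs considered for the probe family
• Many alternatives have been introduced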
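The average difference can be computed directly from the paired PM and MM intensities of one probe set. The sketch below is a minimal illustration with invented intensities; it takes A to be all probe pairs supplied, which is an assumption, since the slide does not say how A is chosen.

```python
def average_difference(pm, mm):
    """Mean of PM - MM over the probe pairs of one probe set."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


# Hypothetical intensities for four probe pairs of one gene.
pm = [1200.0, 950.0, 1100.0, 1010.0]
mm = [300.0, 250.0, 400.0, 310.0]

# Differences are 900, 700, 700, 700, so the mean is 3000 / 4.
avg_diff = average_difference(pm, mm)
print(avg_diff)
```

Note that the result can be negative when MM intensities exceed PM intensities, which is hard to interpret as an abundance.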
Spotted vs In situ Arrays: PROs and CONs
• cDNA microarrays
  – PROs
    • Cheaper
    • Flexibility with the experimental design
    • High signal intensity (large sequences)
  – CONs
    • Low reproducibility
    • Cross-hybridization (low specificity)
    • High manipulation (possibility of contamination)
• Oligo microarrays
  – PROs
    • Quick manufacture (automated)
    • High reproducibility
    • High specificity
    • A lot of probes per gene
  – CONs
    • Requires more specialized equipment
    • Expensive
    • Low flexibility
In situ Chips (13) Overview of the process
• A movie of the whole process is available here
• Image courtesy of Affymetrix