Flow Cytometry and Reproducible Analysis Cliburn Chan Department of Biostatistics and Bioinformatics, DUMC
Reproducible Analysis • Can someone in a different lab replicate your results? • Can someone else in your lab replicate your results? • Can you replicate your own results • 6 months later? • When FlowJo goes from version 10.0 to 11.0? • When your lab catches fire and all your computers melt into toxic waste?
Complexity of flow analysis • Experimental design • Running the experiment • Raw data (FCS files) • Compensation • Transformation • Gating strategy • Gate MFIs and relative frequencies • Statistical analysis – e.g. correlation with outcomes
Experimental design • Is randomization done correctly? • Is the sample size sufficient? • Is there an SOP for annotating the experiment? • MIATA • MIFlowCyt • What is the informatics strategy to ensure that data are recorded accurately and backed up safely?
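Randomization is only reproducible if the allocation can be regenerated later. A minimal Python sketch, assuming a simple design with hypothetical subject IDs and arm names: a dedicated generator seeded with a recorded value produces the same balanced allocation each time it is run.

```python
import random


def randomize(subject_ids, groups, seed):
    """Return a balanced, reproducible {subject: group} allocation.

    Record `seed` with the experiment annotation so the allocation
    can be regenerated later.
    """
    rng = random.Random(seed)  # private generator; leaves the global one untouched
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    # Deal shuffled subjects round-robin so group sizes differ by at most one
    return {sid: groups[i % len(groups)] for i, sid in enumerate(shuffled)}


# Hypothetical subjects and arms, for illustration only
subjects = [f"Rat{n:02d}" for n in range(1, 9)]
allocation = randomize(subjects, ["Control", "Treated"], seed=20130507)
```

Python only guarantees that `random()` reproduces its sequence for a given seed, not that `shuffle` behaves identically across versions, so it is safer to save both the seed and the resulting allocation table.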
Running the experiment • Stuff I know little about … • Janet and Jennifer will teach in this workshop • Instrument calibration • Bridging studies • Reagent qualification • Use of appropriate biological controls • Use of appropriate technical controls
Raw data (FCS files) • Is there a file naming SOP that is followed? • Is there an SOP for recording FCS metadata? • Channel labels – fluorochrome, antibody, FMO
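Channel labels are stored as keywords in the TEXT segment of each FCS file ($PnN for the detector or short name, $PnS for the stain or long name), so they can be checked against the SOP with a script. A minimal sketch, assuming an FCS 3.x file whose TEXT offsets are in the header and whose values contain no escaped (doubled) delimiters; a dedicated FCS reader would handle the full standard.

```python
def parse_fcs_text(data: bytes) -> dict:
    """Return keyword/value pairs from the TEXT segment of FCS file bytes.

    Minimal sketch: assumes the TEXT offsets are stored in the header and
    that no value contains an escaped (doubled) delimiter.
    """
    if data[:3] != b"FCS":
        raise ValueError("not an FCS file")
    # Header bytes 10-17 and 18-25 hold the TEXT start/end byte offsets as ASCII
    text_start = int(data[10:18].decode("ascii").strip())
    text_end = int(data[18:26].decode("ascii").strip())
    text = data[text_start:text_end + 1].decode("utf-8", errors="replace")
    delimiter = text[0]  # the first TEXT byte defines the delimiter
    fields = text[1:].split(delimiter)
    if fields and fields[-1] == "":
        fields.pop()  # the segment normally ends with a trailing delimiter
    # FCS keyword names are case-insensitive, so normalise them to upper case
    return {key.upper(): value for key, value in zip(fields[0::2], fields[1::2])}


def channel_labels(keywords: dict) -> list:
    """(short name $PnN, stain name $PnS) for each parameter n = 1..$PAR."""
    n_params = int(keywords["$PAR"])
    return [(keywords.get(f"$P{n}N", ""), keywords.get(f"$P{n}S", ""))
            for n in range(1, n_params + 1)]


def read_fcs_keywords(path: str) -> dict:
    """Read a whole FCS file from disk and parse its TEXT segment."""
    with open(path, "rb") as fh:
        return parse_fcs_text(fh.read())
```

For example, `channel_labels(read_fcs_keywords("Tube01.fcs"))` (hypothetical file name) lists each detector with its stain label, so missing or inconsistent fluorochrome/antibody labels stand out.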
Compensation, transformation and gating strategy • Compensation: $\mathrm{Real} = \mathrm{Spillover}^{-1} \times \mathrm{Observed}$ • Transformation is complicated – it can be thought of as linear for low values and logarithmic for high values • Gating strategy is hard to replicate, but can be stored as a template and “re-used” with tweaking • Compensation, transformation and gating should be done on a per-batch rather than a per-file basis • I would recommend storing the workspace containing this data in both .jo and .xml formats
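The compensation formula on this slide is a linear solve, and the linear-then-log behaviour of the transformation can be illustrated with an arcsinh transform. A minimal pure-Python sketch, assuming `spillover[i][j]` is the fraction of fluorochrome j's signal that appears in detector i (so Observed = Spillover × Real); software differs on whether it stores this matrix or its transpose, so the convention should be checked. The arcsinh transform and its cofactor of 150 are assumptions, a simpler stand-in for the biexponential/logicle transforms used by analysis software such as FlowJo.

```python
import math


def compensate(spillover, observed):
    """Solve spillover x real = observed, i.e. Real = Spillover^-1 x Observed.

    Uses Gaussian elimination with partial pivoting rather than
    forming the inverse matrix explicitly.
    """
    n = len(observed)
    # Augmented matrix [Spillover | Observed]
    a = [[float(v) for v in row] + [float(obs)] for row, obs in zip(spillover, observed)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("spillover matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system
    real = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * real[c] for c in range(r + 1, n))
        real[r] = (a[r][n] - s) / a[r][r]
    return real


def arcsinh_transform(x, cofactor=150.0):
    """Approximately linear near zero, approximately log(2x/cofactor) for large x."""
    return math.asinh(x / cofactor)
```

With a two-colour spillover matrix `[[1.0, 0.1], [0.2, 1.0]]`, observed signals of 105 and 70 compensate back to 100 and 50.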
Working with statisticians • At some point, a statistician is likely to be asked to analyze your data. This can lead to much unhappiness. • Statisticians do not like Excel • The first thing they will try to do is export to a CSV or delimited file, for import into SAS or R • If this is difficult to do, they will not like you
Excel rules for happy statisticians • 1 worksheet = 1 table • 1 cell = 1 value • Data/metadata = comprehensive & consistent • Formatting = None • Validation = Yes
1 worksheet = 1 table • A table has column headers and a number of rows and nothing else – it is RECTANGULAR • Do not put more than 1 table in a worksheet • Do not use non-rectangular tables • Example of a good worksheet
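Whether an exported worksheet is rectangular can be checked automatically. A minimal sketch using Python's `csv` module; the rules it applies (non-blank, unique headers; every row the same width as the header; no blank separator rows) are one reading of this slide, not a standard.

```python
import csv
import io


def check_rectangular(csv_text):
    """Return a list of problems that make a CSV table non-rectangular."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["table is empty"]
    header = rows[0]
    problems = []
    if any(name.strip() == "" for name in header):
        problems.append("header has a blank column name")
    if len(set(header)) != len(header):
        problems.append("header has duplicate column names")
    for row_no, row in enumerate(rows[1:], start=2):
        if not any(cell.strip() for cell in row):
            # A blank row often separates a second table in the same sheet
            problems.append(f"row {row_no} is blank")
        elif len(row) != len(header):
            problems.append(f"row {row_no} has {len(row)} cells, expected {len(header)}")
    return problems
```

A sheet with a summary block pasted under the data, for example, is reported as a blank row followed by a row of the wrong width.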
1 cell = 1 value • Easy to filter by tube, sample or subject • Easy to write validation rules or lookup table
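With one value per cell, filtering and validation become one-liners against named columns. A minimal sketch; the column names (`Subject`, `SampleType`, `Tube`, `Count`), the lookup table of allowed sample types and the missing-data markers are all hypothetical and would come from the lab's own SOP.

```python
import csv
import io

# Hypothetical lookup table and missing-data markers; adapt to the lab's SOP
ALLOWED_SAMPLE_TYPES = {"PBMC", "Whole blood"}
MISSING_MARKERS = {"NA", "NaN"}


def read_rows(csv_text):
    """Parse CSV text into a list of {column: value} dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def validate(rows):
    """Return human-readable errors; an empty list means every rule passed."""
    errors = []
    for row_no, row in enumerate(rows, start=2):  # row 1 is the header
        sample_type = row.get("SampleType", "")
        if sample_type not in ALLOWED_SAMPLE_TYPES:
            errors.append(f"row {row_no}: unknown SampleType {sample_type!r}")
        count = row.get("Count", "")
        if count in MISSING_MARKERS:
            continue
        if not count.isdecimal():
            errors.append(f"row {row_no}: Count {count!r} is not a non-negative integer")
    return errors


def filter_rows(rows, column, value):
    """Select rows where `column` equals `value`, e.g. all cells from one tube."""
    return [row for row in rows if row.get(column) == value]
```

Because the lookup table is exact, a lower-case "pbmc" is flagged rather than silently accepted.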
1 cell = 1 value • The ID column packs 3 different values into one cell • Recovering them requires text parsing – very error prone
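To see why packed IDs are fragile, here is a minimal sketch of the parsing a statistician would have to write, assuming a hypothetical layout `<Subject>_<SampleType>_T<Tube>` (e.g. `S01_PBMC_T3`). It fails loudly on any deviation rather than guessing.

```python
import re

# Hypothetical composite ID layout: "<Subject>_<SampleType>_T<Tube>", e.g. "S01_PBMC_T3"
ID_PATTERN = re.compile(r"(?P<subject>[A-Za-z0-9]+)_(?P<sample>[A-Za-z]+)_T(?P<tube>\d+)")


def split_id(composite_id):
    """Recover Subject, SampleType and Tube from one packed ID cell.

    Raises ValueError for any ID that deviates from the assumed layout,
    rather than silently guessing.
    """
    match = ID_PATTERN.fullmatch(composite_id.strip())
    if match is None:
        raise ValueError(f"ID {composite_id!r} does not match the expected layout")
    return {
        "Subject": match["subject"],
        "SampleType": match["sample"],
        "Tube": int(match["tube"]),
    }
```

A different separator, or a sample type containing a space such as "Whole blood", already breaks this pattern; storing the three values in three columns avoids the problem entirely.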
Data: column names • Use consistent column names across worksheets, not variants such as: • Singlets/Lymphocytes • Singlet/Lymphs • Singlets / Lymphocytes • Singlets/Lymphoctyes • Use the full gating path for the column name • Singlets/Lymphocytes/Viable/CD4+/CM/IFN+
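Near-duplicate column names like these can be caught before the data are handed over. A minimal sketch using the standard library's `difflib` to suggest the closest name from an agreed canonical list (the list here is hypothetical); the 0.6 similarity cutoff is difflib's default and an arbitrary choice.

```python
import difflib


def check_column_names(columns, canonical):
    """Map each non-canonical column name to its closest canonical name, or None."""
    report = {}
    for name in columns:
        if name in canonical:
            continue
        matches = difflib.get_close_matches(name, canonical, n=1, cutoff=0.6)
        report[name] = matches[0] if matches else None
    return report


# Hypothetical canonical gating-path names agreed for the study
canonical = ["Singlets/Lymphocytes", "Singlets/Lymphocytes/Viable"]
```

Names reported with a suggestion are probably typos or spacing variants; names reported as `None` need a human decision.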
Data: What to record • Better to have more data than less • Sample type (PBMC, whole blood) • Recovery • Viability • Better to have basic than derived data • Counts are better than relative frequencies • Keep a link to the raw data for reproducibility • Path to FCS and workspace files on the server • Use a special indicator for missing data (e.g. NaN), not zero • Use as many columns as you need and name them sensibly and consistently
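Keeping raw counts and an explicit missing-data marker means relative frequencies can always be recomputed, and a missing value is never averaged in as zero. A minimal sketch; the accepted markers ("NA", "NaN") and the helper names are assumptions.

```python
MISSING_MARKERS = {"NA", "NAN"}  # compared case-insensitively


def parse_count(cell):
    """Return an int count, or None for an explicit missing-data marker.

    A blank cell raises ValueError: missing data should be marked explicitly.
    """
    text = cell.strip()
    if text.upper() in MISSING_MARKERS:
        return None
    return int(text)


def relative_frequency(child_count, parent_count):
    """Frequency of a gate relative to its parent gate; None if undefined."""
    if child_count is None or parent_count is None or parent_count == 0:
        return None
    return child_count / parent_count


def mean_ignoring_missing(values):
    """Mean of the non-missing values, or None if all are missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None
```

With the counts 250, NaN and 350, the mean of the recorded values is 300; recording the missing value as 0 would give 200.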
Data: Versioning • Do not change the data in the worksheet once it has been handed to the statistician. • If there are errors that must be corrected, make a new copy, label the filename with the date and version, and send that to the statistician • ArcticRatExperiment_07May2013_Version01.xlsx • ArcticRatExperiment_17May2013_Version02.xlsx
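The naming convention in these examples is easy to generate and check by script. A minimal sketch, assuming the `<Experiment>_<ddMonYYYY>_Version<NN>.xlsx` pattern shown on the slide; the SHA-256 checksum is an added assumption, one way to confirm later that a file already handed over has not been edited.

```python
import hashlib
import re
from datetime import date

# Fixed English month abbreviations, so the name does not depend on the locale
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def versioned_filename(experiment: str, day: date, version: int, ext: str = "xlsx") -> str:
    """Build a name such as ArcticRatExperiment_07May2013_Version01.xlsx."""
    stamp = f"{day.day:02d}{MONTHS[day.month - 1]}{day.year}"
    return f"{experiment}_{stamp}_Version{version:02d}.{ext}"


def latest_version(filenames, experiment, ext="xlsx"):
    """Highest version number among files following the convention, or None."""
    pattern = re.compile(
        re.escape(experiment) + r"_\d{2}[A-Z][a-z]{2}\d{4}_Version(\d+)\." + re.escape(ext)
    )
    versions = []
    for name in filenames:
        match = pattern.fullmatch(name)
        if match:
            versions.append(int(match.group(1)))
    return max(versions, default=None)


def sha256_hex(data: bytes) -> str:
    """Checksum of in-memory bytes."""
    return hashlib.sha256(data).hexdigest()


def sha256_of_file(path: str) -> str:
    """Checksum of a file on disk, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Sending the checksum along with each version lets both sides confirm they are analysing the same file.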
Formatting • Don’t do it. • Avoid conveying information through: • Highlighting • Fancy spacing • Different fonts and font effects • Merging cells • Comments • Will it survive a round-trip from Excel to CSV and back again?
Formatting - After the round-trip • Comments are lost • Highlighting is lost • Bad cell formatting is lost • Merged cells become missing information
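Only cell values survive a CSV round-trip, and Excel keeps the value of a merged range in its top-left cell only. A minimal sketch that simulates the round-trip with Python's `csv` module and flags the empty cells a merged "Subject" cell leaves behind; the data are hypothetical.

```python
import csv
import io


def round_trip(rows):
    """Write rows to CSV text and read them back, as an export/import would."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return list(csv.reader(io.StringIO(buffer.getvalue())))


def blank_cells(rows):
    """(row, column) positions of empty cells, e.g. left behind by a merged cell."""
    return [(r, c) for r, row in enumerate(rows)
            for c, cell in enumerate(row) if cell == ""]


# A merged "Subject" cell spanning three rows exports its value once,
# leaving the other two rows without a subject (hypothetical data)
exported = round_trip([
    ["Subject", "Tube", "Count"],
    ["S01", "1", "250"],
    ["", "2", "300"],
    ["", "3", "275"],
])
```

Here rows 2 and 3 come back with no subject, and every value, including numbers, comes back as plain text.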
Summary of Reproducible Analysis • Know what you are doing from PBMC to Excel • SOPs are important • Annotation is important • Excel is OK if you use NONE of its features • Keep all necessary data in the same place • Keep a remote backup • Talk with your statistician
Biologist talks to Statistician http://www.youtube.com/watch?v=Hz1fyhVOjr4