1. Molecular Profiling Colloquium Janos Demeter
December 15, 2006
2. HEEBO/MEEBO arrays in SMD
Entering doping control data into SMD
Quality control graphs
Synthetic gene tool to compare data from cDNA and HEEBO/MEEBO arrays
Merge pcl files tool
Current state of annotation
3. HEEBO/MEEBO arrays in SMD: Nomenclature
Source: http://alizadehlab.stanford.edu
Sequence_id: hSQnnnnnn
In SMD: cloneid
Meaningless, but unique to a given oligo sequence
Oligo_id: hXXnnnnnn (unique to a well, not a sequence)
In SMD: oligo_id
The XX codes have meaning:
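Because the sequence ID and the oligo (well) ID share one pattern, a small parser helps keep them apart in scripts. This is a minimal sketch that assumes both IDs are "h" + a two-letter code + six decimal digits, as the hSQnnnnnn / hXXnnnnnn templates suggest; the function name and the example IDs are hypothetical, and the meaning of individual XX codes is not modelled.

```python
import re
from typing import NamedTuple

# Assumption: "n" stands for a decimal digit and the two-letter code is uppercase.
_ID_RE = re.compile(r"h([A-Z]{2})(\d{6})")


class ParsedId(NamedTuple):
    code: str             # "SQ" for sequence IDs, otherwise the well-level XX code
    number: int           # numeric part of the ID
    is_sequence_id: bool  # True for hSQ... IDs (cloneid in SMD)


def parse_heebo_id(identifier: str) -> ParsedId:
    """Split a HEEBO-style ID (hypothetical helper) into its code and number."""
    match = _ID_RE.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a recognised HEEBO-style ID: {identifier!r}")
    code, digits = match.groups()
    return ParsedId(code, int(digits), code == "SQ")
```

For example, parse_heebo_id("hSQ000123") yields a sequence ID with number 123, while an ID with any other two-letter code is treated as a well-level oligo_id.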
4. HEEBO/MEEBO arrays in SMD: Connect to SOURCE with oligo/seqID
5. HEEBO/MEEBO arrays in SMD: Entering doping control data
HEEBO/MEEBO arrays contain many different controls
To take advantage of the doping controls, it is essential to know the amounts that were added to your samples
SFGF tells you how much is in 1 microliter of doping control mix, but amplification/dilution might change that
SMD needs to know how much you added to the sample relative to how much SFGF tells you to add
Additional complication: SFGF supplies 4 tubes (MJ and Ambion_Stratagene, each for Cy3 and Cy5)
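The slides do not give a formula for the value SMD needs, but one plausible reading is a simple ratio: the amount of doping control actually in the sample divided by the amount SFGF recommends. The sketch below assumes exactly that; the function and the fold_change parameter (standing in for any amplification or dilution of the control mix) are hypothetical.

```python
def doping_factor(recommended_ul: float, added_ul: float, fold_change: float = 1.0) -> float:
    """Hypothetical helper: actual amount of doping control / SFGF-recommended amount.

    recommended_ul: microliters of doping control mix SFGF suggests adding
    added_ul:       microliters actually added to the sample
    fold_change:    assumed correction for amplification (>1) or dilution (<1)
    """
    if recommended_ul <= 0:
        raise ValueError("recommended volume must be positive")
    return added_ul * fold_change / recommended_ul
```

Following the recommendation exactly gives a factor of 1; adding 1.5 µL where 1 µL was recommended gives 1.5.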
6. HEEBO/MEEBO arrays in SMD: Entering doping control data
The experiment entry form can capture all of this
DCV2.1 = DCV2.1_Ambion_Stratagene + DCV2.1_MJ
If there was no amplification, follow the SFGF suggestion and enter:
DCV2.1, factor1=1, factor2=1
If the controls were amplified/diluted, enter values for each tube:
DCV2.1_MJ, factor1=1.5, factor2=1.6
DCV2.1_A_S, factor1=1.932, factor2=0.8
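The two cases above differ only in whether one combined DCV2.1 line or one line per tube is entered. A hypothetical helper that writes the values in the same notation as this slide (for example, for lab notes) could look like this; it only formats text and does not talk to SMD, whose entry form is web-based.

```python
from typing import Dict, List, Optional, Tuple


def doping_entries(per_tube: Optional[Dict[str, Tuple[float, float]]] = None,
                   mix: str = "DCV2.1") -> List[str]:
    """Hypothetical formatter for 'NAME, factor1=x, factor2=y' lines."""
    if not per_tube:
        # No amplification/dilution: one line for the combined mix, both factors 1.
        return [f"{mix}, factor1=1, factor2=1"]
    # Otherwise one line per tube, e.g. DCV2.1_MJ and DCV2.1_A_S.
    return [f"{mix}_{tube}, factor1={f1:g}, factor2={f2:g}"
            for tube, (f1, f2) in per_tube.items()]
```

Calling doping_entries({"MJ": (1.5, 1.6), "A_S": (1.932, 0.8)}) reproduces the two per-tube lines shown above.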
8. HEEBO/MEEBO arrays in SMD: Quality control graphs
HEEBO/MEEBO quality assessment graphs from a Bioconductor package (Agnes Paquet/UCSF)
Per-array graphs that use doping, tiling, mismatch and negative controls
For batches/uploaded gpr files: can be reached from the main page
For individual experiments: from the data display page
For new experiments with doping controls: graphs are created automatically when the data are loaded
The latest set of graphs is available from the view experiment page
9. HEEBO/MEEBO arrays in SMD: Quality control graphs
Can be used for a gpr file uploaded from the desktop; the print has to be present in SMD, and the oligo_ids must be in the id/name column
Can be run in batch for a result set list on loader.stanford.edu
If called for a specific experiment, the values are already filled in.
Normalization options are available from limma. Note that this will NOT change your data in SMD; it is only used to generate the graphs
Background subtraction methods: same as for normalization (used only to generate the graphs)
The job is placed in the job queue, and an email with a link is sent
13. MA-plots before and after normalization
A = 1/2*(log2(Cy5) + log2(Cy3))
M = log2(Cy5 / Cy3)
Loess curves are shown for each sector (print-tip group) if print-tip normalization was selected
After normalization, the distribution should be centered around M = 0, with no intensity dependence
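The M and A definitions above translate directly into code. The sketch below computes them per spot and then applies global median centering, which is a deliberately simpler technique than the loess and print-tip loess options from limma: it removes an overall dye bias but not an intensity-dependent one. Function names are illustrative.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def ma_values(cy5: Sequence[float], cy3: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (M, A) per spot: M = log2(Cy5/Cy3), A = (log2 Cy5 + log2 Cy3) / 2."""
    m, a = [], []
    for red, green in zip(cy5, cy3):
        if red <= 0 or green <= 0:
            continue  # log2 undefined; this sketch drops the spot from both lists
        m.append(math.log2(red / green))
        a.append(0.5 * (math.log2(red) + math.log2(green)))
    return m, a


def median_center(m: Sequence[float]) -> List[float]:
    """Global median normalization: shift all M values so their median is 0."""
    shift = statistics.median(m)
    return [value - shift for value in m]
```

Plotting median_center(m) against a gives an MA-plot centered on M = 0; any remaining curvature would be what the loess options address.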
14. Tiling probes were designed along the transcripts of 17 human genes (actin: 6 … LRP1: 89 oligos)
Non-normalized signal intensities (Cy5 and Cy3) vs. probe’s distance from 3’-end
A quick drop in signal indicates a problem with the sample (degradation/IVT)
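One way to put a number on "quick drop" is the least-squares slope of log2 intensity against the probe's distance from the 3' end. This is a sketch of that idea, not the method used by the Bioconductor graphs; the per-kilobase scaling and the max_drop_per_kb threshold are hypothetical choices.

```python
import math
from typing import Sequence


def signal_slope_per_kb(distance_from_3p: Sequence[float], intensity: Sequence[float]) -> float:
    """Least-squares slope of log2(intensity) vs. distance from the 3' end (per kb)."""
    pts = [(d / 1000.0, math.log2(i)) for d, i in zip(distance_from_3p, intensity) if i > 0]
    if len(pts) < 2:
        raise ValueError("need at least two probes with positive intensity")
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("probes must lie at different distances")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return sxy / sxx


def looks_degraded(distance_from_3p: Sequence[float], intensity: Sequence[float],
                   max_drop_per_kb: float = 1.0) -> bool:
    """Flag a steep loss of signal away from the 3' end (threshold is hypothetical)."""
    return signal_slope_per_kb(distance_from_3p, intensity) < -max_drop_per_kb
```

Run separately on the Cy5 and Cy3 intensities, this shows which sample, if either, loses signal toward the 5' end.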
15. Mismatch and tiling probes are used to test the degree of cross-hybridization among homologous probes
Mutations are anchored (at the extremities) or distributed (along the transcript)
Calculated binding energies vs. normalized raw intensities (i.e. divided by the median of the corresponding wild-type probes)
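The normalization on the y-axis is a single division. A minimal sketch, with hypothetical probe names: values near 1 suggest the mismatch probe binds about as well as the wild type (cross-hybridization), while values well below 1 suggest the mutations reduce binding.

```python
import statistics
from typing import Dict, Sequence


def normalize_to_wild_type(mismatch: Dict[str, float],
                           wild_type: Sequence[float]) -> Dict[str, float]:
    """Divide each mismatch probe's raw intensity by the median of its wild-type probes."""
    reference = statistics.median(wild_type)
    if reference <= 0:
        raise ValueError("median wild-type intensity must be positive")
    return {probe: value / reference for probe, value in mismatch.items()}
```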
16. Observed vs. expected log-ratios (normalized and background-corrected) for each doping control group
Ratios should be aligned on the diagonal
Graphs for individual doping controls are available as well
These show the range over which log(intensity ratio) vs. log(mass ratio) is linear
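"Aligned on the diagonal" can be summarized by fitting observed = slope × expected + intercept: the ideal is slope 1 and intercept 0, and a slope below 1 indicates ratio compression. The ordinary least-squares sketch below is an illustration, not necessarily what the graphs compute; fitting it over progressively narrower ranges of expected ratios is one way to estimate where the relationship stays linear.

```python
from typing import Sequence, Tuple


def fit_line(expected: Sequence[float], observed: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit: observed = slope * expected + intercept."""
    n = len(expected)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(expected) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in expected)
    if sxx == 0:
        raise ValueError("expected log-ratios must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(expected, observed))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_diagonal_deviation(expected: Sequence[float], observed: Sequence[float]) -> float:
    """Largest |observed - expected|; 0 means every point lies on the diagonal."""
    return max(abs(o - e) for e, o in zip(expected, observed))
```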
17. HEEBO/MEEBO arrays in SMD: Synthetic gene tool
There is a help page for using the synthetic gene tool:
http://smd.stanford.edu/help/synthGenes.shtml
A "synthetic gene" is a group of "reporters" (clones, oligos, ORFs, etc.), together with some method of combining their expression vectors. It is a very useful tool, with great flexibility in combining data rows.
One use: comparing data from different platforms, e.g. oligo and cDNA prints.
Available from the repository and applicable to a pcl file.
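As a data structure, a synthetic gene is just a name, a list of member reporters and a combining rule. The sketch below assumes the simplest rule, a column-wise arithmetic mean that skips missing values (blank PCL cells); the class name, field names and reporter IDs are hypothetical, and the help page describes the tool's actual options.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence

Row = Sequence[Optional[float]]  # one expression vector; None marks a missing value


@dataclass
class SyntheticGene:
    """Hypothetical model: a named group of reporters combined by averaging."""
    name: str
    reporters: List[str] = field(default_factory=list)

    def combine(self, data: Dict[str, Row]) -> List[Optional[float]]:
        """Column-wise mean of the member reporters' vectors, skipping missing values."""
        rows = [data[r] for r in self.reporters if r in data]
        if not rows:
            return []
        combined: List[Optional[float]] = []
        for column in zip(*rows):  # assumes all vectors have the same length
            present = [v for v in column if v is not None]
            combined.append(sum(present) / len(present) if present else None)
        return combined
```

Grouping an oligo reporter and a cDNA clone under one synthetic gene is what makes the cross-platform comparison possible.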
18. HEEBO/MEEBO arrays in SMD: Synthetic gene tool
How to use it to compare HEEBO and cDNA arrays:
Select experiments from cDNA and HEEBO prints
The biological annotation you select is not important for collapsing the data
What is important: include uid
Save the pcl file in your repository
19. HEEBO/MEEBO arrays in SMD: Synthetic gene tool
PCL file sorted by name column
The synthetic gene tool only looks at the first column
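Because only the first column is used, whatever identifier sits there is what gets matched against the prepared lists. This is a minimal reader that keys each row by its first column. It assumes the usual tab-delimited PCL layout (key column, NAME, optional GWEIGHT, then one column per experiment, plus an optional EWEIGHT row); the function name and sample IDs are hypothetical.

```python
import csv
import io
from typing import List, Optional, Tuple

PclRow = Tuple[str, List[Optional[float]]]  # (first-column key, expression values)


def read_pcl(text: str) -> Tuple[List[str], List[PclRow]]:
    """Parse tab-delimited PCL text, keying every data row by its first column only."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    # Assumed layout: key column, NAME, optional GWEIGHT, then experiment columns.
    has_gweight = len(header) > 2 and header[2].strip().upper() == "GWEIGHT"
    first_data = 3 if has_gweight else 2
    experiments = header[first_data:]
    rows: List[PclRow] = []
    for fields in reader:
        if not fields or fields[0].strip().upper() == "EWEIGHT":
            continue  # skip blank lines and the optional EWEIGHT row
        values = [float(v) if v.strip() else None for v in fields[first_data:]]
        values += [None] * (len(experiments) - len(values))  # pad short rows
        rows.append((fields[0], values))
    return experiments, rows
```

Keys are returned in file order and duplicates are kept, so collapsing is left to the grouping step.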
20. HEEBO/MEEBO arrays in SMD: Synthetic gene tool
To access the tool, click the “synth” icon in the repository
Rows can be collapsed based on a number of prepared lists; here, LocusLink should be selected
The default option removes the original IDs and annotations and replaces the rows with their average
21. HEEBO/MEEBO arrays in SMD: Synthetic gene tool
The default option averages the rows and removes the original annotations
22. HEEBO/MEEBO arrays in SMD: Synthetic gene tool
Rows can be collapsed by any arbitrary grouping of genes
Prepared lists are available for:
chromosomal locations
cytobands
locusid
clusterid
transcript length groups
cancer modules (E. Segal)
tissue types
processes
any other gene list in the user’s genelist directory on loader
The name of the gene list becomes the name of the synthetic gene.
Individual reporters can be weighted (-1 to 1)
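With weights between -1 and 1, the combination becomes a weighted average in which a negative weight flips a reporter's contribution. The sketch below assumes normalization by the sum of absolute weights of the reporters present in each column; the tool's exact formula is not given here, and the function name is hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def weighted_synthetic_gene(data: Dict[str, Sequence[Optional[float]]],
                            weights: Dict[str, float]) -> List[Optional[float]]:
    """Column-wise weighted mean (assumed normalization: sum of |weights|)."""
    for reporter, w in weights.items():
        if not -1.0 <= w <= 1.0:
            raise ValueError(f"weight for {reporter} must be between -1 and 1")
    members = [(data[r], w) for r, w in weights.items() if r in data and w != 0]
    if not members:
        return []
    result: List[Optional[float]] = []
    for j in range(len(members[0][0])):  # assumes equal-length vectors
        pairs = [(row[j], w) for row, w in members if row[j] is not None]
        total_weight = sum(abs(w) for _, w in pairs)
        result.append(sum(v * w for v, w in pairs) / total_weight if total_weight else None)
    return result
```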
23. HEEBO/MEEBO arrays in SMD: Synthetic gene tool
Average rows (reporters) by synthetic gene and:
don’t remove original data rows
remove averaged data rows (but keep the ones that don’t belong to any synth gene)
remove all original data rows
Don’t average; only annotate the rows with the synthetic gene annotation (prepended to the name column):
keep/don’t keep original annotation
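The four output choices differ only in which original rows survive next to the averaged ones. This sketch models them over (uid, name, values) rows, assuming that the default corresponds to "remove all original data rows". The mode strings, the " | " separator used when prepending, and the function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, List[Optional[float]]]  # (uid, name/annotation, values)


def _mean_columns(vectors: List[List[Optional[float]]]) -> List[Optional[float]]:
    out: List[Optional[float]] = []
    for column in zip(*vectors):
        present = [v for v in column if v is not None]
        out.append(sum(present) / len(present) if present else None)
    return out


def collapse(rows: List[Row], groups: Dict[str, str], mode: str = "remove_all") -> List[Row]:
    """Average rows by synthetic gene; mode is 'keep', 'remove_averaged' or 'remove_all'."""
    if mode not in ("keep", "remove_averaged", "remove_all"):
        raise ValueError(f"unknown mode: {mode}")
    members: Dict[str, List[List[Optional[float]]]] = defaultdict(list)
    for uid, _, values in rows:
        if uid in groups:
            members[groups[uid]].append(values)
    synthetic: List[Row] = [(g, g, _mean_columns(v)) for g, v in members.items()]
    if mode == "keep":
        kept = list(rows)                              # don't remove original rows
    elif mode == "remove_averaged":
        kept = [r for r in rows if r[0] not in groups]  # keep rows outside any synth gene
    else:
        kept = []                                      # remove all original rows
    return synthetic + kept


def annotate_only(rows: List[Row], groups: Dict[str, str], keep_original: bool = True) -> List[Row]:
    """No averaging: prepend the synthetic gene name to the name column."""
    out: List[Row] = []
    for uid, name, values in rows:
        gene = groups.get(uid)
        if gene is None:
            out.append((uid, name, values))
        else:
            out.append((uid, f"{gene} | {name}" if keep_original else gene, values))
    return out
```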
24. HEEBO/MEEBO arrays in SMD: Merge PCL files
Combine two (or more) pcl files into a single pcl file
Files can be on the desktop or in the repository
In the process:
average (optionally) columns (experiments) with the same name
average (optionally) rows (genes) based on a translation file
Averaging can be mean or median
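Both averaging steps come down to one idea: every cell that ends up with the same (row, column) key after translation is reduced with a mean or a median. The sketch below works on in-memory tables rather than files and, for brevity, always applies both averages (the real tool makes each optional). The table layout, function name and IDs are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, List, Optional

Table = Dict[str, Dict[str, Optional[float]]]  # row id -> {experiment column -> value}


def merge_pcl(tables: List[Table], translate: Optional[Dict[str, str]] = None,
              how: str = "mean") -> Table:
    """Merge tables, averaging cells that share a (translated) row id and a column name."""
    if how not in ("mean", "median"):
        raise ValueError("how must be 'mean' or 'median'")
    combine = statistics.mean if how == "mean" else statistics.median
    translate = translate or {}
    cells: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for table in tables:
        for row_id, columns in table.items():
            target = translate.get(row_id, row_id)  # ids without a translation are kept
            for column, value in columns.items():
                bucket = cells[target][column]  # create the cell even if value is missing
                if value is not None:
                    bucket.append(value)
    return {row_id: {col: (combine(vals) if vals else None) for col, vals in cols.items()}
            for row_id, cols in cells.items()}
```

Translating an oligo ID and a cDNA clone ID to the same LocusLink ID, as in the test, merges them into one row.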
25. HEEBO/MEEBO arrays in SMD: Merge PCL files
26. HEEBO/MEEBO arrays in SMD: State of annotation
MEEBO: annotation is complete and in SMD
HEEBO: annotation is complete, but some oligo annotations are not in SMD yet.
Annotations: geneid (locusid)
gene name
gene symbol
chromosome location (in gff file)
GB accession (RefSeq/est)
Problem: ~500 oligos are annotated to more than one gene (~1000 spots involved); the database cannot currently represent these cases correctly. Conflicting fields are not entered into SMD.
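The current one-annotation-per-sequence rule can be mimicked by merging an oligo's candidate gene annotations and leaving every conflicting field empty, which matches "conflicting fields are not entered". This is a sketch with hypothetical field names and gene symbols; a field present in one candidate but missing from another also counts as a conflict here.

```python
from typing import Dict, List, Optional

Annotation = Dict[str, Optional[str]]


def single_annotation(candidates: List[Annotation]) -> Annotation:
    """Collapse an oligo's gene annotations into one record, blanking conflicting fields."""
    if not candidates:
        return {}
    fields = set().union(*candidates)  # every field name seen in any candidate
    merged: Annotation = {}
    for name in sorted(fields):
        values = {c.get(name) for c in candidates}
        merged[name] = values.pop() if len(values) == 1 else None  # conflict -> not entered
    return merged
```

A schema that maps one sequence to several genes would instead keep the whole candidate list.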
27. HEEBO/MEEBO arrays in SMD: State of annotation
For each sequence (sequence_id), we can have only one set of annotations.
We have developed a new biosequence schema for SMD, to model the relationships between sequences, genes and genomes in a more biologically meaningful manner. Among other things, the new schema will allow us to map one sequence to more than one gene.
We are currently migrating existing sequence annotations to tables using the new biosequence schema. Once this is finished (soon), all the biological annotations for the HEEBO arrays will be available in SMD.
28. HEEBO/MEEBO arrays in SMD: State of annotation
Updates
Genome coordinates: when a new genome version is released, the oligos need to be BLASTed again to find their coordinates (last time: spring 2005; MEEBO: 2004). New genome releases have come out 1-3 times a year. Result: mapping of oligos to chromosomal locations.
Biological annotations: annotations need to be updated to capture new knowledge. Result: mapping of chromosomal coordinates to genes.
Currently, no updates are done for the sequences on the HEEBO/MEEBO arrays. They will be worked out once the new biosequence tables are in place.