Quiz #1 • Describe the structure of the nucleotides and list the bases participating in their formation • Living things create and maintain order by • Releasing heat in the environment • Absorbing light • Communicating with each other • None of the above
Quiz #1 • Describe the goals and methods of Genomic Signal Processing • Transcription is the process of • Using the genetic code to produce protein • Copying segments of DNA into RNA strands • Attachment of DNA polymerase to DNA • All of the above
Quiz #1 • Define and describe RNA post-transcriptional modifications • A codon is • A small protein • Part of the ribosome • A word made of 3 nucleotides • None of the above
Quiz #1 • Discuss how gene regulation is achieved • A DNA probe is • A short single-stranded DNA sequence • A small RNA molecule • A specific transcription factor • None of the above
Quiz #1 • Describe the work-flow of microarray experimentation • Which variation in data has to be reduced • From the chip platform • From the imaging procedures • From the methods of hybridization • All of the above
Pattern Recognition “Mathematical, statistical, and computational methods that attempt to automatize the way humans routinely recognize familiar patterns.” • Machine Learning • Decision Theory • Pattern Classification • Pattern Analysis • Data Mining • Artificial Intelligence
Human PR is Sometimes “Too Good” • Picture taken by the Viking spacecraft in 1976 • This is what the human mind “sees”
Applications of Pattern Recognition • Image Analysis • Remote Sensing • Medical Imaging Diagnostics • Speech Recognition • Artificial Noses/Taste Buds • Robotics • Genomic Signal Processing
Functional Genomics • Here, the patterns are gene expression values for 4 types of glioma: OL, GM, AA, AO. • Single genes can distinguish the OL and GM types of glioma. • From: Kim et al., “Identification of Combination Gene Sets for Glioma Classification,” Molecular Cancer Therapeutics, 1:1229-1236, 2002
Functional Genomics Combination of three genes, or features, accomplishes discrimination of AO and AA
Basic Mathematical Setting of PR • In Pattern Recognition, we have: • A feature vector X, which contains relevant attributes of the observed entity (the process of obtaining X is called feature extraction/selection). • A discrete label variable Y (the “state of nature”). E.g., for binary classification, Y ∈ {0, 1}. • In a complete-information scenario, there is a function f such that Y = f(X). • Such is rarely the case, however, due to noise (sensor imprecision, latent variables, etc.).
Stochastic Setting Due to noise, the relationship between Y and X is given by a joint probability distribution F_XY.
Classification Error • Therefore, there is an inevitable element of error involved in Pattern Recognition: • Optimal classification error (minimum possible error). • True error of designed classifier. • Estimated error of designed classifier. • Assessment of classification error (called error estimation) is a key component of the PR design cycle.
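The three error quantities listed above can be told apart on a toy problem. The sketch below is only an illustration under assumed conditions that are not in the slides: one feature, two equally likely classes, and Gaussian class-conditional distributions with hypothetical means 0 and 2 and unit variance. Under those assumptions the optimal error can be computed exactly, the designed classifier is a threshold fitted on a small training sample, and the estimated error is its error rate on an independent test sample.

```python
import math
import random

MU0, MU1 = 0.0, 2.0  # hypothetical class means; both classes have unit variance


def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def true_error(threshold):
    """Exact error of the rule 'predict 1 if x > threshold' under the model."""
    false_ones = 1.0 - phi(threshold - MU0)   # P(X > t | Y = 0)
    false_zeros = phi(threshold - MU1)        # P(X <= t | Y = 1)
    return 0.5 * false_ones + 0.5 * false_zeros  # equal priors assumed


def draw(label, rng):
    """Draw one feature value from the class-conditional distribution."""
    return rng.gauss(MU1 if label else MU0, 1.0)


rng = random.Random(0)

# 1. Optimal error: with equal priors and variances, the best threshold
#    is the midpoint of the true class means.
optimal_error = true_error((MU0 + MU1) / 2.0)

# 2. True error of a designed classifier: the threshold is the midpoint
#    of the sample means from a small training set (10 points per class).
mean0 = sum(draw(0, rng) for _ in range(10)) / 10
mean1 = sum(draw(1, rng) for _ in range(10)) / 10
threshold = (mean0 + mean1) / 2.0
designed_error = true_error(threshold)

# 3. Estimated error: error rate of the designed classifier on an
#    independent test sample of 200 points.
test_labels = [rng.randint(0, 1) for _ in range(200)]
mistakes = sum((draw(y, rng) > threshold) != bool(y) for y in test_labels)
estimated_error = mistakes / len(test_labels)

print(f"optimal error:   {optimal_error:.4f}")
print(f"true error:      {designed_error:.4f}")
print(f"estimated error: {estimated_error:.4f}")
```

The true error of the designed classifier can never fall below the optimal error. The estimated error fluctuates around the true error and may land on either side of it.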
Classification Example Classes: salmon or sea-bass. Y = 0 : salmon Y = 1 : sea-bass
Classification without predictors • Consider the case where there are no predictor features to help classify the fish. • The natural thing to do is to guess the fish that is most often seen, that is, the one that has the highest a-priori probability P(Y = i), for i = 0, 1. • This predictor has classification error min{P(Y = 0), P(Y = 1)}.
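A minimal sketch of the prior-only rule, assuming the only thing known is the prior probability of sea-bass, P(Y = 1). The function name `prior_only_rule` is made up for this illustration. The rule guesses the more probable class, so it is wrong exactly when the less probable class shows up.

```python
def prior_only_rule(p1):
    """Best guess and its error when only the prior P(Y = 1) = p1 is known."""
    if not 0.0 <= p1 <= 1.0:
        raise ValueError("p1 must be a probability")
    p0 = 1.0 - p1
    guess = 1 if p1 > p0 else 0   # ties go to class 0 (an arbitrary choice)
    error = min(p0, p1)           # probability that the other class shows up
    return guess, error


print(prior_only_rule(0.3))   # salmon is more common, so the guess is 0
print(prior_only_rule(0.5))   # equally likely classes: no better than a coin flip
```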
Classification with predictors • With no predictors, one will always call the same kind of fish. If the fish occurrences are close to equally likely, this is no good at all, as the classification error will be close to 0.5 (flipping a coin). • Luckily, this is a very rare scenario: we almost always have access to predictor variables to help classification. • These predictors are specified by a feature vector X.
The histograms in the previous slides are approximations to underlying class-conditional densities p(x|Y = i), for i = 0, 1.
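One way to see how a histogram approximates a class-conditional density is to normalize the bar heights so that the total area is 1. The sketch below does this for synthetic data; the fish "lengths", their means and spreads, and the helper name `histogram_density` are all assumptions made for illustration, not values from the slides.

```python
import random


def histogram_density(samples, lo, hi, bins):
    """Normalized histogram: bar heights whose total area over [lo, hi) is 1."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        if lo <= x < hi:
            # min() guards against a rounding error pushing x into a bin past the end
            index = min(int((x - lo) / width), bins - 1)
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside [lo, hi)")
    return [c / (total * width) for c in counts], width


rng = random.Random(0)
# Hypothetical fish lengths in arbitrary units (synthetic, for illustration only).
salmon = [rng.gauss(10.0, 2.0) for _ in range(500)]     # Y = 0
sea_bass = [rng.gauss(14.0, 2.0) for _ in range(500)]   # Y = 1

density0, width = histogram_density(salmon, 0.0, 24.0, 12)    # approximates p(x | Y = 0)
density1, _ = histogram_density(sea_bass, 0.0, 24.0, 12)      # approximates p(x | Y = 1)

for i, (d0, d1) in enumerate(zip(density0, density1)):
    print(f"[{i * width:4.1f}, {(i + 1) * width:4.1f})  {d0:.3f}  {d1:.3f}")
```

With more samples and narrower bins, the bar heights track the underlying densities more closely.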
A classifier is a function from the feature space into the binary set of labels {0, 1}; it partitions the feature space into two regions, one for each label. The classification error is the probability of misclassification. The feature-label distribution is the joint probability distribution F_XY of X and Y.
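In symbols, these definitions can be written as follows. The notation is an assumption chosen for this sketch, since the original formulas did not survive: ψ denotes the classifier, d the number of features, R_0 and R_1 the two regions, and ε[ψ] the classification error. The second line expands the error using the priors and the class-conditional densities p(x | Y = i).

```latex
\[
\psi : \mathbb{R}^d \to \{0, 1\}, \qquad
R_0 = \{x : \psi(x) = 0\}, \quad R_1 = \{x : \psi(x) = 1\}
\]
\[
\varepsilon[\psi] = P(\psi(X) \neq Y)
  = P(Y = 0) \int_{R_1} p(x \mid Y = 0)\, dx
  + P(Y = 1) \int_{R_0} p(x \mid Y = 1)\, dx
\]
```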
Formulate the question → Organizing and cleaning data → Normalize data → Analyze data → Interpretation of results
The question • Experimental design: 3x2x2 factorial design • How many animals? • How many arrays? • Replicates?
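To get a feel for the size of a 3x2x2 factorial design, the groups can be enumerated directly. The slide gives only the shape of the design, so the factor assignment below is an assumption based on the questions later in the deck: three diets, two treatments, and a third two-level factor taken here to be time point. The time-point labels and the replicate count are placeholders.

```python
from itertools import product

# Assumed factor levels: the slides give only the 3x2x2 shape.
diets = ["corn oil", "fish oil", "olive oil"]
treatments = ["AOM", "saline"]
time_points = ["early", "late"]            # placeholder labels

groups = list(product(diets, treatments, time_points))

replicates_per_group = 3                   # hypothetical; a design decision
arrays_needed = len(groups) * replicates_per_group   # assumes one array per animal

for diet, treatment, time_point in groups:
    print(f"{diet:9s} | {treatment:6s} | {time_point}")
print(len(groups), "groups,", arrays_needed, "arrays")
```

Every extra replicate per group adds one animal and one array to each of the 12 groups, which is why the replicate question has to be settled at the design stage.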
Organizing and Cleaning Data • Label data • Remove labels • Quality of data: missing, low or good
Normalize Data • http://www.microarrays.in/Data_Norm.html • http://www.jcvi.org/cms/research/software/ • http://www.tm4.org/midas.html • BRB Array Tools (Biometric Research Branch - NCI) • caGEDA (University of Pittsburgh) • arraytrack (NCTR Center for Toxicoinformatics) • Agilent (GeneSpring) • Applied Maths (GeneMaths XT) • Axon Instruments (Acuity 4.0 DEMO) • BioDiscovery (Nexus CGH) • BioSieve (ExpressionSieve) • CytoGenomics (SilicoCyte) • Microarray Data Analysis (GeneSifter) • Molmine (J-Express Pro) • Optimal Design (ArrayMiner) • Partek (Partek Pro) • Strand Genomics (Avadis)
The question(s) • What are the differences between the gene profiles of the AOM-injected group and the control group? • How does gene expression change over time? • What are the differences in gene profiles between the fish oil group and the corn oil group in AOM-injected rats? (Same for the olive oil and corn oil groups.) • Does any of the diets have a protective effect in terms of suppressing carcinogenesis? • Which genes are up- or down-regulated after AOM treatment?
The question(s) • Find (and rank) groups of genes discriminating between the two different treatments (AOM vs. saline). • Find (and rank) groups of genes discriminating between each possible pair of diets in the AOM treatment group. • Find (and rank) groups of genes discriminating between each possible pair of diets in the saline treatment group.
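One simple way to "find and rank" discriminating gene groups is to score every gene set of a given size by the error of a classifier trained on just those genes. The sketch below is not the method used in the study. It assumes a nearest-centroid classifier, scores each set by its resubstitution error (the error on the training data itself), and runs on synthetic expression values in which genes 0 and 1 are made to respond to treatment. All names and numbers are illustrative.

```python
import random
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def resub_error(X, y, genes):
    """Resubstitution error of a nearest-centroid classifier on the chosen genes."""
    sub = [[row[g] for g in genes] for row in X]
    c0 = centroid([r for r, lab in zip(sub, y) if lab == 0])
    c1 = centroid([r for r, lab in zip(sub, y) if lab == 1])
    wrong = sum(
        (1 if sq_dist(r, c1) < sq_dist(r, c0) else 0) != lab
        for r, lab in zip(sub, y)
    )
    return wrong / len(y)


def rank_gene_sets(X, y, size):
    """All gene sets of the given size, sorted from lowest to highest error."""
    n_genes = len(X[0])
    scored = [(resub_error(X, y, g), g) for g in combinations(range(n_genes), size)]
    return sorted(scored)


# Synthetic data: 5 genes, 20 samples per class (0 = saline, 1 = AOM).
rng = random.Random(1)
N_GENES, N_PER_CLASS = 5, 20
X, y = [], []
for label in (0, 1):
    for _ in range(N_PER_CLASS):
        row = [rng.gauss(0.0, 1.0) for _ in range(N_GENES)]
        if label == 1:
            row[0] += 4.0   # genes 0 and 1 are made to respond to treatment
            row[1] += 4.0
        X.append(row)
        y.append(label)

ranking = rank_gene_sets(X, y, size=2)
for error, genes in ranking[:3]:
    print(genes, f"{error:.3f}")
```

Resubstitution error is optimistic, especially with few samples and many candidate gene sets, so a real analysis would pair this kind of search with a more careful error estimator. The exhaustive search also grows quickly with the number of genes and the set size.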