CISC 841 Bioinformatics (Fall 2008) Review Session
Basics of Molecular Biology
• Central dogma
  • Transcription
  • Translation
  • Genetic code
• DNA
  • Double helix
  • Watson-Crick binding
• RNA
  • Secondary structures
• Genes
  • Compositional structure
  • Reading frames
• Proteins
  • Secondary structure: alpha helices, beta sheets, and coils
  • 3-D structure
• Gene regulation (operons)
• DNA microarray
• Cloning
• PCR
• Gel electrophoresis
• 2D gel + MS
• Yeast two-hybrid system
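Watson-Crick pairing, the genetic code, and reading frames are easy to make concrete in a few lines. The sketch below assumes uppercase DNA strings over A, C, G, T (any other character raises a `KeyError`) and uses the standard genetic code (NCBI translation table 1). The function names are illustrative only, not from the course material.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated with
# each position cycling through T, C, A, G; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [x + y + z for x in BASES for y in BASES for z in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

# Watson-Crick pairing: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA starting at offset `frame` (0, 1, or 2); leftover bases are ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(frame, len(dna) - 2, 3))


def six_frames(dna: str) -> list[str]:
    """Translate all six reading frames: three forward, three on the reverse complement."""
    rc = reverse_complement(dna)
    return [translate(dna, f) for f in range(3)] + [translate(rc, f) for f in range(3)]


seq = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate(seq))  # MAIVMGR*KGAR*
```

Only one of the six frames usually yields a long open reading frame, which is one basic signal gene finders look for.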
Computational Methods
Kernel-based methods
• Linear SVM
  • Rosenblatt algorithm (primal and dual forms)
  • Novikoff theorem (you do not need to memorize the proof)
  • Maximum margin (primal and dual forms)
  • Lagrange multipliers, KKT conditions
  • Gradient descent algorithm for the dual form
  • Support vectors
• Nonlinear SVM
  • Mapping to a high-dimensional feature space
  • Kernel functions: generic kernels
  • Mercer's theorem
  • Soft margin (slack variables)
• Principal component analysis (PCA)
  • Dimension reduction (projection onto the few most differentiating directions)
  • Kernel PCA (capable of nonlinear projection)
• Binary versus multiclass classification
• Applications: classifying genes based on expression profiles
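As an illustration of the Rosenblatt algorithm in dual form combined with a kernel, here is a minimal pure-Python sketch. It assumes labels in {-1, +1}, a Gaussian (RBF) kernel with an illustrative width `gamma`, and the bias update b ← b + y_i R² used in one common textbook formulation of the dual perceptron; the function names are hypothetical. Because training points enter only through kernel values K(x_i, x_j), swapping the kernel changes the implicit feature space without changing the algorithm.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel exp(-gamma * ||x - z||^2); gamma is an illustrative choice."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def dual_perceptron(X, y, kernel, max_epochs=100):
    """Rosenblatt perceptron in dual form; labels y must be -1 or +1.

    Returns (alpha, b), where alpha[i] counts the updates triggered by example i.
    """
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]  # Gram matrix
    R2 = max(K[i][i] for i in range(n))  # R^2: squared radius of the data in feature space
    alpha, b = [0] * n, 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            f = sum(alpha[j] * y[j] * K[j][i] for j in range(n)) + b
            if y[i] * f <= 0:  # misclassified or on the boundary: update
                alpha[i] += 1
                b += y[i] * R2
                mistakes += 1
        if mistakes == 0:  # a full pass without mistakes: converged
            break
    return alpha, b


def decision(X, y, alpha, b, kernel, x):
    """Signed score sum_j alpha_j y_j K(x_j, x) + b; its sign is the predicted class."""
    return sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X)) + b


# XOR is not linearly separable in the input space, but is separable with an RBF kernel.
X = [(0, 0), (1, 1), (0, 1), (1, 0)]
y = [-1, -1, 1, 1]
alpha, b = dual_perceptron(X, y, rbf_kernel)
print([1 if decision(X, y, alpha, b, rbf_kernel, x) > 0 else -1 for x in X])  # [-1, -1, 1, 1]
```

Training points with nonzero alpha_i are the ones the perceptron misclassified along the way. They play a role similar to, but not the same as, SVM support vectors, which come from solving the maximum-margin dual problem.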
Computational Methods (cont'd)
Bayesian networks
• Joint probability, factorization based on the chain rule
• Bayes' rule
• Bayesian networks
  • Conditional independence, d-separation, Markov condition
  • Model construction (score-based, maximum posterior probability)
  • Parameter estimation (maximum likelihood)
  • Model averaging
  • Bootstrap
• Applications: inferring regulatory networks from gene expression data
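The chain-rule factorization and Bayes' rule can be made concrete on a toy network A → B, A → C, read as one regulator with two target genes. All probabilities below are made up for illustration. The Markov condition lets the joint factor as P(A, B, C) = P(A) P(B | A) P(C | A), and any posterior follows by enumerating that joint and normalizing.

```python
from itertools import product

# Hypothetical conditional probability tables for binary variables
# (1 = gene expressed, 0 = not) in the network A -> B, A -> C.
P_A = {1: 0.3, 0: 0.7}
P_B_GIVEN_A = {1: {1: 0.9, 0: 0.1}, 0: {1: 0.2, 0: 0.8}}  # P_B_GIVEN_A[a][b]
P_C_GIVEN_A = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}  # P_C_GIVEN_A[a][c]


def joint(a, b, c):
    """P(A=a, B=b, C=c): each node is conditioned only on its parents."""
    return P_A[a] * P_B_GIVEN_A[a][b] * P_C_GIVEN_A[a][c]


def posterior_a(b=None, c=None):
    """P(A=1 | evidence) by Bayes' rule, summing the joint over unobserved variables."""
    numerator = evidence = 0.0
    for a, bb, cc in product((0, 1), repeat=3):
        if (b is not None and bb != b) or (c is not None and cc != c):
            continue  # inconsistent with the evidence
        p = joint(a, bb, cc)
        evidence += p
        if a == 1:
            numerator += p
    return numerator / evidence


print(round(posterior_a(b=1), 3))       # 0.659
print(round(posterior_a(b=1, c=1), 3))  # 0.939
```

Observing the second target raises the posterior because B and C are dependent when A is unobserved; once A is observed, A d-separates them and P(B | A, C) = P(B | A). With complete data, the maximum-likelihood estimate of each table entry is a relative frequency, e.g. P(B=1 | A=1) = N(A=1, B=1) / N(A=1).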
Computational Methods (cont'd)
Hidden Markov models
• Three major problems
  • Decoding
  • Likelihood
  • Training: parameter estimation
• Model structure
  • Incorporating domain knowledge
  • Genetic algorithm
  • Model equivalence
  • Mutual entropy
  • NP-hard
  • Heuristics: quasi-consensus based
• Applications: predicting transmembrane topology and classifying protein families

• Gradient descent algorithm
• Genetic algorithm
• Evaluation metrics
  • Sensitivity
  • Specificity
  • ROC
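For the HMM decoding and likelihood problems listed above, a minimal sketch of the Viterbi and forward algorithms follows. The two-state model (L = loop, M = membrane segment) over a hydrophobic/polar alphabet (H/P) is a hypothetical toy, loosely in the spirit of transmembrane topology prediction, and all its parameters are invented. Viterbi works in log space to avoid underflow; the forward pass uses plain probabilities, which is only safe for short sequences (practical implementations rescale or use log-sum-exp).

```python
import math

# Hypothetical two-state toy model: L = loop, M = membrane segment.
# Observations: H = hydrophobic residue, P = polar residue.
STATES = ("L", "M")
START = {"L": 0.8, "M": 0.2}
TRANS = {"L": {"L": 0.9, "M": 0.1}, "M": {"L": 0.1, "M": 0.9}}
EMIT = {"L": {"H": 0.3, "P": 0.7}, "M": {"H": 0.9, "P": 0.1}}


def forward(obs):
    """Likelihood P(obs | model), summing over all state paths."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for o in obs[1:]:
        alpha = {s: sum(alpha[r] * TRANS[r][s] for r in STATES) * EMIT[s][o]
                 for s in STATES}
    return sum(alpha.values())


def viterbi(obs):
    """Decoding: most probable state path and its log-probability."""
    delta = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    backpointers = []
    for o in obs[1:]:
        ptr, new_delta = {}, {}
        for s in STATES:
            # best predecessor r for state s at this position
            best = max(STATES, key=lambda r: delta[r] + math.log(TRANS[r][s]))
            ptr[s] = best
            new_delta[s] = delta[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
        backpointers.append(ptr)
        delta = new_delta
    last = max(STATES, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):  # trace back from the final position
        path.append(ptr[path[-1]])
    return path[::-1], delta[last]


path, log_p = viterbi("PPHHHHHHPP")
print("".join(path), forward("PPHHHHHHPP"))
```

The two functions share the same dynamic-programming recursion and differ only in replacing the sum over predecessor states (forward) with a max (Viterbi).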
About the Exam
• Time and place:
  • 3:30 PM-4:45 PM, Thursday, November 6
  • 102A Smith Hall
• Closed-book
• Four parts:
  • Basics of Molecular Biology [10 points]
  • Kernel-Based Methods [40 points]
  • Bayesian Networks [35 points]
  • Hidden Markov Models [15 points]