Learn the core concepts of bioinformatics, including computing with sequences and structures, macromolecular and cellular simulation, next-gen sequencing, and biological databases. Explore algorithms for comparison, prediction techniques, and the importance of integration and interoperation. Discover the historical perspective of bioinformatics and its applications to molecular biology information.
END
BIOINFORMATICS Summary
Mark Gerstein, Yale University
gersteinlab.org/courses/452
(last edit in spring '10, not including in-class changes)
You'll Forget… [From S Harris's Science Cartoons, http://www.sciencecartoonsplus.com]
What is Bioinformatics?
• (Molecular) Bio-informatics
• One idea for a definition? Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and then applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large scale.
• Bioinformatics is “MIS” (management information systems) for Molecular Biology Information. It is a practical discipline with many applications.
Data Types: Sequences, Structures, Functional Genomics
"Core" Bioinformatics
• Core Stuff
  • Computing with sequences and structures
  • Macromolecular Simulation
  • Cellular Simulation
  • Next-gen Sequencing and Personal Genomics
  • Biological databases and unsupervised mining of them
• What we missed
  • Supervised mining techniques
  • Network analysis
Hierarchical Structure of Course Information
• Memorize the previous summary
• Good familiarity with main points in lectures (quizzes)
• Rest of overheads and readings for reference on projects and …
Cross-cutting Themes
• Algorithms for Comparison
  • Dynamic programming
  • Different measures of similarity (RMS vs. structural similarity; PAM & BLOSUM vs. % ID)
  • Generalized similarity matrix in threading
  • Statistical scoring schemes (with P-values)
  • For sequences, structures, sequence to structure, and even expression data
  • Time complexity of the comparisons
• Predictions
  • LOD scores (# with features / expectation)
  • Progressively more complex features
  • Amount of feature information IN vs. prediction OUT
  • Testing against benchmarks with cross-validation (sec. struc. prediction, seq. comparison scoring, datamining)
  • Other methods, need for heuristics
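Dynamic programming is the common engine behind several of the comparison algorithms listed above. As a minimal sketch (not the course's own code), the Python function below computes a Needleman-Wunsch-style global alignment score. The simple match/mismatch values and the linear gap penalty are illustrative assumptions standing in for a real substitution matrix such as PAM or BLOSUM. Filling the (n+1) × (m+1) table takes O(nm) time, which is the time-complexity point raised for sequence comparison.

```python
# Minimal sketch of global sequence alignment by dynamic programming
# (Needleman-Wunsch style). Assumption: a flat match/mismatch score and a
# linear gap penalty replace a real substitution matrix (PAM, BLOSUM).


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Return the optimal global alignment score of sequences a and b."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning prefix a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + pair  # a[i-1] paired with b[j-1]
            up = score[i - 1][j] + gap         # a[i-1] against a gap
            left = score[i][j - 1] + gap       # b[j-1] against a gap
            score[i][j] = max(diag, up, left)
    return score[n][m]


print(global_alignment_score("ACGT", "AGT"))
```

For example, `global_alignment_score("ACGT", "AGT")` returns 1: three matches (+3) plus one unavoidable gap (-2). Replacing the `match`/`mismatch` test with a lookup into a substitution matrix would be the natural next step.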
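The LOD (log-odds) score listed under Predictions compares how often a feature is actually observed with how often it would be expected by chance. The sketch below assumes a base-2 logarithm and a simple independence model for the expectation. The function names and the counts in the example are hypothetical, made up purely for illustration rather than taken from any dataset.

```python
import math


def expected_count(n_with_residue: int, n_with_feature: int, n_total: int) -> float:
    """Count expected if residue type and feature were independent (assumption)."""
    return n_with_residue * n_with_feature / n_total


def lod_score(observed: int, expected: float) -> float:
    """Log-odds score log2(observed / expected); positive means enriched."""
    if observed <= 0 or expected <= 0:
        raise ValueError("observed and expected must both be positive")
    return math.log2(observed / expected)


# Made-up illustrative counts (not real data): 1000 residues, 400 in helices,
# 100 alanines, 60 of which fall in helices.
exp_ala_helix = expected_count(100, 400, 1000)  # 40.0
score = lod_score(60, exp_ala_helix)            # log2(1.5), about 0.585
```

A score of 0 means the feature occurs exactly as often as chance predicts; the same ratio-to-expectation idea carries over as the features used for prediction grow more complex.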
Cross-cutting Themes
• Increasing the chemical reality and complexity of genes
  • Character strings, fold (just CAs), volumes and surfaces from all-atom representation, energy and minimization, dynamics (time and velocity)
• Simulation
  • Vector configuration boiled down to a scalar E through a potential
  • Compute-intensive exploration of configurations (MC, MD)
  • Averages over correctly weighted configurations
  • Importance of simplification
• The Survey Mode
  • Collecting information in DB tables
  • Importance of integration and interoperation
  • Doing datamining to find more tenuous relationships
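The simulation theme (boil a configuration vector down to a scalar energy E through a potential, explore configurations, then average over correctly weighted ones) can be illustrated with Metropolis Monte Carlo. This is a minimal sketch under clearly stated assumptions: the toy potential is three independent harmonic springs, a hypothetical stand-in for a real force field, and energies are in units of kT. For this potential, equipartition gives an exact average of 3/2 kT = 1.5, so the estimate can be checked against it.

```python
import math
import random
from typing import Callable, List


def harmonic_energy(x: List[float], k: float = 1.0) -> float:
    """Hypothetical toy potential (assumption): E = sum of k * x_i^2 / 2."""
    return sum(0.5 * k * xi * xi for xi in x)


def metropolis_average_energy(energy: Callable[[List[float]], float],
                              x0: List[float], kT: float = 1.0,
                              step: float = 1.0, n_steps: int = 200_000,
                              burn_in: int = 1_000, seed: int = 0) -> float:
    """Estimate <E> over Boltzmann-weighted configurations by Metropolis MC."""
    rng = random.Random(seed)
    x = list(x0)
    e = energy(x)  # the whole configuration vector collapses to one scalar
    total, count = 0.0, 0
    for t in range(n_steps):
        # Propose a random displacement of every coordinate.
        trial = [xi + rng.uniform(-step, step) for xi in x]
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill moves; accept uphill
        # moves with probability exp(-dE / kT).
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / kT):
            x, e = trial, e_trial
        if t >= burn_in:
            # A rejected move counts the current configuration again.
            total += e
            count += 1
    return total / count


avg = metropolis_average_energy(harmonic_energy, [0.0, 0.0, 0.0])
print(avg)
```

Counting the current configuration again after a rejected move is what makes the samples correctly weighted; dropping those repeats would bias the average. Molecular dynamics (MD) would instead integrate equations of motion, which this sketch does not attempt.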
Historical Perspective (timeline figure, 1980 to 2005)
• Single Structures: Modeling & Geometry, Forces & Simulation, Docking
• Sequences, Sequence-Structure Relationships: Alignment, Structure Prediction, Fold recognition
• Genomics: Dealing with many sequences, Gene finding & Genome Annotation, Databases
• Integrative Analysis: Expression & Proteomics Data, Datamining, Simulation again…
(from CooperToons, http://members.aol.com/ChipCooper/cartoon26.html)