Introduction to Systems Biology Tom MacCarthy Math Tower 1-101 maccarth@ams.sunysb.edu Office hours Tue/Fri 10-12 Course website: www.ams.sunysb.edu/faculty/~maccarth
Systems Biology • Systems Biology implies holistic (whole system) view of biological systems • The study of the interactions between the components of biological systems, and how these interactions give rise to the function and behavior of that system • Antithesis of reductionist approach (study of components in isolation) • In practice, Systems Biology usually involves • Mathematical modeling • Generating (lots of) experimental data • Statistical data analysis • Here we will be dealing with computational aspects
Growth of biological data • First draft of the human genome announced June 26, 2000 • Cost: approximately $3 billion • Consists of ~3 billion nucleotides (C, G, A, T) • First phase of the 1000 Genomes Project published Oct 28, 2010 • Cost: $30-50 million
Growth of biological data • The reduced cost of sequencing has led to many other projects, such as: • The Cancer Genome Atlas • The 1000 Plant Genomes project • There have been comparable increases in other types of data, for example gene expression data, which are now increasingly measured using sequencing technologies.
Systems Biology • The availability of large and varied amounts of biological data has created a need for computational tools for manipulation and analysis. • Mathematical modeling can be used to generate or test novel hypotheses • Example: Transcription factor networks in blood cell differentiation *** This course is introductory and interdisciplinary, so my apologies to specialists ***
DNA → RNA → Protein • Most cells contain DNA (deoxyribonucleic acid) • Genes are segments of DNA that contain the information necessary for making proteins. • Proteins are molecules with specific cellular functions
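The DNA → RNA → protein flow can be illustrated with a toy sketch. The DNA sequence is made up, and the codon table is deliberately partial: it holds only the five codons used here (taken from the standard genetic code), so this is not a general-purpose translator.

```octave
% Toy illustration of DNA -> RNA -> protein (made-up sequence).
dna = 'ATGGCTTTTAAATAA';
rna = strrep(dna, 'T', 'U');   % transcription: thymine (T) becomes uracil (U)

% Partial codon table: only the codons used above, one-letter amino acid codes,
% with '*' marking the stop codon.
codonTable = containers.Map({'AUG', 'GCU', 'UUU', 'AAA', 'UAA'}, ...
                            {'M', 'A', 'F', 'K', '*'});

protein = '';
for k = 1:3:length(rna) - 2
  aa = codonTable(rna(k:k+2));   % translate one codon (3 nucleotides)
  if strcmp(aa, '*')             % a stop codon ends translation
    break;
  end
  protein = [protein, aa];
end

disp(rna)       % AUGGCUUUUAAAUAA
disp(protein)   % MAFK
```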
Gene regulation [Figure: transcription factors TF1, TF2, TF3 and TF4 regulating gene expression] • At any given moment a gene may or may not be producing protein • Proteins called transcription factors (TFs) control the level of activation (or “expression”) of each gene. • Genes have regulatory regions which contain short DNA sequences (or “motifs”) that are recognized by the TFs. • → In this way TFs activate or repress gene expression
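Recognition of a motif by a TF can be caricatured as a string search. The sketch below uses strfind to locate exact occurrences of the core sequence GATA (the core of the sites recognized by GATA-family factors such as GATA-1) in a made-up DNA sequence. Real binding motifs are degenerate, so exact matching is a simplification.

```octave
% Made-up sequence; exact matching ignores the variability of real motifs.
seq   = 'CCGATAAGTTAGATAGG';
motif = 'GATA';
positions = strfind(seq, motif);   % start index of every exact match
fprintf('Motif %s found at positions: %s\n', motif, mat2str(positions));
% positions is [3 12]
```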
Gene regulatory networks • Transcription factors are themselves proteins • They are activated/repressed by other TFs (or by themselves) • In this way they form gene regulatory networks [Figure: TF Z activates gene X via a TF binding site; transcription/translation of the coding region of X produces TF X, which (possibly via intermediates) activates or represses gene Y through its TF binding site]
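A gene regulatory network can be stored as a signed matrix. The hypothetical example below encodes a three-gene network and updates it with a simple Boolean threshold rule, a common toy model chosen here for illustration, not a method from the course. The Z → Y activation and Z's self-activation are assumptions added so that the repression of Y by X becomes visible.

```octave
% Hypothetical three-gene network; genes are ordered Z, X, Y.
names = {'Z', 'X', 'Y'};
% A(i,j) = +1 if gene j activates gene i, -1 if gene j represses gene i, 0 otherwise.
A = [ 1  0  0;    % Z maintains its own expression (assumption)
      1  0  0;    % Z activates X
      1 -1  0 ];  % Z activates Y (assumption), X represses Y

regulatorsOfY = names(A(3, :) ~= 0);   % {'Z', 'X'}

% Boolean threshold dynamics: a gene is ON at the next step if its net input is positive.
x = [1; 0; 0];          % start with only Z switched on
history = x';
for t = 1:3
  x = double(A * x > 0);
  history = [history; x'];
end
disp(history)           % rows: [1 0 0], [1 1 1], [1 1 0], [1 1 0]
```

Y is switched on by Z at first and then shut off once X has been expressed.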
Blood cell differentiation GATA-1 and PU.1 are transcription factors that control erythroid and myeloid development, respectively, during blood cell differentiation. The two proteins have been shown to function antagonistically: GATA-1 represses PU.1 activity during erythropoiesis (red blood cells) and PU.1 represses GATA-1 function during myelopoiesis (macrophages, etc.)
Where are GATA-1 and PU.1 binding? ChIP-Seq was used to detect where in the mouse genome GATA-1 and PU.1 bind
Where are GATA-1 and PU.1 binding? We found 151 myelo-lymphoid genes that are occupied by both GATA-1 and PU.1, positively regulated by PU.1 and repressed by GATA-1, for example:
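Selecting genes that satisfy several criteria at once amounts to intersecting gene lists. The sketch below shows this with intersect on small hypothetical lists; the gene names and list contents are placeholders, not the study's data, and the slide does not describe the actual analysis pipeline.

```octave
% Hypothetical gene lists with placeholder names (not real ChIP-Seq or expression data).
boundByGATA1 = {'g1', 'g2', 'g3', 'g4', 'g5'};   % genes with a GATA-1 ChIP-Seq peak
boundByPU1   = {'g2', 'g3', 'g5', 'g6'};         % genes with a PU.1 ChIP-Seq peak
upByPU1      = {'g2', 'g3', 'g6', 'g7'};         % positively regulated by PU.1
downByGATA1  = {'g2', 'g3', 'g5', 'g6'};         % repressed by GATA-1

coBound    = intersect(boundByGATA1, boundByPU1);         % occupied by both factors
candidates = intersect(coBound, intersect(upByPU1, downByGATA1));
fprintf('%d candidate genes: %s\n', numel(candidates), strjoin(candidates(:)', ', '));
```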
Mathematical modeling • It was already known that GATA-1 and PU.1 are mutually antagonistic. • It was also known that PU.1 represses GATA-1 targets. • Last piece of the puzzle: GATA-1 also represses PU.1 targets • Question: What are the consequences of mutual repression of the targets on gene expression dynamics? • We can compare a mathematical model with and without the repression of the targets
Mathematical modeling • A system of four coupled non-linear ordinary differential equations is used to model the GATA-1/PU.1 regulatory network [Figure: network with nodes GATA-1, PU.1, GATA-1 target and PU.1 target] We manipulated the rate constants to evaluate the different network architectures. For example, as we let $K_{ir} \to \infty$, the mutual antagonism (GATA-1 ↔ PU.1) disappears. Similarly, $K_{it}$ modulates the cross-regulation of the targets.
Mathematical modeling We used Matlab to simulate the system
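The slides do not list the four equations, so the sketch below uses an assumed model of the same shape, not the published one: GATA-1 (g) and PU.1 (p) repress each other through Hill-type terms with constant Kir, and each factor activates its own target (gT, pT) while repressing the opposing target with constant Kit. Setting Kir = Inf removes the mutual antagonism, mirroring the limit described on the previous slide. All parameter values, including the slight production advantage given to GATA-1, are illustrative assumptions.

```octave
% Assumed Hill-type model (not the published equations); y = [g; p; gT; pT]
%   g, p   : GATA-1 and PU.1 levels, repressing each other with constant Kir
%   gT, pT : their targets; each is activated by its own factor and
%            repressed by the opposing factor with constant Kit
aG = 1.2; aP = 1.0;   % production rates (GATA-1 slightly favoured; assumption)
n   = 2;              % Hill coefficient (assumption)
Kit = 0.5;            % target cross-repression constant (assumption)

rhs = @(t, y, Kir) [ aG / (1 + (y(2) / Kir)^n) - y(1); ...
                     aP / (1 + (y(1) / Kir)^n) - y(2); ...
                     y(1) / (1 + (y(2) / Kit)^n) - y(3); ...
                     y(2) / (1 + (y(1) / Kit)^n) - y(4) ];

KirValues = [0.5, Inf];   % finite Kir: mutual antagonism; Inf: antagonism removed
ratio = zeros(size(KirValues));
for k = 1:numel(KirValues)
  [~, y] = ode45(@(t, y) rhs(t, y, KirValues(k)), [0 50], zeros(4, 1));
  ratio(k) = y(end, 3) / y(end, 4);   % gT/pT at the end of the run (near steady state)
end
fprintf('gT/pT with antagonism: %.3g, without: %.3g\n', ratio(1), ratio(2));
```

With these values the run with mutual antagonism should give a larger gT/pT ratio than the run without it; the magnitudes themselves depend entirely on the assumed parameters.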
Mathematical modeling We systematically modulated two quantities: the mutual antagonism between GATA-1 and PU.1, and the mutual antagonism between their targets. For every point in this plane we evaluated the steady-state ratio gT/pT. The model behavior illustrates that mutual inhibition and repression of opposing downstream targets act synergistically to maximize the gT/pT ratio.
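A sweep over such a plane can be sketched by integrating a model at every grid point simultaneously. The model here is an assumed Hill-type one (not the published equations): g and p (GATA-1 and PU.1) repress each other with constant Kir, and each represses the other's target (gT, pT) with constant Kit. Forward Euler on matrices is used instead of ode45 because it vectorizes naturally; the grid, step size and parameters are all assumptions.

```octave
% Assumed Hill-type model solved on a grid of (Kir, Kit) values at once with
% forward Euler; every variable below is a 5 x 5 matrix.
aG = 1.2; aP = 1.0; n = 2;                    % illustrative parameters
[Kir, Kit] = meshgrid(logspace(-1, 1, 5));     % Kir varies along columns, Kit along rows
g = zeros(size(Kir)); p = g; gT = g; pT = g;   % start from zero expression
dt = 0.05;
for step = 1:2000                              % integrate to t = 100
  dg  = aG ./ (1 + (p ./ Kir).^n) - g;         % GATA-1, repressed by PU.1
  dp  = aP ./ (1 + (g ./ Kir).^n) - p;         % PU.1, repressed by GATA-1
  dgT = g  ./ (1 + (p ./ Kit).^n) - gT;        % GATA-1 target, repressed by PU.1
  dpT = p  ./ (1 + (g ./ Kit).^n) - pT;        % PU.1 target, repressed by GATA-1
  g  = g  + dt * dg;   p  = p  + dt * dp;
  gT = gT + dt * dgT;  pT = pT + dt * dpT;
end
ratio = gT ./ pT;   % steady-state gT/pT at every grid point
```

Small K means strong repression, so ratio(1,1) corresponds to both mechanisms being strong. To visualize the plane, imagesc(log10(ratio)); colorbar could be used.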
Systems Biology in practice These results suggest that the dual mechanism provides more robust suppression of an alternative gene expression program during lineage specification than either cross-inhibition or target inhibition alone. The example illustrates the highly multi-disciplinary nature of much modern biological research, here combining: 1. High-throughput techniques (ChIP-Seq) 2. Data analysis 3. Mathematical modeling to test hypotheses. Often, the hypothesis may come first from the mathematical model.
Why Matlab and R? • Computational tools are indispensable for doing this kind of research • In many cases students are held back by a lack of computational skills • Matlab and R are both interpreted languages, i.e. code is executed directly, without a separate compilation step • This generally makes them slower than compiled languages such as C • Both have an enormous number of extension packages • Octave is a free Matlab “clone” and is available for Windows, Mac and Linux • Both languages can be used interactively, but writing programs is more powerful.
Matlab • Advantages • Matlab makes it easy to perform numerical calculations and visualize the results. • Many additional libraries for statistics, signal processing, image processing, etc. • Note: Matlab has the Symbolic Math Toolbox; Octave has no built-in equivalent (a separate Octave Forge “symbolic” package is available) • Disadvantages • Slow, but can be improved via vectorization • Matlab is not well suited to large, complex software projects
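The point about vectorization can be illustrated by computing the same sum of squares with an explicit loop and with a single vectorized expression. Timings vary by machine and by Matlab/Octave version, so no particular speed-up is claimed here.

```octave
x = 1:1e5;

% Loop version: one element at a time
tic;
s1 = 0;
for k = 1:numel(x)
  s1 = s1 + x(k)^2;
end
tLoop = toc;

% Vectorized version: element-wise square (.^) and a single call to sum
tic;
s2 = sum(x.^2);
tVec = toc;

fprintf('loop: %g s, vectorized: %g s, same result: %d\n', tLoop, tVec, s1 == s2);
```

Both versions give 333338333350000; all partial sums are integers below 2^53, so the floating-point results are exact and identical.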
Octave download and libraries To download octave for your home PC or laptop, go to: http://octave.sourceforge.net/ To install a package, from within octave, run: pkg install package_file_name.tar.gz For list of packages choose “Packages” from top menu:
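Besides pkg install, the pkg command can list the installed packages, and in current Octave versions an installed package usually has to be loaded with pkg load before its functions are available in a session. The sketch below is Octave-only; the package name in the final comment is just an example and assumes that package is installed.

```octave
% Octave only: pkg does not exist in MATLAB.
installed = pkg('list');   % cell array with one struct per installed package
for k = 1:numel(installed)
  fprintf('%s %s\n', installed{k}.name, installed{k}.version);
end
% To use an installed package in the current session, load it first, e.g.
%   pkg load statistics     % assumes a package named 'statistics' is installed
```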
Course outline • 1. Learning to program in Matlab/octave • 2. Applications in Mathematical Biology, including: • Elementary image processing • Linear regression • Markov processes and Fisher-Wright model • Difference equations • Ordinary differential equations • 3. R programming • 4. Statistics and Bioinformatics using R • Linear models • Statistical hypothesis testing and linear models • Expression data analysis • Analysis of high-throughput sequencing data
Further reading • There isn’t yet a good Systems Biology textbook that I’m aware of. • I do not recommend this one →
Further reading • “MATLAB Programming for Engineers” by Stephen J. Chapman (Brooks/Cole) • “Mathematical Models in Biology” by Elizabeth S. Allman and John A. Rhodes (Cambridge Univ Press) • “Introductory Statistics with R” by Peter Dalgaard (Springer) • “Bioinformatics and Functional Genomics” by Jonathan Pevsner (Wiley)
Octave • To start octave, open a terminal window and enter the command “octave”
Octave basics • Getting help • Within octave type • help <command>, e.g. “help sort” • User-friendly online help available at http://www.mathworks.com/help/techdoc/ • GNU Octave help: http://www.gnu.org/software/octave/doc/interpreter/
Octave basics • Files and directories • A MATLAB script file (called an M-file) is a text (plain ASCII) file that contains one or more MATLAB commands and, optionally, comments. • The file is saved with the extension ".m". • When the filename (without the extension) is issued as a command in MATLAB, the file is opened, read, and the commands are executed as if input from the keyboard. • Download the file calc_area.m from the course website • http://www.ams.stonybrook.edu/~maccarth/teaching.shtml • Place the file in the subdirectory “work” of your home directory
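The contents of calc_area.m are not reproduced in these notes. A script of that kind might look like the hypothetical sketch below; the variable names and the initial radius of 2 are assumptions (a later exercise only implies that the file sets a radius).

```octave
% calc_area.m  (hypothetical version; the file on the course website may differ)
% Script that computes the area of a circle from its radius.
radius = 2;                        % assumed initial value
circle_area = pi * radius^2;       % area = pi * r^2
disp(['Area of a circle with radius ', num2str(radius), ': ', num2str(circle_area)])
```

The result is stored in circle_area rather than area, because area is also the name of a built-in plotting function.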
MATLAB Script Files • The preceding file is executed by issuing a MATLAB command: >> calc_area • This single command causes MATLAB to look in the current directory, and if a file calc_area.m is found, open it and execute all of the commands. • If MATLAB cannot find the file in the current working directory, an error message will appear.
MATLAB Script Files • When the file is not in the current working directory, a cd or chdir command may be issued to change the directory:
>> cd ~/work
>> calc_area
Octave basics • The search path • Matlab/Octave also uses a search path to find M-files • The M-files are organized in directories which Matlab searches • To add a directory to the search path: • addpath('<directory_name>'); e.g. addpath('~/work') • savepath; • You should now be able to run calc_area.m even if ~/work is not your current directory; simply type: • calc_area • Now open the file calc_area.m with gedit • Applications – Accessories – Text Editor • Change the radius to 3 and re-run calc_area
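The effect of addpath can be checked without touching your real setup. The sketch below creates a temporary directory containing a hypothetical script demo_script.m, puts the directory on the search path, runs the script from elsewhere, and then cleans up. savepath is deliberately not called, so the saved search path is left unchanged.

```octave
% Temporary directory with a tiny hypothetical script in it.
d = tempname;                                % unique, not-yet-existing directory name
mkdir(d);
fid = fopen(fullfile(d, 'demo_script.m'), 'w');
fprintf(fid, 'demo_result = 42;\n');
fclose(fid);

addpath(d);                                  % d is now searched for M-files
found = exist('demo_script', 'file');        % 2 means a matching file was found
demo_script                                  % runs although d is not the current directory
rmpath(d);                                   % undo addpath for this session

delete(fullfile(d, 'demo_script.m'));
rmdir(d);
fprintf('exist returned %d, demo_result = %d\n', found, demo_result);
```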