Systems Biology
Two ways of looking at a problem • Top down or bottom up • Either look at the whole organism and abstract large portions of it • Or try to understand each small piece and then, after understanding every small piece, assemble them into the whole • Both are used, are valid, and complement each other
Bottom up is the traditional approach • You would study a pathway in detail, not worrying about how that pathway might interact with other elements in the cell. • You would strive to understand a gene or pathway in great detail; eventually you might extend this knowledge to other organisms and compare and contrast. • With top down you need other tools...
Definitions • At a recent NIH SysBio SIG retreat almost every talk started with that speaker's definition of what systems biology is. • Leroy Hood came up with the following (my summary) • As global a view as possible • Fundamentally quantitative • Different scales integrated
The Systems Biology Institute take: • Understand the structure of the system • Regulatory and biochemical networks • Understand the dynamics of the system • Construct models with predictive capabilities • Understand the control methods
Common “themes” • Cross disciplinary • Lots of data/information/knowledge • Concepts of networks for abstract portrayal of many interaction types. • Model development • Predictive models • Models to drive experimentation • Models to understand processes
“Inner life of a Cell” SIGGRAPH 2006 showcase winner • Need to fight infection • White blood cells (WBC) • Need to keep blood from leaking out
Requires a higher level of understanding • Many tools “feed” into this understanding • Microarrays • Homology tools (BLAST, alignments, COGs) • Biochemical literature • Genomic sequence • Specialized databases • Any faults in these tools lead to problems in the analysis
A complex problem • 35,000 genes, each either on or off (huge simplification!), would give 2^35,000 possible states • Things can be simplified by grouping and by finding key genes that regulate many other genes, and genes that may interact with only one other gene • In reality there are lots of subtle interactions and non-binary states.
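To get a feel for how large 2^35,000 is, here is a minimal Python sketch that counts the decimal digits of the number of on/off states. The function name is illustrative and not from the slides; the gene count is the slide's own simplification.

```python
import math

def binary_state_space_digits(n_genes: int) -> int:
    """Number of decimal digits in 2**n_genes (count of on/off states)."""
    # A positive integer N has floor(log10(N)) + 1 digits,
    # and log10(2**n) = n * log10(2).
    return math.floor(n_genes * math.log10(2)) + 1

if __name__ == "__main__":
    # 35,000 genes, each on or off, as on the slide
    print(binary_state_space_digits(35_000))
```

The result is a number with more than ten thousand digits, which is why exhaustive enumeration of states is not an option and grouping genes around key regulators matters.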
Some real numbers from E. coli • 630 transcription units controlled by 97 transcription factors. • 100 enzymes that catalyse more than one biochemical reaction. • 68 cases where the same reaction is catalysed by more than one enzyme. • 99 cases where one reaction participates in multiple pathways. • The regulatory network is at most 3 nodes deep. • 50 of 85 studied transcription factors do not regulate other transcription factors; lots of negative auto-regulation
Theoretical hurdles to jump • Switching delay (McAdams and Arkin 1997) • More transcripts, less protein/transcript = more energy, less noise • Fewer transcripts, more protein/transcript = less energy, more noise. • Selection drives this trade-off • Two critical times: how long after the trigger does a protein reach a critical level, and how long after removal of the trigger does the protein level decline below the critical level? • How critical is the level?
Conclusions from Arkin: • Simulations found 3-20 minutes from transcript to active protein. • Many processes are stochastic (random) not deterministic. • The probabilities are definitely skewed but still have long tails • This means that with a large population there are cells which may be in very different states than most of the rest of the population. • Complex interplay between regulation, lag and activity that has implications when trying to reconstruct a network.
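The point about stochastic processes and cell-to-cell variation can be illustrated with a small Gillespie-style simulation. This is only a sketch: it uses a generic birth-death model of protein copy number, not the McAdams and Arkin model, and the rates, time span, and function names are made-up assumptions.

```python
import random

def gillespie_birth_death(k_make, k_decay, t_end, rng):
    """Simulate protein copy number in one cell up to time t_end."""
    t, n = 0.0, 0
    while True:
        rate_make = k_make            # production: constant propensity
        rate_decay = k_decay * n      # degradation: proportional to copies
        total = rate_make + rate_decay
        t += rng.expovariate(total)   # random waiting time to next event
        if t > t_end:
            return n
        # choose which event fires, weighted by its propensity
        if rng.random() * total < rate_make:
            n += 1
        else:
            n -= 1

def simulate_population(n_cells, seed=0):
    """Run independent cells with identical (assumed) rate constants."""
    rng = random.Random(seed)
    return [gillespie_birth_death(2.0, 0.1, 100.0, rng)
            for _ in range(n_cells)]

if __name__ == "__main__":
    cells = simulate_population(500)
    print("mean copies:", sum(cells) / len(cells))
    print("lowest / highest cell:", min(cells), max(cells))
```

Every simulated cell has the same rate constants, yet the copy numbers differ from cell to cell, and a few cells sit far from the population mean.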
Surviving heat shock: Control strategies for robustness and performance • Taking engineering principles and applying them to systems biology
Air conditioning • Setpoint (temperature you set) • Sensor (thermostat) • Error signal (temp exceeded) • Controller (thermostat/ac) • Actuator (ac on)
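The control-loop vocabulary above can be sketched as a toy on/off thermostat simulation. All numbers (drift rate, cooling power, temperatures) and the function name are illustrative assumptions, not values from the talk.

```python
def simulate_thermostat(setpoint, start_temp, outside_temp, steps):
    """Toy on/off controller: returns a list of (temperature, ac_on)."""
    temp = start_temp
    history = []
    for _ in range(steps):
        error = temp - setpoint            # error signal from the sensor
        ac_on = error > 0                  # controller: simple on/off rule
        cooling = 2.0 if ac_on else 0.0    # actuator: AC removes heat
        drift = 0.1 * (outside_temp - temp)  # room warms toward outside
        temp = temp + drift - cooling
        history.append((temp, ac_on))
    return history

if __name__ == "__main__":
    for temp, ac_on in simulate_thermostat(22.0, 30.0, 35.0, 20):
        print(f"{temp:5.2f}  AC {'on' if ac_on else 'off'}")
```

With these assumed values the room cools from 30 toward the setpoint and then oscillates around it as the controller switches the actuator on and off.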
Heat shock protein • Increased heat -> σ32 mRNA melting • Make σ32 • Interacts with RNAP to activate specific sub-sets of genes • Make a bunch (>10,000 protein copies) to deal with heat
Components • DnaK • Chaperone representative • Binds to σ32 and degraded proteins • FtsH • Protease degrading σ32 • Titrated away by degraded proteins • σ32 • Temperature regulation at translation
Why make it more difficult? • Need to turn off (cooler) • Don’t want to activate inappropriately (energy waste) • Fast response (proteins degrading) • Proportional response (it’s a little hot)
Summary • Sometimes simple is better but: • Often some complexity adds desirable features • Trade off between complexity, robustness, and economy • Modules, reuse • “Helps” evolution • Can help biologist
Techniques • Advanced Methods and Algorithms for Biological Networks Analysis • “such questions are conventionally viewed as computationally intractable. Thus, biologists and engineers alike are often forced to resort to inefficient simulation methods or translate their problems into biologically unnatural terms in order to use available algorithms; hence the necessity for an algorithmic scalable infrastructure that systematically addresses these questions”
Problems of modeling • Compare model to data • But with a complex model and a large parameter set, any data set can be made to fit • Could a simpler model also work? • Untested parameters
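The warning that a model with enough parameters can fit any data set can be demonstrated with a short sketch. It fits a polynomial with as many coefficients as data points (Lagrange interpolation) to pure random noise; the data and function names are invented for illustration only.

```python
import random

def lagrange_fit(xs, ys):
    """Return a polynomial function passing through every (x, y) point."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model

def sum_sq_error(model, xs, ys):
    """Sum of squared differences between model and data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys))

if __name__ == "__main__":
    rng = random.Random(1)
    xs = [float(i) for i in range(8)]
    ys = [rng.uniform(-1, 1) for _ in xs]   # pure noise, no real trend
    model = lagrange_fit(xs, ys)
    print("error on the fitted points:", sum_sq_error(model, xs, ys))
    print("prediction between points:", model(6.5))
```

The fit error is essentially zero even though the data contain no signal, so a perfect fit alone says little about whether the model is right; testing against data that were not used for fitting is what exposes the problem.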
Alternative to exhaustive searches • Use sum of squares to generate dynamical behavior barriers • Don’t test all possible values, just see where they make a difference • Stochastic simulation is another way but • Uses months to simulate picoseconds • Robustness provides a key • Biological systems must exhibit robustness • This robustness also limits the search space
Case studies • Consistency between literature and microarray profiles. • Galactose utilization in yeast.
Case study 1: Microarrays -> regulatory networks • Long been a dream: all this data should tell me everything. • Try with E. coli: • How consistent is the literature knowledge base with the microarray expression profile? • Genome Research 13:2435-2443 2003 • Literature compiled into the RegulonDB database • Correlation was significant, 70-89%, but…
But… • Noise filtering removed >50% of the genes on the microarray • 83/179 known regulatory genes were used; the rest were discarded, also due to noise filtering. • Simple conditions: minimal media, anaerobic and stationary phase growth. • 32% of the 83 were always off. • Fell to ~40% if effector metabolites not considered.
Case study 2: Figure out Galactose utilization in yeast • Classic last line: “As technologies for cellular perturbation and global measurement mature, these approaches will soon become feasible in higher eukaryotes” • Combines: literature knowledge, microarray, proteomics, visualization, and network techniques to refine what is known about galactose utilization in yeast. • Science 292:929-934
Utilization of galactose is well studied • 1625 papers in PubMed dating back to the 1950s. • Simple process: get galactose into the cell, then modify this sugar into the more usable form of glucose-6-P; don’t waste a lot of energy doing it if (1) there is no gal or (2) you have plenty of glucose.
The Process • Define all genes in the genome, particularly the subset of genes and other small molecules that are involved in the gal pathway (DONE) • For each gene or condition, make a change (i.e. delete the gene) and measure the global effect on both mRNA and protein levels. • Integrate the changes with respect to the first point with all known protein-protein and protein-DNA networks • Form new hypotheses and test
Visualizing the data • Blue line: protein-protein interaction (pp) • Yellow line: protein-DNA interaction (pd)
Networks: the “system” of systems biology • Humans produce some pretty complex structures. • Computer chips • Oil refineries • Airplanes • The goals for these structures are similar to those of life forms • Survive • Do it at a cheap cost • Reproduce/evolve??
Basic network terminology • Nodes • Edges • Scale-free • Power laws • Exponential/Random networks • Robustness • Ability to respond to different conditions • Robust yet fragile • Complexity • Not the number of parts… consider a lump of coal • The number of different parts AND the organization of those parts
Graph theory, networks • Two types of networks • Exponential and scale free • Most cellular networks are scale free • It makes the most sense to study the interactions of the central nodes not the outer nodes
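The difference between the two network types can be seen in a small simulation. The sketch below builds a random (exponential) network and a preferential-attachment network with the same number of edges and compares their largest hubs. Preferential attachment is used here as one common way to generate scale-free graphs; it is an assumption of this example, not a claim about how cellular networks arise, and all function names and sizes are illustrative.

```python
import random
from collections import Counter

def random_network(n, n_edges, rng):
    """Exponential (random) network: edges placed uniformly at random."""
    edges = set()
    while len(edges) < n_edges:
        a, b = rng.sample(range(n), 2)
        edges.add((min(a, b), max(a, b)))
    return edges

def scale_free_network(n, m, rng):
    """Preferential attachment: new nodes favour well-connected nodes."""
    edges = set()
    # 'stubs' holds each node once per edge end, so a uniform draw from
    # it picks a node with probability proportional to its degree.
    stubs = []
    for a in range(m + 1):              # small fully connected core
        for b in range(a + 1, m + 1):
            edges.add((a, b))
            stubs += [a, b]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            edges.add((t, new))
            stubs += [t, new]
    return edges

def degrees(edges):
    """Count how many edges touch each node."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

if __name__ == "__main__":
    rng = random.Random(42)
    sf = scale_free_network(1000, 2, rng)
    er = random_network(1000, len(sf), rng)
    print("largest hub, scale free:", max(degrees(sf).values()))
    print("largest hub, random:    ", max(degrees(er).values()))
```

With the same number of nodes and edges, the preferential-attachment network ends up with a few highly connected hubs while the random network does not, which is the intuition behind concentrating on the central nodes.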
High Throughput data sources • Microarray data • Already well covered in the last couple of weeks. • Probably the most mature • Proteomics • Several processes • Separation of the products • Digest the products • Find the mass of the products • Problems • Contamination • Phosphorylation, glycosylation, acylation, methylation, cleavage.
Cytoscape • Software tool to manage data and develop predictive models (Shannon et al., Genome Research 2003) • Not directed specifically to a cellular process or disease pathway • Combine • Protein-protein interactions • RNA expression • Genetic interactions • Protein-DNA interactions • Protein abundance • Protein phosphorylation • Metabolite concentrations • Integrate (global) molecular interactions and state measurements. • Organized around a network graph