http://www.flint.umich.edu/Departments/ITS/crac/mazeorig.form.html Maze in biology: the pathway problem. Ueng-Cheng Yang (楊永正), Institute of Bioinformatics, National Yang-Ming University. Nov. 14, 2003
Embryonic development: oogenesis (mRNA localization in the oocyte) -> fertilization (sperm) -> 1st cleavage (2 identical cells) -> 2nd cleavage (4 identical cells) -> 3rd cleavage (8 cells with 2 different cell types). The genome is the complete set of genetic material, which is similar to the programs in ROM.
Gene expression of eukaryotes Picture taken from Lehninger’s “Principles of Biochemistry”
Perturbation -> black box -> changes in gene expression. Microarray (gene chip) is a high-throughput technique that can measure the expression of thousands of genes at a time.
Sequence information, decompress, expression level: tissue (spatial), development (temporal). Genes: presentation of life and knowledge management
Transform or out of the game? Global: high-throughput analysis. Local: individual analysis. http://www.sciencemag.org/cgi/content/full/291/5507/1221/F1
Bioinformatics should provide the direction for future biology. Genome, transcriptome and proteome research; bioinformatics research; interpret data; collect data. tatttctctactgatttgaacaagattgtcgagaaattcccaaaacaagccgaaaaattg Data => Information => Knowledge => Technique => Economy
Are there rules in biology? * Picture made from screenshot of http://www.shef.ac.uk/~chem/web-elements/
Variation (mutation) Gene duplication Recombination Gene duplication + Should there be rules in biology?
Pathway study is one of the most fundamental problems for biological research at the molecular level
  • Metabolism
  • Signal transduction
  • Biosynthesis of macromolecules (mechanism study)
    • Replication
    • Transcription
    • RNA processing
    • Translation
Similar chemistry can be re-used in different enzymes
  α-ketoglutarate + NAD+ + CoASH -> succinyl CoA + CO2 + NADH + H+
  HOOC-CH2-CH2-CO-COOH (α-ketoglutarate) -> HOOC-CH2-CH2-CO-S-CoA (succinyl CoA)
Observation (III): “Dehydrogenation, hydration, dehydrogenation” is a pathway module
  TCA cycle, release of CO2: acetyl CoA + OAA -> citrate -> isocitrate -> (-2H, -CO2) α-ketoglutarate -> (CoA, -2H, -CO2) succinyl CoA
  TCA cycle, reforming the carrier: succinyl CoA -> (CoA + GTP) succinate -> (-2H) fumarate -> (H2O) malate -> (-2H) OAA
A set of reactions can be “re-used” together
  One round (α and β mark the carbons next to the thioester carbonyl):
  RCH2-CH2(β)-CH2(α)-CO-S-CoA
  -> (-2H) RCH2-CH=CH-CO-S-CoA
  -> (+H2O) RCH2-CH(OH)-CH2-CO-S-CoA
  -> (-2H) RCH2-CO-CH2-CO-S-CoA
  Repeated rounds: RCH2CH2CH2CH2CH2CH2CO-S-CoA -> RCH2CH2CH2CH2CO-S-CoA + acetyl CoA -> RCH2CH2CO-S-CoA + acetyl CoA
A single reaction may create a new pathway. Photosynthesis; pentose phosphate cycle. [Figure: carbon-number schemes (sugars with 1 to 7 carbons) for the transketolase and transaldolase reactions]
The pathway problems that might be obvious to physicists: pathway simulation => hypothetical cell; flux balance analysis; S-system; … etc.
Complicated feedback regulation. [Figure: two pathways, A B C D and W X Y Z, each with negative (-) feedback] "X" (such as ADP) will accumulate if this reaction is inhibited.
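The accumulation claim can be illustrated with a small numerical sketch. It assumes a linear pathway W -> X -> Y -> Z with first-order kinetics and a constant supply of W; the rate constants, the supply term and the function name `simulate` are illustrative assumptions, not values from the slide, and the feedback loops themselves are not modelled.

```python
def simulate(k_xy, steps=2000, dt=0.01):
    """Euler integration of W -> X -> Y -> Z with first-order rate laws.

    k_xy is the rate constant of the X -> Y step; lowering it mimics
    inhibition of that reaction. All numbers are hypothetical.
    """
    w, x, y, z = 1.0, 0.0, 0.0, 0.0
    supply, k_wx, k_yz = 1.0, 1.0, 1.0  # assumed constant input and rate constants
    for _ in range(steps):
        v_wx = k_wx * w   # flux W -> X
        v_xy = k_xy * x   # flux X -> Y (the inhibited step)
        v_yz = k_yz * y   # flux Y -> Z
        w += dt * (supply - v_wx)
        x += dt * (v_wx - v_xy)
        y += dt * (v_xy - v_yz)
        z += dt * v_yz
    return {"W": w, "X": x, "Y": y, "Z": z}


normal = simulate(k_xy=1.0)
inhibited = simulate(k_xy=0.1)  # X -> Y slowed tenfold
print("X without inhibition:", round(normal["X"], 2))
print("X with inhibition:   ", round(inhibited["X"], 2))
```

In this toy model the intermediate upstream of the inhibited step piles up while the downstream product Z is formed more slowly, which is the qualitative behaviour the slide describes.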
Cell cycle and simulation of complex biological events. [Figure: cell cycle M -> G1 -> S -> G2 -> M; G1, S and G2 make up interphase]
Other types of pathway problems
  • Pathway discovery
    • From protein-protein interaction and microarray
  • Pathway reconstruction
    • Genome annotation and interpretation
  • Pathway simulation => hypothetical cell
    • Flux balance analysis
    • S-system
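For readers unfamiliar with the two simulation approaches named above, their standard textbook forms are sketched below. The symbols are assumptions introduced here, not notation from the slides: S is the stoichiometric matrix, v the flux vector, c the objective weights, X_i the concentration of species i, α_i and β_i rate constants, and g_ij and h_ij kinetic orders.

```latex
% Flux balance analysis: choose steady-state fluxes v that optimise a linear objective
\max_{v}\; c^{\mathsf{T}} v
\quad\text{subject to}\quad
S\,v = 0,\qquad v^{\min} \le v \le v^{\max}

% S-system: one power-law production term and one power-law degradation term per species
\frac{dX_i}{dt} = \alpha_i \prod_{j=1}^{n} X_j^{g_{ij}} \;-\; \beta_i \prod_{j=1}^{n} X_j^{h_{ij}},
\qquad i = 1,\dots,n
```

Flux balance analysis needs only stoichiometry and flux bounds, whereas an S-system also needs kinetic parameters but can describe dynamics such as feedback inhibition (a negative kinetic order g_ij).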
Information integration is the first step for data mining
  DNA: genomic seq. (annotation, comparison)
  -> transcription -> RNA: EST, SAGE, gene chips
  -> translation -> protein: modification, expression, interaction, structure
Different cells have the same genome, but they express different sets of genes after differentiation
  Gene     Colon  Kidney  Lung  Ovary  Small intestine  Testis  Thyroid  …  Total
  EGF          0      15     1      0                0       0        0  …     19
  EGFR         3       4    19      9                0       0        0  …    103
  PLCG1        1       3     7      1                2       1        0  …     68
  SHC1         4      10    22      1                0       3        1  …    249
  GRB2         1       1     3      2                0       0        2  …     77
  SOS1         4       3     0      2                0       0        0  …     36
  HRAS         1       7    10      0                2       1        0  …     58
  RAF1         4       6    28      1                3       4        0  …    197
  MAP3K1       2       8     2      2                0       0        0  …     44
  MAP2K4       5       6     1      3                1       4        0  …     81
  MAP2K1       4      10     3      2                0       2        0  …     82
  MAPK8        1       2     2      0                0       1        0  …     33
  STAT1       13      32    14      6                4       6        3  …    260
  STAT3        3       7    17      7                0       1        0  …    135
  MAPK3        9      10     9      4                1       1        0  …    181
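As a rough illustration of how such a table can be read for tissue specificity, the sketch below finds, for a few genes, the listed tissue with the most counts and its share of the gene's total. It assumes the seven value columns are Colon, Kidney, Lung, Ovary, Small intestine, Testis and Thyroid in that order (the flattened header is ambiguous); the function name `top_tissue` is hypothetical.

```python
# Counts copied from three rows of the table; the tissue order is an assumption.
TISSUES = ["Colon", "Kidney", "Lung", "Ovary", "Small intestine", "Testis", "Thyroid"]

# gene -> (counts in the seven listed tissues, total over all tissues incl. the elided ones)
COUNTS = {
    "EGF": ([0, 15, 1, 0, 0, 0, 0], 19),
    "EGFR": ([3, 4, 19, 9, 0, 0, 0], 103),
    "STAT1": ([13, 32, 14, 6, 4, 6, 3], 260),
}


def top_tissue(gene):
    """Return (tissue, share of total) for the listed tissue with the most counts."""
    counts, total = COUNTS[gene]
    best = max(range(len(counts)), key=counts.__getitem__)
    return TISSUES[best], counts[best] / total


for gene in COUNTS:
    tissue, share = top_tissue(gene)
    print(f"{gene}: {tissue} ({share:.0%} of all counts)")
```

A gene whose counts are concentrated in one tissue stands out from a broadly expressed one, although small counts make such shares noisy.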
Organizing the known information: integrating different types of pathways. Metabolic pathway: glycolysis (F6P -> F1,6P, catalysed by PFK). Signal transduction: EGF. Gene regulatory network: CDK, E2F.
Steps in pathway discovery
  Factors involved => Components
  Molecular interaction => Events
  Order of events => Pathways
  Pathway interaction => Circuits
The dream of molecular biologists Science. Vol 292. May,2001 ? Cell., 100(1):57–70 Review, 2000. PNAS, Vol. 95, 14863-14868
Appropriate presentation format is essential for computation Nature biotechnology 20, 370-375
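Strategy. [Figure: inward reconstruction runs from a receptor in the cell membrane through an adaptor and unknown steps (? ? ?) to a connector; outward reconstruction runs from the nucleus through components ? X Y Z]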
Reconstructing pathways based on protein-protein interaction. Inward reconstruction: receptor -> adaptor -> … etc.
Identifying a new receptor is the starting point for inward reconstruction
The distribution of death-domain-containing genes in the human genome. [Figure: numbered gene positions 1 to 30, with one position marked "?"]
Phylogenetic clusters correlate with protein functions. [Figure: phylogenetic tree, clusters labelled A to F, scale bar 0.1. Leaves in order: 16 UNC5D, 10 UNC5A, 21 UNC5B, 7 UNC5C, 23 NFKB2, 31, 8 NFKB1, 19 DAPK1, 34 NY-REN-64, 36 MALT1, 33 IRAK2, 35 IRAK1, 26 IRAK-M, 12, 2, 3 EDAR, 5, 29 NGFR, 27 CRADD, 6, 24 FADD, 28 TRADD, 11 RIPK1, 13 TNFRSF21, 32 LRDD, 1 TNFRSF12, 25 TNFRSF1A, 14 TNFRSF10A, 15 TNFRSF10B, 18 TNFRSF11B, 22 TNFRSF6, 30 P84, 4 MYD88, 20 ANK3, 17 ANK1, 9 ANK2]
Functional correlation: Tissue specificity of gene expression Paralogous genes brain tissues
Specificity of protein-protein interaction. [Figure: phylogenetic tree of the death-domain-containing genes, clusters A to F, leaves UNC5D through ANK2, as on the earlier slide]
  TNFRSF1A, TNFRSF12 --- TRADD --- FADD
  TNFRSF6, TNFRSF10A, TNFRSF10B --- FADD
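Inward reconstruction from a receptor can be sketched as a breadth-first walk over a protein-protein interaction graph. The example below uses only the interactions quoted on this slide, treats them as undirected edges, and never steps into another receptor; the function name `inward_layers` and the choice of breadth-first search are assumptions for illustration, not the author's stated method.

```python
from collections import deque

# Interactions quoted on the slide, treated as undirected edges.
INTERACTIONS = [
    ("TNFRSF1A", "TRADD"), ("TNFRSF12", "TRADD"), ("TRADD", "FADD"),
    ("TNFRSF6", "FADD"), ("TNFRSF10A", "FADD"), ("TNFRSF10B", "FADD"),
]
RECEPTORS = {"TNFRSF1A", "TNFRSF12", "TNFRSF6", "TNFRSF10A", "TNFRSF10B"}


def inward_layers(receptor, interactions=INTERACTIONS, receptors=RECEPTORS):
    """Breadth-first walk inward from one receptor.

    Returns a list of layers (receptor first); other receptors are skipped
    so the walk does not climb back out to the membrane.
    """
    neighbours = {}
    for a, b in interactions:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    seen = {receptor}
    layers = [[receptor]]
    frontier = deque([receptor])
    while frontier:
        next_layer = []
        for _ in range(len(frontier)):  # process exactly the current layer
            node = frontier.popleft()
            for partner in sorted(neighbours.get(node, ())):
                if partner in seen or partner in receptors:
                    continue
                seen.add(partner)
                next_layer.append(partner)
                frontier.append(partner)
        if next_layer:
            layers.append(next_layer)
    return layers


print(inward_layers("TNFRSF1A"))
print(inward_layers("TNFRSF6"))
```

Because interaction data carry no direction, the layer order is only a hypothesis: starting from TNFRSF6 the walk places TRADD after FADD, which would need independent evidence before being read as a signalling order.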
Reconstructing pathways based on gene expression and pathway information. Outward reconstruction. [Figure labels: cell membrane, ?, MAP2K4-P*, MAPK8-P*, MAPK8-P*, Jun, nucleus]
Related pathways can be discovered by looking for shared components among pathways. [Figure: numbers of shared components between pathways, ranging from 13 to 25]
If the PDGF receptor does not exist in colon, why do we need the downstream components of the PDGF signaling pathway?
PDGF 11 EGF 11 TNF 21 EGF/PDGF 16 ALL 4 “MAP2K4, MAPK8, Jun” is a pathway module shared by at least 3 pathways
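Finding a module shared by several pathways amounts to counting how many pathways each component belongs to. The sketch below is a minimal illustration: the component lists are hypothetical and much shorter than the slide's (which reports 11, 11 and 21 components for PDGF, EGF and TNF), and the function names are assumptions.

```python
from collections import Counter

# Hypothetical, abbreviated component lists; not the slide's actual pathway definitions.
PATHWAYS = {
    "EGF": {"EGF", "EGFR", "GRB2", "SOS1", "HRAS", "RAF1", "MAP2K4", "MAPK8", "JUN"},
    "PDGF": {"PDGFB", "PDGFRB", "GRB2", "SOS1", "HRAS", "MAP2K4", "MAPK8", "JUN"},
    "TNF": {"TNF", "TNFRSF1A", "TRADD", "TRAF2", "MAP3K1", "MAP2K4", "MAPK8", "JUN"},
}


def shared_components(pathways, min_pathways):
    """Components that occur in at least `min_pathways` pathways."""
    usage = Counter(c for members in pathways.values() for c in members)
    return {c for c, n in usage.items() if n >= min_pathways}


def pairwise_shared(pathways):
    """Number of shared components for every pair of pathways."""
    names = sorted(pathways)
    return {
        (a, b): len(pathways[a] & pathways[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


print(sorted(shared_components(PATHWAYS, min_pathways=3)))
print(pairwise_shared(PATHWAYS))
```

With these toy lists the three-way intersection recovers the MAP2K4, MAPK8, JUN module, while the pairwise counts show that EGF and PDGF share more components with each other than either does with TNF.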
Pathway modules Stress signal Death signal Growth signal TRAF2 HRAS RAF1 (RAF) module MAP3K1 (MEKK1) module MAP3K7 (TAK) module FOS JUN ATF2 SP1 RPS6KA5 Gene expression regulation, (including transcription, splicing), translation and protein modification…
Connector Factors involved => Components Molecular interaction => Events Order of events => Pathways
Inducible gene sets are co-regulated. Picture taken from http://genomics.stanford.edu/yeast/additional_figures_link.html
Pyruvate kinase. Most constitutively expressed genes are not regulated. The rate-limiting step is usually the target for regulation.
Microarray experiments are nature’s way to classify genes. Collect sections from different angles -> image reconstruction: tomography (tomographic scanning). http://www.npcc.gov.tw/npcc/chn/imaging/imaging.htm
Clustering is driven by these features. Conflicts? In an extreme environment, the whole pathway can be turned on/off
  ALPHA = alpha factor arrest, 18
  ELU = centrifugal elutriation, 14
  CDC15 = cdc15 ts, 15
  SPO = sporulation, 7
  HT = shock by high temp, 6
  D = reducing agent, 4
  C = low temp, 4
  DX = diauxic shift, 7
Unrelated sequences of similar function cluster together. Eisen, M.B., Spellman, P.T., Brown, P.O., and Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl. Acad. Sci. USA 95, 14863-14868.
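Expression-based clustering rests on a similarity measure between expression profiles; a correlation coefficient is a common choice. The sketch below computes the Pearson correlation between profiles and reports each gene's most similar partner. The profiles are made-up log ratios over four conditions, and `pearson` and `closest_partner` are hypothetical helper names; a full hierarchical clustering would repeatedly merge the most similar groups.

```python
import math

# Hypothetical log-ratio profiles over four conditions (not real data).
PROFILES = {
    "gene_a": [2.0, 1.0, -1.0, -2.0],
    "gene_b": [1.8, 1.1, -0.9, -2.1],
    "gene_c": [-1.5, -0.5, 0.7, 1.6],
    "gene_d": [0.1, -0.2, 0.2, -0.1],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def closest_partner(gene, profiles=PROFILES):
    """Gene whose profile correlates best with the given gene's profile."""
    others = (g for g in profiles if g != gene)
    return max(others, key=lambda g: pearson(profiles[gene], profiles[g]))


for gene in PROFILES:
    partner = closest_partner(gene)
    r = pearson(PROFILES[gene], PROFILES[partner])
    print(f"{gene}: closest partner {partner} (r = {r:.2f})")
```

Note that the measure uses only expression values, so genes with unrelated sequences can end up together when they respond to conditions in the same way.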
How good is the classification?
  In microarray clustering: hexokinase II, phosphofructokinase, aldolase, triose phosphate isomerase, GAPDH 1, 2, 3, phosphoglycerate kinase, phosphoglycerate mutase, enolase II, pyruvate kinase
  In glycolysis, 10 enzymes are involved in total
  The microarray experiment missed only phosphoglucose isomerase
  Pyruvate (de)carboxylase and transaldolase are misplaced
  Pretty good
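The slide's "pretty good" can be put into numbers with recall and precision. The sketch below assumes that isozymes (hexokinase II, GAPDH 1, 2, 3, enolase II) count as one enzyme activity each and that the cluster contains only the nine glycolytic enzymes plus the two misplaced ones; the real cluster may hold further genes, so the precision figure in particular is illustrative.

```python
# The ten enzymes of glycolysis (one entry per enzyme activity).
GLYCOLYSIS = {
    "hexokinase", "phosphoglucose isomerase", "phosphofructokinase", "aldolase",
    "triose phosphate isomerase", "GAPDH", "phosphoglycerate kinase",
    "phosphoglycerate mutase", "enolase", "pyruvate kinase",
}

# Assumed cluster content: everything except the missed enzyme, plus the two misplaced ones.
CLUSTER = (GLYCOLYSIS - {"phosphoglucose isomerase"}) | {
    "pyruvate (de)carboxylase", "transaldolase",
}


def recall_precision(cluster, reference):
    """Recall = hits / reference size; precision = hits / cluster size."""
    hits = len(cluster & reference)
    return hits / len(reference), hits / len(cluster)


recall, precision = recall_precision(CLUSTER, GLYCOLYSIS)
print(f"recall = {recall:.2f}, precision = {precision:.2f}")
```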
A pathway is a subset of components in a regulatory network. How can we reconstruct the network from partial pathways?