Bioinformatics is about mapping the biological universe using certain parameters. Understand the essence and complexities of life within the universe through the study of energy, information, evolution, and more. Explore the relationship between physical reality and information, and how bioinformatics aids in deciphering the patterns and complexities of biological systems.
Bioinformatics Principles 박종화 Jong Bhak 朴鍾和 KOGIC, UNIST, Ulsan, Korea jongbhak@gmail.com 20160509
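Acknowledgements • Researchers who are honest and passionate in doing science • People who support scientific research by paying tax • MRC, Harvard, KAIST, KOBIC, TBI & Genome Research Foundation; Theragen (테라젠) CEO 고진업 • NCC: Dr. 이연수, Dr. 이진수 • National Center for Standard Reference Data (국가참조표준센터): Dr. 채균식, Dr. 김창근 • Korea's ocean research institute (해양연): Dr. 이정현 and colleagues • Hanyang University (Prof. 류성언, Prof. 김덕수, Prof. 고인송) • UNIST (Presidents 조무제 and 정무영), BME professors, local supporters • UNIST students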
World Map: Land Area
What map? No. of PCs
What map? Infant death rate per
Bioinformatics is about mapping • X is IEP • Y is size • An old map is not accurate; however, it helps people to explore. • Data, Information, Knowledge • Gangnido, 1402
Lesson • Depending on the parameters we use, the world and problems can be interpreted differently. • Bioinformatics is mapping the biological universe using certain bioinformatics parameters
First principles first • Biological systems are within the universe • It is necessary to understand the universe to understand life
What is the essence of the universe? Information jongbhak@genomics.org CopyLeft Under BioLicense http://biolicense.org
Assumption/Hypothesis: • The essence of the universe is switching. Existences are the simultaneous instances of that essence.
Assumption/Hypothesis: • The fundamental elements of the universe are made of information. • Physical objects are representations, reflections, or virtual forms of information. • The basic entities are informational: energy/matter, space, and time are derivatives of information.
Assumption/Hypothesis: • Physical reality is the product of information. • The physical world is a numerical and mathematical representation of information. • Information precedes the three-dimensional physical world. • The universe is perfectly computable, as it is an instance of computing.
So, what is the universe? • The universe is the largest set of switches
“Para-Programmed” Meta-Programmed Universe?
Physical universe is an instance of the informational universe
Schematic representation of para-determined information universe
What is Life? Life is a set of switches
Switching and State Change • Switching is the result of a state change • The concept of time is invented (dynamics)
IFE: Infinitely Fractal Encapsulation
Reproducing chemicals, Human • We could all be computers • The Earth is a gigantic computer
Fundamentals for bioinformatics • Required: Thoughts on Philosophy, Society, Science, and Biology in general.
Life is Complex(?) • Complex Homo sapiens live in layers of complex systems: multi-cellular, multi-organismal, multi-societal and multi-cultural. • Do similar patterns occur again and again in different layers? • The main problem is to find the general patterns in the infinite number of biological layers. • Bioinformatics is the Study of Complexity
The core of Biology • Biology is an information science over energy metabolism • Two important things: • ENERGY and INFORMATION • Johann G. Mendel’s (1822-1884) genetic work on peas (Pisum) was bioinformatic analysis, modelling and prediction. • Perhaps he is the first well-known classical bioinformatist in history.
Evolution • Charles Darwin’s [1809-1882] theory of evolution is one of the very few general principles biology has. The process of evolution is often applied to technology nowadays. Virtually every aspect of biological information processing is concerned with evolution. Evolutionary theories also provide the third element in Biology: time. • Bioinformatics deals with evolution in Biology, all the time.
Long Definition of Bioinformatics • Bioinformatics is a discipline of science that analyses, seeks understanding of, and models Life as an information processing phenomenon over energy, with methods derived from philosophy, mathematics and computer science, using biological experimental data. - Jong Bhak, 2000
Short Definition of Bioinformatics • Bioinformatics is Biology and Biology is Bioinformatics - Jong Bhak, 2000
Brief history of Biology • Darwin • Mendel • 3D Proteins • DNA model • Sequencing(Sanger) • Cloning Recombination • Amplification Technologies • Human Reference Genome • Next Gen. Sequencing/Personal Genomics • Diagnostics using Genomic data • Synthetic Biology • Genome Engineering • Cancer Cure 2022 • Aging Cure 2042
Timeline of bioinformatics (figure)
• Darwin (evolution)
• Mendel (genetic analysis)
• DNA modelling (Watson & Crick, 1953): molecular biology
• DNA, codon, anticodon, peptide: concepts established
• Hemoglobin, myoglobin (Max Perutz, John Kendrew): structure and sequence relationship
• Genome sequencing (F. Sanger); full genome sequencing
• Dynamic programming: sequence comparison app modules developed (Needleman & Wunsch)
• Computational methodology (Chris Sander, Arthur Lesk, …)
• Database construction begins (GenBank, PIR, ...)
• Southern blot; hybridization methodology; DNA chip & microarray technology
• Computer; Internet
• Bioinformatics: structural genomics, comparative genomics, sequencing, functional genomics, interactomics, SNP, SAP, proteomics (mass spec., protein chip), databases, computational analysis methodology
• Functions
Bioinformatics in time • Last decades: heavily driven by structural studies such as the protein folding problem and structural comparisons, classifications and molecular analyses. • Mid 1990s: a shift toward sequence, databases, software, computation, commercialization and functions of proteins. • Early 1990s: a leap of life, the BioInternet. Life managed to connect humans, as neurons were connected some time ago.
Bioinformatics in time • The most important contemporary problem: • Explaining complex systems of biology functionally and evolutionarily. Major fields of Bioinformatics: next page.
Major Domains of Bioinformatics • Sequence • Structure • Expression • Interaction • Function
Bioinformatics • Sequence: Genomics, Comparative Genomics • Structure: Structural Genomics, Structural Proteomics, Biophysics • Expression: Functional Genomics, Proteomics • Interaction: Proteomics, Interactomics • Function: Physiomics, Metabolomics
Major Parts of Bioinformatics: Computing • (1) Structural studies • (2) Sequence analysis • (3) Molecular interactions • (4) Functional analysis of genes, proteins and their ligands (large-scale expression analysis: DNA chips, microarrays) • (5) Algorithm development (mathematical and physical calculation programs; Bioperl, BioJava, BioXML, BioPython, BioCPP, CGI programming), network and middleware programs, BioInfrastructure • (6) Database construction (relational databases, object-oriented databases), medical informatics • (7) Large-scale data mining (artificial intelligence approach) • (8) Complex systems and network analysis • (9) Various prediction methods • (10) Visualization of large and complex data • (11) Large computer systems construction (hardware) and administration • (12) OS, compiler and microprocessor optimization for bioanalyses • (13) Socio-economic modelling of life • (14) Neuronal and psychological description of complex organisms • (15) Designing and engineering cells and organisms
Applied Fields of Bioinformatics • Sequencing related • Gene prediction, gene mapping, annotation, visualization • Genomics • Structural Genomics, • Functional Genomics (proteomics, interactomics) • Comparative genomics • SNP (single nucleotide polymorphism) , SAP (single amio…) • Proteomics • Mass spec, Protein Chip, Protein Interaction • Interactomics (Network Biology) • Complex systems (Network Biology) approach • Neuroinformatics (neurological informatics) • Medinformatics (medical informatics)
Adding one more dimension? How to map/compute RNA expressions in relation to bio-function? • 6 billion persons • 6 billion bases • 1,000,000 RNA expression
Adding even more dimensions? How to map/compute the phenome? • 6 billion persons • 6 billion bases • 1,000,000 phenotypes • 1,000,000 RNA expression
How to map/compute the epigenome? • 6 billion persons • 1,000,000 epigenetic variation • 6 billion bases • 1,000,000 phenotypes • 1,000,000 RNA expression
How to map/compute the microbiome? • 6 billion persons • 100,000 microbes • 1,000,000 epigenetic variation • 6 billion bases • 1,000,000 phenotypes • 1,000,000 RNA expression
How to map/compute the proteome? • 6 billion persons • 10,000,000 epigenetic variation • 1,000,000 microbes • 100,000 proteins • 6 billion bases • 1,000,000 phenotypes • 1,000,000 RNA expression
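To get a feel for the scale of such maps, the sketch below multiplies the dimensions named on the RNA expression slide. Treating the layers as axes of one dense matrix, with one cell per combination, is an assumption made here for illustration; the names `DIMENSIONS` and `matrix_cells` are hypothetical. Only the three figures (persons, bases, RNA expression) come from the slides.

```python
from math import prod

# Figures from the slide "How to map/compute RNA expressions".
# Treating them as axes of one dense matrix is an illustrative assumption.
DIMENSIONS = {
    "persons": 6_000_000_000,
    "bases": 6_000_000_000,
    "rna_expression": 1_000_000,
}


def matrix_cells(dimensions):
    """Number of cells if every axis is crossed with every other axis."""
    return prod(dimensions.values())


genome_only = matrix_cells({k: DIMENSIONS[k] for k in ("persons", "bases")})
with_rna = matrix_cells(DIMENSIONS)

print(f"persons x bases: {genome_only:.1e} cells")
print(f"persons x bases x RNA expression: {with_rna:.1e} cells")
```

Each added layer multiplies the cell count rather than adding to it, which is one way to read why the slides keep asking how to map and compute the next layer.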
Bioinformatic problems boil down to: • Representation of data.
Ways of representing BioEntities • Sequence • Structure • Expression levels • Pathways • Function • Networks
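As a minimal sketch of what "representation of data" could look like in code, the class below holds one biological entity under the six representations listed above. The class name `BioEntity`, its field types, and the toy values are all hypothetical choices for illustration, not something the slides prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class BioEntity:
    """One biological entity seen through the six representations above."""
    name: str
    sequence: str = ""                              # residues or bases as text
    structure: list = field(default_factory=list)   # (x, y, z) atom coordinates
    expression: dict = field(default_factory=dict)  # condition -> expression level
    pathways: list = field(default_factory=list)    # names of pathways
    function: str = ""                              # free-text annotation
    network: set = field(default_factory=set)       # names of interaction partners

    def available(self):
        """Return the names of the representations that currently hold data."""
        views = ("sequence", "structure", "expression",
                 "pathways", "function", "network")
        return [view for view in views if getattr(self, view)]


# Placeholder values for illustration only, not real measurements.
toy = BioEntity(name="toy_protein", sequence="MKTAYIAKQR",
                expression={"control": 1.0})
print(toy.available())
```

Keeping every representation on one object is only one possible design; real databases usually store each representation separately and link them by identifier.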
Computer • The universe is computable?
Very Basic information for non-biologists. • Elementary biological information on proteins etc. • Only for non-biologists!
Proteins • Proteins: the central processing molecules of life (15% of the mass of the average person). • A minimum of 20 different kinds of amino acids:
Name            3-letter  1-letter  Formula
Alanine         ala       a         CH3-CH(NH2)-COOH
Arginine        arg       r         HN=C(NH2)-NH-(CH2)3-CH(NH2)-COOH
Asparagine      asn       n         H2N-CO-CH2-CH(NH2)-COOH
Aspartic acid   asp       d         HOOC-CH2-CH(NH2)-COOH
Cysteine        cys       c         HS-CH2-CH(NH2)-COOH
Glutamine       gln       q         H2N-CO-(CH2)2-CH(NH2)-COOH
Glutamic acid   glu       e         HOOC-(CH2)2-CH(NH2)-COOH
Glycine         gly       g         NH2-CH2-COOH
Histidine       his       h         NH-CH=N-CH=C-CH2-CH(NH2)-COOH
Isoleucine      ile       i         CH3-CH2-CH(CH3)-CH(NH2)-COOH
Leucine         leu       l         (CH3)2-CH-CH2-CH(NH2)-COOH
Lysine          lys       k         H2N-(CH2)4-CH(NH2)-COOH
Methionine      met       m         CH3-S-(CH2)2-CH(NH2)-COOH
Phenylalanine   phe       f         Ph-CH2-CH(NH2)-COOH
Proline         pro       p         NH-(CH2)3-CH-COOH
Serine          ser       s         HO-CH2-CH(NH2)-COOH
Threonine       thr       t         CH3-CH(OH)-CH(NH2)-COOH
Tryptophan      trp       w         Ph-NH-CH=C-CH2-CH(NH2)-COOH
Tyrosine        tyr       y         HO-p-Ph-CH2-CH(NH2)-COOH
Valine          val       v         (CH3)2-CH-CH(NH2)-COOH
Source: http://www.nyu.edu/pages/mathmol/library/life/life1.html
Types of Amino Acids
• Amino acids can be grouped into 4-5 different groups for bioinformatic analysis.
• Most important distinctions: hydrophobic and hydrophilic groups; big side chain groups and small side chain groups
• Cysteine: disulphide bonding (well conserved)
• Proline: structurally important
• Histidine: important for switching
• Aliphatic - alanine, glycine, isoleucine, leucine, proline, valine
• Aromatic - phenylalanine, tryptophan, tyrosine
• Acidic - aspartic acid, glutamic acid
• Basic - arginine, histidine, lysine
• Hydroxylic - serine, threonine
• Amidic (containing amide group) - asparagine, glutamine
• http://chemistry.gsu.edu/glactone/PDB/Amino_Acids/aa.html
Amino Acids • CH – COO – R – NH3 (CORN law: clockwise) • Zwitterions remain when the α-amino acid is dissolved in water at pH 7. Addition of an acid, supplying more protons, produces ions with a surplus positive charge:
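The grouping on this slide can be applied to a sequence with a simple lookup, sketched below. The group membership is taken from the slide; the names `GROUPS`, `RESIDUE_TO_GROUP`, and `group_profile`, the "ungrouped" label, and the example sequence are assumptions made for illustration.

```python
from collections import Counter

# Side-chain groups as listed on the slide, keyed by one-letter code.
GROUPS = {
    "aliphatic": "AGILPV",
    "aromatic": "FWY",
    "acidic": "DE",
    "basic": "RHK",
    "hydroxylic": "ST",
    "amidic": "NQ",
}

# Cysteine (C) and methionine (M) are not assigned to any of the six
# groups on the slide, so they fall through to "ungrouped" below.
RESIDUE_TO_GROUP = {
    residue: group for group, letters in GROUPS.items() for residue in letters
}


def group_profile(sequence):
    """Count how many residues of a protein sequence fall into each group."""
    counts = Counter()
    for residue in sequence.upper():
        counts[RESIDUE_TO_GROUP.get(residue, "ungrouped")] += 1
    return dict(counts)


# Arbitrary example sequence, used only to show the call.
print(group_profile("MKTAYIAKQR"))
```

Reducing the 20-letter alphabet to a handful of group labels in this way is a common first step before comparing sequences by their physico-chemical character rather than by exact residue.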