Discover the dual aspects of Bioprogramming as a natural process and a programming technique in bioinformatics. Uncover how proteins and amino acids play a crucial role in information processing at a nano scale. Dive into the complexity and diversity of life through the network of interactions among proteins. Explore how organizational and regulatory differences shape functional diversity in various species.
BioProgramming Jong Bhak TGI UNIST Ulsan Korea jongbhak@gmail.com 20030321
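Bio Acknowledgments • Researchers who are honest and passionate in doing science • People who support scientific research by paying tax • MRC, Harvard, KAIST, KOBIC, TBI & Genome Research Foundation. Theragen, CEO Jin-Up Ko. • NCC, Dr. Yeon-Su Lee, Dr. Jin-Soo Lee • National Center for Standard Reference Data (Dr. Kyun-Shik Chae, Dr. Chang-Geun Kim) • Dr. Jung-Hyun Lee and colleagues at KIOST • Hanyang University (Prof. Seong-Eon Ryu, Prof. Deok-Soo Kim, Prof. In-Song Koh) • UNIST (President Moo-Je Cho), BME professors, local supporters • UNIST students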
Two Aspects of BioProgramming 1) Bioprogramming as the natural process of information propagation in the universe 2) Bioprogramming as programming technique in bioinformatics
Bioprogramming the Universe • Programming as a key mechanism of the universe's advancement
Universe is programmable jongbhak@genomics.org CopyLeft Under BioLicense
IFE: Infinitely Fractal Encapsulation
Semiconductor of Life • Nano scale chemicals and molecules are the semiconductor of life for information processing • Proteins are the key molecules for information processing.
Protein modules for bioprogramming • Proteins: the central processing molecules of life (15% of the mass of the average person). • Minimum 20 different kinds of amino acids:
Alanine  ala  a  CH3-CH(NH2)-COOH
Arginine  arg  r  HN=C(NH2)-NH-(CH2)3-CH(NH2)-COOH
Asparagine  asn  n  H2N-CO-CH2-CH(NH2)-COOH
Aspartic acid  asp  d  HOOC-CH2-CH(NH2)-COOH
Cysteine  cys  c  HS-CH2-CH(NH2)-COOH
Glutamine  gln  q  H2N-CO-(CH2)2-CH(NH2)-COOH
Glutamic acid  glu  e  HOOC-(CH2)2-CH(NH2)-COOH
Glycine  gly  g  NH2-CH2-COOH
Histidine  his  h  NH-CH=N-CH=C-CH2-CH(NH2)-COOH
Isoleucine  ile  i  CH3-CH2-CH(CH3)-CH(NH2)-COOH
Leucine  leu  l  (CH3)2-CH-CH2-CH(NH2)-COOH
Lysine  lys  k  H2N-(CH2)4-CH(NH2)-COOH
Methionine  met  m  CH3-S-(CH2)2-CH(NH2)-COOH
Phenylalanine  phe  f  Ph-CH2-CH(NH2)-COOH
Proline  pro  p  NH-(CH2)3-CH-COOH
Serine  ser  s  HO-CH2-CH(NH2)-COOH
Threonine  thr  t  CH3-CH(OH)-CH(NH2)-COOH
Tryptophan  trp  w  Ph-NH-CH=C-CH2-CH(NH2)-COOH
Tyrosine  tyr  y  HO-p-Ph-CH2-CH(NH2)-COOH
Valine  val  v  (CH3)2-CH-CH(NH2)-COOH
http://www.nyu.edu/pages/mathmol/library/life/life1.html
Types of Amino Acids • Amino acids can be grouped into 4-5 different groups for bioinformatic analysis. • Most important distinctions: hydrophobic vs. hydrophilic groups; big vs. small side chain groups • Cysteine: disulphide bonding (well conserved) • Proline: structurally important • Histidine: important for switching • Aliphatic - alanine, glycine, isoleucine, leucine, proline, valine • Aromatic - phenylalanine, tryptophan, tyrosine • Acidic - aspartic acid, glutamic acid • Basic - arginine, histidine, lysine • Hydroxylic - serine, threonine • Amidic (containing amide group) - asparagine, glutamine • http://chemistry.gsu.edu/glactone/PDB/Amino_Acids/aa.html
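The grouping above can be sketched as a lookup table. This is a minimal illustration, not part of the original slides: the names `AA_GROUPS`, `RESIDUE_TO_GROUP` and `group_composition` are hypothetical, the one-letter codes are the standard ones, and the label "other" is an assumption for residues the slide's six groups do not cover (cysteine and methionine).

```python
from collections import Counter

# The six groups listed on the slide, written with standard one-letter codes.
AA_GROUPS = {
    "aliphatic": "AGILPV",
    "aromatic": "FWY",
    "acidic": "DE",
    "basic": "RHK",
    "hydroxylic": "ST",
    "amidic": "NQ",
}

# Invert the table so each residue maps to its group name.
RESIDUE_TO_GROUP = {
    residue: group for group, letters in AA_GROUPS.items() for residue in letters
}


def group_composition(sequence):
    """Count residues per group; anything outside the six groups is 'other'."""
    return Counter(RESIDUE_TO_GROUP.get(aa, "other") for aa in sequence.upper())


# Hypothetical 10-residue peptide used only as sample input.
print(group_composition("MKTAYIAKQR"))
```

A coarser split (for example hydrophobic vs. hydrophilic) would only need a different table, which is why the groups are kept as data rather than code.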
Amino Acids • Cα-H, COO⁻, R, NH3⁺ (CORN law: with the H pointing toward the viewer, CO, R, N read clockwise for L-amino acids) • Zwitterions remain when the α-amino acid is dissolved in water at pH 7. • Adding an acid, which supplies more protons, produces ions with a surplus positive charge.
Secondary Structures from A.A. • 3 main secondary structure elements often used. • In reality, there are many more types! • Different types of alpha, beta, coils….
Basic knowledge for Bioinformatics (focusing on proteins). • Some basic points to be understood by biologists and non-biologists.
Protein • Life is a huge chunk of protein with various ligands attached. • The protein level is very efficient for us to work with because proteins are naturally distinct units, so it is favoured in bioinformatic computing.
Only 1,000 Protein shapes? Year 2001 • There are fewer than 1,000 types of distinct shapes of protein structures known so far (called Folds)
Only 1,000 Protein superfamilies? Year 2001 • There are around 1,000 evolutionarily distinct protein shapes (called SuperFamily)
Only around 10,000 proteins? • Perhaps not many more than 10,000 different (representative) types of protein sequences exist in nature. • Then where do the complexity and diversity of life come from? From the network of interactions among them. • Then where do all the bio-functions come from?
Functional diversity: organizational and regulatory differences • Chimps and humans are nearly the same in terms of genetic components, yet they are different species. • The English and Koreans: essentially the same genome, yet distinct populations. • Organization and regulation of information are different. Also developmental diversity. • Somehow cells 'self-organize' data very efficiently and effectively.
Organising Structures and Sequences bioinformatically • A technical challenge we have is how to organize the structures and sequences of proteins • There are many different ways to organize protein sequences and structures. • PDB 1976, SCOP, CATH, FSSP, ... • Swiss-Prot, PIR, GenBank, EST, SNP, TrEMBL, Ensembl, ... (over 500 major biological DBs) • Now we have very large scale data
Interaction (directionless) • Interactions do not have directions
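A directionless interaction can be stored as an undirected graph, where the pair (A, B) is the same edge as (B, A). The class below is a minimal sketch under that assumption; `InteractionNetwork` and the family names `famA`, `famB`, `famC` are hypothetical and not taken from PSIMAP or any real database.

```python
from collections import defaultdict


class InteractionNetwork:
    """Undirected graph: adding A-B also records B-A."""

    def __init__(self):
        self._neighbors = defaultdict(set)

    def add_interaction(self, a, b):
        # Store both directions so lookups are symmetric.
        self._neighbors[a].add(b)
        self._neighbors[b].add(a)

    def interacts(self, a, b):
        return b in self._neighbors.get(a, set())

    def degree(self, node):
        return len(self._neighbors.get(node, set()))

    def edges(self):
        # frozenset makes (A, B) and (B, A) collapse into one edge.
        return {
            frozenset((a, b))
            for a, partners in self._neighbors.items()
            for b in partners
        }


net = InteractionNetwork()
net.add_interaction("famA", "famB")
net.add_interaction("famB", "famA")  # same edge, entered the other way round
net.add_interaction("famB", "famC")
print(len(net.edges()), net.degree("famB"))
```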
Resulting PSIMAP • PSIMAP: A Low resolution
Structure DB Practical Steps of Complete Human Interactome Predicted Human Interactome http://hpid.org/
Global view of protein family interaction networks for 146 genomes
Signal Transduction (Pathways) • Pathways have directions
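Because pathways have directions, a directed graph is a natural fit: an edge from X to Y does not imply one from Y to X. The sketch below uses a made-up four-step cascade; `PATHWAY` and `downstream` are hypothetical names and the node labels are generic placeholders, not a real pathway.

```python
from collections import deque

# Hypothetical cascade: each key signals to the nodes in its list.
PATHWAY = {
    "ligand": ["receptor"],
    "receptor": ["kinase"],
    "kinase": ["transcription_factor"],
    "transcription_factor": [],
}


def downstream(graph, start):
    """Return every node reachable from start, in breadth-first order."""
    order = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


print(downstream(PATHWAY, "receptor"))
print(downstream(PATHWAY, "transcription_factor"))
```

Walking from `kinase` never reaches `receptor`, which is the difference from the directionless interaction case.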
Bioinformatics Programming • BioPerl: 1994 • BioJava: 1995 • BioPHP • BioRuby • BioC++ • Bio[X]
Programming for Bioinformatics • Automation is the key
Programming for Bioinformatics • Automation is the key • Use fast prototyping • Solve problems • Reuse scripts • Share code with lab members • Use public resources • Use open, free resources • GitHub
Requirements • Guru level coding ability • Understanding computer hardware • Parsing ability • Text manipulation • Databases: flat file, relational (MySQL)
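As an illustration of parsing, text manipulation and the two database styles, the sketch below reads a FASTA-style flat file and loads it into a relational table. Several things here are assumptions: the sample records are invented, `parse_fasta` is a hypothetical helper, and Python's built-in `sqlite3` stands in for MySQL so the example runs without a server.

```python
import io
import sqlite3


def parse_fasta(handle):
    """Yield (identifier, sequence) pairs from a FASTA-formatted text stream."""
    identifier, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield identifier, "".join(chunks)
            header = line[1:].split()
            # Keep only the first word of the header as the identifier.
            identifier, chunks = (header[0] if header else ""), []
        elif identifier is not None:
            chunks.append(line)
    if identifier is not None:
        yield identifier, "".join(chunks)


# Invented flat-file content; a real script would open a file instead.
EXAMPLE = """>seq1 hypothetical protein
MKTAYIAK
QRQISFVK
>seq2
GAVLIP
"""

records = list(parse_fasta(io.StringIO(EXAMPLE)))

# In-memory SQLite database used here as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE protein (id TEXT PRIMARY KEY, sequence TEXT, length INTEGER)"
)
conn.executemany(
    "INSERT INTO protein VALUES (?, ?, ?)",
    [(ident, seq, len(seq)) for ident, seq in records],
)
longest = conn.execute(
    "SELECT id, length FROM protein ORDER BY length DESC LIMIT 1"
).fetchone()
print(longest)
```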
What is a compiler? • A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code).[1] The most common reason for converting source code is to create an executable program.
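The source-to-target translation can be shown with a toy example. The sketch below compiles arithmetic expressions into instructions for a made-up stack machine and then executes them. It is an illustration of the idea only, not how a production compiler works: the instruction names (`PUSH`, `ADD`, ...) and the functions `compile_expr` and `run` are hypothetical, and parsing is delegated to Python's standard `ast` module.

```python
import ast
import operator

# Map Python AST operator nodes to instruction names of the toy target language.
OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL", ast.Div: "DIV"}
EXEC = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.truediv,
}


def compile_expr(source):
    """Translate an arithmetic expression into stack-machine instructions."""
    tree = ast.parse(source, mode="eval")
    program = []

    def emit(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            program.append(("PUSH", node.value))
        elif isinstance(node, ast.BinOp) and type(node.op) in OPS:
            # Post-order walk: operands first, then the operator.
            emit(node.left)
            emit(node.right)
            program.append((OPS[type(node.op)], None))
        else:
            raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    emit(tree.body)
    return program


def run(program):
    """Execute the instruction list on a simple stack machine."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(EXEC[op](left, right))
    return stack.pop()


program = compile_expr("2 + 3 * 4")
print(program)
print(run(program))
```

Here the source language is a small subset of Python expressions and the target language is the instruction list; a real compiler would typically emit object code instead.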