BioProgramming

BioProgramming • Jong Bhak, TGI, UNIST, Ulsan, Korea • jongbhak@gmail.com • 20030321

Bio Acknowledgements • Researchers who are honest and passionate in doing science • People who support scientific research by paying tax • MRC, Harvard, KAIST, KOBIC, TBI & Genome Research Foundation; Theragen (테라젠) CEO 고진업 • NCC: Drs. 이연수 and 이진수 • National Center for Standard Reference Data (국가참조표준센터): Drs. 채균식 and 김창근 • Dr. 이정현 and colleagues at KIOST • Hanyang University: Profs. 류성언, 김덕수, and 고인송 • UNIST: President 조무제, the BME professors, and local supporters • UNIST students

Two Aspects of BioProgramming 1) Bioprogramming as the natural process of information propagation in the universe 2) Bioprogramming as programming technique in bioinformatics

Bioprogramming the Universe • Programming as a key mechanism of the universe's advancement

Universe is programmable jongbhak@genomics.org CopyLeft Under BioLicense

IFE: Infinitely Fractal Encapsulation

Semiconductor of Life • Nanoscale chemicals and molecules are the semiconductors of life for information processing • Proteins are the key molecules for information processing.

Protein modules for bioprogramming
• Proteins: the central processing molecules of life (15% of the mass of the average person).
• A minimum of 20 different kinds of amino acids:
Name            3-letter  1-letter  Formula
Alanine         ala       a         CH3-CH(NH2)-COOH
Arginine        arg       r         HN=C(NH2)-NH-(CH2)3-CH(NH2)-COOH
Asparagine      asn       n         H2N-CO-CH2-CH(NH2)-COOH
Aspartic acid   asp       d         HOOC-CH2-CH(NH2)-COOH
Cysteine        cys       c         HS-CH2-CH(NH2)-COOH
Glutamine       gln       q         H2N-CO-(CH2)2-CH(NH2)-COOH
Glutamic acid   glu       e         HOOC-(CH2)2-CH(NH2)-COOH
Glycine         gly       g         NH2-CH2-COOH
Histidine       his       h         NH-CH=N-CH=C-CH2-CH(NH2)-COOH
Isoleucine      ile       i         CH3-CH2-CH(CH3)-CH(NH2)-COOH
Leucine         leu       l         (CH3)2-CH-CH2-CH(NH2)-COOH
Lysine          lys       k         H2N-(CH2)4-CH(NH2)-COOH
Methionine      met       m         CH3-S-(CH2)2-CH(NH2)-COOH
Phenylalanine   phe       f         Ph-CH2-CH(NH2)-COOH
Proline         pro       p         NH-(CH2)3-CH-COOH
Serine          ser       s         HO-CH2-CH(NH2)-COOH
Threonine       thr       t         CH3-CH(OH)-CH(NH2)-COOH
Tryptophan      trp       w         Ph-NH-CH=C-CH2-CH(NH2)-COOH
Tyrosine        tyr       y         HO-p-Ph-CH2-CH(NH2)-COOH
Valine          val       v         (CH3)2-CH-CH(NH2)-COOH
http://www.nyu.edu/pages/mathmol/library/life/life1.html
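
The three-letter and one-letter codes listed above are usually the first lookup table a bioinformatics script needs. A minimal Python sketch, assuming the common convention of upper-case one-letter codes (the slide writes them in lower case); the function name to_one_letter is hypothetical:

```python
# Three-letter to one-letter amino acid codes, copied from the list above.
# Upper-case one-letter codes are an assumption (the usual convention).
THREE_TO_ONE = {
    "ala": "A", "arg": "R", "asn": "N", "asp": "D", "cys": "C",
    "gln": "Q", "glu": "E", "gly": "G", "his": "H", "ile": "I",
    "leu": "L", "lys": "K", "met": "M", "phe": "F", "pro": "P",
    "ser": "S", "thr": "T", "trp": "W", "tyr": "Y", "val": "V",
}


def to_one_letter(residues):
    """Convert a list of three-letter codes (any case) to a one-letter string."""
    return "".join(THREE_TO_ONE[r.lower()] for r in residues)


print(to_one_letter(["Met", "Ala", "Gly", "Lys"]))  # MAGK
```

An unknown code raises KeyError here, which is often what you want in a parser: bad input fails loudly instead of silently producing a wrong sequence.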

Amino Acids (L-form)

Types of Amino Acids
Amino acids can be grouped into 4-5 different groups for bioinformatic analysis. Most important distinctions:
• Hydrophobic and hydrophilic groups
• Big side chain groups and small side chain groups
• Cysteine → disulphide bonding (well conserved)
• Proline → structurally important
• Histidine → important for switching
Groups by side chain chemistry:
• Aliphatic - alanine, glycine, isoleucine, leucine, proline, valine
• Aromatic - phenylalanine, tryptophan, tyrosine
• Acidic - aspartic acid, glutamic acid
• Basic - arginine, histidine, lysine
• Hydroxylic - serine, threonine
• Amidic (containing amide group) - asparagine, glutamine
• http://chemistry.gsu.edu/glactone/PDB/Amino_Acids/aa.html
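
The grouping above can be turned into a small lookup for sequence analysis. A minimal Python sketch, assuming upper-case one-letter codes; cysteine and methionine are not assigned to any of the six listed groups on the slide, so this sketch reports them as "other". The names GROUPS and group_composition are hypothetical.

```python
from collections import Counter

# Side chain groups as listed on the slide, written with one-letter codes.
GROUPS = {
    "aliphatic": "AGILPV",
    "aromatic": "FWY",
    "acidic": "DE",
    "basic": "RHK",
    "hydroxylic": "ST",
    "amidic": "NQ",
}

# Invert the table: residue -> group name.
RESIDUE_TO_GROUP = {aa: group for group, aas in GROUPS.items() for aa in aas}


def group_composition(sequence):
    """Count how many residues of a sequence fall into each group."""
    return Counter(RESIDUE_TO_GROUP.get(aa, "other") for aa in sequence.upper())


# "MKTAYDE" is a made-up example peptide, not a real protein.
print(group_composition("MKTAYDE"))
```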

Amino Acids • CH – COO – R – NH3 (CORN law: viewed with the Cα hydrogen pointing toward the viewer, the CO, R, and N groups read clockwise in L-amino acids). Zwitterions remain when the α-amino acid is dissolved in water at pH 7. Addition of an acid, supplying more protons, produces ions with a surplus positive charge.

Peptide Bond

Planes of peptide bonds

Amino Acid → Protein

Secondary Structures from Amino Acids • Three main secondary structure elements are often used. • In reality, there are many more types! • Different types of alpha, beta, coils, ...

Alpha and Beta

Supersecondary Structure

Basic knowledge for Bioinformatics (focusing on proteins). • Some basic points to be understood by biologists and non-biologists.

Protein • Life is a huge chunk of protein with various ligands attached. • The protein level is very efficient for us to work with → a naturally distinct unit, and so favoured by bioinformatic computing.

Only 1,000 Protein shapes? (Year 2001) • There are fewer than 1,000 types of distinct protein structure shapes known so far (called Folds)

Only 1,000 Protein superfamilies? (Year 2001) • There are around 1,000 evolutionarily distinct protein shapes (called SuperFamily)

e.g., 11 very common superfamilies

Only around 10,000 proteins? • Perhaps not much more than 10,000 different types of (representative) protein sequences exist in nature. • Then where do the complexity and diversity of life come from? The network of interactions among them. • Then where do all the bio-functions come from? → next slide

Functional diversity → organizational and regulatory differences • Chimps and humans are the same in terms of genetic components, yet they are different species. • The English and Koreans: same genome but different → different sub-species. • Organization and regulation of information are different. Also developmental diversity. • Somehow cells ‘self-organize’ data very efficiently and effectively.

Organising Structures and Sequences Bioinformatically • A technical challenge we have is how to organize the structures and sequences of proteins • There are many different ways to organize protein sequences and structures. • PDB → 1976, SCOP, CATH, FSSP, ... • Swiss-Prot, PIR, GenBank, EST, SNP, TrEMBL, Ensembl, ... (over 500 major biological DBs) • Now we have very large scale data → next

Interaction (directionless) • Interactions do not have directions
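
Because an interaction has no direction, the pair (A, B) and the pair (B, A) must be stored as the same edge. A minimal Python sketch of that idea, using frozenset so the order of the two partners does not matter; the family names famA, famB, famC are hypothetical placeholders, not real data:

```python
def add_interaction(network, a, b):
    """Store an undirected interaction; frozenset ignores partner order."""
    network.add(frozenset((a, b)))


def interacts(network, a, b):
    """True if a and b interact, regardless of the order they are given in."""
    return frozenset((a, b)) in network


def partners(network, a):
    """All partners of a; a self-interaction reports a itself."""
    found = set()
    for edge in network:
        if a in edge:
            found.update(edge - {a} or {a})
    return found


network = set()
add_interaction(network, "famA", "famB")
add_interaction(network, "famC", "famA")
add_interaction(network, "famB", "famA")  # same edge as the first one

print(len(network))                        # 2
print(interacts(network, "famB", "famA"))  # True
print(sorted(partners(network, "famA")))   # ['famB', 'famC']
```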

Resulting PSIMAP • PSIMAP: A Low resolution

Structure DB Practical Steps of Complete Human Interactome Predicted Human Interactome → http://hpid.org/

Global view of protein family interaction networks for 146 genomes

Signal Transduction (Pathways) • Pathways have directions
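
Because a pathway has direction, it is naturally stored as a directed graph: each node lists only the nodes it signals to. A minimal Python sketch, using a made-up four-step cascade (receptor, kinase1, kinase2, tf are hypothetical names, not a real pathway):

```python
from collections import deque

# Directed edges: key -> list of nodes it signals to (hypothetical cascade).
PATHWAY = {
    "receptor": ["kinase1"],
    "kinase1": ["kinase2"],
    "kinase2": ["tf"],
    "tf": [],
}


def downstream(pathway, start):
    """Return every node reachable from start, in breadth-first order."""
    seen = []
    queue = deque(pathway.get(start, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.append(node)
            queue.extend(pathway.get(node, []))
    return seen


print(downstream(PATHWAY, "receptor"))  # ['kinase1', 'kinase2', 'tf']
print(downstream(PATHWAY, "tf"))        # []
```

The signal reaches tf from receptor but not the other way round, which is exactly the difference from a directionless interaction network.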

Bioinformatics Programming • BioPerl: 1994 • BioJava: 1995 • BioPHP • BioRuby • BioC++ • Bio[X]

Programming for Bioinformatics • Automation is the key • Use fast prototyping • Solve problems • Reuse scripts • Share code with lab members • Use public resources • Use open and free resources • GitHub
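
As one illustration of "automation" and "reuse scripts": write a small function once, then apply it to every file in a directory instead of opening files by hand. This is only a sketch under assumptions: the FASTA format, the "*.fasta" pattern, and the function names are illustrative choices, not something the slides specify.

```python
import tempfile
from pathlib import Path


def count_sequences(fasta_path):
    """Count records in a FASTA file by counting header lines ('>')."""
    with open(fasta_path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def batch_count(directory, pattern="*.fasta"):
    """Apply count_sequences to every matching file; return {filename: count}."""
    return {
        path.name: count_sequences(path)
        for path in sorted(Path(directory).glob(pattern))
    }


# Demo on throwaway files so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as workdir:
    (Path(workdir) / "sample1.fasta").write_text(">seq1\nMKTA\n>seq2\nGAVL\n")
    (Path(workdir) / "sample2.fasta").write_text(">seq3\nYDE\n")
    print(batch_count(workdir))  # {'sample1.fasta': 2, 'sample2.fasta': 1}
```

Keeping the per-file function separate from the batch loop is what makes the script reusable: lab members can swap in a different per-file function without touching the loop.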

Requirements • Guru-level coding ability • Understanding computer hardware • Parsing ability • Text manipulation • Database: flat file, relational (MySQL)
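
Parsing, text manipulation, flat files, and relational databases often meet in one small task: read a flat file and load it into a table. A minimal Python sketch, with two loud assumptions: the flat file is FASTA-formatted, and the standard-library sqlite3 module stands in for MySQL so the example runs without a server. The table layout and the record names protA and protB are hypothetical.

```python
import sqlite3

FLAT_FILE_TEXT = """\
>protA hypothetical example record
MKTAYIAK
QR
>protB
GAVL
"""


def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif name is not None:
            chunks.append(line)  # sequence may span several lines
    if name is not None:
        yield name, "".join(chunks)


def load_into_db(records):
    """Create an in-memory table and insert one row per record."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE protein (id TEXT PRIMARY KEY, sequence TEXT, length INTEGER)"
    )
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [(name, seq, len(seq)) for name, seq in records],
    )
    conn.commit()
    return conn


conn = load_into_db(parse_fasta(FLAT_FILE_TEXT))
rows = conn.execute("SELECT id, length FROM protein ORDER BY length DESC").fetchall()
print(rows)  # [('protA', 10), ('protB', 4)]
```

The SQL itself is plain enough that the same statements should carry over to MySQL with a different connection object, though the parameter placeholder style differs between drivers.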

What is a grammar?

What is a compiler? • A compiler is a computer program (or set of programs) that transforms source code written in a programming language (the source language) into another computer language (the target language, often having a binary form known as object code). The most common reason for converting source code is to create an executable program.
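
The source-to-target transformation can be observed directly in Python, whose built-in compile() turns source text into a code object. Note the hedge: in CPython the target language is bytecode for the Python virtual machine, not native object code, so this is an illustration of the idea rather than of a native compiler.

```python
import dis

source = "total = 2 + 3 * 4"

# Source language (Python text) -> target language (CPython bytecode).
code_obj = compile(source, "<example>", "exec")

# The compiled form is a sequence of bytes, not text.
print(type(code_obj.co_code))  # <class 'bytes'>

# Execute the compiled code in its own namespace.
namespace = {}
exec(code_obj, namespace)
print(namespace["total"])  # 14

# Human-readable listing of the bytecode instructions
# (the exact listing varies between Python versions).
dis.dis(code_obj)
```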

Programming modules in Python
