ADVANCED COMPUTATIONAL BIOLOGY AND BIOINFORMATICS. Ivan Ivanov, Department of Veterinary Physiology and Pharmacology, Genomic Signal Processing Lab, Texas A&M University. Office: VMR 422C. E-mail: iivanov@cvm.tamu.edu, ivanzau@gmail.com
Course Goals • Understanding current microarray technology. • Normalization and quality control of microarray data. • Unsupervised and supervised analysis methods. • Analyzing microarray data together with other data. • Work-flow of microarray experimentation and using databases to store microarray data.
Course Goals • How to implement an analysis method as a computer program. • Available software for analysis of microarray data, and the limitations of using it. • Information, complexity and their interpretation in statistical modeling. • Boolean and probabilistic Boolean networks as models of genomic regulation. • Minimum Description Length (MDL) principle and its application to modeling of genomic regulatory networks.
Grading • Quizzes (2) 20% • Paper presentation 30% • Research project (last day) 30% • Final exam (last day) 20%
Biology Background • Organic chemistry • Energy considerations in biochemical reactions • Proteins • DNA • Transcription and translation • Chromosomes and gene regulation • Genetic variation • Cell division • Cell cycle control, cell death, and cancer
Organic Chemistry • Electrovalent/ionic bond := a bond between atoms that donate/accept electrons. Very strong bonds based on electrostatic forces. Example: NaCl; Na+ and Cl- • Covalent bond := a bond between atoms that share electrons. • Single, double bonds • Polar (-O-H) or non-polar (-C-H) • Hydrophilic, hydrophobic, and amphipathic • Water is the most abundant substance inside cells • Amphipathic molecules are crucial building blocks for all cell membranes
Organic Chemistry: Building Blocks for Common Organic Molecules • If we disregard water, almost all of the molecules in a cell are based on C (carbon). Cells contain four major families of small organic molecules, whose members link together by covalent bonds to form giant macromolecules. • Sugars: building blocks for more complex sugars and carbohydrates. • Fatty acids: building blocks for fats, lipids and all cell membranes. • Amino acids: building blocks for proteins. • Nucleotides: building blocks for DNA, RNA, etc.
Organic Chemistry: Nucleotides • Nucleotide := a molecule made up of a nitrogen-containing compound linked to a 5-carbon sugar (pentose) that carries one or more phosphate groups. The nitrogen-containing compounds (bases) are ring-like structures. • Two major types of bases: pyrimidines and purines • Pyrimidines: cytosine (C), uracil (U) and thymine (T) • Purines: adenine (A) and guanine (G) • Pentose: ribose or deoxyribose
Organic Chemistry: Nucleic Acids • Nucleoside := base + sugar • Deoxyribonucleic acid (DNA) := deoxyribose + {A, G, C, T} • Ribonucleic acid (RNA) := ribose + {A, G, C, U} • Common structure: the phosphate group on the 5' end of an incoming nucleotide links to the 3' hydroxyl group of the sugar of the previous nucleotide via a phosphodiester bond • Functions of nucleotides: • Building blocks of DNA and RNA • Storage for chemical energy • Combine with other groups to form coenzymes • Specific signaling molecules in the cell, e.g. cyclic AMP (cAMP)
Organic Chemistry: The General Picture • Macromolecules (polysaccharides, proteins, and nucleic acids) found in cells contain a specific sequence of units (sugars, amino acids, and nucleotides); these macromolecules are formed by condensation reactions, where a new unit is added to a growing chain with the expulsion of a water molecule. • Noncovalent bonds such as hydrogen bonds, hydrophobic forces, etc. make macromolecules assume specific shapes. • The specific shapes as well as the noncovalent bonds allow the macromolecules to seek out their appropriate partners and undergo the required reactions with them.
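The base alphabets on the two slides above can be written down directly. The following is a minimal Python sketch, not part of the course material; the names `base_type` and `possible_acids` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: helper names are hypothetical, not from the slides.
PURINES = {"A", "G"}           # adenine, guanine
PYRIMIDINES = {"C", "T", "U"}  # cytosine, thymine, uracil
DNA_BASES = {"A", "G", "C", "T"}
RNA_BASES = {"A", "G", "C", "U"}


def base_type(base):
    """Classify a single base letter as 'purine' or 'pyrimidine'."""
    b = base.upper()
    if b in PURINES:
        return "purine"
    if b in PYRIMIDINES:
        return "pyrimidine"
    raise ValueError(f"unknown base: {base!r}")


def possible_acids(sequence):
    """Return the set of nucleic acid types whose alphabet covers the sequence."""
    bases = set(sequence.upper())
    kinds = set()
    if bases <= DNA_BASES:
        kinds.add("DNA")
    if bases <= RNA_BASES:
        kinds.add("RNA")
    return kinds


if __name__ == "__main__":
    print(base_type("A"))             # purine
    print(possible_acids("GATTACA"))  # {'DNA'}
    print(possible_acids("GAUUACA"))  # {'RNA'}
```

A sequence containing only A, G and C is compatible with both alphabets, so the second function returns a set rather than a single label.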
Energy Considerations in Biochemical Reactions • Hydrolysis and condensation: these two reactions are the reverse of each other. Thus, it becomes important to determine the direction in which the reaction will proceed in each specific situation. • The laws of thermodynamics • Energy can neither be created nor destroyed. It can only be converted from one form to another. • The entropy of the universe can only increase. Entropy is a measure of disorder. • Living things create and maintain order, which is a somewhat paradoxical situation.
Energy Considerations in Biochemical Reactions • Catabolic reactions: break down foodstuffs into smaller molecules, thereby generating both a useful form of energy for the cell and some of the small molecules needed as building blocks of the cell. • Anabolic reactions: use the energy produced by catabolism to drive the synthesis of new molecules. • Cell metabolism := catabolic + anabolic reactions. • Free energy := the measure of the amount of energy that is available to do useful work. • Energetically favorable reaction := a reaction that proceeds with a decrease of the free energy.
Energy Considerations in Biochemical Reactions • Enzymes := a special class of proteins that catalyze chemical reactions in cells • The role of enzymes: to help the reactants over the energy barrier so that the respective reaction can proceed. Enzymes do this by lowering the activation energy, which lets the reaction occur at room temperature. • Each enzyme binds tightly to one or two molecules called substrates, and holds them in a way that greatly reduces the activation energy of the reaction. Once the substrates have reacted, the enzyme dissociates from the products and is free to bind additional substrate molecules.
Energy Considerations in Biochemical Reactions • Feasibility of chemical reactions: when a particular reaction is not energetically favorable, it is coupled with an energetically favorable one so that the combined change of the free energy is negative. • Activated carrier molecules (coenzymes): can rapidly diffuse throughout the cell and therefore transfer energy from one site to another. They store the energy either in high-energy bonds (ATP) or as high-energy electrons (NADH, NADPH) • Example: adenosine triphosphate (ATP) • ATP → ADP + Pi (energy release) • ADP + Pi → ATP (energy from sunlight or food)
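One common way to see why lowering the activation energy matters is the Arrhenius relation k = A·exp(-Ea/(RT)), which the slides do not mention and which is brought in here only as an illustration. The Python sketch below assumes the catalyzed and uncatalyzed reactions share the same pre-exponential factor A, and the activation energies in the example (80 and 50 kJ/mol) are made-up values, not data for any real enzyme.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def rate_enhancement(ea_uncatalyzed, ea_catalyzed, temperature=298.15):
    """Ratio k_catalyzed / k_uncatalyzed under the Arrhenius relation.

    Assumes both reactions have the same pre-exponential factor, so the
    ratio depends only on the drop in activation energy (in J/mol).
    """
    delta = ea_uncatalyzed - ea_catalyzed
    return math.exp(delta / (R * temperature))


if __name__ == "__main__":
    # Hypothetical numbers: barrier lowered from 80 to 50 kJ/mol at about 25 C.
    ratio = rate_enhancement(80_000.0, 50_000.0)
    print(f"rate increases by a factor of about {ratio:.2e}")
```

Under these assumptions a 30 kJ/mol reduction speeds the reaction up by roughly five orders of magnitude, without any change in temperature.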
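The coupling rule on this slide can be written as a short worked equation. The example below is not from the slides: it uses the phosphorylation of glucose, with approximate standard free-energy values as commonly quoted in biochemistry textbooks, to show how ATP hydrolysis makes an unfavorable step favorable.

```latex
\begin{align*}
\Delta G_{\text{coupled}} &= \Delta G_{1} + \Delta G_{2} < 0 \\[4pt]
\text{glucose} + \mathrm{P_i} &\rightarrow \text{glucose-6-phosphate} + \mathrm{H_2O},
  & \Delta G_{1}^{\circ\prime} &\approx +13.8\ \text{kJ/mol} \\
\mathrm{ATP} + \mathrm{H_2O} &\rightarrow \mathrm{ADP} + \mathrm{P_i},
  & \Delta G_{2}^{\circ\prime} &\approx -30.5\ \text{kJ/mol} \\
\text{glucose} + \mathrm{ATP} &\rightarrow \text{glucose-6-phosphate} + \mathrm{ADP},
  & \Delta G_{\text{coupled}}^{\circ\prime} &\approx -16.7\ \text{kJ/mol}
\end{align*}
```

The first reaction alone would not proceed, but the sum of the two free-energy changes is negative, so the coupled reaction is energetically favorable.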
Proteins • 20 different amino acids • Protein := a long chain of amino acids, each with a different side chain; some of these side chains are basic, some acidic, some polar and hydrophilic, and some non-polar and hydrophobic. Each amino acid is linked to its neighbor by a peptide bond. • Proteins fold into different shapes because of different sets of weak noncovalent bonds: hydrogen bonds, ionic bonds, van der Waals attractions, and hydrophobic forces. • Denaturation (in the presence of specific solvents) and renaturation of proteins.
Proteins • Protein fold := energy-minimizing 3-D structure. • Protein size: from 30 up to 10,000 amino acids • Structural motifs resulting from hydrogen bonding between the N-H and C=O groups in the polypeptide backbone • α-helix • β-sheet
Proteins • Levels of organization of proteins • Primary structure corresponds to the amino acid sequence • Secondary structure corresponds to the existence of α-helices and β-sheets • Tertiary structure corresponds to the 3-D conformation • Quaternary structure corresponds to the complete structure of proteins made up of more than one polypeptide chain • Variety of proteins: HUGE. For example, for a polypeptide chain that is 50 amino acids long there are 20^50 different chains possible. Thus the role of natural selection is crucial.
Proteins • Protein-ligand interactions • Ligands := molecules that bind to proteins • Binding site := the region of a protein that associates with a ligand • Proteins that are enzymes • Hydrolases: catalyze a hydrolytic cleavage reaction • Nucleases: break down nucleic acids • Proteases: break down proteins • Synthases: synthesize molecules in anabolic reactions by condensing two smaller molecules together • Isomerases: catalyze the rearrangement of bonds within a single molecule • Polymerases: catalyze reactions such as the synthesis of DNA and RNA
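The count of possible chains follows from having 20 amino acid choices at each of 50 positions, that is, 20 to the power 50. A minimal Python sketch of the arithmetic is shown below; the function name `count_chains` is hypothetical.

```python
def count_chains(length, alphabet_size=20):
    """Number of distinct linear chains of the given length.

    Each position can independently hold any of `alphabet_size` residues,
    so the count is alphabet_size ** length.
    """
    return alphabet_size ** length


if __name__ == "__main__":
    n = count_chains(50)
    print(n)            # exact integer, 66 digits long
    print(f"{n:.3e}")   # 1.126e+65
```

Even for this short chain the number of sequences is on the order of 10^65, far more than could ever be tried exhaustively.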
Proteins • Control of the catalytic activities of enzymes • Gene expression • Confining sets of enzymes to particular sub-cellular compartments • Response to other molecules encountered by the enzyme. For example, consider feedback inhibition. (Figure: reaction pathway with metabolites W, X, Y, Z and enzymes E1, E2, E3.)
Proteins • Motor proteins can produce directed movement in cells by coupling the hydrolysis of ATP to conformational changes in the protein. • In the absence of ATP hydrolysis, the protein would move back and forth in different directions, generating random movement. • By coupling the movement to ATP hydrolysis, which is an energetically favorable reaction, the movement steps are made irreversible. • This is the mechanism by which DNA and RNA polymerases move along a template DNA strand.
Signal? (Wikipedia) • Electrical engineering: a varying quantity that can carry information. • Cell signaling: the system of communication that governs basic cellular activities and coordinates cell actions • An electrochemical communication activity in an organism • Computing: an event, message, or data structure transmitted between computational processes
Human Genome Project (HGP), 2003: Are we done yet? • ~25,000 genes in human DNA • ~3 billion base pairs
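Feedback inhibition can be illustrated with a small simulation. The Python sketch below is not taken from the slides and rests on several assumptions: the pathway is taken to be linear (W to X via E1, X to Y via E2, Y to Z via E3), the end product Z is assumed to inhibit E1, W is held constant, all rate constants are arbitrary, and the inhibition term 1 / (1 + Z / K_i) is just one simple choice of functional form.

```python
def simulate(feedback, steps=20000, dt=0.01):
    """Euler simulation of a hypothetical pathway W -> X -> Y -> Z.

    All constants are arbitrary illustration values, not measured data.
    If `feedback` is True, the end product Z slows down the first enzyme E1.
    Returns the concentration of Z at the end of the run.
    """
    w = 1.0                   # substrate W, held constant
    x = y = z = 0.0           # intermediates and end product
    k1 = k2 = k3 = 1.0        # rate constants for E1, E2, E3
    k_out = 0.1               # rate at which Z is consumed elsewhere
    k_i = 0.5                 # inhibition constant for Z acting on E1

    for _ in range(steps):
        inhibition = 1.0 / (1.0 + z / k_i) if feedback else 1.0
        v1 = k1 * w * inhibition   # W -> X (E1)
        v2 = k2 * x                # X -> Y (E2)
        v3 = k3 * y                # Y -> Z (E3)
        v_out = k_out * z          # removal of Z
        x += dt * (v1 - v2)
        y += dt * (v2 - v3)
        z += dt * (v3 - v_out)
    return z


if __name__ == "__main__":
    print(f"Z without feedback: {simulate(False):.2f}")
    print(f"Z with feedback:    {simulate(True):.2f}")
```

With these particular constants the end product settles near 10 without feedback and near 2 with it, which shows how inhibiting the first enzyme keeps the product from accumulating.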
Genomic Signals • DNA • mRNA • miRNA • protein
Scientific Knowledge • Requires a model ~ mathematical formalization • Model inferred from observations • Requires a methodology to test the model • Can inferences be made from the model?
Genomic Signal Processing (GSP) • Understanding of both the structural and functional properties of genomic regulation. • System Approach + Signal Processing • A discipline that studies the processing of genomic signals.