Gene Expression
Molecules in the Cell • The most common molecule in cells is water, the universal solvent in which all the other molecules are dissolved. • Various small ions (dissolved salts) keep the cell in osmotic balance. • The main positively charged ions are sodium (Na+), potassium (K+), magnesium (Mg2+), and calcium (Ca2+). • The main negative ions are chloride (Cl-), bicarbonate (HCO3-), and phosphate (PO4 3-). • Four main classes of macromolecule: nucleic acids, proteins, polysaccharides, and lipids. These molecules are usually in the form of polymers, long chains of similar subunits, which are called monomers. • Miscellaneous “small” molecules that act as helpers (co-factors) in enzymatic reactions. Many of these are vitamins.
Carbohydrates • Sugars and starches: “saccharides”. • The name “carbohydrate” comes from the approximate composition: a ratio of 1 carbon to 2 hydrogens to one oxygen (CH2O). For instance the sugar glucose is C6H12O6. • Carbohydrates are composed of rings of 5 or 6 carbons, with alcohol (-OH) groups attached. This makes most carbohydrates water-soluble. • Carbohydrates are used for energy production and storage, and for structure. • Glucose, a simple 6-carbon sugar, is the primary fuel source for most living things. It is broken down by the process of glycolysis. • Starches are glucose polymers, used to store fuel. • Structural carbohydrates include cellulose (another glucose polymer) and chitin, the outer coating of insects and many fungi. Also, slimy things like mucus and bacterial capsules.
Lipids • Lipids are the main non-polar component of cells. Mostly hydrocarbons—carbon and hydrogen. • They are used primarily as energy storage and cell membranes. • Energy storage: triglycerides (fats). Composed of glycerol attached to 3 fatty acid molecules. Fatty acids are long chains of carbon and hydrogen. Double bonds kink the chains and lower the melting temperature. • Cell membranes are composed primarily of phospholipids. These have 2 fatty acids attached to glycerol, plus a phosphate-containing polar “head group”. The heads stick into the water outside the membrane, while the non-polar tails stay in the hydrophobic interior of the membrane. This acts as a waterproof coat that keeps most other molecules from passing through the membrane. The membrane consists of 2 layers of phospholipids: the lipid bilayer. • Steroids have 4 carbon rings attached together in a specific way, plus some attached side groups. Cholesterol is an important component of the cell membrane. Various hormones are steroids.
Proteins • The most important type of macromolecule. • Roles: • Structure: collagen in skin, keratin in hair, crystallin in eye. • Enzymes: all metabolic transformations, building up, rearranging, and breaking down of organic compounds, are done by enzymes, which are proteins. • Transport: oxygen in the blood is carried by hemoglobin, everything that goes in or out of a cell (except water and a few gasses) is carried by proteins. • Also: nutrition (egg yolk), hormones, defense, movement • Proteins are composed of linear chains of amino acids. • There are 20 different kinds of amino acids in proteins. Each one has a functional group (the “R group”) attached to it. • Different R groups give the 20 amino acids different properties, such as charged (+ or -), polar, hydrophobic, etc. • The different properties of a protein come from the arrangement of the amino acids.
Protein Structure • A polypeptide is one linear chain of amino acids. Each gene produces one polypeptide. A protein may contain one or more polypeptides. Proteins also sometimes contain small helper molecules such as heme. • After the polypeptides are synthesized by the cell, they spontaneously fold up into a characteristic conformation which allows them to be active. The proper shape is essential for active proteins. For most proteins, the amino acid sequence itself is all that is needed to get proper folding. • Proteins fold up because they form hydrogen bonds between amino acids. The need for hydrophobic amino acids to be away from water also plays a big role. Similarly, the charged and polar amino acids need to be near each other. • The joining of polypeptide subunits into a single protein also happens spontaneously, for the same reasons. • Enzymes are usually roughly globular, while structural proteins are usually fiber-shaped. Proteins that transport materials across membranes have a long segment of hydrophobic amino acids that sits in the hydrophobic interior of the membrane.
Nucleic Acids • Nucleic acids store genetic information in the cell. They are also involved in energy and electron movements. • The two types of nucleic acid are RNA (ribonucleic acid) and DNA (deoxyribonucleic acid). • Nucleotides are the subunits of nucleic acids. • Each nucleotide has 3 parts: a sugar, a phosphate, and a base. • The sugar, ribose in RNA and deoxyribose in DNA, contains 5 carbons. The two sugars differ only in that an –OH group in ribose is replaced by a –H in deoxyribose. • The main energy-carrying molecule in the cell is ATP. ATP is an RNA nucleotide with 3 phosphate groups attached to it in a chain. The energy is stored because the phosphates each have a negative charge. These charges repel each other, but they are forced to stay together by the covalent bonds.
Nucleotides • There are 4 possible DNA bases: adenine (A), guanine (G), cytosine (C), and thymine (T). • Adenine and guanine are purines: they consist of two linked rings of mixed nitrogen and carbon atoms. • Thymine and cytosine are pyrimidines, which consist of a single ring. In RNA, thymine is replaced by uracil (U), which looks like thymine except for a single methyl group.
DNA Structure • DNA consists of two anti-parallel chains twisted into a helix. The nitrogenous bases are paired in the center of the molecule, and the phosphate-sugar backbones, made of alternating phosphates and sugars, are on the outside. • Each chain starts at the 5’ end and ends at the 3’ end. • Each strand of DNA pairs with a complementary DNA strand. Each A is paired with a T, and each G is paired with a C. Thus, the information on one DNA strand easily allows the other strand to be deduced. In double-stranded DNA, the amount of A always equals the amount of T, and the amount of G always equals the amount of C. • One characteristic of genomes is their GC content: the percentage of bases that are G or C. This can vary from about 20% to 70%. Eukaryotes generally have GC contents around 40%. Also, there are large-scale variations in GC content along the length of chromosomes, called “isochores”, which may be the result of horizontal gene transfer.
DNA Structure • Pairing is caused by hydrogen bonds, weak links between oxygen and nitrogen atoms where one of them has a hydrogen attached. G-C pairs are stronger than A-T, so high temperature organisms usually have a high GC content.
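The pairing rules and the GC content calculation can be sketched in a few lines of Python. This is a minimal illustration, not a standard tool: the function names and the example sequence are made up, and the code assumes the sequence contains only the four unambiguous bases.

```python
# Watson-Crick pairing rules from the slides: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, written 5' to 3'.

    The strands are anti-parallel, so the complement is read in reverse.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def gc_content(seq):
    """Return the percentage of bases in seq that are G or C."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)

example = "ATGCGT"  # hypothetical sequence, for illustration only
print(reverse_complement(example))
print(gc_content(example))
```

Because one strand determines the other, only one strand ever needs to be stored; the other can always be recomputed this way.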
Semi-conservative Replication • DNA molecules replicate by unwinding, then synthesizing a new strand for each of the old strands. • This mode of replication is called “semi-conservative”. It means that each new DNA molecule consists of one old strand (from the original molecule) and one new strand. • DNA replication starts at specific locations, called “origins of replication”, and proceeds in both directions. • The DNA chain is said to grow from 5’ to 3’, which means that the first nucleotide has a free 5’ end, with attached phosphates. The last nucleotide has a free 3’ OH group on it. All other nucleotides have their 5’ carbons attached to a phosphate, which is attached to the 3’ OH group of the previous nucleotide. • DNA polymerase is the main enzyme used to replicate DNA. However, DNA polymerase is only one enzyme in the replication complex. Several other enzymes are needed for replication to occur.
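A toy bookkeeping model can make “semi-conservative” concrete. In this sketch (an illustration only; the strand labels "original" and "new" and the function name are invented here), each DNA molecule is a pair of strand labels, and every round of replication gives each old strand a freshly made partner.

```python
def replicate(duplexes):
    """One round of replication.

    Each duplex unwinds, and each of its two strands becomes the
    template for a newly synthesized partner strand.
    """
    return [(strand, "new") for duplex in duplexes for strand in duplex]

# Start with a single molecule made of two original strands.
population = [("original", "original")]
for generation in range(3):
    population = replicate(population)

with_original = sum("original" in duplex for duplex in population)
print(len(population), with_original)
```

However many rounds are run, the two original strands are never destroyed, so exactly two molecules in the population still carry one of them.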
RNA • RNA is a nucleic acid, like DNA, with a few small differences: • RNA is single stranded, not double stranded like DNA • However, RNA often folds up into characteristic secondary structures, caused by base pairing. • RNA is short, only 1 gene long, where DNA is very long and contains many genes • RNA uses the sugar ribose instead of deoxyribose in DNA. • This means that RNA has an -OH group instead of an -H. This makes it unstable, a useful characteristic for short-lived messages. • RNA uses the base uracil (U) instead of thymine (T) in DNA.
RNA • There are 3 main types of RNA in the cell • 1. messenger RNA: copies of the individual genes • 2. ribosomal RNA: part of the ribosome, the machine that translates messenger RNA into protein. • Several other RNA/protein hybrid machines in the cell have specialized structural RNAs, for example: • the spliceosome that removes introns in eukaryotes, • the signal recognition particle that moves ribosomes to the endoplasmic reticulum or cell membrane (in prokaryotes) for translation of membrane proteins. • 3. transfer RNA, which is an adapter between the messenger RNA and the amino acids it codes for. • Many of the tRNA bases are enzymatically modified, creating things like pseudouridine and N-methyl adenosine. • Recently it has been shown that short RNAs (miRNA) play an important role in regulating development. miRNAs are not translated into proteins. They function by binding to specific messenger RNAs and either preventing translation or facilitating the degradation of the mRNA.
Gene Expression • Each gene is a short section of a chromosome’s DNA that codes for a polypeptide. • Recall that polypeptides are linear chains of amino acids, and that proteins are composed of one or more polypeptides, sometimes with additional small molecules attached. The proteins then act as enzymes or structures to do the work of the cell. • All cells within an organism have the same genes. What makes one type of cell different from another is which genes are expressed or not expressed in the cell. For example, the genes for hemoglobin are on in red blood cells, but off in muscle and nerve cells. “Expressed” = making the protein product. • Genes are expressed by first making an RNA copy of the gene (transcription) and then using the information on the RNA copy to make a protein (translation). • This process, DNA transcribed into RNA, then RNA translated into protein, is called the “Central Dogma of Molecular Biology”.
Genetic Code • There are only 4 bases in DNA and RNA, but there are 20 different amino acids that go into proteins. • Each amino acid is coded for by a group of 3 bases, a codon. 3 bases of DNA or RNA = 1 codon. • Since there are 4 bases and 3 positions in each codon, there are 4 x 4 x 4 = 64 possible codons. • This is far more than is necessary, so most amino acids use more than 1 codon. • 3 of the 64 codons are used as STOP signals; they are found at the end of every gene and mark the end of the protein. Stop codons do not code for any amino acid. • In eukaryotes, ATG is used as a START signal: it is at the start of every protein. In prokaryotes, GTG and TTG are also used as start codons. • note that these codons are also used within the protein: they are not JUST start codons.
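The codon arithmetic can be checked by enumerating every triplet. This is a small sketch; the three stop codons listed (TAA, TAG, TGA) are those of the standard genetic code, which the slide does not spell out.

```python
from itertools import product

BASES = "TCAG"

# Every ordered choice of 3 bases: 4 x 4 x 4 combinations.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Stop codons of the standard genetic code (an addition to the slide text).
STOP_CODONS = {"TAA", "TAG", "TGA"}
sense_codons = [c for c in codons if c not in STOP_CODONS]

print(len(codons))        # total possible codons
print(len(sense_codons))  # codons left over to encode the 20 amino acids
```

With 61 codons available for 20 amino acids, most amino acids necessarily have more than one codon.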
Reading Frames • Codons are groups of 3 bases. Since translation can start at any nucleotide, the same region of DNA can be read in 3 ways, starting one base apart. Each of these 3 modes is a reading frame. • The DNA might also be read on the opposite strand, giving a total of 6 possible reading frames. • Genes occur in open reading frames (ORFs), areas where there are no stop codons. Genes end at the first stop codon that exists in their reading frame. • 3 out of every 64 codons are stop codons, so large open reading frames are rare in random, unselected DNA. Since genes are under selection pressure, most long open reading frames contain genes.
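The six reading frames and the rarity of long ORFs in random DNA can be sketched as follows. The function names and the example sequence are hypothetical, and `prob_no_stop` assumes all 64 codons are equally likely and independent, which real genomes only approximate.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def codons_in_frame(seq, offset):
    """Split seq into complete codons, starting at offset 0, 1, or 2."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]

def six_frames(seq):
    """Return the codon lists for all 6 reading frames (3 per strand)."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[(strand, offset)] = codons_in_frame(s, offset)
    return frames

def longest_open_run(codons):
    """Length, in codons, of the longest stretch with no stop codon."""
    best = run = 0
    for codon in codons:
        run = 0 if codon in STOP_CODONS else run + 1
        best = max(best, run)
    return best

def prob_no_stop(n_codons):
    """Chance that n random codons contain no stop (61 of 64 are not stops)."""
    return (61 / 64) ** n_codons

frames = six_frames("ATGAAATAGCCC")  # made-up sequence
print(frames[("+", 0)])
print(longest_open_run(frames[("+", 0)]))
print(prob_no_stop(100))
```

Under these assumptions, the chance of 100 codons in a row with no stop is below 1%, which is why a long open reading frame is a useful hint that a gene is present.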
Transcription • Transcription is the process of making an RNA copy of a single DNA gene. • The copying is done by an enzyme: RNA polymerase. Recall that in replication, a DNA copy of DNA is made by the enzyme DNA polymerase. • The bases of RNA pair with the bases of DNA: A in the DNA pairs with U in the RNA, T with A, and G with C. The RNA copy of a gene is just a complementary copy of one DNA strand (the template strand). • RNA polymerase attaches to a signal at the beginning of the gene, the promoter. Then RNA polymerase moves down the gene, adding new bases to the RNA copy, until it reaches a termination signal at the end of the gene. • In eukaryotes, there is no definite transcription termination signal; instead, the messenger RNA is finished off by the addition of 50-200 A nucleotides, a poly-A tail.
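Base pairing during transcription can be sketched as a lookup table. This is a simplified illustration: the function names and the example gene are hypothetical, and promoters, termination, and the poly-A tail are ignored.

```python
# DNA template base -> RNA base that pairs with it.
DNA_TO_RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_from_template(template):
    """Build the RNA complementary to a template strand.

    Both the input and the result are written 5' to 3'. Since the
    strands are anti-parallel, the template is read in reverse.
    """
    return "".join(DNA_TO_RNA_PAIR[base] for base in reversed(template))

def transcribe_from_coding(coding):
    """Shortcut: the RNA matches the other DNA strand, with U for T."""
    return coding.replace("T", "U")

coding_strand = "ATGGCC"    # made-up gene fragment
template_strand = "GGCCAT"  # its reverse complement
print(transcribe_from_template(template_strand))
print(transcribe_from_coding(coding_strand))
```

Both routes give the same RNA, which is why sequences are usually written from the non-template strand: it can be read directly as the message.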
Translation • The basics: a ribosome binds to messenger RNA, then moves down the RNA (5’ to 3’ direction), reading the codons and creating the corresponding polypeptide from them. • The ribosomes are RNA/protein hybrid machines. They are composed of 2 subunits (small and large), which come together at the initiation of the translation process. • Transfer RNA molecules act as adapters between the codons on messenger RNA and the amino acids. Transfer RNA is the physical manifestation of the genetic code. • At one end is the “anticodon”, 3 RNA bases that match the 3 bases of the codon. This is the end that attaches to messenger RNA. • At the other end is an attachment site for the proper amino acid. • A special group of enzymes (aminoacyl tRNA synthetases) pairs up the proper transfer RNA molecules with their corresponding amino acids.
Translation Steps • First step: initiation. The messenger RNA binds to a ribosome, and the transfer RNA corresponding to the START codon binds to this complex. Ribosomes are composed of 2 subunits (large and small), which come together when the messenger RNA attaches during the initiation process. • Step 2 is elongation: the ribosome moves down the messenger RNA, adding new amino acids to the growing polypeptide chain. • The ribosome has 2 sites for binding transfer RNA. The first transfer RNA with its attached amino acid binds to the first site, and then the transfer RNA corresponding to the second codon binds to the second site. • The ribosome then removes the amino acid from the first transfer RNA and attaches it to the second amino acid. • At this point, the first transfer RNA is empty: no attached amino acid, and the second transfer RNA has a chain of 2 amino acids attached to it. • The ribosome then slides down the messenger RNA 1 codon (3 bases). • The first transfer RNA is pushed off, and the second transfer RNA, with 2 attached amino acids, moves to the first position on the ribosome. • The elongation cycle repeats as the ribosome moves down the messenger RNA, translating it one codon and one amino acid at a time. • Repeat until a STOP codon is reached. • The final step in translation is termination. When the ribosome reaches a STOP codon, there is no corresponding transfer RNA. • Instead, a small protein called a “release factor” attaches to the stop codon. • The release factor causes the whole complex to fall apart: messenger RNA, the two ribosome subunits, the new polypeptide. • The messenger RNA can be translated many times, to produce many protein copies.
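Initiation, elongation, and termination map onto a short loop. The sketch below uses the standard genetic code, written as a 64-letter string in U, C, A, G order; the `translate` function and the example message are illustrative. Picking the first AUG as the start is a simplification, since real ribosomes use additional signals to choose a start codon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UUU, UUC, UUA, UUG, UCU, ...
# order. "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate an mRNA (5' to 3') into a protein in 1-letter code."""
    start = mrna.find("AUG")           # initiation (simplified)
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # termination: no tRNA for a stop
            break
        protein.append(amino_acid)     # elongation: one codon, one amino acid
    return "".join(protein)

print(translate("GGAUGGCCAAAUAAGG"))  # made-up message
```

The dictionary plays the role of the transfer RNAs: it is the adapter that connects each codon to its amino acid.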
Post-translation • The new polypeptide is now floating loose in the cytoplasm. It might also be inserted into a membrane, if the ribosome it was translated on was attached to the rough endoplasmic reticulum. • Polypeptides fold spontaneously into their active configuration, and they spontaneously join with other polypeptides to form the final proteins. • Sometimes other molecules are also attached to the polypeptides: sugars, lipids, phosphates, etc. All of these have special purposes for protein function. • Sometimes amino acids are chemically modified: converting proline into hydroxyproline, for example.
Gene Regulation • Genes have to turn on and off in response to specific signals. Most gene regulation occurs at the level of transcription. • But there is also a fair amount of regulation of protein activity using reversible phosphorylation. • Signaling across the cell membrane is also an important part of the process. • Proteins that bind to the DNA and affect transcription of specific genes are transcription factors. Some are activators and others are repressors. • Many transcription factors bind to sequences just upstream from (5’ to) the transcription start site. • The DNA binding sites for transcription factors are short consensus sequences, variations on a common theme. The sites aren’t nearly as precise as the protein-coding portions of genes, and thus they are harder to identify. • In eukaryotes, enhancers are DNA sites that can be quite distant from the transcription start site.
Operons • In eukaryotes, each messenger RNA contains a single gene. Genes are scattered randomly throughout the genome, with no grouping of related genes. • “monocistronic” = having only 1 gene on a mRNA. • In prokaryotes, genes that make different parts of the same structure or metabolic pathway are often grouped together and transcribed as a single unit. Several different proteins are independently translated from the same mRNA molecule. This group of genes is called an operon. • “polycistronic” = having several genes co-transcribed onto the same mRNA.
Exceptions in Prokaryotes • Off the top of my head: • Selenocysteine is coded by the UGA stop codon, and pyrrolysine by the UAG stop codon. • Bacteria have been seen (rarely) to use several other start codons, including CTG, ATA, ATC, and ATT. • Regardless of which start codon is used, all bacteria (NOT Archaea) use N-formyl methionine as the first amino acid in the polypeptide. • “RNA editing” is a process by which certain messenger RNAs are altered by adding, deleting, or altering certain bases. It seems rare and (so far) confined to eukaryotes (including mitochondria and chloroplasts).
RNA processing in Eukaryotes • Oddly, most genes in eukaryotes are not continuous. They are interrupted by long regions of DNA that don’t code for protein, called “introns”. Introns have no known function. The useful parts of the gene, the parts that code for proteins, are called “exons”. Some genes are more than 99% introns, with only 1% of the gene useful: the cystic fibrosis gene is like this. • The entire gene, introns and exons, is transcribed into an RNA copy, but the introns need to be removed before it can be converted to protein. • After transcription, the cell snips out the introns (splicing), leaving only the protein-coding portion of the gene in the RNA. • Also, the cell adds a protective cap to one end, and a tail of A’s to the other end. These both function to protect the RNA from enzymes that would degrade it starting on an end and moving inward. • Thus, an RNA copy of a gene is converted into messenger RNA by doing 2 things: 1. cut out the introns. 2. add protective bases to the ends. • Transcription and RNA processing occur in the nucleus. After this, the messenger RNA moves to the cytoplasm for translation.
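Splicing can be pictured as joining the exon slices of the primary transcript. In this sketch the exon coordinates are simply given as input. Everything here is hypothetical illustration: the function names, the toy transcript, and the coordinates are invented, and the 5’ cap is left out because it is a modified nucleotide that a plain string cannot represent.

```python
def splice(pre_mrna, exons):
    """Join the exons of a primary transcript, dropping the introns.

    exons is a list of (start, end) positions, 0-based, end exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

def add_poly_a_tail(mrna, tail_length=50):
    """Append a poly-A tail (the slides give 50-200 A nucleotides)."""
    return mrna + "A" * tail_length

def intron_fraction(pre_mrna, exons):
    """Fraction of the primary transcript that is removed as intron."""
    exon_length = sum(end - start for start, end in exons)
    return 1 - exon_length / len(pre_mrna)

# Toy transcript: exon 1, an 11-base intron, then exon 2.
pre_mrna = "AUGGCC" + "GUAAGUUUCAG" + "AAAUAA"
exons = [(0, 6), (17, 23)]

mature = splice(pre_mrna, exons)
print(mature)
print(add_poly_a_tail(mature, tail_length=10))
print(intron_fraction(pre_mrna, exons))
```

For a gene that is 99% intron, the same calculation would return about 0.99: almost all of the transcript is discarded before translation.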
Reverse Transcription • A few exceptions to the Central Dogma exist. • Most importantly, some RNA viruses, called “retroviruses” make a DNA copy of themselves using the enzyme reverse transcriptase. The DNA copy incorporates into one of the chromosomes and becomes a permanent feature of the genome. The DNA copy inserted into the genome is called a “provirus”. This represents a flow of information from RNA to DNA. • Closely related to retroviruses are “retrotransposons”, sequences of DNA that make RNA copies of themselves, which then get reverse-transcribed into DNA that inserts into new locations in the genome. Unlike retroviruses, retrotransposons always remain within the cell. They lack genes to make the protein coat that surrounds viruses. • Some viruses use RNA for their genome, and directly copy it into more RNA without any DNA intermediate. The enzyme involved is called a “replicase” or “RNA dependent RNA polymerase”.
Written Sequences • Both DNA and RNA are synthesized starting at the 5’ end and moving to the 3’ end. • When we write a DNA sequence, we only write one strand (the other is implied), with the 5’ end on the left. • Genes are written using the “coding strand” of the DNA, which has the same sequence as the resulting messenger RNA. However, the RNA polymerase actually uses the other (“non-coding”) strand as the template for RNA synthesis. • Since the RNA copy of the gene is essentially identical to the DNA, the distinction between T and U bases is widely ignored. • There is no apparent convention for capital vs. small letters. • Unknown bases are generally written as “N”. The number of N’s in a row is not necessarily related to how many bases are actually present in the DNA at that point. • There is also a whole set of ambiguous base codes: “Y” for pyrimidine and “R” for purine, for example. I am not fond of these. • Similarly, proteins are synthesized starting at the N terminus and moving to the C terminus. Moving from 5’ to 3’ on the messenger RNA is the same direction as N to C on the corresponding protein. • The twenty amino acids can be written with a 3-letter code that is fairly obvious. However, the 1-letter code is more common. Here’s a useful version of the genetic code table that gives both: http://www.bios.niu.edu/johns/recdna/genetic_code.htm • Selenocysteine, an amino acid found in a few proteins in most species (prokaryotic and eukaryotic), uses the UGA stop codon, with the translation machinery able to tell how to interpret the UGA from its context. Selenocysteine’s 3-letter code is “Sec” and its 1-letter code is “U”. • Pyrrolysine (symbols Pyl and O) is found in a few Archaeal (methanogen) proteins. It is coded by the UAG stop codon. • Unknown amino acids are generally symbolized by “X”. • Stop codons are symbolized by “*”.
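The ambiguous base codes are easiest to handle as a lookup table from each symbol to the set of bases it can stand for. The table below follows the standard IUPAC nucleotide codes; the `matches` function is a hypothetical helper written for this example, not part of any library.

```python
# IUPAC nucleotide codes: each symbol maps to the bases it may represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "Y": "CT",    # pyrimidine
    "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",  # unknown base
}

def matches(pattern, seq):
    """True if seq is one of the sequences the ambiguous pattern allows.

    Case is ignored, since there is no convention for capital vs. small
    letters.
    """
    pattern, seq = pattern.upper(), seq.upper()
    return len(pattern) == len(seq) and all(
        base in IUPAC[symbol] for symbol, base in zip(pattern, seq)
    )

print(matches("ATGRYN", "ATGACT"))
print(matches("ATGR", "atgc"))
```

This kind of table is also how the loosely defined transcription factor binding sites are usually written down: one ambiguous symbol per position.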