Transcription Factors
A transcription factor is a protein that binds to a specific DNA sequence and thereby controls the flow of information from DNA to RNA (transcription); that is, it can turn a specific gene on or off. Approximately 2600 genes in the human genome (~10%) code for proteins with DNA-binding domains, and most are assumed to be transcription factors. For 10% of genes to regulate the other 90%, transcription factors must sometimes work in combination. Genes are often flanked by several transcription factor binding sites.
Transcription Factors
From the literature, get a list of human TFs. Determine all of the binding sites for each TF. Classify each TF based on whether it works alone or in combination with other TFs:
• S to S: one TF binds to a single location
• M to S: multiple TFs bind to one gene
• S to M: one TF binds to multiple locations
• M to M: multiple TFs bind in combination at multiple genes
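The classification step can be sketched in Python. This is only one possible reading of the four classes, not a definitive method: it assumes the binding data has already been reduced to (TF, gene) pairs, treats the first letter as "does this TF ever share a gene with another TF" and the second as "does it bind one gene or several". The function name `classify_tfs`, the input format, and the example names are all hypothetical.

```python
from collections import defaultdict

def classify_tfs(bindings):
    """Classify each TF as 'S to S', 'S to M', 'M to S', or 'M to M'.

    bindings: iterable of (tf, gene) pairs (assumed input format).
    First letter:  S if the TF is the only TF at every gene it binds, else M.
    Second letter: S if the TF binds exactly one gene, M if it binds several.
    """
    genes_of = defaultdict(set)  # TF -> genes it binds
    tfs_at = defaultdict(set)    # gene -> TFs that bind it
    for tf, gene in bindings:
        genes_of[tf].add(gene)
        tfs_at[gene].add(tf)

    classes = {}
    for tf, genes in genes_of.items():
        works_alone = all(len(tfs_at[g]) == 1 for g in genes)
        who = "S" if works_alone else "M"
        where = "S" if len(genes) == 1 else "M"
        classes[tf] = f"{who} to {where}"
    return classes

# Made-up example data, for illustration only.
example = [
    ("TF_A", "gene1"),
    ("TF_B", "gene2"), ("TF_B", "gene3"),
    ("TF_C", "gene4"), ("TF_D", "gene4"),
    ("TF_E", "gene5"), ("TF_E", "gene6"), ("TF_F", "gene5"),
]

if __name__ == "__main__":
    for tf, label in sorted(classify_tfs(example).items()):
        print(tf, label)
```

Real binding-site data would first need to be mapped from genomic coordinates to the genes they flank, which this sketch leaves out.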
Pseudogenes
Through a variety of processes, a gene can become corrupted and stop functioning; such a gene is referred to as a pseudogene. For the organism to survive, however, there must still be a working copy of the gene (the “true” gene).
Project:
• Read in the genetic sequence for various bacteria.
• Distinguish pseudogenes from true genes.
• Determine the type, or distinguishing feature, of each pseudogene.
• Compare the number of genes and the number of pseudogenes (and pseudogene type).
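As a rough starting point for the counting step, the sketch below assumes the bacterial genomes are available as GenBank flat files whose feature tables mark pseudogenes with a `/pseudo` or `/pseudogene="..."` qualifier. The helper names and the toy record are made up for illustration, and the parser is deliberately minimal (standard library only), so it is not a replacement for a full GenBank reader.

```python
from collections import Counter

def parse_features(genbank_text):
    """Return a list of (feature_key, [qualifier lines]) from the FEATURES table."""
    features = []
    in_table = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table:
            continue
        if line and not line.startswith(" "):
            break  # next top-level section, e.g. ORIGIN
        # A feature key starts in column 6 (five leading spaces).
        if line.startswith("     ") and len(line) > 5 and line[5] != " ":
            features.append((line.split()[0], []))
        elif features and line.strip().startswith("/"):
            features[-1][1].append(line.strip())
    return features

def count_genes(genbank_text):
    """Count 'gene' features, split into true genes and pseudogenes."""
    counts = Counter()
    for key, qualifiers in parse_features(genbank_text):
        if key != "gene":
            continue
        is_pseudo = any(
            q == "/pseudo" or q.startswith("/pseudogene=") for q in qualifiers
        )
        counts["pseudogene" if is_pseudo else "gene"] += 1
    return counts

# Toy record, invented for this example (not a real genome).
SAMPLE = """\
LOCUS       TOY_RECORD               300 bp    DNA     circular BCT
FEATURES             Location/Qualifiers
     gene            1..90
                     /locus_tag="TOY_0001"
     CDS             1..90
                     /locus_tag="TOY_0001"
     gene            100..180
                     /locus_tag="TOY_0002"
                     /pseudo
     gene            complement(200..290)
                     /locus_tag="TOY_0003"
                     /pseudogene="unprocessed"
ORIGIN
"""

if __name__ == "__main__":
    print(dict(count_genes(SAMPLE)))
```

The value of the `/pseudogene` qualifier, where present, could also be tallied to compare pseudogene types across species.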
Protein-Protein Interactions
Software has been written to scan a genetic sequence, find motifs that could signal the presence of a gene, and translate that gene to see what protein it might produce. For many organisms there has not been time to study these proteins, so they are recorded in the data as “hypothetical proteins”. Find the hypothetical proteins for different bacteria and compare them to the proteins of the yeast genome. The yeast genome has been well studied, so it will act as your “known”. The program RPSBlast will do an alignment-based comparison of proteins. When a yeast protein matches a hypothetical protein, the two probably have the same function.
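To make the "translate that gene" step concrete, here is a minimal Python sketch of translating a coding sequence with the standard genetic code. It assumes the sequence is already in frame and on the coding strand, and it marks any codon it cannot read as "X". The function name `translate` is illustrative only; this is not the software the slide refers to.

```python
# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(dna):
    """Translate an in-frame DNA sequence, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X = unreadable codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

if __name__ == "__main__":
    print(translate("ATGGCCTGGTAA"))
```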
Longest Common Substring
Longest Common Substring (LCS) is a way to look for similarities between the genetic sequences of different species. It compares two sequences and finds the longest run of consecutive bases that appears in both. E.g.,
TAGGTTTGACCCTGC
AGGGTTTGACCAATA
have a 9-base substring in common (GGTTTGACC). Comparing species to see which ones have the most in common genetically should tell you which ones are the most closely related by evolution. Sequences for many bacteria can be found on biobase.ist.unomaha.edu at /clab_bdb/nucleotide/genomes/Bacteria/*
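One common way to compute the longest common substring is dynamic programming over the two sequences; a minimal Python sketch follows, checked against the two example sequences above. The function name is illustrative. This approach takes time proportional to the product of the two lengths, so it suits short sequences; whole genomes would likely need a suffix-based method instead.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous string shared by a and b.

    If several substrings tie for longest, the one ending earliest in a wins.
    """
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match in a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]

if __name__ == "__main__":
    match = longest_common_substring("TAGGTTTGACCCTGC", "AGGGTTTGACCAATA")
    print(match, len(match))  # GGTTTGACC 9
```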
Longest Common Substring
For this project, take several bacterial species that live in the human mouth and several that live in the human gut, compare them two at a time, and group them based on similarity.
Mouth:
• Streptococcus mutans (NC_013928)
• Streptococcus pneumoniae (NC_010380)
• Neisseria meningitidis (NC_013016)
• Haemophilus influenzae (NC_000907)
Lower GI:
• Bifidobacterium bifidum (NC_014638)
• Mycobacterium abscessus (NC_010397)
• Bacteroides vulgatus (NC_009614)
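The two-at-a-time comparison can be sketched as a table of pairwise scores, followed by a very simple grouping rule: pair each species with the partner it shares the longest substring with. The Python below uses short made-up sequences, not the real genomes, and the function names are illustrative. The grouping rule is an assumption; the project may call for a proper clustering method.

```python
from itertools import combinations

def lcs_length(a, b):
    """Length of the longest common substring of a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def pairwise_scores(sequences):
    """Return {(name1, name2): LCS length} for every unordered pair."""
    return {
        (n1, n2): lcs_length(s1, s2)
        for (n1, s1), (n2, s2) in combinations(sequences.items(), 2)
    }

def closest_partner(sequences):
    """For each species, find the other species with the highest LCS score."""
    scores = pairwise_scores(sequences)
    partner = {}
    for name in sequences:
        best_pair = max(
            (pair for pair in scores if name in pair), key=lambda p: scores[p]
        )
        partner[name] = best_pair[0] if best_pair[1] == name else best_pair[1]
    return partner

# Toy sequences, invented for illustration (not real genome data).
toy = {
    "mouth_1": "ACGTACGTTTGACC",
    "mouth_2": "TTACGTACGTTTGA",
    "gut_1": "GGCCATATATCCGG",
    "gut_2": "CCATATATCCGGTT",
}

if __name__ == "__main__":
    for pair, score in sorted(pairwise_scores(toy).items(), key=lambda kv: -kv[1]):
        print(pair, score)
    print(closest_partner(toy))
```

Because raw LCS length tends to grow with sequence length, comparisons between genomes of very different sizes may need some normalization before grouping.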