M.Sc. in Molecular Medicine, Institute of Molecular Medicine, Trinity College Dublin, Ireland. Introduction to Bioinformatics: February 2005. David Lynn (M.Sc., Ph.D.). http://www.binf.org/course2005/
Topics for the next 3 days:
• Day 1a – Nucleic Acid Sequence Analysis
• Day 1b – Protein Sequence Analysis
• Day 1c – Accessing Complete Genomes
• Day 2a – Alignments & Homology Searching
• Day 2b – Phylogenetic Trees
Day 1a
• Introduction
• Interrogating Sequence Databases
• Translating DNA in 6 frames
• Reverse complement & other tools
• Calculating some properties of DNA/RNA sequences
• Primer design
• Gene prediction
• Alternative splicing
• Promoter characterisation
• Other resources
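1) Translating DNA in 6 frames
Example sequence: 5' atcacctggtatagtataa 3' (reverse complement: 5' ttatactataccaggtgat 3')
5'3' Frame 1: atc acc tgg tat agt ata a → I T W Y S I
5'3' Frame 2: a tca cct ggt ata gta taa → S P G I V *
5'3' Frame 3: at cac ctg gta tag tat aa → H L V * Y
3'5' Frame 1: tta tac tat acc agg tga t → L Y Y T R *
3'5' Frame 2: t tat act ata cca ggt gat → Y T I P G D
3'5' Frame 3: tt ata cta tac cag gtg at → I L Y Q V
(* = stop codon)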
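The frames of the example sequence can be reproduced with a short script. This is a minimal sketch, not the Expasy Translate tool: it assumes the standard genetic code (NCBI translation table 1), reads the three reverse frames from the reverse complement written 5'→3' (as on the slide), and drops any trailing partial codon. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse: the other strand read 5'->3'."""
    return seq.lower().translate(COMPLEMENT)[::-1]


def translate(seq: str, offset: int) -> str:
    """Translate from offset 0, 1 or 2, ignoring a trailing partial codon.

    Codons containing anything other than a/c/g/t are reported as 'X'.
    """
    seq = seq.lower()
    codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translation(seq: str) -> dict:
    """Return the three forward (5'3') and three reverse (3'5') translations."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"5'3' Frame {offset + 1}"] = translate(seq, offset)
    for offset in range(3):
        frames[f"3'5' Frame {offset + 1}"] = translate(rc, offset)
    return frames


for name, protein in six_frame_translation("atcacctggtatagtataa").items():
    print(name, protein)
```

Run on the example sequence, it prints ITWYSI, SPGIV*, HLV*Y, LYYTR*, YTIPGD and ILYQV, matching the six frames above. Other genetic codes (e.g. mitochondrial) would need a different table.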
Why?
• Translating in all 6 frames is a common step in a range of bioinformatics applications.
• One use is to locate the open reading frame (ORF) in an mRNA sequence, which will also contain untranslated regions (5' and 3' UTRs).
• Try to find the protein sequence encoded by the IL-11 mRNA (link on webpage) using the Translate Tool at Expasy.
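Locating the ORF by hand means looking for the longest stretch from a start codon to an in-frame stop codon across all six frames. The sketch below automates that search under stated assumptions: ATG is the only start codon, TAA/TAG/TGA are the stops, each ORF begins at the first ATG after the previous in-frame stop, and the default `min_codons=100` is a hypothetical cut-off. The function names are illustrative, not part of any tool mentioned here.

```python
STOP_CODONS = {"taa", "tag", "tga"}
COMPLEMENT = str.maketrans("acgt", "tgca")


def _orfs_on_strand(seq, min_codons):
    """Scan the three frames of one strand for ATG ... stop stretches.

    Each ORF starts at the first ATG after the previous in-frame stop and
    ends with (and includes) the next in-frame stop codon.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "atg":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None  # look for the next ATG after this stop
    return orfs


def find_orfs(seq, min_codons=100):
    """Return (strand, start, end) tuples, longest first.

    Coordinates are 0-based and end-exclusive; for strand '-' they refer
    to positions in the reverse-complement sequence.
    """
    seq = seq.lower()
    rc = seq.translate(COMPLEMENT)[::-1]
    hits = [("+", s, e) for s, e in _orfs_on_strand(seq, min_codons)]
    hits += [("-", s, e) for s, e in _orfs_on_strand(rc, min_codons)]
    return sorted(hits, key=lambda h: h[2] - h[1], reverse=True)
```

For an mRNA pasted in as a string (the variable name `mrna_seq` is hypothetical), `find_orfs(mrna_seq)[0]` gives the longest candidate ORF, which can then be checked against the Expasy translation. ORFs that run off the end without a stop codon are not reported.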
2) Search launcher at Baylor College
• Readseq – converts sequences from one format to another.
• RepeatMasker – screens sequences against a library of repeats and masks the repetitive regions.
• Primer Selection – PCR primer selection (see primer design later).
• WebCutter – builds restriction maps using enzymes with recognition sites of 6 or more bases.
• 6 Frame Translation – translates a nucleic acid sequence in 6 frames.
• Reverse Complement – reverse complements a nucleic acid sequence.
• Reverse Sequence – reverses the sequence order.
• Sequence Chopover – cuts a large protein/DNA sequence into smaller ones with a set amount of overlap.
• HBR – finds E. coli contamination in human sequences.
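Three of these utilities are simple string operations, and the difference between "reverse" and "reverse complement" is easy to mix up. The sketch below shows one plausible reading of each; the function names and the `size`/`overlap` parameters are hypothetical and are not the Baylor tools' actual options.

```python
COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_sequence(seq):
    """Reverse the order of the bases without complementing them."""
    return seq[::-1]


def reverse_complement(seq):
    """Complement each base and reverse: the opposite strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def chopover(seq, size, overlap):
    """Cut seq into pieces of `size` characters, each sharing `overlap`
    characters with the previous piece; the last piece may be shorter."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    pieces = []
    for start in range(0, len(seq), step):
        pieces.append(seq[start:start + size])
        if start + size >= len(seq):  # this piece already reaches the end
            break
    return pieces
```

For example, `chopover("ABCDEFGHIJ", 4, 1)` returns `["ABCD", "DEFG", "GHIJ"]`, while `reverse_sequence("atgc")` gives `"cgta"` and `reverse_complement("atgc")` gives `"gcat"`.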
3) Oligo Calculator
• Calculates the:
  • Length
  • %GC content
  • Melting temperature (Tm): the midpoint of the temperature range over which the nucleic acid strands separate
  • Molecular weight
  • Concentration (in picoMolar) of your input sequence that corresponds to an O.D. of 1
• Many of these parameters are useful in primer design.
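Length, %GC and a rough Tm can be computed directly from the base counts. This sketch assumes two widely used basic approximations (the Wallace rule for oligos shorter than 14 nt, and a GC-based formula for longer ones); whether the course's Oligo Calculator uses exactly these formulas is an assumption, and neither corrects for salt or oligo concentration. Molecular weight is left out here.

```python
def oligo_properties(seq):
    """Length, %GC and a basic melting-temperature estimate for a DNA oligo.

    Assumed Tm approximations (no salt or concentration correction):
      fewer than 14 nt: Wallace rule, Tm = 2*(A+T) + 4*(G+C)
      14 nt or longer:  Tm = 64.9 + 41*(G+C - 16.4) / N
    """
    s = seq.upper()
    if not s or set(s) - set("ACGT"):
        raise ValueError("expected a non-empty sequence of A, C, G, T")
    n = len(s)
    gc = s.count("G") + s.count("C")
    at = n - gc
    if n < 14:
        tm = 2 * at + 4 * gc
    else:
        tm = 64.9 + 41 * (gc - 16.4) / n
    return {"length": n, "gc_percent": 100 * gc / n, "tm_celsius": tm}
```

For a 10-mer with 5 G/C bases the Wallace rule gives 2 × 5 + 4 × 5 = 30 °C; for longer primers the second formula is used instead.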
Beer–Lambert Law
• A = εcl
  • ε = molar extinction coefficient
  • c = molar concentration
  • l = light path length = 1 cm
  • A = absorbance (O.D.)
• If an O.D. of 1 corresponds to 41 pM, then a reading of O.D. = 0.5 on the spectrophotometer
• => concentration = 0.5 × 41 pM = 20.5 pM
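The calculation works because, for a given sequence (fixed ε) and cuvette (fixed l), concentration is directly proportional to absorbance. Written out with the values above:

```latex
\begin{align}
A &= \varepsilon c l \;\Rightarrow\; c = \frac{A}{\varepsilon l} \\
\frac{c_2}{c_1} &= \frac{A_2}{A_1} \qquad (\varepsilon,\ l \text{ fixed}) \\
c_2 &= \frac{0.5}{1} \times 41\ \text{pM} = 20.5\ \text{pM}
\end{align}
```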
5) Gene Prediction
• Gene prediction is an area of intensive research in bioinformatics.
• GENSCAN is one of the major programs used to predict genes in the human genome.
• It should be useful for predicting genes in most vertebrate species, but caution is needed with other species, especially prokaryotes, for which other programs are more suitable.
• The Institute for Genomic Research
• The Deambulum Nucleic Acids Sequence Analysis page at Infobiogen
6) Splice site prediction/Alternative splicing
• Proper splicing requires some way to distinguish exons from introns.
• This is accomplished using certain base sequences as signals, which allow the spliceosome (the cellular machinery that carries out splicing) to identify the 5' and 3' ends of the intron.
• In most eukaryotic introns, the base sequence begins with GU at the 5' end and ends with AG at the 3' end.
• Each species has additional characteristic bases around these splice sites.
• Introns also contain another important sequence signal, the branch site, which includes a special adenine base and usually lies within about 50 bases upstream of the 3' splice site; a tract of pyrimidine bases (the polypyrimidine tract) lies between the branch site and the 3' splice site.
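The GU-AG rule on its own already gives a crude way to list candidate introns, and a pyrimidine count can be used to inspect the region just upstream of a candidate AG. This is a deliberately naive sketch: real splice-site predictors also score the species-specific bases around each site and the branch site, and the `min_length=60` default is a hypothetical cut-off. Function names are illustrative.

```python
def candidate_introns(pre_mrna, min_length=60):
    """List (start, end) spans that begin with GU and end with AG.

    Coordinates are 0-based and end-exclusive. DNA input (T) is treated
    as RNA (U). Only the GU-AG rule and a minimum length are applied.
    """
    seq = pre_mrna.upper().replace("T", "U")
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GU"]
    acceptor_ends = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return [(d, a) for d in donors for a in acceptor_ends if a - d >= min_length]


def pyrimidine_fraction(seq):
    """Fraction of pyrimidines (C and U/T) in seq, e.g. the stretch before an AG."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "CUT" for base in seq) / len(seq)
```

On a real pre-mRNA this returns many overlapping candidates, which is exactly why the additional signals described above are needed to pick out the true splice sites.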
Alternative splicing
• The long-held view in molecular biology was "one gene = one protein" (this is distinct from the central dogma, DNA → RNA → protein).
• Through alternative splicing, multiple mRNA transcripts can be produced from one gene, and if translated these transcripts can code for very different proteins.
• There are 4 basic types of alternative splicing.
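To see how quickly one gene can yield many transcripts, consider exon skipping alone. The sketch below assumes a hypothetical gene in which the first and last exons are always kept and each internal exon can be independently included or skipped; it ignores the other splicing types and any biological constraints on which combinations actually occur.

```python
from itertools import combinations


def exon_skipping_isoforms(exons):
    """Enumerate transcripts produced by skipping internal exons.

    Assumption (hypothetical model): first and last exons are constitutive,
    every internal exon is optional, and exon order is preserved.
    """
    if len(exons) < 2:
        return [list(exons)]
    first, internal, last = exons[0], exons[1:-1], exons[-1]
    isoforms = []
    for k in range(len(internal), -1, -1):  # most exons kept first
        for kept in combinations(internal, k):
            isoforms.append([first, *kept, last])
    return isoforms
```

With n internal exons this model gives 2^n possible transcripts, so `exon_skipping_isoforms(["E1", "E2", "E3", "E4"])` already yields 4: E1-E2-E3-E4, E1-E2-E4, E1-E3-E4 and E1-E4.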
The Human Alternative Splicing Database at UCLA
• Used ESTs to locate alternative splices.
• The project has resulted in the publication of over six thousand alternatively spliced isoforms of human genes.
• The database can be searched using any of the following identifiers:
  • Gene Symbol
  • UniGene Sequence Identifier
  • UniGene Cluster Identifier
  • Gene Title
  • GenBank Sequence Identifier
7) Promoter Analysis & Recognition
• A promoter is a sequence that is used to initiate and regulate transcription of a gene.
• Most protein-coding genes in higher eukaryotes have RNA polymerase II (pol II) dependent promoters.
• Features of pol II promoters:
  • They are a combination of multiple individual regulatory elements.
  • The most important elements are transcription factor binding sites.
  • CAAT or TATA boxes are neither necessary nor sufficient for promoter function.
  • In many cases, the order of and distances between elements are crucial for their function.
  • Sequences between the elements within a promoter are usually not conserved and have no known function.
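Because order and spacing matter, a promoter model can require that two elements occur in a fixed order within a distance window. The sketch below is a hypothetical illustration of that idea only: it takes start positions of two kinds of sites on the same strand (for example, from a binding-site scan) and keeps the pairs where the second element lies a permitted distance downstream of the first. The function name and gap parameters are assumptions.

```python
def ordered_pairs(first_sites, second_sites, min_gap, max_gap):
    """Return (a, b) pairs where element B starts min_gap..max_gap bases
    downstream of element A (positions are start coordinates, same strand)."""
    return [
        (a, b)
        for a in first_sites
        for b in second_sites
        if min_gap <= b - a <= max_gap  # enforces both order and spacing
    ]
```

For instance, `ordered_pairs([10, 100], [30, 40, 250], 15, 35)` keeps only `(10, 30)` and `(10, 40)`: the other combinations are either in the wrong order or too far apart.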
PromoterInspector
• Predicts eukaryotic pol II promoter regions with high specificity (~85%) in mammalian genomic sequences.
• Its sensitivity is about 50%, which means that the current version detects roughly every second promoter in the genome.
• It predicts the approximate location of a promoter region, not the exact location of the transcription start site (TSS).
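The two figures answer different questions: sensitivity asks what fraction of real promoters are found, while specificity (as often used in the promoter-prediction literature) asks what fraction of predictions are real promoters. The sketch below assumes that second definition, TP / (TP + FP); the genome size of 1000 promoters is purely hypothetical.

```python
def sensitivity(tp, fn):
    """Fraction of real promoters that are predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def positive_predictive_value(tp, fp):
    """Fraction of predictions that are real promoters: TP / (TP + FP).
    Assumed here to be what the slide calls 'specificity'."""
    return tp / (tp + fp)


def expected_counts(n_promoters, sens, ppv):
    """Expected true and false positives for a given number of real promoters."""
    tp = sens * n_promoters
    total_predictions = tp / ppv
    fp = total_predictions - tp
    return tp, fp
```

With 1000 real promoters, a sensitivity of 0.5 and a specificity of 0.85, `expected_counts(1000, 0.5, 0.85)` gives 500 correctly predicted promoters and about 88 false predictions, while the other 500 promoters are missed.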
MatInspector professional
• Individual transcription factor binding sites form the basis of the promoter.
• These sites are relatively short stretches of DNA (10–20 nucleotides).
• They are sufficiently conserved in sequence to allow specific recognition by the corresponding transcription factor.
• MatInspector uses a library of matrix descriptions of transcription factor binding sites to locate matches in sequences.
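A matrix description records how often each base occurs at each position of known binding sites; a sequence is then scanned by scoring every window against the matrix. This is a generic log-odds position weight matrix scan, not MatInspector's own scoring scheme or matrix library. The toy count matrix, pseudocount, uniform background and score threshold are all hypothetical.

```python
import math


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2 odds scores vs. background."""
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, site, score) for forward-strand windows scoring >= threshold."""
    seq = seq.upper()
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        if set(window) - set("ACGT"):
            continue  # skip windows with ambiguous bases
        score = sum(pwm[j][b] for j, b in enumerate(window))
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


# Hypothetical 4-position matrix built from 10 aligned example sites.
TOY_COUNTS = [
    {"G": 8, "A": 1, "C": 1},
    {"A": 9, "T": 1},
    {"T": 10},
    {"A": 8, "G": 2},
]
TOY_PWM = log_odds_pwm(TOY_COUNTS)
```

Here `scan("ccGATAcc", TOY_PWM, 3.0)` reports a single hit, the consensus-like site GATA at position 2. A full tool would also scan the reverse strand and use many matrices at once.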