Annotation and Alignment of the Drosophila Genomes
Centro de Ciencias Genomicas, May 29, 2006.
Genes or Regulation?
• “10,516 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence”
• “Cis-regulatory sequences are more conserved than random and nearby sequences between the species—but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible”
Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.
BP England, U Heberlein, R Tjian. Purified Drosophila transcription factor, Adh distal factor-1 (Adf-1), binds to sites in several Drosophila promoters and activates transcription, J Biol Chem 1990.
S. Chatterji and L. Pachter, GeneMapper: Reference based annotation with GeneMapper, in press. http://bio.math.berkeley.edu/genemapper/
Genes or Regulatory Elements?
• “10,516 10,867 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence”
• “Cis-regulatory sequences are more conserved than random and nearby sequences between the species—but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible”
Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.
Alignment of coding sequence
DroAna_20041206_ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroMel_4_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_20041206_ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_        GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_20040829_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_20041029_ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
                 ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **

Alignment of non-coding sequence
DroAna_20041206_ CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
DroMel_4_        CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroMoj_20041206_ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
DroPse_1_        CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_20040829_ CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
DroVir_20041029_ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
DroYak_1_        CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT
                 ***    * *         *

DroAna_20041206_ AATC-----ACTTAC
DroMel_4_        ATTCTATGGACTCAC
DroMoj_20041206_ ----TATTTACTCAC
DroPse_1_        ------TGTACTTAC
DroSim_20040829_ ATTCTATGGACTCAC
DroVir_20041029_ ----TATTTACTCAC
DroYak_1_        ATTTCATAAACTCAC
                          *** **
Alignment of coding sequence
DroAna_20041206_ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroMel_4_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_20041206_ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_        GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_20040829_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_20041029_ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_        GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
                 ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **

Alignment of non-coding sequence
droAna1.2448876     CTGAAGGAATTCTA--TATTAAAG-------------------------------
dm2.chr2L           CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA---------CACATAAA--CGTTTTAAATTC
dp3.chr4_group3     CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG
droSim1.chr2L       CTGCGGGATTAGGAGTCATTAGAG---------TGCGGAAAAGCGGG--TTATTC
droVir1.scaffold_6  CTGCAGCAGTTAA-ATAATTGTAA---------TAAACAA----TTCTCTAATTT
droYak1.chr2L       CTGCGGGATTAGCGGTCATTGGTG---------TGAAGAATAGATCCT-TTATTT
                    ***    * *       * *

droAna1.2448876     AAGATTTCTCATCATTGGTTGAATC---------------------ACTTAC
dm2.chr2L           -----------------------------------------TATGGACTCAC
droMoj1.contig_2959 -------------------------AAATATTT--------TATTGACTCAC
dp3.chr4_group3     -----------------------------------------TGT--ACTTAC
droSim1.chr2L       -----------------------------------------TATGGACTCAC
droVir1.scaffold_6  ---------------------------------AAATATTTGGTCCACTCAC
droYak1.chr2L       -----------------------------------------CATAAACTCAC
                                                                  *** **
Example of a conserved microRNA target
UUCCCUAG--------CAAGUACCUCA------------------
UUCCCUAG--------CAAGUACCUCA------------------
UUCCCUAG--------CAAGUACCUCA------------------
UUCCUUAGACUCUUAGCAAGUACCUCA------------------
UUCCUUAGACUCUUAGAAAGUACCUCAAAAACGAAAUGCGAACAC
GACUCU----UUUUAGCAAGUACCUCAAAAUAUUUAAUUAAA-AC
ACUCUU----UUUUAGCAAGUACCUCAAGAAUUACAAUUAAAUAU
let-7 . . . . . . . . AUGGAGU
Grun et al. microRNA target predictions across seven Drosophila species and comparison to mammalian targets, PloS Computational Biology, June 2005
Lall et al. A genome wide map of conserved microRNA targets in C. Elegans, Current Biology, February 2006
Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.
How is an alignment made from two sequences?
Given two sequences of lengths n,m:
>dm2.chr2L
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC
>dp3.chr4_group3
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTACTTAC
n=50 m=62
?
dm2.chr2L       CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

dm2.chr2L       TATGGACTCAC
dp3.chr4_group3 TGT--ACTTAC
dm2.chr2L       CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG

dm2.chr2L       TATGGACTCAC
dp3.chr4_group3 TGT--ACTTAC
#M=31, #X=22, #G=3, #S=12

DroMel_4_ CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroPse_1_ CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----

DroMel_4_ ATTCTATGGACTCAC
DroPse_1_ ------TGTACTTAC
#M=27, #X=18, #G=3, #S=28

Each alignment can be summarized by counting the number of matches (#M), mismatches (#X), gaps (#G), and spaces (#S).
2(#M+#X)+#S=112 so #X,#G and #S suffice to specify a summary.
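The counting rule above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `alignment_summary` and the `-` gap character are assumptions, a "space" is taken to be a single gap character, and a "gap" a maximal run of spaces within one row. When an alignment is printed in several blocks, the rows should be concatenated first so that a gap running across a block boundary is counted once.

```python
def alignment_summary(row_a, row_b, gap_char="-"):
    """Return (#M, #X, #G, #S) for two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gaps = spaces = 0
    prev_a_gap = prev_b_gap = False
    for x, y in zip(row_a, row_b):
        a_gap, b_gap = x == gap_char, y == gap_char
        if a_gap and b_gap:
            raise ValueError("column with a gap character in both rows")
        if a_gap or b_gap:
            spaces += 1
            # A new gap starts when a run of spaces begins in one row.
            if (a_gap and not prev_a_gap) or (b_gap and not prev_b_gap):
                gaps += 1
        elif x == y:
            matches += 1
        else:
            mismatches += 1
        prev_a_gap, prev_b_gap = a_gap, b_gap
    return matches, mismatches, gaps, spaces


# Last block of the dm2/dp3 alignment, treated on its own.
m, x, g, s = alignment_summary("TATGGACTCAC", "TGT--ACTTAC")
print(m, x, g, s)
# The two ungapped lengths (11 and 9) satisfy 2(#M + #X) + #S = n + m.
print(2 * (m + x) + s == 11 + 9)
```

The identity checked on the last line is the same one the slide uses to drop #M from the summary.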
The summary of an alignment is a point in 3-dimensional space. For example, the two alignments just shown correspond to the points:
(22,3,12) (18,3,28)
In the example of our two sequences there are 379522884096444556699773447791552717765633 different alignments, but only 53890 different summaries. So we don’t need to plot that many points.
But 53890 is still quite a large number. Fortunately, there are only 69 vertices on the convex hull of the 53890 points. These are the interesting ones, and we can even draw them…
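To get a feel for why the number of alignments is so large, here is a minimal sketch that counts global alignments as lattice paths built from three kinds of column (residue over residue, residue over space, space over residue). The function name `count_alignments` is hypothetical, and the convention that every distinct arrangement of columns is a distinct alignment is an assumption; the figure quoted on the slide may rest on a different convention or on different sequence lengths.

```python
def count_alignments(n, m):
    """Count global alignments of sequences of lengths n and m.

    Uses the recurrence D(i, j) = D(i-1, j) + D(i, j-1) + D(i-1, j-1)
    with D(i, 0) = D(0, j) = 1, keeping only one row in memory.
    """
    prev = [1] * (m + 1)          # row for i = 0
    for _ in range(n):
        cur = [1]                 # D(i, 0) = 1
        for j in range(1, m + 1):
            cur.append(prev[j] + cur[j - 1] + prev[j - 1])
        prev = cur
    return prev[m]


for n, m in [(1, 1), (2, 2), (3, 3), (10, 10)]:
    print(n, m, count_alignments(n, m))
```

Python integers are arbitrary precision, so the same function can be called with lengths in the range of the example sequences without overflow.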
For the sequences:
>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC
the alignment polytope is:
49  #x=24, #S=10, #G=2
There are eight alignments that have this summary.
mel CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC
Consensus at a vertex
mel CTGCGGGATTAGGGGTCATTAGAGT===------===GCCGAAAAGCGAGTTTATTCTA=TGGAC
pse CTGGAAGAGTTTTGATTAGTAG===GGGATCCATGGGGGCGAGGAGAGGCCATCATC==GTGTAC
49  #x=24, #S=10, #G=2
The vertices of the polytope have special significance. Given parameters for a model, e.g. the default parameters for MULTIZ:
M = 100, X = -100, S = -30, G = -400
the summary is the result of maximizing the linear form
-200*(#X)-400*(#G)-80*(#S)
over the polytope. Thus, the vertices of the polytope correspond to optimal alignments.
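The linear form follows from substituting #M = (n+m-#S)/2 - #X into the full score M*#M + X*#X + S*#S + G*#G, which leaves a constant 50(n+m) plus the form above. A minimal numeric check, assuming n+m = 112 as on the earlier slide and using hypothetical helper names:

```python
def alignment_score(n_match, n_mismatch, n_gap, n_space,
                    M=100, X=-100, S=-30, G=-400):
    """Full score of an alignment with the given counts."""
    return M * n_match + X * n_mismatch + S * n_space + G * n_gap


def linear_form(n_mismatch, n_gap, n_space):
    """The reduced objective from the slide, for the default parameters."""
    return -200 * n_mismatch - 400 * n_gap - 80 * n_space


def matches_from_summary(total_len, n_mismatch, n_space):
    """Recover #M from 2(#M + #X) + #S = n + m."""
    return (total_len - n_space) // 2 - n_mismatch


TOTAL = 112  # n + m, taken from the slide
# Summary of the highlighted vertex: #X=24, #G=2, #S=10.
n_match = matches_from_summary(TOTAL, 24, 10)
diff = alignment_score(n_match, 24, 2, 10) - linear_form(24, 2, 10)
print(n_match, diff, diff == 50 * TOTAL)
```

Because the difference is the same constant for every summary, maximizing the full score and maximizing the linear form pick out the same vertex.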
Needleman-Wunsch Alignment
What is usually done is that a single set of parameters is specified (M = 100, X = -100, S = -30, G = -400 is a standard default) and then the optimal vertex is identified using dynamic programming. An alignment optimal for the vertex is then selected.
The running time of the algorithm is O(nm) [Needleman-Wunsch, 1970, Smith-Waterman, 1981] and it requires O(n+m) space [Hirschberg 1975].
Standard scoring schemes are:
Parameters       Model
M,X,S            Jukes-Cantor with linear gap penalty
M,X,S,G          Jukes-Cantor with affine gap penalty
M,XTS,XTV,S,G    Kimura-2 parameter with affine gap penalty
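As an illustration of the dynamic programming step, here is a minimal sketch of global alignment scoring under the M, X, S, G scheme, using the three-table recurrence usually attributed to Gotoh. The function name `nw_affine_score` is hypothetical, the sketch returns only the optimal score, and it stores full O(nm) tables for readability rather than using Hirschberg's linear-space technique.

```python
def nw_affine_score(a, b, M=100, X=-100, S=-30, G=-400):
    """Optimal global alignment score with per-space cost S and per-gap cost G."""
    NEG = float("-inf")
    n, m = len(a), len(b)
    # D: last column pairs a[i-1] with b[j-1]
    # P: last column pairs a[i-1] with a space
    # Q: last column pairs a space with b[j-1]
    D = [[NEG] * (m + 1) for _ in range(n + 1)]
    P = [[NEG] * (m + 1) for _ in range(n + 1)]
    Q = [[NEG] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        P[i][0] = G + S * i       # one gap of i spaces
    for j in range(1, m + 1):
        Q[0][j] = G + S * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = M if a[i - 1] == b[j - 1] else X
            D[i][j] = max(D[i - 1][j - 1], P[i - 1][j - 1], Q[i - 1][j - 1]) + sub
            # Extending a gap costs S; opening one costs G + S.
            P[i][j] = max(D[i - 1][j] + G + S, P[i - 1][j] + S, Q[i - 1][j] + G + S)
            Q[i][j] = max(D[i][j - 1] + G + S, Q[i][j - 1] + S, P[i][j - 1] + G + S)
    return max(D[n][m], P[n][m], Q[n][m])


print(nw_affine_score("ACGT", "ACGT"))
print(nw_affine_score("ACGT", "AGT"))
```

Recovering an actual alignment would need a traceback through the three tables, which is omitted here to keep the sketch short.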
Building Drosophila whole genome multiple alignments
• MAVID
  • http://hanuman.math.berkeley.edu/kbrowser
• MULTIZ
  • http://genome.ucsc.edu/ (currently no D. erecta)
MAVID
DroAna_20041206_ CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
DroMel_4_        CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
DroMoj_20041206_ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
DroPse_1_        CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_20040829_ CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
DroVir_20041029_ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
DroYak_1_        CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT
                 ***    * *         *

DroAna_20041206_ AATC-----ACTTAC
DroMel_4_        ATTCTATGGACTCAC
DroMoj_20041206_ ----TATTTACTCAC
DroPse_1_        ------TGTACTTAC
DroSim_20040829_ ATTCTATGGACTCAC
DroVir_20041029_ ----TATTTACTCAC
DroYak_1_        ATTTCATAAACTCAC
                          *** **
N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research 14 (2004) p 693--699
MULTIZ
droAna1.2448876     CTGAAGGAATTCTA--TATTAAAG-------------------------------
dm2.chr2L           CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGT-TTATTC
droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA---------CACATAAA--CGTTTTAAATTC
dp3.chr4_group3     CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG
droSim1.chr2L       CTGCGGGATTAGGAGTCATTAGAG---------TGCGGAAAAGCGGG--TTATTC
droVir1.scaffold_6  CTGCAGCAGTTAA-ATAATTGTAA---------TAAACAA----TTCTCTAATTT
droYak1.chr2L       CTGCGGGATTAGCGGTCATTGGTG---------TGAAGAATAGATCCT-TTATTT
                    ***    * *       * *

droAna1.2448876     AAGATTTCTCATCATTGGTTGAATC---------------------ACTTAC
dm2.chr2L           -----------------------------------------TATGGACTCAC
droMoj1.contig_2959 -------------------------AAATATTT--------TATTGACTCAC
dp3.chr4_group3     -----------------------------------------TGT--ACTTAC
droSim1.chr2L       -----------------------------------------TATGGACTCAC
droVir1.scaffold_6  ---------------------------------AAATATTTGGTCCACTCAC
droYak1.chr2L       -----------------------------------------CATAAACTCAC
                                                                  *** **
Blanchette et al., Aligning multiple sequences with the threaded blockset aligner, Genome Research 14 (2004) p 708--715
One (possibly wrong) alignment is not enough: the history of parametric inference
• 1992: Waterman, M., Eggert, M. & Lander, E.
  • Parametric sequence comparisons, Proc. Natl. Acad. Sci. USA 89, 6090-6093.
• 1994: Gusfield, D., Balasubramanian, K. & Naor, D.
  • Parametric optimization of sequence alignment, Algorithmica 12, 312-326.
• 2003: Wang, L., Zhao, J.
  • Parametric alignment of ordered trees, Bioinformatics, 19 2237-2245.
• 2004: Fernández-Baca, D., Seppäläinen, T. & Slutzki, G.
  • Parametric Multiple Sequence Alignment and Phylogeny Construction, Journal of Discrete Algorithms, 2 271-287.
XPARAL by Kristian Stevens and Dan Gusfield
Whole Genome Parametric Alignment
Colin Dewey, Peter Huggins, Lior Pachter, Bernd Sturmfels and Kevin Woods
• Mathematics and Computer Science
  • Parametric alignment in higher dimensions.
  • Faster new algorithms.
  • Deeper understanding of alignment polytopes.
• Biology
  • Whole genome parametric alignment.
  • Biological implications of alignment parameters.
  • Alignment with biology rather than for biology.
Whole Genome Parametric Alignment
Colin Dewey, Peter Huggins, Lior Pachter, Bernd Sturmfels and Kevin Woods
• Mathematics and Computer Science
  • Parametric alignment in higher dimensions.
  • Faster new algorithms.
  • Deeper understanding of alignment polytopes.
• Biology
  • Whole genome parametric alignment.
  • Biological implications of alignment parameters.
CTGAAGGAAT-------TCTATATT---------AAAGAAGATTTCTCATCATTGGTTG
CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGA---------GTTT
CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA-------
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
CTGCGGGATTAGGAGTCATTAGAGT---------GCGGAAAAGCGG---------GTT-
CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA-------
CTGCGGGATTAGCGGTCATTGGTGT---------GAAGAATAGATC---------CTTT
analysis
A Whole Genome Parametric Alignment of D. melanogaster and D. pseudoobscura
• Divided the genomes into 1,116,792 constrained and 877,982 unconstrained segment pairs.
  • This is an orthology map of the two genomes.
• 2d, 3d, 4d, and 5d alignment polytopes were constructed for each of the 877,802 unconstrained segment pairs.
  • For each segment pair, obtain all possible optimal summaries for all parameters in a Needleman-Wunsch scoring scheme.
• Computed the Minkowski sum of the 877,802 2d polytopes.
  • There are only 838 optimal alignments of the two Drosophila genomes if the same match, mismatch and gap parameters are used for all the segment pair alignments.
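The Minkowski sum enters because summaries add across segment pairs: a whole-genome summary is the sum of one summary per segment, so the whole-genome polytope is the Minkowski sum of the segment polytopes. The sketch below shows the operation for two 2d polygons by brute force (all pairwise vertex sums, then a convex hull). The function names are hypothetical, and this quadratic approach is only meant to illustrate the definition; it is not claimed to be how the 877,802 polytopes were actually combined.

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Vertices of the convex hull in counterclockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Each chain ends where the other begins, so drop the repeated endpoints.
    return lower[:-1] + upper[:-1]


def minkowski_sum(poly_p, poly_q):
    """Vertices of the Minkowski sum of two convex polygons given by vertex lists."""
    sums = [(p[0] + q[0], p[1] + q[1]) for p in poly_p for q in poly_q]
    return convex_hull(sums)


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
triangle = [(0, 0), (1, 0), (0, 1)]
segment = [(0, 0), (1, 1)]
print(minkowski_sum(square, square))
print(minkowski_sum(triangle, segment))
```

A vertex of the sum is a sum of vertices that are optimal for a common parameter direction, which is why the vertices of the summed polytope correspond to whole-genome alignments that use one set of parameters throughout.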
How do we build the polytope for
>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC
?
Alignment polytopes are small
Theorem: The number of vertices of an alignment polytope for two sequences of length n and m is $O((n+m)^{d(d-1)/(d+1)})$, where d is the number of free parameters in the scoring scheme.
Examples:
Parameters       Model                                   Vertices
M,X,S            Jukes-Cantor with linear gap penalty    $O((n+m)^{2/3})$
M,X,S,G          Jukes-Cantor with affine gap penalty    $O((n+m)^{3/2})$
M,XTS,XTV,S,G    K2P with affine gap penalty             $O((n+m)^{12/5})$
L. Pachter and B. Sturmfels, Parametric inference for biological sequence analysis, Proceedings of the National Academy of Sciences, Volume 101, Number 46 (2004), p 16138--16143.
L. Pachter and B. Sturmfels, Tropical geometry of statistical models, Proceedings of the National Academy of Sciences, Volume 101, Number 46 (2004), p 16132--16137.
L. Pachter and B. Sturmfels (eds.), Algebraic Statistics for Computational Biology, Cambridge University Press.
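The exponents in the examples follow directly from the theorem. A small check, assuming d = 2, 3 and 4 free parameters for the three schemes (which is what the tabulated exponents imply) and using a hypothetical helper name:

```python
from fractions import Fraction


def vertex_bound_exponent(d):
    """Exponent d(d-1)/(d+1) in the O((n+m)^...) vertex bound."""
    return Fraction(d * (d - 1), d + 1)


for scheme, d in [("M,X,S", 2), ("M,X,S,G", 3), ("M,XTS,XTV,S,G", 4)]:
    print(scheme, d, vertex_bound_exponent(d))
```

Using `Fraction` keeps the exponents exact, so they can be compared directly with the values in the table.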
Back to Adf1 BP England, U Heberlein, R Tjian. Purified Drosophila transcription factor, Adh distal factor-1 (Adf-1), binds to sites in several Drosophila promoters and activates transcription, J Biol Chem 1990.
Back to Adf1
mel TGTGCGTCAGCGTCGGCCGCAACAGCG
pse TGT-----------------GACTGCG
    ***                  ** ***
BLASTZ alignment
Back to Adf1
mel TGTGCGTCAGCGTCGGCCGCAACAGCG
pse TGT-----------------GACTGCG
    ***                  ** ***

mel TGTG----CGTCAGC--G----TCGGCC---GC-AACAG-CG
pse TGTGACTGCG-CTGCCTGGTCCTCGGCCACAGCCAAC-GTCG
    ****    ** * **  *    ******   ** *** * **

mel TGTGCGTCAGC------GTCGGCCGCAACAGCG
pse TGTGACTGCGCTGCCTGGTCCTCGGCCACAGC-
    ****  *  **      ***  * ** *****