Comparative Sequence Analysis in Molecular Biology. Martin Tompa, Computer Science & Engineering and Genome Sciences, University of Washington, Seattle, Washington, U.S.A.
Outline • What genome data is available? • What is phylogenetic footprinting? • Phylogenetic footprinting by multiple sequence alignment • Which parts of multiple sequence alignments are trustworthy? • FootPrinter: phylogenetic footprinting without alignment
DNA: the cell’s program [Figure: a cell, its DNA, and a single nucleotide (A, C, G, or T).]
DNA, Genes, and Proteins • DNA: program for cell processes • Proteins (and RNA): execute cell processes [Figure: a gene within the DNA sequence TCCAACGGTGCTGAGGTGCAC, and the protein it encodes.]
How Much DNA in a Cell? An organism’s genome is the total DNA in one of its cells.
• How many nucleotides are in a genome?
  M. tuberculosis (bacterium): 4,000,000
  D. melanogaster (fruit fly): 200,000,000
  H. sapiens (human): 3,000,000,000
  P. nudum (whisk fern): 250,000,000,000
• How can we understand the genome’s program? • Lab benchwork is costly and time-consuming. • We will return to this question.
How Many Genomes Are Available? • 46 vertebrate genomes sequenced (primates to rodents to marsupials to birds to fishes) • 1025 bacterial genomes sequenced (as of 4/6/2010) • Insects, fungi, worms, plants, … • Many more will be finished very soon • Fertile ground for comparative genomics
GenBank growth: from 1982 to 2003, the number of nucleotides in GenBank doubled every 18 months; since 2003, it has doubled every 3 years.
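To make the two growth rates concrete, the short Python sketch below converts a doubling time into an overall growth factor. The function name is our own; the only inputs taken from the slide are the 21-year span from 1982 to 2003 and the two doubling times.

```python
def growth_factor(years, doubling_time_years):
    """Fold-increase after `years` of exponential growth with a fixed doubling time."""
    return 2 ** (years / doubling_time_years)

# 1982 to 2003 (21 years), doubling every 18 months (1.5 years): 14 doublings.
print(growth_factor(21, 1.5))  # 16384.0
# For comparison only: the same 21 years at the post-2003 rate (every 3 years) is 7 doublings.
print(growth_factor(21, 3))  # 128.0
```

Halving the growth rate from one doubling per 18 months to one per 3 years shrinks the 21-year factor from about 16,000-fold to 128-fold.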
Phylogenetic Footprinting (Tagle et al. 1988) • Functional regions of DNA (regions under “purifying constraint”) evolve more slowly than nonfunctional ones. • Consider a set of corresponding (orthologous) DNA sequences from related species. • Identify unusually well conserved subsequences (i.e., ones that have not mutated much over the course of evolution): “motifs”.
How to Find Conserved Motifs
ACTAACCGGGAGATTTCAGA  human
AAGTTCCGGGAGATTTCCA  chimp
TAGTTATCCGGGAGATTAGA  mouse
AAAACCGGTAGATTTCAGG  rat
Multiple Sequence Alignment
AC--TAACCGGGAGATTTCAGA  human
AAGTT--CCGGGAGATTTCC-A  chimp
TAGTTATCCGGGAGATT--AGA  mouse
AA---AACCGGTAGATTTCAGG  rat
(Finding the optimal alignment is NP-complete.)
Phylogenetic Footprinting • Use a whole-genome multiple alignment, such as those provided by the UCSC Genome Browser. • Search for regions of well conserved alignment. • Regulatory elements [Cliften; Kellis; Kolbe; Prakash; Woolfe; Xie (2)] • RNA elements [Pedersen; Washietl] • General conservation & constraint [Bejerano; Boffelli; Cooper; Margulies (4); Pollard; Prabhakar; Siepel]
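A minimal sketch of “search for regions of well conserved alignment”, assuming the simplest possible notion of conservation: a column counts as conserved when every row carries the same base and none has a gap. It reuses the four-species alignment from the earlier slide; the function names and the window width of 10 are illustrative choices, and the methods cited on this slide use evolutionary models rather than this raw count.

```python
ALIGNMENT = {
    "human": "AC--TAACCGGGAGATTTCAGA",
    "chimp": "AAGTT--CCGGGAGATTTCC-A",
    "mouse": "TAGTTATCCGGGAGATT--AGA",
    "rat":   "AA---AACCGGTAGATTTCAGG",
}

def conserved_columns(rows):
    """True for each column in which all rows share the same base and none is a gap."""
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*rows)]

def best_window(rows, width):
    """Return (start, fraction) for the window of `width` columns with most conserved columns."""
    flags = conserved_columns(rows)
    best_start, best_count = 0, -1
    for start in range(len(flags) - width + 1):
        count = sum(flags[start:start + width])
        if count > best_count:  # keep the leftmost window on ties
            best_start, best_count = start, count
    return best_start, best_count / width

print(best_window(list(ALIGNMENT.values()), width=10))  # (7, 0.9)
```

The winning window covers the shared core CCGGGAGATT, where only one column (G in three species, T in rat) breaks perfect conservation.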
Which Alignment Columns to Trust? • The vertebrate alignment has 3.8 billion columns. • It is automatically generated. • A recent comparison (Margulies et al., 2007) of four mammalian whole-genome alignment methods revealed widespread disagreement.
Which Alignment Columns to Trust? (with Amol Prakash, generalizing Karlin and Altschul 1990) Goal: label each alignment column with a confidence measure of alignment correctness. • Identify sequences that do not belong • Forewarn users about suspicious regions of interest • Prompt genome browser designers to consider realigning • Give alignment tool designers feedback for possible improvements
Sample Suspicious Alignment
Human     -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
Chimp     -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
Rhesus    -----------GTTGCCATGC-AAAAATATTATGTCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
Mouse     -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CGTGTCAA----------TTAACAC
Rat       -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CGTGTCAA----------TTAACAC
Dog       -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
Cow       -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
Elephant  -----------GTTGCTATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
Tenrec    -----------GTTGCCATAC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC
Opossum   -----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATATCAA----------TTAACAC
Chicken   -----------GTTGCCATGCAAAAAATAATATGGCTTTACTAAAATTTACACAAC---CCTGACAA----------TTAACAC
Zebrafish GAACATATCCGAGTGCTGTAA-AATACTACTGGGA----ACCAGAAATG--ACAAGTTCCATGACAGCTTTGCCTTTTTGGCTC
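One crude way to see why this alignment looks suspicious, sketched here under our own assumptions (it is not the scoring method described on the next slides), is to compute each row's average identity to every other row and flag the lowest. Columns where both rows are gapped are skipped, and a base aligned to a gap counts as a mismatch; the function names are hypothetical.

```python
ROWS = {
    "Human":     "-----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC",
    "Chimp":     "-----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC",
    "Rhesus":    "-----------GTTGCCATGC-AAAAATATTATGTCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC",
    "Mouse":     "-----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CGTGTCAA----------TTAACAC",
    "Rat":       "-----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CGTGTCAA----------TTAACAC",
    "Dog":       "-----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC",
    "Cow":       "-----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC",
    "Elephant":  "-----------GTTGCTATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC",
    "Tenrec":    "-----------GTTGCCATAC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATGTCAA----------TTAACAC",
    "Opossum":   "-----------GTTGCCATGC-AAAAATATTATGGCTTTACTAAAATTTATACAAG---CATATCAA----------TTAACAC",
    "Chicken":   "-----------GTTGCCATGCAAAAAATAATATGGCTTTACTAAAATTTACACAAC---CCTGACAA----------TTAACAC",
    "Zebrafish": "GAACATATCCGAGTGCTGTAA-AATACTACTGGGA----ACCAGAAATG--ACAAGTTCCATGACAGCTTTGCCTTTTTGGCTC",
}

def identity(a, b):
    """Fraction of identical columns, ignoring columns where both rows are gaps."""
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    return sum(x == y for x, y in cols) / len(cols)

def mean_identity(rows):
    """Average identity of each row to all the other rows."""
    names = list(rows)
    return {n: sum(identity(rows[n], rows[m]) for m in names if m != n) / (len(names) - 1)
            for n in names}

scores = mean_identity(ROWS)
print(min(scores, key=scores.get))  # Zebrafish
```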
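Scoring Function (species 1 to 5: Human, Chimp, Mouse, Rat, Chicken; the numbers stand for those species' bases in one alignment column)
• Pairwise: $\mathrm{score}(1,2) = \log \dfrac{\Pr(1,2)}{\Pr(1)\,\Pr(2)}$
• Multiple, for a tree branch $k$ whose removal separates species $\{1,2,5\}$ from $\{3,4\}$: $sc_k(12345) = \log \dfrac{\Pr(12345 \mid T)}{\Pr(125 \mid T_{125})\,\Pr(34 \mid T_{34})}$, where $T$ is the full tree and $T_{125}$, $T_{34}$ are the two subtrees left after cutting branch $k$.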
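The scoring function can be sketched numerically under stated assumptions: a Jukes-Cantor substitution model with uniform base frequencies, a hypothetical topology ((Human, Chimp), (Mouse, Rat)) plus Chicken with made-up branch lengths, and Felsenstein's pruning algorithm to compute each Pr(· | tree). Leaf indices 0 to 4 correspond to the slide's species 1 to 5, and the branch k cut here is the one above the rodent pair. All names and numbers are illustrative, not taken from the original work.

```python
import math

BASES = "ACGT"

def jc_prob(a, b, t):
    """Jukes-Cantor probability of observing base b after branch length t, starting from a."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def pairwise_score(a, b, t):
    """score(1, 2) = log Pr(a, b) / (Pr(a) Pr(b)) for two leaves at distance t."""
    joint = 0.25 * jc_prob(a, b, t)  # uniform base frequencies, so Pr(a) = Pr(b) = 1/4
    return math.log(joint / (0.25 * 0.25))

def partial(node, column):
    """Felsenstein pruning: {x: Pr(bases at leaves below node | node has base x)}."""
    if isinstance(node, int):  # a leaf is an index into the alignment column
        return {x: 1.0 if column[node] == x else 0.0 for x in BASES}
    lik = {x: 1.0 for x in BASES}
    for child, t in node:  # an internal node is a list of (child, branch length)
        below = partial(child, column)
        for x in BASES:
            lik[x] *= sum(jc_prob(x, y, t) * below[y] for y in BASES)
    return lik

def prob(tree, column):
    """Pr(bases at this tree's leaves), using the uniform distribution at its root."""
    lik = partial(tree, column)
    return sum(0.25 * lik[x] for x in BASES)

# Hypothetical branch lengths. Leaves 0-4 = Human, Chimp, Mouse, Rat, Chicken.
PRIMATES = [(0, 0.01), (1, 0.01)]
RODENTS = [(2, 0.1), (3, 0.1)]
FULL = [([(PRIMATES, 0.1), (RODENTS, 0.1)], 0.3), (4, 0.3)]
# Removing branch k (above RODENTS) leaves this tree on the other side: species 1, 2, 5.
REST = [([(PRIMATES, 0.1)], 0.3), (4, 0.3)]

def sc_k(column):
    """sc_k = log Pr(12345 | T) / (Pr(125 | T_125) Pr(34 | T_34))."""
    return math.log(prob(FULL, column) / (prob(REST, column) * prob(RODENTS, column)))

print(sc_k("AAAAA"), sc_k("AACCA"))  # positive, then negative
```

Because Jukes-Cantor is stationary and time-reversible, the two denominators equal the marginal probabilities of each side's bases under the full tree, so sc_k measures how much better the tree explains the column than two independent halves would. A column in which the rodents disagree with everyone else is penalized.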
Outline of Computation
Input: multiple sequence alignment A
For each branch k of the tree {
  Compute scoring function sc_k (Felsenstein)
  Find all maximally scoring segments of A using sc_k (Ruzzo & Tompa)
  Compute K, λ using sc_k (Karlin & Altschul)
  Compute p-value p_k of each segment score using K, λ (Karlin & Altschul)
}
Output: discordance, computed as a maximum over all branches k from the segment p-values p_k
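The segment-finding and significance steps can be sketched in Python. `maximal_scoring_segments` follows the list-based procedure of Ruzzo & Tompa but, for brevity, searches the list leftward directly, so it is quadratic in the worst case rather than linear; `karlin_altschul_lambda` solves sum over s of p(s) exp(λ s) = 1 by bisection for a given score distribution; and `segment_pvalue` applies the standard Karlin-Altschul tail approximation. Computing K itself is more involved and is taken as an input here; all function names are our own.

```python
import math

def maximal_scoring_segments(scores):
    """All maximal scoring segments as (start, end, score), end exclusive (Ruzzo-Tompa list method)."""
    segs = []  # entries [start, end, L, R]: cumulative score before start / through end
    cum = 0.0
    for i, x in enumerate(scores):
        before, cum = cum, cum + x
        if x <= 0:
            continue  # only positive scores start a new candidate segment
        new = [i, i + 1, before, cum]
        while True:
            j = len(segs) - 1
            while j >= 0 and segs[j][2] >= new[2]:  # rightmost j with L_j < L_new
                j -= 1
            if j < 0 or segs[j][3] >= new[3]:
                segs.append(new)
                break
            # R_j < R_new: absorb segs[j] and everything after it, then re-check
            new = [segs[j][0], new[1], segs[j][2], new[3]]
            del segs[j:]
    return [(s, e, r - l) for s, e, l, r in segs]

def karlin_altschul_lambda(score_probs):
    """Positive root of sum_s p(s) exp(lam * s) = 1 (needs a negative mean and a positive score)."""
    def f(lam):
        return sum(p * math.exp(lam * s) for s, p in score_probs.items()) - 1.0
    lo, hi = 0.0, 1.0
    while f(hi) <= 0:  # grow the bracket until it contains the root
        hi *= 2
    for _ in range(200):  # bisection
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def segment_pvalue(score, lam, K, n):
    """Karlin-Altschul: Pr(best segment in n positions scores >= score) ~ 1 - exp(-K n e^(-lam score))."""
    return 1.0 - math.exp(-K * n * math.exp(-lam * score))

print(maximal_scoring_segments([4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]))
# [(0, 1, 4.0), (2, 3, 3.0), (4, 11, 7.0)]
lam = karlin_altschul_lambda({1: 0.25, -1: 0.75})  # equals ln 3 for this distribution
print(segment_pvalue(10, lam, K=0.1, n=1000))      # K and n are hypothetical
```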
Suspicious Alignment Regions • Case study: the human chromosome 1 alignment to 16 other vertebrates in the UCSC Genome Browser • Identify suspicious alignment regions: • Length ≥ 50 bp • p-value ≤ 0.1 at each position, all with respect to the same branch k • At most 50% gapped columns
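The region filter can be sketched as follows, assuming that per-column p-values for a single branch k and a per-column gap flag are already available (both are hypothetical inputs here). The default thresholds are the values on this slide; running the function once per branch enforces the “same branch k” condition.

```python
def suspicious_regions(pvalues, gapped, min_len=50, max_p=0.1, max_gap_frac=0.5):
    """Return (start, end) column ranges, end exclusive, that meet all three criteria.

    pvalues[i]: p-value at column i, all with respect to one branch k (hypothetical input)
    gapped[i]:  True if column i contains a gap
    """
    regions, start = [], None
    for i, p in enumerate(list(pvalues) + [float("inf")]):  # sentinel closes the final run
        if p <= max_p:
            if start is None:
                start = i  # a run of low p-values begins
        elif start is not None:
            length = i - start
            if length >= min_len and sum(gapped[start:i]) / length <= max_gap_frac:
                regions.append((start, i))
            start = None
    return regions

p = [0.5] * 10 + [0.05] * 60 + [0.5] * 10 + [0.01] * 30
print(suspicious_regions(p, [False] * len(p)))  # [(10, 70)]; the 30-column run is too short
```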
[Figure: breakdown of suspicious alignment regions on human chromosome 1 (247,000,000 bp).]
Genomic Locations of Suspicious Regions • 6% of chromosome 1 alignments containing mouse are exonic • 35% of chromosome 1 alignments containing zebrafish are exonic
DNA, Genes, and Proteins • DNA: program for cell processes • Proteins: execute cell processes [Figure: a gene within the DNA sequence TCCAACGGTGCTGAGGTGCAC, and the protein it encodes.]
Regulation of Genes • What turns genes on and off? • When is a gene turned on or off? • Where (in which cells) is a gene turned on? • How many copies of the gene product are produced?
Regulation of Genes [Figure: a transcription factor bound to a regulatory element in the DNA near a gene, together with RNA polymerase.]
Goal • Identify regulatory elements in DNA sequences. These are: • Binding sites for proteins • Short subsequences (5-25 nucleotides) • Up to 1000 nucleotides (or farther) from gene • Inexactly repeating patterns (“motifs”)
CLUSTALW multiple sequence alignment (rbcS gene)

Cotton    ACGGTT-TCCATTGGATGA---AATGAGATAAGAT---CACTGTGC---TTCTTCCACGTG--GCAGGTTGCCAAAGATA-------AGGCTTTACCATT
Pea       GTTTTT-TCAGTTAGCTTA---GTGGGCATCTTA----CACGTGGC---ATTATTATCCTA--TT-GGTGGCTAATGATA-------AGG--TTAGCACA
Tobacco   TAGGAT-GAGATAAGATTA---CTGAGGTGCTTTA---CACGTGGC---ACCTCCATTGTG--GT-GACTTAAATGAAGA-------ATGGCTTAGCACC
Ice-plant TCCCAT-ACATTGACATAT---ATGGCCCGCCTGCGGCAACAAAAA---AACTAAAGGATA--GCTAGTTGCTACTACAATTC--CCATAACTCACCACC
Turnip    ATTCAT-ATAAATAGAAGG---TCCGCGAACATTG--AAATGTAGATCATGCGTCAGAATT--GTCCTCTCTTAATAGGA-------A-------GGAGC
Wheat     TATGAT-AAAATGAAATAT---TTTGCCCAGCCA-----ACTCAGTCGCATCCTCGGACAA--TTTGTTATCAAGGAACTCAC--CCAAAAACAAGCAAA
Duckweed  TCGGAT-GGGGGGGCATGAACACTTGCAATCATT-----TCATGACTCATTTCTGAACATGT-GCCCTTGGCAACGTGTAGACTGCCAACATTAATTAAA
Larch     TAACAT-ATGATATAACAC---CGGGCACACATTCCTAAACAAAGAGTGATTTCAAATATATCGTTAATTACGACTAACAAAA--TGAAAGTACAAGACC

Cotton    CAAGAAAAGTTTCCACCCTC------TTTGTGGTCATAATG-GTT-GTAATGTC-ATCTGATTT----AGGATCCAACGTCACCCTTTCTCCCA-----A
Pea       C---AAAACTTTTCAATCT-------TGTGTGGTTAATATG-ACT-GCAAAGTTTATCATTTTC----ACAATCCAACAA-ACTGGTTCT---------A
Tobacco   AAAAATAATTTTCCAACCTTT---CATGTGTGGATATTAAG-ATTTGTATAATGTATCAAGAACC-ACATAATCCAATGGTTAGCTTTATTCCAAGATGA
Ice-plant ATCACACATTCTTCCATTTCATCCCCTTTTTCTTGGATGAG-ATAAGATATGGGTTCCTGCCAC----GTGGCACCATACCATGGTTTGTTA-ACGATAA
Turnip    CAAAAGCATTGGCTCAAGTTG-----AGACGAGTAACCATACACATTCATACGTTTTCTTACAAG-ATAAGATAAGATAATGTTATTTCT---------A
Wheat     GCTAGAAAAAGGTTGTGTGGCAGCCACCTAATGACATGAAGGACT-GAAATTTCCAGCACACACA-A-TGTATCCGACGGCAATGCTTCTTC--------
Duckweed  ATATAATATTAGAAAAAAATC-----TCCCATAGTATTTAGTATTTACCAAAAGTCACACGACCA-CTAGACTCCAATTTACCCAAATCACTAACCAATT
Larch     TTCTCGTATAAGGCCACCA-------TTGGTAGACACGTAGTATGCTAAATATGCACCACACACA-CTATCAGATATGGTAGTGGGATCTG--ACGGTCA

Cotton    ACCAATCTCT---AAATGTT----GTGAGCT---TAG-GCCAAATTT-TATGACTATA--TAT----AGGGGATTGCACC----AAGGCAGTG-ACACTA
Pea       GGCAGTGGCC---AACTAC--------------------CACAATTT-TAAGACCATAA-TAT----TGGAAATAGAA------AAATCAAT--ACATTA
Tobacco   GGGGGTTGTT---GATTTTT----GTCCGTTAGATAT-GCGAAATATGTAAAACCTTAT-CAT----TATATATAGAG------TGGTGGGCA-ACGATG
Ice-plant GGCTCTTAATCAAAAGTTTTAGGTGTGAATTTAGTTT-GATGAGTTTTAAGGTCCTTAT-TATA---TATAGGAAGGGGG----TGCTATGGA-GCAAGG
Turnip    CACCTTTCTTTAATCCTGTGGCAGTTAACGACGATATCATGAAATCTTGATCCTTCGAT-CATTAGGGCTTCATACCTCT----TGCGCTTCTCACTATA
Wheat     CACTGATCCGGAGAAGATAAGGAAACGAGGCAACCAGCGAACGTGAGCCATCCCAACCA-CATCTGTACCAAAGAAACGG----GGCTATATATACCGTG
Duckweed  TTAGGTTGAATGGAAAATAG---AACGCAATAATGTCCGACATATTTCCTATATTTCCG-TTTTTCGAGAGAAGGCCTGTGTACCGATAAGGATGTAATC
Larch     CGCTTCTCCTCTGGAGTTATCCGATTGTAATCCTTGCAGTCCAATTTCTCTGGTCTGGC-CCA----ACCTTAGAGATTG----GGGCTTATA-TCTATA

Cotton    T-TAAGGGATCAGTGAGAC-TCTTTTGTATAACTGTAGCAT--ATAGTAC
Pea       TATAAAGCAAGTTTTAGTA-CAAGCTTTGCAATTCAACCAC--A-AGAAC
Tobacco   CATAGACCATCTTGGAAGT-TTAAAGGGAAAAAAGGAAAAG--GGAGAAA
Ice-plant TCCTCATCAAAAGGGAAGTGTTTTTTCTCTAACTATATTACTAAGAGTAC
Larch     TCTTCTTCACAC---AATCCATTTGTGTAGAGCCGCTGGAAGGTAAATCA
Turnip    TATAGATAACCA---AAGCAATAGACAGACAAGTAAGTTAAG-AGAAAAG
Wheat     GTGACCCGGCAATGGGGTCCTCAACTGTAGCCGGCATCCTCCTCTCCTCC
Duckweed  CATGGGGCGACG---CAGTGTGTGGAGGAGCAGGCTCAGTCTCCTTCTCG
Finding Short Motifs (size of motif sought: k = 4)
AGTCGTACGTGAC... (Human)
AGTAGACGTGCCG... (Chimp)
ACGTGAGATACGT... (Rabbit)
GAACGGAGTACGT... (Mouse)
TCGTGACGGTGAT... (Rat)
Most Parsimonious Solution [Figure: the five sequences above on their phylogenetic tree, with tree nodes labeled ACGT, ACGT, ACGT, ACGG.] The chosen motif instances are ACGT in human, chimp, rabbit, and mouse and ACGG in rat. “Parsimony score”: 1 mutation. (Finding the most parsimonious motif is NP-complete.)
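The parsimony score of a chosen set of motif instances can be computed position by position with Fitch's small-parsimony algorithm, sketched below. The binary tree topology ((Human, Chimp), (Rabbit, (Mouse, Rat))) is our own assumption; for these five instances the score is 1 on any topology, because only the last position varies and only one species differs there.

```python
def fitch(tree, column):
    """Fitch small parsimony on a binary tree: (candidate base set, mutation count)."""
    if isinstance(tree, int):  # leaf: index into the column
        return {column[tree]}, 0
    (left_set, left_cost), (right_set, right_cost) = (fitch(child, column) for child in tree)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one mutation needed here

def parsimony_score(tree, motifs):
    """Sum of Fitch mutation counts over the k aligned positions of the motif instances."""
    return sum(fitch(tree, [m[i] for m in motifs])[1] for i in range(len(motifs[0])))

# Hypothetical topology; leaves 0-4 = Human, Chimp, Rabbit, Mouse, Rat.
TREE = ((0, 1), (2, (3, 4)))
print(parsimony_score(TREE, ["ACGT", "ACGT", "ACGT", "ACGT", "ACGG"]))  # 1
```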
Substring Parsimony Problem • Given: • phylogenetic tree T, • set of orthologous sequences at the leaves of T, • length k of motif, • threshold d • Problem: • Find each set S of k-mers, one k-mer from each leaf, such that the parsimony score of S in T is at most d. • This problem is NP-hard.
FootPrinter’s Exact Algorithm (with Mathieu Blanchette, generalizing Sankoff and Rousseau 1975) W_u[s] = best parsimony score for the subtree rooted at node u, if u is labeled with string s. [Figure: a tree whose leaves hold the sequences AGTCGTACGTG, ACGGGACGTGC, ACGTGAGATAC, GAACGGAGTAC, and TCGTGACGGTG; every node stores a table of 4^k entries, one per k-mer s. At a leaf, W[s] = 0 if s occurs in the leaf’s sequence and +∞ otherwise (for AGTCGTACGTG: ACGG: +∞, ACGT: 0); internal nodes hold entries such as ACGG: 2, ACGT: 1.]
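$W_u[s] = \sum_{v \,:\, \text{child of } u} \; \min_t \big( W_v[t] + d(s,t) \big)$, where $d(s,t)$ is the number of mismatches between $s$ and $t$. Running Time: total time $O(n\,k\,(4^k + l))$, where $n$ is the number of species, $k$ the motif length, and $l$ the average sequence length.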
Improvements • A better algorithm reduces the time from $O(n\,k\,(4^{2k} + l))$ to $O(n\,k\,(4^k + l))$ • Restricting to motifs with parsimony score at most d greatly reduces the number of table entries computed (exponential in d, polynomial in k) • Amenable to many useful extensions (e.g., allowing insertions and deletions)
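A compact Python sketch of the dynamic program, under our own assumptions: leaves are plain sequences, d(s, t) is the Hamming distance, and the tree is a nested tuple with a hypothetical topology. Rather than comparing all 4^k × 4^k pairs (s, t), it computes min_t (W_v[t] + d(s, t)) for every s in k passes, each allowing one position to change; this is one way to reach the O(n k (4^k + l)) bound, and it is not necessarily how FootPrinter implements it. The threshold-d pruning and the insertion/deletion extensions are not shown.

```python
import math
from itertools import product

BASES = "ACGT"

def leaf_table(seq, k, kmers):
    """W_leaf[s] = 0 if k-mer s occurs in seq, +infinity otherwise."""
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return {s: 0 if s in present else math.inf for s in kmers}

def relax(W, k, kmers):
    """Return B[s] = min_t (W[t] + Hamming(s, t)), changing one position per pass."""
    B = dict(W)
    for i in range(k):
        nxt = {}
        for s in kmers:
            best = B[s]
            for c in BASES:
                if c != s[i]:
                    best = min(best, B[s[:i] + c + s[i + 1:]] + 1)
            nxt[s] = best
        B = nxt
    return B

def footprinter_table(tree, seqs, k):
    """Root table W_root[s]: best parsimony score of the whole tree if the root is labeled s."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]

    def W(node):
        if isinstance(node, int):  # leaf = index into seqs
            return leaf_table(seqs[node], k, kmers)
        total = {s: 0 for s in kmers}
        for child in node:  # W_u[s] = sum over children v of min_t (W_v[t] + d(s, t))
            B = relax(W(child), k, kmers)
            for s in kmers:
                total[s] += B[s]
        return total

    return W(tree)

# Leaf sequences from the previous slide's figure; the topology is hypothetical.
SEQS = ["AGTCGTACGTG", "ACGGGACGTGC", "ACGTGAGATAC", "GAACGGAGTAC", "TCGTGACGGTG"]
root = footprinter_table(((0, 1), (2, (3, 4))), SEQS, k=4)
best = min(root.values())
print(best, sorted(s for s, v in root.items() if v == best))  # best score is 1
```

No 4-mer occurs in all five leaves, so the best achievable score is 1; labeling the root ACGT attains it, with a single ACGT to ACGG change above the last two leaves.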
Application to β-actin Gene. Species analyzed, with the length of each sequence: Gilthead sea bream (678 bp), Medaka fish (1016 bp), Common carp (696 bp), Grass carp (917 bp), Chicken (871 bp), Human (646 bp), Rabbit (636 bp), Rat (966 bp), Mouse (684 bp), Hamster (1107 bp)
Common carp
ACGGACTGTTACCACTTCACGCCGACTCAACTGCGCAGAGAAAAACTTCAAACGACAACATTGGCATGGCTTTTGTTATTTTTGGCGCTTGACTCAGGATCTAAAAACTGGAACGGCGAAGGTGACGGCAATGTTTTGGCAAATAAGCATCCCCGAAGTTCTACAATGCATCTGAGGACTCAATGTTTTTTTTTTTTTTTTTTCTTTAGTCATTCCAAATGTTTGTTAAATGCATTGTTCCGAAACTTATTTGCCTCTATGAAGGCTGCCCAGTAATTGGGAGCATACTTAACATTGTAGTATTGTATGTAAATTATGTAACAAAACAATGACTGGGTTTTTGTACTTTCAGCCTTAATCTTGGGTTTTTTTTTTTTTTTGGTTCCAAAAAACTAAGCTTTACCATTCAAGATGTAAAGGTTTCATTCCCCCTGGCATATTGAAAAAGCTGTGTGGAACGTGGCGGTGCAGACATTTGGTGGGGCCAACCTGTACACTGACTAATTCAAATAAAAGTGCACATGTAAGACATCCTACTCTGTGTGATTTTTCTGTTTGTGCTGAGTGAACTTGCTATGAAGTCTTTTAGTGCACTCTTTAATAAAAGTAGTCTTCCCTTAAAGTGTCCCTTCCCTTATGGCCTTCACATTTCTCAACTAGCGCTTCAACTAGAAAGCACTTTAGGGACTGGGATGC
Chicken
ACCGGACTGTTACCAACACCCACACCCCTGTGATGAAACAAAACCCATAAATGCGCATAAAACAAGACGAGATTGGCATGGCTTTATTTGTTTTTTCTTTTGGCGCTTGACTCAGGATTAAAAAACTGGAATGGTGAAGGTGTCAGCAGCAGTCTTAAAATGAAACATGTTGGAGCGAACGCCCCCAAAGTTCTACAATGCATCTGAGGACTTTGATTGTACATTTGTTTCTTTTTTAATAGTCATTCCAAATATTGTTATAATGCATTGTTACAGGAAGTTACTCGCCTCTGTGAAGGCAACAGCCCAGCTGGGAGGAGCCGGTACCAATTACTGGTGTTAGATGATAATTGCTTGTCTGTAAATTATGTAACCCAACAAGTGTCTTTTTGTATCTTCCGCCTTAAAAACAAAACACACTTGATCCTTTTTGGTTTGTCAAGCAAGCGGGCTGTGTTCCCCAGTGATAGATGTGAATGAAGGCTTTACAGTCCCCCACAGTCTAGGAGTAAAGTGCCAGTATGTGGGGGAGGGAGGGGCTACCTGTACACTGACTTAAGACCAGTTCAAATAAAAGTGCACACAATAGAGGCTTGACTGGTGTTGGTTTTTATTTCTGTGCTGCGCTGCTTGGCCGTTGGTAGCTGTTCTCATCTAGCCTTGCCAGCCTGTGTGGGTCAGCTATCTGCATGGGCTGCGTGCTGGTGCTGTCTGGTGCAGAGGTTGGATAAACCGTGATGATATTTCAGCAAGTGGGAGTTGGCTCTGATTCCATCCTGAGCTGCCATCAGTGTGTTCTGAAGGAAGCTGTTGGATGAGGGTGGGCTGAGTGCTGGGGGACAGCTGGGCTCAGTGGGACTGCAGCTGTGCT
Human
GCGGACTATGACTTAGTTGCGTTACACCCTTTCTTGACAAAACCTAACTTGCGCAGAAAACAAGATGAGATTGGCATGGCTTTATTTGTTTTTTTTGTTTTGTTTTGGTTTTTTTTTTTTTTTTGGCTTGACTCAGGATTTAAAAACTGGAACGGTGAAGGTGACAGCAGTCGGTTGGAGCGAGCATCCCCCAAAGTTCACAATGTGGCCGAGGACTTTGATTGCATTGTTGTTTTTTTAATAGTCATTCCAAATATGAGATGCATTGTTACAGGAAGTCCCTTGCCATCCTAAAAGCCACCCCACTTCTCTCTAAGGAGAATGGCCCAGTCCTCTCCCAAGTCCACACAGGGGAGGTGATAGCATTGCTTTCGTGTAAATTATGTAATGCAAAATTTTTTTAATCTTCGCCTTAATACTTTTTTATTTTGTTTTATTTTGAATGATGAGCCTTCGTGCCCCCCCTTCCCCCTTTTTGTCCCCCAACTTGAGATGTATGAAGGCTTTTGGTCTCCCTGGGAGTGGGTGGAGGCAGCCAGGGCTTACCTGTACACTGACTTGAGACCAGTTGAATAAAAGTGCACACCTTAAAAATGAGGCCAAGTGTGACTTTGTGGTGTGGCTGGGTTGGGGGCAGCAGAGGGTG
[Motif highlighting by parsimony score over 10 vertebrates: 0, 1, 2.]
Motifs Absent from Some Species • Find motifs • with small parsimony score • that span a large part of the tree • Example: in tree of 10 species spanning 760 Myrs, find all motifs with • score 0 spanning at least 250 Myrs • score 1 spanning at least 350 Myrs • score 2 spanning at least 450 Myrs • score 3 spanning at least 550 Myrs
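The “span” of a motif can be sketched as the total branch length of the smallest subtree connecting the species in which the motif occurs. The tree and its branch lengths below are hypothetical (in Myrs); the score-to-span thresholds are the example values from this slide, and the function names are our own.

```python
# Hypothetical rooted tree: node -> (parent, branch length in Myrs).
PARENT = {
    "fish": ("root", 400),
    "amniotes": ("root", 50),
    "chicken": ("amniotes", 310),
    "mammals": ("amniotes", 80),
    "human": ("mammals", 90),
    "mouse": ("mammals", 90),
}
MIN_SPAN = {0: 250, 1: 350, 2: 450, 3: 550}  # parsimony score -> minimum span (slide example)

def span(parent_of, species):
    """Total branch length of the smallest subtree connecting the given leaves."""
    below = {}  # node -> number of the given leaves in its subtree
    for leaf in species:
        node = leaf
        while node in parent_of:
            below[node] = below.get(node, 0) + 1
            node = parent_of[node][0]
    n = len(species)
    # The branch above a node is needed iff it separates some given leaves from the others.
    return sum(length for node, (_, length) in parent_of.items()
               if 0 < below.get(node, 0) < n)

def reported(score, span_myrs):
    """Keep a motif only if its span meets the threshold for its parsimony score."""
    return score in MIN_SPAN and span_myrs >= MIN_SPAN[score]

print(span(PARENT, ["human", "mouse"]), span(PARENT, ["human", "mouse", "chicken"]))  # 180 570
```

Under these made-up lengths, a score-3 motif present in human, mouse, and chicken (span 570) would be reported, while a score-0 motif in human and mouse only (span 180) would not.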
Application to c-fos Gene [Figure: phylogenetic tree of puffer fish, chicken, pig, mouse, hamster, and human, with branch lengths 10, 7, 2, 2, 1, 2, 2, 1, 0, 1.]
Asked for motifs of length 10, with: • 0 mutations over tree of size 6 • 1 mutation over tree of size 11 • 2 mutations over tree of size 16 • 3 mutations over tree of size 21 • 4 mutations over tree of size 26
Found: • 0 mutations over tree of size 8 • 1 mutation over tree of size 16 • 3 mutations over tree of size 21 • 4 mutations over tree of size 28
Application to c-fos Gene
Motif                 Score  Conserved in          Known?
CAGGTGCGAATGTTC       0      4 mammals
TTCCCGCCTCCCCTCCCC    0      4 mammals             yes
GAGTTGGCTGcagcc       3      puffer + 4 mammals
GTTCCCGTCAATCcct      1      chicken + 4 mammals   yes
CACAGGATGTcc          4      all 6                 yes
AGGACATCTG            1      chicken + 4 mammals   yes
GTCAGCAGGTTTCCACG     0      4 mammals             yes
TACTCCAACCGC          0      4 mammals

[Figure: metK in B. subtilis.]
Microbial Footprinting • 1105 prokaryotes with completely sequenced genomes (as of 4/6/2010) • For any prokaryotic gene of interest, plenty of related genes in other species are available • Relatively simple genomes • MicroFootPrinter (with Shane Neph) • Designed specifically for phylogenetic footprinting in microbial genomes • Undergraduate Computational Biology Capstone project • User specifies species and gene of interest • Automates collection of orthologous genes, cis-regulatory sequences, gene tree, and parameters
Demo • MicroFootPrinter home page • Examples: Agrobacterium tumefaciens genes regulated by ChvI (with Eugene Nester) • chvI (two-component response regulator) • ropB (outer membrane protein)
Sample chvI motif (parsimony score: 2, span: 41.10, significance score: 4.22)
B. henselae       -151  GCTACAATTT
R. etli           -90   GCCACAATTT
R. leguminosarum  -106  GCCACAATTT
S. meliloti       -119  GCCACAATTT
S. medicae        -118  GCCACAATTT
A. tumefaciens    -105  GCCACAATTT
M. loti           -80   GCCACATTTT
M. sp.            -87   GCCACATTTT
O. anthropi       -158  GCCACATTTT
B. suis           -38   GCCACATTTT
B. melitensis     -156  GCCACATTTT
B. abortus        -156  GCCACATTTT
B. ovis           -156  GCCACATTTT
B. canis          -38   GCCACATTTT
Sample ropB motif (parsimony score: 1, span: 20.70, significance score: 1.34)
Jannaschia sp.    -151  CACATTTTGG
R. etli           -134  CACAATTTGG
R. leguminosarum  -135  CACAATTTGG
A. tumefaciens    -131  CACATTTTGG
S. meliloti       -128  CACATTTTGG
S. medicae        -128  CACATTTTGG
Combined ChvI Motif
ropB: CACATTTTGG
chvI: GCCACAATTT
Atu1221: TTGTCACAAT
ultimate: GYCACAWTTTGG, where Y = {C, T} and W = {A, T}
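A small check that the combined motif is consistent with each of the three sites, using the IUPAC codes from the slide (Y = C or T, W = A or T). The offsets, i.e. where each site sits under GYCACAWTTTGG, are our own reading of how the sites line up, and positions falling outside the 12-base motif are ignored; the function name is hypothetical.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "W": "AT"}

def agrees(pattern, site, offset):
    """True if `site`, with its first base at pattern position `offset`, matches the
    IUPAC pattern at every position where the two overlap."""
    for i, base in enumerate(site):
        j = offset + i
        if 0 <= j < len(pattern) and base not in IUPAC[pattern[j]]:
            return False
    return True

ULTIMATE = "GYCACAWTTTGG"
print(agrees(ULTIMATE, "GCCACAATTT", 0))   # chvI: True
print(agrees(ULTIMATE, "CACATTTTGG", 2))   # ropB: True
print(agrees(ULTIMATE, "TTGTCACAAT", -2))  # Atu1221: True
print(agrees(ULTIMATE, "GCTACAATTT", 0))   # B. henselae chvI site: False (third base)
```

The last line shows that not every individual site fits the combined pattern exactly: the B. henselae chvI site has T where the motif requires C, consistent with that motif's nonzero parsimony score.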
References and Acknowledgments • Amol Prakash & Martin Tompa, Measuring the Accuracy of Genome-Size Multiple Alignments. Genome Biology, June 2007, R124. • Mathieu Blanchette & Martin Tompa, Discovery of Regulatory Elements by a Computational Method for Phylogenetic Footprinting. Genome Research, May 2002, 739-748. • Shane Neph & Martin Tompa, MicroFootPrinter: a Tool for Phylogenetic Footprinting in Prokaryotic Genomes. Nucleic Acids Research, July 2006, W366-W368. • All software is available at bio.cs.washington.edu/software.html