Phyloinformatics, or How to Analyze LOTS of Sequences. Heath Blackmon, University of Texas at Arlington. Bioinformatics – Spring 2014. Phyloinformatic workflow. www.phylota.net. Select and Download Data. Find a sequence cluster with: > 500 sequences
Phyloinformatics or How to Analyze LOTS of Sequences • Heath Blackmon • University of Texas at Arlington • Bioinformatics – Spring 2014
Select and Download Data • Find a sequence cluster with: > 500 sequences, < 2000 base pairs • Tetrapoda • Teleostei • eudicotyledons • Arthropoda
• Download the example file of 18S sequences from the class Google Drive: 18S.fa
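One way to confirm that a downloaded file matches the cluster criteria above is a quick check like the sketch below. The function names `read_fasta` and `meets_criteria` are hypothetical, and reading "< 2000 base pairs" as "every sequence is shorter than 2000 bp" is an assumption; the slide does not say whether the limit applies per sequence or to the aligned length.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def meets_criteria(records, min_seqs=500, max_len=2000):
    """True if there are more than min_seqs sequences, each shorter than max_len.

    Assumption: the base-pair limit is applied to every individual sequence.
    """
    if len(records) <= min_seqs:
        return False
    return all(len(seq) < max_len for _, seq in records)


# Typical use (assumes 18S.fa is in the working directory):
#   with open("18S.fa") as handle:
#       print(meets_criteria(read_fasta(handle.read())))
```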
Phyloinformatic workflow • Retrieve Sequences: Phylota, GenBank • Align: MAFFT • Evaluate Alignment: LAST, Gblocks / Guidance
Alignment Programs • Clustal Omega • MAFFT • ProbCons • T-Coffee • PRRN • DECIPHER • MUSCLE • Clustal • Kalign • DIALIGN-T • BAli-Phy
MAFFT • Align 1,000s of sequences in minutes/hours • Progressive and iterative methods supported • Multiple scoring schemes • Install locally or run on the CBRC servers
Go ahead and try aligning the 18S.fa file that you downloaded from the class Google Drive.
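If MAFFT is installed locally, the alignment can be launched from a script. The sketch below is one possible wrapper, not the course's prescribed method: `mafft_command` and `run_mafft` are hypothetical helper names, and the output filename is up to you. The flags used (`--auto`, `--thread`, `--adjustdirection`) are standard MAFFT options, and MAFFT writes the alignment to standard output.

```python
import shutil
import subprocess


def mafft_command(fasta_path, adjust_direction=False, threads=1):
    """Build the MAFFT command line as a list of arguments."""
    # --auto lets MAFFT choose a strategy based on the size of the data set.
    cmd = ["mafft", "--auto", "--thread", str(threads)]
    if adjust_direction:
        # Ask MAFFT to reverse-complement sequences when that fits better.
        cmd.append("--adjustdirection")
    cmd.append(fasta_path)
    return cmd


def run_mafft(fasta_path, out_path, **kwargs):
    """Run MAFFT and save the alignment (written to stdout) in out_path."""
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft was not found on PATH")
    with open(out_path, "w") as out:
        subprocess.run(mafft_command(fasta_path, **kwargs), stdout=out, check=True)


# Typical use (assumes 18S.fa is in the working directory):
#   run_mafft("18S.fa", "18S_aligned.fa", threads=4)
```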
[Figure: dot plot showing an inversion. Legend: matches between opposite strands; matches between same strand]
Evaluating the 18S alignment • Look at your dot plots first. What is wrong with the sequences? • How would you fix/prevent this problem?
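If the dot plots show matches on the opposite strand, one thing worth checking is whether some sequences were submitted in the reverse-complement orientation. The sketch below is a rough k-mer heuristic under that assumption, not what LAST does: it compares how many k-mers a sequence shares with a reference in each orientation. The function names and the choice of k = 8 are illustrative.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (unknown bases become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def kmers(seq, k=8):
    """Return the set of all k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def likely_reversed(seq, reference, k=8):
    """True if seq shares more k-mers with the reference after reverse-complementing."""
    ref_kmers = kmers(reference, k)
    forward = len(kmers(seq, k) & ref_kmers)
    reverse = len(kmers(reverse_complement(seq), k) & ref_kmers)
    return reverse > forward
```

Sequences flagged this way can be reverse-complemented before realigning, or the aligner can be asked to adjust direction itself if it supports that.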
Evaluating Sites in an Alignment • Bootstrapping - Guidance • Identify regions with strong support - Gblocks
Gblocks • 6 I residues • 8 F residues • 9 W residues
Bootstrapping • The scores across the bottom, scaled between 0 and 1, report the proportion of alignments that agree with the assignment of nucleotides in the original MSA
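The idea of a per-column agreement score can be illustrated with a small sketch. This is a simplified illustration, not Guidance's actual implementation: it assumes you already have a set of alternative alignments of the same sequences in the same order, and it scores each column of the original MSA by the fraction of alternatives that contain exactly the same column. The function names are hypothetical.

```python
def column_signatures(alignment):
    """Describe each column by which residue of each sequence it contains.

    Returns one tuple per column; each entry is the index of the residue
    within its ungapped sequence, or None where the column has a gap.
    """
    counters = [0] * len(alignment)
    signatures = []
    for col in range(len(alignment[0])):
        sig = []
        for row, seq in enumerate(alignment):
            if seq[col] == "-":
                sig.append(None)
            else:
                sig.append(counters[row])
                counters[row] += 1
        signatures.append(tuple(sig))
    return signatures


def column_scores(original, alternatives):
    """Score each original column by the fraction of alternatives that reproduce it."""
    alt_sets = [set(column_signatures(alt)) for alt in alternatives]
    scores = []
    for sig in column_signatures(original):
        agree = sum(1 for cols in alt_sets if sig in cols)
        scores.append(agree / len(alt_sets))
    return scores


original = ["AC-G", "ACTG"]
alternatives = [["AC-G", "ACTG"], ["ACG-", "ACTG"]]
print(column_scores(original, alternatives))  # [1.0, 1.0, 0.5, 0.5]
```

Columns with low scores are the ones whose placement changes from one alternative alignment to another, which makes them candidates for removal.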
Try the Data You Downloaded • Make an alignment • Check the dot plots • Use Gblocks to remove uncertain sites • How many sites are in the initial alignment? • How many sites are in the filtered alignment? • Did you lose any taxa?
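The three questions above can be answered with a short script once both alignments are saved as FASTA. The sketch below assumes FASTA input for both files and treats a taxon as "lost" if it is absent from the filtered file or left with nothing but gaps; that definition and the function names are assumptions. Spaces inside sequence lines are stripped in case the filtering tool inserts them.

```python
def parse_alignment(fasta_text):
    """Parse aligned FASTA text into a dict of {taxon name: aligned sequence}."""
    seqs = {}
    name = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Use the first word of the header as the taxon name.
            name = line[1:].split()[0]
            seqs[name] = ""
        elif line and name is not None:
            seqs[name] += line.replace(" ", "")
    return seqs


def summarize(seqs):
    """Return (number of taxa, number of sites); all rows must have equal length."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) > 1:
        raise ValueError("sequences have different lengths; is this an alignment?")
    return len(seqs), (lengths.pop() if lengths else 0)


def lost_taxa(initial, filtered):
    """Taxa missing from the filtered alignment or reduced to gaps only."""
    return sorted(
        name for name in initial
        if name not in filtered or set(filtered[name]) <= {"-"}
    )
```

Calling `summarize` on the initial and filtered alignments gives the two site counts, and `lost_taxa` lists any taxa that did not survive filtering.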
Treat your alignment as a model parameter! • BAli-Phy: Estimates phylogenetic trees across all possible alignments without conditioning on a single alignment being “true” • Thanks for listening to me!