Plant DNA Barcoding: data workflow
Aron Fazekas, University of Guelph
Workflow Outline:
- raw sequence editing
- data alignment
- re-edit the sequence file
- upload to BOLD
- quality checks using BOLD / GenBank
Sequence editing: primer trimming
5’ GTTATGCATGAACGTAATGCTC
GAGCATTACGT….
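A minimal sketch of primer trimming, assuming exact matches only. The second string on the slide matches the start of the reverse complement of the primer, so the sketch looks for both orientations at either end of a read. The function names and the example insert are hypothetical; real editing software usually also tolerates mismatches and low-quality bases.

```python
# Lookup used to complement each base (upper or lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def trim_primer(seq, primer):
    """Remove an exact copy of the primer, or of its reverse
    complement, from the start and/or end of the sequence."""
    seq = seq.upper()
    primer = primer.upper()
    for p in (primer, reverse_complement(primer)):
        if seq.startswith(p):
            seq = seq[len(p):]
        if seq.endswith(p):
            seq = seq[:-len(p)]
    return seq


PRIMER = "GTTATGCATGAACGTAATGCTC"   # primer shown on the slide
INSERT = "ACGTACGTTTGACC"           # made-up sequence for illustration

print(reverse_complement(PRIMER))
print(trim_primer(PRIMER + INSERT, PRIMER))
print(trim_primer(INSERT + reverse_complement(PRIMER), PRIMER))
```

Both trimmed reads should come back as the made-up insert alone.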
Sequence Alignment
After editing: need to align the data
Kelchner (2000) Ann Missouri Bot Gard
- rbcL: easy to align, most programs work well
- matK: tricky to align, TransAlign seems to do the best job
- trnH: difficult (impossible between genera?)
- ITS: difficult (impossible between genera?)
Clustal: www.clustal.org
TransAlign: http://www.biomedcentral.com/1471-2105/6/156
K-Align: http://www.ebi.ac.uk/Tools/msa/kalign/
Sequence Alignment
Problems to look for after alignment:
- primers not trimmed
- gaps at the ends
- gaps in the middle (protein coding)
- translation shows stop codons
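The gap checks in this list can be sketched as a small script. This is only an illustration: it assumes the alignment is already loaded as a dict of equal-length strings with "-" as the gap character, and the sample names and sequences are made up.

```python
import re


def check_alignment(aligned, coding=False):
    """Return (name, problem) pairs for a dict of aligned sequences."""
    problems = []
    for name, seq in aligned.items():
        core = seq.strip("-")
        # Leading or trailing gap characters mean the read is short
        # at one end compared with the rest of the alignment.
        if len(core) != len(seq):
            problems.append((name, "gaps at the ends"))
        if coding:
            # Internal gaps in a protein-coding region deserve a look;
            # lengths that are not multiples of three shift the frame.
            for run in re.findall(r"-+", core):
                note = "internal gap of length %d" % len(run)
                if len(run) % 3 != 0:
                    note += " (shifts the reading frame)"
                problems.append((name, note))
    return problems


aligned = {
    "sample_A": "ATGTCACCACAAACAGAG",
    "sample_B": "--GTCACCACAAACAGAG",
    "sample_C": "ATGTCACC-CAAACAGAG",
}

for name, problem in check_alignment(aligned, coding=True):
    print(name, problem)
```

Here sample_B should be reported for end gaps and sample_C for a frame-shifting internal gap, while sample_A passes.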
trnH-psbA: real data submitted for publication
- primers not trimmed
- gaps at the ends
rbcL: data submitted for publication
- gaps in the middle of a coding region
Translate coding regions (rbcL, matK) to ensure there are no stop codons present
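A rough sketch of the translation check, assuming the standard genetic code and a sequence with any alignment gaps removed. The function names and test sequences are hypothetical. Because a barcode read may not start at the first codon position, the sketch tries all three forward frames; a coding read with stop codons in every frame deserves a closer look.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(seq, frame=0):
    """Translate a DNA string; unknown codons (e.g. with N) become X."""
    seq = seq.upper().replace("-", "")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def stop_positions(seq, frame=0):
    """Return the codon indices that translate to a stop (*)."""
    return [i for i, aa in enumerate(translate(seq, frame)) if aa == "*"]


def frames_without_stops(seq):
    """Return the forward frames (0, 1, 2) that contain no stop codon."""
    return [f for f in range(3) if not stop_positions(seq, f)]


good = "ATGTCACCACAAACAGAG"   # made-up in-frame fragment
bad = "ATGTCACCATAAACAGAG"    # same fragment with a TAA introduced

print(translate(good), stop_positions(good))
print(translate(bad), stop_positions(bad))
```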
Can trnH-psbA (or other non-coding sequence) be aligned across diverse species?
- Check for misplaced taxa: remove them from the dataset
- Check for singleton species: make a list
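Listing singleton species could look like the sketch below. It assumes each record is a (sample ID, species name) pair; the IDs and names are invented for illustration and do not come from the dataset discussed here.

```python
from collections import Counter


def singleton_species(records):
    """Return the species represented by exactly one sample."""
    counts = Counter(species for _sample_id, species in records)
    return sorted(sp for sp, n in counts.items() if n == 1)


# Hypothetical records: (sample ID, species name)
records = [
    ("S001", "Acer rubrum"),
    ("S002", "Acer rubrum"),
    ("S003", "Acer saccharum"),
    ("S004", "Quercus alba"),
]

print(singleton_species(records))
```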
Acknowledgements
Sujeevan Ratnasingham & BOLD Team