SCATA: Sequence Clustering and Analysis of Tagged Amplicons
A web-based tool to analyse 454-sequence data from microbial communities
Björn Lindahl, Mikael Brandström, Karina Clemmensen, Jan Stenlid
Department of Forest Mycology & Pathology, Uppsala BioCenter, Swedish University of Agricultural Sciences
Input: Sequence data, Quality data, Primer, Tags, Reference sequences
Quality screening:
- Sequence length
- Mean quality score
- Number of low quality bases
- Blocked motifs (dimers)
Processing:
- Match to primer
- Sort by tags and move tags to metadata
- Single linkage BLAST clustering with references included
- Summarise clusters and calculate cluster statistics
- Summarise clusters per tag
Results:
- Cluster summaries
- Cluster alignments
- Tag summaries
- Tag by cluster matrix
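As an illustration of the "tag by cluster matrix" result, here is a minimal Python sketch. It assumes each retained read has already been assigned a tag and a cluster; the function name and the sample and cluster labels are hypothetical and not part of SCATA.

```python
from collections import Counter

def tag_by_cluster_matrix(assignments):
    """Count reads per (tag, cluster) pair and lay the counts out as a matrix.

    `assignments` is a list of (tag, cluster) pairs, one pair per read.
    Rows follow the sorted tags, columns follow the sorted clusters.
    """
    counts = Counter(assignments)
    tags = sorted({tag for tag, _ in assignments})
    clusters = sorted({cluster for _, cluster in assignments})
    matrix = [[counts[(tag, cluster)] for cluster in clusters] for tag in tags]
    return tags, clusters, matrix

# Hypothetical reads: two samples (tags) and two clusters.
reads = [("sampleA", "c1"), ("sampleA", "c1"), ("sampleA", "c2"), ("sampleB", "c2")]
tags, clusters, matrix = tag_by_cluster_matrix(reads)
```

Each row then gives the read counts of one tagged sample across all clusters, which is the usual starting point for community comparisons.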
Sequence data: file extension = .fna
Quality data: file extension = .qual
When files are uploaded, SCATA will send you a mail with information about the number of reads and the average and maximum read lengths.
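The read statistics mentioned above can be reproduced locally before uploading. The following is a rough sketch that assumes the .fna file is plain FASTA; the function names are illustrative and this is not SCATA's own code.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def read_summary(lines):
    """Return the number of reads and the mean and maximum read length."""
    lengths = [len(seq) for _, seq in read_fasta(lines)]
    if not lengths:
        return {"reads": 0, "mean_length": 0.0, "max_length": 0}
    return {
        "reads": len(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "max_length": max(lengths),
    }

# Tiny in-memory example; for a real file, pass an open file handle instead.
example = [">read1", "ACGTAC", "GT", ">read2", "ACGT"]
summary = read_summary(example)
```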
Upload a text file. If ‘mids’ are used, these should be included in the tag.
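A sketch of how reads might be sorted by tag, with the tag moved out of the sequence and into metadata. The tag file layout (one "name, whitespace, tag sequence" pair per line), the exact match at the 5' end, and the tag sequences themselves are assumptions made for illustration; the layout SCATA expects is not given on the slide.

```python
def parse_tag_file(lines):
    """Map tag sequence -> sample name from 'name<whitespace>tag' lines (assumed layout)."""
    tags = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, tag_seq = line.split()[:2]
        tags[tag_seq.upper()] = name
    return tags

def assign_tag(sequence, tags):
    """Return (sample_name, sequence_without_tag), or (None, sequence) if no tag matches."""
    sequence = sequence.upper()
    # Try longer tags first so a short tag cannot shadow a longer one with the same prefix.
    for tag_seq in sorted(tags, key=len, reverse=True):
        if sequence.startswith(tag_seq):
            return tags[tag_seq], sequence[len(tag_seq):]
    return None, sequence

# Hypothetical tags for two samples.
tags = parse_tag_file(["plot1\tACGAGTGC", "plot2\tACGCTCGA"])
name, trimmed = assign_tag("ACGCTCGATTGGCC", tags)
```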
Quality filtering
• length
• base quality
• primer match
• motifs: enables removal of primer dimers, contaminants etc.
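The four filters can be pictured as a chain of checks applied to each read. The sketch below is only an approximation: all thresholds are placeholder values, the primer is matched exactly at the start of the read, and the function name is hypothetical. SCATA's actual defaults and matching rules are not stated here.

```python
def screen_read(seq, quals, primer, blocked_motifs=(), min_length=200,
                min_mean_quality=20.0, low_quality=10, max_low_quality_bases=5):
    """Return the reason a read is rejected, or None if it passes every check.

    All threshold defaults are illustrative assumptions.
    """
    if len(quals) != len(seq):
        raise ValueError("need exactly one quality score per base")
    if not seq or len(seq) < min_length:
        return "too short"
    if sum(quals) / len(quals) < min_mean_quality:
        return "low mean quality"
    if sum(q < low_quality for q in quals) > max_low_quality_bases:
        return "too many low quality bases"
    if not seq.startswith(primer):  # assumption: exact match, no mismatches allowed
        return "primer mismatch"
    if any(motif in seq for motif in blocked_motifs):
        return "blocked motif"
    return None

# Hypothetical primer and read.
primer = "GGAAGT"
read = primer + "A" * 10
quals = [30] * len(read)

ok = screen_read(read, quals, primer, min_length=10)
short = screen_read(read, quals, primer, min_length=20)
blocked = screen_read(read, quals, primer, blocked_motifs=("AAAAA",), min_length=10)
```

Returning the rejection reason rather than a plain True/False makes it easy to tabulate how many reads each filter removes.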
[Figure: number of clusters and number of reads retained, with singletons and doubletons marked]
Quality filtering
• length
• base quality
• primer match
• motifs
Clustering
• stringency
• match length
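Single linkage clustering means that two reads end up in the same cluster if they are connected by any chain of sufficiently similar pairs. A minimal sketch using a union-find structure is given below. It assumes pairwise hits (for example from BLAST) have already been computed as (id_a, id_b, identity, match_fraction) tuples, and it reads "stringency" as a minimum identity and "match length" as a minimum aligned fraction; both readings, the default values and the function name are assumptions.

```python
def single_linkage_clusters(ids, hits, min_identity=0.985, min_match_fraction=0.85):
    """Group ids into single-linkage clusters from precomputed pairwise hits.

    `hits` holds (id_a, id_b, identity, match_fraction) tuples; a pair is linked
    only if it meets both thresholds (illustrative defaults).
    """
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, match_fraction in hits:
        if identity >= min_identity and match_fraction >= min_match_fraction:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for i in ids:
        clusters.setdefault(find(i), []).append(i)
    # Largest clusters first, ties broken by member names.
    return sorted(clusters.values(), key=lambda members: (-len(members), members))

ids = ["r1", "r2", "r3", "r4"]
hits = [
    ("r1", "r2", 0.99, 0.95),
    ("r2", "r3", 0.99, 0.90),
    ("r1", "r3", 0.95, 0.95),  # below the identity threshold, but r1 and r3 are chained via r2
    ("r3", "r4", 0.99, 0.50),  # match too short, so r4 stays on its own
]
clusters = single_linkage_clusters(ids, hits)
```

The r1/r3 pair shows the chaining behaviour typical of single linkage: two reads can share a cluster even when their direct similarity is below the threshold.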
Quality filtering
• length
• base quality
• primer match
Clustering
• stringency
• match length
Advanced settings:
• Scoring parameters
• Consensus rules: if the most common base at a position makes up less than 75%, the consensus is N
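The consensus rule can be sketched as a column-wise majority vote over an alignment. This example assumes the sequences are already aligned to equal length and treats every character (including gaps) as an ordinary symbol; the function name is hypothetical.

```python
from collections import Counter

def consensus(aligned_seqs, min_fraction=0.75):
    """Column-wise consensus of aligned sequences.

    A position becomes 'N' when its most common base makes up less than
    `min_fraction` of the column.
    """
    if not aligned_seqs:
        return ""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    result = []
    for column in zip(*aligned_seqs):
        base, count = Counter(column).most_common(1)[0]
        result.append(base if count / len(column) >= min_fraction else "N")
    return "".join(result)

# Third column: G in 3 of 4 reads (75%, kept). Fourth column: 2 of 4 (50%, becomes N).
alignment = ["ACGT", "ACGT", "ACGA", "ACCA"]
cons = consensus(alignment)
```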
H. Wallander, U. Johansson, E. Sterkenburg, M. Brandström & B. Lindahl. Production of ectomycorrhizal mycelium peaks during canopy closure in Norway spruce forests. (New Phytologist, in press)
• Sand-filled ingrowth bags were placed in the humus of managed spruce stands at different successional stages
• During June to November the sand was colonised by fungi
• Mycelial production was estimated by ergosterol analysis
• DNA was extracted, PCR amplified and subjected to 454-sequencing
[Figure: ergosterol (μg/g sand) plotted against age of forest (years)]
Reference guided clustering: a tool to develop ‘preliminary taxonomy’?
• UNITE: ITS sequences from identified ectomycorrhizal fungi
• NCBI: all fungal ITS sequences could be extracted, trimmed and used as a reference data set
[Diagram: the new data set and the reference database are clustered together, giving clusters with a reference and clusters without a reference. Clusters without a reference are described by a representative sequence, e.g. BL58 (Helotiales)]
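Once reads and references have been clustered together, the split into clusters with and without a reference is a simple partition. The sketch below assumes clusters are lists of sequence ids and that reference ids map to taxon names; the ids, names and function name are hypothetical.

```python
def split_by_reference(clusters, reference_names):
    """Partition clusters by whether they contain at least one reference sequence.

    `reference_names` maps reference sequence id -> taxon name.
    """
    with_reference, without_reference = [], []
    for members in clusters:
        refs = [reference_names[m] for m in members if m in reference_names]
        reads = [m for m in members if m not in reference_names]
        if refs:
            with_reference.append({"reads": reads, "references": sorted(set(refs))})
        else:
            without_reference.append({"reads": reads})
    return with_reference, without_reference

# Hypothetical clusters: one contains a reference sequence, one does not.
clusters = [["r1", "r2", "ref1"], ["r3", "r4"]]
reference_names = {"ref1": "Taxon A"}
named, unnamed = split_by_reference(clusters, reference_names)
```

Clusters in the second group are the candidates for a provisional name based on a representative sequence.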
BL58 (putative Helotiales)
First found by Björn Lindahl. Affiliated to Helotiales according to phylogenetic analysis (attached).
BL58 (putative Lachnellula sp.)
Affiliated to Lachnellula by Håvard Kauserud according to new and better phylogenetic analysis (attached).
Lachnellula calyciformis (BL58)
Affiliated to species when a new sequence from identified material clusters together with the BL58 consensus.
Requires that a common sequencing region is decided on...
Requires that a common clustering distance (e.g. 98.5%) is decided on...