RDP – Capturing the Unclassified Use only on data that can be publicly shared. These are not secure tools.
Genboree RDP Output • Tutorial 2 Dataset • QIIME • chimeras removed • RDP • Sample Period
Download files • Raw.results.tar.gz
Unarchive and Decompress • Use 7zip • Seq.fna
In BioEdit: • Ctrl+A – to select all sequences • Shift+Ctrl+C – to copy all sequence titles • In Excel: • Paste into Excel, then work in Column B (or another empty column) • =LEFT(A1,number_of_characters_in_titles) • Ctrl+Shift+Down arrow – to select down the column • Ctrl+D – to fill the formula into all cells below • Check your work. Select only your samples. Do not select blank cells. Copy the correct titles.
In BioEdit: • Paste over the titles • Save as: your_filename.fas • In the pull-down menu • choose Fasta
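The title-trimming steps above can also be sketched in a few lines of Python, for cases where the copy and paste round trip through Excel is inconvenient. This is a minimal sketch, not part of the original tutorial: the function name `truncate_fasta_titles`, the example titles, and the length of 7 are all hypothetical, and it assumes each title line starts with `>`.

```python
def truncate_fasta_titles(lines, n_chars):
    """Keep only the first n_chars characters of each FASTA title.

    Mirrors =LEFT(A1, n_chars) applied to every title; sequence lines
    are passed through unchanged.
    """
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Trim the title text that follows the '>' marker.
            out.append(">" + line[1:][:n_chars])
        else:
            out.append(line)
    return out


# Hypothetical example data: titles share a 7-character sample prefix.
example = [
    ">SampleA_001 extra description",
    "ACGT",
    ">SampleB_002 extra description",
    "GGCC",
]
trimmed = truncate_fasta_titles(example, 7)
print("\n".join(trimmed))
```

As with the Excel route, check the result before using it: the same prefix length has to be right for every sample title.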
very tiny datasets • Do not navigate away
What do you get back? • Confidence file • Classifications • Failed classifications: check this file. • If it is not empty, something went wrong. • Hierarchy
Open classifications in Excel • The tutorial focuses on Phylum, but any level can be used.
Keep it Tidy • Cut out what isn’t needed or being used.
Confidence in the Classification • Sort on the confidence level • Odd groups • Leave in or take out? • Replace those below your confidence level • Prefix them with Unclassified_ • =CONCATENATE($column$row,cell) • $ keeps the column or row static in your formula as you drag to multiple cells
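The relabeling step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the tutorial's own method: it assumes the Unclassified_ prefix is joined to the phylum name the classifier assigned, and the cutoff of 0.8, the function name `label_phylum`, and the example rows are hypothetical.

```python
def label_phylum(phylum, confidence, threshold=0.8):
    """Return the phylum name, prefixed with 'Unclassified_' when the
    classifier's confidence falls below the chosen threshold."""
    if confidence < threshold:
        # Same idea as =CONCATENATE("Unclassified_", cell) in Excel.
        return "Unclassified_" + phylum
    return phylum


# Hypothetical (phylum, confidence) pairs from a classifications sheet.
rows = [("Firmicutes", 0.99), ("Firmicutes", 0.42), ("Bacteroidetes", 0.80)]
labels = [label_phylum(name, conf) for name, conf in rows]
print(labels)
```

Each phylum can therefore appear twice in the result, once as itself and once with the Unclassified_ prefix.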
Even at the Phylum Level • 60 categorical levels • (could be 2 for every known phylum)
To count by sample and phylum classification • =COUNTIFS($K:$K,$O2,$A:$A,P$1) • Learn how to stop recalculation and restart it manually, so you don’t crash your machine! You can easily cause hours of computation on large matrices!
Stop Automatic Recalculation • In the Options menu • Under Formulas • Set calculation to Manual • Press F9 to recalculate
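For readers who want to avoid the recalculation cost altogether, the same sample-by-phylum count can be sketched in Python with `collections.Counter`. The record layout (one `(sample, phylum)` pair per sequence), the function name, and the example values are assumptions made for illustration.

```python
from collections import Counter


def count_by_sample_and_phylum(records):
    """Build a phylum-by-sample count table from (sample, phylum) pairs.

    Returns {phylum: {sample: count}}, with zeros filled in so every
    phylum row has an entry for every sample.
    """
    counts = Counter(records)
    samples = sorted({sample for sample, _ in records})
    phyla = sorted({phylum for _, phylum in records})
    return {
        phylum: {sample: counts[(sample, phylum)] for sample in samples}
        for phylum in phyla
    }


# Hypothetical records: one (sample, phylum) pair per classified sequence.
records = [
    ("S1", "Firmicutes"),
    ("S1", "Firmicutes"),
    ("S1", "Bacteroidetes"),
    ("S2", "Bacteroidetes"),
]
table = count_by_sample_and_phylum(records)
print(table)
```

A single pass over the records replaces one COUNTIFS evaluation per cell, which is why this scales better on large tables.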
Sum Rows and Sort On (Your Favorite) • Total is Customary • Can rearrange as needed
Make a 100% Stacked Chart • Not very pretty
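A 100% stacked chart plots each phylum as a share of its sample's total. The arithmetic behind it can be sketched as below; the table layout (`{phylum: {sample: count}}`), the function name `to_percentages`, and the counts are hypothetical.

```python
def to_percentages(table):
    """Convert {phylum: {sample: count}} into per-sample percentages.

    Each sample's values sum to 100 (or stay at 0 if the sample has
    no counts at all).
    """
    samples = sorted({s for row in table.values() for s in row})
    totals = {s: sum(row.get(s, 0) for row in table.values()) for s in samples}
    return {
        phylum: {
            s: (100.0 * row.get(s, 0) / totals[s]) if totals[s] else 0.0
            for s in samples
        }
        for phylum, row in table.items()
    }


# Hypothetical counts for two samples.
counts = {
    "Firmicutes": {"S1": 3, "S2": 1},
    "Bacteroidetes": {"S1": 1, "S2": 3},
}
percent = to_percentages(counts)
print(percent)
```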
To Compare to Genboree • RDP must be run • png.result.tar.gz
Some Problems Commonly Encountered • RDP output does not always follow a consistent column layout. • To get a clean graph with each taxonomic level in its own column, you may need to sort and remove sections of data. • Some rows have additional levels of classification • Some rows have fewer levels of classification
Additional Levels of Classification • Move over • Delete
Fewer Levels of Classification • Move over • Common Trouble Makers: • Bacteroidetes • Verrucomicrobia • Acidobacteria • Dehalococcoidetes • Cyanobacteria • Chloroplast • Deltaproteobacteria • OD1_genera_incertae_sedis • TM7_genera_incertae_sedis • Armatimonadetes • WS3_genera_incertae_sedis
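An alternative to shifting columns by hand is to look up each name by its rank label instead of by column position. The sketch below is a swapped-in technique, not the procedure shown in the slides. It assumes each row lists its taxa as repeating name, rank, confidence fields, so a rank label such as `phylum` always sits directly after its name; check that against your own output before relying on it. The function name and both example rows are hypothetical.

```python
def name_at_rank(fields, rank):
    """Return the taxon name whose rank label equals `rank`, or None.

    Assumes the name is the field immediately before the rank label,
    regardless of how many levels the row contains.
    """
    for i, value in enumerate(fields):
        if i >= 1 and value.strip().lower() == rank:
            return fields[i - 1]
    return None


# Hypothetical row with an additional level (subclass).
row_extra = [
    "seq1", "", "Root", "rootrank", "1.0", "Bacteria", "domain", "1.0",
    "Actinobacteria", "phylum", "0.98", "Actinobacteria", "class", "0.98",
    "Actinobacteridae", "subclass", "0.95",
]
# Hypothetical row with fewer levels (no class, order, or family).
row_fewer = [
    "seq2", "", "Root", "rootrank", "1.0", "Bacteria", "domain", "1.0",
    "TM7", "phylum", "0.90", "TM7_genera_incertae_sedis", "genus", "0.90",
]

print(name_at_rank(row_extra, "phylum"))
print(name_at_rank(row_fewer, "phylum"))
print(name_at_rank(row_fewer, "class"))
```

Rows that lack a level simply return `None` for it, which makes the troublemakers easy to spot before charting.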