Graphs for workflow
Genome Workflow

Dots assemblies
NRDB
Psipred
PDB
BlastX
Blast NR PDB
BlastP
MapCandAssemblySeqsToGenome
InterProScan

Compile time Include/Exclude
Molecular Weight Calculate Protein Seq
Include: Tbrucei927, Lmajor, Linfantum, Lbraziliensis

Analysis steps (blue rectangle)
Extract Genome Seq
Molecular Weight Min/Max
Isoelectric point
Extract Protein Seq
Find tandem repeats
Make ORF
Make Protein Seq for NCBI
filtverSequences
load tandem repeats
load ORF
run SignalP
formatncbiBlastFile
run TMHMM
Copy Genomic Sequence to Cluster
loadLowComplexitySeq
createEpitopeMapFiles
Load SignalP
Load TMHMM
Copy Protein Seq to Cluster
LoadEpitope
extractNaSeqAltDefLine
runSplign
loadSplignResults

Analysis subflow (orange rectangle with round corners)
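The compile-time Include/Exclude idea can be illustrated with a small sketch. This is not the workflow engine's actual mechanism: the `Step` class and `compile_workflow` function are hypothetical names, attaching the Include list to the "Molecular Weight" step is an assumption (the slide text does not make the attachment certain), and `Pfalciparum` is a made-up example of an organism outside the list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical step record: an empty include set means 'all organisms'."""
    name: str
    include: frozenset = frozenset()
    exclude: frozenset = frozenset()


def compile_workflow(steps, organism):
    """Return the names of the steps that apply to `organism`, in order."""
    selected = []
    for step in steps:
        if step.include and organism not in step.include:
            continue  # step is restricted to other organisms
        if organism in step.exclude:
            continue  # step is explicitly switched off for this organism
        selected.append(step.name)
    return selected


# Assumption: the Include list on the slide belongs to "Molecular Weight".
TRYP = frozenset({"Tbrucei927", "Lmajor", "Linfantum", "Lbraziliensis"})
STEPS = [
    Step("Extract Protein Seq"),
    Step("Molecular Weight", include=TRYP),
    Step("run SignalP"),
]

if __name__ == "__main__":
    print(compile_workflow(STEPS, "Lmajor"))
    print(compile_workflow(STEPS, "Pfalciparum"))  # hypothetical organism
```

Resolving the filter once, before any step runs, is what makes it "compile time": the step list for an organism is fixed up front, in contrast to the runtime test used for the optional BLAST filtering step.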
NRDB/PDB Sub-flows

NRDB
Move download file
• NR.gz
• gi_taxid_prot.dmp.gz
Find ProteinXRefs
Load DbXrefs
Shorten defLine (NR)
Copy nr.fsa to cluster
Rename files
• nr.fsa->nr_shortDef.fsa
• nr->nr.fsa

PDB
Move download file Pdb.fsa
Copy pdb.fsa to cluster
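The slide does not say how the "Shorten defLine (NR)" step shortens headers. As a rough illustration only, the sketch below assumes the rule is "keep the first whitespace-delimited token of each FASTA header line"; the function name `shorten_deflines` is hypothetical.

```python
def shorten_deflines(lines):
    """Yield FASTA lines with each header cut to its first token.

    Assumption: 'shortening' means dropping everything after the first
    whitespace on a '>' line. Sequence lines pass through unchanged.
    """
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line.split(None, 1)  # split once on any whitespace
            yield parts[0] if parts else line
        else:
            yield line


if __name__ == "__main__":
    sample = [">gi|1|ref|X| some long description\n", "MKVLAA\n"]
    for out in shorten_deflines(sample):
        print(out)
```

Because the function is a generator over lines, it can stream a file the size of nr.fsa without loading it into memory.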
Blast Sub-flows
Create Similarity Dir
Copy Similarity dir to cluster
Start Blast on Cluster
Wait for Cluster
Copy results from cluster
Rename file blastSimilarity.out.gz->blastSimilarity.unfiltered.out.gz
Filter BLAST Results (BlastX; optional step, runtime test)
Extract Ids From BLAST Results (BlastX & BlastP)
Load NRDB Subset (BlastX & BlastP)
Load Protein Blast
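The rename step and the optional filtering step can be sketched as follows. This is a minimal illustration, not the workflow's own code: `keep_unfiltered_copy` and `run_optional` are hypothetical helpers, and only the two file names come from the slide.

```python
from pathlib import Path


def keep_unfiltered_copy(similarity_dir):
    """Rename blastSimilarity.out.gz to blastSimilarity.unfiltered.out.gz.

    Keeps the raw BLAST output under a distinct name so a later filter
    step can write its result without overwriting it.
    """
    src = Path(similarity_dir) / "blastSimilarity.out.gz"
    dst = src.with_name("blastSimilarity.unfiltered.out.gz")
    src.rename(dst)
    return dst


def run_optional(step, should_run):
    """Run step() only when the runtime test should_run() passes."""
    if should_run():
        step()
        return True
    return False
```

A caller might pass the filter as `step` and a check such as "is this a BlastX run?" as `should_run`, so the decision is made while the workflow executes.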
Psipred Subflow
Create psipred Data Dir
Fix protein IDs for psipred
Create psipred Task Dir
Copy files to cluster
• Data Dir
• AnnotatedProteinPsipred.fsa
Start psipred on cluster
Wait for cluster
Copy psipred files from cluster
Fix psipred file names
Make Alg Inv
Load Secondary Structures
InterProScan Subflow
Create Iprscan dir
Copy files to cluster
• Iprscan Dir
Start Iprscan on cluster
Wait for cluster
Copy Iprscan files from cluster
Load Iprscan Results
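Several sub-flows in these slides share the same shape: copy inputs to the cluster, start the job, wait, then copy results back. A minimal sketch of that pattern follows, with the four operations injected as callables. The function `run_on_cluster` and its parameters are hypothetical, and polling at a fixed interval is an assumption about how "Wait for cluster" might be implemented.

```python
import time


def run_on_cluster(copy_to, start, is_done, copy_from,
                   poll_seconds=60.0, sleep=time.sleep):
    """Run one cluster sub-flow: copy in, start, poll until done, copy out.

    All four operations are supplied by the caller, so the same skeleton
    can serve any sub-flow that follows this shape.
    """
    copy_to()                  # e.g. "Copy files to cluster"
    job = start()              # e.g. "Start Iprscan on cluster"
    while not is_done(job):    # "Wait for cluster"
        sleep(poll_seconds)
    copy_from()                # e.g. "Copy Iprscan files from cluster"
    return job
```

Passing `sleep` as a parameter keeps the skeleton easy to exercise without real waiting, for instance by supplying a function that only records that it was called.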
mapCandAssemblySeqsToGenome Subflow
Make Candidate Assembly Seqs
Extract Candidate Assembly Seqs
Extract Genomic Seqs Into Separate Fasta Files
Create Genome dir for GfClient
Create Repeat Mask dir
Mirror To Cluster
• Genome Dir
• Repeatmask dir
Start GenomeAlign On Compute Cluster
Wait for Cluster
Copy file from cluster
• Results of Genome alignment
• Results of repeatmask
Update gus table with xmi
Load contig alignments
Dots Assemblies Subflow
clusterMultiEstSoursesByAlign
getNotAlignedEstAndAddOneCluster
splitCluster
AssembleTranscripts
extractAssembles
Create Genome dir for GfClient
Create Repeat Mask dir
Copy files to cluster
• Genome Dir
• Repeatmask dir
Start Genome Align On Compute Cluster
Wait for Cluster
Copy file from cluster
• Results of Genome alignment
• Results of repeatmask
Load contig alignments
updateAssemblySourceId