Services descriptions

Kegg_gene_ids_all_species (bconv): converts external IDs to KEGG IDs [mapping]
Input
  string: external ID; in this workflow, an Entrez gene id [Entrez_Gene_ID]
Output
  return: KEGG gene id [KEGG_gene_id]

Split_gene_ids: beanshell script that extracts the KEGG id from the record returned by the “Kegg_gene_ids_all_species” operation
Input
  input: result returned by the Kegg_gene_ids_all_species operation
Output
  output: KEGG gene id [KEGG_genes_id]

Lister: lists each element of a given file so that it can be used by subsequent operations
Input
  File: file containing the elements to be listed
Output
  listerReturn: each element of the file, for use by subsequent operations

Get_pathways_by_genes: searches for all pathways that include all the given genes [searching]
Input
  genes_id_list: list of KEGG gene ids [KEGG_genes_id]
Output
  return: list of pathway ids for the specified KEGG gene ids [KEGG_record_id]

merge_pathways & mergePathways2 (Merge string list to string): concatenates a list of strings
Inputs
  stringlist: list of strings to concatenate
  separator: separator to use between strings
Output
  concatenated: the concatenated string

Overall workflow description

This workflow takes Entrez gene ids and adds the string "ncbi-geneid:" to the start of each one. The prefixed ids are then cross-referenced to KEGG gene ids. Each KEGG gene id is sent to the KEGG pathway database, which returns the ids of its relevant pathways.
Inputs
  gene: Entrez gene id
  Gi_numbers: Entrez gene id
Outputs
  Kegg_strings: KEGG gene id
  merged_kegg_pathways: KEGG pathways

Services descriptions

Add_ncbi_to_string: beanshell script that adds “ncbi-geneid:” to Entrez gene ids.
Input
  input: Entrez gene id [Entrez_Gene_ID]
Output
  output: KEGG gene id [KEGG_genes_id]

Services descriptions

blast_ddbj (searchSimple): executes BLAST with the specified program, database and query [local_aligning]
Inputs
  program: BLAST type: blastn, blastp, blastx, tblastn or tblastx
  database: database to search, e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases, see appendix.
  query: nucleotide or protein sequence [biological_sequence]
Output
  Result: result of the BLAST execution [BLAST_report]

blastfilecomparer: compares a new BLAST output to an older BLAST output to identify new hits [filtering]
Inputs
  blastResult_direct_data: BLAST result file [BLAST_report]. Use either this parameter or blastResult_url, not both.
  blastResult_url: URL of the BLAST result [BLAST_report]
  oldRefFile_direct_data: old BLAST result file [BLAST_report]. Use either this parameter or oldRefFile_url, not both.
  oldRefFile_url: URL of the old BLAST result
  species: filters the result by species name
  chromo: filters the result by chromosome number
  advanced: words to look for in the FASTA definition line
Output
  report: the filtered BLAST result

Overall workflow description

This workflow performs a BLAST search, then compares the result to a previous BLAST result using the specified filters.
Inputs
  program: BLAST type: blastn, blastp, blastx, tblastn or tblastx
  database: e.g. SWISS, NCBI, EMBL, DDBJ
  query: nucleotide or protein sequence
  OldBlastResult: BLAST result
  species_filter: species name
  chromosome_filter: chromosome number
Outputs
  blast_output: result of the BLAST execution
  Compared_output: list of GI numbers

Services descriptions

blastsimplifier: simplifies BLAST output by specifying which elements (seq_id, gi, acc, desc, score, bits, per, p, exp) are displayed in the BLAST result output [filtering].
Inputs
  new_direct_data: BLAST report file [BLAST_report]. Mutually exclusive with the new_url parameter.
  new_url: URL of the BLAST report file [BLAST_report]
  The following parameters are optional. To select one of them, pass the name of the input as its value. For example, to display GI numbers, pass gi to the parameter gi.
  seq_id: sequence identifier
  gi: GI number
  acc: accession number
  desc: description
  score: score value
  bits: bit score
  per: percentage of identity
  p: p-value
  exp: E-value
Output
  report: a simplified BLAST report

Services descriptions

Split_by_regex (Split string into string list by regular expression): splits a given string with a specified regular expression (regex)
Input
  String: string to split
  Regex: regular expression
Output
  split: the split string

getGeneInfo: retrieves gene information for a given Ensembl gene id [retrieving]
Input
  geneId: Ensembl gene id [ensembl_record_id]
Output
  Result: gene information for the specified Ensembl gene id [Ensembl_record]

Parse_ddbj_gene_info: extracts information from the output of the DDBJ (DNA Data Bank of Japan) getGeneInfo processor [retrieving]
Input
  file_direct_data: ‘getGeneInfo’ output [Ensembl_record]
  option: selects the piece of data to extract from the output file, e.g. swiss
Output
  Output: the extracted piece of data

parse_swiss: beanshell script that extracts only the Swiss-Prot ids from the “parse_ddbj_gene_info” output record
Input
  input: parse_ddbj_gene_info output record with ‘swiss’ as option
Output
  output: Swiss-Prot ids [SWISS-PROT_accession]

Overall workflow description

This workflow simplifies a BLAST text file into identifiers, descriptions and values (P, E-values). To extract a given element, pass the relevant string into the corresponding port. For example, the default port used here is gi, and it has been passed the string "gi". For any other port, pass in a string identical to the port name, e.g. seq_id, p, per.
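The port convention in the BLAST simplifier workflow (each selected port receives its own name as its value) can be illustrated with a small sketch. It assumes the BLAST hits have already been parsed into dictionaries keyed by element name. The `simplify` function, the `VALID_PORTS` tuple and the sample hits are hypothetical and are not part of the blastsimplifier service.

```python
# Hypothetical sketch of the "pass the port name as its value" convention.
# This is not the blastsimplifier implementation.
VALID_PORTS = ("seq_id", "gi", "acc", "desc", "score", "bits", "per", "p", "exp")


def simplify(hits, ports):
    """Keep only the elements whose port received its own name as value.

    hits  -- list of dicts, one per BLAST hit (assumed to be parsed already)
    ports -- dict mapping a port name to the string passed into that port
    """
    selected = []
    for name, value in ports.items():
        if name not in VALID_PORTS:
            raise ValueError(f"unknown port: {name!r}")
        if value != name:
            raise ValueError(f"port {name!r} must be passed the string {name!r}")
        selected.append(name)
    # Build one reduced record per hit, containing only the selected elements.
    return [{name: hit[name] for name in selected} for hit in hits]


# Made-up hits, for illustration only.
hits = [
    {"seq_id": "seqA", "gi": "111111", "acc": "XX000001", "exp": "1e-30"},
    {"seq_id": "seqB", "gi": "222222", "acc": "XX000002", "exp": "4e-12"},
]

gi_only = simplify(hits, {"gi": "gi"})
gi_and_evalue = simplify(hits, {"gi": "gi", "exp": "exp"})
```

Passing `{"gi": "gi"}` mirrors the default use of the workflow, where only the gi port is supplied.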
Overall workflow description

This workflow extracts gene information and the relevant Swiss-Prot ids for given Ensembl gene ids.
Inputs
  genes_in_region: list of Ensembl gene ids
  regex: regex value to use for the “split_by_regex” operation
  options: option value used to extract a piece of data from the “parse_ddbj_gene_info” output file, e.g. swiss
Outputs
  gene_info: gene information
  swiss_ids: Swiss-Prot ids

Inputs
  blast_file: BLAST result
  gi_option: here, only the GI number is retrieved from the BLAST output
Outputs
  Simplified_output: list of GI numbers

Services descriptions

blastsimplifier: simplifies BLAST output by specifying which elements (seq_id, gi, acc, desc, score, bits, per, p, exp) are displayed in the BLAST result output [filtering]
Input
  new_direct_data: BLAST report file [BLAST_report]. Mutually exclusive with the “new_url” parameter.
  new_url: URL of the BLAST report file [BLAST_report]
  To choose one of the following inputs, pass the name of the input as the parameter value. For example, to display GI numbers, pass gi as the value of the parameter gi.
  seq_id: sequence identifier
  gi: GI number
  acc: accession number
  desc: description
  score: score value
  bits: bit score
  per: percentage of identity
  p: p-value
  exp: E-value
Output
  report: a brief summary of the result
  output: list of the specified element; here, a list of GI numbers
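As a rough illustration of what "a list of GI numbers" means, the sketch below pulls GI numbers out of BLAST text with a regular expression. It assumes the hit lines carry identifiers in the legacy NCBI `gi|<number>|` form. The function name `extract_gi_numbers` and the sample report are made up for this example, and the real blastsimplifier service may work differently.

```python
import re

# Matches the numeric part of identifiers written as "gi|12345|...".
GI_PATTERN = re.compile(r"gi\|(\d+)\|")


def extract_gi_numbers(blast_text):
    """Return GI numbers in order of first appearance, without duplicates."""
    seen = set()
    gi_numbers = []
    for match in GI_PATTERN.finditer(blast_text):
        gi = match.group(1)
        if gi not in seen:
            seen.add(gi)
            gi_numbers.append(gi)
    return gi_numbers


# Made-up report fragment, for illustration only.
sample_report = """\
gi|111111|ref|XX_000001.1| hypothetical protein A    120  1e-30
gi|222222|dbj|YY000002.1| hypothetical protein B     118  4e-30
gi|111111|ref|XX_000001.1| hypothetical protein A    120  1e-30
"""

gi_numbers = extract_gi_numbers(sample_report)
```

A GI number usually appears both in the summary list and in the alignment section of a report, which is why the sketch removes duplicates.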
split_by_regex (Split string into string list by regular expression): splits a given string with a specified regular expression (regex)
Input
  String: string to split
  Regex: regular expression
Output
  split: the split string

Merge_string_list_to_string: merges a list of strings
Input
  stringlist: string list to merge
  seperator: separator used for merging the list of strings
Output
  concatenated: the concatenated string

GOIDFromGiList: retrieves an array of GO ids for a specified array of GI numbers [retrieving]
Input
  giList: list of GI numbers [genbank_GI]
Output
  result: list of GO ids [Gene_Ontology_term_id]

Overall workflow description

This workflow takes the list of GI numbers from a given BLAST report and retrieves the corresponding GO ids.
Inputs
  blast_report: BLAST result
  gi_number: the string gi, passed to retrieve GI numbers
  regex: regex value to use for the split_by_regex operation
  seperator: separator to use between strings
Outputs
  Gi_numbers: list of GI numbers
  GO_id: list of GO ids

Services descriptions

Blastx_ddbj (searchSimple): executes BLAST with the specified program, database and query [local_aligning]
Inputs
  program: BLAST type: blastn, blastp, blastx, tblastn or tblastx
  database: BLAST database, e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases, see appendix.
  query: nucleotide or protein sequence, in FASTA format or unformatted [biological_sequence]
Output
  Result: result of the BLAST execution [BLAST_report]

getFASTA: gets a DDBJ entry in FASTA format by accession number [retrieving]
Input
  accession: EMBL/DDBJ/NCBI accession number [DDBJ_accession] [EMBL_accession] [genebank_gene_accession]
Output
  Result: nucleotide sequence in FASTA format [nucleotide_sequence]

Overall workflow description

This workflow retrieves an EMBL sequence in FASTA format, then performs a BLAST operation.
Inputs
  emblid_default: EMBL sequence identifier
  Blast_db: BLAST database, e.g. SWISS
  Blast_program: BLAST program, e.g.
blastn
Outputs
  Fasta_output: nucleotide sequence in FASTA format
  Blast_result_ddbj: BLAST result
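The data flow of this two-step workflow (fetch a FASTA record, then use it as the BLAST query) can be sketched as plain function composition. The two services are passed in as callables so that nothing is assumed about how they are actually invoked. `fetch_and_blast`, `fake_get_fasta` and `fake_search_simple` are hypothetical stand-ins, not DDBJ client code, and the accession is made up.

```python
def fetch_and_blast(accession, program, database, get_fasta, search_simple):
    """Chain a getFASTA-like step into a searchSimple-like step.

    get_fasta and search_simple are injected callables standing in for
    the web services; their real invocation is outside this sketch.
    """
    fasta = get_fasta(accession)
    if not fasta.startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    report = search_simple(program, database, fasta)
    # Keys follow the workflow's output port names.
    return {"Fasta_output": fasta, "Blast_result_ddbj": report}


# Stand-in services returning made-up data, for illustration only.
def fake_get_fasta(accession):
    return f">{accession} made-up sequence\nACGTACGTACGT\n"


def fake_search_simple(program, database, query):
    header = query.splitlines()[0]
    return f"{program} against {database} for {header}"


outputs = fetch_and_blast(
    "XX000001", "blastx", "SWISS", fake_get_fasta, fake_search_simple
)
```

Injecting the services keeps the sketch runnable offline and makes the dependency between the two workflow outputs explicit.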
Services descriptions

getParents: retrieves the ids of all immediate parent terms of a specified GO id [retrieving]
Input
  geneOntologyID: GO id whose parent terms should be returned [Gene_Ontology_term_id]
Output
  getParentsReturn: ids of all immediate parent terms of the specified term [Gene_Ontology_term_id]

getAncestry (getAncestors): retrieves the ids of all ancestors of a specified GO id [retrieving]
Input
  geneOntologyID: GO id whose ancestors should be returned [Gene_Ontology_term_id]
Output
  getAncestorsReturn: ids of all ancestors of the specified term [Gene_Ontology_term_id]

Create (createSession): takes no arguments, creates a new GoViz session on the server and returns a session identifier that can be used in subsequent operations
Output
  createSessionReturn: session identifier that can be used in subsequent operations

getChildren & getImmediateChildren (getChildren): retrieves the ids of all immediate children of a specified GO id [retrieving]
Input
  geneOntologyID: GO id whose children should be returned [Gene_Ontology_term_id]
Output
  getChildrenReturn: ids of all immediate children of the specified term [Gene_Ontology_term_id]

addImmediateChildren & add (addTerm): adds a GO term to the visualisation, updating the state of the named session
Input
  SessionID: session id returned by the createSession operation
  geneOntologyID: GO id of the term to add [Gene_Ontology_term_id]

Overall workflow description

This workflow builds a subgraph of the Gene Ontology from a GO term id, showing the context of the supplied term or terms.
Inputs
  termID: GO term id, e.g.
GO:0007601
  childColour: colour used to mark children
  ancestorColour: colour used to mark ancestors
  colourInputTerm: colour of the given terms
Outputs
  graphical: subgraph of the Gene Ontology for the given GO id

Services descriptions

markAncestors & colourChildren & colourInputTerm (markTerm): applies a specific colour to a supplied term in the Gene Ontology
Inputs
  SessionID: session id returned by the createSession operation
  geneOntologyID: GO id of the term to colour [Gene_Ontology_term_id]
  colour: any colour that is valid in the DOT file format. For the list of colours, see appendix.

getresults (getDot): retrieves the DOT text specifying the subgraph of the Gene Ontology that contains all the terms added to the session [retrieving]
Input
  sessionID: session id returned by the createSession operation
Output
  the DOT text specifying the subgraph of the Gene Ontology

Finish (destroySession): removes a session from the server, identified by the session id returned by the createSession operation
Input
  SessionID: session id returned by the createSession operation

Overall workflow description

This workflow retrieves the protein sequence, pathways, GO diagram, MEDLINE information, BLAST result and EC numbers for a given probe set id.
Inputs
  ProbSetid: probe set id
  database: BLAST database
  program: BLAST program used
Outputs
  swissprot: protein sequence
  interproIds: InterPro ids
  goDiagram: GO diagram
  pathways: pathway diagram
  ecNumbers: enzyme EC numbers
  embl: nucleotide sequence in EMBL format
  meltTemp: melting temperature of the nucleotide sequence
  medline: MEDLINE information
  Blast_result: BLAST result
  medlineIds: MEDLINE ids

Services descriptions

getMolFuncGoIds: retrieves the GO id of a specified probe set id [retrieving]
Input
  probSetid: probe set id [probe_id]
Output
  getGeneOntologyMolecularFunctionReturn: GO id [Gene_Ontology_term_id]

getEC: retrieves the enzyme EC number of a specified probe set id [retrieving]
Input
  probeSetId: probe set id [probe_id]
Output
  getECReturn: EC number [EC_number]

getEmblid: retrieves the EMBL id of a specified probe set id [retrieving]
Input
  probeSetId: probe set id [probe_id]
Output
  Return: EMBL id [EMBL_accession]

cleanECnumbers: beanshell script that extracts the EC number from the “getEC” service output
Input
  output of the “getEC” service
Output
  ecNumber: EC number [EC_number]

cleanGoIds: beanshell script that extracts the GO id from the GO record returned by “getMolFuncGoIds”
Input
  GO records returned by “getMolFuncGoIds”
Output
  goIds: GO id [Gene_Ontology_term_id]

createVizSession (createSession): creates a new GoViz (Gene Ontology Visualisation Service) session on the server
Output
  a session identifier that can be used in subsequent operations

getSwissProtId: gets the Swiss-Prot id of a specified probe set id [retrieving]
Input
  probeSetId: probe set id [probe_id]
Output
  getSwissProtIdReturn: Swiss-Prot id [SWISS-PROT_accession]

addTermToViz (addTerm): adds a GO term to the visualisation, updating the state of the named session.
Inputs
  sessionID: session identifier created by the “createVizSession” web service
  geneOntologyID: GO id [Gene_Ontology_term_id]

Overall workflow description

This workflow performs a sequence similarity search using the BLAST algorithm through the DDBJ (DNA Data Bank of Japan) web service.
Inputs
  program: BLAST type: blastn, blastp, blastx, tblastn or tblastx
  database: e.g. SWISS, NCBI, EMBL, DDBJ
  query_seq: nucleotide or protein sequence
Output
  text_blast_out: result of the BLAST execution

Services descriptions

getPathwaysByECNumbers: gets pathways by enzyme EC number [retrieving]
Input
  enzyme_id_list: list of enzyme EC numbers [EC_number]
Output
  Return: pathway ids [KEGG_record_id]

getMedlineIds (ebi_srslinks): cross-references between databanks [retrieving]. In this workflow it retrieves MEDLINE ids for a given EMBL id.
Inputs
  databank: name of the databank of the record to be linked from
  fieldname: field on which the databank is queried (e.g. acc, All text)
  searchterm: search term; multiple search terms can be separated using ‘&’, ‘|’ or ‘!’
  xrefDatabank: the databank to be linked to. See appendix for the list of databanks.
Outputs
  report: summary of the result
  result: result of the ebi_srslinks execution; in this case, MEDLINE ids [MEDLINE_reference_id]

getFASTA: gets a DDBJ entry in FASTA format by accession number [retrieving]
Input
  accession: EMBL/DDBJ/NCBI accession number [DDBJ_accession] [EMBL_accession] [genebank_gene_accession]
Output
  Result: nucleotide sequence in FASTA format [nucleotide_sequence]

removePrefix: beanshell script that removes the prefix “MEDLINE:” from the “getMedlineIds” output
Inputs
  str: string containing the prefix to be removed.
  prefix: prefix to remove
Output
  id: MEDLINE id [MEDLINE_reference_id]

ebi_embl: retrieves EMBL records for given search term(s) [retrieving]
Inputs
  Fieldname: field on which the databank is queried (see appendix)
  Searchterm: search term; multiple search terms can be separated using ‘&’, ‘|’ or ‘!’
Outputs
  report: summary of the result
  result: EMBL record [embl_record]

mark_pathway_by_objects: marks given objects on a given pathway map [displaying]
Inputs
  pathway_id: pathway id [KEGG_record_id]
  object_id_list: list of EC numbers, without the prefix “EC”
Output
  return: URL of the generated pathway map [KEGG_record]

getDotFromViz (getDot): returns the DOT text specifying the subgraph of the GO that contains all the terms added to this session using “addTermToViz” calls, plus all the ancestors of those terms
Input
  sessionID: session identifier created by the “createVizSession” web service

getInterProIds: gets the InterPro records of a specified probe set id [retrieving]
Input
  probeSetId: probe set id [probe_id]
Output
  getInterProReturn: InterPro record [InterPro_record]

splitString (Split string into string list by regular expression): splits a record by a given regular expression
Inputs
  string: string to be split
  regex: regular expression used to split the string
Output
  split: the split string

Ebi_uniprot: retrieves UniProt records for given search term(s) [retrieving]
Inputs
  Fieldname: field on which the databank is queried (see appendix)
  searchterm: search term; multiple search terms can be separated using ‘&’, ‘|’ or ‘!’
Outputs
  result: UniProt record [Uniprot_record]

cleanInterProIds: beanshell script that extracts the InterPro id from the InterPro record returned by the “getInterProIds” service.
Input
  inputStr: InterPro record returned by the “getInterProIds” service
Output
  InterProIds: InterPro ids [InterPro_accession]

destroyVizSession (destroySession): removes a session from the server, identified by the session id returned by the createVizSession operation
Input
  sessionID: session id returned by the createVizSession operation

getPathwayDiagrams (Get image from URL): retrieves an image given its URL
Input
  URL: URL of an image or diagram
Output
  image: the image corresponding to the given URL

Service description

searchSimple: executes BLAST with the specified program, database and query [local_aligning]
Inputs
  program: BLAST type: blastn, blastp, blastx, tblastn or tblastx
  database: database to search, e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases, see appendix.
  query: nucleotide or protein sequence [biological_sequence]
Output
  Result: result of the BLAST execution [BLAST_report]

Services descriptions

hsapiens_gene_ensembl: BioMart processor configured to retrieve Ensembl human gene ids and associated GO terms for a given chromosome number, start position and end position [retrieving]
Inputs
  chromosome_name_filter: chromosome number
  end_filter: end position to use for the query
  start_filter: start position to use for the query
Outputs
  go_description: GO term description
  go: GO term id [Gene_Ontology_term_id]
  ensembl_gene_id: Ensembl gene id [ensembl_record_id]

genesLocations: retrieves the location of a gene on a genome using its identifier [retrieving]
Inputs
  genesIds: gene identifier, e.g. BRCA2, ENSG00000128573 [Ensembl_record_id]
  species: species name, e.g. homo sapiens
  format: format of the gene id list, e.g. plain
Output
  genesLocationsReturn: location of the genes on a given chromosome
split_pos (Split string into string list by regular expression): splits a given string with a specified regular expression (regex)
Input
  String: string to split
  Regex: regular expression (here: “\n”)
Output
  split: the split string

getKaryoviewImage: returns a representation of the karyotype of a given species, with the features to be located marked on it [displaying]
Inputs
  position: position of the gene on the chromosome
  species: species name, e.g. homo sapiens
  chromosome: chromosome number
Outputs
  getKaryoviewImageReturn: URL and HTML file of the karyotype

getImage: beanshell script that extracts the URL of the karyotype
Input
  tabResult: result of the “getKaryoviewImage” service
Output
  url: URL of the karyotype

Get_image_from_URL: retrieves the image at a given URL
Input
  url: URL of the image
Output
  Image: the image at the specified URL

Overall workflow description

This workflow first retrieves Ensembl gene ids and associated GO terms for a given chromosome, start position and end position. It then displays the genes on a karyotype.
Inputs
  chromosome: chromosome number, e.g. 12
  end: chromosome end position to be used
  start: chromosome start position to be used
  species: species name, e.g. homo sapiens
  plain_format: format type, e.g. plain
Outputs
  Image: karyotype image
  GO_description: GO term description
  GO_id: GO term ids
  ens_gene_id: Ensembl gene ids

calcMeltTemp: calculates RNA/DNA melting temperature [calculating]
Inputs
  sequence_usa: the Uniform Sequence Address. Mutually exclusive with sequence_direct_data.
  sequence_direct_data: nucleotide or protein sequence in the specified format [Biological_sequence]
  sformat: optional parameter. Sequence format (see appendix for all possible formats)
  sbegin: optional parameter. First position to be used in the sequence
  send: optional parameter. Last position to be used in the sequence
  sprotein: optional parameter. Is the sequence a protein?
  snucleotide: optional parameter. Is the sequence a nucleotide?
  sreverse: optional parameter. Use the reverse sequence
  slower: optional parameter. Use lower case
  supper: optional parameter. Use upper case
  windowsize: optional parameter. Window size (see appendix)
  shiftincrement: optional parameter. Shift increment (see appendix)
  dnaconc: optional parameter. DNA concentration (nM)
  saltconc: optional parameter. Salt concentration (mM)
  graph_format: optional parameter. Format of the graphical output (png, postscript, colourps, hpgl)
  rna: optional parameter. Use RNA data values
  product: optional parameter. Prompt for product values
  formamide: optional parameter. Percentage of formamide
  mismatch: optional parameter. Percent mismatch
  prodlen: optional parameter. Product length
  thermo: optional parameter. Thermodynamic calculations
  temperature: optional parameter. Temperature in Celsius
  plot: optional parameter. Produce a plot
  mintemp: optional parameter. Minimum temperature
Outputs
  Outfile: DNA/RNA melting temperature

Ebi_medline2007: retrieves a MEDLINE record for a given search term [retrieving]
Inputs
  Fieldname: field on which the databank is queried (see appendix)
  Searchterm: search term; multiple search terms can be separated using ‘&’, ‘|’ or ‘!’
Outputs
  result: MEDLINE record [MEDLINE_citation]

DDBJ_blastx (searchSimple): executes BLAST with the specified program, database and query [local_aligning]
Inputs
  program: BLAST type: blastn, blastp, blastx, tblastn or tblastx
  database: BLAST database, e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases, see appendix.
  query: nucleotide or protein sequence in FASTA format [biological_sequence]
Output
  Result: BLAST report [BLAST_report]
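For intuition about the kind of value the calcMeltTemp service listed above returns, here is a minimal sketch using the Wallace rule of thumb, Tm ≈ 2 °C × (A+T) + 4 °C × (G+C), which gives only a rough estimate for short DNA oligonucleotides. This is a swapped-in simplification and not the method used by calcMeltTemp. It ignores dnaconc, saltconc and every other parameter listed for that service, and the function name `wallace_tm` is hypothetical.

```python
def wallace_tm(sequence):
    """Estimate the melting temperature (in Celsius) of a short DNA oligo.

    Uses the Wallace rule: 2 degrees per A/T base, 4 degrees per G/C base.
    This is a rule of thumb, not the calcMeltTemp algorithm.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    invalid = set(seq) - set("ACGT")
    if invalid:
        raise ValueError(f"non-DNA characters: {sorted(invalid)}")
    at_count = seq.count("A") + seq.count("T")
    gc_count = seq.count("G") + seq.count("C")
    return 2 * at_count + 4 * gc_count


# Made-up 12-mer with three of each base: 2 * 6 + 4 * 6.
estimate = wallace_tm("ATGCATGCATGC")
```

The rule shows why GC-rich sequences melt at higher temperatures, which is the same trend that more detailed thermodynamic calculations capture.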
Services descriptions

getP53MutationIdsByExon: gets TP53 gene mutation ids by exon from the IARC TP53 Database catalogue [retrieving]
Input
  libs: name (constant) of the TP53 somatic mutation database to be queried, e.g. tp53_iarc
  exon: exon number in the p53 gene
Output
  result: exon ids, including the catalogue names, e.g. TP53_IARC:9339

getP53MutationssByIds: gets TP53 gene mutations by id from the TP53 IARC database [retrieving]
Input
  id: exon id without the catalogue name, e.g. 9339
Output
  result: TP53 somatic mutation description

Split_string_into_string_list_by_regular_expression: splits a string with a specified regular expression
Inputs
  strings: string to be split
  regex: regular expression to use
Output
  split: the split string

Filter_list_of_strings_extracting_match_to_a_regex: extracts the match to a given regex from a specified string
Inputs
  stringlist: string to extract from
  regex: regular expression to extract
Output
  filteredlist: the extracted string

Services descriptions

Object: BioMoby object
Inputs
  namespace: BioMoby namespace
  id: NCBI_Acc, NCBI_gi, PIR, SwissProt, Embl or PDB identifier
  article_name: BioMoby article name
Output
  mobydata: BioMoby data

MOBYSHoundGetGenBankWhateverSequence: consumes an NCBI_Acc, NCBI_gi, PIR, SwissProt, Embl or PDB identifier and returns the equivalent GenBank record as a DNA, RNA or AminoAcid sequence object, as appropriate
Input
  object (identifier): output of the BioMoby “Object” service.
Output
  GenericSequence(file): sequence associated with the given identifier [biological_sequence]

MIPSBlastBetterE13: executes BLAST against MAtDB Arabidopsis protein-coding genes with a cut-off E-value of E=1e-13
Input
  GenericSequence(QuerySequence): biological sequence [biological_sequence]
Output
  WU_BLAST_Text(BlastReport): BLAST report [BLAST_report]

Extract_accession: beanshell script that extracts the accession number from a sequence file
Input
  in: sequence file [biological_sequence]
Output
  accs: accession numbers

Extract_best_hit: beanshell script that extracts the AGI locus code of the best hit
Input
  in: BLAST report [BLAST_report]
Output
  agi: AGI locus code of the best hit
  id: sequence identity between the query sequence and the best hit

Overall workflow description

This workflow takes a GenBank identifier (a GI number), gets the corresponding sequence, runs a BLAST against Arabidopsis proteins and returns the AGI (Arabidopsis Genome Initiative) locus code of the best hit.
Inputs
  string_constant: BioMoby namespace
  gi: GI number
Outputs
  acc: sequence accession numbers
  AGI: AGI locus code
  identity: sequence identity between the query sequence and the best hit
  gi: GI number

Overall workflow description

This workflow takes an exon and a TP53 somatic mutation database as input and retrieves the full TP53 somatic mutation description(s). It first retrieves the unique ids in the TP53 somatic mutation database that are associated with the input, then uses those ids to retrieve the full descriptions.
Inputs
  Tp53_somatic_mutations_database: TP53 somatic mutation database
  exon: exon in the p53 gene, in the range 5-11
  regex_entry_list_separator: regex separator used to move TP53 somatic mutation ids from a text string into a list of strings
  regex_id_separator: regular expression that specifies the format of a TP53 somatic mutation id
  id_position: specifies that the mutation code is the second part of the id (as split by the regular expression in 'regex_id_separator').
Outputs
  ids: TP53 exon ids
  mutations: TP53 somatic mutation descriptions

Services descriptions

Split_ids (Split string into string list by regular expression): splits a given string with a specified regular expression (regex)
Input
  String: string to split
  Regex: regular expression (here: “\n”)
Output
  split: the split string

genesLocations: retrieves the location of a gene on a genome using its identifier [retrieving]
Inputs
  genesIds: gene identifier, e.g. ENSG00000128573 [Ensembl_record_id]
  species: species name, e.g. homo sapiens
  format: format of the gene id list, e.g. plain
Output
  genesLocationsReturn: location of the genes on the chromosome

split_positions (Split string into string list by regular expression): splits a given string with a specified regular expression (regex)
Input
  String: string to split
  Regex: regular expression (here: “\n”)
Output
  split: the split string

getKaryoviewImage: returns a representation of the karyotype of a species, with the features to be located marked on it [displaying]
Inputs
  position: position of the genes on the chromosome
  species: species name, e.g. homo sapiens
  chromosome: chromosome number
Outputs
  getKaryoviewImageReturn: URL and HTML file of the karyotype

Overall workflow description

This workflow retrieves and displays gene positions on a chromosome using Ensembl Karyoview.
Inputs
  ids: list of gene ids, e.g. BRCA2, ENSG00000128573
  species: species name, e.g. homo sapiens
  chromosome: chromosome number
  plain_format: format of the gene id list
Outputs
  HTML_file: HTML file of the URL containing the image
  image: image of the gene positions on the chromosome
  Position: position of the gene on the chromosome

Overall workflow description

This workflow marks and retrieves a pathway diagram for a given KEGG pathway id. It also retrieves the gene information associated with the pathway id.
Input
  pathwayId: KEGG pathway id, e.g.
path:eco00020
Outputs
  Image: pathway image
  Gene_info: gene information

Services descriptions

kegg_getGenesByPathway (get_genes_by_pathway): searches for all genes on a specified pathway [searching]
Input
  pathway_id: KEGG pathway id, e.g. path:bsu00010 [KEGG_record_id]
Output
  return: all gene ids of the specified pathway [KEGG_record_id]

mark_pathway_by_genes: marks given genes on a given pathway map and returns the URL of the generated image
Inputs
  map_id: KEGG pathway id [KEGG_record_id]
  oids: KEGG gene id, e.g. eco00020 [KEGG_genes_id]
Outputs
  return: URL of the generated image

Services descriptions

GetImage (Get image from URL): retrieves the image at a given URL
Input
  url: URL to retrieve the image from
Output
  image: the image

kegg_getEntries (bget): retrieves the KEGG database entries specified by a list of entry ids [retrieving]
Input
  Kids: KEGG database entry id, e.g. eco00020 [KEGG_genes_id]
Output
  return: KEGG gene information [KEGG_record]

Services descriptions

getImageURL: beanshell script that extracts the URL of the karyotype
Input
  tabResult: result of the “getKaryoviewImage” service
Output
  url: URL of the karyotype

getHTMLPage: beanshell script that extracts the HTML page of the karyotype
Input
  tabResult: result of the “getKaryoviewImage” service
Output
  HTMLPage: HTML page of the karyotype

Get_image_from_URL (Get image from URL): retrieves the image at a given URL
Input
  url: URL of the image
Output
  Image: the image at the specified URL

Services descriptions

getgenesbyspecies: retrieves a list of Ensembl genes for a given species, chromosome and position [retrieving]
Inputs
  database: name of the Ensembl database to retrieve the genes from
  chromosome: chromosome number, e.g. 12
  start: start position of the region in the chromosome
  end: end position.
Output
  output: list of Ensembl gene ids in the specified region of a given chromosome [ensembl_record_id]

getcurrentdatabase: retrieves the current databases used by Ensembl for a given species [retrieving]
Input
  species: species name, e.g. homo_sapiens
Output
  output: the current database from Ensembl

Services descriptions

emma: multiple alignment program, an interface to the ClustalW program [aligning]
Input
  sequence_direct_data: nucleotide or protein sequence [biological_sequence]
Output
  outseq: aligned sequence [multiple_sequence_alignment_report]

analyseSimple: executes ClustalW on multiple sequences [aligning]
Input
  query: nucleotide or protein sequence [biological_sequence]
Output
  result: aligned sequences [multiple_sequence_alignment_report]

prettyplot: displays aligned sequences, with colouring and boxing [displaying]
Input
  sequence_direct_data: file containing a sequence alignment [multiple_sequence_alignment_report] [pairwise_sequence_alignment_report]
Output
  Graphics_in_PNG: plot of the aligned sequences

Overall workflow description

This workflow aligns the given sequences and displays the aligned sequences, with colouring and boxing.
Input
  seqs: nucleotide or protein sequences in FASTA format
Outputs
  alignment: sequence alignment result from the “analyseSimple” operation
  single_list: sequence alignment result from the “emma” operation
  pretty_alignment: alignment result with colouring and boxing

Overall workflow description

This workflow retrieves a list of genes, and the current databases used, from Ensembl for a given species, chromosome and positions.
Inputs
  Chromosome: chromosome number, e.g. 12
  Start: start of the region in the chromosome, e.g. 100
  end: end position, e.g. 5000000
  species: species name, e.g.
homo_sapiens
  Outputs
    genes_in_region: Return a list of ENSEMBL genes
    current_database: Return the current database used

Services descriptions
genscan: Determines the most likely gene structure given a genomic DNA sequence [predicting]
  Inputs
    sequence_direct_data: genomic DNA sequence in fasta format [DNA_sequence]
    sequence_url: URL of the genomic DNA sequence in fasta format.
    These 2 input parameters are mutually exclusive
  Output output: Return a gene prediction report [gene_prediction_report]
genscansplitter: Run genscan (for gene prediction) on the given sequence input [predicting]
  Inputs
    Scanrecord_direct_data: genomic DNA sequence in fasta format [DNA_sequence]
    Scanrecord_url: URL of the genomic DNA sequence in fasta format
    These 2 input parameters are mutually exclusive
  Outputs
    Peptide: Return the predicted protein sequence of the predicted gene [protein_sequence]
    Contig: Return the predicted gene sequence [DNA_sequence]
Search simple: Execute BLAST with specified program, database and query [local_aligning]
  Inputs
    program: BLAST type used: blastn, blastp, blastx, tblastn or tblastx
    database: BLAST database, e.g. SWISS, NCBI, EMBL, DDBJ. For all possible databases see appendix
    query: Nucleotide or protein sequence in fasta format [biological_sequence]
  Output Result: Return the result of the BLAST execution [BLAST_report]
patmatmotifs: Search a PROSITE motif database with a given protein sequence [searching]
  Inputs
    sequence_direct_data: protein sequence in fasta format [protein_sequence]
    full: Boolean. Provide full documentation for matching patterns
    prune: Boolean. Ignore simple patterns
  Output outfile: Return possible PROSITE motifs found in the given protein [Prosite_record]

Overall workflow description
This workflow first scans a DNA sequence for gene prediction. Then, using the predicted gene, it performs a BLAST operation and finds motifs within the predicted gene.
  Inputs
    dna: DNA sequence
    Database: BLAST database, e.g.
SWISS, NCBI, EMBL, DDBJ
    program: BLAST type: blastn, blastp, blastx, tblastn or tblastx
  Outputs
    blast_out: BLAST result report
    prosite_matches: result of PROSITE motif search
    Peptides: translated gene
    cds: coding sequence of the predicted gene.
    genscan_report: sequence of predicted gene.
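The input rules stated for the gene prediction services can be checked before a workflow run. The sketch below is illustrative only: the helper names `pick_sequence_input` and `check_blast_program` are hypothetical and belong to none of the services, and it assumes that "mutually exclusive" means exactly one of the two sequence inputs has to be supplied. The list of BLAST types is the one given in the description of Search simple.

```python
from typing import Dict, Optional

# BLAST types listed for the "program" input of Search simple.
BLAST_PROGRAMS = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}


def pick_sequence_input(direct_data: Optional[str], url: Optional[str]) -> Dict[str, str]:
    """Return the single sequence input to send (hypothetical helper).

    Assumption: exactly one of the two mutually exclusive inputs must be set.
    """
    if (direct_data is None) == (url is None):
        raise ValueError("provide exactly one of sequence_direct_data or sequence_url")
    if direct_data is not None:
        return {"sequence_direct_data": direct_data}
    return {"sequence_url": url}


def check_blast_program(program: str) -> str:
    """Reject a BLAST type that is not in the documented list (hypothetical helper)."""
    if program not in BLAST_PROGRAMS:
        raise ValueError("unknown BLAST type: " + program)
    return program
```

Calling `pick_sequence_input(">seq1\nACGT", None)` returns a dictionary holding only `sequence_direct_data`, while passing both arguments or neither raises a `ValueError`.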
Overall workflow description
This workflow fetches sequences using the seqret tool. The sequences are then subjected to a multiple alignment using emma and simultaneously scanned for predicted transmembrane regions. The alignment is then plotted to a set of PNG images and also used to build a profile using the prophecy and prophet tools.
  Inputs
    Sequenceid: sequence identifiers
    msFormat: sequence format
    prophecyType: prophecy type
    prophecyName: single word for sequence name
    transeqSequenceID: nucleotide sequence id
    sbegin: start position of the translation process
    send: end position of the translation process
  Outputs
    prophetOutput: Return aligned sequences
    outputPlot: Return alignment result with colouring and boxing
    tmapPlot: Displays membrane spanning regions

Services descriptions
seqret1 (seqret): Reads and returns sequences [retrieving]
  Input Sequence_usa: identifier or GI number of the input sequence
  Output outseq: Return sequence [biological_sequence]
emma: Multiple alignment program, interface to ClustalW program [aligning]
  Input sequence_direct_data: nucleotide or protein sequence [biological_sequence]
  Output outseq: Return aligned sequence [multiple_sequence_alignment_report]
formatSequences (seqret): Reads and returns sequences [retrieving]
  Inputs
    Sequence_direct_data: nucleotide or protein sequence [biological_sequence]
    osformat: output sequence format. For possible values see appendix.
  Output outseq: Return sequence in specified format.
[biological_sequence]
plot (prettyplot): Displays aligned sequences, with colouring and boxing [displaying]
  Input sequence_direct_data: File containing a sequence alignment [multiple_sequence_alignment_report] [pairwise_sequence_alignment_report]
  Output Graphics_in_PNG: result of prettyplot execution

Services descriptions
Prophecy: Creates matrices/profiles from multiple alignments
  Inputs
    sequence_direct_data: alignment report file [multiple_sequence_alignment_report]
    type: The allowed values for this parameter are: F, G, H
    name: Single word without spaces to identify the sequence
  Output outfile: Return matrix profile
transeq: Translate nucleic acid sequences into protein [translating]
  Inputs
    sequence_usa: nucleotide sequence id [EMBL_id]
    sbegin: start position to be used in the sequence
    send: end position to be used in the sequence
  Output outseq: Return protein sequence [protein_sequence]
tmap: Displays membrane spanning regions [displaying]
  Input sequence_direct_data: sequence in specified format [biological_sequence]
  Output graphics_in_PNG: display a graph of the result
Prophet: Return gapped alignment for profiles [gapped_aligning]
  Inputs
    sequence_direct_data: sequence data [biological_sequence]
    infile_direct_data: Profile or weight matrix file
  Output outseq: Return gap alignment report [multiple_sequence_alignment_report]
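To make the role of the sbegin and send inputs of transeq concrete, here is a minimal local sketch of translating a sub-region of a nucleotide sequence. It does not call the transeq service and is not its implementation. The function name `translate_region` is hypothetical, and the sketch assumes 1-based inclusive positions, the standard genetic code, forward frame 1, and that trailing bases which do not fill a codon are dropped.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_region(sequence, sbegin=1, send=None):
    """Translate sequence[sbegin..send] (assumed 1-based, inclusive) into protein.

    Hypothetical helper for illustration; unknown codons become 'X'
    and stop codons become '*'.
    """
    seq = sequence.upper().replace("U", "T")
    if send is None:
        send = len(seq)
    if not 1 <= sbegin <= send <= len(seq):
        raise ValueError("require 1 <= sbegin <= send <= len(sequence)")
    region = seq[sbegin - 1:send]
    usable = len(region) - len(region) % 3  # ignore an incomplete final codon
    return "".join(
        CODON_TABLE.get(region[i:i + 3], "X") for i in range(0, usable, 3)
    )
```

For example, `translate_region("CCATGGCCTAA", sbegin=3)` skips the first two bases and translates `ATG GCC TAA`, giving `MA*`.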