210 likes | 225 Views
ACT is a tool designed to extract additional information by comparing sequences from closely related organisms. It allows for the visualization of conserved regions, insertions, and deletions in DNA sequences.
E N D
Module 3 Comparative Genomics Introduction The Artemis Comparison Tool (ACT), also written by Kim Rutherford, was designed to extract the additional information that can only be gained by comparing the growing number of sequences from closely related organisms (Carver et al. 2005). ACT is based on Artemis, and so you will already be familiar with many of its core functions, and is essentially composed of three layers or windows. The top and bottom layers are mini Artemis windows (with their inherited functionality), showing the linear representations of the DNA sequences with their associated features. The middle window shows red and blue blocks, which span this middle layer and link conserved regions within the two sequences, in the forward and reverse orientation respectively. Consequently, if you were comparing two identical sequences in the same orientation you would see a solid red block extending over the length of the two sequences in this middle layer. If one of the sequences was reversed, and therefore present in the opposite orientation, there would be a blue ‘hour glass’ shape linking the two sequences. Unique regions in either of the sequences, such as insertions or deletions, would show up as breaks (white spaces) between the solid red or blue blocks. In order to use ACT to investigate your own sequences of interest you will have to generate your own pairwise comparison files. Data used to draw the red or blue blocks that link conserved regions is generated by running pairwise BLASTN or TBLASTX comparisons of the sequences. ACT is written so that it will read the output of several different comparison file formats; these are outlined in Appendix II. Two of the formats can be generated using BLAST software freely downloadable from the NCBI, which can be loaded and run a PC or Mac. Appendix X shows you how to generate comparison files from BLAST. Whilst having a local copy BLAST to generate ACT comparison files can be very useful, it means that you are tied to a particular computer. Another way of generating comparison files for ACT is to use the WebACT web resource (see page 27). This site allows you to cut and paste or upload your own sequences, and generate ACT readable BLASTN or TBLASTX comparison files. Aims The aim of this Module is for you to become familiar with the basic functions of ACT by using a series of worked examples. Some of these examples will touch on exercises that were used in previous Modules, this is intentional. Hopefully, as well as introducing you to the basics of ACT this Module will also show you how ACT can be used for not only looking at genome evolution but also to backup, or question, gene models and so on. In this module you will also use a web resource, WebACT to generate your own comparison files and view them in ACT.
For comparing more than two genomes! Comparison files end with ‘.crunch’. 1. Starting up the ACT software Make sure you’re in the Module_3_Comparative_Genomics directory. Then click on the ACT icon. The files you will need for this exercise are: Lmjchr11.fasta Lmjchr11fastaLin.chr21.fasta.crunch Lin.chr11.fasta 1 Click ‘File’ then ‘Open 2 Use the File manager to drag and drop files or see 4 3 4, 5 & 6 Click and select appropriate files Click ‘Apply’ and wait…… 6
Drop-down menus. These are mostly the same as in Artemis. The major difference you’ll find is that after clicking on a menu header you will then need to select a DNA sequence before going to the full drop-down menu. • This is the Sequence view panel for ‘Lmjchr11.fasta’ (Subject Sequence) you selected earlier. It’s a slightly compressed version of the Artemis main view panel. The panel retains the sliders for scrolling along the genome and for zooming in and out. • The Comparison View. This panel displays the regions of similarity between two sequences. Red blocks link similar regions of DNA with the intensity of red colour directly proportional to the level of similarity. Double clicking on a red block will centralise it. Blue blocks link regions that are inverted with respect to each other. • Artemis-style Sequence View panel for ‘Lin.chr11.fasta’ (Query Sequence). • Right button click in the Comparison View panel brings up this important ACT-specific menu which we will use later. 2. The basics of ACT You should now have a window like this so lets see what’s there. 1 2 5 3 4
Right button click here De-select stop codons Use the vertical sliders to zoom out. Drag or click the slider downwards from one of the genomes. The other genome will stay in synch. 1 2 3. Exercise 1 Introduction & Aims In this first exercise we are going to explore the basic features of ACT. Using the ACT session you have just opened we firstly are going to zoom outwards until we can see the entire Leishmania major chromosome compared against the Leishmania infantum chromosome. As for the Artemis exercises we should turn off the stop codons to clear the view and speed up the process of zooming out. The only difference between ACT and Artemis when applying changes to the sequence views is that in ACT you must click the right mouse button over the specific sequence that you wish to change, as shown above. Now turn the stop codons off in the other sequence too. Your ACT window should look something like the one below:
Once zoomed out your ACT window should look similar to the one shown above. If the genomes in view fall out of view to the right of the screen, use the horizontal sliders to scroll the image and bring the whole sequence into view, as shown below. You may have to play around with the level of zoom to get the whole genomes shown in the same screen as shown above.
You can optimise your image by either removing ‘low scoring’ (or percentage ID) hits from view, as shown below 1-3 or by using the slider on the the comparison view panel (4). The slider allows you to filter the regions of similarity based on the length of sequence over which the similarity occurs, sometimes described as the “footprint”. In the example shown below, the minimum alignment size has been increased to 166. 1. Right button click in the Comparison View panel LOCKED 2. Select either Set Score Cutoffs or Set Percent ID Cutoffs Move the sliders to manipulate the comparison view image Notice that when you scroll along with either slider both genomes move together. This is because they are ‘locked’ together. Right click over the middle comparison view panel. A small menu will appear, select Unlock sequences and then scroll one of the horizontal sliders. Notice that ‘LOCKED’ has disappeared from the comparison view panel and the genomes will now move independently 1 4 3
Once you have finished this exercise remember to close this ACT session down completely before starting the next exercise • 4. Things to try out in ACT • Double click red boxes to move the alignment to this position. • Zoom right in to view the base pairs and amino acids of each sequence. • Load annotation files into the sequence view panels - try LmjF11.embl and Lin.chr11.embl • Use some of the other Artemis features eg. graphs etc. • Navigate to an ID name, base number, etc on either sequence using the Artemis navigator. • Find an inversion in one genome relative to the other then flip one of the sequences. To reverse and complement a sequence right click the sequence window, then select either flip query sequence or flip subject sequence. This can also be accomplished by clicking on the sequence window and then Edit>Bases>Reverse And Complement • Find a region containing duplicated genes in one genome.
Exercise 2 Comparison of Trypanosoma brucei chromosome 1 to Leishmania major chromosomes 12 and 20. Introduction The structural annotation and analysis of the whole genome of Leishmania major and Trypanosoma brucei is mostly completed. This allows us to perform comparative analysis of the genomes of trypanosomatid parasites and understand the basic biology of their parasitism, based on the similarities / dissimilarities between the parasites at DNA / protein level. Aim You will be performing a three-way comparison between chromosome 1 of Trypanosoma brucei, and it’s homologous chromosomes - 12 and 20 in Leishmania major. This exercise will provide some insight into the difference in chromosome organization and gene content between these too genera with some reference to differences in their biology. The files that you are going to need are: Tb927_01_v4.embl - T. brucei sequence and annotation file Lmjchr12.embl - L. major sequence and annotation file Lmjchr20.embl - L. major sequence and annotation file Lmjchr12.fasta-txTb927_01_v4.fasta -Comparison file1 Tb927_01_v4.fasta-txLmjchr20.fasta.crunch.gz -Comparison file 2 Leishmania major Chromosome 12 TBLASTX comparison Trypanosoma brucei Chromosome 1 TBLASTX comparison Leishmania major Chromosome 20
Lmjchr20.embl Lmjchr12.embl Lmjchr12.fastaTb927_01_v4.fasta.crunch.gz Tb927_01_v4.embl Tb927_01_v4.fasta-txLmjchr20.fasta.crunch.gz After a few seconds a window like that on page 8 should appear. Use some of the techniques previously described to answer the questions on the next page. Drag and drop Files from File manager Add more than two sequences
Exercise 2 Things to try: 1. Adjust the alignment size and/or score cut-off and/or percent ID cutoff to filter out undesired blast hits. Find a break in synteny between T. brucei chromosome 1 and either of the Leishmania chromosomes. Find a gene deletion/insertion event in T. brucei relative to either Leishmania chromosome. Find an inversion in T. brucei chromosome 1 relative to either Leishmania chromosome. Find a gene that has been duplicated in T. brucei relative to Leishmania. Questions to Address using this data: What is the relationship between these three chromosomes? There are tblastx hits along the entire length of both Leishmania chromosomes when compared to T. brucei before any adjustment is made to alignment size or score cut-off. What do these hits represent? Are there large chromosomal regions unique to T. brucei? What is the significance of these regions?
Exercise 3 Prediction of gene models: In the previous Artemis exercise we annotated gene models on chromosome 12 of L. major based on codon usage and NCBI blast alignments. This exercise demonstrates how ACT can complement structural annotation by comparing a newly generated sequence to an existing, annotated orthologous chromosome. Here we will be comparing the EMBL file generated for Leishmaniamajor chromosome 12 to a fully annotated EMBL file of Leishmania infantum. The first part of this exercise will involve generating ‘crunch’ files containing the blast results for comparison in an ACT window using the on-line resource Webact at the URL shown below www.webact.org Select Generate ACT will accept tabular (-m 8) formatted BLAST results between two sequences. These can be generated by BLAST searching a local database, or though on-line tools like webACT.
Select Lmjchr12.fasta from the file menu Select Lin.chr12.fasta from the file menu Select BlastN Click submit when done. After several seconds you will have the option to download the output files. The most important is the crunch file - ‘blast_out1’
Lmj12.fasta Lin.12.fasta blast_out1 • Exercise 3 Part • Using ACT as an Annotation tool: • 1. Load fasta files for L. major and L. infantum chromosome 12 • 2. Load the ‘crunch file’ generated by WebACT - default name is blast_out1 • 3. Read in EMBL files for both sequences, use the EMBL file generated from Module 1 for L. major (NOTE! There is another file called Lmj12.embl in this directory corresponding to the fully annotated database version of this chromosome. For this exercise use the LmjF12new.embl file generated in Module 1) Select the sequence to read an entry into Read Lmj12new.embl into Lmj12.fasta Read Lin.chr12.embl into Lin.chr12.fasta
EMBL files can be annotated directly using ACT through a combination of positional orthology and sequence similarity. This exercise is meant to demonstrate how we can use sequence alignment information in combination with techniques described in module 1 to add new gene models. Load codon usage plots for both sequences. Missing L. major orthologues? Load codon usage tables for L. major by selecting the graph menu, then Lmjchr12.fasta>Add User Plot>Lmajorcodon, the same can be done for Lin.chr12. You may have to resize the window to see both codon plots in the same window. You might also try right clicking on the sequence and de-selecting ‘Revere Frame Lines’ to display only sequence in the forward frame. In the example above it is immediately obvious that a gene model is missing in the partially annotated Leishmania major chromosome 12. Further downstream of this we can see several large ORFs aligned against annotated gene models in the L. infantum chromosome.
1. Double click on ORF with mouse scroll button. 2. Select ‘create feature from base range’. Select the new gene model. By moving the zoom scroll bar on one of the sequence panels it is possible to compare two sequences at the sequence, or peptide level. In some cases it may be necessary to unlock the sequence to move it independently of the other. 3. Zoom in on new gene model using scroll bar. 4. Use the trim shortcut <ctrl-t> to trim gene model to same Met as L. infantum. In ACT, keyboard shortcuts using <ctrl> work on the query sequence only. To use the same shortcut for a subject sequence use <Alt>.
Exercise 2 Part IV • ACT Annotation of Leishmania major gene models • Using the techniques described, look for disagreements between the annotated L. infantum chromosome 12 and your annotations on the L. major chromosome 12 from module 1. • Try adding missing gene models to L. major, informed by the ACT alignment and codon usage. • Look for examples of the following between these two chromosomes: • Repeated genes/regions • Chromosomal rearrangements, translocations, etc. Example location: 389400..422400, in Lin.chr12.embl.
Exercise 4 Aim Using ACT, you will be viewing the apicoplast genomes from five apicomplexan parasites including P. falciparum and do basic comparitive analysis of the sequences. You will identify key features and spot differences between the apicoplast genomes of all these parasites. Out of the five apicoplast genomes, the Babesia bigemina apicoplast genome has been recently sequenced at the PSU (WTSI) and is unpublished. It contains ‘first pass’ manual annotation at present. The other four apicoplast genomes are all published previously and the sequences have been retrieved from the EMBL database. The file containing P. falciparum apicoplast genome (PF_apicoplast_genome.embl) was created by joining two separate entries (EMBL accession numbers X95275 & X95276) in the EMBL database. The files that you are going to need are: 1) BG_apicoplast_genome.embl- sequence & annotated file for Babesia bigemina 2) TP_apicoplast_genome.embl- sequence & annotation file for Theileria parva 3) PF_apicoplast_genome.embl- sequence & annotation file for Plasmodium falciparum 4) TG_apicoplast_genome.embl- sequence & gene model file for Toxoplasma gondii 5) ET_apicoplast_genome.embl- sequence & gene model file for Eimeria tenella 6) TP_BG_comparison - tblastx comparison file of T. parva & B. bigemina 7) TP_PF_comparison - tblastx comparison file of T. parva & P. falciparum 8) PF_TG_comparison - tblastx comparison file of P. falciparum & T. gondii 9) TG_ET_comparison - tblastx comparison file of T. gondii & E. tenella. First, open an ACT window and then open the annotation and the appropriate comparison files in the order of 1 – 6 – 2 – 7 – 3 – 8 – 4 – 9 – 5 (the numbers are designated above). You will need to click on ‘more files’ to upload more than 2 sequences and the comparison flies. Click on ‘apply’ after you have uploaded all the files. 17
Upload the files in sequential order as described in the previous page Click on here to load more files and select the appropriate file. Remember you can also use the file manager to drag in the files Click on here to read all the files that you have selected. Click on ‘no’ if any small dialogue window appears while reading / opening the files.
Can you see any conserved gene order (in the apicoplast genome) between the B. bigemina & T. parva or between P. falciparum, T. gondii and E. tenella ? Can you obtain a clearer picture of the ACT 5-way comparison figure by filtering out the low scoring segments, using the blast score cut off feature? Zoom in and look at some of the genes encoded within theses regions. View the details by clicking on the feature, and then select `Edit selected feature’ from the ‘Edit’ menu after selecting the appropriate CDS feature. You may need to ‘lock’ or ‘unlock’ sequences to get all of them visible within the same window. You may also need to ‘flip’ one or more sequences for the comparative analysis. Use the right click on your mouse and select score cutoff window to appear. Scroll along the bar to screen out low scoring hits
B. bigemina (BG) T. parva (TP) P. falciparum (PF) T. gondii (TG) E. tenella (ET) After filtering out the low-scoring blast matches, you should be able to see a figure like the image below.
Can you fill in the table, based on your analysis? You may need to use some of the features of Artemis that you have learned in earlier Artemis modules to extract some of the numbers. Can you make a schematic diagram showing major features of P. falciparum apicoplast genome? Can you show how P. falciparum apicoplast genome compares with the apicoplast genomes of other four related parasites?