Utilizing GC Content to Differentiate Phytophthora from Tomato Sequences

Calculate the GC content of each sequence in the Phytophthora-tomato interactome with a Perl script, then use R to summarize the values and draw a histogram of them. Each step is given as a command to run.

pmarvin

Presentation Transcript


  1. Using GC content to distinguish Phytophthora sequences from tomato sequences

  2. Mission #1: Calculate the GC content of each sequence in the Phytophthora-tomato interactome. We will use a Perl script to accomplish the mission.
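The gc.pl script is not reproduced in this transcript. As a rough illustration of the kind of calculation involved, here is a minimal shell sketch. It assumes the input is a FASTA file (header lines starting with `>`) and that GC content means the percentage of G and C among the A, C, G, and T bases of each record. The helper name `gc_per_sequence` and the demo file are made up for illustration, and the real script may differ in rounding or output format.

```shell
# Hypothetical helper (not the class's gc.pl): print "name<TAB>GC%" for each
# record of the FASTA file given as the first argument.
gc_per_sequence() {
  awk '
    function report() {
      if (name != "" && len > 0) printf "%s\t%.1f\n", name, 100 * gc / len
    }
    /^>/ { report(); name = substr($0, 2); gc = 0; len = 0; next }
    {
      seq = toupper($0)
      gsub(/[^ACGT]/, "", seq)      # assumption: ignore N, gaps, and whitespace
      len += length(seq)
      gc  += gsub(/[GC]/, "", seq)  # gsub returns the number of replacements
    }
    END { report() }
  ' "$1"
}

# Tiny made-up FASTA file, for demonstration only
demo_fasta=$(mktemp)
printf '>seq1\nGGCC\nAATT\n>seq2\nGCGC\n' > "$demo_fasta"
gc_per_sequence "$demo_fasta"   # prints seq1<TAB>50.0 and seq2<TAB>100.0
```

A sequence split over several lines is accumulated until the next header line, so line wrapping in the FASTA file does not change the result.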

  3. Preparation • Download the Perl script (gc.pl) from the class web site and store it in the C:/BioDownload folder

  4. Running the script • Open cygwin, Command Prompt (Vista users), or Terminal (Mac users) • Change directory (cd) to the BioDownload folder • Run the script, separating each part with a single space: perl gc.pl PhytophSeq1.txt phytoph_gc.out

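  5. Results • In cygwin (Windows users) or Terminal (Mac users), run the following two commands, separating each part with a single space • grep --perl-regexp "\t" -c phytoph_gc.out (counts the lines of the output file that contain a tab) • grep ">" -c PhytophSeq1.txt (counts the lines of the input file that contain ">", one per sequence) • You should get the same number from the two commands. The number should be 3921.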
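The comparison of the two counts can also be scripted. The following is only a sketch under assumptions: the helper name `check_counts` and the demo files are hypothetical, header lines are matched with `^>` (at the start of a line, slightly stricter than the plain `>` used on the slide), and a row of the output file is counted when it has at least two tab-separated fields.

```shell
# Hypothetical helper: compare the number of FASTA records ($1) with the
# number of tab-separated rows in the GC table ($2).
check_counts() {
  n_seqs=$(grep -c '^>' "$1")
  n_rows=$(awk -F '\t' 'NF >= 2 { n++ } END { print n + 0 }' "$2")
  echo "sequences: $n_seqs, GC rows: $n_rows"
  [ "$n_seqs" -eq "$n_rows" ]   # exit status is 0 only when the counts match
}

# Made-up demo files, for illustration only
demo_dir=$(mktemp -d)
printf '>seq1\nGGCCAATT\n>seq2\nGCGC\n' > "$demo_dir/in.fa"
printf 'seq1\t50.0\nseq2\t100.0\n' > "$demo_dir/gc.out"
check_counts "$demo_dir/in.fa" "$demo_dir/gc.out" && echo "counts match"
```

With the class files, the call would take PhytophSeq1.txt and your own output file name as the two arguments.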

  6. The output file • Name column • GC content column

  7. Mission #2: Build a histogram of the GC content values. We will use the R program to accomplish this mission.

  8. http://www.r-project.org

  9. Mac users

  10. All Windows users

  11. XP users Vista users

  12. getwd() to know which folder you are in now

  13. setwd("c:/BioDownload") to change the working directory to C:/BioDownload • setwd("/path/to/biodownload") for Mac users

  14. data<-read.table("phytoph_gc.out",sep="\t",header=FALSE) to read in the data from the file phytoph_gc.out (your file name may be different)

  15. data[1:10,] to see the first 10 rows of the data frame “data”

  16. gc<-data[,2] to assign the values from the 2nd column of “data” to a new vector “gc”

  17. summary(gc) to get the summary of the values in the vector “gc”

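  18. hist(gc,breaks=58) to draw a histogram of the values in the “gc” vector • breaks indicates how many cells (bins) you want for the histogram. It was calculated as 78.7 (max) - 21.2 (min) = 57.5, rounded to 58, so each bin is about 1 GC unit wide. R treats a single number given for breaks as a suggestion and may adjust the breakpoints to round values.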
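The arithmetic behind breaks=58 can be checked from the command line as well. This is a minimal sketch, assuming the output file has the name in column 1 and the GC content in column 2, separated by a tab. The helper name `gc_range` is hypothetical; the demo rows reuse the minimum and maximum quoted on the slide, and the middle row is made up.

```shell
# Hypothetical helper: report the minimum, maximum, and rounded range of the
# second (GC content) column of a tab-separated file.
gc_range() {
  awk -F '\t' '
    { v = $2 + 0 }              # force a numeric value
    NR == 1 { min = max = v }
    v < min { min = v }
    v > max { max = v }
    END { printf "min=%.1f max=%.1f breaks=%d\n", min, max, int(max - min + 0.5) }
  ' "$1"
}

# Made-up demo table using the extremes quoted on the slide
demo_table=$(mktemp)
printf 'a\t21.2\nb\t50.0\nc\t78.7\n' > "$demo_table"
gc_range "$demo_table"   # prints min=21.2 max=78.7 breaks=58
```

The range 57.5 is rounded to the nearest whole number here; rounding up instead gives the same 58 for these values.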

  19. hist(gc,breaks=58,xlab="GC content",ylim=range(c(0,400)),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome") to make the histogram look better

  20. pdf("gc_histogram.pdf"); hist(gc,breaks=58,xlab="GC content",ylim=range(c(0,400)),main="Histogram of GC content of sequences\nin Phytophthora-tomato interactome"); dev.off() to output the histogram to a PDF file

  21. location file
