CS177 homework
• assigned March 2
• this can be either a group or an individual assignment, whichever is easier for you
• this will cover multiple alignment, HTML, and Java
• this is a lot of stuff, so it is due in 2 weeks
• the balance should start shifting from assigned homework to doing your projects
“vertical” multiple alignment
• pick a gene, find its human mRNA RefSeq (i.e., an NM_XXXX accession), and query the NCBI nucleotide database with BLAST
• see if you get hits to 5 or 6 different species
• if not, try another gene
• pick out one good hit for each species (i.e., a low E value, BLAST's measure of how likely a hit is to occur by chance, and a fairly long alignment), including the original human RefSeq
• submit these sequences to ClustalW
• visually identify one column in the alignment that exemplifies high conservation, one that shows moderate conservation, and one that shows poor conservation
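Picking the columns by eye is the point of the exercise, but it can help to see what "conserved" means numerically. The sketch below is a hypothetical helper (the class and method names are made up, and the 1.0 / 0.5 cut-offs are arbitrary assumptions) that scores each column of aligned rows by the fraction of sequences sharing the most common base, treating "-" as a gap. ClustalW itself already marks fully identical columns with "*" under the alignment; this just makes the moderate and poor cases explicit. It assumes Java 9 or later for List.of.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnConservation {

    // Fraction of rows carrying the most frequent non-gap symbol in column col.
    // Gap characters ('-') are never counted, so a gapped row lowers the score.
    static double columnScore(List<String> aligned, int col) {
        Map<Character, Integer> counts = new HashMap<>();
        for (String row : aligned) {
            char c = Character.toUpperCase(row.charAt(col));
            if (c != '-') {
                counts.merge(c, 1, Integer::sum);
            }
        }
        int best = 0;
        for (int n : counts.values()) {
            best = Math.max(best, n);
        }
        return (double) best / aligned.size();
    }

    // Illustrative cut-offs only (an assumption, not ClustalW's own rules).
    static String classify(double score) {
        if (score >= 1.0) return "highly conserved";
        if (score >= 0.5) return "moderately conserved";
        return "poorly conserved";
    }

    public static void main(String[] args) {
        // Toy rows invented for illustration, not real BLAST hits.
        // Every row must have the same length, as in ClustalW output.
        List<String> aligned = List.of(
            "ATGGCTTAC",
            "ATGGCATAC",
            "ATGACT-AG",
            "ATGGCGTTC");
        for (int col = 0; col < aligned.get(0).length(); col++) {
            double s = columnScore(aligned, col);
            System.out.printf("column %d: %.2f %s%n", col + 1, s, classify(s));
        }
    }
}
```

To try it on your own alignment, paste the ClustalW rows (without the species names, concatenated across blocks) into the list in place of the toy rows.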
“horizontal” multiple alignment
• go to the Pfam site
• http://www.sanger.ac.uk/Software/Pfam/
• look it over until you are completely confused
• here is my example; run through it, then do your own example
• enter “fibrin” in the “keywords” box
• click on “kringle”
• click on “view species tree”
• click the box next to Homo sapiens, then click “view selected species alignment”
• meditate on what you are seeing
• these are amino acids, not nucleotides
• the alignment uses the one-letter symbols for amino acids
• can you make any observations about the multiple alignments?
• try it in a second browser window for gorilla and compare with human
• ditto for mouse
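Two things that help when reading the Pfam rows: decoding the one-letter amino acid symbols, and putting a number on "compare with human". The sketch below is a hypothetical helper (names are made up) that expands a row into three-letter codes and computes percent identity between two aligned rows, such as a human and a gorilla kringle domain. It treats both "-" and "." as gaps, since alignment viewers commonly draw gaps either way, and it counts identity only over columns where neither row has a gap; that is one common convention among several, not the only definition. It assumes Java 9 or later for Map.ofEntries.

```java
import java.util.Map;

public class ProteinRows {

    // One-letter to three-letter codes for the 20 standard amino acids.
    static final Map<Character, String> THREE_LETTER = Map.ofEntries(
        Map.entry('A', "Ala"), Map.entry('R', "Arg"), Map.entry('N', "Asn"),
        Map.entry('D', "Asp"), Map.entry('C', "Cys"), Map.entry('Q', "Gln"),
        Map.entry('E', "Glu"), Map.entry('G', "Gly"), Map.entry('H', "His"),
        Map.entry('I', "Ile"), Map.entry('L', "Leu"), Map.entry('K', "Lys"),
        Map.entry('M', "Met"), Map.entry('F', "Phe"), Map.entry('P', "Pro"),
        Map.entry('S', "Ser"), Map.entry('T', "Thr"), Map.entry('W', "Trp"),
        Map.entry('Y', "Tyr"), Map.entry('V', "Val"));

    // Gap symbols assumed here: '-' and '.'.
    static boolean isGap(char c) {
        return c == '-' || c == '.';
    }

    // Expand a row such as "CYHG" to "Cys Tyr His Gly".
    // Gaps become "---"; anything not in the table becomes "???".
    static String expand(String row) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length(); i++) {
            char c = Character.toUpperCase(row.charAt(i));
            if (i > 0) sb.append(' ');
            sb.append(isGap(c) ? "---" : THREE_LETTER.getOrDefault(c, "???"));
        }
        return sb.toString();
    }

    // Percent identity over columns where neither row has a gap.
    static double percentIdentity(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        int compared = 0;
        int identical = 0;
        for (int i = 0; i < a.length(); i++) {
            char x = Character.toUpperCase(a.charAt(i));
            char y = Character.toUpperCase(b.charAt(i));
            if (isGap(x) || isGap(y)) continue;
            compared++;
            if (x == y) identical++;
        }
        return compared == 0 ? 0.0 : 100.0 * identical / compared;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration; paste real rows from the Pfam alignment instead.
        String human = "CYHGDGQSYRG";
        String gorilla = "CYHGNGQ.YRG";
        System.out.println(expand("CYHG"));
        System.out.printf("identity: %.1f%%%n", percentIdentity(human, gorilla));
    }
}
```

With the made-up rows, 10 columns are gap-free and 9 of them match, so the identity is 90%.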
java
• take a look at the NCBI_STRUCTURES.java program
• go to the web site from the last homework
• http://java.sun.com/j2se/1.3/docs/api/index.html
• see if you can find something on that site that helps you make sense of one or two things in the NCBI_STRUCTURES.java program
• hint: look at the URLConnection class
• just spend 30 or 40 minutes on this. Don’t get too frustrated now; you will have plenty of time for that once you get a real job
• think of this as a growth experience that builds character
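For the URLConnection hint, here is a minimal sketch of the usual pattern: build a URL, call openConnection(), and read the response through getInputStream(). The class name FetchPage is hypothetical, and this is not a copy of NCBI_STRUCTURES.java; it only shows the standard java.net calls a program like that typically uses. It is written in current Java (URI.toURL, try-with-resources), so it looks more modern than the 1.3 documentation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

public class FetchPage {

    // Open a connection to the given address and return the response body as text.
    static String fetch(String address) throws IOException {
        URL url = URI.create(address).toURL();      // parse the address into a URL object
        URLConnection conn = url.openConnection();  // creates the connection object; nothing is sent yet
        conn.setConnectTimeout(10_000);             // milliseconds
        conn.setReadTimeout(10_000);
        StringBuilder page = new StringBuilder();
        // getInputStream() connects (if needed) and returns the response stream.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FetchPage URL");
            return;
        }
        System.out.print(fetch(args[0]));
    }
}
```

Because URLConnection also handles file: URLs, you can test it offline against a saved HTML page before pointing it at a live site.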
java and html (we will do this in class next week, but you will also have to do it on your own)
• the NCBI_STRUCTURES.java program can be used as a prototype for this part
• go to the NCBI web site
• view the HTML source code underlying the web page
• locate the form action and its POST method
• look at what happens between the <form> and </form> tags
• copy the HTML source into a file, change POST to GET, and save it as NCBI.html
• open another browser window and read in NCBI.html using the File menu
• perform a query of your choice
• this will not actually work, since the server expects POST, but notice what GET exposes in the URL window: the form field names and the values you typed
• copy the HTML source into a file, change the NCBI URL into the URL for my CGI program testloop.cgi, and save it as NCBIecho.html
• repeat the last 3 steps, and see if the echo is the same as the GET
• modify NCBI_STRUCTURES.java
• make it send out what you need for your query
• modify the part that does the parsing (i.e., the line with <dd>) so that it is relevant for parsing your output
• hint: figure out the modification by looking at the HTML source of the real output from a real query at the real NCBI site
• if you cannot figure out how to modify the parsing, then at least comment it out entirely, or you will not see any output!!
• remember that the java program is run as:
  java NCBI_STRUCTURES inputfilename
• inputfilename is the name of the input file that has 4 or 5 gene names to test out
• remember that you first need to run javac NCBI_STRUCTURES.java to get NCBI_STRUCTURES.class
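The two modifications can be sketched apart from the networking. In this hypothetical QuerySketch class, BASE_URL and the "term" parameter are placeholders, not NCBI's real form fields: substitute the action URL and name=value pairs you saw in the URL window during the GET experiment. buildQueryUrl() shows how a GET query string is formed (the value is URL-encoded, so spaces become "+"), and extract() shows one way to keep only lines containing a marker such as <dd> and strip their tags; the right marker for your query comes from the real output HTML. It follows the same calling convention as the assignment (an input file with one gene name per line) and requires Java 11 or later.

```java
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class QuerySketch {

    // Placeholder: replace with the form action URL exposed by your GET experiment.
    static final String BASE_URL = "https://www.example.org/query";

    // Build a GET URL: base + "?" + name=value, with the value URL-encoded.
    // "term" is a placeholder field name, not necessarily the real one.
    static String buildQueryUrl(String gene) {
        return BASE_URL + "?term=" + URLEncoder.encode(gene, StandardCharsets.UTF_8);
    }

    // Keep only lines containing the marker (e.g. "<dd>"), with HTML tags removed.
    static List<String> extract(String html, String marker) {
        List<String> hits = new ArrayList<>();
        for (String line : html.split("\n")) {
            if (line.contains(marker)) {
                hits.add(line.replaceAll("<[^>]*>", "").trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java QuerySketch inputfilename");
            return;
        }
        // One gene name per line in the input file.
        List<String> genes = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (String gene : genes) {
            if (gene.isBlank()) continue;
            // The real program would fetch this URL (e.g. with URLConnection)
            // and pass the response to extract(...).
            System.out.println(buildQueryUrl(gene.trim()));
        }
    }
}
```

If extract() finds nothing, the marker probably does not match the real output HTML, which is the same situation as the "comment it out or you will not see any output" warning above.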