1 / 1

Example of a Hash Table (Ning, 2001)

Rat Cow Mosquito Armadillo Honeybee Worm Pufferfish Chicken Frog. Dog Human Mouse Zebrafish Chimpanzee Macaque Opossum Elephant Fly. SSAHA - A Fast Search Method Meg Eckman, Kaitlyn McPartland, Nnenna Opara, Marni Siegel Genome Revolution Focus, Duke University, 2007.

teague
Download Presentation

Example of a Hash Table (Ning, 2001)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Rat • Cow • Mosquito • Armadillo • Honeybee • Worm • Pufferfish • Chicken • Frog • Dog • Human • Mouse • Zebrafish • Chimpanzee • Macaque • Opossum • Elephant • Fly SSAHA - A Fast Search Method Meg Eckman, Kaitlyn McPartland, Nnenna Opara, Marni Siegel Genome Revolution Focus, Duke University, 2007 Introduction Genomes Available for Comparison Using SSAHA Online at http://www.sanger.ac.uk/DataSearch/blast.shtml DNA Comparison APT Applications Problem Statement In trying to compare two different genomes of the same species, bioinformatic scientists use SSAHA to rapidly find the similarity between strands of DNA. One application of the SSAHA algorithm and tool is to find SNPs: single nucleotide polymorphisms. In this problem you'll return the location of the first SNP in two strands of the same length, or zero if the strands are the same and don't contain a SNP. The first base-pair in a strand has index 1 in this problem, not index zero. As in the real SSAHA code, any character other than 'g', 't', 'c', or 'a' should be treated as an 'a' for the purposes of matching nucleotides. Definition Class: DNACompare Method: snpFind Parameter: String dna1, String dna2 Returns: int Method signature: int snpFind (String dna, String n); (be sure your method is public) Class public class DNAComparison { public int snpFind (String dna, String n) { // fill in code here } } Constraints dna1 and dna2 will have a maximum of fifty characters and the same character length. The characters of DNA will be ‘a’, ‘g’, ‘t’, ‘x’ or ‘c’. The character ‘x’ should be treated as an ‘a’. Examples 1. dna1= aaggttcc dna2= aagtttcc Returns: “4” because the first place the strands disagree is the g and t. 2. dna1 = aaaaaaaaaaaa dna2 = aaaaaaaaxaaa Returns: “0” because the program will treat the ‘x’ as an ‘a’. SSAHA stands for Sequence Search and Alignment by Hashing Algorithm, a method that quickly and easily searches the large amount of data in DNA databases. Created in 2001, SSAHA offers a faster tool for sequence alignment. SSAHA uses a hash table to store the locations of small sequences of nucleotide bases (known as k-tuples) in many different “bins”. Each nucleotide is coded for using a binary system; thus when trying to match a string of nucleotides from a different genome or within the same genome (as in genome shotgun technique), the program only has to look through bins until it finds the proper nucleotide sequence instead of searching through the entire genome. SSAHA compromises memory for speed, but this speed means that SSAHA can handle high throughput such as entire genomes. • SSAHA has been used to… • Refine repetitive sequence searches (Reneker, 2005) • Locate large amounts of genetic variation within the zebrafish species (Guryev, 2006) • Localizing large data sets (Harper, 2006) • Aligning DNA strands (Ning, 2004) Literature Cited Altschul, Gish, Miller, Myers, and Lipman. “Basic Local Alignment Search Tool.” J. Mol. Biology. Academic Press Limited. 1990, 215, 403-410. Guryev, Koudijs, Berezikov, Johnson, Plasterk, van Eeden, Cuppen. “Genetic Variation on Zebrafish.” Genome Research, March 13, 2006. Harper, Huang, Stryke, Kawamoto, Ferrin, Babitt. “Comparison of Methods for Genomic Localization of Gene Trap Sequences.” MC Genomics, v. 7, 2006. Knuth, Donald E. The Art of Computer Programming: Sorting and Searching. v3, 1st ed. Addison-Wesley Publishing Company: Reading, Massachusetts, 1973. Ning, Cox, Mullikin. “SSAHA: A Fast Search Method for Large DNA Databases.” Genome Research, 2001 11: 1725-1729. Ning, Spooner, Spargo, Leonard, Rae, Cox. “The SSAHA Trace Server.” Computational Systems Bioinformatics Conference, 2004. http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=1332490 Welcome Sanger Trust Services. “Sequencing Blast Search Services.” http://www.sanger.ac.uk/DataSearch/blast.shtml Raphael, B. “Aspects and applications of symbol manipulation.” ACM/CSC-ER: Proceedings from the 1966 Conference. http://portal.acm.org/citation.cfm?id=800256.810683 Reneker, Jeff, and Chi-Ren Shyu. “Refined Repetitive Sequence Searches Utilizing a Fast Hash Function and Cross Species Information Retrievals.” BMC Bioinformatics 2005, 6:111. Example of a Hash Table (Ning, 2001) Table 1. A 2-tuple Hash Table for S1, S2, and S3 w E(w) Positions AA 0 (2, 19) AC 1 (1, 9) (2, 5) (2, 11) AG 2 (1, 15) (2, 35) AT 3 (2, 13) (3, 3) CA 4 (2, 3) (2, 9) (2, 21) (2, 27) (2, 33) (3, 21) (3, 23) CC 5 (1, 21) (2, 31) (3, 5) (3, 7) CG 6 (1, 5) CT 7 (1, 23) (2, 39) (2, 43) (3, 13) (3, 15) (3, 17) GA 8 (1, 3) (1, 17) (2, 15) (2, 25) GC 9 GG 10 (1, 25) (1, 31) (2, 17) (2, 29) (3, 1) GT 11 (1, 1) (1, 27) (1, 29) (2, 1) (2, 37) (3, 19) TA 12 (3, 25) TC 13 (1, 7) (1, 11) (1, 19) (2, 23) (2, 41) (3, 11) TG 14 (1, 13) (2, 7) (3, 9) TT 15 7) (3 Acknowledgements Timeline We would like to recognize Professor Owen Astrachan for his brilliance, amazing YouTube video creations, and ever sarcastic quips in class. 2004 2000 2006 1989 2001 1956 For Further Information Genetic Variation in Zebrafish is published. SSAHA is published. “Hash coding was first described in the open literature by Arnold I. Dumey, Computers and Automation 5, 12 (December 1956), 6-9… A classical article by W. W. Peterson, IBM J. Research & Development1 (1957), 130-146, was the first major paper dealing with the problem of searching in large files” (Knuth, 541). Altschul et al. publish BLAST - Basic Local Alignemnet Search Tool.. John Sulsten and Craig Venter jointly announce the completion of a draft of the Human Genome. SSAHA2 is published. Visit our webpage at www.duke.edu/~kam57/

More Related