Individual sequences in large sets of gene sequences may be distinguished efficiently by combinations of shared sub-sequences • Mark J Gibbs, John S Armstrong and Adrian J Gibbs • BMC Bioinformatics 2005, 6:90 • Presented by Miguel Gonzalez
Outline • Background • Results • Discussion
Background • Organism identification • Comparative Gene Sequencing • DNA probes
The Problem! • Identifying organisms with contemporary biological research methods is too time-consuming and expensive. • Complex techniques are usually involved.
The Solution • Develop an identification method that does not require each probe to be specific to a single target. • Probes can be found that bind to more than one target sequence, so that a set of probes produces a unique binding pattern, or fingerprint, for each target.
Hypothesis • Sequences can be identified efficiently using combinations of distinguishing sub-sequences (DSSs).
Strategy • The study borrows the approach of traditional taxonomy, in which targets are identified by combinations of characters shared by different members of the target set. • The advantage is that fewer characters and questions are needed to identify an individual target.
Strategy • The minimum number of characters for this method is given by the binary logarithm X = log2(Y), rounded up to a whole number, where X = number of characters and Y = number of targets. • Example: 10 characters could distinguish a set of up to 1024 targets (2^10 = 1024).
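The binary-logarithm bound above can be checked with a few lines of Python. This is only an illustrative sketch: the function names `min_characters` and `max_targets` are made up for this example, and it assumes each character is a simple present/absent (binary) test.

```python
def min_characters(num_targets: int) -> int:
    """Smallest number of binary (present/absent) characters that can give
    each of num_targets targets a distinct pattern, i.e. ceil(log2(Y))."""
    if num_targets < 1:
        raise ValueError("num_targets must be at least 1")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) and avoids
    # floating-point rounding problems.
    return (num_targets - 1).bit_length()


def max_targets(num_characters: int) -> int:
    """Largest number of targets that num_characters binary characters
    can separate: 2 ** X distinct patterns."""
    return 2 ** num_characters


print(min_characters(1024))  # 10
print(max_targets(10))       # 1024
```

The bound is a best case: it is reached only when every character splits the remaining targets roughly in half, which real sub-sequences will not always do.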
Testing Hypothesis • Three sets of cytochrome c oxidase subunit 1 (CO1) sequences were used: animal, insect, and moth • CO1-animal had 96 species • CO1-insect had 92 species • CO1-moth had 201 species
Target Sequence • The three sets of sequences were aligned with ClustalX to find a target region within the sequences. • Pools of sub-sequences were created, with lengths ranging from 6 to 31 nucleotides. • From these pools, distinguishing sub-sequences were identified.
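The pooling and selection steps on this slide can be sketched in Python as follows. This is a toy illustration, not the authors' program: the function names, the four short example sequences, and the greedy selection rule are all assumptions made for this sketch. It also treats a probe as an exact string match, which ignores alignment, the choice of target region, and real hybridisation behaviour.

```python
def kmers(seq, k):
    """Set of all sub-sequences of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def distinguishing_subsequences(seqs, k):
    """Map each length-k sub-sequence found in some, but not all,
    sequences to the set of sequence names that contain it."""
    carriers = {}
    for name, seq in seqs.items():
        for sub in kmers(seq, k):
            carriers.setdefault(sub, set()).add(name)
    return {s: names for s, names in carriers.items() if len(names) < len(seqs)}


def split(groups, names):
    """Split every group into members that carry a probe and members that do not."""
    return [part for g in groups for part in (g & names, g - names) if part]


def pick_probes(seqs, k):
    """Greedy choice (an assumption of this sketch): keep adding the
    sub-sequence that yields the most groups until every sequence is alone."""
    candidates = distinguishing_subsequences(seqs, k)
    groups, chosen = [set(seqs)], []
    while len(groups) < len(seqs):
        best = max(sorted(candidates),
                   key=lambda s: len(split(groups, candidates[s])),
                   default=None)
        if best is None or len(split(groups, candidates[best])) == len(groups):
            break  # no remaining sub-sequence separates anything further
        groups = split(groups, candidates[best])
        chosen.append(best)
    return chosen


def fingerprints(seqs, probes):
    """Presence/absence pattern of each probe in each sequence."""
    return {name: tuple(int(p in seq) for p in probes)
            for name, seq in seqs.items()}


# Hypothetical toy data, not taken from the CO1 sets.
seqs = {"a": "ACGTAC", "b": "ACGTTT", "c": "GGGTAC", "d": "GGGTTT"}
probes = pick_probes(seqs, k=3)
patterns = fingerprints(seqs, probes)
print(probes)
print(patterns)
```

In this toy set, two shared sub-sequences are enough to give all four sequences different fingerprints, which matches the log2 bound for four targets. None of the chosen sub-sequences is unique to a single sequence.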
Discussion • A method was produced that finds sub-sequences which distinguish the gene sequences, or groups of gene sequences, from which they came. • Sequence diversity and sub-sequence length were found to be major factors influencing the number of sub-sequences available as probe targets.