Sequence Search

Presentation Transcript


  1. Sequence Search

  2. The Problem as defined in Windows 3.1 days • Search for a sequence in a database several megabytes in size, on a machine with 640 KB of memory, as quickly as possible, and return all matching sequences • If the sequence does not exist, add it to the data file • Each sequence will be given a unique identifier • A sequence may be a subset of another sequence

  3. Givens • Unlimited disk space • Sequences are made up of amino acids from a growing set • Each amino acid in the database is given an entry number • Sequences are made up of at least 4 amino acids and may be of any length upwards

  4. Limitations • Maximum network speed 2 Mb/sec (LANtastic) • No SQL databases, only Paradox available

  5. Current Situation • Amino acid table consists of approximately 700 entries • Over 38,000 unique sequences • Sequences occupy over 11 MB in a Paradox table • 2 auxiliary tables, 17 MB in total • Negative result returned almost instantly

  6. Implementation • Each amino acid is represented by a letter or by its entry number in the AA table, e.g. ABCFTR or ABS(123)DFR • The sequence as entered is converted to a hex-triple representation, e.g. 00100200301F • Hex-triple was chosen because it requires only 3 characters to represent up to 4095 distinct amino acids, making for shorter sequence representations

  7. Do we need to update the system? • Yes, we want to be rid of Paradox
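
The hex-triple conversion on slide 6 can be sketched in Python as follows. This is a minimal illustration, not the original implementation: the function names and the `LETTER_TO_ENTRY` table are hypothetical (the slides do not show the real AA table), and the assumption that entry numbers run from 1 to 4095 is inferred from the "up to 4095" figure. The `contains_aligned` helper is also an assumption about how such strings might be compared; the slides do not describe the search itself.

```python
import re

TRIPLE_WIDTH = 3      # three hex digits per amino acid
MAX_ENTRY = 0xFFF     # 4095, the largest value three hex digits can hold

# Hypothetical letter-to-entry-number table; the real AA table is not shown.
LETTER_TO_ENTRY = {"A": 1, "B": 2, "C": 3, "D": 4, "F": 5, "R": 6, "S": 7, "T": 8}

# One token is either a single capital letter or an entry number in parentheses.
TOKEN = re.compile(r"([A-Z])|\((\d+)\)")


def parse_entered(text):
    """Turn an entered sequence such as 'ABS(123)DFR' into entry numbers."""
    entries = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        letter, number = match.groups()
        entries.append(LETTER_TO_ENTRY[letter] if letter else int(number))
        pos = match.end()
    return entries


def to_hex_triples(entries):
    """Encode entry numbers as a string of fixed-width, 3-digit hex values."""
    for n in entries:
        if not 1 <= n <= MAX_ENTRY:
            raise ValueError(f"entry number {n} does not fit in a hex triple")
    return "".join(f"{n:03X}" for n in entries)


def from_hex_triples(encoded):
    """Decode a hex-triple string back into entry numbers."""
    if len(encoded) % TRIPLE_WIDTH != 0:
        raise ValueError("length is not a multiple of 3")
    return [int(encoded[i:i + TRIPLE_WIDTH], 16)
            for i in range(0, len(encoded), TRIPLE_WIDTH)]


def contains_aligned(haystack, needle):
    """True if needle occurs in haystack starting on a triple boundary."""
    start = haystack.find(needle)
    while start != -1:
        if start % TRIPLE_WIDTH == 0:
            return True
        start = haystack.find(needle, start + 1)
    return False


print(to_hex_triples([1, 2, 3, 31]))     # 00100200301F
print(from_hex_triples("00100200301F"))  # [1, 2, 3, 31]
```

The slide's example string 00100200301F decodes to entry numbers 1, 2, 3 and 31. If encoded sequences are compared as plain text, a match only counts when it starts on a multiple of 3: the encoding of entries 256 and 512 is `100200`, which contains `002` as text even though entry 2 is not part of that sequence.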
