Chani & Malki present: The OdzFinder
Project adviser: Dr. Ron Wides
WANTED
Name: Odz
a.k.a.: Ten-m
Family: pair-rule gene
Length: 10,000 bp
Getting to Know Odz …
• Discovered in D. melanogaster in 1994
• Belongs to the pair-rule gene family
• Plays a crucial role in the CNS during fetal development
• Odz protein is expressed in neurons, the developing brain and the hindgut
• Odz protein is expressed during segmentation
The Odz Family
Odz gene orthologs have been found in three phyla:
• Vertebrates: Ten-m1, Ten-m2, Ten-m3, Ten-m4
• Arthropods: Ten-a, Ten-m
• Nematodes: Ten-m
The Odz Protein
The only pair-rule gene reported to encode a protein that is not a transcription factor!
• 2731 amino acids
• Contains 3 domains:
  I. extracellular EGF-like repeats
  II. hydrophobic sequences, probably a transmembrane sequence
  III. tyrosine kinase phosphorylation sites
Diagram labels: EGF-like domain; intracellular kinase substrate domain
EGF-like Repeats
• EGF-like domain:
  • 30-40 amino acid residues
  • Significant homology to epidermal growth factor (EGF)
  • Has been found in single or multiple copies in a number of other proteins
  • Generally found in the extracellular domain of membrane proteins or in secreted proteins
  • Involved in receptor-ligand interactions
  • Includes 6 conserved cysteine residues involved in disulfide bonds
Consensus pattern:
x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x
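The consensus pattern above can be turned into a regular expression for scanning protein sequences. The following is a minimal Python sketch for illustration only (the OdzFinder itself is described as a Perl program). It assumes that `a` in the pattern stands for an aromatic residue (F, W or Y) and that `x(m,n)` means m to n arbitrary residues; the function names are hypothetical.

```python
import re

# EGF-like consensus exactly as written on the slide.
EGF_PATTERN = "x(4)-C-x(0,48)-C-x(3,12)-C-x(1,70)-C-x(1,6)-C-x(2)-G-a-x(0,21)-G-x(2)-C-x"

# Assumption: 'a' denotes an aromatic residue (F, W or Y).
AROMATIC = "[FWY]"


def pattern_to_regex(pattern: str) -> str:
    """Translate the dash-separated consensus pattern into a Python regex."""
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"x(?:\((\d+)(?:,(\d+))?\))?", element)
        if m:
            low, high = m.group(1), m.group(2)
            if low is None:          # plain "x": exactly one residue
                parts.append(".")
            elif high is None:       # "x(4)": a fixed number of residues
                parts.append(".{%s}" % low)
            else:                    # "x(0,48)": a range of residues
                parts.append(".{%s,%s}" % (low, high))
        elif element == "a":
            parts.append(AROMATIC)
        else:                        # a literal residue such as C or G
            parts.append(re.escape(element))
    return "".join(parts)


EGF_REGEX = re.compile(pattern_to_regex(EGF_PATTERN))


def find_egf_like(protein: str):
    """Return (start, end) spans (0-based, end exclusive) of candidate repeats."""
    return [m.span() for m in EGF_REGEX.finditer(protein)]
```

Because `finditer` reports non-overlapping matches only, closely packed repeats may be undercounted; a real scan would need to handle overlaps.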
The lab's goals:
• Genomics:
  • To find a broad family of Odz genes
  • Phylogenetic trees to discover the segmentation mechanism
  • Massive alignment to find conserved regions
  • Biological in-vivo experiments to change regions
• Proteomics:
  • The protein's role
  • How the protein functions
  • The protein's interactions with other proteins (e.g., Notch)
Finding Odz Genes
• BLASTing existing databases
• BLASTing new EST libraries
• Extracting DNA from various innocent creatures
Databases, EST libraries and sequences discovered in the lab all feed the Odz Database.
Odz Database
• The collected data was organized by Michal Markovitz in a relational database.
• The database consists of 10 different tables. For example: (table shown on the slide)
2 problems remained:
1. BLAST results include many non-Odz hits:
  • prokaryotic hits
  • non-metazoan hits
  • EGF region hits
  • low similarity
2. Every day…
  • New sequences are added to the existing databases
  • New EST libraries are released
We need a program to automatically extract Odz hits from NCBI BLAST results!!!
The OdzFinder
A Perl program that will automatically extract Odz hits from NCBI BLAST results.
S.O.F.T - Screen Odz Flow Template
Input: BLAST Report + Tax Report
1. Prokaryote? If no, continue.
2. Metazoan? If yes, continue.
3. EGF? Three branches:
  • No EGF: Score > x? yes → E-value < y? yes → Odz
  • All EGF: Score > x? yes → E-value < y? yes → Look-up table → Odz
  • Mixed EGF: Combination
4. Update Database
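The order of the decisions in the flow template can be sketched as a single function. This is an illustrative Python sketch, not the project's Perl code. All field names are hypothetical, the thresholds x and y appear as `min_bits` and `max_evalue` with made-up default values, and the score is assumed to be the BLAST bit score.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Pre-computed facts about one BLAST hit (all field names are hypothetical)."""
    is_prokaryote: bool
    is_metazoan: bool
    egf_position: str                  # "no_egf", "all_egf" or "mixed_egf"
    bits: float
    evalue: float
    passes_lookup: bool = False        # outcome of the look-up table check
    mixed_treated_as: str = "all_egf"  # how a mixed hit was resolved


def screen(hit: Hit, min_bits: float = 40.0, max_evalue: float = 1e-5):
    """Return (accepted, reason), testing the conditions in flow-chart order."""
    if hit.is_prokaryote:
        return False, "prokaryote"
    if not hit.is_metazoan:
        return False, "non-metazoan"
    position = hit.egf_position
    if position == "mixed_egf":        # a mixed hit follows one of the other branches
        position = hit.mixed_treated_as
    if hit.bits <= min_bits:           # "Score > x?" (x is a placeholder value here)
        return False, "score too low"
    if hit.evalue >= max_evalue:       # "E-value < y?" (y is a placeholder value here)
        return False, "E-value too high"
    if position == "all_egf" and not hit.passes_lookup:
        return False, "failed look-up table"
    return True, "odz"
```

For example, `screen(Hit(False, True, "no_egf", 153.0, 3e-36))` accepts the hit, while the same hit marked `"all_egf"` is rejected unless `passes_lookup` is set.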
Blast Report (input)
• BLASTs are performed on the Odz orthologs.
• The results are sent to the OdzFinder program to be filtered.
• The program extracts relevant information from each hit:
>gi|163076235|gb|AC765764.7 Apis mellifera BAC clone RP11-18D7 , complete sequence
Length = 184032
Score = 153 bits (328), Expect = 3e-36
Identities = 59/59 (100%), Positives = 59/59 (100%)
Frame = +3 / +3
Query: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
         IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH
Sbjct: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
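Extracting the relevant fields from a hit like the one above could look roughly as follows. This is a Python sketch for illustration (the OdzFinder is a Perl program and its actual parsing is not shown in the slides). The regular expressions, the function name and the dictionary keys are assumptions, and the hit text is expected with one field per line as in a plain-text BLAST report.

```python
import re

EXAMPLE_HIT = """>gi|163076235|gb|AC765764.7 Apis mellifera BAC clone RP11-18D7 , complete sequence
Length = 184032
Score = 153 bits (328), Expect = 3e-36
Identities = 59/59 (100%), Positives = 59/59 (100%)
Frame = +3 / +3
Query: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
Sbjct: 3 IQHKTFKFHGNYIKQRFHPRIYK*RYKYQRFHPRIYK*NLNLYRVCCSHIILECLQTAH 179
"""


def parse_hit(text: str) -> dict:
    """Pull identifier, score, E-value, identity and query range out of one hit."""
    header = re.search(r">gi\|(\d+)\|\w+\|(\S+)\s+(.*)", text)
    score = re.search(
        r"Score\s*=\s*([\d.]+)\s+bits\s+\((\d+)\),\s*Expect\s*=\s*([\d.eE+-]+)", text)
    ident = re.search(r"Identities\s*=\s*(\d+)/(\d+)", text)
    query = re.findall(r"Query:\s*(\d+)\s+\S+\s+(\d+)", text)
    if not (header and score and ident and query):
        raise ValueError("not a recognisable BLAST hit")
    expect = score.group(3)
    if expect.startswith("e"):         # BLAST may print "e-100" for 1e-100
        expect = "1" + expect
    matched, aligned = int(ident.group(1)), int(ident.group(2))
    return {
        "gi": header.group(1),
        "accession": header.group(2),
        "description": header.group(3).strip(),
        "bits": float(score.group(1)),
        "raw_score": int(score.group(2)),
        "evalue": float(expect),
        "identity_pct": 100.0 * matched / aligned,
        "query_start": int(query[0][0]),   # start of the first alignment row
        "query_end": int(query[-1][1]),    # end of the last alignment row
    }
```

Calling `parse_hit(EXAMPLE_HIT)` yields the GI number, the bit score of 153, the E-value and the query range 3 to 179 for the hit shown on the slide.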
Tax Report (input)
• Search for eukaryotic and metazoan results.
• Build a prokaryotic database for possible future use.
• Evolutionary distance becomes relevant when dealing with EGF-like repeats.
Lineage of the example hit (Apis mellifera):
root; cellular organisms; Eukaryota; Fungi/Metazoa group; Metazoa; Eumetazoa; Bilateria; Coelomata; Protostomia; Panarthropoda; Arthropoda; Mandibulata; Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; Neoptera; Endopterygota; Hymenoptera; Apocrita; Aculeata; Apoidea; Apidae; Apinae; Apini; Apis
Taxonomy Report
Eukaryota .................................. 2502 hits 41 orgs [root; cellular organisms]
. Bilateria ................................ 2421 hits 33 orgs [Fungi/Metazoa group; Metazoa; Eumetazoa]
. . Coelomata .............................. 2396 hits 31 orgs
. . . Deuterostomia ........................ 2322 hits 23 orgs
. . . . Chordata ........................... 2296 hits 22 orgs
. . . . . Euteleostomi ..................... 2236 hits 21 orgs [Craniata; Vertebrata; Gnathostomata; Teleostomi]
. . . . . . Tetrapoda ...................... 2022 hits 14 orgs [Sarcopterygii]
. . . . . . . Amniota ...................... 1908 hits 12 orgs
. . . . . . . . Eutheria ................... 1634 hits 10 orgs [Mammalia; Theria]
The program will receive the BLAST hit's Taxonomy Report and convert it into a manageable hash table. A default Taxonomy Report will be available when BLASTing against ESTs.
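One way to picture the "manageable hash table" is a dictionary keyed by taxon name, together with a lineage check for the prokaryote and metazoan questions. The Python sketch below is illustrative only and is not the project's Perl code. The dictionary layout and function names are hypothetical, and prokaryotes are assumed to be recognisable by the names Bacteria or Archaea in the lineage.

```python
import re

# Assumption: a prokaryotic lineage contains one of these names.
PROKARYOTE_GROUPS = {"Bacteria", "Archaea"}

REPORT_LINE = re.compile(
    r"^((?:\.\s)*)"            # one ". " per nesting level
    r"([^.\[]+?)\s*\.{3,}\s*"  # taxon name, then the dotted leader
    r"(\d+)\s+hits\s+(\d+)\s+orgs"
    r"(?:\s+\[(.*?)\])?\s*$"   # optional bracketed list of collapsed ranks
)


def classify_lineage(lineage: str) -> str:
    """Return 'prokaryote', 'metazoan' or 'other' for a ';'-separated lineage."""
    names = {name.strip() for name in lineage.split(";")}
    if names & PROKARYOTE_GROUPS:
        return "prokaryote"
    if "Metazoa" in names:
        return "metazoan"
    return "other"


def parse_tax_report(text: str) -> dict:
    """Map each taxon in the report to its depth, hit count and organism count."""
    table = {}
    for line in text.splitlines():
        m = REPORT_LINE.match(line.strip())
        if not m:
            continue               # skip headings and blank lines
        dots, name, hits, orgs, collapsed = m.groups()
        table[name] = {
            "depth": len(dots) // 2,
            "hits": int(hits),
            "orgs": int(orgs),
            "collapsed": [c.strip() for c in collapsed.split(";")] if collapsed else [],
        }
    return table
```

Applied to the Apis lineage above, `classify_lineage` returns `"metazoan"`, so the example hit would pass both taxonomy checks.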
EGF?
• Tenascin-m (Odz) includes 8 EGF-like repeats.
• The conserved EGF region gave problematic results.
• Many hits appear only because of their similarity to the EGF region.
Query vs. subject: high score!!!
There are three possible positions of the hit relative to the query's EGF-like region:
I. The hit is completely outside the query's EGF-like region
II. The hit is completely inside the query's EGF-like region
III. The hit is partially inside the query's EGF-like region
Diagram coordinates on the query: 525 and 804 (boundaries of the EGF-like region), 2750.
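Telling the three positions apart is an interval comparison between the hit's query range and the EGF-like region. A minimal Python sketch follows, for illustration only. It assumes the region boundaries 525 and 804 from the diagram and that hit coordinates are given on the same query coordinate scale; the function name and return labels are hypothetical.

```python
def classify_position(hit_start: int, hit_end: int,
                      egf_start: int = 525, egf_end: int = 804) -> str:
    """Classify a hit's query range against the EGF-like region (inclusive bounds)."""
    if hit_start > hit_end:                 # tolerate reversed coordinates
        hit_start, hit_end = hit_end, hit_start
    if hit_end < egf_start or hit_start > egf_end:
        return "no_egf"                     # completely outside the region
    if hit_start >= egf_start and hit_end <= egf_end:
        return "all_egf"                    # completely inside the region
    return "mixed_egf"                      # overlaps a region boundary
```

A hit covering query positions 3 to 179 is therefore classified as `"no_egf"`, and one covering 500 to 600 as `"mixed_egf"`.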
Position I: The hit is completely outside the query's EGF-like region (No EGF)
• Score and E-value are examined
• Set low thresholds to ensure that very small hits are not missed; sometimes they are translocations
Score > x? yes → E-value < y? yes → Odz
Position II: The hit is completely inside the query's EGF-like region (All EGF)
To prevent acceptance of non-Odz hits that score highly only because of their EGF region, a look-up table was established:
• Evolutionarily close query & subject: high identity % demanded
• Evolutionarily distant query & subject: low identity % demanded
Score > x? yes → E-value < y? yes → Look-up table → Odz
Look-up table example: (table shown on the slide)
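The look-up table idea, demanding a higher identity percentage the closer the query and subject organisms are, might be sketched like this in Python. The slide's actual table is not reproduced here, so every number below is a made-up placeholder. Measuring evolutionary distance by the number of shared leading lineage ranks is also an assumption of this sketch, as are all names.

```python
# Hypothetical look-up table: (minimum shared lineage ranks, identity % demanded).
# Rows are ordered from evolutionarily closest to most distant.
LOOKUP_TABLE = [(5, 90.0), (3, 70.0), (0, 50.0)]


def shared_ranks(query_lineage, subject_lineage) -> int:
    """Count the leading lineage ranks that query and subject have in common."""
    count = 0
    for q, s in zip(query_lineage, subject_lineage):
        if q != s:
            break
        count += 1
    return count


def demanded_identity(query_lineage, subject_lineage, table=LOOKUP_TABLE) -> float:
    """Return the identity % required for this pair of organisms."""
    shared = shared_ranks(query_lineage, subject_lineage)
    for min_shared, identity in table:
        if shared >= min_shared:
            return identity
    return table[-1][1]


def passes_lookup(identity_pct, query_lineage, subject_lineage) -> bool:
    """True if an all-EGF hit is similar enough given the organisms' distance."""
    return identity_pct >= demanded_identity(query_lineage, subject_lineage)
```

With these placeholder rows, an 80% identical all-EGF hit between two insects would be rejected, while the same identity between an insect and a vertebrate would be accepted.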
Position III: The hit is partially inside the query's EGF-like region (Mixed EGF)
2 possibilities:
A. False call! An EGF hit with insignificant similarity outside of the EGF domains.
B. The real thing! EGF with adjacent regions of significant similarity.
Is it more like A or like B?
• A: treat like II
• B: treat like I
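One possible way to decide between A and B is to measure the similarity in the part of the alignment that lies outside the EGF-like region. The Python sketch below illustrates that idea under stated assumptions; the slides do not say how the OdzFinder makes this call. It assumes one query coordinate per non-gap alignment column, the region boundaries 525 and 804, and placeholder cut-offs `min_columns` and `min_identity`.

```python
def outside_identity(query_aln: str, subject_aln: str, query_start: int,
                     egf_start: int = 525, egf_end: int = 804):
    """Count alignment columns outside the EGF-like region and how many are identical."""
    position = query_start
    outside = identical = 0
    for q, s in zip(query_aln, subject_aln):
        if q == "-":                   # gap in the query consumes no coordinate
            continue
        if position < egf_start or position > egf_end:
            outside += 1
            if q == s:
                identical += 1
        position += 1
    return outside, identical


def resolve_mixed(outside: int, identical: int,
                  min_columns: int = 20, min_identity: float = 0.5) -> str:
    """Return 'no_egf' for case B (real thing) or 'all_egf' for case A (false call)."""
    if outside > 0 and outside >= min_columns and identical / outside >= min_identity:
        return "no_egf"                # B: treat like position I
    return "all_egf"                   # A: treat like position II
```

A hit resolved to `"all_egf"` would then face the stricter look-up table check, while one resolved to `"no_egf"` needs only the score and E-value thresholds.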
Update Database: DBI
• A database interface module for Perl
• Enables Perl applications to access multiple database types
• Provides a consistent database interface independent of the actual database being used
Data flow through DBI: Perl script → DBI → DBD::mysql → MySQL RDBMS
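DBI is a Perl module, so the sketch below is only an analogy: Python's DB-API plays a similar role of one interface over several database drivers. It uses the standard-library `sqlite3` driver as a stand-in for MySQL, and the table and column names are hypothetical, not taken from the Odz Database. The `?` placeholders and `INSERT OR REPLACE` are SQLite-specific and would differ with a MySQL driver.

```python
import sqlite3


def store_hits(conn, hits) -> int:
    """Insert accepted hits and return the number of rows now in the table."""
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS odz_hits ("
        "gi TEXT PRIMARY KEY, organism TEXT, bits REAL, evalue REAL)"
    )
    # Re-running the screen on the same hit overwrites the old row.
    cur.executemany("INSERT OR REPLACE INTO odz_hits VALUES (?, ?, ?, ?)", hits)
    conn.commit()
    return cur.execute("SELECT COUNT(*) FROM odz_hits").fetchone()[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")   # throwaway in-memory database
    print(store_hits(connection, [("163076235", "Apis mellifera", 153.0, 3e-36)]))
```

The calling code only touches the connection and cursor objects, which is the same separation the slide describes between the script, the interface layer and the driver.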
Special thanks to our project adviser Dr. Ron Wides for his guidance, patience & Krispy Kreme donuts
"The Lord will give odz [strength] to His people; the Lord will bless His people with peace" (Psalms 29:11)