Similarity Searches on Sequence Databases (Chapter 7, page 215)
A story • H. pylori was discovered in 1984 • Its genome was first sequenced in the 1990s • This was published in NATURE. • In this publication, all the proteins encoded by the genome were also published • HOW did they do it in such a short time?
HOW? • They compared the genome sequence of H. pylori with those of other bacteria. • Then they predicted the proteins of H. pylori and its metabolites.
What does this similarity mean? • If two protein or gene sequences are similar, they are probably homologues (they share a common ancestor). • SO • They are likely from related organisms • Similar proteins mean: • similar functions • similar structures • that is, similar characteristics
How similar is very similar? • For proteins: • if 2 proteins share >25% identity, they are similar. • Identity below 25% is called the TWILIGHT ZONE: nothing is certain about similarity there. • For nucleotides, the limit is 70% similarity (homologous)
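As a worked illustration of these identity thresholds, here is a minimal Python sketch. It assumes the two sequences are already aligned (gaps written as `-`) and counts gap columns in the alignment length; other tools may use a different denominator, so the function names and that convention are illustrative assumptions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns with identical residues (gap columns count in the length)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


def classify_identity(pct: float, threshold: float = 25.0) -> str:
    """Rule of thumb from the slide: >25% for proteins (pass threshold=70.0 for nucleotides)."""
    return "similar" if pct > threshold else "twilight zone"
```

For example, `percent_identity("MKV-LA", "MKIALA")` has 4 identical columns out of 6, about 66.7%, which `classify_identity` labels "similar".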
Homology • In addition to the % identity, other information is essential before saying that 2 sequences are homologous: • Expectation value (E-value): the lower the value, the stronger the homology • Length of the similar segments • Patterns of amino acid conservation • Number of insertions/deletions
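The last criterion, the number of insertions/deletions, can be counted directly from an alignment. This hypothetical helper assumes an aligned pair with `-` for gaps and treats each continuous run of gaps in either row as one indel event (one common convention, not the only one).

```python
import re


def count_indels(aligned_query: str, aligned_hit: str) -> int:
    """Count indel events: each maximal run of '-' in either row counts once."""
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned sequences must have equal length")
    gap_runs = re.compile(r"-+")
    return len(gap_runs.findall(aligned_query)) + len(gap_runs.findall(aligned_hit))
```

Here `count_indels("MKV--LAG-", "MKIAALA-W")` finds two gap runs in the query and one in the hit, so 3 indels.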
BLAST (Basic Local Alignment Search Tool) • 30 years ago, scanning for similarity between our query and hundreds of other sequences took several hours :-( (print them, put them on the wall, compare them one by one manually :-) • NOW, with fast computers, we compare ours with millions of sequences in several minutes at most.
BLASTing Protein Sequences • 2 strategies • Compare: • a protein with a protein database: BLASTP • a protein with a nucleotide database: TBLASTN (the program translates each nucleotide sequence in the database into its 6 possible protein sequences)
Important BLAST servers • BLAST server from NCBI (USA) • BLAST server from Swiss EMBnet • If you learn one, you can use the other(s)
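The "6 possible sequences" are the three reading frames on each DNA strand. The following Python sketch shows that translation step under the assumption of the standard genetic code; it illustrates the idea and is not BLAST's own implementation. Codons that are not in the table (for example, ones containing `N`) become `X`.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first base varies slowest).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def reverse_complement(dna: str) -> str:
    """Complement each base, then reverse, giving the other strand read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Return the 3 forward (+1..+3) and 3 reverse-strand (-1..-3) translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames
```

For instance, `six_frame_translation("ATGGCC")` yields `MA` in frame +1 and `GH` in frame -1 (the reverse complement is `GGCCAT`).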
Which one should we choose? • It depends on: • Database: choose a server that searches the database you want • Speed: choose one that is not crowded (in Turkey, there is no problem during the day until 5, because it is night in the US and Japan) • Different BLAST servers return different results for the same query because their databases differ
BLAST output contains: • A graphic display • A hit list • The alignments • The parameters
A graphic display • Shows which part of the other sequences is similar to yours • This part may look different, or be absent, on some servers. • What the colors say: best, good, moderate, worse, worst • What the length says: same length as your sequence... homologous; shorter... corresponds to a domain
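The length reading of the graphic can be made explicit as query coverage: the fraction of your sequence that a hit spans. In this sketch the function names and the 0.9 cutoff separating "whole sequence" from "domain" are illustrative assumptions, not a BLAST setting; coordinates are assumed 1-based and inclusive, as in BLAST alignment output.

```python
def query_coverage(q_start: int, q_end: int, query_length: int) -> float:
    """Fraction of the query covered by one hit (1-based, inclusive coordinates)."""
    if not 1 <= q_start <= q_end <= query_length:
        raise ValueError("invalid query coordinates")
    return (q_end - q_start + 1) / query_length


def interpret_hit_length(coverage: float, full_length_cutoff: float = 0.9) -> str:
    # Hypothetical cutoff: near-full coverage suggests whole-sequence homology,
    # shorter coverage suggests only a shared domain.
    return "whole sequence (homologous)" if coverage >= full_length_cutoff else "shared domain"
```

A hit spanning residues 101 to 200 of a 400-residue query covers 25% of it, which this sketch reads as a shared domain.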
A hit list • Accession number (sp: SWISS-PROT) and name • Description: helps you judge whether the hit is of interest or not • Score: if <50, unreliable • E-value: the lower the E-value, the greater the similarity; E > 0.001: twilight zone; an E-value approaching 0 is the best
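The score and E-value rules of thumb above can be applied to a hit list programmatically. This minimal sketch assumes the hits have already been parsed into a hypothetical `Hit` record; the accessions in the usage note are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    description: str
    score: float   # bit score as shown in the hit list
    evalue: float


def reliable_hits(hits: list[Hit], min_score: float = 50.0, max_evalue: float = 0.001) -> list[Hit]:
    """Drop hits with score < 50 or E > 0.001 (twilight zone); lowest E-value first."""
    kept = [h for h in hits if h.score >= min_score and h.evalue <= max_evalue]
    return sorted(kept, key=lambda h: (h.evalue, -h.score))
```

With hits scoring (120, 1e-30), (45, 1e-5), (60, 0.01) and (80, 1e-10), only the first and last survive: the second fails the score rule and the third falls in the twilight zone.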
Alignments • Alignments show how the sequences are similar • % identity: >25% is good • Length: the length of the alignment; short alignments generally give high E-values • The top line is ours (the query); the bottom line is the hit; (+) marks similar amino acids • XXXXXX: low-complexity region • The numbers show the coordinates
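Why do short alignments give high E-values? The E-value relates to the bit score S' roughly as E = m · n · 2^(-S'), where m is the query length and n the total database length in residues. A short alignment accumulates fewer points, so its bit score is lower and its E-value higher. The sketch below simplifies by using raw lengths; real BLAST uses edge-corrected effective lengths.

```python
def evalue_from_bit_score(bit_score: float, query_length: int, db_length: int) -> float:
    """Expected number of chance hits scoring at least bit_score (simplified lengths)."""
    return query_length * db_length * 2.0 ** (-bit_score)
```

For a 300-residue query against a database of 10^6 residues, a 30-bit alignment is expected by chance far more often than a 60-bit one.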
BLASTing DNA sequences • If it is a reading frame, translate it into protein, then BLAST it. • If not, choose one of the programs below: • DNA against DNA: BLASTN • translated DNA against translated DNA: TBLASTX • translated DNA against protein: BLASTX • T = translated: BLAST translates our sequence into its 6 possible protein sequences
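The choices on this slide and the protein slide form a small lookup table. This sketch encodes them as a hypothetical helper; the program names are the standard lowercase BLAST program names, while the function and the category strings are illustrative.

```python
# (query type, database type) -> BLAST program, following the slides.
BLAST_PROGRAMS = {
    ("protein", "protein"): "blastp",
    ("protein", "nucleotide"): "tblastn",  # database translated in 6 frames
    ("nucleotide", "nucleotide"): "blastn",
    ("translated nucleotide", "translated nucleotide"): "tblastx",  # both sides translated
    ("translated nucleotide", "protein"): "blastx",  # query translated in 6 frames
}


def choose_blast_program(query: str, database: str) -> str:
    try:
        return BLAST_PROGRAMS[(query, database)]
    except KeyError:
        raise ValueError(f"no BLAST program for {query!r} vs {database!r}") from None
```

So a non-coding DNA fragment searched against proteins maps to `choose_blast_program("translated nucleotide", "protein")`, which returns `"blastx"`.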
Control sequence masking • Protein: remove low-complexity regions • DNA: many repeats; use the "human repeats" filter
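Masking replaces low-complexity residues with `X`, the XXXXXX seen in the alignments. BLAST's real filters (such as SEG for proteins and DUST for DNA) are more elaborate; this is a simplified stand-in that masks any sliding window whose Shannon entropy falls below a threshold. The window size of 12 and the 2.2-bit threshold are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, min_entropy: float = 2.2,
                        mask_char: str = "X") -> str:
    """Replace every residue inside a low-entropy window with mask_char."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)
```

On a protein with a 15-residue poly-Q stretch between two diverse segments, the whole stretch is masked, along with a few flanking residues that share low-entropy windows with it.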
BLAST output • A less homologous sequence can still be important. WHAT to do? Adjust the parameters • Suitable database: to reduce the number of results, use SWISS-PROT • Use the "magic tags" of Entrez queries • Adjust the E-value
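These adjustments can all be expressed as request parameters. The sketch below only builds a submission URL and does not send it; the parameter names (`CMD`, `PROGRAM`, `DATABASE`, `QUERY`, `EXPECT`, `ENTREZ_QUERY`) follow NCBI's BLAST URL API as I understand it, so check them against NCBI's current documentation before relying on them. The Entrez restriction `Homo sapiens[Organism]` is just an example of a field tag.

```python
from typing import Optional
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_put_url(query: str, expect: float, program: str = "blastp",
                        database: str = "swissprot",
                        entrez_query: Optional[str] = None) -> str:
    """Build (but do not send) a BLAST submission URL with a database, E-value and Entrez limit."""
    params = {"CMD": "Put", "PROGRAM": program, "DATABASE": database,
              "QUERY": query, "EXPECT": expect}
    if entrez_query:
        params["ENTREZ_QUERY"] = entrez_query  # e.g. an organism field tag
    return f"{BLAST_URL}?{urlencode(params)}"
```

Raising `expect` lets weaker, possibly important hits through, while a smaller database and an Entrez restriction keep the list manageable.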
PSI-BLAST (Position-Specific Iterated BLAST) • BLAST finds close relatives. • To find distant relatives, use PSI-BLAST • It uses a more complex scoring procedure: a position-specific scoring matrix built from the hits of each iteration.
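The core of that scoring procedure is a position-specific scoring matrix (PSSM): each column of the aligned hits gets its own log-odds score for each amino acid, and the next iteration searches with it. This Python sketch shows only the PSSM idea under loud simplifications (uniform 1/20 background, a flat pseudocount, gaps ignored); real PSI-BLAST adds sequence weighting and substitution-matrix-based pseudocounts.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned_hits: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds score (bits) per amino acid per column, against a uniform background."""
    length = len(aligned_hits[0])
    if any(len(s) != length for s in aligned_hits):
        raise ValueError("sequences must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        # Gaps and unknown symbols are simply not counted.
        counts = Counter(s[col] for s in aligned_hits if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score_with_pssm(pssm: list[dict[str, float]], seq: str) -> float:
    """Ungapped score of seq against the PSSM, column by column."""
    return sum(column.get(aa, 0.0) for column, aa in zip(pssm, seq))
```

Built from the hits `MKV`, `MKI` and `MRV`, the matrix rewards `M` in the first column and scores `MKV` well above an unrelated `WWW`.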