...Taking Biology to a New Dimension. Introduction to Bioinformatics. Benny Shomer, November 2005
Exponential Growth Rate. Over the last two decades, nucleic acid data have accumulated in the EMBL database at an exponential rate, currently totaling ~110 Gbases from 62M entries.
~200,000 protein entries are currently stored in the UniProt database, with 70M amino acids. ERK2 MAP Kinase
The whole genomes of over 1500 viruses and 775 bacteria have been completely sequenced or are in progress… Salmonella sp., Bacteriophage T4, Haemophilus influenzae
…as well as some 400 eukaryotic genomes, of which 135 are of parasites, fungi and other lower forms. Trypanosoma brucei, Plasmodium falciparum, Leishmania major, Schizosaccharomyces pombe
More than 500 organelle genomes are in the databases. Mitochondrion 3D CAT, Mitochondria, Chloroplast
About 80 plants are being genome/EST sequenced or genetically mapped. Arabidopsis thaliana
There are currently ~170 genome projects of Metazoa
3.2 Gb ~30,000 genes.
"Owning a store of knowledge is no small happiness." Socrates
"There is no need to get out of proportion, and one has to keep one's head on one's hips." Alon Mizrahi
Same size genome, ~3 Gb
• About the same number of genes (30,000)
• Same gene content
• 85-90% similarity between genes (up to 98% similarity with apes)
Genes & Development Vol. 14, No. 20, pp. 2551-2569, October 15, 2000
From Sequence to Biology
Human Zebrafish HoxB4 local alignment

             1460      1470      1480      1490      1500
AF3071 TGGGCAATTCCCAGAAATTAATGGCTATGAGTTCTTTTTTGATCAACTCA
       :: :::::::  ::::::::::::: :::::::: : ::::::::::::
AF0712 TGTGCAATTCAAAGAAATTAATGGCCATGAGTTCCTATTTGATCAACTCC
              180       190       200       210       220

             1510      1520      1530      1540      1550
AF3071 AACTATGTCGACCCCAAGTTCCCTCCATGCGAGGAATATTCACAGAGCGA
       :::::::: ::::: ::::: :: :: :::::::::::::: ::::::::
AF0712 AACTATGTGGACCCTAAGTTTCCACCCTGCGAGGAATATTCCCAGAGCGA
              230       240       250       260       270

             1560      1570      1580      1590      1600
AF3071 TTACCTACCCAGCGACCACTCGCCCGGGTACTACGCCGGCGGCCAGAGGC
        :::::::::::    ::::: :: :   :::::  : ::: ::::::::
AF0712 CTACCTACCCAGT---CACTCTCCGG---ACTACTACAGCGCCCAGAGGC
              280          290          300       310

             1610      1620      1630      1640      1650
AF3071 GAGAGAGCAGCTTCCAGCCGGAGGCGGGCTTCGGGCGGCGCGCGGCGTGC
        :::   :   :::::::  ::: ::  :: :   : :::  :::  :::
AF0712 AAGACCCCTCGTTCCAGCATGAGTCGATCTACCACCAGCGGTCGGGCTGC
          320       330       340       350       360

Local, Global, Multiple…
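A local alignment like the HoxB4 one above is typically computed by dynamic programming. The following is a minimal sketch of the Smith-Waterman approach, not the program that produced the slide: the function name `local_align` and the scoring values (match +2, mismatch -1, linear gap -2) are assumptions chosen for illustration.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions, not those of any particular tool.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-score cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            top.append(a[i - 1]); bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-")
            i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Toy input: only the shared core "ACGT" is reported.
print(local_align("TTTACGTAAA", "GGGACGTCCC"))  # (8, 'ACGT', 'ACGT')
```

Production tools use substitution matrices and affine gap penalties, so their scores will differ from this sketch.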
>gi|28558768|sp|P53601|A4_MACFA Amyloid beta A4 protein precursor (APP) (ABPP) (Alzheimer's disease amyloid protein homolog) [Contains: Soluble APP-alpha (S-APP-alpha); Soluble APP-beta (S-APP-beta); C99; Beta-amyloid protein 42 (Beta-APP42); Beta-amyloid protein 40 (Beta-APP40); C83; P3(42); P3(40); Gamma-CTF(59) (Gamma-secretase C-terminal fragment 59); Gamma-CTF(57) (Gamma-secretase C-terminal fragment 57); Gamma-CTF(50) (Gamma-secretase C-terminal fragment 50); C31]
          Length = 770

 Score = 1277 bits (3305), Expect = 0.0
 Identities = 642/752 (85%), Positives = 643/752 (85%)

Query: 19  EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP 78
           EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP
Sbjct: 19  EVPTDGNAGLLAEPQIAMFCGRLNMHMNVQNGKWDSDPSGTKTCIDTKEGILQYCQEVYP 78

Query: 79  ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ 138
           ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ
Sbjct: 79  ELQITNVVEANQPVTIQNWCKRGRKQCKTHPHFVIPYRCLVGEFVSDALLVPDKCKFLHQ 138

Query: 139 ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLXXXXXXXXX 198
           ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPL
Sbjct: 139 ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLAEESDNVDS 198

Query: 199 XXXXXXXXXXWWGGADTDYADGSXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX 258
                     WWGGADTDYADGS
Sbjct: 199 ADAEEDDSDVWWGGADTDYADGSEDKVVEVAEEEEVAEVEEEEADDDEDDEDGDEVEEEA 258

Query: 259 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXCSEQAETGPCRAMISRWYFDVTEGKCAP 318
                                           CSEQAETGPCRAMISRWYFDVTEGKCAP
Sbjct: 259 EEPYEEATERTTSIATTTTTTTESVEEVVREVCSEQAETGPCRAMISRWYFDVTEGKCAP 318
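The "Identities" figure in the hit above is the number of identical aligned positions divided by the alignment length (642/752, reported as 85%). A minimal sketch of that arithmetic follows, using one 60-residue Query/Sbjct row pair from the hit. The helper name `percent_identity` is hypothetical, and treating masked `X` residues as non-identical is an assumption of this sketch.

```python
def percent_identity(query, sbjct):
    """Count identical aligned positions between two equal-length rows.

    Gap ('-') and masked ('X') positions are never counted as identical
    (an assumption of this sketch).
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for q, s in zip(query, sbjct) if q == s and q not in "-X")
    return identical, len(query), 100.0 * identical / len(query)


# Row pair covering positions 139-198 of the hit shown above.
query = "ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLXXXXXXXXX"
sbjct = "ERMDVCETHLHWHTVAKETCSEKSTNLHDYGMLLPCGIDKFRGVEFVCCPLAEESDNVDS"

identical, length, pct = percent_identity(query, sbjct)
print(f"{identical}/{length} ({pct:.0f}%)")  # 51/60 (85%)
```

The same ratio applied to the whole hit gives 642/752, which rounds to the 85% shown in the report.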
Genome Level Annotation Chromosome Oriented Focus Position Chromosome Slider Focus Area Overview
Genome Level Annotation Focus Area Detailed View
Genome Level Annotation Focus Area Basepair View
Genome Level Annotation Gene Oriented
Protein properties EX33 inflammation related GPCR analysis
Secondary Structure Prediction
Garnier

Residues 1-50: MWNSSDANFSCYHESVLGYRYVAVSWGVVVAVTGTVGNVLTLLALAIQPK
helix: HHHHHHH
sheet: E E EEEEEEEEEEEEEEE EEEE E
turns: TT TTTTT TTTT TTT T
coil: CC CCCC

Residues 51-100: LRTRFNLLIANLTLADLLYCTLLQPFSVDTYLHLHWRTGATFCRVFGLLL
helix: HHHHHHHH H
sheet: EE EEEEEE EEEEEEEE E EEEE EEEEEEEE
turns: T TT T TTTTTTT
coil: C

Residues 101-150: FASNSVSILTLCLIALGRYLLIAHPKLFPQVFSAKGIVLALVSTWVVGVA
helix: HH HHHHH HHHHHH
sheet: EEEEEEE EEEE EEE EEEEEEEEEEEEEE
turns: T TT T
coil: C C CC C

Residues 151-200: SFAPLWPIYILVPVVCTCSFDRIRGRPYTTILMGIYFVLGLSSVGIFYCL
helix:
sheet: EEEEEEEEEEEE EEEEEEEEEEEE EEEEEE
turns: T TTTTTTT T TT
coil: CCCC C CC CC

Plotstructure, PredictProtein
Pattern and Motif Analysis
ID   GATA_ZN_FINGER_1; PATTERN.
AC   PS00344;
DT   NOV-1990 (CREATED); NOV-1997 (DATA UPDATE); JUL-1998 (INFO UPDATE).
DE   GATA-type zinc finger domain.
PA   C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C.
NR   /RELEASE=41.18,131945;
NR   /TOTAL=99(61); /POSITIVE=99(61); /UNKNOWN=0(0); /FALSE_POS=0(0);
NR   /FALSE_NEG=14; /PARTIAL=0;
CC   /TAXO-RANGE=??E??; /MAX-REPEAT=2;
CC   /SITE=1,zinc; /SITE=4,zinc; /SITE=15,zinc; /SITE=18,zinc;
DR   O13412, AREA_ASPNG, T; O13415, AREA_ASPOR, T; P17429, AREA_EMENI, T;
Protein Families
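The PA line of a PROSITE entry such as the one above can be translated almost mechanically into a regular expression and used to scan a protein sequence. The following is a minimal sketch: the helper name `prosite_to_regex` is hypothetical, it covers only the basic pattern syntax (`x`, `[..]`, `{..}`, `(n)`, `(n,m)`, `<`, `>`), and the test sequence is synthetic, built by hand to satisfy the pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("x", ".")                        # x = any residue
        element = element.replace("{", "[^").replace("}", "]")     # {P} = anything but P
        element = element.replace("(", "{").replace(")", "}")      # (4,5) = repeat count
        element = element.replace("<", "^").replace(">", "$")      # sequence ends
        parts.append(element)
    return "".join(parts)


gata = "C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C."
regex = prosite_to_regex(gata)
print(regex)  # C.[DN]C.{4,5}[ST].{2}W[HR][RK].{3}[GN].{3,4}CN[AS]C

# Synthetic sequence (not a real protein) that satisfies the pattern.
synthetic = "MKCADCAAAASAAWHRAAAGAAACNACGG"
hit = re.search(regex, synthetic)
print(hit.start() + 1, hit.group())  # 1-based start position and matched segment
```

The order of the replacements matters: PROSITE's `{..}` exclusion sets are rewritten before `(..)` repeat counts are turned into regex braces.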
Protein-Protein interaction Data Sources: Yeast Two Hybrid system Triclosan - FabI
Protein-Protein interaction Data Sources: Surface Plasmon Resonance Triclosan - FabI
Protein-Protein interaction Data Sources: Natural Language Processing
DNA Microarray & Expression Analysis
Cloning, Restriction & Mapping
Linguistics & Information systems