Bioinformatics Exercise Stephen Tsui School of Biomedical Sciences http://www.bch.cuhk.edu.hk/teaching/bioinfo_exercise_question.ppt
Q1 Books in NCBI Find a figure showing the genome map of hepatitis B virus.
Q2 PubMed Professor TSUI Lap Chee published an article describing the discovery of the cystic fibrosis gene in the journal ‘Science’ in 1989. What is the title of this article? Answer: Identification of the cystic fibrosis gene: genetic analysis.
Q3 PubMed Which university published the article “Genetic modulation of polyglutamine toxicity by protein conjugation pathways in Drosophila”? Who is the first author of this article? Try to download this paper. Answer: University of Pennsylvania. First author: Professor Edwin CHAN (Chan HY)
Q4 Entrez What is the sequence of the first five amino acids of a human protein called cysteine-rich heart protein, discovered in 1995? Answer: MPKCP
Q5 Entrez What is the full name of the gene with the accession number AF133732? How many amino acids are there in the protein it encodes? Answer: Full name: Human LIM-only protein FHL3 (FHL3). Number of amino acids in FHL3: 280
Q6 OMIM What is the chromosomal location of the glucose-6-phosphate dehydrogenase gene? Answer: The chromosomal location of the G6PD gene is Xq28.
Q7 Taxonomy Browser What is the species represented by the name "Ovis aries"? Answer: Sheep
Q8 Genome Biology How many nucleotides are there in Rattus norvegicus (rat) chromosome 5? How many genes are found on this chromosome? Answer: No. of nucleotides = 173,096,209; No. of genes = 1,604
Q9 Codon Usage How many CGG codons have been used to code for the amino acid arginine in the following piece of coding DNA? atgcccaagtgtcccaagtgcaacaaggaggtgtacttcgccgagagggtgacctctctgggcaaggactggcatcggccctgcctgaagtgcgagaaatgtgggaagacgctgacctctgggggccacgctgagcacgaaggcaaaccctactgcaaccacccctgctacgcagccatgtttgggcctaaaggctttgggcggggcggagccgagagccacactttcaagtaa Answer: 2
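One way to check the Q9 answer by hand is to read the coding DNA in frame, three bases at a time, and count the codons equal to CGG. The sketch below assumes the sequence starts at the first base of the start codon and is a whole number of codons long; the function name `count_codon` is made up for this illustration.

```python
def count_codon(coding_dna: str, codon: str) -> int:
    """Count in-frame occurrences of `codon`, reading from the first base."""
    seq = "".join(coding_dna.split()).lower()   # drop spaces and line breaks
    codon = codon.lower()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    # Step through the sequence one codon (3 bases) at a time.
    return sum(1 for i in range(0, len(seq), 3) if seq[i:i + 3] == codon)


# Coding DNA from Q9, split into chunks only for readability.
Q9_DNA = (
    "atgcccaagtgtcccaagtgcaacaaggaggtgtacttcgccgagagggtgacctctctg"
    "ggcaaggactggcatcggccctgcctgaagtgcgagaaatgtgggaagacgctgacctct"
    "gggggccacgctgagcacgaaggcaaaccctactgcaaccacccctgctacgcagccatg"
    "tttgggcctaaaggctttgggcggggcggagccgagagccacactttcaagtaa"
)

print(count_codon(Q9_DNA, "CGG"))  # -> 2
```

Counting in frame matters here: a plain substring search for "cgg" would also pick up matches that straddle two codons, such as the one inside "ggc gga".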
Q10 Restriction Site Analysis How many BamHI cutting sites are there in the following DNA segment? cagaacaaca gtgcgggctc acctgccaag ggaggagaag agagcgcccc taaacatgcg gctgcggctg ctggtgtccg cgggcatgct gctggtggct ctgtcgccct gtctgccttg cagggccctg ctgagcaggg gatccgtctc tggagcgccg cgggccccgc agccgttgaa tttcttgcaa ccggagcagc cccagcaacc tcagccgatt ctgatccgca tgggtgaaga atacttcctc cgcctgggga acctcaacag aagtcccgct gctcggctgt cccccaactc cacgcccctc accgcgggtc gcggcagccg cccctcgcac gaccaggctg cggctaactt tttccgcgtg ttgctgcagc agctgcagat gcctcagcgc ccgctcgaca gcagcacgga gctggcggaa cgcggcgccg aggatgccct cggtggccac cagggggcgc tggagaggga gaggcggtcc gaggagccgc ccatctctct ggatctcacc ttccaccttc tgagggaagt cttggaaatg gccagggcag agcagttagc tcagcaagct cacagcaaca ggaaactgat Answer: 1
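The Q10 search can be sketched as a plain string scan for the BamHI recognition sequence GGATCC. Because that site is palindromic, scanning one strand is enough. The function name `find_sites` is hypothetical, and the example runs on a short excerpt of the Q10 sequence rather than the whole segment.

```python
def find_sites(dna: str, site: str = "GGATCC") -> list[int]:
    """Return 0-based start positions of every occurrence of `site`."""
    seq = "".join(dna.split()).upper()   # remove the spaces between 10-base blocks
    site = site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)  # continue just past the last hit
    return positions


# Four consecutive 10-base blocks taken from the Q10 segment.
EXCERPT = "cagggccctg ctgagcaggg gatccgtctc tggagcgccg"

print(find_sites(EXCERPT))  # -> [19]
```

The site in this excerpt spans two printed blocks ("...caggg gatcc..."), so the spaces have to be removed before searching. A visual scan of the blocks can easily miss it.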
Q11 Hydrophobicity Plot Which of the following is the most hydrophilic region: a.a. 50-60, a.a. 70-80 or a.a. 330-340? 1 mdgsgerslp epgsqssaas ddieivvnvg gvrqvlygdl lsqypetrla elinclaggy 61 dtifslcddy dpgkrefyfd rdpdafkcvi evyyfgevhm kkgicpicfk nemdfwkvdl 121 kflddccksh lsekreelee iarrvqlild dlgvdaaegr wrrcqkcvwk flekpesscp 181 arvvaelsfl lilvssvvmc mdtipelqvl daegnrvehp tlenvetaci gwftleyllr 241 lfsspnklhf alsfmnivdv lailpfyvsl tlthlgarmm eltnvqqavq alrimriari 301 fklarhssgl qtltyalkrs fkelglllmy lavgifvfsa lgytmeqshp etlfknipqs 361 fwwaiitmtt vgygdiypkt tlsklnaais flcgviaial pihpiinnfv ryynkqrvle 421 taakhelelm elnsssggeg ktggsrsdld nlppepagke apscssrlkl shsdtfipll 481 teekhhrtrl qsck Answer: a.a. 70-80.
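As a rough cross-check of the Q11 answer, the three candidate regions can be compared by their mean Kyte-Doolittle hydropathy, where lower values mean more hydrophilic. This is a simplification: a hydrophobicity plot tool normally slides a window along the whole protein, and the exercise does not say which tool or scale was used. The three peptides below were cut from the Q11 sequence using its residue numbering, and the function name is made up for this sketch.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide: str) -> float:
    """Average Kyte-Doolittle value over all residues of `peptide`."""
    residues = peptide.upper()
    return sum(KD[r] for r in residues) / len(residues)


# Residues 50-60, 70-80 and 330-340 of the Q11 protein (11 residues each).
REGIONS = {
    "50-60": "aelinclaggy",
    "70-80": "ydpgkrefyfd",
    "330-340": "ylavgifvfsa",
}

scores = {name: mean_hydropathy(pep) for name, pep in REGIONS.items()}
most_hydrophilic = min(scores, key=scores.get)  # lowest mean = most hydrophilic

for name, score in scores.items():
    print(f"a.a. {name}: {score:.2f}")
print("Most hydrophilic:", most_hydrophilic)
```

Under this scale, region 70-80 is the only one of the three with a negative mean, which agrees with the answer on the slide.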
Q12 pI and Molecular Weight What are the pI and molecular weight of the following protein? 1 mdgsgerslp epgsqssaas ddieivvnvg gvrqvlygdl lsqypetrla elinclaggy 61 dtifslcddy dpgkrefyfd rdpdafkcvi evyyfgevhm kkgicpicfk nemdfwkvdl 121 kflddccksh lsekreelee iarrvqlild dlgvdaaegr wrrcqkcvwk flekpesscp 181 arvvaelsfl lilvssvvmc mdtipelqvl daegnrvehp tlenvetaci gwftleyllr 241 lfsspnklhf alsfmnivdv lailpfyvsl tlthlgarmm eltnvqqavq alrimriari 301 fklarhssgl qtltyalkrs fkelglllmy lavgifvfsa lgytmeqshp etlfknipqs 361 fwwaiitmtt vgygdiypkt tlsklnaais flcgviaial pihpiinnfv ryynkqrvle 421 taakhelelm elnsssggeg ktggsrsdld nlppepagke apscssrlkl shsdtfipll 481 teekhhrtrl qsck Answer: pI = 5.73 Molecular weight = 55729.42 Da or 55.7 kDa
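The molecular weight half of Q12 can be approximated by summing average residue masses and adding one water molecule for the free termini. The sketch below is an assumption about how such tools work, not the method behind the slide's figure, and it does not attempt the pI, which needs a set of pKa values. The function name `molecular_weight` is hypothetical. The masses are the commonly tabulated average residue masses in daltons.

```python
# Average residue masses in Da (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the N-terminal H and C-terminal OH


def molecular_weight(protein: str) -> float:
    """Approximate average molecular weight of an unmodified protein chain."""
    # Keep letters only, so a numbered, space-separated listing can be pasted in.
    residues = "".join(ch for ch in protein if ch.isalpha()).upper()
    return sum(AVG_RESIDUE_MASS[r] for r in residues) + WATER


# Small check on a dipeptide (glycylglycine).
print(round(molecular_weight("GG"), 2))  # -> 132.12
```

Pasting the numbered Q12 sequence into `molecular_weight` should give a value close to the slide's answer. Small differences between tools are expected because their mass tables differ slightly.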
Q13 Protein Subcellular Location Prediction What is the predicted subcellular location of the SARS-3a protein? MDLFMRFFTLGSITAQPVKIDNASPASTVHATATIPLQASLPFGWLVIGVAFLAVFQSAT KIIALNKRWQLALYKGFQFICNLLLLFVTIYSHLLLVAAGMEAQFLYLYALIYFLQCINA CRIIMRCWLCWKCKSKNPLLYDANYFVCWHTHNYDYCIPYNSVTDTIVVTEGDGISTPKL KEDYQIGGYSEDRHSGVKDYVVVHGYFTEVYYQLESTQITTDTGIENATFFIFNKLVKDP PNVQIHTIDGSSGVANPAMDPIYDEPTTTTSVPL Answer: Mitochondrial membranes, plasma membranes or endoplasmic reticulum membranes.
Q14 Transmembrane Region and Orientation Is the following a transmembrane protein? If yes, where is the transmembrane region? MDLFMRFFTLGSITAQPVKIDNASPASTVHATATIPLQASLPFGWLVIGVAFLAVFQSAT KIIALNKRWQLALYKGFQFICNLLLLFVTIYSHLLLVAAGMEAQFLYLYALIYFLQCINA CRIIMRCWLCWKCKSKNPLLYDANYFVCWHTHNYDYCIPYNSVTDTIVVTEGDGISTPKL KEDYQIGGYSEDRHSGVKDYVVVHGYFTEVYYQLESTQITTDTGIENATFFIFNKLVKDP PNVQIHTIDGSSGVANPAMDPIYDEPTTTTSVPL Answer: Yes, it is a transmembrane protein. Transmembrane region: a.a. 40-62, a.a. 77-99, a.a. 108-130
Q15 BLAST Sequence Alignment What is the identity of the following DNA sequence? cagaacaaca gtgcgggctc acctgccaag ggaggagaag agagcgcccc taaacatgcg gctgcggctg ctggtgtccg cgggcatgct gctggtggct ctgtcgccct gtctgccttg cagggccctg ctgagcaggg gatccgtctc tggagcgccg cgggccccgc agccgttgaa tttcttgcaa ccggagcagc cccagcaacc tcagccgatt ctgatccgca tgggtgaaga atacttcctc cgcctgggga acctcaacag aagtcccgct gctcggctgt cccccaactc cacgcccctc accgcgggtc gcggcagccg cccctcgcac gaccaggctg cggctaactt tttccgcgtg ttgctgcagc agctgcagat gcctcagcgc ccgctcgaca gcagcacgga gctggcggaa cgcggcgccg aggatgccct cggtggccac cagggggcgc tggagaggga gaggcggtcc gaggagccgc ccatctctct ggatctcacc ttccaccttc tgagggaagt cttggaaatg gccagggcag agcagttagc tcagcaagct cacagcaaca ggaaactgat Answer: Rattus norvegicus corticotropin releasing hormone
Q16 BLAST Sequence Alignment What is the identity of the following protein sequence? skpmgtqtht mifdnafnct feyisdafsl dvseksgnfk hlrefvfknk dgflyvykgy qpidvvrdlp sgfntlkpif klplginitn frailtafsp aqdtwgtsaa ayfvgylkpt tfmlkydeng titdavdcsq Answer: SARS coronavirus spike protein
Q17 Multiple Sequence Alignment Tools Which two of the following are closely related?
>A
MPKCPKCNKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYAAMFGPKGFGRGGAESHTFK
>B
MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
>C
MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK
>D
MPKCPKCQKEVYFAERVSSLGKDWHRPCLKCEKCSKTLTPGSHAEHEGKPYCNQPCYGALFGPKGFGRGGTESHSYK
Answer: B and C
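Because the four Q17 sequences all have the same length, their relatedness can be estimated without a full alignment tool by comparing them position by position. The sketch below computes percent identity for every pair and reports the highest. It stands in for a multiple sequence alignment only in this gap-free case, and the function name is made up for illustration.

```python
from itertools import combinations

# The four sequences from Q17, as given on the slide.
SEQS = {
    "A": "MPKCPKCNKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYAAMFGPKGFGRGGAESHTFK",
    "B": "MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK",
    "C": "MPKCPKCDKEVYFAERVTSLGKDWHRPCLKCEKCGKTLTSGGHAEHEGKPYCNHPCYSAMFGPKGFGRGGAESHTFK",
    "D": "MPKCPKCQKEVYFAERVSSLGKDWHRPCLKCEKCSKTLTPGSHAEHEGKPYCNQPCYGALFGPKGFGRGGTESHSYK",
}


def percent_identity(a: str, b: str) -> float:
    """Percent of positions with the same residue; requires equal lengths."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (no gaps handled)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# Score every unordered pair, then pick the pair with the highest identity.
pairs = {
    (m, n): percent_identity(SEQS[m], SEQS[n])
    for m, n in combinations(SEQS, 2)
}
best_pair = max(pairs, key=pairs.get)

for (m, n), pid in pairs.items():
    print(f"{m} vs {n}: {pid:.1f}%")
print("Closest pair:", best_pair)
```

As transcribed here, B and C are identical, so they score 100% and come out as the closest pair, matching the slide's answer.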
Q18 Structure Visualization • Get the RasMol software from http://www.umass.edu/microbio/rasmol/getras.htm#raswin • Download the structure of a protein called CRIP and view it in RasMol. • Save the picture as a BMP file.
Q19 Secondary Structure Prediction How many alpha helices are there in the hemoglobin alpha subunit? MVLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH GSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLL SHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR Answer: Five alpha helices.
Q20 Protein-protein Interaction Which proteins are the interacting partners of FHL2?
Q21 Peptide Mass Fingerprinting What is the identity of the protein with the following fingerprint? 2882.50 1182.44 1423.52 1191.50 1000.33 814.43 1624.74 1355.53 958.35 1165.39 1300.47 1426.57 2653.39 2265.11 2544.41 Answer: Tripartite motif protein TRIM19
Q22 Enzymes What are the substrate and products of the enzyme called "catalase"? What is the enzyme number of catalase? Answer: Substrate: hydrogen peroxide. Products: oxygen and water. Enzyme number: 1.11.1.6
Q23 Protein Domain Analysis Which domain can be found in the following protein? 1 msesfdcakc neslygrkyi qtdsgpycvp cydntfantc aecqqlighd srelfyedrh 61 fhegcfrccr cqrsladepf tcqdsellcn dcycsafssq csacgetvmp gsrkleyggq 121 twhehcflcs gceqplgsrs fvpdkgahyc vpcyenkfap scarcsktlt qggvtyrdqp 181 whreclvctg cqtplarqqf tsrdedpycv acfgelfapk cssckrpivg lgggkyvsfe 241 drhwhhncfs carcstslvg qgfvpdgdqv lcqgcfqagp Answer: LIM domain