UniProt
Eric Jain
Swiss Institute of Bioinformatics, Geneva
W3C Workshop on Semantic Web for Life Sciences, October 2004
What is it?

ID   ATPB_CANFA     STANDARD;      PRT;    19 AA.
AC   P99504;
DT   15-JUL-1998 (Rel. 36, Created)
DT   15-JUL-1998 (Rel. 36, Last sequence update)
DT   05-JUL-2004 (Rel. 44, Last annotation update)
DE   ATP synthase beta chain, mitochondrial (EC 3.6.3.14) (Fragment).
GN   Name=ATP5B;
OS   Canis familiaris (Dog).
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC   Mammalia; Eutheria; Carnivora; Fissipedia; Canidae; Canis.
OX   NCBI_TaxID=9615;
RN   [1]
RP   SEQUENCE.
RC   TISSUE=Heart;
RX   MEDLINE=98163340; PubMed=9504812;
RA   Dunn M.J., Corbett J.M., Wheeler C.H.;
RT   "HSC-2DPAGE and the two-dimensional gel electrophoresis database of
RT   dog heart proteins.";
RL   Electrophoresis 18:2795-2802(1997).
CC   -!- FUNCTION: Produces ATP from ADP in the presence of a proton
CC       gradient across the membrane. The beta chain is the catalytic
CC       subunit.
CC   -!- CATALYTIC ACTIVITY: ATP + H(2)O + H(+)(In) = ADP + phosphate +
CC       H(+)(Out).
CC   -!- SUBUNIT: F-type ATPases have 2 components, CF(1) - the catalytic
CC       core - and CF(0) - the membrane proton channel. CF(1) has five
CC       subunits: alpha(3), beta(3), gamma(1), delta(1), epsilon(1). CF(0)
CC       has three main subunits: a, b and c.
CC   -!- SUBCELLULAR LOCATION: Mitochondrial.
CC   -!- SIMILARITY: Belongs to the ATPase alpha/beta chains family.
DR   HSC-2DPAGE; P99504; DOG.
DR   InterPro; IPR000194; ATPase_a/bcentre.
DR   PROSITE; PS00152; ATPASE_ALPHA_BETA; PARTIAL.
KW   ATP synthesis; ATP-binding; CF(1); Direct protein sequencing;
KW   Hydrogen ion transport; Hydrolase; Mitochondrion.
FT   UNSURE        8      8
FT   UNSURE       17     19
FT   NON_TER      19     19
SQ   SEQUENCE   19 AA;  1871 MW;  BB9C163FDC60BB42 CRC64;
     ATQTSPSPKG AAAXXXRVV
//
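The entry above is line-oriented: each line starts with a two-letter code, the sequence data lines start with blanks, and `//` ends the entry. A minimal Python sketch of reading such an entry might look like the following. The function name `parse_flat_file_entry` and the abbreviated `SAMPLE` are illustrative assumptions, not part of any UniProt tooling, and the sketch ignores continuation rules for multi-line fields.

```python
from collections import defaultdict


def parse_flat_file_entry(text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):       # entry terminator
            break
        code, value = line[:2], line[5:].rstrip()
        if code.strip():                # ID, AC, DE, ... lines
            fields[code].append(value)
        elif value:                     # sequence data lines start with blanks
            fields["SQ"].append(value)
    return dict(fields)


# Abbreviated copy of the entry shown on the slide (hypothetical excerpt).
SAMPLE = """\
ID   ATPB_CANFA     STANDARD;      PRT;    19 AA.
AC   P99504;
OS   Canis familiaris (Dog).
OX   NCBI_TaxID=9615;
SQ   SEQUENCE   19 AA;  1871 MW;  BB9C163FDC60BB42 CRC64;
     ATQTSPSPKG AAAXXXRVV
//
"""

entry = parse_flat_file_entry(SAMPLE)
accession = entry["AC"][0].rstrip(";")
# The first SQ item is the header line; the rest is sequence data.
sequence = "".join(entry["SQ"][1:]).replace(" ", "")
print(accession, len(sequence), sequence)
```

Every consumer of the flat file has to write this kind of ad hoc parsing, which is part of the motivation for offering the same data in a generic format.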
[DIR] Parent Directory          19-Jul-2004 13:02      -
[ ]   cellular-components.rdf   11-Oct-2004 19:15     5k
[ ]   databases.rdf             11-Oct-2004 19:15    45k
[ ]   databases.rdf.gz          13-Sep-2004 11:34     6k
[ ]   datasets.rdf              19-Oct-2004 16:32     4k
[ ]   enzymes.rdf.gz            11-Oct-2004 19:15   309k
[ ]   go.rdf.gz                 11-Oct-2004 19:15   839k
[ ]   intact.rdf.gz             11-Oct-2004 19:15   636k
[ ]   keywords.rdf.gz           11-Oct-2004 19:15    96k
[ ]   ontology.owl              19-Oct-2004 18:27    77k
[ ]   taxonomy.rdf.gz           11-Oct-2004 19:15   4.0M
[ ]   uniparc.rdf.gz            13-Oct-2004 10:54   762M
[ ]   uniprot.rdf.gz            11-Oct-2004 19:39   768M
[ ]   uniref.rdf.gz             01-Oct-2004 12:56  52.2M
use strict;
use warnings;
use Expasy::RDF;

my $parser = Expasy::RDF::Parser->new('P12345.rdf');
while (my $protein = $parser->next) {
    my $id   = $protein->id;
    my $mass = $protein->sequence->mass;
    print "Mass of $id is $mass.\n";
    # Print each annotation as "type: comment".
    print $_->type, ': ', $_->comment, "\n" foreach $protein->annotation;
}
$parser->close;
XML Syntax

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="urn:lsid:uniprot.org:ontology:"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
>
  <rdf:Description rdf:about="urn:lsid:uniprot.org:taxonomy:9606">
    <rdf:type rdf:resource="urn:lsid:uniprot.org:ontology:Taxon"/>
    <mnemonic>HUMAN</mnemonic>
    <scientificName>Homo sapiens</scientificName>
    <commonName>Human</commonName>
    <rdfs:subClassOf rdf:resource="urn:lsid:uniprot.org:taxonomy:9605"/>
  </rdf:Description>
</rdf:RDF>
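Because the RDF/XML above uses only flat `rdf:Description` elements with literal or `rdf:resource` properties, it can be read with an ordinary XML parser. The following Python sketch, using only the standard library, turns that sample into subject/predicate/object triples. The helper name `iter_triples` is an assumption for illustration; it handles only this restricted shape of RDF/XML and is no substitute for a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The sample from the slide, passed as bytes so the parser honours the
# declared encoding.
RDF_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="urn:lsid:uniprot.org:ontology:"
    xmlns:owl="http://www.w3.org/2002/07/owl#">
  <rdf:Description rdf:about="urn:lsid:uniprot.org:taxonomy:9606">
    <rdf:type rdf:resource="urn:lsid:uniprot.org:ontology:Taxon"/>
    <mnemonic>HUMAN</mnemonic>
    <scientificName>Homo sapiens</scientificName>
    <commonName>Human</commonName>
    <rdfs:subClassOf rdf:resource="urn:lsid:uniprot.org:taxonomy:9605"/>
  </rdf:Description>
</rdf:RDF>
"""


def iter_triples(xml_bytes):
    """Yield (subject, predicate, object) for flat rdf:Description elements."""
    root = ET.fromstring(xml_bytes)
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        subject = node.get(f"{{{RDF_NS}}}about")
        for prop in node:
            # ElementTree tags look like "{namespace}local"; join the two
            # parts to obtain the predicate URI.
            predicate = prop.tag[1:].replace("}", "", 1)
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is None:             # literal value rather than a resource
                obj = (prop.text or "").strip()
            yield subject, predicate, obj


triples = list(iter_triples(RDF_XML))
for triple in triples:
    print(triple)
```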
Triples, Quads and Quints
• What is the source of a triple?
• Compact reification.
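One way to read the "quad" idea: standard RDF reification needs four extra statements (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`) before anything can be said about a triple, whereas a quad simply carries the source as a fourth element. The Python sketch below illustrates that reading only. The `Quad` type, the helper `statements_from` and the source labels `source-a` and `source-b` are hypothetical, and the slide does not say how UniProt actually records provenance.

```python
from collections import namedtuple

# A triple plus the source it came from (hypothetical representation).
Quad = namedtuple("Quad", "subject predicate object source")

HUMAN = "urn:lsid:uniprot.org:taxonomy:9606"

quads = [
    Quad(HUMAN, "scientificName", "Homo sapiens", "source-a"),
    Quad(HUMAN, "commonName", "Human", "source-a"),
    Quad(HUMAN, "mnemonic", "HUMAN", "source-b"),
]


def statements_from(source, quads):
    """Return the plain triples that were asserted by the given source."""
    return [(q.subject, q.predicate, q.object)
            for q in quads if q.source == source]


from_a = statements_from("source-a", quads)
print(from_a)
```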
Web Services
• Overkill for providing programmatic access to resources.
• Often impractical for performance reasons.
Life Science Identifiers
• Need special resolver.
• Resolution tied to retrieval.
• Explicit version numbers.
• Not widely used.
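For reference, an LSID has the form `urn:lsid:authority:namespace:object`, optionally followed by `:revision`, which is where the explicit version numbers come from. A minimal Python sketch of splitting one into its parts could look like this. The names `LSID` and `parse_lsid` are illustrative assumptions, the revision `2` in the second call is made up, and actually resolving an identifier would still require the special resolver mentioned above.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn):
    """Split 'urn:lsid:authority:namespace:object[:revision]' into its parts."""
    parts = urn.split(":")
    prefix = [p.lower() for p in parts[:2]]
    if len(parts) not in (5, 6) or prefix != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {urn!r}")
    return LSID(*parts[2:])


human = parse_lsid("urn:lsid:uniprot.org:taxonomy:9606")
print(human)

# Hypothetical revision, only to show the optional sixth component.
versioned = parse_lsid("urn:lsid:uniprot.org:taxonomy:9606:2")
print(versioned.revision)
```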
Embedded References

• uniprot.rdf

<rdf:Description rdf:about="#_2F9A">
  <rdf:type rdf:resource="urn:lsid:uniprot.org:ontology:Caution_Annotation"/>
  <rdfs:comment>In mouse, 5 genes homologous to human CD209/DC-SIGN and CD209L/DC-SIGNR have been identified. Mouse CD209A product was named DC-SIGN by {citation 1} because of its similar expression pattern and chromosomal location in juxtaposition to CD23, but despite of the low sequence similarity.</rdfs:comment>
  <citation rdf:resource="#_2F8A"/>
</rdf:Description>

• cyc.rdf

<owl:Class rdf:ID="Antigen">
  <rdfs:comment>The collection of substances that can stimulate immune response. For example, bacteria [#$Bacterium], #$Viruses, proteins [#$ProteinMolecule] can serve as #$Antigens.</rdfs:comment>
</owl:Class>
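In the uniprot.rdf example, the comment text carries a placeholder, `{citation 1}`, while the reference itself is a separate `citation` property on the same resource. A client could join the two as in the Python sketch below. It assumes that `{citation N}` points at the N-th `citation` property of the annotation, which the slide suggests but does not state, and the function name `resolve_citations` is hypothetical.

```python
import re


def resolve_citations(comment, citations):
    """Replace each '{citation N}' with the N-th citation resource (1-based)."""
    def replace(match):
        index = int(match.group(1)) - 1
        if 0 <= index < len(citations):
            return "[" + citations[index] + "]"
        return match.group(0)           # leave unknown placeholders untouched
    return re.sub(r"\{citation (\d+)\}", replace, comment)


# Shortened version of the comment on the slide.
comment = ("Mouse CD209A product was named DC-SIGN by {citation 1} "
           "because of its similar expression pattern.")
resolved = resolve_citations(comment, ["#_2F8A"])
print(resolved)
```

The cyc.rdf example embeds its references with a different inline convention (`#$Name`), so each data source currently needs its own handling of this kind.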
People will adopt the technology if it provides immediate benefits and is simple to use.
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
>
  <foaf:Project>
    <foaf:name>UniProt</foaf:name>
    <foaf:homepage rdf:resource="http://uniprot.org/"/>
    <foaf:fundedBy>
      <foaf:Organization>
        <foaf:name>National Institutes of Health</foaf:name>
        <foaf:homepage rdf:resource="http://www.nih.gov/"/>
      </foaf:Organization>
    </foaf:fundedBy>
  </foaf:Project>
  <foaf:Organization>
    <foaf:name>Swiss Institute of Bioinformatics</foaf:name>
    <foaf:nick>SIB</foaf:nick>
    <foaf:homepage rdf:resource="http://www.isb-sib.ch/"/>
  </foaf:Organization>
  <foaf:Organization>
    <foaf:name>European Bioinformatics Institute</foaf:name>
    <foaf:nick>EBI</foaf:nick>
    <foaf:homepage rdf:resource="http://www.ebi.ac.uk/"/>
  </foaf:Organization>
  <foaf:Organization>
    <foaf:name>Georgetown University</foaf:name>
    <foaf:homepage rdf:resource="http://www.georgetown.edu/"/>
  </foaf:Organization>
  ...