OiBC2002
A Feasibility Study on the Standardization in Bioinformatics (FSSB) by the Japan Biological Informatics Consortium (JBIC)
November 19, 2002
Masahito Yamaguchi, Fujitsu Limited
Introduction of JBIC and JBIC-related Projects
Market size of bioindustry in Japan

        Market size        Jobs
1997    100 billion JPY    30,000
2010    10 trillion JPY    150,000

Dec. 1997: The strategic plan for the promotion of bioindustry was endorsed by the Cabinet.
Oct. 1998: METI report from the conference on "Japan with strong Bio-industry in the 21st Century" ⇒ 2010: 25 trillion JPY
(*) METI: Ministry of Economy, Trade and Industry
Promotion of bioindustry in Japan
Jan. 1999: 5 bio-engaged ministries agreed to act collectively.
    Principal strategies for the creation of bio-industry in Japan
Feb. 1999: Conference on the Economic Strategy
    Promotion of Bioindustry in Japan
Mar. 1999: Establishment of the Federation of MPs for the promotion of life science.
    Commercialization of Life Science
June 1999: Establishment of the Japan Bio Conference
Dec. 1999: The Millennium Project was established by the Prime Minister of Japan.
April 2000: The 1st Life Science Summit
Oct. 2002: The 3rd Life Science Summit
JBIC Millennium Project
• Realisation of innovative medical care customised to individuals in the ageing society. Based on elucidation of the diseases of the elderly, such as dementia, cancer, diabetes and high blood pressure, study and realise customised medical treatment. Along with innovative drug discovery, realise rejection-free regenerative medicine for bones and blood vessels, based on elucidation of the development of organisms.
• Realisation of a problem-free diet and a safe environment. Develop high-functional plants free from allergic reactions, and rice that needs less pesticide, for the prevention of diseases and the preservation of health.
Millennium Project Annual Plan (Human Genome Analysis)
[Chart: annual plan for the years 2000 to 2004. Recoverable row labels:]
• Human genome analysis / human full-length cDNA analysis: 10,000 analyses p.a. (total 30,000); 10,000 analyses p.a. (total 20,000)
• SNPs: standard analysis for SNPs (150,000); systematic disease research on SNPs (150,000 analyses); research on the relation between diseases and SNPs
• Disease and drug-reaction gene analysis: search for candidate disease genes and drug-reactive genes
• Bioinformatics technologies: completion of databases on standard polymorphic genes; network building; unification of the database
Medical Area (JBIC)
[Diagram. Recoverable labels:]
• Overcome 5 main diseases: dementia, cancer, diabetes, high blood pressure, allergic diseases
• Realisation of regeneration medicine: bones, blood vessels, neurons, skin, blood, cornea, bone marrow
• Human genome analysis: structural analysis of human full-length cDNA; genome diversity analysis; analysis of standard SNPs; deciphering of disease-inducing genes; deciphering of drug-reactive genes; analysis of protein functions; analysis of expression information and modelling of proteins
• Regeneration: development of treatment technologies utilising self-recovery; research on birth, differentiation and regeneration
• Bioinformatics: provide and utilise the data
What is JBIC?
November 1998: JBIC, a voluntary organization, was established by 12 companies with the support of 15 organizations.
July 2000: JBIC became an incorporated body (with 75 member companies).
Current:
• JBIC is a consortium of representatives of the industrial, academic and government sectors, consisting primarily of 94 private corporations from industries including chemicals, pharmaceuticals, electronics, information, food and precision machinery.
• It is a corporation jointly administered by 4 government agencies involved in the field of biotechnology, including the Ministry of Economy, Trade and Industry.
Business Domain
JBIC promotes the industrialization and the development of bioinformatics. JBIC also publishes and circulates information on bioinformatics.
• Investigate
• Research and development
• Disseminate and enlighten
• Interaction and co-operation with domestic and overseas organizations
Organisation of JBIC
[Organisation chart. Recoverable units:]
• General meeting; Auditor; Chairman
• Board of Directors; Steering committee; Adviser, Councilor
• Committee for ethical examination
• Secretariat; Database Centre; JBIRC
• Subcommittees: strategies and planning; research and development; intellectual property rights; database system; education and dissemination
• WGs (working groups)
Primary roles of JBIC
(1) Protein analysis and analysis of individual differences between genes (SNPs) using bioinformatics.
(2) Development of bioinformatics.
(3) Construction of a comprehensive database through which researchers, including those from universities and government research institutes, can access the results of biotechnology research and development.
(4) Training and development of experts in the field of bioinformatics through the activities described in (1) and (2) above.
A Feasibility Study about the Standardization in Bioinformatics (FSSB)
The standardization committee
• The committee consists of experts in bioinformatics and members of JBIC.
    Chairman: Prof. Takashi Gojobori, National Institute of Genetics
    Experts: 5 people / Members: 15 people
• The committee organizes a working group that actually carries out the feasibility study.
• The role of the committee:
    • Survey of standardization in bioinformatics
    • Discussion of and decisions on the feasibility study
    • Advice for the working group
    • Verification of the results of the feasibility study and of proposals from the working group
Rationale of the standardization of SNP data
• SNP research in Japan is comparatively advanced by world standards.
• Multiple SNP projects are being carried out in Japan, and interoperability among them is essential.
• Standardization of sequence, gene expression and cell simulation data has already been proposed.
• Standardization of SNP data had not yet been proposed.
The committee decided to develop a standard for SNP data using XML technology.
Development of Polymorphism Markup Language (PML)
• PML is designed to accommodate various data models and to be expandable.
• Analysis of the data systems of:
    JSNP (http://snp.ims.u-tokyo.ac.jp/index_ja.html)
    dbSNP (http://www.ncbi.nlm.nih.gov/SNP/)
    HGVbase (http://hgvbase.cgb.ki.se/)
Process of standardization
• Research the items of the target databases.
    • nomenclature of items
    • classification of groups
    • sort out whether each item is mandatory or optional
• Discuss the major classification.
    • Define the major classification
    • Omit duplicated items
    • Define the name of each item
    • Select a "shallow" hierarchy for the schema
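The consolidation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-database item names and the mapping to common names are hypothetical (the slides do not list the real fields of JSNP, dbSNP or HGVbase), and the rule "mandatory only if every database supplies the item" is an assumed heuristic, not the committee's documented criterion.

```python
from collections import defaultdict

# Hypothetical item names per source database (not taken from the real schemas).
SOURCE_ITEMS = {
    "JSNP": ["snp_id", "allele", "flank5", "flank3"],
    "dbSNP": ["rs_id", "observed", "flank5", "flank3", "validated"],
    "HGVbase": ["var_id", "allele", "upstream", "downstream"],
}

# Hypothetical mapping from (database, item) to a common PML-style item name.
COMMON_NAME = {
    ("JSNP", "snp_id"): "source_id",
    ("JSNP", "allele"): "allele",
    ("JSNP", "flank5"): "sequence5",
    ("JSNP", "flank3"): "sequence3",
    ("dbSNP", "rs_id"): "source_id",
    ("dbSNP", "observed"): "allele",
    ("dbSNP", "flank5"): "sequence5",
    ("dbSNP", "flank3"): "sequence3",
    ("dbSNP", "validated"): "validation_status",
    ("HGVbase", "var_id"): "source_id",
    ("HGVbase", "allele"): "allele",
    ("HGVbase", "upstream"): "sequence5",
    ("HGVbase", "downstream"): "sequence3",
}


def consolidate(source_items, common_name):
    """Merge duplicated items under one name and label each as mandatory or optional."""
    providers = defaultdict(set)
    for db, items in source_items.items():
        for item in items:
            providers[common_name[(db, item)]].add(db)
    all_dbs = set(source_items)
    # Assumed heuristic: an item is "mandatory" only if every database supplies it.
    return {
        name: "mandatory" if dbs == all_dbs else "optional"
        for name, dbs in sorted(providers.items())
    }


items = consolidate(SOURCE_ITEMS, COMMON_NAME)
for name, status in items.items():
    print(f"{name}: {status}")
```

Thirteen source items collapse to five common names here, which is the "omit duplicated items" and "define the name of each item" step in miniature.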
Results and discussion
(1) Classification and nomenclature of items
    Root elements:
    • variation
    • frequency
    • population
    • assay
    • submitter
    • publication
    • disease
    • miscellaneous
(2) Variation: SNP, insertion, deletion and microsatellite.
(3) Submitter: data submitter, assay expert and institute.
(4) Hierarchy of the schema: shallow and practical.
(5) Nomenclature: follows DDBJ, i.e. the International Nucleotide Sequence Databases (DDBJ/EMBL/GenBank).
Structure of PML
[Diagram. Recoverable element labels: Submitter, Disease, Publication, Assay, Miscellaneous, Population, Variation, Frequency, Location, Gene]
Flexibility of PML
Case 1: Exchange a variation entry together with all other root element data

    <entry>
      <submitter …>…</submitter>
      <variation …>…</variation>
      <population …>…</population>
      <assay …>…</assay>
      <disease …>…</disease>
      …
    </entry>
Flexibility of PML
Case 2: Exchange multiple variation data with batch processing

    <entry>
      <submitter …>…</submitter>
      <variation …>…</variation>
      <variation …>…</variation>
      <variation …>…</variation>
      …
    </entry>
Flexibility of PML
Case 3: Exchange complex data including variation and frequency data

    <entry>
      <variation …>…</variation>
      <frequency …>…</frequency>
      <frequency …>…</frequency>
      …
    </entry>
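As an illustration of Case 2, the following minimal Python sketch assembles a batch entry with the standard library's xml.etree.ElementTree. Element and attribute names follow the PML DTD slides; the helper make_entry and every ID, name and allele value are made up for the example and are not part of PML itself.

```python
import xml.etree.ElementTree as ET


def make_entry(children):
    """Build an <entry> from (tag, attributes, {child_tag: text}) tuples."""
    entry = ET.Element("entry")
    for tag, attrs, fields in children:
        node = ET.SubElement(entry, tag, attrs)
        for name, text in fields.items():
            ET.SubElement(node, name).text = text
    return entry


# Reference IDs that every <variation> must carry (values are hypothetical).
ref_ids = {
    "submitter_id": "S1",
    "population_id": "P1",
    "assay_id": "A1",
    "publication_id": "PUB1",
}

# Case 2: one submitter followed by a batch of variations.
batch = [("submitter", {"submitter_id": "S1"}, {"submitter_name": "Example Lab"})]
for i, allele in enumerate(["A/G", "C/T", "G/T"], start=1):
    attrs = {"variation_id": f"V{i}", **ref_ids}
    batch.append(("variation", attrs, {"variation_type": "SNP", "allele": allele}))

entry = make_entry(batch)
pml_text = ET.tostring(entry, encoding="unicode")
print(pml_text)
```

The same helper covers Case 1 and Case 3 by changing the list of tuples, for example a single variation followed by several frequency elements.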
DTD of PML

<!ELEMENT entry (variation|frequency|population|assay|submitter|publication|disease|miscellaneous)*>

<!--
variation element
  variation_id           variation unique ID
  submitter_id           submitter reference ID
  population_id          population reference ID
  assay_id               assay reference ID
  publication_id         publication reference ID
  create_date            create date
  modify_date            modify date
  source_db              source database name
  source_id              source database ID
  source_release_date    release date on source database
  source_modify_date     modify date on source database
  molecular_type         molecular type, e.g. DNA, RNA
  variation_type         variation type, e.g. SNP, Insertion, Deletion, Repeat, NoVariation
  allele                 observed allele
  length                 sequence length, including flanking sequence
  sequence5              flanking 5' sequence
  sequence3              flanking 3' sequence
  validation_status      validation status, e.g. Proven, Suspected
  success_rate           certainty of variation information
  assay_failure          assay problem at each variation entry
  mendelian_segregate    Has this SNP been shown to "mendelize"?
  homozygote_detect      Were homozygote individuals observed in the sample?
  pcr_confirmed          Was the polymorphism found on a repeat PCR sample (not an artifact)?
  somatic_mutation       Is this SNP known to be a somatic mutation?
  variation_dbxref       other database reference information
  location               location information

variation_dbxref element
  db                     database name
  uid                    database unique ID
  url                    linkout URL

location element
  location_chromosome    chromosome location
  location_sequence      sequence location

location_chromosome element
  db                     database name
  uid                    database unique ID
  version                database version
  number                 number of the chromosome on which the variation lies
  position               chromosome position of the variation
  orientation            chromosome orientation of the variation
  map                    chromosome map on which the variation lies
  url                    linkout URL

location_sequence element
  db                     database name
  uid                    database unique ID
  version                database version
  position               chromosome position of the variation
  orientation            chromosome orientation of the variation
  url                    linkout URL
  gene                   gene information

gene element
  gene_structure         category of gene structure, e.g. exon, intron
  aminoacid_substitution amino acid substitution generated by the variation
  codon_substitution     codon substitution generated by the variation
  codon_position         codon position
  gene_name              gene name
  gene_symbol            gene symbol
  gene_alias             gene alias
  gene_product           gene product
  gene_evidence          gene type, e.g. Functional Gene, Predicted EST, Predicted by Computer, Pseudogene
  changed_motif          whether a motif is changed or not
  changed_motif_name     motif name
  splice_site_change     whether a splice site is changed or not
  splice_variant         the number of splice variants and RefSeq
  gene_dbxref            other database reference information

gene_dbxref element
  db                     database name
  uid                    database unique ID
  url                    linkout URL
-->
<!ELEMENT variation (source_db|source_id|source_release_date|source_modify_date|
                     molecular_type|variation_type|allele|length|sequence5|sequence3|
                     validation_status|success_rate|assay_failure|
                     mendelian_segregate|homozygote_detect|pcr_confirmed|
                     somatic_mutation|variation_dbxref|location)*>
<!ATTLIST variation
          variation_id   CDATA #REQUIRED
          submitter_id   CDATA #REQUIRED
          population_id  CDATA #REQUIRED
          assay_id       CDATA #REQUIRED
          publication_id CDATA #REQUIRED
          create_date    CDATA #IMPLIED
          modify_date    CDATA #IMPLIED >
<!ELEMENT source_db (#PCDATA)>
<!ELEMENT source_id (#PCDATA)>
<!ELEMENT source_release_date (#PCDATA)>
<!ELEMENT source_modify_date (#PCDATA)>
<!ELEMENT molecular_type (#PCDATA)>
<!ELEMENT variation_type (#PCDATA)>
<!ELEMENT allele (#PCDATA)>
<!ELEMENT length (#PCDATA)>
<!ELEMENT sequence5 (#PCDATA)>
<!ELEMENT sequence3 (#PCDATA)>
<!ELEMENT validation_status (#PCDATA)>
<!ELEMENT success_rate (#PCDATA)>
<!ELEMENT assay_failure (#PCDATA)>
<!ELEMENT mendelian_segregate (#PCDATA)>
<!ELEMENT homozygote_detect (#PCDATA)>
<!ELEMENT pcr_confirmed (#PCDATA)>
<!ELEMENT somatic_mutation (#PCDATA)>
<!ELEMENT variation_dbxref (db|uid|url)*>
<!ELEMENT location (location_chromosome|location_sequence)*>
<!ELEMENT location_chromosome (db|uid|version|number|position|map|orientation|url)*>
<!ELEMENT db (#PCDATA)>
<!ELEMENT uid (#PCDATA)>
<!ELEMENT version (#PCDATA)>
<!ELEMENT number (#PCDATA)>
<!ELEMENT position (#PCDATA)>
<!ELEMENT orientation (#PCDATA)>
<!ELEMENT map (#PCDATA)>
<!ELEMENT url (#PCDATA)>
<!ELEMENT location_sequence (db|uid|version|position|orientation|url|gene)*>
<!ELEMENT gene (gene_structure|aminoacid_substitution|codon_substitution|
                codon_position|gene_name|gene_symbol|gene_alias|gene_product|gene_evidence|
                changed_motif|changed_motif_name|splice_site_change|splice_variant|gene_dbxref)*>
<!ELEMENT gene_structure (#PCDATA)>
<!ELEMENT aminoacid_substitution (#PCDATA)>
<!ELEMENT codon_substitution (#PCDATA)>
<!ELEMENT codon_position (#PCDATA)>
<!ELEMENT gene_name (#PCDATA)>
<!ELEMENT gene_symbol (#PCDATA)>
<!ELEMENT gene_alias (#PCDATA)>
<!ELEMENT gene_product (#PCDATA)>
<!ELEMENT gene_evidence (#PCDATA)>
<!ELEMENT changed_motif (#PCDATA)>
<!ELEMENT changed_motif_name (#PCDATA)>
<!ELEMENT splice_site_change (#PCDATA)>
<!ELEMENT splice_variant (#PCDATA)>
<!ELEMENT gene_dbxref (db|uid|url)*>
<!--
frequency element
  frequency_id           frequency unique ID
  submitter_id           submitter reference ID
  variation_id           variation reference ID
  population_id          population reference ID
  assay_id               assay reference ID
  publication_id         publication reference ID
  create_date            create date
  modify_date            modify date
  source_release_date    release date on source database
  source_modify_date     modify date on source database
  allele                 allele
  allele_frequency       allele frequency
  genotype               genotype
  genotype_frequency     genotype frequency
-->
<!ELEMENT frequency (allele|allele_frequency|genotype|genotype_frequency)*>
<!ATTLIST frequency
          frequency_id   CDATA #REQUIRED
          submitter_id   CDATA #REQUIRED
          population_id  CDATA #REQUIRED
          assay_id       CDATA #REQUIRED
          publication_id CDATA #REQUIRED
          create_date    CDATA #IMPLIED
          modify_date    CDATA #IMPLIED >
<!ELEMENT allele_frequency (#PCDATA)>
<!ELEMENT genotype (#PCDATA)>
<!ELEMENT genotype_frequency (#PCDATA)>

<!--
population element
  population_id          population unique ID
  submitter_id           submitter reference ID
  create_date            create date
  modify_date            modify date
  population_description population description
  organism               organism
  strain                 strain
  cultivar               cultivar
  population_parameter   population parameter
  sample_size            sample size
  class                  class
  pooled                 whether pooled or not
  population_dbxref      other database reference information
-->
<!ELEMENT population (population_description|organism|strain|cultivar|
                      population_parameter|sample_size|class|pooled|population_dbxref)*>
<!ATTLIST population
          population_id  CDATA #REQUIRED
          submitter_id   CDATA #REQUIRED
          create_date    CDATA #IMPLIED
          modify_date    CDATA #IMPLIED >
<!ELEMENT population_description (#PCDATA)>
<!ELEMENT organism (#PCDATA)>
<!ELEMENT strain (#PCDATA)>
<!ELEMENT cultivar (#PCDATA)>
<!ELEMENT population_parameter (#PCDATA)>
<!ELEMENT sample_size (#PCDATA)>
<!ELEMENT class (#PCDATA)>
<!ELEMENT pooled (#PCDATA)>
<!ELEMENT population_dbxref (db|uid|url)*>
<!--
assay element
  assay_id               assay unique ID
  submitter_id           submitter reference ID
  create_date            create date
  modify_date            modify date
  assay_description      assay description
  assay_parameter        assay parameter
  pcr_primer             primer sequence
  pcr_profile            PCR profile
  pcr_product            PCR product, e.g. single band, multi band
  assay_dbxref           other database reference information
-->
<!ELEMENT assay (assay_description|assay_parameter|pcr_primer|pcr_profile|
                 pcr_product|assay_dbxref)*>
<!ATTLIST assay
          assay_id       CDATA #REQUIRED
          submitter_id   CDATA #REQUIRED
          create_date    CDATA #IMPLIED
          modify_date    CDATA #IMPLIED >
<!ELEMENT assay_description (#PCDATA)>
<!ELEMENT assay_parameter (#PCDATA)>
<!ELEMENT pcr_primer (#PCDATA)>
<!ELEMENT pcr_profile (#PCDATA)>
<!ELEMENT pcr_product (#PCDATA)>
<!ELEMENT assay_dbxref (db|uid|url)*>

<!--
submitter element
  submitter_id           submitter unique ID
  create_date            create date
  modify_date            modify date
  submitter_name         submitter name
  address                address
  email                  email
  tel                    tel
  fax                    fax
  institution            institution
  laboratory             laboratory
  submitter_dbxref       other database reference information

submitter_dbxref element
  db                     database name
  uid                    database unique ID
  url                    linkout URL
-->
<!ELEMENT submitter (submitter_name|address|email|tel|fax|institution|laboratory|submitter_dbxref)*>
<!ATTLIST submitter
          submitter_id   CDATA #REQUIRED
          create_date    CDATA #IMPLIED
          modify_date    CDATA #IMPLIED >
<!ELEMENT submitter_name (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT email (#PCDATA)>
<!ELEMENT tel (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT institution (#PCDATA)>
<!ELEMENT laboratory (#PCDATA)>
<!ELEMENT submitter_dbxref (db|uid|url)*>
<!--
publication element
  publication_id         publication unique ID
  submitter_id           submitter reference ID
  create_date            create date
  modify_date            modify date
  title                  title
  author                 author
  journal                journal
  volume                 volume
  supplement             supplement
  issue                  issue number
  issue_supplement       issue supplement number
  pages                  pages
  year                   year
  publication_status     publication status, e.g. unpublished, published
  mesh_term              PubMed MeSH (Medical Subject Heading)
  publication_dbxref     other database reference information
-->
<!ELEMENT publication (title|author|journal|volume|supplement|issue|
                       issue_supplement|pages|year|publication_status|mesh_term|
                       publication_dbxref)*>
<!ATTLIST publication
          publication_id CDATA #REQUIRED
          submitter_id   CDATA #REQUIRED
          create_date    CDATA #IMPLIED
          modify_date    CDATA #IMPLIED >
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT journal (#PCDATA)>
<!ELEMENT volume (#PCDATA)>
<!ELEMENT supplement (#PCDATA)>
<!ELEMENT issue (#PCDATA)>
<!ELEMENT issue_supplement (#PCDATA)>
<!ELEMENT pages (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT publication_status (#PCDATA)>
<!ELEMENT mesh_term (#PCDATA)>
<!ELEMENT publication_dbxref (db|uid|url)*>

<!--
disease element
  disease_id             disease unique ID
  submitter_id           submitter reference ID
  create_date            create date
  modify_date            modify date
  disease_description    disease description
  disease_dbxref         other database reference information
-->
<!ELEMENT disease (disease_description|disease_dbxref)*>
<!ATTLIST disease
          disease_id     CDATA #REQUIRED
          submitter_id   CDATA #REQUIRED
          create_date    CDATA #IMPLIED
          modify_date    CDATA #IMPLIED >
<!ELEMENT disease_description (#PCDATA)>
<!ELEMENT disease_dbxref (db|uid|url)*>

<!--
miscellaneous element
  miscellaneous_id          miscellaneous unique ID
  submitter_id              submitter reference ID
  create_date               create date
  modify_date               modify date
  miscellaneous_description miscellaneous description
  miscellaneous_dbxref      other database reference information
-->
<!ELEMENT miscellaneous (miscellaneous_description|miscellaneous_dbxref)*>
<!ATTLIST miscellaneous
          miscellaneous_id CDATA #REQUIRED
          submitter_id     CDATA #REQUIRED
          create_date      CDATA #IMPLIED
          modify_date      CDATA #IMPLIED >
<!ELEMENT miscellaneous_description (#PCDATA)>
<!ELEMENT miscellaneous_dbxref (db|uid|url)*>
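A lightweight structural check can catch the most common mistakes against this DTD before data is exchanged. The sketch below is not DTD validation (xml.etree.ElementTree does not validate against a DTD); it only checks the root element, the allowed root-level children and the #REQUIRED attributes, which are transcribed by hand from the ATTLIST declarations above. The function name check_entry is hypothetical.

```python
import xml.etree.ElementTree as ET

# #REQUIRED attributes per root-level element, transcribed from the PML DTD.
REQUIRED = {
    "variation": {"variation_id", "submitter_id", "population_id", "assay_id", "publication_id"},
    "frequency": {"frequency_id", "submitter_id", "population_id", "assay_id", "publication_id"},
    "population": {"population_id", "submitter_id"},
    "assay": {"assay_id", "submitter_id"},
    "submitter": {"submitter_id"},
    "publication": {"publication_id", "submitter_id"},
    "disease": {"disease_id", "submitter_id"},
    "miscellaneous": {"miscellaneous_id", "submitter_id"},
}


def check_entry(xml_text):
    """Return a list of problems found in a PML entry; an empty list means none found."""
    root = ET.fromstring(xml_text)
    if root.tag != "entry":
        return [f"root element is <{root.tag}>, expected <entry>"]
    problems = []
    for child in root:
        if child.tag not in REQUIRED:
            problems.append(f"<{child.tag}> is not allowed under <entry>")
            continue
        missing = sorted(REQUIRED[child.tag] - set(child.attrib))
        if missing:
            problems.append(f"<{child.tag}> is missing: {', '.join(missing)}")
    return problems


print(check_entry('<entry><assay assay_id="A1"/></entry>'))
```

For full validation a DTD-aware parser would be needed; this check is only meant as a quick guard in a converter or upload step.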
A prototype of SNP data processing based on PML
[Diagram. Recoverable components:]
• Source databases: dbSNP, JSNP, HGVbase
• Converter and comprehensive check
• PML
• SNPs database
• Search and View (browser)
• Link to related data
• Registration to the DDBJ DNA database
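The "Search and View" part of such a prototype could, for instance, flatten PML variations into rows for a browser table. The following Python sketch assumes the element nesting given in the PML DTD (location, location_sequence, gene, gene_symbol); the sample data, the gene symbol GENE1 and the function names are invented for illustration and do not describe the actual JBIC prototype.

```python
import xml.etree.ElementTree as ET

# Made-up sample entry; IDs and the gene symbol are placeholders.
SAMPLE = """<entry>
  <variation variation_id="V1" submitter_id="S1" population_id="P1"
             assay_id="A1" publication_id="PUB1">
    <variation_type>SNP</variation_type>
    <allele>A/G</allele>
    <location><location_sequence><gene>
      <gene_symbol>GENE1</gene_symbol>
    </gene></location_sequence></location>
  </variation>
  <variation variation_id="V2" submitter_id="S1" population_id="P1"
             assay_id="A1" publication_id="PUB1">
    <variation_type>Deletion</variation_type>
  </variation>
</entry>"""


def variation_rows(pml_text):
    """Flatten each <variation> into a dict suitable for a table view."""
    root = ET.fromstring(pml_text)
    rows = []
    for v in root.iter("variation"):
        rows.append({
            "variation_id": v.get("variation_id"),
            "type": v.findtext("variation_type", default=""),
            "allele": v.findtext("allele", default=""),
            "gene_symbol": v.findtext(
                "location/location_sequence/gene/gene_symbol", default=""),
        })
    return rows


def search_by_gene(rows, symbol):
    """Return the rows whose gene symbol matches exactly."""
    return [row for row in rows if row["gene_symbol"] == symbol]


rows = variation_rows(SAMPLE)
hits = search_by_gene(rows, "GENE1")
print(hits)
```

Because every PML element is optional in the DTD, missing fields are returned as empty strings rather than raising an error.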
Next Challenge
Standardization has to be done in:
• data exchange protocol
• data structure (data format)
• data items
• data contents
JBIC proposes solutions for the first three items and will next tackle the standardization or interoperability of the data contents.
An example of other efforts toward interoperability in Japan
http://xml.nig.ac.jp/
Acknowledgement
• Matsuura Yukio, The Japan Biological Informatics Consortium
• Hideaki Sugawara, Center for Information Biology and DDBJ, National Institute of Genetics
• Yasumasa Shigemoto, Life Science System Division, FUJITSU