HL7-CDISC-PGx Pharmacogenomics Vocabularies
Initial Scouting Report
Mark Sharp, sharp@merck.com
May 2006
Methodology
• Study Amnon’s “walk-through”
• Make a checklist of data elements (PGx_elements.xls)
• Pick out those that seem to call for controlled vocabularies
• Scout out available vocabulary resources
• Index analysis files to PGx_elements.xls
Issues
• Availability
• Authority
• Coverage (size, granularity)
• Relationships (Medline, etc.)
• “Quality”
• Agreement with other vocabularies
• Agreement with / coverage of:
  • Amnon’s proposed codes
  • Peter Elkins’ mutation types and test types
  • Joyce’s example data
• Other?
Chromosomal Loci
• “Position” has ambiguous semantics: a sequence may occupy part, most, or all of a position.
• Chromosomal loci are known and encoded to varying degrees of precision (more than 100 HUGO entries give the locus simply as "1").
• This leads to variable cardinality between sequences, chromosomal loci, and approved gene names (i.e., "1" alone doesn't tell you much).
• Should the chromosomal locus values in HUGO, GenBank, etc., be used like the gene names for vocabulary control, given that a new, more specific locus may be more meaningful than "1"?
• These thoughts suggest a function that automatically computes gene-gene relationships from a hierarchy of [known] chromosomal loci.
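A minimal sketch of such a function, under assumptions that are not in the report: loci are simple band strings such as "1", "1p", "1p36" or "1p36.33" (ranges like "1p36.3-p36.2" are rejected), a coarser designation contains every finer one on the same chromosome and arm, and two genes are related through the most specific locus that contains both. The names `parse_locus`, `contains`, `common_locus` and the GENE_* symbols are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical locus grammar (an assumption, not from the report):
# chromosome (1-22, X, Y), optional arm (p/q), optional band like "36" or "36.33".
# Ranges ("1p36.3-p36.2") and terms such as "qter" are rejected.
_LOCUS_RE = re.compile(r"^(\d{1,2}|X|Y)([pq]?)(\d+(?:\.\d+)?)?$")


class Locus(NamedTuple):
    chromosome: str
    arm: str   # "" if only the chromosome is known
    band: str  # "" if no band is given


def parse_locus(text: str) -> Locus:
    match = _LOCUS_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported locus string: {text!r}")
    chromosome, arm, band = match.groups()
    if band and not arm:
        raise ValueError(f"band without an arm: {text!r}")
    return Locus(chromosome, arm, band or "")


def format_locus(locus: Locus) -> str:
    return locus.chromosome + locus.arm + locus.band


def contains(outer: Locus, inner: Locus) -> bool:
    """True if `outer` equals `inner` or is a coarser designation of it."""
    if outer.chromosome != inner.chromosome:
        return False
    if not outer.arm:
        return True  # a bare chromosome contains every locus on it
    if outer.arm != inner.arm:
        return False
    # Bands refine left to right: "3" contains "36", which contains "36.3",
    # which contains "36.33". So containment is a prefix test on the band.
    return inner.band.startswith(outer.band)


def common_locus(a: Locus, b: Locus) -> Optional[Locus]:
    """Most specific locus containing both `a` and `b`; None across chromosomes."""
    if a.chromosome != b.chromosome:
        return None
    if not a.arm or not b.arm or a.arm != b.arm:
        return Locus(a.chromosome, "", "")
    shared = []
    for x, y in zip(a.band, b.band):
        if x != y:
            break
        shared.append(x)
    # Drop a dangling "." so that "36." becomes "36".
    return Locus(a.chromosome, a.arm, "".join(shared).rstrip("."))


# Hypothetical genes and loci, for illustration only.
genes = {"GENE_A": "1p36.33", "GENE_B": "1p36.13", "GENE_C": "1p34.2", "GENE_D": "1"}
loci = {name: parse_locus(text) for name, text in genes.items()}
print(format_locus(common_locus(loci["GENE_A"], loci["GENE_B"])))  # 1p36
print(format_locus(common_locus(loci["GENE_A"], loci["GENE_C"])))  # 1p3
print(contains(loci["GENE_D"], loci["GENE_A"]))  # True
```

Under this scheme a locus of "1" contains every locus on chromosome 1, which is why it carries so little information: it relates a gene to everything on that chromosome.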
Chromosomal Locus String Length is a Rough Measure of Precision
[Figure; labels: ~24% GenBank, ~11% HUGO]
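The point can be checked on individual strings. The sketch below is an assumption of this edit rather than the report's procedure: it counts how many levels of the band hierarchy a simple locus string specifies (chromosome, arm, then each band digit) and prints that next to the raw string length. `precision_levels` and the sample loci are hypothetical.

```python
import re

# Hypothetical helper (not from the report): counts the hierarchy levels a
# simple locus string specifies. Chromosome = 1 level, arm (p/q) = 1 level,
# each band digit = 1 more level. Ranges such as "1p36.3-p36.2" are rejected.
_LOCUS_RE = re.compile(r"^(\d{1,2}|X|Y)([pq]?)(\d+(?:\.\d+)?)?$")


def precision_levels(locus: str) -> int:
    match = _LOCUS_RE.match(locus.strip())
    if match is None:
        raise ValueError(f"unsupported locus string: {locus!r}")
    _chromosome, arm, band = match.groups()
    if band and not arm:
        raise ValueError(f"band without an arm: {locus!r}")
    levels = 2 if arm else 1
    if band:
        levels += sum(ch.isdigit() for ch in band)
    return levels


# Hypothetical example loci: compare raw string length with levels specified.
for locus in ["1", "1p", "1p36", "1p36.33", "10q", "10q22.3"]:
    print(f"{locus:<8} length={len(locus)} levels={precision_levels(locus)}")
```

Length and level count agree for loci such as "1p36", but "1p36.33" and "10q22.3" both have length 7 while specifying 6 and 5 levels, because the "." and a second chromosome digit add characters without adding precision. That is one reason length is only a rough measure.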