Vanderbilt’s DNA Databank: BioVU
Personalized Medicine • Integration of genomic information into clinical decision making • Personalized disease treatment and also preventative therapies
Personalized Medicine • A SNP (single-nucleotide polymorphism) is a single base-pair variation at a specific site in the DNA sequence that occurs in at least 1% of the population • SNPs are responsible for over 80% of the variation between two individuals, which makes them ideal for establishing correlations between genotype and phenotype • Because some SNPs predispose individuals to a certain disease or trait, or to react differently to a drug, they will be highly useful in diagnostics and drug development
What is BioVU? • The move towards personalized medicine requires very large sample sets for discovery and validation • BioVU: a biobank intended to support a broad view of biology and enable personalized medicine • Contains de-identified DNA extracted from leftover blood after clinically indicated testing of Vanderbilt patients who have not opted out • Linked to the Synthetic Derivative: a de-identified EMR • Current sample number: 116,551 • 105,910 adult samples • 10,641 pediatric samples
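De-identification (diagram) • Each patient identifier (e.g. John Doe) is passed through a one-way hash to give a research ID (A7CCF99DE65732….) • The EMR record is scrubbed and stored under that ID as the “synthetic derivative” (SD), which can be updated • DNA is extracted from eligible samples and carries the same ID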
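The one-way hash step can be illustrated with a minimal Python sketch. The slides do not say which hash function, key, or input identifier BioVU uses, so the HMAC-SHA-512 construction, the `secret_key` value, and the `research_id` helper below are all assumptions made for illustration only.

```python
import hashlib
import hmac


def research_id(patient_identifier: str, secret_key: bytes) -> str:
    """Map an identifier to a research ID with a keyed one-way hash.

    Hypothetical helper: HMAC-SHA-512 is an assumed choice, not
    necessarily what BioVU uses.
    """
    digest = hmac.new(secret_key, patient_identifier.encode("utf-8"), hashlib.sha512)
    return digest.hexdigest().upper()


# Illustrative values only.
secret_key = b"demo-key-not-for-production"

# The same identifier always yields the same research ID, so a scrubbed
# record and a DNA sample from one patient can be linked without a name.
record_id = research_id("MRN-0001234", secret_key)
sample_id = research_id("MRN-0001234", secret_key)
other_id = research_id("MRN-0005678", secret_key)
```

Because the mapping is deterministic, a record can be refreshed later under the same research ID, while the hash itself cannot be inverted to recover the original identifier.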
Synthetic Derivative vs. BioVU (diagram) • Synthetic Derivative: ~1.9 million scrubbed records • BioVU: ~116,000 DNA samples • Records and samples share the same hashed ID (A7CDE6532….)
The Synthetic Derivative • A derivative of the EMR: information content reduced by ‘scrubbing’ identifiers • Event dates systematically shifted • Contains ~1.9 million records • ~1 million with detailed longitudinal data • averaging 100,000 bytes in size • an average of 27 codes per record • Records are updated over time and are current through 9/31/09 • Can be searched with results restricted to records for which DNA is available
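Systematic date shifting can be sketched as follows. The slides give no details of the actual scheme, so this example assumes one constant, record-specific backward offset derived from the research ID. The `MAX_SHIFT_DAYS` bound and both helper functions are hypothetical.

```python
import hashlib
from datetime import date, timedelta

MAX_SHIFT_DAYS = 364  # hypothetical upper bound on the shift


def shift_days(research_id: str) -> int:
    """Derive a stable backward shift of 1..MAX_SHIFT_DAYS days for one record."""
    digest = hashlib.sha256(research_id.encode("utf-8")).digest()
    return -(1 + int.from_bytes(digest[:4], "big") % MAX_SHIFT_DAYS)


def shift_events(research_id: str, event_dates: list[date]) -> list[date]:
    """Apply the same offset to every event date in a record."""
    offset = timedelta(days=shift_days(research_id))
    return [d + offset for d in event_dates]


# Illustrative record: two clinical events 45 days apart.
original = [date(2008, 3, 1), date(2008, 4, 15)]
shifted = shift_events("A7CDE6532", original)

# Within a record the interval between events is preserved, which keeps
# longitudinal analyses valid even though absolute dates are hidden.
interval_before = original[1] - original[0]
interval_after = shifted[1] - shifted[0]
```

Using one offset per record, rather than a fresh random offset per event, is the design choice that keeps the order and spacing of events intact.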
Synthetic Derivative Data Types • Narratives, such as: • Clinical Notes • Discharge Summaries • History and Physicals • Problem Lists • Surgical Reports • Progress Notes • Letters • Diagnostic Codes, Procedural Codes • Forms (intake, assessment) • Reports (pathology, ECGs, echocardiograms) • Clinical Communications • Lab Values and Vital Signs • Medication Orders • TraceMaster (ECGs)
BioVU Sample Management RTS SmaRTStore
Validation in BioVU • Sample handling algorithms • Gender match • 1/384 gender mismatches • Ancestry • Characterize sample ancestry, assess usefulness of ‘race’ as defined in EMR • Provide a panel of ancestry informative markers that define ancestry • No significant difference between the concordance of self-report or observer-report with genetic ancestry • Demonstration project – American Journal of Human Genetics • Can known associations between genetic variants and common diseases be identified in the EMR?
The “demonstration project” • Genotype “high-value” SNPs in the first 8,000 samples accrued, including SNPs associated by replicated genome-wide experiments with common diseases & traits: atrial fibrillation, Crohn’s disease, multiple sclerosis, rheumatoid arthritis, type 2 diabetes • Develop Natural Language Processing methods to identify cases and controls • Are genotype-phenotype relations replicated?
First results (forest plot of odds ratios by marker; axis ticks 0.5, 1.0, 2.0, 5.0) • Atrial fibrillation: rs2200733 (Chr. 4q25), rs10033464 (Chr. 4q25) • Crohn's disease: rs11805303 (IL23R), rs17234657 (Chr. 5), rs1000113 (Chr. 5), rs17221417 (NOD2), rs2542151 (PTPN22) • Multiple sclerosis: rs3135388 (DRB1*1501), rs2104286 (IL2RA), rs6897932 (IL7RA) • Rheumatoid arthritis: rs6457617 (Chr. 6), rs6679677 (RSBN1), rs2476601 (PTPN22) • Type 2 diabetes: rs4506565 (TCF7L2), rs12255372 (TCF7L2), rs12243326 (TCF7L2), rs10811661 (CDKN2B), rs8050136 (FTO), rs5219 (KCNJ11), rs5215 (KCNJ11), rs4402960 (IGF2BP2)
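For readers unfamiliar with the odds ratio on the plot's axis, here is a minimal sketch of how one is computed from a 2x2 table of allele counts, with a 95% confidence interval from the standard log (Woolf) method. The counts are invented for illustration and are not BioVU results. The slides do not state which statistical model was used, so this is not necessarily the analysis behind the plot.

```python
import math


def odds_ratio(case_risk: int, case_other: int,
               control_risk: int, control_other: int) -> tuple[float, float, float]:
    """Return (odds ratio, lower 95% bound, upper 95% bound) for a 2x2 table.

    Rows are cases/controls; columns are risk allele / other allele.
    All four counts must be non-zero.
    """
    or_value = (case_risk * control_other) / (case_other * control_risk)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / case_risk + 1 / case_other
                   + 1 / control_risk + 1 / control_other)
    lower = math.exp(math.log(or_value) - 1.96 * se)
    upper = math.exp(math.log(or_value) + 1.96 * se)
    return or_value, lower, upper


# Hypothetical allele counts, for illustration only.
or_value, lower, upper = odds_ratio(60, 140, 40, 160)
```

An odds ratio above 1.0 whose interval excludes 1.0 indicates that the allele is more common in cases than in controls at the 5% significance level.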
Types of projects • Discovery or validation of genotype-phenotype relations for disease susceptibility or drug responses • Discovery of new disease/susceptibility genes by resequencing in patients (obesity, Cushing's, susceptibility to infection, insomnia, pre-term birth) • Access to samples without disease X, or “normals” of specified ancestry, or old normals • Phenome-wide association study (PheWAS): in development
Research Use Cases • Retrospective chart reviews • Hypothesis generation • Rapid preliminary data for grant submissions • Feasibility assessment
Examples of ICD-9 codes for rare diseases
Project workflow (diagram, built up over four slides) • Investigator query, with data use agreement + IRB approval: cases and controls • Manual review • Sample retrieval for cases and controls • Genotyping, genotype-phenotype relations
Nationally Prevalent Diseases in the African American Population