
Jayne Duncan FRCPath Course 2010

Look at the Emerging Technologies and consider their Application in the Diagnostic Environment Bioinformatic Tools. Jayne Duncan FRCPath Course 2010. Keywords. Bioinformatics Variants of unknown clinical significance Non-synonymous Missense variants Splicing variants


Presentation Transcript


  1. Look at the Emerging Technologies and consider their Application in the Diagnostic Environment: Bioinformatic Tools. Jayne Duncan, FRCPath Course 2010

  2. Keywords • Bioinformatics • Variants of unknown clinical significance • Non-synonymous Missense variants • Splicing variants • Guidelines

  3. Bioinformatic Tools • Bioinformatic tools use computers and statistical techniques to analyse biological data. They are used in the diagnostic lab for the interpretation of: • non-synonymous missense mutations of unknown clinical significance • splice variants that lie outside the canonical splice acceptor and donor sites • Their use in the laboratory has increased over the last few years as a result of: - Increased scale and sensitivity of genetic analysis - Increased use of Sanger sequencing to screen candidate genes to diagnose single gene disorders. • With the advent of next generation sequencing technologies and the ability to sequence the entire human genome, their use in the diagnostic lab will increase further.

  4. Establishing Guidelines • Essential to have a set of agreed guidelines; • to assist in the determination of the clinical significance of variants identified in routine screening. • to educate referring clinicians so that they may inform their patients and families appropriately. • Guidelines ratified by the CMGS and the Dutch Society of Clinical Genetic Laboratory Specialists were drawn up in 2008. • Applicable to “the interpretation and reporting of sequence variants of uncertain pathogenicity in genes known to cause inherited Mendelian disease in which molecular genetic testing has a proven clinical validity and utility”.

  5. Proposed Guidelines for Hereditary Breast Cancer • Prior to the CMGS guidelines, Vink et al 2005 proposed guidelines for the interpretation of variants of unknown clinical significance (UVs) in hereditary breast cancer. • All variants for which pathogenicity is not demonstrated or excluded in peer-reviewed published literature, in a mutation database, or on the basis of the laboratory's own findings are classified as UVs. • Patients are informed by the genetic counsellor of the possibility of finding a UV prior to mutation screening. • The diagnostic lab reports a detected UV to the requesting counsellor, who in turn reports it to the patient. • The uncertainties surrounding the pathogenicity of the UV are discussed, as is the possibility of classification of the UV after further research. An explanation that this may involve the co-operation of the patients and their relatives should also be given. • Presymptomatic testing of family members is not offered. Surveillance is offered on the basis of the family history. If this fits a hereditary breast cancer syndrome, surveillance is offered as in families with a BRCA1/2 mutation. • Patients can request prophylactic surgery, but the decision to perform it should be based on the family history and not influenced by the detection of the UV.

  6. Proposed Guidelines continued • Understanding the clinical significance of UVs requires a multidisciplinary approach involving: • Protein function studies • Evolutionary gene sequence conservation • Linkage analysis • Population genetics studies • Other important studies for clarification of individual variants include: co-segregation analysis, RNA analysis and LOH analysis in tumour tissue. • Incorporating these data into a public resource will allow increased consistency in the reporting of UVs and clarification of cancer risk, leading to patients receiving more balanced information about their cancer risk.

  7. Interpretation of non-synonymous missense variants • Non-synonymous variants are single base pair substitutions in the DNA that alter the amino acid in the resulting protein. • CMGS guidelines recommend the use of specific tools for the interpretation of such variants. These include: • Polyphen • SIFT • Align-GVGD • Other available sites include: PMut, SNP3D and Panther (output is in the form of a probability, no need for alignments)

  8. Polyphen • Polyphen (Polymorphism phenotyping) is a freely available web-based tool. • It considers evolutionary conservation (through multiple sequence alignment), physicochemical differences, and the proximity of the substitution to predicted functional domains and/or structural features. • It specifically uses annotated UniProt entries to predict whether the amino acid substitution occurs within an important structural or functional site, for example active or binding sites and residues involved in disulphide formation. • Predictive value (accuracy in correctly calling pathogenic mutations) is reliant on the protein of interest having a known annotated crystal structure, or the presence of a similar modelled protein in the UniProt database. • Its scores can be classified as probably damaging (≥2.00), possibly damaging (1.50-1.99), potentially damaging (1.25-1.49), borderline (1.00-1.24) or benign (0.00-0.99). • Recently the Polyphen-2 algorithm has replaced Polyphen. According to Adzhubei et al 2010, Polyphen-2 differs from the original Polyphen in the set of predictive features, the alignment pipeline and the method of classification. As with the original Polyphen, however, the user is unable to input their own alignment into the software.
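
The score bands listed on the slide above can be expressed as a simple lookup. The following is a minimal sketch only: the function name and table are hypothetical and are not part of Polyphen itself, and it assumes the bands are contiguous (so a score such as 1.995, which falls between the printed ranges, is assigned to the lower band).

```python
# Lower bound of each band, taken from the slide (original Polyphen scores).
# POLYPHEN_BANDS and classify_polyphen_score are illustrative names only.
POLYPHEN_BANDS = [
    (2.00, "probably damaging"),
    (1.50, "possibly damaging"),
    (1.25, "potentially damaging"),
    (1.00, "borderline"),
    (0.00, "benign"),
]


def classify_polyphen_score(score: float) -> str:
    """Return the slide's label for an original-Polyphen score."""
    if score < 0:
        raise ValueError("score is expected to be non-negative")
    # Bands are ordered from highest to lowest lower bound, so the first
    # bound the score reaches is the correct band.
    for lower_bound, label in POLYPHEN_BANDS:
        if score >= lower_bound:
            return label
    raise ValueError("score could not be classified")


if __name__ == "__main__":
    for example in (2.3, 1.6, 1.3, 1.1, 0.4):
        print(example, classify_polyphen_score(example))
```

A label produced this way is only a reading of the score; as the later slides stress, it is not sufficient evidence of pathogenicity on its own.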

  9. Case studies supporting the use of Polyphen • Lee et al 2007 sequenced the BRCA1 and BRCA2 genes in 1469 population-based female breast cancer patients diagnosed between 20 and 49 years of age. • 147 UVs were detected and classified as high risk or low risk using 5 methods, including the Polyphen algorithm, sequence conservation, Grantham matrix scores and a combination of Grantham matrix score and sequence conservation. • The study also examined whether women with high risk UVs have characteristics similar to those with known deleterious mutations (e.g. early age at diagnosis, family history and oestrogen/progesterone receptor negative tumours). • All 5 classification methods yielded similar results. However, Polyphen was better at isolating BRCA1 UV carriers likely to have a family history of breast or ovarian cancer, and may help classify BRCA1 variants.

  10. Overview of Polyphen

  11. Polyphen: A Worked Example

  12. SIFT • Sort Intolerant from Tolerant (SIFT) is also a freely available web-based tool. • It uses sequence homology of related proteins to predict whether an amino acid substitution is likely to be deleterious to protein function, based on the degree of conservation of the amino acid through evolution. • Orthologous or paralogous sequences can be utilised in the evolutionary sequence alignment. • Using orthologous sequences increases the predictive value of SIFT, as the encoded proteins are expected to have the same function. • SIFT can choose homologous sequences automatically, or the user can submit selected pre-aligned sequences to the programme. • An amino acid that is not present at the substitution site in the multiple alignment can still be predicted to be tolerated if there is an amino acid in the alignment that has a similar charge or hydrophobicity. This may not reflect the true in vivo situation.

  13. Case studies supporting the use of SIFT • Flanagan et al 2010 tested the predictive value of SIFT and Polyphen on 141 missense variants (131 known pathogenic, of which 66 gain of function and 67 loss of function and 8 known neutral polymorphisms) identified in the ABCC8, GCK and KCNJ11 genes. • SIFT and Polyphen both predicted the pathogenicity of 69.5% of missense variants. When they were used individually this rose to 84%, demonstrating a lack of concordance between programmes. • When results were combined only 56% of variants were called correctly. • Both programmes were better at predicting loss of function mutations than gain of function mutations. • Reasons for this are unknown; however, it is possible that the substitution of one amino acid for another with a large change in physicochemical properties will cause a loss of function as a result of protein misfolding. Such large amino acid changes are likely to increase the confidence with which SIFT and Polyphen make their predictions. • Gain of function mutations may have a more subtle effect on protein structure, lowering the confidence with which the programmes can make their prediction. • Gain of function mutations are also predicted to be less common, therefore the data sets on which predictions are based are likely to be limited. • Taken together, these two limitations will result in a less pronounced change in the parameters SIFT and Polyphen use to predict pathogenicity for gain of function compared to loss of function mutations, resulting in many being classified as benign.

  14. Overview of SIFT

  15. SIFT: A Worked Example

  16. Align-GVGD • Align-GVGD combines Grantham Variation (GV) (how much evolutionary variation there is at a given position in the alignment) and Grantham Deviation (GD) (the difference between the amino acids observed through evolution and the variant) to give a Grantham score. • Only the most extreme values are classified as most and least likely to interfere with protein function. • Align-GVGD is highly dependent on the alignment used.
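
To make GV and GD more concrete, the sketch below computes both for a single alignment column. It is an illustration under stated assumptions, not the Align-GVGD implementation: it assumes GV is the Grantham-weighted distance spanning the range of composition, polarity and volume seen in the column, and GD is the Grantham-weighted distance from the variant residue to the nearest edge of that range. The weighting constants follow Grantham's distance formula; the property table covers only four residues and should be checked against the published table before any real use. All function names are hypothetical.

```python
import math

# Weights and scaling factor from Grantham's amino acid distance formula.
ALPHA, BETA, GAMMA = 1.833, 0.1018, 0.000399
SCALE = 50.723

# (composition, polarity, volume) for a few residues only.
# Illustrative subset: verify against the published table before relying on it.
PROPERTIES = {
    "L": (0.00, 4.9, 111.0),
    "I": (0.00, 5.2, 111.0),
    "S": (1.42, 9.2, 32.0),
    "R": (0.65, 10.5, 124.0),
}


def _weighted_distance(d_comp, d_pol, d_vol):
    """Grantham-style weighted distance for three property differences."""
    return SCALE * math.sqrt(
        ALPHA * d_comp ** 2 + BETA * d_pol ** 2 + GAMMA * d_vol ** 2
    )


def grantham_variation(column):
    """GV: spread of the residues observed at one alignment position."""
    per_property = zip(*(PROPERTIES[aa] for aa in column))
    spans = [max(values) - min(values) for values in per_property]
    return _weighted_distance(*spans)


def grantham_deviation(column, variant):
    """GD: how far the variant lies outside the observed property ranges."""
    per_property = zip(*(PROPERTIES[aa] for aa in column))
    deviations = []
    for value, observed in zip(PROPERTIES[variant], per_property):
        low, high = min(observed), max(observed)
        # Zero when the variant's value falls inside the observed range.
        deviations.append(max(low - value, value - high, 0.0))
    return _weighted_distance(*deviations)


if __name__ == "__main__":
    column = ["L", "I"]  # residues seen at this position across species
    print(round(grantham_variation(column), 1))
    print(round(grantham_deviation(column, "R"), 1))
```

In this toy column the position is tightly conserved (low GV) and arginine lies well outside the observed range (high GD), which is the combination the slide describes as most likely to interfere with function. Changing the sequences in the column changes both values, which is why the result depends so heavily on the alignment.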

  17. Case studies supporting the use of Align-GVGD • Mathe et al 2006 carried out a three step analysis of 1514 missense substitutions in the DNA Binding Domain of TP53, the most frequently mutated gene in human cancers. • Using multiple sequence alignment for each substitution they calculated the GV and the GD. • They then used Align-GVGD to predict the transactivation of each missense substitution. • They compared the predictions against experimentally measured transactivation activity (yeast assays) and predictions made by SIFT to evaluate accuracy. • Predictions showed a high degree of accuracy for mutants showing a loss of transactivation (~88%) with lower prediction accuracy for mutants with a transactivation similar to wild type. • Align-GVGD results were comparable to SIFT and indicate that Align-GVGD can be used as a UV prediction tool.

  18. Align-GVGD: A Worked Example

  19. Interpretation of Non-synonymous Variants • No bioinformatic tool is 100% accurate at predicting pathogenicity. • Results should be interpreted with caution and backed up with further functional studies. • Analysis of missense variants should be performed using at least two different programmes, as conflicting results can be generated. • This must be taken into account when deciding the likelihood of pathogenicity.

  20. Splice Site Prediction Tools • There are several splice prediction tools commonly used by diagnostic laboratories. • None of these have been fully validated for use in a diagnostic setting and so must be used with caution. • The user is able to adjust the settings on these sites and no information is available on how best to alter these settings. • According to the CMGS best practice guidelines users should use the default settings unless otherwise stated. • Laboratories should be aware that any sequence changes and not just those adjacent to intron/exon boundaries may actually be splice site mutations. • Silent and missense mutations should also be analysed for an effect on splicing, especially when AG or GT dinucleotide sequences are formed.

  21. Analysis of Splice Site Tools • NGRL have compiled a splice site tools analysis report to assess the performance of a number of tools in the prediction of splicing-related variant pathogenicity. • The report also assessed the scope of the splice site prediction tools to ensure they could be used in the most appropriate way. • The analysis allows scientists to use splice site prediction tools in the prediction of pathogenicity with more confidence.

  22. Tools Chosen for Analysis • Included: GeneSplicer, Human Splice Finder, MaxEntScan, NetGene2, NNSplice and SSFL. • In each algorithm the splice signal given by the wild type sequence is compared to the splice site signal given by the mutated sequence, supplied by the user. • All methods were chosen because they had been recommended for use by the UV Guidelines (MaxEntScan was chosen because it is included in the HSF and Alamut splicing interfaces).

  23. Methods • Pathogenic and non-pathogenic splice site related variants were retrieved from the literature for analysis. The majority of pathogenic variants were located between 1 and 10 nucleotides from the splice junction. >40 were found within exons, and pathogenic variants were also found >100bp from the splice junction. • Only 15 non-pathogenic mutations were found, and they mainly occurred at positions further away from the splice site junction. The low number reflects the non-reporting of negative results and increases the error rate of the specificity scores. • Default settings and recommended lengths of sequence were used. • Sensitivity (proportion of true positives), specificity (proportion of true negatives) and accuracy (true positives plus true negatives as a proportion of all variants) were measured, as well as the standard errors for each of the statistics. • A change in splice site signal of ≥10% was predicted to cause a pathogenic effect. • The UV Guidelines recommend the use of three algorithms to give a consensus prediction. To assign a variant as pathogenic, two algorithms had to agree. • To test the range of predictions made by the algorithms at each intronic position near the splice site junction, thirteen acceptor and donor splice site junctions from BRCA1 and BRCA2 were analysed. • The wild type base at each position from +1 to +10 and from -1 to -10 was artificially mutated in silico to each of the remaining three nucleotides, and the proportional change in splice signal was recorded for each algorithm.
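
The scoring rules in the methods above can be sketched in a few lines. This is a hedged illustration, not the NGRL analysis code: all function names are hypothetical, the splice scores are assumed to be supplied by the user from whichever tools were run, and the ≥10% rule is applied here to the absolute change relative to wild type (the slide does not state whether only decreases were counted).

```python
def single_base_mutants(sequence, index):
    """Return the three sequences with the base at `index` substituted."""
    original = sequence[index]
    return [
        sequence[:index] + base + sequence[index + 1:]
        for base in "ACGT"
        if base != original
    ]


def proportional_change(wild_type_score, mutant_score):
    """Absolute change in splice signal as a fraction of the wild type score."""
    if wild_type_score == 0:
        raise ValueError("wild type score must be non-zero")
    return abs(mutant_score - wild_type_score) / abs(wild_type_score)


def predicts_pathogenic(wild_type_score, mutant_score, threshold=0.10):
    """Apply the slide's rule: a change of at least 10% is called pathogenic."""
    return proportional_change(wild_type_score, mutant_score) >= threshold


def consensus_call(scores_by_tool, min_agreeing=2):
    """Call pathogenic when at least `min_agreeing` tools pass the threshold.

    `scores_by_tool` maps a tool name to (wild_type_score, mutant_score).
    """
    votes = sum(
        predicts_pathogenic(wild_type, mutant)
        for wild_type, mutant in scores_by_tool.values()
    )
    return votes >= min_agreeing


def performance(predictions, truths):
    """Sensitivity, specificity and accuracy from paired boolean lists."""
    pairs = list(zip(predictions, truths))
    tp = sum(1 for p, t in pairs if p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


if __name__ == "__main__":
    # Made-up scores purely to show the call pattern.
    example = {"tool_a": (10.0, 8.0), "tool_b": (0.9, 0.88), "tool_c": (85.0, 60.0)}
    print(consensus_call(example))
    print(single_base_mutants("GTAAGT", 0))
```

With only 15 non-pathogenic variants in the data set, the specificity returned by a function like `performance` rests on very few true negatives, which is the source of the larger error the slide mentions.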

  24. Results • Sensitivity, specificity and accuracy scores showed NNSplice, MaxEntScan, GeneSplicer and SSFL performed the best, with between 80 and 92% accuracy and sensitivity. • The removal of variants occurring at the +1, +2, -1 and -2 positions reduced the performance of the algorithms, as expected, since these always disrupt splice site signalling. • MaxEntScan and NNSplice still achieved an accuracy of >80%. Therefore these algorithms perform reasonably well even with variants where it is more difficult to predict the splicing effect. • The tools were most useful for the prediction of pathogenic and non-pathogenic variants when applied to positions from +3 to +7 and from -3 to at least -10. At positions further from the splice site no disruption was seen. • The scope of these tools can be defined as the prediction of the disruption of splice sites within these regions. The effect of variants on splice sites further than this cannot be predicted by any of the algorithms. • The tools can however predict new splice sites at other positions. This could occur if the variant caused the sequence surrounding the new splice site to become a closer match to the statistical models used by the tools.

  25. Results- Splice signal Strength

  26. Results continued • The accuracy obtained by combining results from three algorithms as described in the UV guidelines did not improve the prediction rate of splice site junction variants. • However, as the Alamut interface performs all four analyses (SSFL, MaxEntScan, NNSplice and GeneSplicer) simultaneously, it is easy to compare the predictions without a formal consensus method.

  27. Results-Accuracy

  28. Predicting Splicing Motifs • Methods such as ESE finder are available to predict splicing enhancer or silencer motifs and branch point motifs. • These methods have not been assessed for use in the diagnostic laboratory. • The mechanisms by which these motifs regulate splicing are less clearly understood, so these methods should only be used with caution. • As with the tools used for predicting the pathogenicity of UVs, algorithms alone are not sufficient evidence for a clinical decision. • Results should always be backed up with further evidence.

  29. References • Adzhubei et al (2010) A method and server for predicting damaging missense mutations. Nature Methods 7 (4) 248-249. • Bell et al (2008) CMGS Practice Guidelines for the Interpretation and Reporting of Unclassified Variants in Clinical Molecular Genetics. CMGS website. • Flanagan et al (2010) Using SIFT and Polyphen to predict loss of function and gain of function mutations. Genetic Testing and Molecular Biomarkers 14 (4) 533-537. • Kumar et al (2009) Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4 (8) 1073-82. • Lee et al (2007) Evaluation of unclassified variants in the breast cancer susceptibility genes BRCA1 and BRCA2 using five methods: results from a population-based study of young breast cancer patients. Breast Cancer Research 10 (1) 1-12. • Mathe et al (2006) Computational approaches for predicting the biological effect of p53 missense mutations: a comparison of three sequence analysis based methods. Nuc Acids Res 34 (5) 1317-1325. • NGRL Splice Site Tools; A Comparative Analysis Report. Beth Hellen 2009 NGRL Best practice guidelines for UVs. • Vink et al (2005) Unclassified variants in disease-causing genes: non-uniformity of genetic testing and counselling, a proposal for guidelines. Eur J Hum Genet 13 525-527.
