
LS-SNP: Large-scale annotation of coding non-synonymous SNPs based on multiple information sources

Bioinformatics, April 2005. LS-SNP: Large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Motivation: over 9 million SNPs in dbSNP have little functional annotation; nsSNPs are of critical importance for disease and drug sensitivity.


Presentation Transcript


  1. LS-SNP: Large-scale annotation of coding non-synonymous SNPs based on multiple information sources • Bioinformatics, April 2005

  2. Motivation • Over 9 million SNPs in dbSNP, with little functional annotation • nsSNPs are of critical importance for disease and drug sensitivity • Prediction of functional SNPs enables targeting of SNPs to be genotyped in candidate gene studies • Helps identify the causative SNP among SNPs that are in linkage disequilibrium (LD)

  3. Aims • Identify candidate functional SNPs in a: • gene • haplotype • pathway • Map nsSNPs onto protein sequences, functional pathways, and comparative structure models

  4. Predictions of SNP function • Predict positions where nsSNPs: • rule based: • destabilize proteins • interfere with the formation of domain-domain interfaces • affect protein-ligand binding • supervised learning (SVM): • severely affect human health

  5. Methods - pipeline • SNP-protein mapping • Sequence to structure (exp derived) • genomic sequence, protein sequence, protein structure • SNP prediction annotations combine: • rule based • supervised learning (SVM)

  6. SNP annotations - rule based • destabilizing (Sunyaev et al., 2001) if: • RSA (relative solvent accessibility) < 25% and difference in accessible surface propensities (knowledge-based hydrophobic potentials) > 0.75 • RSA > 50% and difference in accessible surface propensities > 2 • RSA < 25% and charge change • variant involves a proline in a helix
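
The four destabilization rules on this slide can be sketched as a small predicate. This is only an illustration of the thresholds as listed, not the LS-SNP code: the `Substitution` record, its field names, and the numbering of the rules in slide order are assumptions, and the propensity difference is taken as an already-computed magnitude.

```python
from dataclasses import dataclass


@dataclass
class Substitution:
    """Hypothetical per-variant record; field names are illustrative."""
    rsa: float               # relative solvent accessibility, in percent (0-100)
    propensity_diff: float   # difference in accessible surface propensity
    charge_change: bool      # substitution changes the residue charge
    proline_in_helix: bool   # variant involves a proline in a helix


def destabilizing_rules(s: Substitution) -> list[int]:
    """Return the numbers (1-4, in slide order) of the rules that fire."""
    fired = []
    if s.rsa < 25 and s.propensity_diff > 0.75:   # rule 1: buried, propensity change
        fired.append(1)
    if s.rsa > 50 and s.propensity_diff > 2:      # rule 2: exposed, large change
        fired.append(2)
    if s.rsa < 25 and s.charge_change:            # rule 3: buried charge change
        fired.append(3)
    if s.proline_in_helix:                        # rule 4: proline in a helix
        fired.append(4)
    return fired


def is_destabilizing(s: Substitution) -> bool:
    """A variant is flagged if at least one rule fires."""
    return bool(destabilizing_rules(s))
```

Returning the list of fired rules, rather than a single boolean, keeps the reason for each flag available for later filtering.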

  7. rule based (cont.) • Interference with a domain-domain interface is predicted if: • any of the 4 rules, combined with • the residue being within 6 Å of an atom in an adjacent domain • An effect on protein-ligand binding is predicted if: • any of the 4 rules, combined with • the residue being within 5 Å of a HETATM • (ligand: not covalently bonded to the protein, not one of the 20 amino acids, and not a water molecule)
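
The two distance criteria can be sketched as follows, assuming atoms are given as (x, y, z) coordinates in Å and that the ligand atoms have already been filtered to qualifying HETATM records (non-covalent, not a standard amino acid, not water). The function and argument names are hypothetical.

```python
import math

INTERFACE_CUTOFF = 6.0  # Å, to any atom in an adjacent domain (from the slide)
LIGAND_CUTOFF = 5.0     # Å, to any qualifying HETATM atom (from the slide)


def min_distance(atoms_a, atoms_b):
    """Smallest pairwise distance between two atom sets; inf if either is empty."""
    return min(
        (math.dist(a, b) for a in atoms_a for b in atoms_b),
        default=math.inf,
    )


def annotate_contacts(rule_fired, residue_atoms, adjacent_domain_atoms, ligand_atoms):
    """Combine the rule-based flag with the two distance cutoffs.

    rule_fired: True if any of the 4 destabilization rules applies.
    ligand_atoms: assumed to be pre-filtered HETATM coordinates.
    """
    return {
        "domain_interface": rule_fired
        and min_distance(residue_atoms, adjacent_domain_atoms) <= INTERFACE_CUTOFF,
        "ligand_binding": rule_fired
        and min_distance(residue_atoms, ligand_atoms) <= LIGAND_CUTOFF,
    }
```

For example, a residue atom at the origin with an adjacent-domain atom 5.5 Å away and a ligand atom 5.5 Å away would be flagged for the interface but not for ligand binding.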

  8. SNP annotations - supervised learning (SVM) • train an SVM to discriminate between monogenic disease nsSNPs from OMIM and neutral SNPs from dbSNP • slide labels: (measure of strain), (chemical similarity)

  9. SVM – training dataset • 1457 disease-associated • VARIANTS in Swiss-Prot and OMIM • 2504 neutral • neutral VARIANTS according to rules 1-4 • 3-fold cross-validation • train on subsets 1 and 2, test on subset 3 • repeated 10 times
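
The split scheme can be sketched with the standard library alone. The slide names only the "train on 1 and 2, test on 3" case; the rotation of the test fold below is the usual 3-fold convention and is an assumption here, as are the function name and the seed.

```python
import random


def repeated_three_fold(n_items, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated 3-fold CV.

    Each repeat reshuffles the indices, cuts them into 3 folds, and lets
    each fold serve once as the test set (assumed rotation).
    """
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_items))
        rng.shuffle(idx)
        folds = [idx[i::3] for i in range(3)]  # three near-equal subsets
        for k in range(3):
            test_idx = folds[k]
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield r, k, train_idx, test_idx


# Dataset size taken from the slide: 1457 disease-associated + 2504 neutral variants.
n_variants = 1457 + 2504
splits = list(repeated_three_fold(n_variants))
```

With 10 repeats and 3 folds, `splits` holds 30 train/test partitions; a classifier would be fitted on each `train_idx` and scored on the matching `test_idx`.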

  10. SVM – training dataset (cont.) • the absolute value of the SVM score gives the confidence • low-confidence predictions are excluded • accuracy of 80.5% (±0.3%) • false positives 19.7% (±0.2%) • false negatives 18.7% (±0.8%) • 122 rejected on low confidence
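
A minimal sketch of confidence-based rejection and the resulting rates follows. The threshold value is a placeholder, and the rate definitions (false positives over all neutral variants kept, false negatives over all disease variants kept) are assumptions, since the slide does not state either.

```python
def evaluate(scores, labels, reject_below=0.0):
    """Score signed classifier outputs after rejecting low-confidence ones.

    scores: signed outputs, positive meaning "disease-associated".
    labels: +1 for disease-associated, -1 for neutral.
    reject_below: hypothetical confidence threshold on |score|.
    """
    kept = [(s, y) for s, y in zip(scores, labels) if abs(s) >= reject_below]
    tp = sum(1 for s, y in kept if s > 0 and y > 0)
    tn = sum(1 for s, y in kept if s <= 0 and y < 0)
    fp = sum(1 for s, y in kept if s > 0 and y < 0)
    fn = sum(1 for s, y in kept if s <= 0 and y > 0)

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "rejected": len(scores) - len(kept),
        "accuracy": ratio(tp + tn, len(kept)),
        "false_positive_rate": ratio(fp, fp + tn),   # assumed definition
        "false_negative_rate": ratio(fn, fn + tp),   # assumed definition
    }
```

Raising `reject_below` trades coverage for reliability: more predictions are withheld, and the rates are computed only over those that remain.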

  11. Results - mapping • SNP-to-protein mapping • 28,043 (21,255 dbSNP) validated coding nsSNPs • 70,147 (54,048 dbSNP) including non-validated

  12. Results - structure • 13,391 (53%) proteins have modelled domains with equivalent residues • 13,062 (19%) nsSNPs (all) • 8725 (31%) nsSNPs (validated) • 67 nsSNPs appear in more than one protein (alternative splicing)

  13. Results - function • 1886 destabilizing nsSNPs (structural rules 1-4) • 1317 monogenic disease-associated nsSNPs by SVM • comparative models • conservation • sub properties

  14. Web resource: http://alto.compbio.ucsf.edu/LS-SNP/ • SCOP • Swiss-Prot • KEGG • UCSC • PDBsum • ModBase • Filter by: KEGG pathway, SNP ID (rs), HUGO, Swiss-Prot

  15. genomic seq • protein seq

  16. structure

  17. SNP prediction annotations

  18. Discussion - data quality • validated/non-validated SNPs? • multiple independent submissions • submitter confirmation • alleles observed in at least 2 chromosomes • submission to HapMap • report non-validated and validated SNPs, with option to filter

  19. Discussion - ligands • the local structural environment of each SNP-ligand contact cannot be evaluated by the pipeline • all contacts are reported • some will not be biologically interesting • e.g. a SNP in proximity of glycerol will have no functional effect • but in glycerol kinase, the SNP could be important

  20. Discussion - structural annotations • ModSNP: 4109 structural annotations; 70% sequence identity cutoff • LS-SNP: structural annotations for 13,062 dbSNP rsIDs (4907 validated); no sequence identity cutoff • instead, a score (0-1) is given based on sequence identity and model assessment (average identity ~28%)

  21. Discussion - structural annotations • ‘…because structure annotations are models, use properties that depend on correct fold assignment and a good target-template alignment, as opposed to atomic-level structural details such as loss of salt bridges, hydrogen bonds or disulphide bonds.’

  22. Discussion - structural annotations • not possible to model effects such as changes in backbone geometry • or small side-chain alterations

  23. Case study - Glutathione S-Transferase • GSTs play a key role in cellular detoxification • domain interface • buried charge change • unfavourable change in accessible surface potential at a buried position • conserved in mouse, rat, chicken • the combination of information sources builds a convincing case

  24. Caveats • only updated twice a year • dependent on structure (comparative modelling) • allowing predictions without structure data would have increased numbers • no option to add your own SNPs • no indication of which predictors are best • combinations of predictors • domain-domain or ligand binding flagged, but no indication of how damaging this might be • next version will have HapMap SNPs • SVM – monogenic • only chose a small subset of Sunyaev's rules - conservation?
