Selecting SNPs in high-throughput genotyping projects. Joaquín Dopazo, Centro de Investigación Príncipe Felipe, Valencia. http://bioinfo.cipf.es http://www.pupasnp.org http://pupasview.bioinfo.cipf.es. Design and implementation of a pipeline for high-throughput genotyping at the CeGen.
Selecting SNPs in high-throughput genotyping projects Joaquín Dopazo Centro de Investigación Príncipe Felipe Valencia http://bioinfo.cipf.es http://www.pupasnp.org http://pupasview.bioinfo.cipf.es
Design and implementation of a pipeline for high-throughput genotyping at the CeGen (CeGen’s pilot project)
• Experimental design (linkage, pathway, etc.)
• Problem 1: feed the monster. E.g. Illumina: 150,000 genotypes at a time. Computer-aided selection with PupaSNP and PupasView (Conde et al., 2004, 2005, NAR). October 2004: 45,000 SNPs designed.
• Problem 2: store results... Cancer SNPs DB server, along with clinical data.
• Problem 3: query the database... and submit to analysis programs: LD, case-control, haplotypes, odds ratios, etc.
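Odds ratios are listed above among the case-control analyses, but the slides do not say how they are computed. As a minimal sketch only, the following Python function computes the odds ratio from a 2x2 table of carriers and non-carriers in cases and controls, with an approximate 95% confidence interval on the log scale (Woolf's method). The function name, argument names, and the choice of interval are assumptions made for illustration, not the method used by the CeGen pipeline.

```python
import math


def odds_ratio(case_carriers, case_noncarriers, control_carriers, control_noncarriers):
    """Return (odds ratio, lower 95% bound, upper 95% bound) for a 2x2 table.

    Hypothetical helper for illustration; the interval uses the standard
    error of log(OR), which requires all four counts to be non-zero.
    """
    counts = (case_carriers, case_noncarriers, control_carriers, control_noncarriers)
    if any(c <= 0 for c in counts):
        raise ValueError("all four cell counts must be positive")

    a, b, c, d = counts
    # Odds of carrying the allele in cases divided by the odds in controls.
    estimate = (a * d) / (b * c)
    # Standard error of log(OR).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - 1.96 * se)
    upper = math.exp(math.log(estimate) + 1.96 * se)
    return estimate, lower, upper


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(odds_ratio(30, 70, 15, 85))
```

An interval that excludes 1 suggests an association between the allele and case status under this simple model; real analyses would also have to account for multiple testing and population structure.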
Genotyping using SNPs
• Simplest and most frequent type of DNA variation in humans. (Collins et al., Nature, 2003)
• Valuable as genetic markers due to their widespread distribution. (Risch, Nature, 2000)
• Linkage analysis has succeeded in identifying genes responsible for Mendelian traits (diseases, susceptibility to drugs, etc.). (Risch, Nature, 2000)
• In complex multifactorial diseases, linkage analysis has not been so successful due to the weakness of the associations. (Hugot et al., Nature, 2001)
• SOLUTIONS:
  • Increasing sample size (obvious, although usually not possible at the required level).
  • Functional SNPs will be an important factor for increasing the sensitivity of association tests. (Hugot et al., Nature, 2001)
• How to predict functionality?
SNPs that can affect transcription and/or gene products
• Amino acid change
• Intron/exon junctions
• ESE (exonic splicing enhancers) motifs recognized by SR proteins: SF2/ASF, SC35, SRp40, SRp55 (Cartegni et al., Nature Rev. Genet., 2002; Cartegni et al., Nucleic Acids Res., 2003)
• Splicing inhibitors
• Transfac TFBSs (Wingender et al., Nucleic Acids Res., 2000)
• Triplex-forming sequences (Goñi et al., Nucleic Acids Res. 32:354-60, 2004)
• Human-mouse conserved regions
Pipeline of Analysis
• Objective: linkage, association, ...
• Selection of SNPs
• Genotyping
• Analysis of the results
For some platforms
Problem: high-throughput selection of SNPs.
• First step: focusing on lists of genes. In silico search of SNPs with potential phenotypic effect – PupaSNPFinder.
• Second step: focusing on genes. Filter SNPs using information on functionality, frequencies and LD – PupasView.
First step: high-throughput search for putative functional SNPs. PupaSNP for... linkage or association. http://www.pupasnp.org (Conde et al., Nucleic Acids Res., 2004)
PupaSNP – Results
• Genes are linked to the Ensembl Genome Browser.
• InterPro domains are also linked to Ensembl.
• Accession numbers, peptide location and peptide variation of coding SNPs.
• NCBI accession numbers of dbSNP are linked to frequency data if available.
PupasView: sequential filtering of SNPs. Information is applied as sequential filters: functional, frequencies, LDs, then the final selection. (Conde et al., Nucleic Acids Res., 2005)
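The sequential filtering described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PupasView's implementation: the `Snp` record, the `functional` flag, the thresholds (`min_maf`, `max_r2`) and the greedy LD step are all hypothetical choices, and the pairwise r² values are assumed to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    """Hypothetical SNP record used only for this sketch."""
    rs_id: str
    position: int
    functional: bool  # assumed flag: predicted to have a functional effect
    maf: float        # minor allele frequency in the population of interest


def sequential_filter(snps, r2, min_maf=0.05, max_r2=0.8):
    """Apply functional, frequency and LD filters in that order.

    `r2` maps frozenset({rs_id_1, rs_id_2}) to a pairwise r^2 value;
    pairs missing from the mapping are treated as unlinked (r^2 = 0).
    """
    # Filter 1: keep only SNPs flagged as putatively functional.
    kept = [s for s in snps if s.functional]
    # Filter 2: drop SNPs that are too rare to be informative.
    kept = [s for s in kept if s.maf >= min_maf]
    # Filter 3: greedy LD pruning along the chromosome. A SNP is selected
    # only if it is not in strong LD with any SNP already selected.
    selected = []
    for snp in sorted(kept, key=lambda s: s.position):
        if all(r2.get(frozenset((snp.rs_id, t.rs_id)), 0.0) < max_r2 for t in selected):
            selected.append(snp)
    return selected


if __name__ == "__main__":
    # Made-up example data.
    snps = [
        Snp("rsA", 100, True, 0.20),
        Snp("rsB", 200, True, 0.30),
        Snp("rsC", 300, False, 0.40),
        Snp("rsD", 400, True, 0.01),
        Snp("rsE", 500, True, 0.10),
    ]
    ld = {frozenset(("rsA", "rsB")): 0.95}
    print([s.rs_id for s in sequential_filter(snps, ld)])
```

Applying the cheapest and most selective criteria first keeps the LD step, which compares pairs of SNPs, working on a short list. The order of the filters follows the slide; the thresholds would be set per project.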
Cancer SNP DB: one of the pilot projects of CeGen
• The database
• Interface to add/edit clinical data
• Query the database
• Submit data to analysis programs