FUNCTIONAL GENOMICS FOR HEALTH Charles H.C.M. Buys Dept. of Medical Genetics, University of Groningen (c.h.c.m.buys@medgen.umcg.nl) med.gen.rug.
International Consortium Completes Human Genome Project: All Goals Achieved; New Vision for Genome Research Unveiled
BETHESDA, Md., April 14, 2003 - The International Human Genome Sequencing Consortium, led in the United States by the National Human Genome Research Institute (NHGRI) and the Department of Energy (DOE), today announced the successful completion of the Human Genome Project more than two years ahead of schedule.
The Human Genome
- 24,000 protein-coding genes
- 1,700 genes correlated with disease
- 44,500 mutations in these 1,700 genes
The human MMR system [figure: heterodimers MSH2/MSH6, MSH2/MSH3, MLH1/PMS2, MLH1/MLH3 or PMS1]
DGGE analysis of hMSH2
- mutation de novo
- segregation with disease within pedigrees
- absence in control individuals
- change of amino acid (aa) polarity or size
- aa change in an evolutionarily conserved domain or in a domain shared within a protein family
- effect in a functional in vitro system or in an animal model
Beta-galactosidase expression of two-hybrid clones [figure labels: activating domain, DNA-binding domain]
Genes at 3p21.3: RBM6, RBM5, GNAT1, SEMA3F, SLC38A3/G17, GNAI2, SEMA3B, IFRD2, HYAL3, FUS2, HYAL1, HYAL2, FUS1, RASSF1, BLU, NPRL2, 101F6, PL6, CACNA2D2
Cell lines shown: NCI-H1450, GLC20, NCI-H740
3p21 deletion in GLC20
Genes: FUS2, IFRD2, CACNA2D2, GNAI2, 101F6, BLU/ZMYND10, RBM6, FUS1/PDAP2, HYAL2, HYAL1, G17/SLC38A3, GNAT1, SEMA3F, SEMA3B, RBM5, RASSF1, PL6, HYAL3, NPR2L
Map orientation: TEL. to CEN.
PAC clones: PAC-1, PAC-2, PAC-3, PAC-4, PAC-5, PAC-6, PAC-7, PAC-8
Transfection [diagram labels: PAC-1 + tumour cell line, PAC-2 + tumour cell line, clone]
Test of PAC integration
Copy number determination [figure lanes, two panels: GLC45, clone 1, clone 2, clone 3, clone 4]
Test of transfectants:
• Find marker differences between parental cell line and PAC
• Test transfectants for presence of PAC markers
• Test increase of expression of transfected genes
• Find UTR differences between parental and PAC genes
• Test presence and expression of transfected genes
Tumourigenicity test [diagram labels: tumour cell line, PAC-1, PAC-2, clone 1]
Roberts (2004) The Scientist 27 Sep, 22-24
GENE THERAPY
Vectors: viral, nonviral
Host cells: tissue biology, target specificity
Problems:
- low proportion of transduced cells
- rapid loss of gene expression
- risk of oncogenic transformation by insertional mutagenesis
- antiviral or inflammatory responses
Gene therapy is still years away from the clinic
Lieberman et al (2003) Trends Mol Med 9: 397-403
RNAi: BREAKTHROUGH OF THE YEAR 2002 (Science)
- gene knockouts/knockdowns for any cell type provide insights into gene functions and biological processes
- highly selective and highly potent action
- potentially enormous therapeutic applications (cancer, infectious diseases, neurodegenerative diseases)
but hampered by the delivery problem:
- unmodified RNA is rapidly degraded or excreted
- expression vectors: cf. gene therapy
- saturation of RNA-induced silencing complexes by an excess of exogenous RNA, and interference with normal cell functions?
In most studies in experimental biology, a small number of variables is analysed, and measurements are repeated so many times that statistical tests can discriminate between biological significance and random noise as the cause of the experimental results.
In microarray experiments, thousands of variables are analysed, but the high cost of the arrays does not allow for more than a very small number of repeated measurements.
Van ’t Veer et al (2002) Nature 415: 530-536
Van ’t Veer et al (2002): Identification of a gene expression signature based on 70 genes capable of predicting disease outcome:
- Correlation between each gene’s expression and disease outcome measured for a randomly selected “training set” of 78 patients.
- Genes ranked according to the correlation.
- The 70 most-correlated genes used to construct a classifier discriminating between patients with good and with poor prognosis.
- The 19 remaining patients served as the “validation set”.
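The ranking step above can be sketched in a few lines of Python. This is only an illustration of "rank genes by correlation with outcome and keep the top k", not the published pipeline: the gene names, the toy values, the 0/1 outcome coding and the use of plain Pearson correlation are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def rank_genes(expression, outcome):
    """Return gene names sorted by decreasing |correlation| with outcome.

    expression: dict mapping gene name -> list of values, one per patient
    outcome:    list of 0/1 labels (assumed coding: 1 = poor prognosis)
    """
    return sorted(expression,
                  key=lambda g: abs(pearson(expression[g], outcome)),
                  reverse=True)

# Toy training set: 3 hypothetical genes, 6 patients
outcome = [0, 0, 0, 1, 1, 1]
expression = {
    "geneA": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # rises with poor outcome
    "geneB": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],  # only weakly related
    "geneC": [5.0, 5.1, 4.9, 2.0, 2.2, 1.9],  # falls with poor outcome
}

ranked = rank_genes(expression, outcome)
signature = ranked[:2]  # in the study this cut-off was 70 genes
print(signature)
```

In a real analysis the cut-off would be 70 and the classifier built from the signature would then be applied to the patients held out as the validation set.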
Ein-Dor et al (2004) Bioinformatics 21: 171-178
- selected same 77/78 patients from the van ’t Veer study and ranked all genes according to correlation with survival (5852 genes)
- built series of classifiers based on consecutive groups of 70 genes
- 7 other sets produced classifiers with same prognostic capabilities as those based on the top 70 genes
- procedure repeated for 1,000 different compositions of training sets of 77 samples and test sets of 19 samples
Classifiers based on very-low-ranked genes are capable of predicting survival with quality similar to the high-ranking ones: many sets of 70 genes can be used to predict survival
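As a small illustration of what "consecutive groups of 70 genes" means, the sketch below slices a ranked gene list into non-overlapping blocks. The placeholder gene names are hypothetical, and dropping an incomplete trailing block is an assumption of this example, not something stated in the source.

```python
def consecutive_gene_sets(ranked_genes, size=70):
    """Split a ranked gene list into consecutive, non-overlapping groups.

    Group 0 holds ranks 1..size, group 1 holds ranks size+1..2*size, and
    so on. A trailing group shorter than `size` is dropped (assumption).
    """
    n_full = len(ranked_genes) // size
    return [ranked_genes[i * size:(i + 1) * size] for i in range(n_full)]

# Placeholder names standing in for the 5852 genes ranked by correlation
ranked = [f"gene{rank}" for rank in range(1, 5853)]
groups = consecutive_gene_sets(ranked)
print(len(groups), groups[0][0], groups[1][0])
```

Each group would then be used to build its own classifier, so that the prognostic performance of lower-ranked sets can be compared with that of the top 70.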
Discovery-based research
Analysis, without a hypothesis, of large quantities of data to search for patterns that discriminate among groups of individuals with different diagnosis, prognosis or response to therapy
Objective: to obtain a statistically reliable clustering of data (e.g. gene expression) corresponding with the phenotype of interest
Problem: excess of data points per individual over the number of individuals. This easily leads to overfitting: random correspondence of clustered data with the phenotype of interest
Solution for overfitting
Apply the pattern-recognition model derived in a “training set” to an independent “validation set” of individuals not used in the “training set”
Ransohoff (2004) Nat Rev Cancer 4: 309-314
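A minimal simulation can make the point concrete. In the sketch below the "expression" values are pure random noise, so no gene is truly related to the phenotype; the sample sizes, the number of genes and the random seed are arbitrary choices for illustration. With far more genes than individuals, the best gene in the training set still looks strongly correlated, and an independent validation set is what exposes this as chance.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n_train, n_valid, n_genes = 20, 20, 2000

# Phenotype labels, half 0 and half 1, unrelated to the data by construction
train_y = [0] * (n_train // 2) + [1] * (n_train // 2)
valid_y = [0] * (n_valid // 2) + [1] * (n_valid // 2)

# Pure-noise data: one list of values per gene
train_x = [[rng.gauss(0, 1) for _ in range(n_train)] for _ in range(n_genes)]
valid_x = [[rng.gauss(0, 1) for _ in range(n_valid)] for _ in range(n_genes)]

# "Discover" the gene that correlates best with phenotype in the training set
train_r = [abs(pearson(gene, train_y)) for gene in train_x]
best = max(range(n_genes), key=lambda i: train_r[i])
best_train_r = train_r[best]

# Check the same gene in individuals not used for the discovery
best_valid_r = abs(pearson(valid_x[best], valid_y))

print(f"training   |r| of selected gene: {best_train_r:.2f}")
print(f"validation |r| of selected gene: {best_valid_r:.2f}")
```

The selection step uses only the training set; any gene choice that peeks at the validation individuals would defeat the purpose of the split.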
Wang et al (2005) Lancet 365: 671-679
Wang et al (2005)
patients with lymph-node-negative breast cancer
clinical outcome: 5-year metastasis-free survival vs metastasis within 5 years
features on array: 22,000
patients in training set: 115
patients in validation set: 171
- Many genes may correlate with survival
- Many gene combinations (even from the same data set) can produce signatures with similar predictive power (Ein-Dor et al., 2004)
- Close to complete lack of agreement when comparing the signature list with that of van ’t Veer et al (2002)
- A much larger number of observations is needed for validation and for creating predictive signatures on which consensus can be reached (Jensen and Hovig (2005) Lancet 365: 634-635)
Suggested protocol to discriminate between overfitting and biological significance:
- randomise phenotypes over arrays
- derive signature
- check predictive value
If the predictive value is similar to that obtained with the true phenotypes, overfitting is likely
Alternative:
- restrict the number of genes by first selecting them on biological function
- less overfitting, since there is less excess of features over individuals
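The randomisation protocol above might look roughly as follows in Python. Everything concrete here is an assumption for illustration: the synthetic data (10 informative genes among 200), the signed-sum classifier with a midpoint threshold, the signature size of 10, and shuffling labels separately within the training and test sets so that both classes stay present.

```python
import math
import random

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)

def derive_signature(x, y, k):
    """Pick the k genes most correlated with y, keeping each correlation's sign."""
    r = [pearson(gene, y) for gene in x]
    top = sorted(range(len(x)), key=lambda i: abs(r[i]), reverse=True)[:k]
    return [(i, 1.0 if r[i] >= 0 else -1.0) for i in top]

def score(x, signature, j):
    """Signed sum of the signature genes' values for individual j."""
    return sum(sign * x[i][j] for i, sign in signature)

def predictive_value(train_x, train_y, test_x, test_y, k=10):
    """Derive a signature on the training set; return accuracy on the test set."""
    sig = derive_signature(train_x, train_y, k)
    s = [score(train_x, sig, j) for j in range(len(train_y))]
    m1 = sum(v for v, y in zip(s, train_y) if y == 1) / train_y.count(1)
    m0 = sum(v for v, y in zip(s, train_y) if y == 0) / train_y.count(0)
    cut = (m0 + m1) / 2  # midpoint between the two class means
    pred = [1 if score(test_x, sig, j) > cut else 0 for j in range(len(test_y))]
    return sum(p == t for p, t in zip(pred, test_y)) / len(test_y)

def simulate(labels, rng, n_genes=200, n_informative=10, shift=2.0):
    """Synthetic data: the first n_informative genes are shifted in class 1."""
    return [[rng.gauss(shift * y if g < n_informative else 0.0, 1.0)
             for y in labels] for g in range(n_genes)]

rng = random.Random(1)
train_y = [0] * 10 + [1] * 10
test_y = [0] * 10 + [1] * 10
train_x, test_x = simulate(train_y, rng), simulate(test_y, rng)

# Predictive value with the true phenotypes
observed = predictive_value(train_x, train_y, test_x, test_y)

# Predictive value after randomising phenotypes over arrays, repeated 50 times
permuted = []
for _ in range(50):
    shuffled_train = rng.sample(train_y, len(train_y))
    shuffled_test = rng.sample(test_y, len(test_y))
    permuted.append(predictive_value(train_x, shuffled_train,
                                     test_x, shuffled_test))
mean_permuted = sum(permuted) / len(permuted)

print(f"observed accuracy: {observed:.2f}")
print(f"mean accuracy with randomised phenotypes: {mean_permuted:.2f}")
```

Because the synthetic data contain a genuine signal, the observed accuracy should sit well above the randomised baseline; if the two were similar, the protocol would point to overfitting.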
Could a contribution from the RF, with its strong tradition in the mathematical sciences, be:
- formulating new statistical theories on the reliability of higher-order clustering, to be empirically tested in simulations?
- creating links between Russian mathematicians and molecular biologists, and teaching students in mathematics the essentials of molecular biology?
Acknowledgements
Prof. Gerard J. te Meerman, statistical genetics
Prof. Robert M.W. Hofstra, molecular developmental genetics