Computational Methods for Identifying and Classifying Tuberculosis Isolates
C. Ozcaglar (1), B. Yener (1), K. P. Bennett (1, 2)
(1) Computer Science Dept. and (2) Mathematical Sciences Dept., Rensselaer Polytechnic Institute, Troy, NY 12180
Abstract
• Tuberculosis (TB) genotyping is now a routine part of TB control and surveillance programs in the United States.
• Methods for genotyping clinical Mycobacterium tuberculosis complex (MTC) strains have proven to be valuable tools for TB control. At the individual clinical management level, genotyping enables the detection or exclusion of laboratory errors and the follow-up of relapse cases to identify treatment failures, reactivations of latent disease, and exogenous reinfections. At the public health level, genotyping enables the detection of unsuspected outbreaks and the identification of transmission chains and secondary cases of infection.
• Combining multiple phylogenetically informative markers is the best approach to identifying strain lineages, because of the highly clonal nature of M. tuberculosis.
• In this work, we present and evaluate a novel combination of genotyping methods, spoligotyping and MIRU-VNTR typing, for MTC strain lineage identification.

[Panels with no recoverable text: Methods; MakeSpoligoforest Algorithm; Correlations]

Results
• The spoligoforest of the CDC 2008 TB patient data, produced by MakeSpoligoforest(), is displayed in the poster figure.
• The spoligoforest of the TB patient data shows important correlations. The outdegree distribution of its nodes follows a power law, and so does the outdegree distribution of the nodes within each CDC family.
• The relationship between copy number and the average children count of nodes is exponential, with a number of outliers.

Conclusion
• The combination of spoligotyping and MIRU typing clusters the TB patients into CDC families accurately. This motivates the search for an underlying mathematical model that includes both the spoligotype and the MIRU information of isolates.
• The outdegree distribution of nodes in the spoligoforest of TB patient data follows a power law.

Background
Spoligotype: Spacer oligonucleotide type. A binary sequence of length 43 representing the presence or absence of spacers.
MIRU: Mycobacterial interspersed repetitive units.
MTC: Mycobacterium tuberculosis complex.
Copy Number: Total number of spacers present in a spoligotype.
Spoligoforest: Forest of disjoint trees, where each node represents a distinct spoligotype.
Hidden Parent Assumption: Spoligotypes evolve by the deletion of a single spacer or of multiple contiguous spacers.
Lineage: A sequence of species that form a line of descent.
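The Background definitions of spoligotype, copy number, and the Hidden Parent Assumption can be illustrated with a small Python sketch. It is not the MakeSpoligoforest algorithm, whose details are not given in this text. The function names (validate, copy_number, is_contiguous_deletion, outdegrees) are hypothetical, and spoligotypes are assumed to be stored as 43-character strings of '0' and '1'. The sketch also assumes that a "contiguous" deletion may span positions where the parent's spacer is already absent.

```python
from collections import Counter

N_SPACERS = 43  # spoligotype length given in the Background definitions


def validate(spoligotype: str) -> str:
    """Return the spoligotype unchanged if it is a 43-character string of '0'/'1'."""
    if len(spoligotype) != N_SPACERS or set(spoligotype) - {"0", "1"}:
        raise ValueError("expected a binary string of length 43")
    return spoligotype


def copy_number(spoligotype: str) -> int:
    """Copy number: total number of present spacers ('1' characters)."""
    return validate(spoligotype).count("1")


def is_contiguous_deletion(parent: str, child: str) -> bool:
    """Check whether child could arise from parent by one contiguous deletion.

    Assumption (not stated in the source): the deleted block may cover
    positions where the parent's spacer was already absent.
    """
    validate(parent)
    validate(child)
    lost = [i for i in range(N_SPACERS) if parent[i] == "1" and child[i] == "0"]
    gained = any(parent[i] == "0" and child[i] == "1" for i in range(N_SPACERS))
    if gained or not lost:
        return False  # spacers are never regained; identical types are not parent/child
    # Every position between the first and last lost spacer must be absent in the child.
    return "1" not in child[lost[0]:lost[-1] + 1]


def outdegrees(edges):
    """Map each parent node to its number of children, given (parent, child) edges."""
    return Counter(parent for parent, _child in edges)


if __name__ == "__main__":
    parent = "1" * 43
    child = "1" * 10 + "0" * 3 + "1" * 30  # spacers 11-13 deleted as one block
    print(copy_number(parent), copy_number(child))
    print(is_contiguous_deletion(parent, child))
    print(outdegrees([(parent, child)]))
```

Counting how often each outdegree value occurs in the result of outdegrees() gives the outdegree distribution that the Results section reports as following a power law. Fitting that distribution is outside the scope of this sketch.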