TB Data Visualization and correlations in TB Patient Networks
Outline 1. Spoligoforests 2. Correlations in Spoligoforests 3. Patient graphs
1. Spoligoforests The 3-step algorithm that decides the deletion events in the spoligoforest relies on two assumptions: a) Hidden Parent Assumption: Each spoligotype loses one or more contiguous spacers in a deletion event. b) Single Inheritance: Each spoligotype descends from exactly one parent spoligotype.
Child node and its possible parents • Each node represents a spoligotype in a spoligoforest. The Hidden Parent Assumption assigns a set of possible parents to each child node. • Before Single Inheritance is applied, a node may have multiple possible parents, meaning there are multiple candidate sources of the mutation that produced the child node's spoligotype. • Single Inheritance selects the unique, most likely source of mutation.
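The candidate-parent step can be sketched in code. This is a minimal illustration, not the algorithm from the slides: it assumes spoligotypes are strings of "1" (spacer present) and "0" (spacer absent), and it reads "contiguous" as "every position between the first and last lost spacer is absent in the child". The function names `is_candidate_parent` and `candidate_parents`, and the 7-spacer toy patterns, are hypothetical.

```python
def is_candidate_parent(parent: str, child: str) -> bool:
    """Return True if child could arise from parent by ONE contiguous deletion.

    Assumption (not from the slides): a deletion is contiguous when the child
    has no spacer anywhere between the first and last position it lost.
    """
    if len(parent) != len(child):
        raise ValueError("spoligotypes must have the same length")
    # Positions where the two patterns differ.
    diff = [i for i, (p, c) in enumerate(zip(parent, child)) if p != c]
    if not diff:
        return False  # identical patterns: nothing was deleted
    # Spacers can only be lost, never gained.
    if any(child[i] == "1" for i in diff):
        return False
    first, last = diff[0], diff[-1]
    # The whole block spanned by the deletion must be empty in the child.
    return all(ch == "0" for ch in child[first:last + 1])


def candidate_parents(child: str, population: list[str]) -> list[str]:
    """All spoligotypes in the population that could be the child's parent."""
    return [p for p in population if is_candidate_parent(p, child)]


# Toy 7-spacer patterns (real spoligotypes have 43 spacers).
population = ["1111111", "1110111", "1100011", "1010101"]
parents = candidate_parents("1100011", population)
```

Here the child keeps two candidate parents, which is exactly the situation that Single Inheritance has to resolve.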
MakeSpoligoforest algorithm (diagram): HPA, MiruHamming, SpolHamming, MiruL2, RandomPick
CDC DATA
Lineages: East African Indian, Indo-Oceanic, M. africanum, Euro-American, M. bovis, East Asian
Genetic Diversity of TB in NYC: NYC Isolates
Tanaka’s Model • Unambiguous edges (mutations, deletions): After the Hidden Parent Assumption is applied, some nodes in the spoligoforest have exactly one possible parent, so the Single Inheritance rule is not needed for them. • Tanaka et al. found that the frequency of deletion lengths along unambiguous edges follows a Zipf distribution.
Tanaka’s Model: Use of Zipf distribution and Single Inheritance • After assigning edge weights to all possible deletions according to this model, Tanaka’s model picks the unique parent by choosing the deletion with maximum weight.
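The maximum-weight selection can be sketched as follows. This is an illustration under stated assumptions, not the published model: the weight of a deletion of length k is taken to be proportional to k^(-s), the exponent `s = 1.0` is a placeholder rather than a fitted value, and the helper names are hypothetical.

```python
def deletion_length(parent: str, child: str) -> int:
    """Number of spacers present in the parent but absent in the child."""
    return sum(1 for p, c in zip(parent, child) if p == "1" and c == "0")


def zipf_weight(length: int, exponent: float = 1.0) -> float:
    """Unnormalized Zipf weight k**(-s); the exponent is an assumed placeholder."""
    if length < 1:
        raise ValueError("a deletion must remove at least one spacer")
    return length ** (-exponent)


def pick_parent(child: str, candidates: list[str], exponent: float = 1.0) -> str:
    """Choose the candidate parent whose deletion has the largest weight.

    Normalizing the weights would not change which candidate wins, so it is
    skipped. Ties are broken by order of appearance in `candidates`.
    """
    return max(
        candidates,
        key=lambda p: zipf_weight(deletion_length(p, child), exponent),
    )


# Two candidates for the same child: one needs a 3-spacer deletion, one a 2-spacer deletion.
best = pick_parent("1100011", ["1111111", "1110111"])
```

Because a Zipf weight decreases with length, any positive exponent makes this rule prefer the parent that requires the shortest deletion.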
Outline 1. Spoligoforests 2. Correlations in Spoligoforests 3. Patient graphs
2. Correlations in Spoligoforests Outdegree frequency vs. outdegree: Follows a Zipf distribution. Zipf distribution: consistent with preferential attachment, a rich-get-richer model. Outdegree of a spoligotype in the spoligoforest: The number of spoligotypes this spoligotype can mutate into by a deletion event.
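As a small sketch of how the outdegree distribution could be tabulated, assume the spoligoforest is available as a list of (parent, child) edges. The edge list and the function name below are invented for illustration and are not data from the slides.

```python
from collections import Counter


def outdegree_distribution(edges: list[tuple[str, str]]) -> Counter:
    """Map each outdegree value to the number of nodes that have it.

    Leaf nodes (outdegree 0) are left out, since they cannot appear on a
    log-log plot.
    """
    # Outdegree of a node = number of edges leaving it.
    outdegree = Counter(parent for parent, _child in edges)
    # Count how many nodes share each outdegree value.
    return Counter(outdegree.values())


# Toy forest: A has three children, B and C have one child each.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "E"), ("C", "F")]
distribution = outdegree_distribution(edges)
```

Plotting the resulting counts against outdegree on log-log axes is one way to check visually for a Zipf-like straight line.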
2. Correlations in Spoligoforests Deletion-length frequency vs. deletion length: Follows a Zipf distribution. Zipf distribution: consistent with preferential attachment, a rich-get-richer model. We take all edges in the spoligoforest into account, in contrast to the unambiguous-edges-only approach in Tanaka’s model.
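One rough way to check for Zipf-like behavior is to fit a straight line to log(frequency) against log(length). The sketch below uses ordinary least squares on the log-log values, which is a simple estimator chosen for illustration and not necessarily the fitting method behind the slides. The function name and the synthetic data are hypothetical.

```python
import math
from collections import Counter


def fit_zipf_exponent(values: list[int]) -> float:
    """Estimate s in frequency(k) ~ k**(-s) by least squares on log-log data."""
    freq = Counter(values)
    ks = sorted(freq)
    if len(ks) < 2:
        raise ValueError("need at least two distinct values to fit a slope")
    xs = [math.log(k) for k in ks]
    ys = [math.log(freq[k]) for k in ks]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # The fitted slope is -s, so negate it to report the exponent.
    return -sxy / sxx


# Synthetic deletion lengths whose frequencies are exactly 120 / k.
lengths = [1] * 120 + [2] * 60 + [3] * 40 + [4] * 30
exponent = fit_zipf_exponent(lengths)
```

On real data the fit is only approximate, and the sparse tail of long deletions can dominate a least-squares slope, so the estimate should be read with caution.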
Outline 1. Spoligoforests 2. Correlations in Spoligoforests 3. Patient graphs
Patient Graphs – NYC Data: 4984 patients; 137 countries; 793 spoligotypes; 2648 RFLPs; 3235 distinct genotypes; 594 “named” clusters
Patient Graphs – Questions: Is there a patient-pathogen trend that TB transmission follows? Is the demographic distribution of patients infected by bacteria of the same genotype uneven? How can we fit a TB transmission and mutation model, given that the environment, such as the location on the world map, affects the transmission of TB?
Named clusters of interest: Cluster 3. Spoligotype: S00030; RFLP: C(3); 166 patients; Euro-American
Named clusters of interest: Cluster 33. Spoligotype: S00034; RFLP: W(18); 21 patients; East Asian (W-Beijing)
Named clusters of interest: Cluster 4. Spoligotype: S00009; RFLP: H(2); 99 patients; Euro-American
Named clusters of interest: Cluster 29. Spoligotype: S00034; RFLP: N3(13); 21 patients; East Asian
Questions: Does a high transmission rate in an area increase the likelihood of mutation? How do MIRUs mutate? Is there a pattern of deletion events, or an assumption analogous to the Hidden Parent Assumption, for 12-bit MIRU? Can we map the patterns of mutation events in SNPs of MIRU to 12-bit MIRU?