This study explores the relevance of a machine learning approach in classifying metabolites associated with chronic obstructive pulmonary disease (COPD) and asthma. The aim is to differentiate the underlying cause of symptoms, leading to better treatment protocols. The study analyzes the metabolomics data using linear regression, principal component analysis, and multilayer perceptron algorithms.
Sri Ramachandra University
Department of Bioinformatics
Relevance of Machine Learning Approach in Classifying Metabolites Associated in COPD & Asthma
Prof. PK.Ragunath
INTRODUCTION
• Chronic obstructive pulmonary disease (COPD) and asthma are the most frequent causes of respiratory ill health.
• They affect all ages, and several cases of comorbidity between the two conditions have been reported.
• Asthma and COPD are different diseases, each with a unique natural history and pathophysiology.
• Differentiating the underlying cause of their symptoms is difficult, which often leads to generalized treatment protocols.
ASTHMA
• Asthma is characterized by airflow obstruction.
• 300 million people of all ages and ethnic backgrounds worldwide have asthma. Globally, 250,000 people die of asthma every year (WHO, 2013-14).
• India has an estimated 15-20 million asthmatics (ICMR, 2012).
• It is a condition caused by inflammation of the air passages, which obstructs the flow of air into and out of the lungs. It also affects the sensitivity of the nerve endings in the airways, so they become easily irritated.
• It is characterized by recurrent attacks of breathlessness and wheezing, which vary in severity and frequency.
• According to the National Asthma Education and Prevention Program (NAEPP) and the Global Initiative for Asthma, asthma is additionally typified by variable and recurring symptoms, bronchial hyperresponsiveness, and underlying inflammation of the airways.
COPD
• Chronic obstructive pulmonary disease (COPD) is a “preventable and treatable disease with some significant extrapulmonary effects that may contribute to the severity in individual patients” (American Thoracic Society, ATS).
• Its pulmonary component is characterized by airflow limitation that is not fully reversible.
• The airflow limitation is usually progressive and associated with an abnormal inflammatory response of the lung to noxious particles or gases.
• COPD affects 210 million people (WHO, 2013-14).
• Conditions that contribute to COPD:
• Mucus hypersecretion with enlargement of the tracheo-bronchial submucosal glands and a disproportionate increase of mucous acini.
• Inflammation of the bronchioli, mucous metaplasia and hyperplasia, with increased intraluminal mucus, increased wall muscle, fibrosis, and airway stenoses.
• Respiratory bronchiolitis is a critically important early lesion which may predispose to the development of centrilobular emphysema.
• The severity of destruction of the alveolar wall in emphysema appears to be the most important determinant of chronic deterioration of airflow.
Asthma and Chronic Obstructive Pulmonary Disease (COPD) are complex conditions with imprecise definitions
• Definitive morphological comparisons are difficult.
• Broadly, the airways in asthma are occluded by tenacious plugs of exudate and mucus.
• There is fragility of the airway surface epithelium, thickening of the reticular layer beneath the epithelial basal lamina, and bronchial vessel congestion and edema.
• There is an increased inflammatory infiltrate comprising ‘activated’ lymphocytes and eosinophils, with release of granular content in the latter.
• There is enlargement of bronchial smooth muscle, particularly in medium-sized bronchi.
METABOLOMICS
• Metabolomics involves quantitative measurement of the time-related, multi-parametric metabolic response of living systems to pathophysiological stimuli or a change in gene expression profile (Daviss, Bennett, April 2005).
• It is the comprehensive, simultaneous, and systematic determination of metabolite levels in the metabolome and of their changes over time as a consequence of stimuli.
• Steps in analysing metabolomics data:
• Efficient and unbiased separation of analytes
• Detection
• Identification and quantification
LINEAR REGRESSION
In correlation, the two variables are treated as equals. In regression, one variable is considered the independent (predictor) variable X and the other the dependent (outcome) variable Y.
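The predictor/outcome distinction can be illustrated with a minimal least-squares sketch. It assumes a single predictor X and a straight-line model Y = intercept + slope * X; the function name `fit_line` and the sample values are hypothetical and are not taken from the study.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariation of X with Y, and variation of X alone.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical descriptor values (X) and outcome values (Y).
descriptor = [1.0, 2.0, 3.0]
outcome = [3.0, 5.0, 7.0]
slope, intercept = fit_line(descriptor, outcome)
print(slope, intercept)
```

Swapping the roles of X and Y generally gives a different line, which is the asymmetry that separates regression from correlation.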
PRINCIPAL COMPONENT ANALYSIS
• Multivariate analysis based on projection methods
• Main tool used in chemometrics
• Extracts and displays the systematic variation in the data
• Each Principal Component (PC) is a linear combination of the original data parameters
• Each successive PC explains the maximum amount of variance possible, not accounted for by the previous PCs
• PCs are orthogonal to each other
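The properties listed above (linear combinations, decreasing variance, orthogonality) can be checked on a toy case. The sketch below assumes only two variables, so the eigen-decomposition of the covariance matrix has a closed form. The function name `pca_2d` and the data are hypothetical, and this is not the tool used in the study.

```python
import math


def pca_2d(rows):
    """Return [(eigenvalue, unit_vector), ...] for two-column data, largest variance first."""
    n = len(rows)
    mean_x = sum(r[0] for r in rows) / n
    mean_y = sum(r[1] for r in rows) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centred data.
    a = sum((r[0] - mean_x) ** 2 for r in rows) / (n - 1)
    c = sum((r[1] - mean_y) ** 2 for r in rows) / (n - 1)
    b = sum((r[0] - mean_x) * (r[1] - mean_y) for r in rows) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [half_trace + radius, half_trace - radius]
    if b != 0:
        # (b, lambda - a) is an eigenvector for eigenvalue lambda.
        vectors = [(b, lam - a) for lam in eigenvalues]
    elif a >= c:
        vectors = [(1.0, 0.0), (0.0, 1.0)]
    else:
        vectors = [(0.0, 1.0), (1.0, 0.0)]
    components = []
    for lam, (vx, vy) in zip(eigenvalues, vectors):
        norm = math.hypot(vx, vy)
        components.append((lam, (vx / norm, vy / norm)))
    return components


# Hypothetical values of two correlated descriptors.
data = [(2.0, 1.9), (0.5, 0.7), (3.1, 3.0), (1.2, 0.9), (2.6, 2.9)]
(var1, pc1), (var2, pc2) = pca_2d(data)
print(var1, pc1)
print(var2, pc2)
```

Each returned vector holds the weights of the linear combination that defines one PC, and the dot product of the two vectors is zero up to rounding.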
MACHINE LEARNING - MULTILAYER PERCEPTRON
The Multilayer Perceptron is a type of machine learning model.
• It is used extensively for the solution of a number of different problems, including pattern recognition and interpolation.
The basic Multilayer Perceptron learning algorithm is outlined below.
• Initialize the network, with all weights set to random numbers between -1 and +1.
• Present the first training pattern, and obtain the output.
• Compare the network output with the target output.
• Propagate the error backwards.
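The four steps above can be sketched in a few lines. The sketch makes several assumptions that the slide does not state: one hidden layer, sigmoid activations, a squared-error measure, a learning rate of 0.1, and a weight update after the error is propagated (the usual continuation of the algorithm). The class name `TinyMLP` and the training pattern are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Step 1: all weights (last entry of each row is a bias) start in [-1, +1].
        self.w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(self, x):
        # Step 2: present a pattern and obtain the output.
        hidden = [sigmoid(sum(w * v for w, v in zip(row[:-1], x)) + row[-1])
                  for row in self.w_hidden]
        output = sigmoid(sum(w * h for w, h in zip(self.w_out[:-1], hidden))
                         + self.w_out[-1])
        return hidden, output

    def train_pattern(self, x, target, lr=0.1):
        hidden, output = self.forward(x)
        # Step 3: compare the network output with the target output.
        error = target - output
        # Step 4: propagate the error backwards (deltas use pre-update weights).
        delta_out = error * output * (1 - output)
        delta_hidden = [delta_out * self.w_out[j] * h * (1 - h)
                        for j, h in enumerate(hidden)]
        # Assumed follow-up step: adjust the weights along the negative gradient.
        for j, h in enumerate(hidden):
            self.w_out[j] += lr * delta_out * h
        self.w_out[-1] += lr * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] += lr * delta_hidden[j] * v
            row[-1] += lr * delta_hidden[j]
        return 0.5 * error ** 2


net = TinyMLP(n_in=2, n_hidden=3, seed=1)
pattern, target = [0.2, 0.9], 1.0  # hypothetical descriptor pair and class label
for epoch in range(200):
    loss = net.train_pattern(pattern, target)
print(loss)
```

A real classifier would loop over many patterns per epoch and hold out data for validation; the single pattern here only keeps the sketch short.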
AIM & OBJECTIVES OF THE STUDY
• To build a machine learning based model to classify metabolites associated with Asthma and COPD.
Objectives:
• To list the metabolites associated with Asthma & COPD
• To generate molecular descriptors for all the chosen metabolites
• To perform feature selection using Linear Regression to identify the best descriptors
• To perform feature extraction using PCA to generate Component Matrices
• To build Multi Layer Perceptron models that classify metabolites associated with Asthma and COPD from the Linear Regression and PCA inputs respectively, and to compare the efficiency of the two models.
WORK FLOW
• Text-mine to identify metabolites associated with COPD & Asthma
• Generate molecular descriptors for all the chosen metabolites using Descriptor Calculation Wizard of Molegro Virtual Docker
• Feature Selection using Linear Regression to identify best descriptors
• Feature Extraction using PCA to generate Component Matrices
• Generate a model to classify metabolites associated with Asthma & COPD by employing machine learning by Multilayer Perceptron
TEXT MINING
A comprehensive literature mining of all eligible studies on IDC gene expression was carried out by searching PubMed (as of March 2015) based on the following key terms:
M AND ((“C” OR “c”) AND (H OR h))
M AND (“A” AND (H OR h))
where M = Gene expression; C = Chronic Obstructive Pulmonary Disease; c = COPD; A = Asthma; H = Homo sapiens; h = human
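The two key-term patterns can be expanded into literal search strings with a small helper. This is only a sketch of the string assembly: the function names are hypothetical, and it does not call PubMed or reproduce the exact query syntax the authors submitted.

```python
def or_group(terms, quote=False):
    """Join alternative terms with OR; a single term needs no parentheses."""
    parts = ['"%s"' % t if quote else t for t in terms]
    if len(parts) == 1:
        return parts[0]
    return "(" + " OR ".join(parts) + ")"


def build_query(topic, disease_terms, organism_terms):
    """Assemble: topic AND (disease alternatives AND organism alternatives)."""
    disease = or_group(disease_terms, quote=True)
    organism = or_group(organism_terms)
    return "%s AND (%s AND %s)" % (topic, disease, organism)


organism = ["Homo sapiens", "human"]
copd_query = build_query("Gene expression",
                         ["Chronic Obstructive Pulmonary Disease", "COPD"],
                         organism)
asthma_query = build_query("Gene expression", ["Asthma"], organism)
print(copd_query)
print(asthma_query)
```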
Feature Selection Using Linear Regression To Identify Best Descriptors
• The CFDM descriptors are obtained by calculating the minimum, maximum, and mean topological distance between all pairs of chemical features.
• The topological distance is defined as the smallest number of covalent bonds between the two features.
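The topological distance defined above is a shortest-path length on the molecular graph, so it can be sketched with a breadth-first search. The bond list, the atom indices, and the function names below are hypothetical. The descriptor software used in the study may define features and handle edge cases differently.

```python
from collections import deque


def bond_distances(bonds, start):
    """Breadth-first search: fewest covalent bonds from `start` to each reachable atom."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nxt in neighbours.get(atom, ()):
            if nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)
    return distance


def feature_distance_summary(bonds, feature_a, feature_b):
    """Minimum, maximum, and mean bond count over all (feature_a, feature_b) atom pairs."""
    distances = []
    for a in feature_a:
        from_a = bond_distances(bonds, a)
        distances.extend(from_a[b] for b in feature_b if b in from_a)
    return min(distances), max(distances), sum(distances) / len(distances)


# Hypothetical five-atom chain 0-1-2-3-4 with one feature on atom 0
# and another feature type on atoms 2 and 4.
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(feature_distance_summary(chain, feature_a=[0], feature_b=[2, 4]))
```

For the chain above the pairwise distances are 2 and 4 bonds, giving a minimum of 2, a maximum of 4, and a mean of 3.0.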
SCREE PLOT DEPICTING PRINCIPAL COMPONENTS
Only the top 7 principal components, which had eigenvalues > 1, were selected.
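The eigenvalue cut-off can be written as a short filter. With standardized variables, an eigenvalue above 1 means the component carries more variance than any single original variable, which is the usual rationale for this rule. The eigenvalues below are made up for illustration and are not the values behind the scree plot.

```python
def select_by_eigenvalue(eigenvalues, threshold=1.0):
    """Keep components whose eigenvalue exceeds the threshold; report variance retained."""
    ranked = sorted(eigenvalues, reverse=True)
    kept = [value for value in ranked if value > threshold]
    explained = sum(kept) / sum(ranked)
    return kept, explained


# Hypothetical eigenvalues from a PCA on six standardized descriptors.
kept, explained = select_by_eigenvalue([3.2, 1.8, 1.1, 0.9, 0.4, 0.6])
print(len(kept), round(explained, 4))
```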
Multi Layer Perceptron model to classify metabolites associated with Asthma and COPD based on features selected by Linear Regression
Multi Layer Perceptron model to classify metabolites associated with Asthma and COPD based on Feature Extraction by PCA
ROC plot of sensitivity versus specificity
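How the points of such an ROC plot are obtained can be sketched as follows. The sketch assumes binary labels (1 for one class, 0 for the other) and a classifier score per metabolite, and it plots sensitivity against 1 - specificity, which is the common convention. The labels, scores, and function names are hypothetical and are not the study's results.

```python
def roc_points(labels, scores):
    """Return (1 - specificity, sensitivity) pairs, one per decision threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical class labels and classifier scores for four metabolites.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
curve = roc_points(labels, scores)
print(curve)
print(area_under_curve(curve))
```

An area of 1.0 corresponds to perfect separation of the two classes and 0.5 to chance-level ranking.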
Insights From The Result
• The Multi Layer Perceptron model based on Feature Extraction by PCA classified metabolites associated with Asthma and COPD with an efficiency of >90%, compared with ~87% for the model based on feature selection by linear regression.
• Scope: Similar metabolite classifying models can be built using Radial Basis Function, and their efficiency can be compared with the current model.
• A model for classifying metabolites associated with comorbid conditions can be attempted in future.
Acknowledgement
• Our sincere thanks to Molegro Virtual Docker (MVD) for providing a limited period trial version with which the molecular descriptors were calculated.
• We thank Dr. Baljit Ubhi, Ph.D., Global Technical Marketing Manager Metabolomics & Lipidomics, for providing critical insights on applications of metabolomics in COPD, without which this study would have been impossible.
Sri Ramachandra University
My Research Team
Thank You