Class Projects
Future Work and Possible Project Topics in Gene Regulatory Networks • Learning from multiple data sources • Learning causality in motifs • Learning GRN with feedback loops
Learning from multiple data sources • We have gene expression data and topological ordering information; • Incorporating some other data sources as prior knowledge for the learning; • Transcription factor binding location data; • … Example: Partial regulatory network recovered using expression data and location data.
Learning Causality in Motifs • Network motifs are the simplest units of network architecture. • They can be used to assemble a transcriptional regulatory network.
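As an illustration of what finding a motif in a regulatory network can look like, here is a minimal sketch that enumerates feed-forward loops (X regulates Y, Y regulates Z, and X also regulates Z) in a small directed graph. The feed-forward loop is chosen here as an assumption; the slide does not name a specific motif. The function name `find_feed_forward_loops` and the toy edge list are hypothetical.

```python
from itertools import permutations


def find_feed_forward_loops(edges):
    """Return all (x, y, z) triples with edges x->y, y->z and x->z."""
    edge_set = set(edges)
    nodes = sorted({node for edge in edges for node in edge})
    loops = []
    # Brute force over ordered triples; fine for small illustrative graphs.
    for x, y, z in permutations(nodes, 3):
        if (x, y) in edge_set and (y, z) in edge_set and (x, z) in edge_set:
            loops.append((x, y, z))
    return loops


# Hypothetical toy network: node names are placeholders, not real genes.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
ffls = find_feed_forward_loops(toy_edges)
print(ffls)
```

Enumerating triples is cubic in the number of nodes, so a real project would need a more efficient search; the sketch only shows the definition of the motif in code.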
Learning GRN with feedback loops (Cont’d)
Protein-Protein Interactions
Future Work and Possible Project Topics in Protein Interaction • Learning from multiple data sources • Disease-related protein-protein interactions • Learning from different species
Learning from Multiple Data Sources • Gene Neighbor: identifies protein pairs encoded in close proximity across multiple genomes. • Rosetta Stone • Phylogenetic Profile • Gene Clustering: identifies closely spaced genes and assigns a probability P of observing a particular gap distance.
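To make one of these data sources concrete, the sketch below compares phylogenetic profiles, that is, presence/absence patterns of each protein across a set of genomes. Proteins with similar profiles are treated as candidate functional partners. The Jaccard similarity, the 0.8 threshold, and the toy profiles are assumptions made for illustration, not part of the slides.

```python
from itertools import combinations


def profile_similarity(profile_a, profile_b):
    """Jaccard similarity between two presence/absence profiles (0/1 tuples)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    return both / either if either else 0.0


def linked_pairs(profiles, threshold=0.8):
    """Return protein pairs whose profiles are at least `threshold` similar."""
    return [
        (a, b)
        for a, b in combinations(sorted(profiles), 2)
        if profile_similarity(profiles[a], profiles[b]) >= threshold
    ]


# Made-up profiles over five hypothetical genomes (1 = homolog present).
profiles = {
    "P1": (1, 0, 1, 1, 0),
    "P2": (1, 0, 1, 1, 0),
    "P3": (0, 1, 0, 1, 1),
}
candidates = linked_pairs(profiles)
print(candidates)
```

A project combining several sources could treat such a similarity as one feature alongside gene-neighbor and gene-clustering evidence.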
Disease-related protein-protein interactions • How do we tell whether an interaction is disease related? Query the NCBI OMIM database.
Projects for BioQA • Learning • Given a set of relevant abstracts, what kind of features can we obtain to enhance our queries? • Given a set of questions from users, how can we identify keywords from the questions to form queries? • Answer Presentation • Given a relevant abstract/article, • how can we retrieve the relevant passage with respect to the user’s question? • how can we extract answers?
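One simple baseline for turning a user question into query keywords is stopword filtering, sketched below. The stopword list, the function name `question_keywords`, and the sample question are all hypothetical; a learned approach, as the slide suggests, would replace the fixed list.

```python
import re

# Small illustrative stopword list (an assumption, far from complete).
STOPWORDS = {
    "a", "an", "and", "are", "does", "how", "in", "is", "of",
    "the", "to", "what", "which", "with",
}


def question_keywords(question):
    """Return the non-stopword tokens of a question, in original order."""
    # Keep hyphenated tokens such as "IFN-beta" together.
    tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", question)
    return [token for token in tokens if token.lower() not in STOPWORDS]


# Hypothetical user question.
keywords = question_keywords("What is the role of PRNP in prion disease?")
print(keywords)
```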
Projects for BioQA • Automatic Extraction • Extract relations of gene-disease, gene-biological process (also their corresponding organisms) • Uniquely identify the genes • A gene symbol can be associated with multiple gene identifiers. Which gene identifier is the right one? • Can these extraction processes be generalized? • Sortal Resolution • Given an abstract and query, perform sortal resolution (but not on pronouns) • Example: • Given the following abstract: • “In this report, we show that virus infection of cells results in a dramatic hyperacetylation of histones H3 and H4 that is localized to the IFN-beta promoter. … Thus, coactivator-mediated localized hyperacetylation of histones may play a crucial role in inducible gene expression.” [PMID: 10024886] • and a query about histones, perform resolution on “histones” • Result: “histones” refers to H3 and H4.
Projects for BioQA • Semantics of Words • Dealing with the semantics of words to improve the retrieval of answers • Example: semantic relation between “role” and “play” • Gene symbol variants, gene symbol disambiguation, entity recognition • Generate gene symbol synonyms and variants given a gene symbol in a query • Example: “CDC28” can also be written as “Cdc28”, “Cdc28p”, “cdc-28” • “GSS” is a synonym of “PRNP”, but “GSS” itself is also a gene which is unrelated to “PRNP”. • Improve recognition of diseases and biological processes • Extension of Ontology • To capture biological processes and their possible relations to diseases • Examples: • learning and/or memory can influence Alzheimer’s disease • Degradation of ubiquitin cycle can cause extra long/short half-life of genes • Extra long/short half-life of genes can cause cancer
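The “CDC28” example above suggests a rule-based starting point for generating symbol variants. The sketch below reproduces exactly those three spellings for symbols of the form letters followed by digits; the rules and the function name `symbol_variants` are assumptions for illustration, and they do not address true synonyms such as “GSS” and “PRNP”, which need a lookup resource.

```python
import re


def symbol_variants(symbol):
    """Generate simple orthographic variants of a gene symbol like 'CDC28'."""
    variants = {symbol}
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", symbol)
    if match is None:
        # Symbols without a trailing number are returned unchanged.
        return variants
    letters, digits = match.groups()
    capitalized = letters.capitalize() + digits
    variants.add(capitalized)                      # e.g. Cdc28
    variants.add(capitalized + "p")                # e.g. Cdc28p
    variants.add(letters.lower() + "-" + digits)   # e.g. cdc-28
    return variants


cdc28_variants = symbol_variants("CDC28")
print(sorted(cdc28_variants))
```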
Build an Ontology • Build an ontology for a domain for which we do not have an ontology yet. • Verify its consistency.
Various kinds of text extraction systems • TREC suggested ones • Which method/protocol is used in which experiment/procedure • Gene – disease – role • Gene – biological process – role • Gene – mutation type – biological impact • Gene – interaction – gene – function – organ • Gene – interaction – gene – disease – organ • Protein Lounge inspired • Kinase-phosphatase • transcription factor • peptide antigen
Drug classification in Pharmacogenetics • Experimental data available: drug response on cell lines; gene expression data; gene copy data; mutation analysis data; RNAi data • Data from literature: mutation data (Sanger lab); NCI-60 drug response data; mutation analysis data; pathway data (e.g. BIND); Gene Ontology • Proprietary data: where does the drug physically interact? (600 kinases, IC50); gene expression data of patients after treatment • Goal: given a patient, what kinds of data do we need in order to determine whether a drug is applicable to that patient? How do we develop a classifier using these kinds of data? • Find gene and protein interaction networks (or components) using these data.
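As one possible starting point for the classifier question, the sketch below implements a nearest-centroid classifier in plain Python. The choice of method is an assumption, not something the slides prescribe, and the two-dimensional feature vectors and the "responder" / "non-responder" labels are made-up placeholders standing in for real expression or mutation features.

```python
from math import dist


def fit_centroids(samples, labels):
    """Compute the mean feature vector (centroid) for each class label."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [value / counts[label] for value in acc]
        for label, acc in sums.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Made-up training data: two hypothetical features per cell line.
samples = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.7)]
labels = ["responder", "responder", "non-responder", "non-responder"]
centroids = fit_centroids(samples, labels)
prediction = predict(centroids, (0.7, 0.3))
print(prediction)
```

With thousands of genes and few patients, feature selection and cross-validation would matter far more than the choice of this particular classifier.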