Welcome to BCB4003/CS4803 BCB503/CS583 Biological and Biomedical Database Mining
Prof. Carolina Ruiz
Computer Science Department
Bioinformatics and Computational Biology Program
WPI
Why this course?
Biological and Biomedical Research Problems
• Transcriptome (mid 1990’s-2000’s): gene expression, DNA/RNA microarrays
• Proteome (1990’s-2000’s): protein structure, protein-protein interactions, protein pathways
• Genome (1980’s-1990’s): sequencing, sequence analysis, …
• Applications (2000’s): organism-organism interactions, organism-environment interactions, genome-wide association studies, cancer therapies, drug development
• Biological Function (2000’s)
Central dogma: DNA → (transcription) → RNA → (translation) → Protein
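The central dogma on this slide can be illustrated with a minimal Python sketch. It is not from the original slides. It assumes a simplified view of transcription (the mRNA is the coding strand with T replaced by U), the standard genetic code, and input containing only A, C, G, T; the function names `transcribe` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_strand: str) -> str:
    """Simplified transcription: mRNA equals the coding strand with T -> U."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate from the first AUG to the first stop codon (one-letter codes)."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon ends translation
            break
        protein.append(aa)
    return "".join(protein)

mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```

Real transcription works from the template strand and involves splicing and other processing steps that this sketch leaves out.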
This all has generated …
• Data
  • Massive datasets and databases of sequence, gene, gene expression, protein, biological function, clinical information, …
• Text
  • Annotations in data sources, abstracts (e.g., Medline), research articles, medical literature (e.g., PubMed, NCBI Bookshelf, Google Scholar), patient records, …
• Ontologies
  • Descriptions of terms and their relationships (e.g., Gene Ontology)
Current challenges
• To make sense of and put to use all this information.
• How? Computational tools and techniques are needed to help humans integrate, summarize, understand, and take advantage of accumulated information:
  • Data mining
  • Text mining
  • Data and text mining together
What is Data [Text] Mining? Or, more generally, Knowledge Discovery in Databases (KDD)
“Non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [text]” (Fayyad et al., 1996)
• Raw Data [Text] → Data [Text] Mining → Patterns
  • Analytical patterns (rules, decision trees)
  • Statistical patterns (data distribution)
  • Visual patterns
Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. "From Data Mining to Knowledge Discovery in Databases." AI Magazine, pp. 37-54. Fall 1996.
Data mining methods in bioinformatics
• Clustering
• Sequence Mining
• Bayesian Methods
• Expectation Maximization (EM)
• Gibbs Sampling
• Hidden Markov Models
• Kernel methods
• Support Vector Machines
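As one concrete instance of the clustering item above, here is a minimal sketch of k-means (Lloyd's algorithm) on one-dimensional values such as expression levels. It is not from the original slides: the function name `kmeans_1d`, the toy values, and the hand-picked initial centers are all illustrative assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on scalars with caller-supplied initial centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            clusters[nearest].append(v)
        # Update step: move each center to its cluster mean
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[k]
            for k, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical expression levels for six genes, forming two obvious groups.
levels = [0.1, 0.2, 0.3, 5.0, 5.2, 5.4]
centers, clusters = kmeans_1d(levels, centers=[0.0, 1.0])
print(centers, clusters)
```

Real expression data is multi-dimensional, and results depend on the initial centers, so practical tools typically use several random restarts.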
Text mining in bioinformatics
• Document indexing
• Information retrieval
• Lexical analysis (sentence tokenization, word tokenization, stemming, stop word removal)
• Semantic analysis
• Query processing
• Text classification
• Text clustering
• Text summarization
• (Semi-)automatic curation of literature repositories
• Knowledge discovery from text, hypothesis generation
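The lexical analysis steps listed above (sentence tokenization, word tokenization, stop word removal, stemming) can be sketched in a few lines of Python. This is an illustrative sketch, not from the original slides: the tiny stop list, the regular expressions, and the suffix-stripping `crude_stem` are simplifying assumptions standing in for real tools such as a Porter stemmer.

```python
import re

# Tiny illustrative stop list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "by"}

def split_sentences(text):
    """Naive sentence tokenizer: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(sentence):
    """Lowercase word tokenizer that keeps letters, digits and inner hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())

def crude_stem(word):
    """Strip a few common suffixes (a rough stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Return one list of stemmed, stop-word-free tokens per sentence."""
    return [
        [crude_stem(w) for w in split_words(s) if w not in STOP_WORDS]
        for s in split_sentences(text)
    ]

tokens = preprocess("BRCA1 is linked to tumors. The protein binds DNA.")
print(tokens)
```

Biomedical text needs more care than this (abbreviations such as "e.g." break the sentence splitter, and gene names should usually not be stemmed), which is part of why dedicated tools exist.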
Data/text mining process (KDD)
• Information sources
• Data management (databases, data warehouses) → data
• Data “pre”-processing (noisy/missing data, feature selection) → cleaned data
• Data analysis, data mining (analytical, statistical, visual) → models
• Model/pattern evaluation (quantitative, qualitative) → “good” model
• Model/pattern deployment on new data (prediction, decision support)
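The stages of this process can be traced end to end with a toy Python sketch. It is not from the original slides: the records, the function names, and the single-feature threshold "model" are hypothetical, and the sketch assumes that class 1 has the higher mean measurement.

```python
def preprocess(records):
    """Data pre-processing: drop records with a missing measurement."""
    return [(x, y) for x, y in records if x is not None]

def mine(train):
    """Data mining: learn a threshold halfway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def predict(threshold, x):
    """Deployment: label a new measurement using the learned threshold."""
    return 1 if x > threshold else 0

def evaluate(threshold, test):
    """Quantitative evaluation: accuracy on held-out records."""
    hits = sum(predict(threshold, x) == y for x, y in test)
    return hits / len(test)

# Hypothetical (measurement, label) records; None marks a missing value.
raw = [(1.0, 0), (None, 1), (2.0, 0), (8.0, 1), (9.0, 1), (None, 0)]
held_out = [(1.5, 0), (8.5, 1), (3.0, 0)]

threshold = mine(preprocess(raw))         # cleaned data -> model
accuracy = evaluate(threshold, held_out)  # model -> "good" model?
print(threshold, accuracy, predict(threshold, 7.2))  # prediction on new data
```

In practice each stage is far richer (feature selection, multiple candidate models, qualitative review by domain experts), but the flow of data between stages is the same.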
Putting it all together …
• Data / Text / Information Integration
• Mining over data and text combined
• Visualization
• Other real-world issues
• Developing tools and techniques that are efficient, scalable, and user friendly
Interdisciplinary: techniques come from multiple fields
• Natural Language Processing (AI), Computational Linguistics
  • Contributes text analysis techniques
• Databases
  • Contributes efficient data storage, data cleansing, and data access techniques
• Data Visualization
  • Contributes visual data displays and data exploration
• High Performance Computing
  • Contributes techniques for efficiently handling complexity
• Signal Processing
• Image Processing …
• Biology and Biomedicine
  • Contributes domain knowledge
• Machine Learning (AI)
  • Contributes (semi-)automatic induction of empirical laws from observations & experimentation
• Statistics
  • Contributes language, framework, and techniques
• Pattern Recognition
  • Contributes pattern extraction and pattern matching techniques
QUESTIONS?
* Images in this presentation were downloaded from Google Images