
Welcome to BCB4003/CS4803 BCB503/CS583 Biological and Biomedical Database Mining

Prof. Carolina Ruiz, Computer Science Department, Bioinformatics and Computational Biology Program, WPI

Presentation Transcript


  1. Welcome to BCB4003/CS4803 BCB503/CS583 Biological and Biomedical Database Mining
     Prof. Carolina Ruiz, Computer Science Department, Bioinformatics and Computational Biology Program, WPI

  2. Why this course?
     Biological and Biomedical Research Problems
     • Genome (1980's-1990's): sequencing, sequence analysis, …
     • Transcriptome (mid 1990's-2000's): gene expression, DNA/RNA microarrays
     • Proteome (1990's-2000's): protein structure, protein-protein interactions, protein pathways
     • Biological Function (2000's): central dogma: DNA → (transcription) → RNA → (translation) → Protein
     • Applications (2000's): organism-organism interactions, organism-environment interactions, genome-wide association studies, cancer therapies, drug development

  3. This all has generated …
     • Data: massive datasets and databases of sequence, gene, gene expression, protein, biological function, clinical information, …
     • Text: annotations in data sources, abstracts (e.g., Medline), research articles, medical literature (e.g., PubMed, NCBI Bookshelf, Google Scholar), patient records, …
     • Ontologies: descriptions of terms and their relationships (e.g., Gene Ontology)

  4. Current challenges
     • To make sense of, and put to use, all this information.
     • How? Computational tools and techniques are needed to help humans integrate, summarize, understand, and take advantage of the accumulated information:
       • Data mining
       • Text mining
       • Data and text mining together

  5. What is data [text] mining? Or, more generally, Knowledge Discovery in Databases (KDD)
     "Non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [text]" (Fayyad et al., 1996)
     Raw data [text] → Data [text] mining → Patterns
     • Analytical patterns (rules, decision trees)
     • Statistical patterns (data distribution)
     • Visual patterns
     Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. "From Data Mining to Knowledge Discovery in Databases." AI Magazine 17(3), pp. 37-54, Fall 1996.
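The definition above lists rules as one kind of analytical pattern. One common way to quantify whether a rule is "valid" and "potentially useful" is to compute its support and confidence, as in association rule mining; the slide does not prescribe this particular measure. The minimal sketch below assumes a hypothetical set of samples, each listing the genes found over-expressed in it plus a class label; the gene names (`geneA_up`, `geneB_up`) and the data are illustrative only.

```python
# A minimal sketch of scoring one "analytical pattern": an IF-THEN rule.
# The samples and gene names below are hypothetical toy data.

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante > 0 else 0.0

# Each sample lists the (hypothetical) over-expressed genes plus a class label.
samples = [
    {"geneA_up", "geneB_up", "tumor"},
    {"geneA_up", "geneB_up", "tumor"},
    {"geneA_up", "normal"},
    {"geneB_up", "normal"},
    {"geneA_up", "geneB_up", "tumor"},
]

rule_if = {"geneA_up", "geneB_up"}
rule_then = {"tumor"}
print("support   :", support(samples, rule_if | rule_then))
print("confidence:", confidence(samples, rule_if, rule_then))
```

Here the rule IF geneA_up AND geneB_up THEN tumor holds in 3 of the 5 samples (support 0.6) and in every sample where its IF-part holds (confidence 1.0).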

  6. Data mining methods in bioinformatics
     • Clustering
     • Sequence mining
     • Bayesian methods
     • Expectation Maximization (EM)
     • Gibbs sampling
     • Hidden Markov Models
     • Kernel methods
     • Support Vector Machines
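Clustering, the first method listed above, is often used to group genes whose expression profiles behave alike. The slide does not name a specific algorithm; the minimal sketch below uses plain k-means as one example, on hypothetical expression values for six genes at four time points. Centroids are seeded with the first k profiles only so the demo is deterministic.

```python
import math

# Hypothetical expression profiles: 6 genes measured at 4 time points.
genes = ["gene1", "gene2", "gene3", "gene4", "gene5", "gene6"]
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.1, 2.1, 2.9, 4.2],
    [3.9, 3.1, 2.2, 0.9],
    [1.0, 1.9, 3.2, 3.8],
    [4.1, 2.8, 2.1, 1.1],
]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    """Coordinate-wise mean of a list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def kmeans(points, k, max_iter=100):
    """Plain k-means, seeded with the first k points for a reproducible demo."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no profile changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = centroid(members)
    return labels, centroids

labels, centroids = kmeans(profiles, k=2)
for name, lab in zip(genes, labels):
    print(name, "-> cluster", lab)
```

With this seeding the rising profiles (gene1, gene3, gene5) land in cluster 0 and the falling ones in cluster 1. Practical implementations usually seed randomly or with k-means++ and run several restarts, since k-means only reaches a local optimum.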

  7. Text mining in bioinformatics
     • Document indexing
     • Information retrieval
     • Lexical analysis (sentence tokenization, word tokenization, stemming, stop word removal)
     • Semantic analysis
     • Query processing
     • Text classification
     • Text clustering
     • Text summarization
     • (Semi-)automatic curation of literature repositories
     • Knowledge discovery from text, hypothesis generation
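The lexical analysis steps listed above are usually chained: split text into sentences, split sentences into words, drop stop words, and reduce words to stems. The minimal sketch below chains them on a made-up sentence. The stop word list and suffix rules are hypothetical simplifications; real pipelines typically use curated stop word lists and an established stemmer such as Porter's algorithm.

```python
import re

# Hypothetical, deliberately tiny stop word list; real systems use larger curated lists.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "are", "to", "by", "was", "were"}

def sentence_tokenize(text):
    """Split after '.', '!' or '?' followed by whitespace (naive: also splits after 'e.g.')."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def word_tokenize(sentence):
    """Lower-cased alphanumeric tokens; hyphenated terms such as 'p53-dependent' stay whole."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", sentence.lower())

def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer (e.g. Porter's algorithm)."""
    for suffix in ("ation", "ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def lexical_analysis(text):
    """Sentence tokenization -> word tokenization -> stop word removal -> stemming."""
    result = []
    for sentence in sentence_tokenize(text):
        tokens = [w for w in word_tokenize(sentence) if w not in STOP_WORDS]
        result.append([crude_stem(w) for w in tokens])
    return result

text = "The protein binds DNA. Binding is regulated by phosphorylation."
print(lexical_analysis(text))
```

This prints `[['protein', 'bind', 'dna'], ['bind', 'regulat', 'phosphoryl']]`: "binds" and "binding" collapse to the same stem, which is what lets later steps such as indexing or text clustering treat them as one term.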

  8. Data/text mining process (KDD)
     information sources → data management → data "pre"-processing → cleaned data → data analysis → models/patterns → model/pattern evaluation → "good" model → model/pattern deployment (on new data)
     • Data management: databases, data warehouses
     • Data "pre"-processing: noisy/missing data, feature selection
     • Data analysis: data mining (analytical, statistical, visual)
     • Model/pattern evaluation: quantitative, qualitative
     • Model/pattern deployment: prediction, decision support on new data
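The process above can be sketched end to end on a toy dataset. In this minimal sketch every technique choice is an assumption, not the course's method: mean imputation for missing values, a variance threshold for feature selection, a nearest-centroid classifier as the mining step, and accuracy on held-out samples as the quantitative evaluation. The data management stage is reduced to in-memory lists, and the expression values and "tumor"/"normal" labels are hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical samples: 3 features each, None = missing value.
train = [
    [5.0, 1.0, 0.5],
    [5.2, 1.0, None],
    [4.8, 1.1, 0.7],
    [1.0, 1.0, 3.0],
    [None, 1.0, 3.2],
    [1.2, 0.9, 2.8],
]
train_labels = ["tumor", "tumor", "tumor", "normal", "normal", "normal"]
test = [[4.9, 1.0, 0.6], [1.1, 1.0, 3.1]]
test_labels = ["tumor", "normal"]

def column_means(rows):
    """Mean of each column, ignoring missing (None) entries."""
    return [mean(v for v in col if v is not None) for col in zip(*rows)]

def fill_missing(rows, means):
    """Pre-processing: replace None with the training-set column mean."""
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]

def select_features(rows, min_variance):
    """Feature selection: indices of columns whose variance exceeds min_variance."""
    return [j for j, col in enumerate(zip(*rows)) if pvariance(col) > min_variance]

def project(rows, keep):
    """Keep only the selected feature columns."""
    return [[row[j] for j in keep] for row in rows]

def train_nearest_centroid(rows, labels):
    """Data mining step: the model is one mean vector (centroid) per class."""
    model = {}
    for label in sorted(set(labels)):
        members = [r for r, lab in zip(rows, labels) if lab == label]
        model[label] = [mean(col) for col in zip(*members)]
    return model

def predict(model, row):
    """Deployment: label of the nearest centroid (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, model[lab])))

def accuracy(model, rows, labels):
    """Quantitative evaluation on held-out samples."""
    return sum(predict(model, r) == lab for r, lab in zip(rows, labels)) / len(rows)

# Fit every step on the training data only, then reuse it on test and new data.
means = column_means(train)
clean_train = fill_missing(train, means)
keep = select_features(clean_train, min_variance=0.1)
model = train_nearest_centroid(project(clean_train, keep), train_labels)

clean_test = project(fill_missing(test, means), keep)
print("kept features:", keep)
print("test accuracy:", accuracy(model, clean_test, test_labels))

new_sample = [5.1, 1.0, None]
print("prediction:", predict(model, project(fill_missing([new_sample], means), keep)[0]))
```

The near-constant second feature is dropped (kept features `[0, 2]`), both held-out samples are classified correctly, and the new sample is predicted as "tumor". Imputation means and the selected features are computed from the training data only, so the evaluation step is not contaminated by the test samples.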

  9. Putting it all together …
     • Data / text / information integration
     • Mining over data and text combined
     • Visualization
     • Other real-world issues
     • Developing tools and techniques that are efficient, scalable, and user-friendly

  10. Interdisciplinary: techniques come from multiple fields
      • Natural Language Processing (AI), Computational Linguistics: contributes text analysis techniques
      • Databases: contributes efficient data storage, data cleansing, and data access techniques
      • Data Visualization: contributes visual data displays and data exploration
      • High Performance Computing: contributes techniques to handle complexity efficiently
      • Signal Processing, Image Processing, …
      • Biology and Biomedicine: contributes domain knowledge
      • Machine Learning (AI): contributes (semi-)automatic induction of empirical laws from observation and experimentation
      • Statistics: contributes language, framework, and techniques
      • Pattern Recognition: contributes pattern extraction and pattern matching techniques

  11. QUESTIONS?
      * Images in this presentation were downloaded from Google Images.
