Application of Unstructured Learning in Computational Biology
Tony C Smith
Department of Computer Science, University of Waikato
tcs@cs.waikato.ac.nz
Computability
• Before computers were built, mathematicians already knew what such machines could do
  • arithmetic (e.g. missile trajectories)
  • search (e.g. keys for secret codes)
  • sort (e.g. census information)
  • … anything with a mathematical algorithm
Unstructured learning in computational biology Tony C Smith
Artificial Intelligence
• Computers do things that otherwise only human brains can do
[Figure, built up over three slides; labels: expert, expert system, learning system]
Machine learning
What is machine learning?
• creating computer programs that get better with experience
• learn how to make expert judgments
• discover previously hidden, potentially useful information (data mining)
How does it work?
• user provides learning system with examples of concept to be learned
• induction algorithm infers a characteristic model of the examples
• model is used to predict whether or not future novel instances are also examples, and it does this very consistently and very, very quickly!
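The three "how does it work" steps can be sketched in a few lines. This is only an illustration, not the system used in the talk: it assumes a one-rule (1R-style) learner as the induction algorithm, and the attribute names, labels, and example data are all hypothetical.

```python
from collections import Counter, defaultdict

def induce_one_rule(examples):
    """Induce a one-attribute rule from (attributes, label) examples.

    Returns (attribute, {value: predicted label}, default label).
    """
    labels = [label for _, label in examples]
    default = Counter(labels).most_common(1)[0][0]
    best = None
    for attr in examples[0][0]:
        # Count which labels occur with each value of this attribute.
        by_value = defaultdict(Counter)
        for attrs, label in examples:
            by_value[attrs[attr]][label] += 1
        rule = {value: counts.most_common(1)[0][0] for value, counts in by_value.items()}
        errors = sum(1 for attrs, label in examples if rule[attrs[attr]] != label)
        # Keep the attribute whose rule makes the fewest training errors.
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    _, attr, rule = best
    return attr, rule, default

def predict(model, attrs):
    """Apply the induced rule to a novel instance."""
    attr, rule, default = model
    return rule.get(attrs[attr], default)

# Step 1: the user provides examples of the concept (hypothetical data).
examples = [
    ({"outlook": "sunny", "windy": "no"}, "yes"),
    ({"outlook": "sunny", "windy": "yes"}, "yes"),
    ({"outlook": "rainy", "windy": "no"}, "no"),
    ({"outlook": "rainy", "windy": "yes"}, "no"),
    ({"outlook": "overcast", "windy": "no"}, "yes"),
]

# Step 2: the induction algorithm infers a model of the examples.
model = induce_one_rule(examples)

# Step 3: the model predicts the label of a novel instance.
prediction = predict(model, {"outlook": "rainy", "windy": "no"})
print(model[0], prediction)
```

Once induced, the model is just a lookup, which is why prediction is both consistent and fast.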
Structured learning
Mushroom Data

Weight   Damage   Dirt    Firmness   Quality
heavy    high     mild    hard       poor
heavy    high     mild    soft       poor
normal   high     mild    hard       good
light    medium   mild    hard       good
light    clear    clean   hard       good
normal   clear    clean   soft       poor
heavy    medium   mild    hard       poor
. . .

[Figure: decision tree with nodes weight (heavy / normal / light), dirt (mild / clean) and firmness (hard / soft), and leaves good / poor]
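One way to see why a tree for this data might start by testing weight is to score each field by information gain, the criterion used by ID3-style learners. The slide does not say which induction algorithm built its tree, so treat this as a sketch under that assumption; it uses only the seven rows visible on the slide, and the table continues beyond them.

```python
import math
from collections import Counter, defaultdict

# The seven rows shown on the slide; the last column is the class to predict.
COLUMNS = ["weight", "damage", "dirt", "firmness", "quality"]
ROWS = [
    ("heavy", "high", "mild", "hard", "poor"),
    ("heavy", "high", "mild", "soft", "poor"),
    ("normal", "high", "mild", "hard", "good"),
    ("light", "medium", "mild", "hard", "good"),
    ("light", "clear", "clean", "hard", "good"),
    ("normal", "clear", "clean", "soft", "poor"),
    ("heavy", "medium", "mild", "hard", "poor"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attr_index):
    """Reduction in class entropy obtained by splitting on one field."""
    labels = [row[-1] for row in rows]
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_index]].append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

gains = {COLUMNS[i]: information_gain(ROWS, i) for i in range(len(COLUMNS) - 1)}
best_attribute = max(gains, key=gains.get)

for name, gain in sorted(gains.items(), key=lambda item: -item[1]):
    print(f"{name:10s} {gain:.3f}")
print("split first on:", best_attribute)
```

On these rows weight has the highest gain, because every heavy mushroom is poor and every light one is good. A full tree learner would repeat the same scoring inside each branch.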
Unstructured learning
• data does not have fixed fields with specific values
• examples: images, continuous signals, expression data, text
• learning proceeds by correlating the presence or absence of any and all salient attributes

Document Classification
• given examples of documents covering some topic, learn a semantic model that can recognize whether or not other documents are relevant
• prioritize them: i.e. quantify “how relevant” documents are to the topic
• not limited to keywords (nor is it misled by them)
• adapt to the user’s needs (ephemeral or long-term)
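A minimal sketch of learning from the presence or absence of attributes, and of ranking documents by "how relevant" they are. It assumes a Bernoulli naive Bayes style log-likelihood ratio over words, which is one common choice and not necessarily the method behind the demo; the function names and all document texts are hypothetical.

```python
import math

def tokens(text):
    """Treat a document as the set of words it contains (presence/absence only)."""
    return set(text.lower().split())

def train(relevant_docs, other_docs):
    """Learn, for every word, a weight for its presence and one for its absence."""
    rel = [tokens(d) for d in relevant_docs]
    oth = [tokens(d) for d in other_docs]
    vocabulary = set().union(*rel, *oth)
    weights = {}
    for word in vocabulary:
        # Laplace-smoothed probability that a document of each class contains the word.
        p_rel = (sum(word in d for d in rel) + 1) / (len(rel) + 2)
        p_oth = (sum(word in d for d in oth) + 1) / (len(oth) + 2)
        weights[word] = (
            math.log(p_rel / p_oth),              # evidence if the word is present
            math.log((1 - p_rel) / (1 - p_oth)),  # evidence if the word is absent
        )
    return weights

def relevance(weights, text):
    """Log-likelihood ratio: higher means more like the relevant examples."""
    present = tokens(text)
    return sum(w_present if word in present else w_absent
               for word, (w_present, w_absent) in weights.items())

# Hypothetical training examples supplied by the user.
relevant = ["protein folding and gene expression", "gene sequence encodes protein"]
other = ["rugby match results", "weather forecast for the weekend"]
weights = train(relevant, other)

# Prioritize unseen documents by relevance score.
candidates = ["rugby weather report", "new protein gene discovered"]
ranked = sorted(candidates, key=lambda doc: relevance(weights, doc), reverse=True)
print(ranked)
```

Because every word in the vocabulary contributes, whether present or absent, the score is not tied to a fixed list of keywords.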
Document classification demo
Bioinformatics
• Finding genes
• Determining gene roles
• Determining protein functions
• Empirical tests
• Sequence similarity comparison
• Literature
GO-KDS demo
Amino Acid
[Figure: amino acid structure, labelled R group, amino group, carboxyl group]
Amino Acid
[Figure: tyrosine and glycine]
DNA encodes amino acids
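As a small illustration of the encoding, the sketch below reads a DNA coding sequence three bases (one codon) at a time and looks each codon up in a table. Only a handful of entries from the standard genetic code are included, enough to cover tyrosine and glycine; the function name and the example sequence are hypothetical.

```python
# Partial codon table from the standard genetic code (DNA coding-strand codons).
# A full table has 64 entries; only those needed for this example are listed.
CODON_TABLE = {
    "ATG": "Met",                                            # also the usual start codon
    "TAT": "Tyr", "TAC": "Tyr",                              # tyrosine
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",  # glycine
    "TAA": "Stop", "TAG": "Stop", "TGA": "Stop",
}

def translate(dna):
    """Translate a coding sequence into amino acids, stopping at a stop codon."""
    dna = dna.upper()
    peptide = []
    # Step through complete codons only; trailing bases are ignored.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE.get(dna[i:i + 3], "???")  # "???": not in this partial table
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide

peptide = translate("ATGTATGGCTAA")
print("-".join(peptide))
```

Several codons map to the same amino acid (four for glycine, two for tyrosine), so different DNA sequences can encode the same protein.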
RasMol demo
Biotechnology
• Biologists know proteins, computer scientists know machine learning
• Together, they can uncover a lot of hidden information about genes and proteins
• Biotechnology is a multi-billion-dollar industry
• Biotechnology is one of the best-funded areas of scientific research