Data Mining for BioInformatics at Ewha CSE

Data Mining for BioInformatics at Ewha CSE. Dec. 14, 2001. Hwan-Seung Yong (Gene: ACTGAAAGGGCTCTCAAA), Dept. of Computer Science & Engineering, Ewha Womans Univ. BioInformatics and Computer Science. Computer: binary (base-2) system (0/1), designed by humans. Living things: quaternary (base-4) system (A/G/C/T), designed by nature.

Presentation Transcript


  1. Data Mining for BioInformatics at Ewha CSE • Dec. 14, 2001 • Hwan-Seung Yong (Gene: ACTGAAAGGGCTCTCAAA) • Dept. of Computer Science & Engineering, Ewha Womans Univ.

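  2. BioInformatics and Computer Science • Computer: binary (base-2) system (0/1), designed by humans • Living things: quaternary (base-4) system (A/G/C/T), designed by nature • Advances in computer technology • Data analysis + databases = data mining (at present) • High-performance parallel computing technology • Distributed processing and Web/XML technology • Emergence of knowledge management technology • Why humans built computers • In search of the secrets of life encoded in base 4 • Challenging the realm of God • For BioInformatics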
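The binary versus base-4 comparison on slide 2 can be made concrete with a small sketch. It treats a DNA string as a base-4 number, so each nucleotide occupies at most two bits. The digit assignment (A=0, C=1, G=2, T=3) and the function names are assumptions made for illustration; the slides do not specify a mapping.

```python
# Assumed mapping of nucleotides to base-4 digits (not given in the slides).
BASE4 = {"A": 0, "C": 1, "G": 2, "T": 3}
DIGIT_TO_BASE = {digit: base for base, digit in BASE4.items()}


def dna_to_int(seq: str) -> int:
    """Read a DNA string as a base-4 number, leftmost base most significant."""
    value = 0
    for base in seq.upper():
        value = value * 4 + BASE4[base]
    return value


def int_to_dna(value: int, length: int) -> str:
    """Inverse of dna_to_int for a sequence of known length."""
    bases = []
    for _ in range(length):
        value, digit = divmod(value, 4)
        bases.append(DIGIT_TO_BASE[digit])
    return "".join(reversed(bases))


gene = "ACTGAAAGGGCTCTCAAA"  # example gene string from the title slide
encoded = dna_to_int(gene)
# One base-4 digit carries two bits, so the integer fits in 2 * len(gene) bits.
print(encoded.bit_length() <= 2 * len(gene))
print(int_to_dna(encoded, len(gene)) == gene)
```

The length has to be passed to `int_to_dna` because leading `A` bases map to leading zeros, which an integer does not preserve.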

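  3. BioInformatics and Computer Science • BioInformatics • Development of DNA code readers (biotechnology) and alignment technology • The complete genome sequence has only just been assembled • The task now is to find meaning (genes, etc.) in it • This is like recovering source code from a binary object • Experts in disassemblers and reverse engineering are needed • Data mining is an important applicable technology • Diagram labels: Computer System: Binary Code, Assembly Code, Source Code; Living Things (Nature): DNA Sequence, Gene, Protein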

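  4. Why Ewha CSE is appropriate for BioInformatics • Recent focus of CSE’s Research Area • As a BK Project Plan: Knowledge Engineering Framework • Data Warehousing and OLAP • Data Mining • XML Technology • Knowledge Engineering Enabling Technology • Knowledge Engineering Application • Electronic Commerce • BioInformatics • Related research institutes at Ewha • Graduate School of Molecular Life Sciences (BK) • Korea Science and Engineering Foundation SRC (Cell Signal Transduction Center) • Ministry of Information and Communication Computer Graphics/Virtual Reality Research Center • Prior related work (direct) • Development of a gene search and automatic analysis program for the Prosecutors' Office • Development of a genetic information management system for the National Institute of Scientific Investigation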

  5. Automatic gene analysis program • Genetic band recognition and code registration program

  6. DNA Locus Registration Interface

  7. Data Warehousing, OLAP and Data Mining • Data Warehousing and OLAP • ETL Methodology (Extraction, Transformation and Loading) • Data Warehouse Architecture • OLAP Server Development • Multidimensional Data Processing • Metadata Handling • Data Quality Control • Data Mining • Classification and Analysis of Data Mining Techniques • Clustering Algorithm • Association Algorithm • Classification Algorithm • CRM Application based on Web Log Mining • Text Mining for XML Data

  8. XML and Supporting Technology • XML Related Area • XML Server Development • Query Processing and Storage System • XML Document Mining • Knowledge Enabling Technology • Multimedia High-speed Network • Component-based Software Engineering • Security • Multimedia DBMS • Natural Language Processing • Computer Graphics and Virtual Reality

  9. Research Requirements for BioInformatics • Large volume of data, including multimedia data • High Performance Computing System • Massively Parallel Processing Hardware and Software • XML related work is important • For exchange of bio data • Gene Annotation • Web based collaborative system • Requires web based interoperable applications and standards • Distributed processing technique • CORBA, SOAP, Microsoft .NET framework • Data Mining • For Gene Prediction, Functional Genomics

  10. Bio Data Mining Research • XML Standard for Bio Data • Graphical User Interface for XML Data • Data Converter to XML • Convert Existing Bio Data to an XML Standard • Convert between XML Standards • Integration Methodology with Existing DB • SOAP (Simple Object Access Protocol) • WSDL (Web Services Description Language)
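The "Data Converter to XML" item on slide 10 can be sketched as a small FASTA-to-XML converter. This is only an illustration under stated assumptions: the element and attribute names (`sequences`, `sequence`, `description`, `residues`, `id`, `length`) are hypothetical and do not follow AGAVE, BSML, GAME, or any other published schema, and the sample record is invented.

```python
import xml.etree.ElementTree as ET


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_xml(text):
    """Convert FASTA text to an XML string with hypothetical element names."""
    root = ET.Element("sequences")
    for header, seq in parse_fasta(text):
        # Treat the first word of the header as the id, the rest as description.
        ident, _, desc = header.partition(" ")
        entry = ET.SubElement(root, "sequence", id=ident, length=str(len(seq)))
        ET.SubElement(entry, "description").text = desc
        ET.SubElement(entry, "residues").text = seq
    return ET.tostring(root, encoding="unicode")


example = """>gene1 example record
ACTGAAAGG
GCTCTCAAA
"""
print(fasta_to_xml(example))
```

A converter targeting a real standard would replace the hypothetical names with the ones defined by that standard's DTD or schema; the parsing step would stay the same.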

  11. XML Standard for Bio Data • Before • FASTA format, GenBank format, GFF (General Feature Format) • XML Format • AGAVE (Architecture for Genomic Annotation, Visualization and Exchange) • Developed by DoubleTwist, Inc. • Released in June 2000 • Open source licence in August 2001 • AGAVE version 3.2 with Prophecy 3.0 in Sept. 2001 • See http://www.agavexml.org • Genome XML Viewer by LabBook • BSML

  12. XML Standard for Bio Data • BioXML Standard and GAME • An open-source/free software organization dedicated to providing a set of standard XML formats for the exchange of biological data • GAME (Genome Annotation Markup Elements) • Created at BDGP (Berkeley Drosophila Genome Project) • Current version 1.1 released in March 2000 • http://www.bioxml.org • Follows the WikiWeb scheme • Collaborative web site that can be edited by anyone • Community documentation system • Everyone can edit the shared web pages

  13. New algorithm design • Simulated annealing • Other optimization techniques • Phylogenetic Tree Visualization • Tree drawing algorithms • Graph drawing algorithms • Computer Theory and Security Laboratory • Whole genome sequence annotation • Known gene • Sequence similarity • Unknown gene • Neural networks • Hidden Markov models • Diagram labels: Unknown gene prediction; Microarray data analysis; Phylogenetic prediction; Clustering classification tools; Data mining tools; Phylogeny inference; Phylogenetic analysis; Comparative genomics; Two samples comparison; Multiple samples comparison
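Slide 13 lists sequence similarity as the route to annotating known genes. One common way to quantify similarity between two sequences is a global alignment score; the sketch below uses the Needleman-Wunsch dynamic program as a stand-in, since the slides do not name a specific method. The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score, keeping only two DP rows."""
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                prev[j] + gap,       # gap in b
                curr[j - 1] + gap,   # gap in a
            ))
        prev = curr
    return prev[-1]


query = "ACTGAAAGG"
known = "ACTGAAGG"
print(global_alignment_score(query, known))
```

A higher score against a known gene suggests a closer match. Practical annotation pipelines typically rely on faster heuristic search tools rather than full dynamic programming over whole genomes.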

  14. Open Source Projects • Open BioInformatics Foundation • http://www.open-bio.org • Umbrella group for various bio*.org groups • bioxml.org, bioperl.org, biopython.org, biojava.org, biocorba.org • biopathways.org • bio-ensembl.org • Annotation for human genome • The First Bioinformatics Open Source Conference (BOSC'2001) was held in August 2001 in San Diego. • Many Open System Activities

  15. Vision and Future Prediction • Ewha will • Contribute to the Bio Data Mining area • Have a BioInformatics Institute or Research Center • Have a strong relationship with the bio-industry • Closing Comment • ATGCCGTCGGGCCCCGGGGC => "Thank You" expressed in base 4
