Chapter 7 Bioinformatics. Development of bioinformatics. 1990s: Human Genome Project. 1982: the U.S. National Institutes of Health (NIH) established GenBank. 1988: NCBI (National Center for Biotechnology Information) established. Definition of Bioinformatics.
Development of bioinformatics
• 1990s: Human Genome Project
• 1982: The U.S. National Institutes of Health (NIH) established GenBank
• 1988: NCBI (National Center for Biotechnology Information) established
Definition of Bioinformatics
• Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Why use bioinformatics?
• An explosive growth in the amount of biological information
• A more global perspective in experimental design
• Data mining: the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better characterized organisms.
From http://www.ncbi.nlm.nih.gov
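Classification of biological information
• Biological information can be roughly divided into four categories:
• Macroscopic and microscopic information on the structure, morphology, color, etc. of organisms
• Information on the genetic material (DNA) and genome sequences of organisms and their properties
• Information on the structure and properties of biological macromolecules such as proteins and carbohydrates
• Other biochemical, physiological, genetic, and evolutionary characteristics of organisms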
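Types of bioinformatics tools
• Database
• Software
• Web resource
• Algorithms
• Image and signal processing
• Computer architecture and database management
• Computer languages
• Programming
• Artificial intelligence and information theory
• Design and simulation
• Numerical analysis
• Statistics
• Software engineering and automation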
Major bioinformatics websites
• NCBI (National Center for Biotechnology Information)
• ExPASy (Expert Protein Analysis System)
• EMBnet (European Molecular Biology network)
Major nucleic acid and protein databases
• GenBank (USA), EMBL (Europe), and DDBJ (Japan)
• PDB/RCSB (Protein Data Bank), PIR (Protein Information Resource), Pfam (Protein Family database)
Web-based workbenches for analyzing biological information
• EMBOSS (European Molecular Biology Open Software Suite)
• SDSC Biology Workbench
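Applications of bioinformatics
• (1) Data acquisition and processing
• (2) Gene mapping
• (3) Genome maps and their comparison
• (4) Molecular model building and simulation
• (5) Comparison of DNA and protein sequences and structures
• (6) Macromolecular structure prediction and drug design
• (7) Molecular evolution and related fields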
DNA Sequencing
• Acquire: sample information, chromatograms, assembled data
• Store: data and information, backup data
• Analyze: quality assessment, filter and assemble data; predict and discover gene function; study genetic variation and gene expression
• Distribute: data to collaborators and customers; research findings to the scientific community
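Discovering new genes: the traditional approach
• (1) Find individuals carrying the disease (mutation)
• (2) Compare where gene expression differs between normal and variant individuals
• (3) Search for the mutated gene
• (4) Identify the disease-causing gene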
Discovering new genes: genomics
• (1) Sequence data
• (2) Gene finding
• (3) Function prediction
• (4) Novel gene?
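Discovering disease genes: genetic linkage
• (1) Start from individuals with a hereditary disease
• (2) Use family pedigrees to look for relationships between genetic markers and inheritance of the disease
• (3) Find markers linked to the disease-causing gene
• (4) Find the disease-causing gene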
Some Problems in Bioinformatics
• Sequence comparison
• Fragment assembly of DNA sequences
• Physical mapping
• Evolutionary trees
• Molecular structure prediction
Sequence Comparison
• Goals:
• Database search: given a sequence S and a set of sequences G, find all sequences in G that are similar to S.
• Similarity: find which parts of the sequences are alike and which parts differ.
  - Sequence alignment (global alignment)
  - Local alignment
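The database search goal above can be sketched in a few lines. The slide does not say how similarity is measured, so this sketch assumes a simple stand-in: the Jaccard overlap of 3-letter substrings (k-mers). The names `kmers`, `kmer_similarity`, and `search_database`, the value `k = 3`, and the `threshold` are all hypothetical choices for illustration.

```python
from typing import List, Set, Tuple


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_similarity(s: str, t: str, k: int = 3) -> float:
    """Jaccard overlap of the k-mer sets (assumed stand-in similarity)."""
    a, b = kmers(s, k), kmers(t, k)
    if not a and not b:
        return 0.0  # both sequences are shorter than k
    return len(a & b) / len(a | b)


def search_database(query: str, database: List[str],
                    threshold: float = 0.5, k: int = 3) -> List[Tuple[str, float]]:
    """Return (sequence, score) pairs scoring at least threshold, best first."""
    scored = [(seq, kmer_similarity(query, seq, k)) for seq in database]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    G = ["AGACTGTC", "TAGTCACC", "TAGTCACG"]
    for seq, score in search_database("TAGTCACG", G):
        print(seq, round(score, 3))
```

A k-mer overlap is cheap to compute but ignores the order of the shared pieces, which is one reason alignment-based scores are used when the goal is to see which parts are alike.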
Sequence Alignment
• Global alignment
• Local alignment
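To make the difference concrete: a global alignment scores the two sequences end to end, while a local alignment looks for the best-scoring pair of substrings. The sketch below computes a local alignment score in the style of the Smith-Waterman algorithm. The scoring values (`match=1`, `mismatch=-1`, `gap=-1`) and the function name are assumptions for illustration, not values taken from the slides.

```python
def local_alignment_score(a: str, b: str, match: int = 1,
                          mismatch: int = -1, gap: int = -1) -> int:
    """Best score over all pairs of substrings of a and b (Smith-Waterman style)."""
    best = 0
    prev = [0] * (len(b) + 1)  # scores for the previous row of the table
    for ai in a:
        curr = [0]
        for j, bj in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ai == bj else mismatch)
            # Clamping at 0 lets an alignment restart anywhere, which is
            # what makes the result local rather than global.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # Only the shared core "GTC" contributes to the local score.
    print(local_alignment_score("AAAGTCAAA", "CCCGTCCCC"))
```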
Longest Common Subsequence (1)
• Find a longest common subsequence of two strings.
string1: TAGTCACG
string2: AGACTGTC
LCS: AGACG
• Method: dynamic programming
Longest Common Subsequence (2)
string1: TAGTCACG
string2: AGACTGTC
LCS: AGACG
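The dynamic programming table itself is not reproduced here, so the following is a minimal sketch of the standard textbook approach: fill a table of LCS lengths for every pair of prefixes, then trace back from the bottom-right corner. The function name is a hypothetical choice.

```python
def longest_common_subsequence(s: str, t: str) -> str:
    """Return one longest common subsequence of s and t."""
    n, m = len(s), len(t)
    # table[i][j] = LCS length of the prefixes s[:i] and t[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Trace back from the corner, collecting matched characters in reverse.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if s[i - 1] == t[j - 1]:
            out.append(s[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(longest_common_subsequence("TAGTCACG", "AGACTGTC"))
```

A longest common subsequence is not always unique, so a different tie-breaking rule in the traceback may return a different string of the same length.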
Edit Distance (1)
• Find the shortest sequence of edit operations that transforms one string into the other.
string1: TAGTCACG
string2: AGACTGTC
Operations: DMMDDMMIMII (D = delete, M = match, I = insert)
Edit Distance (2)
string1: TAGTCACG
string2: AGACTGTC
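The operation string on the previous slide uses only deletions, matches, and insertions, so the sketch below assumes an edit distance without substitutions, where each deletion or insertion costs 1 and a match costs 0. That cost model and the function name are assumptions. The code returns both the distance and one optimal operation string.

```python
from typing import Tuple


def edit_operations(s: str, t: str) -> Tuple[int, str]:
    """Insertion/deletion edit distance from s to t, plus one optimal script.

    The script uses D (delete from s), M (match), and I (insert from t).
    """
    n, m = len(s), len(t)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # delete the first i characters of s
    for j in range(1, m + 1):
        cost[0][j] = j  # insert the first j characters of t
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]
            else:
                cost[i][j] = 1 + min(cost[i - 1][j], cost[i][j - 1])

    # Trace back from the corner to recover one optimal script.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i - 1] == t[j - 1]:
            ops.append("M")
            i -= 1
            j -= 1
        elif i > 0 and (j == 0 or cost[i - 1][j] <= cost[i][j - 1]):
            ops.append("D")
            i -= 1
        else:
            ops.append("I")
            j -= 1
    return cost[n][m], "".join(reversed(ops))


if __name__ == "__main__":
    distance, script = edit_operations("TAGTCACG", "AGACTGTC")
    print(distance, script)
```

Several scripts can share the minimum cost, so the string returned here need not be identical to the one on the slide, although it has the same number of deletions and insertions.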
Similarity
• Two sequences s1 and s2, with $a_i$ the i-th character of s1 and $b_j$ the j-th character of s2.
• p is the match value if $a_i = b_j$; otherwise it is the mismatch value.
• g is the gap penalty.
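With p and g defined this way, a global similarity score can be computed by dynamic programming in the style of the Needleman-Wunsch algorithm. The sketch below is one common formulation, not necessarily the formula shown on the original slide. The concrete values `match=1`, `mismatch=-1`, and `gap=-1` are assumptions chosen only for illustration.

```python
def global_similarity(s1: str, s2: str, match: int = 1,
                      mismatch: int = -1, gap: int = -1) -> int:
    """Best end-to-end alignment score of s1 and s2 (Needleman-Wunsch style)."""
    # Row 0: aligning the empty prefix of s1 with s2[:j] costs j gaps.
    prev = [j * gap for j in range(len(s2) + 1)]
    for i, ai in enumerate(s1, start=1):
        curr = [i * gap]  # column 0: s1[:i] against the empty prefix
        for j, bj in enumerate(s2, start=1):
            p = match if ai == bj else mismatch
            curr.append(max(prev[j - 1] + p,    # align ai with bj
                            prev[j] + gap,      # ai against a gap
                            curr[j - 1] + gap))  # bj against a gap
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_similarity("ACGT", "AGT"))  # three matches and one gap
```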
Sequence Alignment
a = TAGTCACG
b = AGACTGTC
Alignment 1:
----TAGTCACG
AGACT-GTC---
Alignment 2:
TAGTCAC-G--
-AG--ACTGTC
• Which one is better?
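One way to answer the question is to score each alignment column by column. The sketch below assumes a scoring scheme of +1 for a match, -1 for a mismatch, and -1 for a gap. These values and the function name are hypothetical, and a different scheme could rank the alignments differently.

```python
def score_alignment(row_a: str, row_b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Sum column scores of an alignment given as two equal-length rows."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


if __name__ == "__main__":
    print(score_alignment("----TAGTCACG", "AGACT-GTC---"))  # 4 matches, 8 gaps
    print(score_alignment("TAGTCAC-G--", "-AG--ACTGTC"))    # 5 matches, 6 gaps
```

Under these assumed values the second alignment scores higher (-1 versus -4), because it has more matches and fewer gaps.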
Sequence Alignment Formula
• Boundary conditions: $c_{0,0} = 0$, $c_{i,0} = i$, $c_{0,j} = j$
• Recurrence for $c_{i,j}$: one case if $a_i \neq b_j$, another if $a_i = b_j$
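The two case expressions are not given in the text. One common form that agrees with the boundary conditions $c_{i,0} = i$ and $c_{0,j} = j$ is the insertion/deletion edit distance, in which every gap costs 1 and matching characters cost nothing. The recurrence below is written under that assumption and may differ from the formula on the original slide:

```latex
c_{i,j} =
\begin{cases}
  \min\left(c_{i-1,j},\; c_{i,j-1}\right) + 1 & \text{if } a_i \neq b_j \\
  c_{i-1,j-1} & \text{if } a_i = b_j
\end{cases}
\qquad 1 \le i \le |a|,\; 1 \le j \le |b|
```

Under this assumption, $c_{|a|,|b|} = |a| + |b| - 2\,\mathrm{LCS}(a, b)$. For a = TAGTCACG and b = AGACTGTC, whose longest common subsequence has length 5, that gives $8 + 8 - 2 \cdot 5 = 6$.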
Sequence Alignment Example
TAGTCAC-G--
-AG--ACTGTC