
An Introduction to Bioinformatics

An Introduction to Bioinformatics. Department of Medical Informatics, Peking University Health Science Center. Qinghua Cui. 11-16, 2008. Introduction to basic concepts. Bioinformatics: a definition by NIH (1995).


Presentation Transcript


  1. An Introduction to Bioinformatics. Department of Medical Informatics, Peking University Health Science Center. Qinghua Cui. 11-16, 2008

  2. Introduction to basic concepts

  3. Bioinformatics: a definition by NIH (1995). Bioinformatics is defined as a scientific discipline that encompasses all aspects of biological information acquisition, processing, storage, distribution, analysis and interpretation, and that combines the tools and techniques of mathematics, computer science and biology with the aim of understanding the biological significance of a variety of data.

  4. Bioinformatics: the term • Bio-informatics • Computational biology • Biological computing

  5. Data: ★ Large-scale and high-throughput ★ High-dimensional ★ Non-linear ★ Noisy ★ Unequally distributed ...

  6. Bioinformatics: what is most important? • Algorithms? • Data? • Questions!

  7. Bioinformatics: misconceptions • Can it do everything? • Biology/informatics [Diagram: Biology, divided into Experimental, Theoretical, Computational]

  8. Sequences & Structures

  9. Alignment (E < 10^-20) • blastall • blastp • blastn • blastx • tblastn • tblastx • ClustalX
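
The E-value cutoff on this slide can be applied when post-processing search results. The following is a minimal sketch, assuming the hits were saved in the 12-column tabular format of BLAST+ (`-outfmt 6`); the function name `filter_hits` and the two example rows are hypothetical and are not real search results.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def filter_hits(lines, max_evalue=1e-20):
    """Keep only hits whose E-value is below max_evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, line.split("\t")))
        if float(hit["evalue"]) < max_evalue:
            hits.append(hit)
    return hits

# Hypothetical rows for illustration only.
example = [
    "q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-50\t380",
    "q1\ts2\t45.0\t80\t44\t2\t10\t89\t1\t80\t0.003\t35.2",
]
strong = filter_hits(example)
print([h["sseqid"] for h in strong])  # -> ['s1']
```

A strict cutoff such as 10^-20 keeps only highly significant matches; a looser cutoff trades specificity for sensitivity.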

  10. Evolution. Selection • Coding region: Ka, Ks (dN, dS), Ka/Ks (dN/dS); tools: PAML, KaKs_Calculator, K-Estimator, MEGA; database: UCSC or Ensembl • Non-coding region: Ralph Haygood (Nature Genetics 2007) • Recent populations: LRH test (Sabeti et al., Nature 2002), iHS test (Voight et al., PLoS Biology 2006), XP-EHH (Sabeti et al., Nature 2007). Constructing phylogenetic trees • PHYLIP • ClustalW • PAML • MEGA (Kumar et al., Briefings in Bioinformatics 2004)
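
The Ka/Ks (dN/dS) ratio is usually read as follows: a ratio above 1 suggests positive selection, a ratio near 1 suggests neutral evolution, and a ratio below 1 suggests purifying selection. A minimal sketch of that reading, assuming Ka and Ks have already been estimated by one of the tools listed; the function name and the tolerance band `tol` are illustrative assumptions, and real analyses rely on statistical tests rather than a fixed band.

```python
def selection_regime(ka, ks, tol=0.05):
    """Interpret a Ka/Ks ratio; tol is an arbitrary illustrative band around 1."""
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    omega = ka / ks
    if omega > 1 + tol:
        return omega, "positive selection"
    if omega < 1 - tol:
        return omega, "purifying (negative) selection"
    return omega, "approximately neutral"

# Hypothetical estimates for illustration only.
print(selection_regime(0.02, 0.2)[1])
print(selection_regime(0.3, 0.1)[1])
```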

  11. Evolution: an application • Recent positive selection • SLC24A5, SLC45A2: skin pigmentation, European population • LARGE, DMD: Lassa fever virus, African population • EDAR, EDA2R: development of hair, teeth and exocrine glands, Asian population (Sabeti et al., Nature 2007).

  12. Alternative Splicing (AS) • Predicted from ESTs • Predicted from cDNA clones • Prediction of tissue-specific AS • Splicing graphs and EST assembly problem

  13. Functional Domain • TF binding sites: TRANSFAC (a TF binding site database), TESS (a web-based program) • Exons, introns, 5' UTR, 3' UTR: UCSC • Promoter: CorePromoter • Motif: Weeder • RNA family: Rfam • Protein domain: Pfam (database), InterPro (database), HMMER (a program based on HMMs)

  14. Finding genes

  15. Sequence mutations • Example: PIK3CA (Gymnopoulos et al., PNAS 2007) • Tools: SIFT & SAPRED • Conservation score? • Near functional sites? • Similarity score? • Surface? • ... (Huang et al., Science 2007)

  16. Modeling structures • RNAfold • RNAstructure

  17. Modeling structures • Homology modeling: ESyPred3D, Swiss Model • Ab initio prediction: Rosetta • Single mutation modeling: Modeller • Visualization: PyMOL

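  18. Optimization algorithms: objective, constraints, solution • Objective: max (or min) Y = f(x) • Constraint: x >= 0 • Solution: find x = ?

  19. Deterministic optimization algorithms; intelligent optimization: genetic algorithms, simulated annealing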
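
Simulated annealing, one of the methods named above, can be sketched in a few lines. The example below is an illustration under stated assumptions, not the presenter's implementation: it minimizes a hypothetical objective f(x) = (x - 3)^2 under the constraint x >= 0, uses a Gaussian proposal, a geometric cooling schedule, and a fixed random seed; all parameter values are arbitrary.

```python
import math
import random

def simulated_annealing(f, x0, steps=5000, t0=1.0, cooling=0.999, step=0.5, seed=0):
    """Minimize f(x) subject to x >= 0 with a basic annealing loop."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        # Propose a neighbour and project it back onto the feasible set x >= 0.
        cand = max(0.0, x + rng.gauss(0.0, step))
        fc = f(cand)
        # Always accept improvements; accept worse moves with probability exp(-delta/t).
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x, fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f

# Hypothetical objective for illustration.
best_x, best_f = simulated_annealing(lambda x: (x - 3.0) ** 2, x0=0.0)
print(round(best_x, 2), round(best_f, 4))
```

Accepting some worse moves at high temperature is what lets the search escape local optima; as the temperature falls, the loop behaves more and more like a greedy local search.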

  20. DNA microarray data analysis

  21. Overall microarray workflow: Biological Question → Sample Preparation → Microarray Reaction → Microarray Detection → Data Analysis & Modelling (taken from Schena & Davis)

  22. Microarray data matrix: rows are genes g1, g2, ..., gi, ..., gN; columns are samples s1, s2, ..., sj, ..., sM; the entry Mi,j lies in row i and column j; row i is the gene profile Gi and column j is the array profile Aj.
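
The data matrix on this slide can be represented as a list of rows. A minimal sketch, with hypothetical expression values, showing how a gene profile (one row) and an array profile (one column) are read from it:

```python
genes = ["g1", "g2", "g3"]
samples = ["s1", "s2", "s3", "s4"]

# Hypothetical expression values: M[i][j] is gene i measured on array j.
M = [
    [2.1, 2.4, 0.3, 0.2],
    [1.0, 1.1, 0.9, 1.2],
    [0.1, 0.2, 3.0, 2.8],
]

def gene_profile(matrix, i):
    """Row i: one gene across all arrays."""
    return list(matrix[i])

def array_profile(matrix, j):
    """Column j: all genes on one array."""
    return [row[j] for row in matrix]

print(gene_profile(M, 0))   # -> [2.1, 2.4, 0.3, 0.2]
print(array_profile(M, 2))  # -> [0.3, 0.9, 3.0]
```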

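  23. Data preprocessing • Missing data • Causes • Image contamination • Insufficient image resolution • Dust or scratches on the chip • Handling missing data • Discard the data point (this also discards useful information!) • Repeat the experiment (too expensive!) • Replace it with some value, such as the sample mean • K-nearest neighbors estimation • Singular value decomposition (SVD) estimation • Normalization • Log transformation • Linear regression • Scaling + shifting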
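
Two of the preprocessing steps above, K-nearest-neighbors estimation of missing values and the log transformation, can be sketched as follows. This is one simple variant under stated assumptions: neighbors are genes (rows), distance is the root-mean-square difference over samples observed in both genes, and missing values are marked with `None`. The function names and data are hypothetical.

```python
import math

def knn_impute(matrix, k=2):
    """Fill None entries with the mean of the k most similar rows (genes)."""
    filled = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_i, other in enumerate(matrix):
                if other_i == i or other[j] is None:
                    continue  # a neighbour must have a value in column j
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

def log2_transform(matrix):
    """Log2-transform a complete matrix of positive intensities."""
    return [[math.log2(v) for v in row] for row in matrix]

# Hypothetical intensities; the first gene has one missing spot.
raw = [
    [100, 200, None],
    [110, 210, 400],
    [105, 190, 380],
    [900, 50, 20],
]
filled = knn_impute(raw, k=2)
print(filled[0][2])  # -> 390.0
logged = log2_transform(filled)
```

Unlike replacing a missing spot with the overall sample mean, the neighbor-based estimate borrows information from genes with similar profiles.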

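  24. Pattern classification of microarray data (X → F(X) → Y) • Training samples → preprocessing → feature extraction → machine learning → decision • New sample → classifier → decision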

  25. [Figure: two groups, G1 and G2, in the (x1, x2) plane, separated by the line L: c1x1 + c2x2 - c = 0]
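
The line L on this slide defines a linear decision rule: a point is assigned to one group or the other according to the sign of c1x1 + c2x2 - c. A minimal sketch; which side corresponds to G1 and the coefficient values are assumptions made for illustration.

```python
def classify(x1, x2, c1, c2, c):
    """Assign a point to G1 or G2 by the side of the line c1*x1 + c2*x2 - c = 0."""
    score = c1 * x1 + c2 * x2 - c
    return "G1" if score > 0 else "G2"  # assumption: G1 lies on the positive side

# Hypothetical line x1 + x2 - 4 = 0.
print(classify(3.0, 3.0, 1.0, 1.0, 4.0))  # -> G1
print(classify(1.0, 1.0, 1.0, 1.0, 4.0))  # -> G2
```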

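  26. Pattern classification algorithms • Linear classifiers • Neural networks • Nearest neighbor • Bayesian classifiers • Hidden Markov model classifiers • Decision trees • Support vector machines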
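
Of the algorithms listed, the nearest-neighbor classifier is the simplest to write down: a new sample takes the label of the closest training sample. A minimal sketch with Euclidean distance; the training vectors and labels are hypothetical.

```python
import math

def nearest_neighbor(train, query):
    """train: list of (feature_vector, label); return the label of the closest vector."""
    closest = min(train, key=lambda item: math.dist(item[0], query))
    return closest[1]

# Hypothetical two-gene expression vectors with class labels.
train = [
    ((1.0, 0.9), "normal"),
    ((1.2, 1.1), "normal"),
    ((3.0, 3.2), "tumor"),
    ((2.8, 3.1), "tumor"),
]
print(nearest_neighbor(train, (2.9, 3.0)))  # -> tumor
```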

  27. Pattern clustering of microarray data • Hierarchical clustering • K-means clustering • Fuzzy C-means clustering • Self-organizing maps • Replicator dynamics (Cui, 2004)
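
K-means, one of the clustering methods listed, alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its members. A minimal sketch, assuming a naive deterministic initialization (the first k points) and a fixed number of iterations; the data points are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Basic k-means; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive initialization: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical two-dimensional expression profiles.
points = [(1.0, 1.1), (5.0, 5.2), (0.9, 1.0), (5.1, 4.9), (1.2, 0.8), (4.8, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # -> [0, 1, 0, 1, 0, 1]
```

In practice the result depends on the initialization, so the algorithm is usually run several times from random starting centroids.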

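  28. Feature extraction from gene expression • Differentially expressed genes • Gene set or pathway • PCA • SVD • ISOMAP • MDS • Analogy: features that distinguish men from women • Hair length? • Skin smoothness? • Voice? • Height? • Strength? • Clothing? • Posture? • XX/XY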
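
Differentially expressed genes are commonly found by ranking genes with a two-sample statistic. The slide does not name a test, so the sketch below assumes Welch's t-statistic as one reasonable choice; the gene names and expression values are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t-statistic for two groups of measurements."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))

def rank_genes(group_a, group_b):
    """Rank genes by the absolute t-statistic, largest first."""
    scores = {g: welch_t(group_a[g], group_b[g]) for g in group_a}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)

# Hypothetical log-expression values in two conditions.
group_a = {"geneA": [8.1, 8.3, 7.9], "geneB": [6.0, 6.2, 5.9]}
group_b = {"geneA": [5.0, 5.2, 4.9], "geneB": [6.1, 5.8, 6.0]}
ranked = rank_genes(group_a, group_b)
print(ranked[0][0])  # -> geneA
```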

  29. Characterizing gene relationships • Static relationship • Pearson's correlation • Spearman's correlation • Mutual information • Other similarity metrics • Dynamic relationship • Dynamic regression (Cui, 2005) • Window-based correlation
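
Pearson's and Spearman's correlation, the first two static measures listed, differ in that Spearman's is Pearson's correlation computed on ranks, so it captures any monotonic relationship rather than only a linear one. A minimal sketch with hypothetical profiles:

```python
import math

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical profiles: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Here Spearman's coefficient reaches 1 because the relationship is perfectly monotonic, while Pearson's stays slightly below 1 because it is not linear.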

  30. Gene expression networks • Pearson's correlation • Hard threshold • Weighted • Mutual information • Bayesian network
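
A correlation-based expression network can be built in the two ways named on this slide: with a hard threshold, where an edge exists only if |r| reaches a cutoff, or weighted, where every gene pair keeps a continuous weight. A minimal sketch; the weighting |r|^beta is an assumed soft-thresholding form, and the cutoff, exponent, and profiles are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coexpression_edges(profiles, threshold=None, beta=1):
    """Return {(gene_a, gene_b): weight}; hard-thresholded if threshold is given."""
    edges = {}
    for a, b in combinations(sorted(profiles), 2):
        r = abs(pearson(profiles[a], profiles[b]))
        if threshold is None:
            edges[(a, b)] = r ** beta   # weighted network: keep every pair
        elif r >= threshold:
            edges[(a, b)] = 1           # hard threshold: binary edge
    return edges

# Hypothetical expression profiles across four samples.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
hard = coexpression_edges(profiles, threshold=0.8)
weighted = coexpression_edges(profiles, beta=2)
print(sorted(hard))  # -> [('g1', 'g2')]
```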

  31. Computational Systems Biology

  32. What is Systems Biology? • Not a new concept! • Systems biology is an emergent field that aims at system-level understanding of biological systems (Kitano 2002). • To understand biology at the system level, we must examine the structure and dynamics of cellular and organismal function, rather than the characteristics of isolated parts of a cell or organism.

  33. Why Systems Biology? [Figures: a small network diagram with nodes A, B, C, D, E and edges marked + or _; elephant image, http://www.newvisions.ucsb.edu/background/images/elephant.gif]

  34. Why Computational Systems Biology? • Golden opportunity, now! ★ Large-scale, high-throughput data ★ More than 16 international meetings in 2006 ★ More than 10 books in the past two years ★ Journals: Molecular Systems Biology (Nature & EMBO), BMC Systems Biology, IET Systems Biology, EURASIP Journal on Bioinformatics and Systems Biology, etc.

  35. Fields of Computational Systems Biology? • Construction of biological networks, such as gene regulatory networks, cellular signaling networks, metabolic networks, protein-protein interaction networks, genetic interaction networks, gene co-expression networks, and literature networks.

  36. Fields of Computational Systems Biology? • Properties of systems, such as topology, robustness, tolerance. Albert et al., Nature 2000
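
One basic topological property of a network is its degree distribution: how many nodes have one link, two links, and so on. A minimal sketch on a hypothetical undirected edge list; the node names are placeholders.

```python
from collections import Counter

def degrees(edges):
    """Number of links attached to each node of an undirected network."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def degree_distribution(edges):
    """Map each degree value to the number of nodes that have it."""
    return Counter(degrees(edges).values())

# Hypothetical network for illustration.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]
print(dict(degrees(edges)))
print(sorted(degree_distribution(edges).items()))  # -> [(1, 3), (2, 1), (3, 1)]
```

Properties such as robustness and tolerance are often studied by removing nodes, for example the highest-degree ones, and observing how the network changes.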

  37. Fields of Computational Systems Biology? • Biological questions at the systems level, such as diseases, evolution, medicine, etc. (Goh et al., PNAS 2007; Cui et al., MSB 2007) [Figure labels: Ras region, TGFβ region, p53 region]

  38. An application: microRNA-disease systems biology [Diagram: microRNAs M1 to M4 linked to diseases D1 to D3]

  39. Human microRNA disease network
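
A microRNA-disease network can be viewed as a bipartite graph of associations, from which a microRNA-microRNA network is derived by linking microRNAs that share a disease. The sketch below illustrates that projection under stated assumptions: the association list is hypothetical and is not the set of links shown on the slides.

```python
from itertools import combinations

def project_mirnas(associations):
    """Link two miRNAs that share a disease; the weight is the number shared."""
    diseases = {}
    for mirna, disease in associations:
        diseases.setdefault(mirna, set()).add(disease)
    network = {}
    for a, b in combinations(sorted(diseases), 2):
        shared = diseases[a] & diseases[b]
        if shared:
            network[(a, b)] = len(shared)
    return network

# Hypothetical miRNA-disease associations for illustration only.
associations = [("M1", "D1"), ("M2", "D1"), ("M2", "D2"), ("M3", "D2"), ("M4", "D3")]
network = project_mirnas(associations)
print(network)  # -> {('M1', 'M2'): 1, ('M2', 'M3'): 1}
```

The same projection applied to the disease side yields a disease-disease network in which two diseases are linked when they share a microRNA.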

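  40. My suggestions, and problems where I need everyone's help

  41. My Suggestions • First, read through the relevant references and record the relevant data. • Second, browse through this PPT, or consult a bioinformatics professional, to see whether there are problems that bioinformatics alone can solve. • Third, consider whether the data in the literature you read could themselves be analyzed with bioinformatics, for example by meta-analysis or systems biology. • Fourth, new knowledge, bioinformatics included, is not hard; you will appreciate this once you have completed a project yourself!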

  42. Work for which we need experimental validation • The functions of mir-423 and mir-608, which are under recent positive selection • SLC24A5, SLC45A2: skin pigmentation, European population • LARGE, DMD: Lassa fever virus, African population • EDAR, EDA2R: development of hair, teeth and exocrine glands, Asian population (Sabeti et al., Nature 2007). • Experimental validation of a potential liver-disease related microRNA: miR-149 • SNP: rs2292832; CEU and YRI: 80% C, 20% U; CHB and JPT: 20% C, 80% U. • Host gene is GPC1 (glypican 1, a heparan sulfate proteoglycan), which is overexpressed in pancreatic cancer; another member of this host gene family (GPC3) is a liver cancer marker. • GPC1 is a receptor for heparin-binding growth factors • Not expressed in liver / expressed in liver • Targets HEV and HGV • Free energy: C: -54.9; U: -52.7

  43. Work for which we need experimental validation • Cardiovascular • miR-1 • miR-133 • miR-199a • miR-21 • miR-23a • miR-23b • miR-208 • Liver (miR-122) • Kidney • Brain • Lung • ...

  44. Qinghua Cui: 15801250611, 82801585. Email: cuiqinghua@bjmu.edu.cn. The best tailor at your side. Thank you all; comments are welcome.
