1 / 61

Biomedical informatics for proteomics

Biomedical informatics for proteomics. Boguski , M. S. and M. W. McIntosh (2003). Nature 422(6928): 233-237. 指導 老師 : 趙坤茂 Kun-Mao Chao 組員 : 施光偉   蕭雅 茵 計 佩 岑 葉 衍陞  葉 欣綺  鍾宇彥 蘇鈺惠 陳雲濤. Outline. Introduction Study design and sample quality

hedva
Download Presentation

Biomedical informatics for proteomics

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Biomedical informatics for proteomics Boguski, M. S. and M. W. McIntosh (2003). Nature 422(6928): 233-237. 指導老師 : 趙坤茂 Kun-Mao Chao 組員 : 施光偉  蕭雅茵 計佩岑 葉衍陞  葉欣綺  鍾宇彥 蘇鈺惠 陳雲濤

  2. Outline • Introduction • Study design and sample quality • Protein databases • Protein identification by database searching • Pattern matching without protein identification • Conclusions and future challenges

  3. Introduction reporter:施光偉

  4. Introduction • The subtitle:“Genes WereEasy”. • We have transitioned rapidly from a large but finite and completehuman genome to a seemingly infinite biological universe. • Proteomics is often referred to as a ‘post-genome’ science, but its antecedents actually predate the Human Genome Project by two to three decades. • Although medical informatics has until recently been largely detached from bioinformatics, the emergence of clinical genomics and proteomics increasingly requires the integrated analysis of genetic, cellular, molecular and clinical information and the expertise of pathologists, epidemiologists and biostatisticians.

  5. Introduction • Proteomics is the latest functional genomics technology to capture our imagination and it is instructive to review some lessons learned during the earlier adoption of another functional genomics technology, namely gene expression analysis using microarrays and similar technologies. • There are many implications of biomedical informatics for proteomics, including multiple platform technologies, laboratory information-management systems, medical records systems, and documentation of clinical trial results for regulatory agencies. • In the present work, we confine our discussions to mass spectrometry-based proteomics, and to study design and data resources, tools and analysis in a research setting.

  6. Introduction • Proteomics depends upon careful study design and high-quality biological samples, advanced information technologies. • Proteome analysis is at a much earlier stage ofdevelopment than genomics and gene expression (microarray) studies. • Fundamental issues involvingbiological variability, pre-analytic factors and analytical reproducibility remain to be resolved.

  7. Study design and sample quality reporter:蕭雅茵

  8. Glossary • Case-control and cohort study Observational studies: Case → O/X of the phenotype(case/control) Cohort → Participants based on O/X of risk factor of interest and over time for development of an outcome • Confounder/Confounding Distort an apparent relationship between an exposure and a phenotype of interest • Plasma: fluid, non-cellular Serum: protein solution remaining after blood coagulated • Pre-analytical variables Variables that present before laboratory test and data analysis • Randomized clinical trial Treatments are randomly assigned in order to prevent confounding

  9. Study design and sample quality • Potter describes 4 study design • However, the distinction between observational and experimental design isn’t made as well as proteomics studies.

  10. Study design and sample quality • Observational studies of gene expression and proteomic analysis involving human→①bias & confounding factor • Human plasma and serum proteomics are susceptible to observational biases→confused with a specific characteristic of the disease process→mislead • Each may induce a change in total protein concentrations by ± 10%. • Highlighting human serum proteome→nature but confounding variables may complicate finding

  11. Study design and sample quality • No adjust for confounding even, only to have careful design and specimen ascertainment • ②quality ③number • Margolin has admonished that ”Scientists...need to avoid the tendency, often driven by the high price of some of the newer techniques, of running under-controlled experiments or experiments with fewer repeated conditions than would have been accepted with standard techniques.” • Proteomics discovery has no priori enumeration of targets and lacks described procedural structure.

  12. Protein database reporter:計佩岑

  13. Proteome Genome DNA mRNA Proteome Proteins

  14. Protein databases • Collections protein sequences date back to the1960s. • Utilitarian goal of protein databases (1990s~today) • Minimal redundancy • Maximal annotation • Integration with other databases

  15. Protein databases • Current molecular sequence databases are classified according to their evolutionary history inferred from sequence homology. • excellent tools for gene discovery, comparative genomics and molecular evolution • much work to be done to even minimally serve the needs of proteomics and integrative biological science

  16. Protein databases • Today's principal protein databases emphasize • molecular • cellular features • annotation • are not well suited to represent physiology. • A more ideal database for plasma proteome studies would classify proteins from a functional, rather than an evolutionary, viewpoint

  17. Data standards • Multiple or specialized file formats has hindered accessibility, information exchange and integration • eXtensibleMarkup Language (XML) • an Internet standard for describing structured and semistructured data • most of the main databasesmake their data available in XML and make it easy to publish and exchange XML data

  18. Protein databases • PDB(Protein Data Bank ) • GenBank • SWISS-PROT • EMBL • HPRD(Human Protein Reference Database)

  19. Protein identification by database searching reporter: 葉衍陞  葉欣綺  鍾宇彥

  20. Purpose of Protein identification by database searching • NOT the species or remoteness of the relationship • infer similarity of function from similarity of sequence • study the evolution of protein families or domains • Different aims and therefore require different strategies and tools

  21. Analysis of human serum • interested in identifying proteins they are not normally present • match between subsequences • weak similarities

  22. Statistical significance • statistical significance is important, but not in the sense of the probability that two sequences are related by chance • deviates significantly from a normal range of values. • If it is met, one is then interested in attempting to demonstrate a significant correlation

  23. 影響database原因之一 DNA 1 mRNA 2 protein 1.Transcription Post translational modification 2.translation(proteolytic processing glycosylation, methylation, phosphorylation, Met切除, 雙硫鍵形成, acetylation, hydroxylation )

  24. Post translational modification proteolytic processing • 移除訊號序列胺基酸殘基 • 移往特定細胞 • 特殊胜肽水解酶移除 glycosylation • Asn和 Ser或Thr • 主要場地內質網 • 有潤滑作用的含有寡糖類之鏈

  25. Post translational modification methylation • 特定Lys殘基進行 • 某些肌肉蛋白、組蛋白、與色素細胞c phosphorylation • 多接在-OH 基的胺基酸 • 調控蛋白質酵素活性

  26. Post translational modification Met切除 • N端的Met往往在多胜肽鏈合成前被切除(AUG) 雙硫鍵形成 • mRNAhas no coding acetylation • 組蛋白調控轉錄作用 hydroxylation • 膠原蛋白等

  27. Peptide analysis • Error Tolerance • Scoring methods

  28. Peptide analysis- Experimental process • Cut to mixture of short peptides • Specific: restriction enzyme • Mass Spectrometry • Detect the m/z of the compounds • Tandem mass spectrometry (MS/MS) • Fragments of specific m/z • Chromatography • Separation before MS

  29. Tandem mass spectrometry http://en.wikipedia.org/wiki/Tandem_mass_spectrometry

  30. Chromatography Dionex

  31. Peptide analysis- Mass Spectrometry • Several Approach • Analytic peptide-mass fingerprint • used as profile • Compare with the predicted spectrum • match to database • De novo sequence interpretation • Manual interpretation by expert • Time consumption high

  32. Consideration of Error Tolerance • Restriction enzyme non-specificity • Precursor charge errors • Get more than one charge in ionization • Isotope • Mass measurement errors • Related to accuracy of instrument • Unsuspected modifications • Ex: post-translational modification • Primary sequence variations • deletions, insertions, substitutions [2002] Error tolerant searching of uninterpreted tandem mass spectrometry data

  33. Scoring methods description • In general, each scoring algorithm designates a quantity related to the probability that the candidate peptide could have produced the observed spectrum by chance • Ranking is required for high-throughput automated analysis

  34. Example of peptide identification PB cannot be identified due to high variation Solutions: reduce the number of target peptides

  35. Another challenge • Another automating proteomics challenge : the best match of a scoring algorithm is simply not good enough. • Establishing a criteria for acceptance overall therefore becomes the main focus of automated proteomics.

  36. Scoring Threshld ,P value • It is generally assumed that higher-scoring assignments are more likely to be correct than lower-scoring assignments. • Threshold:i : Sensitivity ii : Specificity iii:Mixture , sequence data base • P values : If p values<0.05 , 5% of all false tests will be misidentified as true.

  37. Scoring Threshld ,P value Probability http://rating.com.vn/home/_/Y-nghia-cua-tri-so-P-tuc-P-value.26.1080

  38. P value-like quantities • Keller et al. estimate the reference distributions of the correct and incorrect assignments within any experiment. • Keller et al. describe an approach that may allow a scoring algorithm to be converted into P value-like quantities that can then be used to control error rates.

  39. *Pattern matching without protein identification *Conclusions and future challenges reporter:蘇鈺惠 陳雲濤

  40. Time-of-flight mass spectrometry(TOF)

  41. Time-of-flight mass spectrometry • mass spectrometry • ions are accelerated by an electric field • velocity of the ion depends on the mass-to-charge ratio • Time is measured • Compared with known experimental parameter, we can get the ion of mass-to-charge ratio.

  42. Time-of-flight mass spectrometry • Time-of-flight mass spectrometry (TOFMS) is a method of  mass spectrometry in which ions are accelerated by an electric field of known strength. This acceleration results in an ion having the same kinetic energy as any other ion that has the same charge. The velocity of the ion depends on the mass-to-charge ratio. The time that it subsequently takes for the particle to reach a detector at a known distance is measured. This time will depend on the  mass-to-charge ratio of the particle (heavier particles reach lower speeds). From this time and the known experimental parameters one can find the  mass-to-charge ratio of the ion. The elapsed time from the instant a particle leaves a source to the instant it reaches a detector.from wikipedia

  43. Principle & method • Ep=q*U • Ek=1/2*m*v^2 • Ek=Ep q*U=1/2*m*v^2 (v=d/t)t=k*sqrt(m/q) ; k=d/sqrt(2*U) • The velocity is determined by time-of-flight tube length(d) and time of the flight of the ion (t) v=d/t

  44. application • Matrix-assisted laser desorption ionization time of flight spectrometry(MALDI-TOF) is a pulsed ionization technique that is readily compatible with TOF MS. • 1. ionize molecule via laser pulse • 2. separate molecule according to mass to charge ratio • 3. mainly used for detection of large biomolecule.

  45. Component of MALDI-TOF

  46. http://www.youtube.com/watch?v=gTRsaAnkRVU

  47. Drawback of TOF • Each m/z value of the spectrum reflects the abundance of possibly many peptides having a similar mass. Thus, with complex mixtures, these TOF methods are not able to identify individual peptides.

  48. Using TOF • When used with complex mixtures, analysis methods are intended to identify peaks, or features, of the spectrum that can segregate identifiable groups • When evaluating expression array, using Clustering methods, Pattern matching for alignment and peak identification.

  49. expression array (圖) ig 2. 

  50. Cluster Algorithm • 一種分類的方法: 由一個基準點,描述其在有限範圍(Eps)內包含不少於MinPt個點的群集 • 範圍以歐幾里得距離或曼哈頓距離算之 • 用途廣泛: 諸如商業市場分析、生物分類研究、生醫資訊領域、Data mining、Machine Learning、圖像分析 • 種類:Partitioning MethodsHierarchical Methodsdensity-based methodsgrid-based methodsModel-Based Methods

More Related