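Healthcare Big Data: From Theory to Practice • Jianbo Lei (雷健波) • Center for Medical Informatics, Peking University
Outline: 1. Healthcare enters the "big data" era 2. Disciplinary foundation: BMI (biomedical informatics) 3. A new research method: DRR 4. Case studies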
92% of the world’s data was created in just the past two years. Smolan & Erwitt: The Human Face of Big Data
By Steven Wilkes, from Smolan & Erwitt: The Human Face of Big Data (The average person today processes more data in a single day than a person in the 1500s in an entire lifetime)
(During the first day of a baby’s life, the amount of data generated by humanity is equivalent to 70 times the information contained in the Library of Congress.) Smolan & Erwitt: The Human Face of Big Data
Public Health • Billing • Clinical • Imaging • Physiology • Administrative • Genomic • Laboratory • Medical Knowledge
The Size of the Human Genome • 3 billion bases • 200 copies of the White Pages • 9.5 years to read out loud • 3 Gigabytes (1 DVD)
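The slide's figures can be sanity-checked with a little arithmetic. The sketch below assumes one byte per base for the "3 Gigabytes" figure, shows a 2-bit packing as an alternative, and assumes nonstop reading over a 365.25-day year; none of these assumptions is stated on the slide.

```python
# Back-of-the-envelope checks for the genome-size figures on the slide.
BASES = 3_000_000_000                 # 3 billion bases (from the slide)
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length


def storage_gigabytes(bases: int, bits_per_base: int) -> float:
    """Storage in decimal gigabytes for a given encoding width."""
    return bases * bits_per_base / 8 / 1e9


def reading_rate(bases: int, years: float) -> float:
    """Bases per second needed to read the genome aloud nonstop in `years`."""
    return bases / (years * SECONDS_PER_YEAR)


print(f"1 byte per base: {storage_gigabytes(BASES, 8):.2f} GB")
print(f"2 bits per base: {storage_gigabytes(BASES, 2):.2f} GB")
print(f"Reading rate: {reading_rate(BASES, 9.5):.1f} bases/second")
```

At one byte per base the genome takes 3 GB, which fits on a single-layer 4.7 GB DVD, and 9.5 years of nonstop reading works out to about 10 bases per second.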
Sequencing a Whole Human Genome = The Price of a Dental Crown • The $1000 Genome is coming
Health Data from Body Sensors Imec: Human+++: body networks http://www.imec.be/ScientificReport/SR2008/HTML/1225020.html
The Explosive Growth of Human Genomic Data Source: Open Source Electronic Health Record Agent (OSEHRA)
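Outline: 1. Healthcare enters the "big data" era 2. Disciplinary foundation: BMI (biomedical informatics) 3. A new research method: DRR 4. Case studies
Health informatization • Industry • Application • Health information technology (HIT) • Digital healthcare (digital hospitals) • Academia • Discipline • Medical informatics
Key concepts of "biomedical informatics" • Basic research: Biomedical Informatics Methods, Techniques, and Theories • Applications and practice: • Bioinformatics (molecular and cellular level) • Imaging Informatics (tissue and organ level) • Clinical Informatics (individual level) • Public Health Informatics (populations and society) • Health Informatics • Biomedical Informatics ≠ Bioinformatics • Biomedical Informatics ≠ Health Informatics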
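Major medical informatics departments in the United States • Columbia University • Stanford University • University of Utah • University of Texas at Houston • Harvard University • Vanderbilt University • University of Pittsburgh
Major medical informatics degrees and training in the United States • M.S. • M.A. • PhD • Post-Doc • Fellowship
Outline: 1. Healthcare enters the "big data" era 2. The emergence of an interdisciplinary field: BMI (biomedical informatics) 3. A new research method: DRR 4. Case studies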
National Science Foundation • National Institutes of Health • Department of Defense • Department of Energy • U.S. Geological Survey
BIG DATA Analytics • Descriptive: What happened? (disease categories, adverse events) • Predictive: What might happen? (high-risk patients, genetic risks) • Prescriptive: What should we do? (minimize readmissions, personalized therapeutics) • Data (39) → Information (39°) → Knowledge (39° = Fever) → Wisdom (Take Tylenol) • Acquisition • Storage • Processing • Integration • Retrieval • Display • Health Prevention • Biomedical Discovery • Healthcare Delivery
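The slide's fever example can be read as a chain from data to information to knowledge to wisdom. Below is a minimal sketch of that chain. The Celsius unit, the 38.0 cutoff, and all function names are assumptions made for illustration; only the value 39, the "fever" interpretation, and the "take Tylenol" action come from the slide.

```python
from dataclasses import dataclass

FEVER_THRESHOLD_C = 38.0  # assumed cutoff for illustration; not from the slide


@dataclass
class Reading:
    value: float  # data: the bare number, e.g. 39
    unit: str     # information: the number plus its context, e.g. "°C"


def to_information(value: float, unit: str = "°C") -> Reading:
    """Attach a unit to the raw number so it can be interpreted."""
    return Reading(value, unit)


def to_knowledge(reading: Reading) -> str:
    """Interpret the reading against the assumed fever threshold."""
    if reading.unit != "°C":
        raise ValueError("this sketch only handles Celsius readings")
    return "fever" if reading.value >= FEVER_THRESHOLD_C else "no fever"


def to_wisdom(finding: str) -> str:
    """Map the finding to the action named on the slide."""
    return "take Tylenol" if finding == "fever" else "no action"


reading = to_information(39)
finding = to_knowledge(reading)
print(reading, finding, to_wisdom(finding))
```

Each function adds one layer of context, which is the point of the slide: the same number 39 means nothing until a unit, a clinical rule, and a decision are attached to it.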
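The shift in thinking in the big data era • Not random samples, but all the data • Not exactness, but messiness • Not causation, but correlation
Big data and new disciplines are driving a major shift in scientific research methods • Traditional: hypothesis-driven methods ---> "discovery- and data-driven" methods • Hypothesis-Driven Research --> Data (Discovery) Driven Research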
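Problems with the traditional research method • Origin: the Enlightenment of the 16th and 17th centuries • Philosophical root: materialism, "what can be observed and measured" • Steps of the hypothesis-driven research method: • Identify the problem • Formulate a hypothesis • Experiment and collect data • Analyze the data • Draw conclusions and generalize • Limitation: can only address "known-known" problems • Limitation: an inherent logical problem (a -> b; not-a -> not-b) • Limitation: costly, time-consuming, non-reusable, low throughput • Limitation: overemphasis on the microscopic and the local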
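One possible reading of the slide's "a -> b; not-a -> not-b" limitation is the fallacy of denying the antecedent: confirming a -> b does not establish not-a -> not-b. That reading is an assumption, since the slide does not spell it out. A brute-force truth-table check makes the gap concrete:

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: p -> q is false only when p is true and q is false."""
    return (not p) or q


def counterexamples():
    """Truth assignments where a -> b holds but (not a) -> (not b) fails."""
    return [
        (a, b)
        for a, b in product([False, True], repeat=2)
        if implies(a, b) and not implies(not a, not b)
    ]


print(counterexamples())
```

The single counterexample is a = False, b = True: the outcome b occurs even though the hypothesized cause a is absent, so an experiment that only tests a -> b says nothing about other causes of b.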
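The promise of the new research method • Origin: the information revolution (storage capacity + computing power + artificial intelligence) • Philosophical root: idealism, "what cannot be observed or measured by traditional methods" • Steps of the data-driven research method: • Set standards (data, functional, and transmission standards) • Build information systems (collect data, build data warehouses) • Research algorithms (a range of data mining algorithms) • Automatically screen multiple hypotheses (pattern identification) • Validate and conclude • Benefit: addresses all "unknown" problems • Benefit: low cost, time-saving, reusable, high throughput • Benefit: addresses the macroscopic and the whole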
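The "automatically screen multiple hypotheses" step can be sketched as ranking many candidate variables by how strongly each correlates with an outcome. This is only one simple stand-in for pattern identification, using Pearson correlation; the variable names, the 0.8 threshold, and the toy values below are all made up for illustration and do not come from the talk.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def screen(candidates, outcome, threshold=0.8):
    """Keep candidates whose |r| meets the threshold, strongest first."""
    scored = [(name, pearson(values, outcome)) for name, values in candidates.items()]
    strong = [(name, r) for name, r in scored if abs(r) >= threshold]
    return sorted(strong, key=lambda item: abs(item[1]), reverse=True)


# Toy, made-up values for six hypothetical patients.
outcome = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
candidates = {
    "var_a": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],  # tracks the outcome closely
    "var_b": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],   # no clear pattern
    "var_c": [9.0, 8.1, 6.9, 6.2, 5.0, 3.8],   # moves opposite to the outcome
}
for name, r in screen(candidates, outcome):
    print(f"{name}: r = {r:+.3f}")
```

Screening like this only surfaces correlations. Candidates that pass still need the separate "validate and conclude" step, especially when many hypotheses are tested at once.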
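The "unknown" world (scientific questions) • Data-Driven Research • Discovery-Driven Research • Pattern-Driven Research • known • unknown
Outline: 1. Healthcare enters the "big data" era 2. Disciplinary foundation: BMI (biomedical informatics) 3. Giving rise to a new research method: DRR 4. Case studies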
http://www.businessinsider.com/10-ways-the-ipad-is-changing-healthcare-2010-7?op=1 Automatic Clinical Summarization • What do you do when your new patient brings all these records to you?
Diabetic Nephropathy (kidney failure caused by diabetes) • Genome-wide association (GWA) study to identify a polygenic disease model • Unsuccessful, thus far… http://homes.cs.washington.edu/~suinlee/research.html
Diabetic Nephropathy: Medical Literature Mining • Gene Ontology: Concepts & Relationships • Preliminary Study: • Potential polygenic disease model • Need more research • Disease Gene Network
Developing a New Drug is Very Costly • $800 million • 10 - 17 years • 10% success rate http://www.ncats.nih.gov/research/reengineering/process.html
Discover New Functions of Existing Drugs from Medical Literature • MEDLINE: 20 million articles • 50 million facts • Scalable Search and Inference • Predictions • Pre-clinical Screening
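The slide does not say how inference over the extracted facts works. One common approach, assumed here purely for illustration, is to chain facts through a shared intermediate: a drug acts on a gene, the gene is linked to a disease, so the drug becomes a candidate for that disease. Every entity, relation name, and fact below is a hypothetical placeholder, not a real literature finding.

```python
from collections import defaultdict

# Hypothetical placeholder facts; none are real literature findings.
FACTS = [
    ("drug_A", "inhibits", "gene_X"),
    ("drug_B", "inhibits", "gene_Y"),
    ("gene_X", "associated_with", "disease_1"),
    ("gene_X", "associated_with", "disease_2"),
    ("gene_Y", "associated_with", "disease_2"),
    ("drug_A", "treats", "disease_1"),  # already known, so not a new prediction
]


def predict_new_uses(facts):
    """Chain drug -> gene -> disease facts, skipping pairs already stated as 'treats'."""
    targets = defaultdict(set)   # drug -> genes it inhibits
    diseases = defaultdict(set)  # gene -> diseases linked to it
    known = set()                # (drug, disease) pairs already in the facts
    for subj, rel, obj in facts:
        if rel == "inhibits":
            targets[subj].add(obj)
        elif rel == "associated_with":
            diseases[subj].add(obj)
        elif rel == "treats":
            known.add((subj, obj))
    predictions = set()
    for drug, genes in targets.items():
        for gene in genes:
            for disease in diseases[gene]:
                if (drug, disease) not in known:
                    predictions.add((drug, gene, disease))
    return sorted(predictions)


for drug, gene, disease in predict_new_uses(FACTS):
    print(f"{drug} may be relevant to {disease} via {gene}")
```

Indexing facts by subject keeps each lookup constant-time, which is the kind of property a search over tens of millions of facts would need. Predictions from such a chain are only candidates for the pre-clinical screening step named on the slide.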
Discover New Functions of Existing Drugs from EHR • Free Text Data: Admit 10/23, 71 yo woman h/o DM, HTN, chronic diarrhea, admitted with SOB. CXR • NLP • Structured, Extracted Data: <problem name="Diabetes Mellitus" text="DM"> <status value="history of"/> <code value="C0011849" scheme="UMLS"/> </problem> • Diabetes: Metformin • Diabetes: Insulin • Diabetes: Other Drugs • Non-Diabetes
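The free-text-to-structured step on this slide can be illustrated with a toy rule-based extractor. This is a sketch under strong assumptions, not the NLP system the talk refers to: the one-entry lexicon, the "h/o" rule, the "unspecified" fallback status, and the function names are all hypothetical. Only the DM entry, its UMLS code, and the element layout come from the slide, with status and code written as self-closing elements so the XML is well-formed.

```python
import re
from xml.sax.saxutils import quoteattr

# Tiny illustrative lexicon; only the DM entry and its UMLS code come from the slide.
LEXICON = {
    "DM": {"name": "Diabetes Mellitus", "code": "C0011849", "scheme": "UMLS"},
}


def extract_problems(note: str):
    """Find lexicon abbreviations and mark those listed after 'h/o' as history."""
    history = re.search(r"h/o\s+([^.]*)", note)  # text after "h/o" up to the period
    history_terms = set()
    if history:
        history_terms = {term.strip() for term in history.group(1).split(",")}
    problems = []
    for abbrev, concept in LEXICON.items():
        if re.search(rf"\b{re.escape(abbrev)}\b", note):
            status = "history of" if abbrev in history_terms else "unspecified"
            problems.append({"text": abbrev, "status": status, **concept})
    return problems


def to_xml(problem: dict) -> str:
    """Render one problem in the element layout shown on the slide."""
    return (
        f"<problem name={quoteattr(problem['name'])} text={quoteattr(problem['text'])}>"
        f"<status value={quoteattr(problem['status'])}/>"
        f"<code value={quoteattr(problem['code'])} scheme={quoteattr(problem['scheme'])}/>"
        "</problem>"
    )


note = "Admit 10/23, 71 yo woman h/o DM, HTN, chronic diarrhea, admitted with SOB. CXR"
for problem in extract_problems(note):
    print(to_xml(problem))
```

Once notes are reduced to coded problems like this, patients can be grouped into the cohorts listed on the slide (for example, diabetes on metformin versus diabetes on insulin) and compared.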
Source: Nigel Holmes 2012 / Smolan & Erwitt: The Human Face of Big Data
“BIG DATA” revolution is going to be BIGGER than the “Internet” revolution!
THANK YOU! JBLEI@hsc.pku.edu.cn Jianbo Lei: 13901295033