In the name of God
Data Mining and Its Application in Medicine
Student: Babak Razaghi
Student number: 85233510
Supervisor: Dr. Towhidkhah
(Seminar for the course "Application of Information Technology in Medicine")
Why DATA MINING?
• Necessity is the mother of invention
• Huge amounts of data
• Electronic records of our decisions
  • Choices in the supermarket
  • Financial records
  • Our comings and goings
• We swipe our way through the world; every swipe is a record in a database
• Data rich, but information poor
• Lying hidden in all this data is information!
What is DATA MINING?
• Extracting or “mining” knowledge from large amounts of data
• Data-driven discovery and modeling of hidden patterns in large volumes of data
• Extraction of implicit, previously unknown and unexpected, potentially extremely useful information from data
Data visualization
[Figure: a large database feeding both data mining and data visualization]
• Ways of seeing patterns in large data sets
• Uses the efficiency of human pattern recognition
Terminology
• Gold mining (the analogy behind the name)
• Knowledge mining from databases
• Knowledge extraction
• Data/pattern analysis
• Knowledge Discovery in Databases (KDD)
Knowledge Discovery Process
[Figure: Raw data → Integration → Data warehouse → Selection & Cleaning → Target data → Transformation → Transformed data → Data Mining → Patterns and rules → Interpretation & Evaluation → Knowledge (understanding)]
Data Mining Central Quest
Find true patterns and avoid overfitting (false patterns due to randomness)
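To make the overfitting point concrete, here is a minimal sketch that is not from the slides: a 1-nearest-neighbor model is trained on data where the features and labels are pure noise. The dataset sizes, the seed, and all function names are assumptions chosen for illustration. The model reproduces its training labels perfectly, yet on fresh data it should do no better than guessing, which is exactly a "false pattern due to randomness".

```python
import random


def nearest_neighbor_predict(train_x, train_y, x):
    """Return the label of the training point closest to x (1-nearest neighbor)."""
    best_index = min(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    return train_y[best_index]


def accuracy(train_x, train_y, xs, ys):
    """Fraction of points in (xs, ys) that the 1-NN model labels correctly."""
    hits = sum(
        nearest_neighbor_predict(train_x, train_y, x) == y
        for x, y in zip(xs, ys)
    )
    return hits / len(xs)


def random_dataset(rng, n_rows, n_features=5):
    """Features and labels are independent noise, so there is no true pattern."""
    xs = [[rng.random() for _ in range(n_features)] for _ in range(n_rows)]
    ys = [rng.randint(0, 1) for _ in range(n_rows)]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable
train_x, train_y = random_dataset(rng, 200)
test_x, test_y = random_dataset(rng, 500)

# Each training point is its own nearest neighbor, so the model "memorizes" the noise.
train_accuracy = accuracy(train_x, train_y, train_x, train_y)
# On unseen noise the memorized labels carry no information.
test_accuracy = accuracy(train_x, train_y, test_x, test_y)

print(f"training accuracy: {train_accuracy:.2f}")
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Comparing training accuracy with accuracy on held-out data is the usual way to tell a true pattern from an overfitted one.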
Major Data Mining Tasks
• Classification: predicting an item class
• Clustering: finding clusters in data
• Associations: e.g. A & B & C occur frequently
• Visualization: to facilitate human discovery
• Summarization: describing a group
• Estimation: predicting a continuous value
• Deviation Detection: finding changes
• Link Analysis: finding relationships
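As an illustration of the association task ("A & B & C occur frequently"), the sketch below counts how often each combination of items appears together in a small set of transactions. The transactions, the threshold, and the function name are made up for this example, and the brute-force counting is a simplification rather than a production algorithm such as Apriori.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count, max_size=3):
    """Count every itemset of up to max_size items and keep the frequent ones."""
    counts = Counter()
    for items in transactions:
        ordered = sorted(items)  # canonical order so ("A", "B") == ("A", "B")
        for size in range(1, max_size + 1):
            for itemset in combinations(ordered, size):
                counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_count}


# Hypothetical shopping baskets; the letters stand for products.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"B", "C"},
    {"A", "B", "C"},
]

frequent = frequent_itemsets(transactions, min_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(" & ".join(itemset), "occurs in", n, "transactions")
```

Here A, B and C appear together in three of the five baskets, so the triple passes the threshold, while anything involving D does not.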
DATA MINING CHALLENGES
• Computationally expensive to investigate all possibilities
• Dealing with noise, missing information and errors in data
• Choosing appropriate attributes/input representation
• Finding the minimal attribute space
• Finding adequate evaluation function(s)
• Extracting meaningful information
• Avoiding overfitting
Data Mining Software
• Insightful Miner
• Angoss Knowledge ACCESS
• ARMiner
• Eudaptics Viscovery
• Goal TV
• MDR
• Viscovery SOMine
• SPSS
DATA MINING APPLICATIONS
• Science: Chemistry, Physics
• Bioscience
  • Sequence-based analysis
  • Protein structure and function prediction
  • Protein family classification
  • Microarray gene expression
• Financial industry: banks, businesses, e-commerce
• Stock and investment analysis
• Pharmaceutical companies
• Health care
• Sports and Entertainment
Clinical Data Mining processes
• Digital format for all pertinent data
• Create structure
• Obtain coded information
• Natural language understanding
• Create a widely accessible repository
Classification example for Medical Diagnosis and Prognosis: Heart Disease
Minimum systolic blood pressure over the 24-hour period following admission to the hospital
  <= 91: Class 2: Early death
  > 91: Age of patient
    <= 62.5: Class 1: Survivors
    > 62.5: Was there sinus tachycardia?
      YES: Class 2: Early death
      NO: Class 1: Survivors
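Read as nested rules, the tree on this slide splits first on minimum systolic blood pressure (at 91), then on age (at 62.5), then on sinus tachycardia. A minimal sketch of those rules as a function follows; the function and parameter names are hypothetical, the slide does not state units (presumably mmHg and years), and this is an illustration of how a classification tree is applied, not a clinical tool.

```python
def classify_patient(min_systolic_bp_24h, age, sinus_tachycardia):
    """Apply the three splits shown on the slide and return the class label.

    Units are not given on the slide; mmHg and years are assumed here.
    """
    # Root split: minimum systolic blood pressure in the first 24 hours.
    if min_systolic_bp_24h <= 91:
        return "Class 2: Early death"
    # Second split: age of the patient.
    if age <= 62.5:
        return "Class 1: Survivors"
    # Third split: was sinus tachycardia present?
    if sinus_tachycardia:
        return "Class 2: Early death"
    return "Class 1: Survivors"


# Hypothetical patients, one for each leaf of the tree.
print(classify_patient(85, 40, False))
print(classify_patient(120, 50, True))
print(classify_patient(120, 70, True))
print(classify_patient(120, 70, False))
```

Each path from the root to a leaf is one readable rule, which is why trees of this kind are easy to explain to clinicians.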
Genome, DNA & Gene Expression
• An organism’s genome is the “program” for making the organism, encoded in DNA
• Human DNA has about 30,000 to 35,000 genes (an early estimate; later counts put protein-coding genes near 20,000)
• A gene is a segment of DNA that specifies how to make a protein
• Cells are different because of differential gene expression
• About 40% of human genes are expressed at one time
• Microarray devices measure gene expression
Microarray Raw Image
[Figure: scanner output, with an enlarged section of the raw image]
Raw data:
Gene              Value
D26528_at           193
D26561_cds1_at      -70
D26561_cds2_at      144
D26561_cds3_at       33
D26579_at           318
D26598_at          1764
D26599_at          1537
D26600_at          1204
D28114_at           707
Microarray Potential Applications
• New and better molecular diagnostics
• New molecular targets for therapy
  • few new drugs, large pipeline, …
• Outcome depends on genetic signature
  • best treatment?
• Fundamental Biological Discovery
  • finding and refining biological pathways
• Personalized medicine?!
Microarray Data Mining Challenges
• Avoiding false positives, due to
  • too few records (samples), usually < 100
  • too many columns (genes), usually > 1,000
• Model needs to be robust in presence of noise
• For reliability need large gene sets; for diagnostics or drug targets, need small gene sets
• Estimate class probability
• Model needs to be explainable to biologists
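The false-positive problem can be illustrated with a small simulation that is not part of the original slides. It generates expression values for 2,000 "genes" that are pure noise, with 10 samples in each of two classes, and tests each gene with a pooled two-sample t statistic. The sample sizes, the seed, the class names, and the 5% cutoff are all assumptions chosen for illustration.

```python
import random
import statistics


def t_statistic(group_a, group_b):
    """Two-sample t statistic with pooled variance (equal group sizes assumed)."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    pooled_var = (statistics.variance(group_a) + statistics.variance(group_b)) / 2
    return mean_diff / (pooled_var * 2 / len(group_a)) ** 0.5


rng = random.Random(42)  # fixed seed so the run is repeatable
N_SAMPLES_PER_CLASS = 10  # far fewer samples than genes, as on the slide
N_GENES = 2000
T_CRITICAL = 2.101  # two-sided 5% cutoff for 18 degrees of freedom

t_values = []
for _ in range(N_GENES):
    # Both classes are drawn from the same distribution: no gene truly differs.
    healthy = [rng.gauss(0, 1) for _ in range(N_SAMPLES_PER_CLASS)]
    diseased = [rng.gauss(0, 1) for _ in range(N_SAMPLES_PER_CLASS)]
    t_values.append(t_statistic(healthy, diseased))

false_positives = sum(abs(t) > T_CRITICAL for t in t_values)
strongest = max(abs(t) for t in t_values)

print(f"genes flagged as 'significant' at the 5% level: {false_positives}")
print(f"largest |t| among pure-noise genes: {strongest:.2f}")
```

With no real signal at all, roughly 5% of the genes (on the order of a hundred) should still pass the cutoff, and the strongest of them looks convincing on its own. This is why gene selection needs multiple-testing correction or validation on independent samples.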
Conclusion
• Discover useful relationships in data
• Discover information otherwise overlooked
• Provide intelligence to improve various phases
• Intellectual property
• Competitive advantages:
  • Getting more out of your data
  • Finding other relevant information faster
  • Exploratory, hypothesis-generating analyses
• Increase productivity: less time and money spent
Thank you all
razaghi.b@gmail.com