Lecture-1
By Prof. K.K. Achary, Yenepoya Research Centre, Yenepoya University

Syllabus
Unit 1. Basic concepts and descriptive statistics
Unit 2. Sampling techniques and probability distributions
Unit 3. Testing of hypothesis
Unit 4. Correlation & regression techniques
Diagnostic tests & reliability tests

Course objectives
• Understand the definition of Statistics
• Know the different types of variables & measurement scales
• Understand data and data management
• Use visualization techniques to understand data
• Learn descriptive & inferential methods
• Learn sample size determination & sampling methods
• Analyze research data, draw conclusions & include them in a research report/article/thesis
• Acquire basic knowledge of using a statistical package: SPSS

What is Statistics?
Statistics is a very broad subject, with applications in a vast number of different fields.
We can say that statistics is the methodology for collecting, analyzing, interpreting and drawing conclusions from information.
Statistics is the methodology which scientists (statisticians) have developed for interpreting and drawing conclusions from collected data.
Everything that deals even remotely with the collection, processing, interpretation and presentation of data belongs to the domain of statistics.

Statistics: some definitions
• Scientific study of numerical data based on natural phenomena
• Statistics is a body of methods for collecting and analyzing scientific data.
• Statistics is a set of analytical tools designed to quantify uncertainty.

“Statistics is the technology of scientific method”
Different people give different definitions, each focusing on certain aspects of the subject.

• Statistics is the science whereby inferences are made about specific random phenomena, on the basis of relatively limited sample data.
• Statistics is the science of learning from data, and of measuring, controlling and communicating uncertainty; it thereby provides the navigation essential for controlling the course of scientific and social advances (American Statistical Association).

The word ‘statistics’ is understood in two different ways
• As a singular noun, it refers to the subject/discipline/branch of study.
• In the plural sense, it refers to collected facts or information, i.e. data or summaries based on data.
• When used in the singular sense, it is written with a capital letter: “Statistics”.

What are the different views?
• Mathematical Statistics: mainly deals with developing theories, models, techniques, computational algorithms, etc.
• Applied Statistics: deals with the application of statistical methodology in different areas of study, mostly natural phenomena in which numerical facts/data are observed on one or several aspects.

Examples: Applied Statistics
• Anthropometry
• Agricultural Statistics
• Biometry/Biostatistics
• Chemometrics
• Demography
• Econometrics
• Environmetrics
• Forestry Statistics/Fisheries Statistics
• Geostatistics
• Psychometry
• Sociometrics
• Technometrics

Etymology of the word
• ‘Statistik’: German word which means ‘science of state’ or ‘political arithmetic’
• ‘statisticum collegium’: Latin phrase which means ‘council of states’
• ‘statista’: Italian word meaning ‘statesman’
All these words relate to the ‘political state’. The word is of 18th-century origin.
Historically, Statistics was the ‘science of statecraft’.

What is Biostatistics?
• Biostatistics deals with the application of statistical methods to biological/medical data in order to analyze, interpret and draw inferences/conclusions from the derived results.
• It encompasses the design and analysis of:
  • biological experiments (randomised experiments)
  • clinical trials in biology, medicine and pharmaceutical science
  • epidemiological studies, etc.

The early contributors who built the strong theoretical foundations of statistical theory and its applications came from different backgrounds: mostly mathematicians, engineers, geneticists, biologists, etc.
Most of them were from the UK and the USA. Indian statisticians have also made significant contributions.
Sir Ronald Aylmer Fisher is called the ‘Father of Modern Statistics’.
Prof. P.C. Mahalanobis is called the ‘Father of Statistics in India’.

• A genius who almost single-handedly created the foundations for modern statistical science
• Statistical Methods for Research Workers (1925)
• Tests of significance, experimental design, etc.

• Correlation coefficient
• Chi-square test
• Foundations of hypothesis testing
• Pearson’s system of curves
• Started BIOMETRIKA

• Regression theory
• Psychometry
• Inheritance of intelligence
• Anthropometrics
• Extinction of family names
• Karl Pearson was his student

• Statistical graphics (used pie chart)
• Polar area diagram
• Mortality in army due to poor sanitation
• First elected female member of Royal Statistical Society

• Pen name “Student”
• Student’s t-distribution & t-test
• Design of experiments

• Neyman-Pearson lemma, which laid the foundation for testing statistical hypotheses
• Stratified sampling
• Confidence interval

• Only son of Karl Pearson
• Neyman-Pearson lemma
• Likelihood ratio criterion

• Father of modern statistics in India
• Indian Statistical Institute (1932)
• Sample surveys
• Pilot survey concept
• Mahalanobis distance
• Founder Director of ISI
• Five year plans

• Cramér-Rao inequality
• Rao-Blackwell theorem
• Score test
• Worked in most of the emerging areas
• Eberly Professor at Pennsylvania State University (earlier University Professor at the University of Pittsburgh)
• Director of ISI

• Kallianpur-Kunita theorem
• Kallianpur-Robbins law
• Kallianpur-Striebel formula
• Director of ISI

• Block designs
• Bose-Mesner algebra
• Algebraic analysis and construction of block designs

• Considered the father of modern probability theory
• Axiomatic and measure-theoretic foundations of probability theory

• Major contributions are in the areas of quality control, acceptance sampling and sampling theory

• Experimental designs
• First female statistician elected to International Statistical Institute

• Cooley-Tukey algorithm
• Exploratory data analysis
• Box plot
• Tukey’s test
• Tukey’s lambda distribution
• Coined the terms “bit” and “software”

• Geneticist & evolutionary biologist
• Genetic linkage in mammals
• Population genetics
• Coined the term “clone”
• J.B.S.

• Geneticist
• Path analysis
• Inbreeding coefficient
• Distribution of gene frequencies (with R.A. Fisher & Haldane)

If you feel the subject is hard, then follow these tips:
• Understand the basic concepts and relate them to your domain.
• Work out examples using simple data sets.
• You can learn statistics by working out a variety of examples from different areas of interest and interpreting the results.

The aim of statistics is twofold:
• Descriptive statistics: summarizing and describing observed data such that the relevant aspects are made explicit.
• Inferential statistics: studying to what extent observed trends/effects can be generalized to a general (infinite) population.

“Data reduction”: summarize data in compact form
• Minimum
• Maximum
• Mean
• Standard deviation
• Range, etc.
• Some visualisation tools (charts, graphs/plots) which help in the summarization of data

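As an illustration of the summary measures listed above, here is a minimal Python sketch using only the standard library. The height values and the helper name `describe` are made up for this example and do not come from the lecture.

```python
import statistics

# Hypothetical heights (cm) of ten individuals; illustrative values only.
heights_cm = [152.0, 158.5, 160.0, 161.5, 165.0, 167.0, 170.5, 172.0, 175.0, 181.5]


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    lowest = min(values)
    highest = max(values)
    return {
        "n": len(values),
        "minimum": lowest,
        "maximum": highest,
        "range": highest - lowest,
        "mean": statistics.mean(values),
        # Sample standard deviation (divides by n - 1).
        "std_dev": statistics.stdev(values),
    }


summary = describe(heights_cm)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Ten raw values are reduced to a handful of numbers, which is the sense in which descriptive statistics is "data reduction".
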
Techniques make use of probability theory, probability distributions, sampling methods, etc.
• Tests of hypothesis
• ANOVA
• Design of experiments
• Association between variables: correlation
• Model fitting and prediction: regression, logistic regression, etc.

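Two of the techniques named above, correlation and simple linear regression, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paired values are hypothetical, and the function names `pearson_r` and `fit_line` are chosen for this example.

```python
import math


def _centered_sums(xs, ys):
    """Return the means and the sums of squares/products about the means."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return mean_x, mean_y, sxx, syy, sxy


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    _, _, sxx, syy, sxy = _centered_sums(xs, ys)
    return sxy / math.sqrt(sxx * syy)


def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mean_x, mean_y, sxx, _, sxy = _centered_sums(xs, ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired measurements (illustrative values only).
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
slope, intercept = fit_line(x_values, y_values)
predicted_at_6 = intercept + slope * 6  # prediction for a new x value

print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

The correlation coefficient measures the strength of the linear association, while the fitted line is what allows prediction for a new value of x.
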
• Data are distinct pieces of factual information, usually formatted in a special way.
• Data are collective information (a collection of facts or statistics).
• “Data” is the plural of “datum”, a single piece of information. In practice, however, people use “data” as both the singular and plural form of the word.
• In Statistics, data are generally numeric (quantitative) information. But now, it need not be so!

• Height of an individual: a single piece of information
• Heights of a group of 50 individuals: collective information, or data/statistics of heights of individuals
• You may consider data on name, gender, state of origin/native place, % of marks in the qualifying examination, marks in the entrance exam, etc. pertaining to the new batch of students admitted to YU. This is collective information from which we can extract a lot of additional information or knowledge.
What additional information or knowledge can you extract from this data?

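One possible answer to the question above is sketched below in Python. The student records, field names and values are entirely hypothetical; the point is only that counts and averages are additional information not visible in any single record.

```python
from collections import Counter
from statistics import mean

# Hypothetical admission records; names, states and marks are made up.
students = [
    {"name": "A", "gender": "F", "state": "Karnataka", "qualifying_pct": 82.0},
    {"name": "B", "gender": "M", "state": "Kerala", "qualifying_pct": 76.5},
    {"name": "C", "gender": "F", "state": "Karnataka", "qualifying_pct": 91.0},
    {"name": "D", "gender": "M", "state": "Goa", "qualifying_pct": 68.5},
]

# How many students of each gender, and from each state?
by_gender = Counter(s["gender"] for s in students)
by_state = Counter(s["state"] for s in students)

# Average percentage of marks in the qualifying examination.
average_pct = mean(s["qualifying_pct"] for s in students)

print("By gender:", dict(by_gender))
print("By state:", dict(by_state))
print(f"Average qualifying %: {average_pct:.1f}")
```
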
• Data are plain facts. When data are processed, organized, structured or presented in a given context so as to make them useful, they are called information.
• Data by themselves are fairly useless. But when these data are processed to determine their true meaning, they become useful. This useful information can be called knowledge.

• The history of temperature readings all over the world for the past 100 years is data. If this data is organized and analyzed to find that global temperature is rising, then that is information/knowledge.
• The number of visitors to a website by country is an example of data. Finding out that traffic from India is increasing while traffic from Australia is decreasing is meaningful information.
• Data could be primary or secondary.

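The website example above can be turned into a small Python sketch of the step from data to information. The monthly visitor counts are hypothetical, and comparing the last value with the first is a deliberately simple rule chosen for illustration, not a formal trend test.

```python
# Hypothetical monthly visitor counts per country (the raw data).
visits = {
    "India": [1200, 1350, 1500, 1720],
    "Australia": [900, 860, 790, 700],
}


def trend(counts):
    """Label a series by comparing its last value with its first."""
    change = counts[-1] - counts[0]
    if change > 0:
        return "increasing"
    if change < 0:
        return "decreasing"
    return "unchanged"


# The derived labels are the "information" extracted from the data.
trends = {country: trend(counts) for country, counts in visits.items()}
for country, label in trends.items():
    print(f"Traffic from {country} is {label}")
```
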
• Research data is data that is collected, observed or created for purposes of analysis to produce original research results.
• It could be primary or secondary.
• Research data can be generated for different purposes and through different processes.
• It can be divided into different types, depending on the study design.

• Observational: data captured in real time. For example, sensor data, survey data, CT scan/MRI images
• Experimental: data from lab equipment, often reproducible, but can be expensive. For example, gene sequences, chromatograms
• Simulation: data generated from test models. For example, climate models, economic models
• Reference data sets: collections of smaller (peer-reviewed) datasets, most probably published and archived. For example, gene sequence databanks, chemical structures, economic databases, epidemiological databases, etc.

Measure or observe the characteristics of interest, like:
• height, weight, gender, BP, sugar level, cholesterol level
• patient’s condition during admission to ICU
• pain level before and after treatment
• no. of days for recovery
• anesthesia dose
• family size, no. of siblings, family income, etc.
The characteristics may be qualitative or quantitative in nature.

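The distinction between qualitative and quantitative characteristics can be illustrated with a short Python sketch. The patient record, its field names and values are hypothetical, and the rule used here (numeric values are quantitative, text values are qualitative) is a simplifying assumption.

```python
# One hypothetical patient record; field names and values are illustrative.
patient = {
    "height_cm": 168.0,
    "weight_kg": 72.5,
    "gender": "female",
    "systolic_bp_mmhg": 128,
    "condition_at_icu_admission": "stable",
    "days_to_recovery": 6,
    "number_of_siblings": 2,
}


def kind(value):
    """Classify a recorded value as quantitative (numeric) or qualitative."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return "quantitative"
    return "qualitative"


kinds = {name: kind(value) for name, value in patient.items()}
for name, label in kinds.items():
    print(f"{name}: {label}")
```

Note that this rule looks only at how a value is stored. A qualitative characteristic recorded as a numeric code (for example, gender coded as 1 or 2) would be wrongly labelled quantitative, so the nature of the characteristic, not its storage format, should decide the classification.
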