This lecture by Prof. K.K. Achary explains the concept of variables and different measurement scales used in statistical analysis. It covers qualitative and quantitative variables, discrete and continuous variables, as well as the NOIR scale (Nominal, Ordinal, Interval, Ratio).
Lecture 2, by Prof. K.K. Achary, YRC, Yenepoya University, 18/08/2016
Examples
• Class standing of the members of this class relative to each other
• Admitting diagnosis of patients admitted to a mental health clinic
• Weights of babies born in a hospital during a year
• Gender of babies born in a hospital during a year
• Range of motion of the elbow joint of students enrolled in a university health sciences curriculum
• Under-arm temperature of day-old infants born in a hospital
• Ages of individuals willing to participate in a substance abuse survey
• BMI of school children in the age group 10-15 years
In statistical language, any characteristic that can have different values for different cases/subjects is called a variable. The characteristics of interest measured or observed by the researcher are variables.
• A qualitative characteristic gives a qualitative variable (also called an attribute), which is observed. Examples: eye colour, skin colour, …
• A quantitative characteristic gives a quantitative variable (or simply a variable), which is measured or counted. Examples: height, BMI, fasting blood sugar, toxicity, …
A variable that assumes only whole-number (integer) values is called a discrete variable. E.g.: family size, number of children in a family, number of accidents in a month, number of days to recover. A variable that takes continuous values in an interval is called a continuous variable. E.g.: height, weight, blood sugar, BMI, etc.
For continuous variables, whatever two values you mention, it is always possible to have more values (in the interval) between them. Example: the height of an individual may be 1.21 metres when measured on 10th March this year, and 1.27 metres on 10th March next year. In the intervening 12 months, the individual will have been not just 1.22 or 1.23 or 1.24 and so on up to 1.27 metres, but will have passed through every possible measurement, however finely divided, between 1.21 and 1.27 metres.
Sometimes the distinction between discrete and continuous is less clear. A person's age could be treated as discrete if it is stated at a particular time, say 42 years in 2007, or as continuous, because there are many possible values between the age today (42 years, 7 weeks and 3 days) and the age next week (42 years, 8 weeks and 3 days).
There are four measurement scales (or levels of data measurement), together called the NOIR scale:
• NOMINAL
• ORDINAL
• INTERVAL
• RATIO
These are simply ways to categorize different types of variables. A psychologist named Stanley Stevens introduced these terms.
Nominal
Nominal scales are used for labeling variables, without any quantitative value. "Nominal" scales could simply be called "labels", and their values cannot be arranged in order. Examples: gender, hair colour, name, place of birth, etc. Given two values of a nominal variable, say A and B, the only question you can answer is: is A different from B? Values can be strings (alphanumeric) or numeric codes, as in 1 = male, 2 = female.
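The point about numeric codes being labels can be illustrated with a minimal Python sketch. The coding 1 = male, 2 = female comes from the slide; the six coded responses are hypothetical and serve only as an illustration.

```python
from collections import Counter

# Coding scheme from the slide: 1 = male, 2 = female.
GENDER_LABELS = {1: "male", 2: "female"}

# Hypothetical coded responses for six subjects (illustration only).
coded = [1, 2, 2, 1, 2, 2]

# Frequency counts and the mode are meaningful for nominal data.
counts = Counter(GENDER_LABELS[code] for code in coded)
mode_label, mode_count = counts.most_common(1)[0]

# The only valid comparison between two nominal values is equality.
same_category = coded[0] == coded[1]

# The mean of the codes can be computed, but it has no interpretation,
# because the codes are labels rather than quantities.
mean_of_codes = sum(coded) / len(coded)

print(counts)
print(mode_label, mode_count)
print(same_category)
```

Swapping the codes (1 = female, 2 = male) would change the mean of the codes but not the counts or the mode, which is one way to see that the codes carry no quantitative value.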
Ordinal
With ordinal scales, the order of the values is important and significant, but the differences between the values are not really known.
• Example:
• Do you feel stressed today?
• No stress, mild stress, moderate stress, severe stress, extreme stress (on a five-point measuring scale)
• Participants with severe stress have a more serious condition than participants with mild stress
• How is the pain today?
• No pain, mild pain, moderate pain, severe pain, extreme pain
• How was the Ph.D. entrance examination?
• Very easy, easy, OK, difficult, very difficult
• In all these examples, the response is subjective
• Scores such as the Apgar score: 0, 1, 2, …, 10
• Its components are Appearance, Pulse, Grimace, Activity, Respiration
• (The score is named after Dr. Virginia Apgar; "American Pediatric Gross Assessment Record" is a backronym for it)
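As a rough sketch of how ordinal data can be handled, the Python snippet below ranks the five stress categories from the example, compares two responses by rank, and finds the median category. The five responses and the helper name `is_more_severe` are hypothetical.

```python
import statistics

# Ordered categories from the stress example; list position gives the rank.
STRESS_LEVELS = [
    "no stress",
    "mild stress",
    "moderate stress",
    "severe stress",
    "extreme stress",
]
RANK = {label: position for position, label in enumerate(STRESS_LEVELS)}


def is_more_severe(a, b):
    """Return True if category a ranks above category b (hypothetical helper)."""
    return RANK[a] > RANK[b]


# Hypothetical responses from five participants (illustration only).
responses = [
    "mild stress",
    "severe stress",
    "no stress",
    "moderate stress",
    "severe stress",
]

# Order comparisons and the median are meaningful for ordinal data.
# median_low always returns a rank that actually occurs in the data.
median_rank = statistics.median_low(RANK[r] for r in responses)
median_label = STRESS_LEVELS[median_rank]

print(is_more_severe("severe stress", "mild stress"))
print(median_label)
```

The ranks are used only for ordering. Subtracting them (for example, "severe minus mild = 2") would assume equal spacing between categories, which an ordinal scale does not guarantee.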
Interval
Interval scales are numeric scales in which we know not only the order, but also the exact differences between the values. The classic example of an interval scale is Celsius temperature, because the difference between each value is the same. For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees. Clock or calendar time is another good example of an interval scale in which the increments are known, consistent, and measurable. However, interval scales don't have a "true zero": 0 °C does not mean the absence of temperature, so ratios of interval values are not meaningful.
Interval scales are nice because the realm of statistical analysis on these data sets opens up. For example, central tendency can be measured by mode, median, or mean; standard deviation can also be calculated.
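A small Python sketch can make the "no true zero" property concrete. It uses the standard Celsius-to-Fahrenheit conversion; the 20 °C and 10 °C readings are arbitrary illustrative values. Equal differences stay equal when the unit changes, but ratios do not.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Equal Celsius differences remain equal after changing the scale.
diff_60_50 = c_to_f(60) - c_to_f(50)
diff_80_70 = c_to_f(80) - c_to_f(70)

# Two arbitrary readings (illustration only).
a_c, b_c = 20.0, 10.0
a_f, b_f = c_to_f(a_c), c_to_f(b_c)

# The ratio depends on the unit chosen, so "twice as hot" is not meaningful.
ratio_c = a_c / b_c
ratio_f = a_f / b_f

print(diff_60_50, diff_80_70)
print(ratio_c, round(ratio_f, 2))
```

Because the ratio changes with the choice of zero point, statements about differences are valid on an interval scale while statements about "how many times" are not.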
Ratio
Ratio scales are the ultimate when it comes to measurement scales. They tell us about the order and the exact value between units, and they also have an absolute zero, which allows a wide range of both descriptive and inferential methods to be applied. Good examples of ratio scale variables include height and weight. Ratio scales provide a wealth of possibilities when it comes to statistical analysis.
These variables can be meaningfully added, subtracted, multiplied and divided (ratios). Central tendency can be measured by mode, median, or mean; measures of dispersion, such as standard deviation and coefficient of variation, can also be calculated from ratio scales. They let us answer the question: how many times bigger is A than B?
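As an illustration of these measures, here is a minimal Python sketch using the standard `statistics` module. The four weights are hypothetical, and the coefficient of variation is taken here as the sample standard deviation divided by the mean.

```python
import statistics

# Hypothetical weights in kilograms (illustration only).
weights_kg = [50.0, 60.0, 70.0, 80.0]
# The same weights expressed in grams.
weights_g = [w * 1000 for w in weights_kg]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unit-free)."""
    return statistics.stdev(values) / statistics.mean(values)


cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)

# With an absolute zero, "how many times bigger" is a valid question.
times_bigger = weights_kg[3] / weights_kg[0]

print(statistics.mean(weights_kg), statistics.median(weights_kg))
print(cv_kg, cv_g)
print(times_bigger)
```

The coefficient of variation comes out the same in kilograms and in grams, which only makes sense because the scale has a true zero.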
A categorical variable is a variable that can take one of a limited, and usually fixed, number of possible values, thus assigning each individual to a particular group or "category". Examples: blood group, gender, hair color, soil type, diabetic status. If no ordering of the categories can be considered, the variable is of nominal type; if ordering is possible, it is of ordinal type. For statistical analysis we may assign numeric labels, which are not true "values". Each of the possible values of a categorical variable is referred to as a level.
A categorical variable with two levels is called a binary variable (dichotomous variable). Sometimes we discretize/categorize continuous data into categories. Height data may be broadly classified as tall, medium and short. If we label short = 0, medium = 1 and tall = 2, we have three categories or levels. The categories here are actually of ordinal type. An ordinal variable is a categorical variable with an ordinal arrangement of its values/levels.
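The discretization described above might be sketched in Python as follows. The labels short = 0, medium = 1, tall = 2 are from the slide; the cut-offs of 160 cm and 175 cm and the sample heights are assumptions made purely for illustration, since the slide does not specify any.

```python
# Hypothetical cut-offs in centimetres; the slide does not specify any.
SHORT_BELOW_CM = 160.0
TALL_FROM_CM = 175.0

# Labels from the slide: short = 0, medium = 1, tall = 2.
LEVELS = {0: "short", 1: "medium", 2: "tall"}


def height_level(height_cm):
    """Map a continuous height to an ordinal code (0, 1 or 2)."""
    if height_cm < SHORT_BELOW_CM:
        return 0
    if height_cm < TALL_FROM_CM:
        return 1
    return 2


# Hypothetical heights in centimetres (illustration only).
heights = [152.5, 168.0, 181.2, 175.0]
codes = [height_level(h) for h in heights]
labels = [LEVELS[c] for c in codes]

print(codes)
print(labels)
```

The codes preserve the order short < medium < tall, but the exact heights are lost, so the result is ordinal rather than ratio data.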
Identification and classification of study variables is very important in research. It depends on your research concept, study objective, hypothesis of interest, etc. Suppose you want to study the factors leading to anger in an individual. How will you plan a study? What are the variables you will observe/measure?
Activity…
• Prepare a list of discrete, continuous and categorical variables in your discipline.
• Classify them according to the different measurement scales.
Organization of data is a means to efficient research, not an end in itself.
• Know your data completely: the variables and their types, scales of measurement, units of measurement, number of significant digits considered, etc.
• Be sure that the data you collect meet your needs as far as the objectives of the study, the hypothesis proposed and the statistical analysis to be done are concerned.
• Some extra work when you collect material/data may prevent a lot of future problems; think of what information you need to document now so that your data files make sense to you (and others) in the future.
What does Data Organization mean?
Data organization, in broad terms, refers to the method of classifying and organizing data sets to make them more useful. In statistics, data organization means organizing data in a meaningful and presentable form. The different types of presentation are:
• tabular presentation (tables)
• diagrammatic presentation
• graphical presentation
The process of placing classified data into tabular form is known as tabulation. A table is a systematic arrangement of statistical data in rows and columns. Rows are horizontal arrangements, whereas columns are vertical arrangements. A table may be simple, double (two-way) or complex, depending upon the type of classification.
Types of Tabulation
Simple Tabulation or One-way Tabulation: When the data are tabulated by one characteristic, it is said to be simple tabulation or one-way tabulation. Example: Tabulation of data on world population classified by one characteristic, such as religion, continent or country.
Two-way Tabulation: When the data are tabulated according to two characteristics at a time, it is said to be two-way tabulation. Example: Tabulation of data on world population classified by two characteristics, such as country and religion.
Complex Tabulation: When the data are tabulated according to many characteristics, it is said to be complex tabulation. Example: Tabulation of data on world population classified by characteristics such as country, religion, sex, literacy, etc.
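The three types of tabulation differ only in how many characteristics are used to classify each record. A minimal Python sketch, assuming records stored as dictionaries, is given below; the function name `tabulate`, the field names and the five records are all hypothetical.

```python
from collections import Counter

# Hypothetical records (illustration only); "A", "B", "X", "Y" are placeholders.
records = [
    {"country": "A", "religion": "X", "sex": "F"},
    {"country": "A", "religion": "Y", "sex": "M"},
    {"country": "B", "religion": "X", "sex": "F"},
    {"country": "B", "religion": "X", "sex": "M"},
    {"country": "A", "religion": "X", "sex": "F"},
]


def tabulate(rows, *characteristics):
    """Count rows for every combination of the given characteristics."""
    return Counter(tuple(row[c] for c in characteristics) for row in rows)


one_way = tabulate(records, "country")                    # simple tabulation
two_way = tabulate(records, "country", "religion")        # two-way tabulation
complex_ = tabulate(records, "country", "religion", "sex")  # complex tabulation

print(one_way)
print(two_way)
print(complex_)
```

Each key of the resulting counter is one cell of the table, so adding a characteristic multiplies the number of possible cells.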
There are some rules for preparing a statistical table. The following points should be borne in mind while preparing a table:
• A good table must contain all the essential parts, such as table number, title, head note, caption, stub, body, foot note and source note.
• A table should be simple to understand. It should also be compact, complete and self-explanatory.
• A good table should be of proper size, with adequate space for rows and columns. A table should not be overloaded with too many details.
• Sometimes it is difficult to present the entire data in a single table. In that case, the data should be divided across several tables.
• A good table must have an attractive look. It should be prepared in such a manner that a scholar can understand the problem without much difficulty.
• The table number and title should be given at the top.
• In all tables the captions and stubs should be arranged in some systematic manner, for example alphabetically or chronologically, depending upon the requirement.
• The unit of measurement should be mentioned in the head note. The same unit must be used throughout for a given variable.
• Figures should be rounded off to the nearest hundred, thousand or lakh. This helps in avoiding unnecessary details. Use a fixed number of decimals uniformly.
• Percentages and ratios should be computed. The percentage of each item's value relative to the total should be given in parentheses just below the value.
• Where information is not available, write N.A. or indicate it by a dash (-).
• Ditto marks should be avoided in a table. Similarly, the expression "etc." should not be used in a table.
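The rule about showing percentages alongside values can be sketched in Python as below. The function name `cells_with_percentages` and the patient counts are hypothetical; in this plain-text rendering the percentage is placed next to the value rather than below it.

```python
def cells_with_percentages(counts):
    """Return each count followed by its percentage of the total in parentheses."""
    total = sum(counts.values())
    return {
        category: f"{value} ({100 * value / total:.1f}%)"
        for category, value in counts.items()
    }


# Hypothetical patient counts by type of service (illustration only).
service_counts = {"OPD": 30, "In-patient": 10}
cells = cells_with_percentages(service_counts)

for category, cell in cells.items():
    print(f"{category:<12}{cell}")
```

Using one fixed format string for every cell also enforces the earlier rule of a uniform number of decimals.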
An example
Prepare a table to present data collected month-wise from patient records, including the following information: age group, gender, and type of service (OPD/in-patient). The table should give the number of patients by age group, gender and type of service.
Cross tabulation
Cross tabulation is a type of tabulation in which two characteristics are cross tabulated, with subcategories under each characteristic, and frequency counts are determined. Suppose A and B are two characteristics, where A has m subcategories and B has n subcategories. Then cross tabulation of A versus B will produce an m × n table. Such tables are also called contingency tables.
Cross tabulation can be applied to categorical/nominal variables in any pairing: categorical versus nominal, nominal versus nominal, or categorical versus categorical. Even continuous variables can be grouped into categories and then cross tabulated. Example: three age groups versus height (tall, medium and short).
[Table: general layout of an m × n cross tabulation of A versus B]
Example
Consider the cross tabulation of a group of patients by gender and diabetic status.
[Table: gender versus diabetic status]
Consider the cross tabulation of patients based on three age groups (below 45, 45-59, and 60 and above) versus health status (diabetic, hypertensive, and both diabetic and hypertensive). Construct crosstabs with different examples.
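One way to build the age group versus health status crosstab is sketched below in Python. The age groups and health statuses are taken from the exercise; the seven patient records and the function names are hypothetical, so the counts shown are illustrative only.

```python
from collections import Counter

# Categories from the exercise: rows are age groups, columns are health statuses.
AGE_GROUPS = ["below 45", "45-59", "60 and above"]
STATUSES = ["diabetic", "hypertensive", "both"]


def age_group(age):
    """Assign an age in years to one of the three age groups."""
    if age < 45:
        return AGE_GROUPS[0]
    if age < 60:
        return AGE_GROUPS[1]
    return AGE_GROUPS[2]


# Hypothetical (age, health status) records for seven patients.
patients = [
    (38, "diabetic"),
    (52, "hypertensive"),
    (67, "both"),
    (45, "diabetic"),
    (71, "hypertensive"),
    (59, "both"),
    (63, "both"),
]


def cross_tabulate(rows):
    """Return the 3 x 3 table of frequency counts as a list of rows."""
    cell_counts = Counter((age_group(age), status) for age, status in rows)
    return [[cell_counts[(g, s)] for s in STATUSES] for g in AGE_GROUPS]


table = cross_tabulate(patients)
row_totals = [sum(row) for row in table]
column_totals = [sum(column) for column in zip(*table)]

for group, row, total in zip(AGE_GROUPS, table, row_totals):
    print(f"{group:<14}{row} total={total}")
print(f"{'column totals':<14}{column_totals}")
```

Here m = 3 and n = 3, so the result is a 3 × 3 contingency table; the row and column totals are the marginal frequencies.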
Diagrammatic representation
• Diagrams are important tools for presenting numerical data in visual mode.
• Diagrams help to understand trends and variations in data.
• Trends/variations are better captured in a diagram than by casual scrutiny of numbers.
• Diagrams must be simple; do not include too much information in a single diagram.
• Details will usually be lost when numbers are transformed to diagrams.
• If the scale is not properly chosen or the size is disproportionate, there is every chance of misinterpretation.
Diagrams are used for the following purposes:
• To compare different categories/nominal variables
• To represent the distribution of one or more categories along with variations in the component values
• To compare the changes in the values of an interval or ratio type variable over a period of time
• To express the co-variation/relationship between two variables which are observed together (in pairs)
Modern statistical/graphical software has made the art of diagrammatic presentation quite flexible, with a large number of features and options that make a diagram attractive. Further, the number of diagram types has also increased compared to the pre-computer days. However, note that while using different types of diagrams for presenting research data, choosing the right type of diagram is important; choice of style and beautification is secondary.
Types of diagrams
Bar diagram
Simple bar diagram:
• This diagram is drawn to compare the values across different nominal/categorical variables.
• The categories are taken along the X-axis and the values are shown along the Y-axis.
• A vertical bar with height equal to the magnitude of the value is drawn for each category.
• Horizontal bars may also be used; in that case the labels of the axes are reversed.
• All bars should be of equal width.
• The title should be clear and provided at the bottom of the figure.
• Appropriate legends should be used to show different categories/variables/components.
[Figure: simple bar chart; X-axis: quarters]
Fig. 1 Total page views of a website by quarter
Multiple (Cluster) bar diagram
• In this type of bar chart we draw two or more bars associated with each nominal/categorical variable
• Useful for comparing data within one particular nominal/categorical variable and also across various nominal/categorical variables
• Examples:
• Comparing three different types of diseases in five regions (or five years)
• Comparing export of three products to different countries in a given year
• Comparing tumor sizes in three different types of cancer
Stacked bar chart
In this type of bar diagram, we draw simple bars corresponding to each nominal/categorical variable, and these bars are then subdivided into components showing the component values. It is useful for comparing the total values and also the component values. If the totals are different, then it is better to express the component values as percentages, with the total representing 100%. This is called a percentage bar chart.
[Figure: stacked bar chart]
Fig. 5 Distribution of male and female patients in three study groups
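The conversion from component values to a percentage bar chart can be sketched in Python as follows. The group names and counts are hypothetical and are not the data behind Fig. 5; the function name `to_percentages` is likewise an assumption.

```python
# Hypothetical counts of male and female patients in three study groups.
groups = {
    "Group 1": {"male": 30, "female": 20},
    "Group 2": {"male": 10, "female": 30},
    "Group 3": {"male": 45, "female": 45},
}


def to_percentages(components):
    """Rescale component values so that the bar total represents 100%."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}


percentage_bars = {group: to_percentages(c) for group, c in groups.items()}

for group, shares in percentage_bars.items():
    print(group, shares)
```

The group totals here differ (50, 40 and 90), so the raw stacked bars would have different heights; after rescaling, every bar has the same height and only the composition is compared.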
Pie diagram
This diagram is constructed to show the components by dividing the area of a circle into sectors. The total of all components is taken as the total area of the circle, and each component value is proportional to the area of its sector. If data are to be compared between two or more situations, we construct separate pie diagrams.
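Since the area of a sector is proportional to its central angle, each component's angle is its share of the total multiplied by 360 degrees. A minimal Python sketch, with hypothetical blood group counts and a hypothetical function name `sector_angles`, is:

```python
def sector_angles(components):
    """Return the central angle in degrees for each component of a pie diagram."""
    total = sum(components.values())
    return {name: 360 * value / total for name, value in components.items()}


# Hypothetical blood group counts for 100 subjects (illustration only).
blood_groups = {"A": 25, "B": 30, "AB": 5, "O": 40}
angles = sector_angles(blood_groups)

for name, angle in angles.items():
    print(f"{name:<3}{angle:6.1f} degrees")
```

The angles always add up to 360 degrees, whatever the total of the components.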
Line diagram
• New types of diagrams, like the bubble diagram, doughnut diagram, heat maps, etc.
• How to construct these diagrams using software?
• What are the important points to be remembered while constructing the diagrams?
• How do you choose the right type of diagram?