
Understanding Variables in Statistical Analysis

Learn about different types of variables in statistical analysis, from qualitative to quantitative, discrete to continuous. Explore measurement scales like nominal, ordinal, interval, and ratio with practical examples.

irvinb

Presentation Transcript


  1. Lecture 2, by Prof. K. K. Achary, YRC, Yenepoya University

  2. In statistical language, any characteristic that can have different values for different cases/subjects is called a variable. The characteristics of interest measured or observed by the researcher are variables. Qualitative characteristic → qualitative variable (also called an attribute), which is observed. Examples: eye colour, skin colour, … Quantitative characteristic → quantitative variable (or simply variable), which is measured or counted. Examples: height, fasting blood sugar, weight, blood pressure (BP), birthweight.

  3. A variable that takes only whole-number (integer) values is called a discrete variable. E.g. family size, number of children in a family, number of accidents in a month, number of days to recover. A variable that can take any value in an interval is called a continuous variable. E.g. height, weight, blood sugar, birthweight.

  4. For a continuous variable, whichever two values you mention, it is always possible to find more values between them. Example: the height of an individual may be 1.21 metres when measured on 10th March this year and 1.27 metres on 10th March next year. In the intervening 12 months the individual will have been not just 1.22, 1.23, 1.24 and so on up to 1.27 metres, but every possible height between 1.21 and 1.27 metres, however small the steps between them.

  5. Sometimes the distinction between discrete and continuous is less clear. A person’s age can be treated as discrete if it is stated in completed years at a particular time, say 42 years in 2007, or as continuous, because there are many possible values between the age today (42 years, 7 weeks and 3 days) and the age next week (42 years, 8 weeks and 3 days).

  6. There are four measurement scales (or levels of data measurement), remembered as the NOIR scale: Nominal, Ordinal, Interval, Ratio. These are simply ways to categorize different types of variables. The psychologist Stanley Smith Stevens introduced these terms.

  7. Nominal: Nominal scales are used for labelling variables, without any quantitative value. “Nominal” scales could simply be called “labels”, and their values cannot be arranged in order. Examples: gender, hair colour, name, place of birth. Given two values of a nominal variable, say A and B, you can answer the question: is A different from B? Values can be strings (alphanumeric) or numeric codes, as in Male = 1, Female = 2.
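
The point about numeric coding can be sketched in Python. This is a minimal illustration, assuming the coding Male = 1, Female = 2 from the slide; the responses are made up, and the names `GENDER_CODES`, `LABELS` and `responses` are hypothetical.

```python
from collections import Counter

# Coding scheme from the slide: Male = 1, Female = 2.
GENDER_CODES = {"Male": 1, "Female": 2}
LABELS = {code: label for label, code in GENDER_CODES.items()}

# Made-up responses, for illustration only.
responses = ["Female", "Male", "Female", "Female", "Male"]
coded = [GENDER_CODES[r] for r in responses]

# Meaningful for nominal data: equality checks and frequency counts.
same_category = coded[0] == coded[2]
counts = Counter(LABELS[c] for c in coded)
mode_label = counts.most_common(1)[0][0]

# Not meaningful: the arithmetic mean of the codes (1.6 here) describes
# nothing, because the numbers are labels rather than quantities.
print(coded, same_category, dict(counts), mode_label)
```

Swapping the codes (Male = 2, Female = 1) would change the mean of the codes but not the counts or the mode, which is why only the latter are valid summaries here.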

  8. Ordinal: With ordinal scales, the order of the values is important and significant, but the differences between them are not really known. Example: Do you feel stressed today? No stress, mild stress, moderate stress, severe stress, extreme stress (a five-point scale). Participants with severe stress have a more serious condition than participants with mild stress.

  9. How is the pain today? No pain, mild pain, moderate pain, severe pain, extreme pain. How was the Ph.D. entrance examination? Very easy, easy, OK, difficult, very difficult. In these examples the response is subjective. Scores such as the Apgar score (0, 1, 2, …, 10), used to assess the condition of newborn babies, are also ordinal. Its components are Appearance, Pulse, Grimace, Activity and Respiration; the score is named after Dr Virginia Apgar, and “American Pediatric Gross Assessment Record” is a backronym.
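
As a rough sketch of what ordinal data allow, the pain levels above can be mapped to ranks and compared or summarized by order. The responses are made up, and `PAIN_LEVELS` and `RANK` are hypothetical names introduced only for this example.

```python
import statistics

# Ordered levels from the pain question on the slide (lowest to highest).
PAIN_LEVELS = ["no pain", "mild pain", "moderate pain", "severe pain", "extreme pain"]
RANK = {level: position for position, level in enumerate(PAIN_LEVELS)}

# Made-up responses, for illustration only.
responses = ["mild pain", "severe pain", "moderate pain", "no pain", "severe pain"]
ranks = [RANK[r] for r in responses]

# Meaningful: order comparisons and order-based summaries such as the median.
severe_is_worse_than_mild = RANK["severe pain"] > RANK["mild pain"]
median_level = PAIN_LEVELS[statistics.median_low(ranks)]

# Not meaningful: treating rank differences as equal-sized steps, since the
# gap between "mild" and "moderate" need not equal the gap between
# "severe" and "extreme".
print(severe_is_worse_than_mild, median_level)
```

`statistics.median_low` is used so that the result is always one of the observed levels, even with an even number of responses.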

  10. Interval: Interval scales are numeric scales in which we know not only the order but also the exact differences between the values. The classic example is Celsius temperature, because the difference between adjacent values is the same everywhere on the scale. For example, the difference between 60 and 50 degrees is a measurable 10 degrees, as is the difference between 80 and 70 degrees. Calendar or clock time is another good example of an interval scale: the increments are known, consistent and measurable. Interval scales do not have a “true zero”, so ratios of values are not meaningful.

  11. Interval scales are useful because they open up a wider range of statistical analysis. For example, central tendency can be measured by the mean, median or mode, and the standard deviation can also be calculated.
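
The absence of a true zero can be illustrated with a small sketch that compares the same two temperatures in Celsius (interval scale) and Kelvin (ratio scale). The temperatures are arbitrary and the function name is hypothetical.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius temperature to Kelvin."""
    return celsius + 273.15

# Two arbitrary temperatures, for illustration only.
a_celsius, b_celsius = 20.0, 10.0
a_kelvin, b_kelvin = celsius_to_kelvin(a_celsius), celsius_to_kelvin(b_celsius)

# Differences are meaningful on an interval scale: both scales agree
# (up to floating-point rounding) that the gap is 10 degrees.
diff_celsius = a_celsius - b_celsius
diff_kelvin = a_kelvin - b_kelvin

# Ratios are not: 20 / 10 = 2 in Celsius, but the same two temperatures
# give a ratio of only about 1.035 in Kelvin, which has an absolute zero.
ratio_celsius = a_celsius / b_celsius
ratio_kelvin = a_kelvin / b_kelvin

print(diff_celsius, round(diff_kelvin, 6), ratio_celsius, round(ratio_kelvin, 3))
```

The Celsius ratio depends on where the scale happens to place its zero, so "20 °C is twice as hot as 10 °C" is not a valid statement.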

  12. Ratio: Ratio scales are the most informative of the measurement scales. They tell us about the order and the exact differences between values, and they also have an absolute zero, which allows a wide range of both descriptive and inferential methods to be applied. Good examples of ratio scale variables include height and weight.

  13. These variables can be meaningfully added, subtracted, multiplied and divided (ratios). Central tendency can be measured by the mean, median or mode; measures of dispersion, such as the standard deviation and the coefficient of variation, can also be calculated from ratio scale data. How many times bigger is A than B? This question can be answered with ratio scale data.
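
A minimal sketch of these summaries for a ratio scale variable, using Python's standard `statistics` module. The weights are made up, and the sample (n − 1) standard deviation is an assumption; the slides do not say which version is intended.

```python
import statistics

# Made-up body weights in kilograms, for illustration only.
weights_kg = [52.0, 60.0, 68.0, 75.0, 80.0]

mean_kg = statistics.mean(weights_kg)
sd_kg = statistics.stdev(weights_kg)  # sample standard deviation (n - 1)

# Coefficient of variation: the standard deviation as a percentage of the mean.
# It only makes sense when the scale has a true zero.
cv_percent = 100 * sd_kg / mean_kg

# "How many times bigger is A than B?" is a valid question on a ratio scale.
heaviest_to_lightest = max(weights_kg) / min(weights_kg)

print(mean_kg, round(sd_kg, 2), round(cv_percent, 1), round(heaviest_to_lightest, 2))
```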

  14. A categorical variable is a variable that can take one of a limited, and usually fixed, number of possible values, thus assigning each individual to a particular group or “category”. Examples: blood group, gender, hair colour, soil type, diabetic status. When the variable is of nominal type, no ordering can be considered; if ordering is possible, it is of ordinal type. For statistical analysis we may assign numeric codes, which are labels and not numerical values. Each of the possible values of a categorical variable is referred to as a level.

  15. A categorical variable with two levels is called a binary (dichotomous) variable. Sometimes we discretize continuous data into categories. Height data may be broadly classified as tall, medium and short. If we label short = 0, medium = 1 and tall = 2, we have three categories or levels, and the categories here are of ordinal type. Discretization of a continuous variable should be avoided while analyzing data, because it discards information. An ordinal variable is a categorical variable with an ordered arrangement of its values/levels.
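
The height example can be sketched as follows. The cut-offs (150 cm and 170 cm) are assumptions chosen only for illustration, since the slides give none, and the function and variable names are hypothetical.

```python
# Labels for the codes used on the slide: short = 0, medium = 1, tall = 2.
LEVELS = {0: "short", 1: "medium", 2: "tall"}

def height_category(height_cm, short_below=150.0, tall_from=170.0):
    """Return the ordinal code for a height, using assumed cut-offs in cm."""
    if height_cm < short_below:
        return 0
    if height_cm < tall_from:
        return 1
    return 2

# Made-up heights in centimetres, for illustration only.
heights_cm = [142.0, 151.0, 169.0, 175.5]
codes = [height_category(h) for h in heights_cm]
labels = [LEVELS[c] for c in codes]

# 151.0 cm and 169.0 cm both become "medium": the 18 cm difference between
# them is no longer visible after discretization.
print(codes, labels)
```

This loss of detail is the reason for the advice against discretizing a continuous variable before analysis.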

  16. Activity: Prepare a list of discrete, continuous and categorical variables in your discipline. Classify them according to the different measurement scales.

  17. Know your data completely: the variables and their types, scales of measurement, units of measurement, number of significant digits considered, etc. Be sure that the data you collect meet your needs with respect to the objectives of the study, the hypotheses proposed and the statistical analysis to be done. Some extra work when you collect material/data may prevent a lot of future problems; think about what information you need to document now so that your data files make sense to you (and others) in the future.

  18. What does data organization mean? In broad terms, data organization refers to the method of classifying and organizing data sets to make them more useful. In statistics, it means organizing data in a meaningful and presentable form. Organization of data is a means to efficient research, not an end in itself. The different types of presentation are tabular presentation (tables), diagrammatic presentation and graphical presentation.

  19. The process of placing classified data into tabular form is known as tabulation. A table is a systematic arrangement of statistical data in rows and columns. Rows are horizontal arrangements, whereas columns are vertical arrangements. A table may be simple, double (two-way) or complex, depending on the type of classification.

  20. Types of tabulation. Simple (one-way) tabulation: when the data are tabulated by one characteristic, the result is a simple or one-way tabulation. Example: tabulation of world population data classified by a single characteristic such as religion, continent or country.

  21. Two-way tabulation: when the data are tabulated according to two characteristics at a time, the result is a two-way tabulation. Example: tabulation of world population data classified by two characteristics, such as country and religion. Tabulation of patient data by gender and disease type/severity also gives a two-way tabulation.

  22. Complex tabulation: when the data are tabulated according to many characteristics, the result is a complex tabulation. Example: tabulation of world population data classified by characteristics such as country, religion, sex, literacy, etc.

  23. The following points should be borne in mind while preparing a table. A good table must contain all the essential parts: table number, title, head note, caption, stub, body, footnote and source note. A table should be simple to understand. It should also be compact, complete and self-explanatory.

  24. A good table should be of proper size, with adequate space for rows and columns. A table should not be overloaded with too many details. Sometimes it is difficult to present the entire data in a single table; in that case, the data should be divided among several tables. A good table must look attractive and be prepared so that a reader can understand its contents without much difficulty. The table number and title should be given at the top. Vertical lines separating the columns in the body of the table should be avoided.

  25. In all tables the captions and stubs should be arranged in some systematic manner, for example alphabetically or chronologically, depending on the requirement. The unit of measurement should be mentioned in the head note, and the same unit must be used throughout for a given variable. Large figures should be rounded off to the nearest hundred, thousand or lakh (100,000), which avoids unnecessary detail. Use a fixed number of decimal places uniformly.

  26. Percentages and ratios should be computed. The percentage of each item's value to the total must be given in parentheses just below the value. Where information is not available, write N.A. or indicate it with a dash (-). Ditto marks should be avoided in a table. Similarly, the expression ‘etc.’ should not be used in a table.
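
The percentage and N.A. rules can be sketched as a small cell formatter. The injury counts are made up, the helper name is hypothetical, and the percentage is placed beside the value rather than below it only because the output is plain text.

```python
def cell_with_percentage(count, total):
    """Format a table cell as 'count (percent%)', or 'N.A.' if the count is missing."""
    if count is None:
        return "N.A."
    return f"{count} ({100 * count / total:.1f}%)"

# Made-up frequency counts; None marks information that is not available.
injury_counts = {"Head": 12, "Limb": 30, "Chest": 8, "Spine": None}

# The total is taken over the available counts only.
total = sum(c for c in injury_counts.values() if c is not None)
cells = {name: cell_with_percentage(c, total) for name, c in injury_counts.items()}

for name, cell in cells.items():
    print(f"{name:<6} {cell}")
```

A fixed format (`.1f`) keeps the number of decimal places uniform across the table, in line with the advice in the preceding points.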

  27. Table 1. Distribution of injuries

  28. Cross tabulation: Cross tabulation is a type of tabulation in which two characteristics are tabulated against each other, with the subcategories of each characteristic defining the cells and frequency counts given in the cells. Suppose A and B are two characteristics, where A has m subcategories and B has n subcategories. Then the cross tabulation of A versus B produces an m × n table. Such tables are also called contingency tables.

  29. Cross tabulation can be done with any pair of categorical variables: nominal versus nominal, nominal versus ordinal, or ordinal versus ordinal. Even continuous variables can be grouped into categories and then cross-tabulated. Example: three age groups versus height (tall, medium and short).

  30. [Table: general layout of the cross tabulation of A versus B]

  31. Example: Consider the cross tabulation of a group of patients by gender and diabetic status. [Table: gender versus diabetic status]

  32. Consider the cross tabulation of patients based on three age groups (below 45, 45-59, and 60 and above) versus health status (diabetic, hypertensive, and both diabetic and hypertensive). Homework: construct cross tabulations with different examples.
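
The age group versus health status example can be sketched with the standard library. The patient records are made up, and names such as `age_group` and `cell_counts` are hypothetical; this is one simple way to build an m × n table, not a prescribed method.

```python
from collections import Counter

AGE_GROUPS = ["below 45", "45-59", "60 and above"]   # m = 3 rows
STATUSES = ["diabetic", "hypertensive", "both"]      # n = 3 columns

def age_group(age):
    """Group a continuous age (in years) into the three categories above."""
    if age < 45:
        return "below 45"
    if age < 60:
        return "45-59"
    return "60 and above"

# Made-up (age, health status) records, for illustration only.
patients = [
    (38, "diabetic"), (52, "hypertensive"), (61, "both"), (47, "diabetic"),
    (70, "hypertensive"), (44, "hypertensive"), (59, "both"), (65, "both"),
]

# Count how many patients fall into each (row, column) cell.
cell_counts = Counter((age_group(age), status) for age, status in patients)

# Arrange the counts as an m x n table with marginal totals.
table = [[cell_counts[(g, s)] for s in STATUSES] for g in AGE_GROUPS]
row_totals = [sum(row) for row in table]
column_totals = [sum(column) for column in zip(*table)]

for group, row, row_total in zip(AGE_GROUPS, table, row_totals):
    print(f"{group:<13}", row, row_total)
print(f"{'Total':<13}", column_totals, sum(row_totals))
```

A `Counter` returns 0 for cells with no patients, so empty cells appear in the table without special handling.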

  33. 2.2: Diagrammatic representation. Diagrams are important tools for presenting numerical data visually. Diagrams help us understand trends and variations in data, which are captured better in a diagram than by casual scrutiny of numbers. Diagrams must be simple; do not include too much information in a single diagram. Details will usually be lost when numbers are transformed into diagrams. If the scale is not properly chosen or the size is disproportionate, there is every chance of misinterpretation.

  34. Diagrams are used for the following purposes: • To compare different categories/nominal variables • To represent the distribution of one or more categories along with variations in the component values • To compare the changes in the values of an interval or ratio type variable over a period of time • To express the co-variation/relationship between two variables that are observed together (in pairs)

  35. Modern statistical and graphical software has made diagrammatic presentation quite flexible, with a large number of features and options that make a diagram attractive. The number of diagram types has also increased compared with the pre-computer days. However, when presenting research data, choosing the right type of diagram is what matters; choice of style and beautification is secondary.

  36. 2.3: Types of diagrams. Simple bar diagram: this diagram is drawn to compare values across the categories of a nominal/categorical variable. The categories are taken along the X-axis and the values are shown along the Y-axis. A vertical bar with height equal to the magnitude of the value is drawn for each category. Horizontal bars may also be used, in which case the roles of the axes are reversed. All bars should be of equal width. The title should be clear and placed at the bottom of the figure. Appropriate legends should be used to show the different categories/variables/components.

  37. Simple bar chart. Fig. 1 Distribution of accidents by type of impact

  38. Fig. 2 Distribution of accidents by type of injury

  39. Multiple (cluster) bar diagram • In this type of bar chart we draw two or more bars for each category of a nominal/categorical variable • Useful for comparing data for a nominal/categorical variable with several sub-attributes • Examples: • Comparing three different types of diseases in five regions (or five years) • Comparing export of three products to different countries in a given year • Comparing tumor sizes in three different types of cancer • Comparing types of injuries in accidents with position of impact

  40. Fig. 4 Tumor size distribution in three types of cancer

  41. Fig. 5 Distribution of accidents by impact and injuries

  42. Stacked bar chart: In this type of bar diagram, we draw a simple bar for each category of a nominal/categorical variable and then subdivide each bar into segments showing the component values. It is useful for comparing the totals and also the component values. If the totals are different, it is better to show the component values as percentages, with each total representing 100%. Such a chart is called a percentage bar chart.
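
The conversion behind a percentage bar chart can be sketched as below. The group names and counts are made up, and `to_percent_components` is a hypothetical helper; plotting itself is left to whatever charting software is in use.

```python
def to_percent_components(components):
    """Rescale one bar's component values so that they sum to 100%."""
    total = sum(components.values())
    return {name: 100 * value / total for name, value in components.items()}

# Made-up counts of male and female patients in three study groups.
groups = {
    "Group 1": {"Male": 30, "Female": 20},
    "Group 2": {"Male": 45, "Female": 55},
    "Group 3": {"Male": 10, "Female": 30},
}

percent_bars = {name: to_percent_components(parts) for name, parts in groups.items()}

for name, parts in percent_bars.items():
    print(name, {k: f"{v:.1f}%" for k, v in parts.items()})
```

The group totals here are 50, 100 and 40, so raw stacked bars would have different heights; after rescaling, every bar has the same height and only the composition is compared.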

  43. Stacked bar chart. Fig. 6 Distribution of male and female patients in three study groups

  44. Pie diagram: This diagram is constructed to show components by dividing the area of a circle into sectors. The total of all components is taken as the total area of the circle, and the area of each sector is proportional to its component value. If data are to be compared between two or more situations, we construct a separate pie diagram for each.
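
Because a sector's area is proportional to its central angle, the construction reduces to angle = (component value / total) × 360°. A minimal sketch with made-up component values and a hypothetical helper name:

```python
def sector_angles(components):
    """Return the central angle in degrees of each sector of a pie diagram."""
    total = sum(components.values())
    return {name: 360 * value / total for name, value in components.items()}

# Made-up component values, for illustration only.
components = {"A": 50, "B": 30, "C": 20}
angles = sector_angles(components)

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The angles always add up to 360 degrees, whatever the total of the component values.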
