380 likes | 487 Views
Pharmaceutical Statistics. Basic Concepts. Definitions. Statistics is the field of study concerned with: The collection, organization, summarization and analysis of data, and The drawing of inferences about a body of data when only apart of the data is observed.
E N D
Pharmaceutical Statistics Basic Concepts
Definitions • Statistics is the field of study concerned with: • The collection, organization, summarization and analysis of data, and • The drawing of inferences about a body of data when only apart of the data is observed. • Statistics has its own vocabulary: • Data • Variables (quantitative, qualitative, random, discrete random, continuous random). • Population • Sample
Data • Data are the raw material of statistics. We may define data as numbers. • There are two kinds of numbers that we use in statistics: • Numbers resulting from taking a measurement (Cmax, tablet hardness, patient temperature etc.) • Numbers that result from a process of counting (patients discharged from a hospital per day, tablets produced by a certain process per shift etc) • Data may be obtained from one or more of the following sources: • Routinely kept records • Surveys • Experiments • External resources
Statistics and Data Statistical Analysis (AVG, STD) Performance of the student
Variables • Variable: a characteristic monitored, capable of taking different values (diastolic blood pressure, heights of students, hardness of tablets). • A quantitative variable is one that can be measured in the usual sense (heights, weights, drug content, plasma concentration at some time point). • A qualitative variable is one that can not be measured. It can be described or categorized (race/ethnicity of patients, discoloration in produced tablets). Qualitative variables can be counts to produce frequencies.
Variables • Random Variable: Whenever the values obtained from a measurement arise as a result of chance factors, so that they can not be exactly predicted in advance • Continuous random variables take an infinite number of possible values. Continuous random variables are usually measurements. Examples include height, weight, the exact amount of active ingredients in an amoxicillin suspension (0, 0.1, 0.558, 1, 1.009, 2, 3.008,…etc) • Discrete random variables may take on only a countable number of distinct values such as 0,1,2,3,4,........ Discrete random variables are usually (but not necessarily) counts.. (number of students in a school, # tablet/hour).
Samples & Population • Population: A population is an entire group, collection or space of objects which we want to characterize (we want to study the bad effects of smoking on UJ’s student: The population is all students in University of Jordan). In case Population is too large to study we need to take a reprehensive sample • Sample: A sample is a collection of observations on which we measure one or more characteristics. Frequently, we use (small) samples of (large) populations to characterize the properties and affinities within the space of objects in the population of interest. For example, if we want to characterize the US population, we can take a sample (poll or survey) and the summaries that we obtain on the sample (e.g., mean age, race, income, body-weight, etc.) may be used to study the properties of the population, in general.
Samples & Population • In statistics, we reach a conclusion about a population on the basis of information contained in a sample that has been drawn from that population. • There are many kinds of samples that may be drawn from a population. The simplest type of scientific samples that may be used for analysis is the simple random sample.
Simple Random Sample • If you use the letter N to designate the size of a finite population and the letter n to designate the size of finite sample then: • If a sample of size n is drawn from a population of size N in such a way that every variable in the group N has the same chance of being selected, the sample is called a simple random sample. • Each individual in n is chosen randomly and entirely by chance [N] [n] N=400 n=48
Data Organization Ordered array • Measurements that have not been organized, summarized or otherwise manipulated are called raw data. • Unless the number of observations is extremely small, it will be unlikely that these raw data will impart much information until they have been put into some kind of order. • Always it is easier to analyze organized data
Grouped DataThe frequency distribution • Although a set of observation can be made more comprehensible and meaningful by means of an ordered array, further useful summarization may be achieved by grouping the data. • To group a set of observations, we select a set of non-overlapping intervalssuch that each value in the data set of observations can be placed in one, and only one, interval. • These intervals are usually referred to as Class Intervals. • Usually class intervals are ordered from smallest to largest. • Interval width=19-10+1=10 or 20-10=10 • Total range=69-10+1=60 or 10*6=60
Grouped dataThe relative frequency distribution • It may be useful sometimes to know the proportion rather than the number, of values falling between a particular class interval. • We obtain this information by dividing the number of values in the particular class interval by the total number of values. • We refer to the proportion of values falling within a class interval as the relative frequency of values in that interval. • We may sum (cumulate) the frequencies and relative frequencies to facilitate obtaining information regarding frequency or relative frequency of values within two or more contiguous class intervals.
Histogram • We may display a frequency distribution (or a relative frequency distribution) graphically in the form of a histogram, which is a special type of bar graphs. • When we construct a histogram, the variable under consideration are represented by the horizontal (x) axis, while the the frequency (or relative frequency) of occurrence is the (y) axis. Histogram
Class Problem II • A school nurse weighed 30 students in Year 10. Their weights (in kg) were recorded as follows: • 50 52 53 54 55 65 60 70 48 63 • 74 40 46 59 68 44 47 56 49 58 • 63 66 68 61 57 58 62 52 56 58 • 1. Use the data above to construct a frequency table Range = 74-40=34 Let width of class interval =5 #intervals=34/5=7 There are 7 class intervals. This is reasonable for the given data. The frequency table is as follows: 2. Complete the table to calculate: cumulative frequency, relative frequencies, cumulative relative frequencies (HW1-C) Dr. Alkilany 2012
Biostatistics Lecture 3 Descriptive Statistics Dr. Alkilany 2012
Descriptive Statistics • With interval scale (continuous measurement) data, there are two aspects to the figures that we should be trying to describe: • How large are they? ‘indicator of central tendency’ • How variable are they? ‘indicator of dispersion’ • FBG for two sets of patients as follows: Set A: 84, 85,89, 89, 93, 94. Set B:72, 82,89, 89, 96, 106. which is larger? Which is more variable? • ‘indicator of central tendency’ describes any statistic that is used to indicate an average value around which the data are clustered • Three possible indicators of central tendency are in common use – the mean, median and mode. Dispersion Central tendency Dr. Alkilany 2012
Mean • The usual approach to showing the central tendency of a set of data is to quote the average or the ‘mean’. • Example: Potency data of different vaccine batches. • Each batch is intended to be of equal potency, but some manufacturing variability is unavoidable. • A series of 10 batches has been analyzed and the results are shown in the following table: Sum = 991.5 n=10 Mean=99.15 Dr. Alkilany 2012
Types of Mean • Arithmetic mean • Geometric Mean • Harmonic mean • Arithmetic mean Dr. Alkilany 2012
Arithmetic Mean • Arithmetic mean represents the balance point of the distribution. Symmetrical Tail to the right Tail to the left Mean ( ) • The arithmetic mean has the following properties: • Uniqueness, for a given set of data there is one and only one mean. • Simplicity, easily to be understood and computed. • Not robust to extreme values, it is affected by each value in the data. • e.g. 5,10,15: mean=10…………..5,10,150: mean=55 Dr. Alkilany 2012
Geometric Mean • Geometric Mean: • In mathematics, the geometric mean is a type of mean or average, which indicates the central tendency or typical value of a set of numbers by using the product of their values (as opposed to the arithmetic mean which uses their sum) • is the anti-log of the average of the logarithms of the observations. Dr. Alkilany 2012
Geometric Mean • Example, for the values 50, 100, 200 Geometric mean = Antilog[(log50+log100+log200)/3]=100 while the arithmetic mean is 116.67. • Is meaningful for data with logarithmic relationships as in the case of the current procedure in bioequivalence studies where the ratios of log-transformed parameters are compared (log (AUC), log(Cmax)).
Harmonic Mean: Is the appropriate mean following reciprocal transformation. Example, the half-lives of a certain drug in 3 subjects were 2, 4, 8 hrs. determine the harmonic mean half-life for this drug? While the arithmetic mean is 4.667 hrs. Harmonic Mean Dr. Alkilany 2012
Median • The point that divides the distribution into two equal parts, or the point between the upper and lower halves of the distribution. • Accordingly, if we have a finite number of values, then the median is the value that divides those values into two parts such that the number of values equal to or greater than the median is equal to the number of values equal to or less than the median. 7 variables 7 variables Median Dr. Alkilany 2012
Median (example) • Fifteen patients were provided with their drugs in a child-proof container of a design that they had not previously experienced. • The time it took each patient to open the container was measured. • The results are shown below. • The mean = 7.09 s, Is this the most representative/descriptive figure? Some outliers shifted the mean and thus median can tell us better information in this case Values are clustering here Dr. Alkilany 2012
Median • Most patients have got the idea more or less straight away and have taken only 2–5 s to open the container. • However, four seem to have got the wrong end of the stick and have ended up taking anything up to 25 s. • These four have contributed a disproportionate amount of time (65.6 s) to the overall total. This has then increased the mean to 7.09 s. • We would not consider a patient who took 7.09 s to be remotely typical. In fact they would be distinctly slow. Dr. Alkilany 2012
Median (other example) • This problem of mean values being disproportionately affected by a minority of outliers arises quite frequently in biological and medical research. • A useful approach in such a case is to use the median. eg: Blood Glucose Level (mg/dl): 80, 81, 82, 83, 84, 84, 86, 86, 180 Mean: 93 Median: 84 The outlier 180 shifted the mean to higher value, which is not descriptive for the data set in this case!! outlier Values are clustering here Dr. Alkilany 2012 Median Mean
Median (how to determine it in ordered array?) • When n is an odd number, then the median is the value number (n+1)/2 in an ordered array Example, what is the median for the following data set: 10, 15, 12, 25, 20. • Rank the data: 10, 12, 15, 20, 25. • The median is the value number (n+1)/2 (5+1)/2=3rd so the median is 15. • When n is an even number, then the median is the mean of the two middle values (n/2)th and ((n/2) + 1)th in an ordered array . Example, what is the median for the following data set: 10, 15, 20, 25, 30, 5 • Rank the data: 5, 10, 15, 20, 25, 30 The median is the average of (n/2)th and the (n/2 + 1)th values: 3rd and 4th (15+20)/2= 17.5 Dr. Alkilany 2012
Mean Vs. Median The median is robust to extreme outliers. The term ‘robust’ is used to indicate that a statistic or a procedure will continue to give a reasonable outcome even if some of the data are aberrant. eg. 2, 4, 6, 8, 10 median=6 2, 4, 6, 8, 1000 median=6 If last variable increased to 1000 instead of 10, the median will stay the same (6), while the the mean would be hugely inflated!!!! Dr. Alkilany 2012
Mode • Mode: value which occurs most frequently. • If all values are different there is no mode and a set of values may have more than one mode. • Used for quick estimation and for identifying the most common observation. • Properties: • Not unique • Simple • Not robust, less stable than the median and the mean. Dr. Alkilany 2012
Mode • The condition of sixty patients with arthritis is recorded using a global assessment variable. • A positive score indicates an improvement and a negative one a deterioration in the patient’s condition after treatment. • The mean (0.77) [Do you think the mean is the best descriptive parameter for these data? Dr. Alkilany 2012
Mode • A histogram of the above data shows that there are two distinct sub-populations. • Slightly under half the patients have improved quality of life, but for the remainder, their lives are actually made considerably worse. Dr. Alkilany 2012
Mode • Neither the mean nor the median indicator remotely describes the situation. • The mean is particularly unhelpful as it indicates a value that is very untypical – very few patients show changes close to zero. • We need to describe the fact that in this case, there are two distinct groups. The data consisted of values clustered around some central points. Dr. Alkilany 2012
Mode • Data distribution can be ‘unimodal’ or ‘polymodal’ in the case with several clustering. • If we want to be more precise, we use terms such as bimodal or trimodal to describe the exact number of clusters. Dr. Alkilany 2012
How mean, median, and mode are related? For symmetric distributions: the mean and median are equal For skewed distributions with a single mode the three measures differ Dr. Alkilany 2012
How mean, median, and mode are related? For skewed distributions with a single mode the three measures differ: mean>median>mode (positively skewed distributions) mean<median<mode (negatively skewed distributions) Dr. Alkilany 2012