Explore the basics of statistics in biomedicine. Learn about data, branches of statistics, and sources of data. Understand the importance of biostatistics in research and decision-making.
BIOSTATISTICS Dr Atef A Masad PhD Biomedicine
Some Basic Concepts The Meaning of Statistics “Statistics is the study of the methods and procedures for collecting, classifying, summarizing, and analyzing data and for making scientific conclusions from such data.” Data - Data are the raw materials of statistics. - Data are the numerical information (numbers) obtained on a set of objects, which we use to interpret reality. Objects can be anything, e.g., people or animals. Example: The number of traffic accidents at certain junctions, the number of people employed, or the number of patients visiting a clinic
The word statistics is also used for characteristics calculated from a set of data, for example the mean and the standard deviation. Biostatistics: When the data being analyzed are derived from the biological sciences and medicine, we use the term biostatistics to distinguish this particular application of statistical tools and concepts.
Why Study Statistics? • A knowledge of statistics is essential for people going into research management or graduate study in a specialized area. • Researchers use statistical methods to quantify uncertainty in outcomes, summarize and make sense of data, and compare the effectiveness of different treatments. • Federal government agencies and private companies rely heavily on statisticians’ input. • It is important for reviewing and understanding the writings in scientific journals, which use statistical terminology and methodology.
• An understanding of statistics can help anyone discriminate between fact and fancy in everyday life: in reading newspapers and watching television, and in making daily comparisons and evaluations. • Finally, a course in statistics should help one know when, and for what purpose, a statistician should be consulted.
Branches of Statistics • There are two major categories of statistics: descriptive and inferential statistics. • 1 Descriptive statistics • Comprise those methods concerned with collecting, organizing, picturing, summarizing, and describing a set of data so as to yield meaningful information. • Descriptive statistics provide information only about the collected data. • The construction of tables, charts, and graphs falls in the area categorized as descriptive statistics.
• An example of descriptive statistics is the decennial census in some countries, in which all residents are requested to provide such information as age, sex, race, and marital status. • The data obtained in such a survey can then be compiled and arranged into tables and graphs that describe the characteristics of the population at a given time. • Another example: presenting students' marks in graphs or tables.
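As a minimal sketch of descriptive statistics, the snippet below summarizes a small set of student marks with the mean and the standard deviation. The marks are hypothetical values chosen only for illustration; they do not come from any real class.

```python
import statistics

# Hypothetical marks for ten students (illustrative values only).
marks = [62, 75, 81, 58, 90, 75, 68, 77, 84, 70]

mean_mark = statistics.mean(marks)   # arithmetic mean
sd_mark = statistics.stdev(marks)    # sample standard deviation (n - 1 denominator)

print(f"n = {len(marks)}, mean = {mean_mark:.1f}, SD = {sd_mark:.1f}")
```

These two numbers describe only the ten marks collected; they say nothing by themselves about other students.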
2 Inferential statistics • Comprise those methods concerned with the analysis of information from a sample, or a subset of data, to draw conclusions regarding the population, or the entire set of data. • An example is an opinion survey, such as a pre-election poll, which attempts to draw inferences as to the outcome of an election. • In such a survey, a sample of individuals (frequently fewer than 2000) is selected; their preferences are tabulated, and inferences are made as to how millions of persons would vote if an election were held that day.
Sources of Data • 1. Routinely kept records: Hospital medical records, for example, contain immense amounts of information on patients. • 2. Surveys: Data are gathered by asking people questions. Surveys are used when the data needed are not available from routinely kept records. • For example, if the admission forms do not contain a question on mode of transportation, we may conduct a survey among patients to obtain this information. • The study of the psychological effects of the explosion of the atomic bomb on the inhabitants of Hiroshima and Nagasaki is another example of a survey.
Methods of survey: • Personal interview • Telephone interview • Questionnaires • 3. Experiments: for example, the effect of a specific medication on a specific disease. • 4. Observations made with the naked eye or through a video camera, etc. • 5. External sources: e.g., published reports, commercially available data banks, or the research literature.
Variables • A variable is a characteristic of the people or objects included in a study that is to be measured or observed. Examples include age, weight, height, marital status, and blood group. • Two Types of Variables • Qualitative variables • Variables that yield observations by which individuals can be categorized according to some characteristic; examples include occupation, sex, marital status, and education level. • Quantitative variables • Variables that yield observations that can be counted or measured; examples include weight, height, and serum cholesterol.
Quantitative variables can be further classified as discrete or continuous. • Discrete variables (take countable values) • A variable is called discrete if it has only a countable number of values. Usually these will be whole numbers (integers) such as 3, 5, 9, and 23, with none in between (i.e., they usually don’t have decimal values). • For example, the number of patients in a hospital may be 178 or 179, but it cannot be any value between these two. • Other examples: • The number of syringes used in a clinic on any given day • The number of children in one family • The number of times that you visited a doctor • The number of missing teeth
Continuous variables (vary over continuous intervals) • A continuous variable can take any value within a range (e.g., temperature increases or decreases gradually, not in sudden jumps). • They are measured on some scale in terms of a measurement unit such as kilograms, meters, mmol/l, or degrees Celsius; examples include weight, height, and length. • Example: a researcher can study the effect of sex (a qualitative independent variable) or income (a continuous independent variable) on educational progress (the dependent variable).
Scale (or Level) of Measurement • Measurement means giving a numerical value to observations or things seen. • There are four levels (or scales) of measurement into which data can be classified. • These are: nominal, ordinal, interval, and ratio.
1. Nominal (used to identify individuals or items) • The lowest level of the measurement scale types is the nominal scale. Nominal scale data are divided into qualitative categories or groups, such as male/female, black/white, well/sick, child/adult, and married/not married. • Numbers, when used, serve as labels only and have no numerical meaning, such as telephone or car numbers. Nominal scale data cannot be arranged in an ordering scheme. • Arithmetic operations of addition, subtraction, multiplication, and division are not performed on nominal data. • Example: Gender data (Male, Female)
2. Ordinal (rank and order scale) • Data at the ordinal level may be arranged in some order, but actual differences between data values either cannot be determined or are meaningless. • For example, students in a class may be ranked 1st/2nd/3rd. • There is no information about the size of the interval: no conclusion can be drawn about whether the difference between the first and second student is the same as the difference between the second and third. • More examples include the degree of pain (severe, moderate, mild, none) and the age group (baby, infant, child, adult, geriatric).
3. Interval • For example, on the Celsius scale the difference between 100°C and 90°C is the same as the difference between 50°C and 40°C. • However, because interval scales do not have an absolute zero, ratios of scores are not meaningful: it is not correct to say that 20°C is twice as hot as 10°C, or that 100°C is twice as hot as 50°C, because 0°C does not represent the point at which there is no heat, but the freezing point of water. • Calendar times are also interval measurements, since the starting date of a calendar era does not signify “no time”.
• Also, an IQ (“Intelligence Quotient”) of zero would not mean no intelligence at all, but rather a serious problem in thinking or in using the materials of the test. • Interval-level data have no absolute zero point or starting point. • Consequently, differences are meaningful but ratios of data are not.
4. Ratio • A ratio scale has the same properties as an interval scale; but because it has an absolute zero, meaningful ratios do exist. • Most biomedical variables form a ratio scale: weight in grams or pounds, time in seconds or days, blood pressure in millimeters of mercury, and pulse rate are all ratio scale data. • On a ratio scale, a zero pulse rate indicates an absolute lack of heartbeat. • Therefore, it is correct to say that a pulse rate of 120 is twice as fast as a pulse rate of 60. More examples include height, heart rate, etc.
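The contrast between interval and ratio scales can be checked with a little arithmetic. The sketch below assumes the Kelvin scale as a comparison; it is not mentioned in these slides, but it measures temperature from an absolute zero and therefore behaves as a ratio scale. The helper name `celsius_to_kelvin` is made up for this example.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius to kelvin (0 K is absolute zero)."""
    return celsius + 273.15

# Differences are the same on both scales, so they are meaningful.
diff_celsius = 20 - 10
diff_kelvin = celsius_to_kelvin(20) - celsius_to_kelvin(10)

# The ratio changes with the arbitrary zero point of the Celsius scale.
ratio_celsius = 20 / 10
ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

print(f"difference: {diff_celsius} degrees C, {diff_kelvin:.2f} K")
print(f"ratio on Celsius scale: {ratio_celsius:.2f}")
print(f"ratio on Kelvin scale:  {ratio_kelvin:.4f}")
```

The Celsius ratio of 2 is an artifact of where zero was placed; measured from absolute zero, 20°C is only a few percent hotter than 10°C.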
2 Populations and Samples • The term population refers to a collection of people or objects that share common observable characteristics. • For example, a population could be all of the people who live in your city, all of the students enrolled in a particular university, or all of the people who are troubled by a certain disease (e.g., all women diagnosed with breast cancer during the last five years). • Generally, researchers are interested in particular characteristics of a population: not the characteristics that define the population, but rather such features as height, weight, gender, age, heart rate, and systolic or diastolic blood pressure.
• A sample is a subset of data selected from a population. • The number of elements in the sample is called the sample size. • The primary objective of selecting a sample from a population is to draw inferences (conclusions) about that population. Parameters and statistics • A parameter is a characteristic of or a fact about a population. • A statistic is a characteristic of or a fact about a sample. Sampling • The procedure by which some members of a given population are selected as representatives of the entire population.
Why do we sample? Why not study the entire population? • There are many reasons to study samples instead of populations: • A study of an entire population is impossible in most situations because the population may be hypothetical (e.g., patients who may receive a treatment in the future). • Studying a sample is less expensive than studying an entire population. • Studying an entire population is labour-intensive. • Samples can be studied more quickly than populations. • Some testing is inherently destructive: we can't drain all the blood from a person and count every white cell.
How Are Samples Selected? • Simple random samples • A simple random sample is a sample drawn so that every element in the population has an equal probability of being included. • There are two ways of selecting a simple random sample. • 1. Lottery method • Procedure • - Number all units • - Randomly draw units • Example: evaluate the prevalence of tooth decay among the 800 children attending a school
• List the children attending the school • Number the children from 1 to 800 • Sample size = 40 children • Randomly draw 40 numbers between 1 and 800
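The lottery method for the school example can be sketched with Python's standard `random` module. The fixed seed is an assumption added here only so that the same draw can be reproduced and checked later; in practice any properly random draw will do.

```python
import random

# Fixed seed (an arbitrary choice) so the draw can be reproduced.
rng = random.Random(2024)

child_ids = range(1, 801)     # children numbered 1 to 800
sample_size = 40

# Draw 40 distinct ID numbers; every child has the same chance of selection.
sample = sorted(rng.sample(child_ids, k=sample_size))
print(sample)
```

`random.sample` draws without replacement, so no child can be selected twice.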
2. Random number table (described at the end) Systematic sampling • We randomly select a first case and then proceed by selecting every nth case, where n is determined by dividing the number of items in the sampling frame (a complete list of the people or objects constituting the population) by the desired sample size.
Example: Consider the children example above and select a sample of size 40. • Solution: Divide the 800 children by 40 (the sample size) = 20, so every 20th child is sampled. In this approach, we must first select a number randomly between 1 and 20, and we then select every 20th child. • Suppose we randomly select the child with ID number 8. • Then the systematic sample consists of the children with ID numbers 8, 28, 48, 68, 88, and so on; each subsequent number is determined by adding 20 to the last ID number.
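The systematic sampling rule in this example can be sketched as a short function. The function name `systematic_sample` and its parameters are made up for illustration, and the sketch assumes the population size is an exact multiple of the sample size, as it is here (800 / 40 = 20).

```python
import random

def systematic_sample(population_size, sample_size, start=None):
    """Return ID numbers chosen by taking every nth case after a random start."""
    step = population_size // sample_size      # sampling interval n
    if start is None:
        start = random.randint(1, step)        # random start between 1 and n
    return list(range(start, population_size + 1, step))

# Reproduce the worked example: random start fixed at ID number 8.
ids = systematic_sample(800, 40, start=8)
print(ids[:5], "...", ids[-1])
```

With a start of 8 the last selected ID is 788, giving 40 children in total.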
Stratified Sampling • Strata are groups or classes inside a population that share a common characteristic. • Procedure for selecting a stratified sample • The population is first divided into at least two distinct strata or groups. • Then a simple random sample of a certain size is drawn from each stratum. • The groups or strata are often sampled in proportion to their actual percentage of occurrence in the overall population. • Combine the results of all strata.
This stratification results in greater representativeness. • For example, instead of drawing one sample of 10 people from a total population consisting of 500 black and 500 white people, two random samples of five could be taken, one from each racial group (or stratum) separately, thus guaranteeing the racial representativeness of the resulting overall sample of 10. • Other strata might be men or women, department, location, age, field of study, and so on.
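A minimal sketch of the stratified example above: the population of 1000 is split into its two strata, and a simple random sample of five is drawn from each. The ID numbering and the seed are assumptions made only for this illustration.

```python
import random

rng = random.Random(1)   # fixed seed (arbitrary) for reproducibility

# Hypothetical population: IDs 1-500 in one stratum, 501-1000 in the other.
population = [(i, "black") for i in range(1, 501)] + \
             [(i, "white") for i in range(501, 1001)]

# Step 1: divide the population into strata.
strata = {}
for person_id, group in population:
    strata.setdefault(group, []).append(person_id)

# Step 2: draw a simple random sample of five from each stratum.
per_stratum = 5
stratified = {group: rng.sample(members, k=per_stratum)
              for group, members in strata.items()}

# Step 3: combine the results of all strata.
overall_sample = [pid for ids in stratified.values() for pid in ids]
print(stratified)
```

Because each stratum makes up half of the population, equal samples of five are also proportional samples here.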
Cluster Sampling • In cluster sampling we begin by dividing the demographic area into sections. • Then we randomly select sections or clusters. • Every member of each selected cluster is included in the sample. • For example, in conducting a survey of school children in a large city, we could first randomly select 4 schools and then include all the children from each selected school. • This technique is more economical than the random selection of persons throughout the city.
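The school survey can be sketched as follows. The city with 20 schools of 30 children each is entirely hypothetical, as are the school and child labels; only the procedure (randomly select 4 schools, then take every child in them) follows the slide.

```python
import random

rng = random.Random(7)   # fixed seed (arbitrary) for reproducibility

# Hypothetical city: 20 schools (clusters), 30 children in each.
schools = {
    f"school_{s:02d}": [f"s{s:02d}_child{c:03d}" for c in range(1, 31)]
    for s in range(1, 21)
}

# Randomly select 4 clusters ...
chosen_schools = rng.sample(sorted(schools), k=4)

# ... and include every child from each selected school.
cluster_sample = [child for school in chosen_schools
                  for child in schools[school]]

print(chosen_schools, len(cluster_sample))
```

Only the schools are chosen at random; no random selection takes place inside a chosen school.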
Convenience Sampling • Convenience sampling is just what the name suggests: the patients or samples are selected by an arbitrary (non-random) method that is easy to carry out. • Some researchers refer to these types of samples as “grab bag” samples. • Ex: MOH evaluation • For example, when studying patients with a particular clinical condition, we may choose a single hospital and investigate some or all of the patients with the condition in that hospital. • Such a sample is not representative of the whole population. • Convenience sampling is very easy to do, but it's probably the worst technique to use.
• For example, suppose a catheter ablation treatment is known to have a 95% chance of success. • That means that we expect only about one failure in a sample of size 20. • However, even though the probability is very small, it is possible that we could select a random sample of 20 individuals in which all 20 have had failed ablation procedures. • Results of studies based on convenience samples are descriptive and may be used to suggest future research, but they should not be used to draw inferences about the population under study.
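The two figures in the ablation example can be worked out directly. The sketch assumes the 20 outcomes are independent of one another, which the slide does not state explicitly.

```python
p_success = 0.95
p_failure = 1 - p_success
n = 20

# Expected number of failures in a random sample of 20 patients.
expected_failures = n * p_failure

# Probability that all 20 independently sampled patients are failures.
p_all_fail = p_failure ** n

print(f"expected failures: {expected_failures:.1f}")
print(f"P(all 20 fail):    {p_all_fail:.2e}")
```

The second probability is of the order of 10^-26: possible in principle, but far too small to expect from a genuinely random sample.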
Judgment Sampling • The person most knowledgeable on the subject of the study selects elements of the population that he or she feels are most representative of the population. • Example: A reporter might sample three or four ministers, judging them as reflecting the general opinion of the government.
How to select a Random Sample? • One of the easiest ways to select a random sample is to use a random number table. • Such tables are easy to find; they are in many statistical texts and mathematical handbooks. • Many calculators and computers also generate random numbers. • A portion of a random number table is reproduced in Table 2.1. • Random number tables are prepared in such a way that each digit gets equal representation.
Selecting a random sample involves three steps: (1) Define the population. (2) Enumerate or number it. (3) Use a random number table to select the sample.
Example: • Select 10 persons from a population of 83 cases in a hypertension (high blood pressure) study (see Table 2.2). • Solution: • Observe that the population is clearly defined: 83 cases classified according to their diastolic blood pressure, sex, and dietary status. • Also note that the cases have been numbered arbitrarily from 1 to 83.
• If the random number table covered four pages, you might flip a coin twice and arbitrarily agree to assign HH (two heads) to page 1, HT to page 2, TH to page 3, and TT to page 4. • Suppose you flip heads on both tosses (HH); you then turn to page 1. • To choose an arbitrary starting place, you could stab blindly at the page and land on, say, row 19 and column 31. (This procedure is illustrated in Table 2.1.) • The row-column intersection of the starting place should be recorded, just in case you wish later to verify your selection and hence your sample. • Next, read the two-digit numeral that falls at that spot.
Why use two digits? • Because the cases in the sampling frame are identified by two-digit numbers. • The first number selected is 24. • By advance agreement, you could proceed by reading down the column: 66, 29, 07, 97, and so on. • Alternatively, you could agree to read the table in some other reasonable way, say from left to right. • Whatever pattern you choose, you cannot change it during the selection process.
• Continuing to read down the columns, you would select the individuals shown in the table. • Why was number 97 excluded? • The answer is simple: the sampling frame defines only numbers 01 through 83, so all others are disregarded. • A corollary problem is the duplication of a number already selected; in practice, the duplicate is simply ignored.
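The selection rules just described (skip numbers outside the sampling frame, ignore duplicates, stop at the required sample size) can be sketched as below. The first five table values (24, 66, 29, 07, 97) are the ones quoted above; the remaining values are hypothetical stand-ins, since Table 2.1 is not reproduced here. The function name `select_from_table` is also made up for this sketch.

```python
def select_from_table(table_numbers, frame_max, sample_size):
    """Pick case numbers from a stream of random-table values."""
    selected = []
    for number in table_numbers:
        if number < 1 or number > frame_max:
            continue              # outside the sampling frame: disregard
        if number in selected:
            continue              # duplicate of an earlier pick: ignore
        selected.append(number)
        if len(selected) == sample_size:
            break
    return selected

# First five values are quoted in the text; the rest are hypothetical.
table_column = [24, 66, 29, 7, 97, 51, 24, 88, 3, 40, 72, 15, 60]

cases = select_from_table(table_column, frame_max=83, sample_size=10)
print(cases)
```

In this run 97 and 88 are dropped because they exceed 83, and the second 24 is dropped as a duplicate.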
Simple random sampling is one of the statistician's most vital tools and is used in countless applications. • It is the basic building block for every method of sampling, no matter how sophisticated. • Occasionally however, it may be uneconomical or impractical to implement a random selection scheme that requires details of the entire population.
Table 2.2 Hypertension Study Cases by Diastolic Blood pressure, Sex, and Dietary Status
3 Organizing and Displaying Data • Any survey or experiment yields a list of observations. • These need to be organized and summarized in a logical fashion so that we may perceive the outcome clearly. • Tables and graphs are commonly used to organize, summarize, and describe data.
A. Frequency Tables / Frequency Distributions • Considerable information can be obtained from large masses of statistical data by grouping the raw data into classes and determining the number of observations that fall in each of the classes. • Such an arrangement is called a frequency distribution or frequency table. • A frequency table may be the most convenient way of summarizing or displaying data. • The types of frequency distributions that will be considered here are categorical (or qualitative) frequency distributions and grouped frequency distributions.
Categorical frequency distributions • Represent data that can be placed in specific categories, such as gender, hair color, or blood group. • Example: The blood types of 25 blood donors are given below. Summarize the data using a frequency distribution.
Solution: We will represent the blood types as classes and the number of occurrences for each blood type as frequencies. The frequency table (distribution) in the following table summarizes the data.
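A categorical frequency table of this kind can be built with `collections.Counter`. The 25 blood types listed below are hypothetical stand-in values, because the original data for this example are not reproduced in the text.

```python
from collections import Counter

# Hypothetical blood types for 25 donors (stand-in data for illustration).
donors = ["A", "B", "B", "AB", "O", "O", "O", "B", "AB", "B",
          "B", "B", "O", "A", "O", "A", "O", "O", "O", "AB",
          "AB", "A", "O", "B", "A"]

frequency = Counter(donors)          # class -> number of occurrences
total = sum(frequency.values())

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for blood_type in ["A", "B", "O", "AB"]:
    count = frequency[blood_type]
    print(f"{blood_type:<6}{count:>10}{100 * count / total:>8.0f}%")
print(f"{'Total':<6}{total:>10}")
```

Each blood type is a class, and the count beside it is that class's frequency; the frequencies must add up to the number of donors.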
Grouped Frequency Distributions • A grouped frequency distribution is obtained by constructing class intervals for the data, and then listing the corresponding number of values (frequency count) in each interval. • Tables 3.2 and 3.3 are examples of frequency tables, constructed from the systolic blood pressure readings (by smoking status) of Table 3.1.
How to construct a frequency table? • 1. Arrange the data into an array, a listing of all observations from smallest to largest, in order to determine the interval spanned by the data. • We find, for example, that the blood pressure interval for smokers is 98-208.
2. Determine the range (R) from the difference between the smallest and largest values in the set of observations, i.e., R = largest data value – smallest data value = 208 – 98 = 110 mmHg. • 3. Divide the range into a number of equal and non-overlapping segments called class intervals. • Important note: the number of intervals should in general range from 5 to 15.
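Steps 2 and 3 can be sketched for the smokers' readings, using only the smallest (98) and largest (208) values quoted above. The class width of 10 mmHg and the choice to start the first interval at a multiple of 10 are assumptions made for this illustration; other widths are equally valid as long as the number of intervals stays between 5 and 15.

```python
smallest, largest = 98, 208          # from the ordered array of readings

# Step 2: the range.
data_range = largest - smallest

# Step 3: equal, non-overlapping class intervals.
width = 10                           # assumed class width in mmHg
lower = (smallest // width) * width  # start at 90, a multiple of the width

intervals = []
while lower <= largest:
    intervals.append((lower, lower + width - 1))   # e.g. 90-99
    lower += width

print(f"R = {data_range} mmHg, {len(intervals)} class intervals")
for low, high in intervals:
    print(f"{low}-{high}")
```

This gives twelve intervals, 90-99 through 200-209, which together cover every reading from 98 to 208 without overlap.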