Statistical concepts Module 1, Session 2
Objectives From this session participants will be able to: • Define statistics • Enter simple datasets once the data entry form is set up • Recognise the type of each variable in a dataset • Know some ways to summarise data of each main type • Explain how statistical investigations deal with variability • Differentiate between descriptive and inferential statistics
Activities • This introduction • Entry of the data from the CAST survey • Discussion/presentation on statistical concepts • Using the data entered • And other case studies • The statistical glossary • For when you need to remind yourself about terminology
What is statistics - 1? From RSS webpage: 1. Statistics changes numbers into information. 2. Statistics is the art and science of deciding: • what are the appropriate data to collect, • deciding how to collect them efficiently • and then using them to give information, • answer questions, • draw inferences • and make decisions.
What is statistics - 2? 3. Statistics is making decisions when there is uncertainty. • We have to make decisions all the time, • in everyday life, • and as part of our jobs. • Statistics helps us make better decisions. 4. Statistics is NOT just collecting a lot of numbers • It is collecting numbers for a purpose
What is statistics - 3? From Wikipedia: 5. Statistics is a mathematical science pertaining to the • collection, • analysis, • interpretation or explanation • and presentation of data. 6. Statistics are used for making informed decisions • and misused for other reasons in all areas of business and government
What is statistics - 4? From the book “Statistics: A guide to the unknown”: 7. Statistics is the science of learning from data. Question 1 in the practical sheet From these 7 definitions – in the practical sheet • either choose the one you think is most appropriate • or make your own a) A one-line definition b) A longer definition
Data checking and entry – Question 2 • What can we learn from the data you collected? • Work in pairs or small groups • First check the data from the CAST survey • Check each other's, not your own • Is it legible? • Can it be entered into the computer? • Is the response to the open-ended question clear? • Can the text be simplified? • If there are many points, ask the respondent to state which are the most important 2 or 3. • Brief notes (as a report) to be made • to establish the data are ready for entry
Data entry into Excel Just type the number. The label is automatic
Data entry and checking – Question 3 • The data are now entered • This can be a class exercise • on a single computer • Data is entered by someone else • for each respondent (never by themselves) • Then it must be checked • read it out • check by reading back • Put the record number from the Excel form • on your original sheet • or add your names as another field in the Excel sheet • Why might it be better to just have a number?
Data entry and checking • You should now have completed question 3 • On the practical sheet • How long do you estimate it would take • For 1000 records to be entered?
Once the data are entered • Remember: “Statistics is the science of learning from data.” • To learn as much as possible • we must have confidence in the data • so they must be entered and checked well • This is what we have done in the groups • Now the data are ready for the analysis • Before that, look at some other data sets • Look for the common points • That apply to all the sets • and look for differences
Types of data - 1 • The analysis depends on the type of data • What are the types in the CAST questionnaire? • For questions 1 to 6 • Your answer was one of 5 categories • e.g. 1: Strongly agree, 2: Agree, … 5: Strongly disagree • These categories have an ordering • from strongly agree to strongly disagree • This type of data is called • categorical • or factor • or qualitative • With the ordering, they are sometimes called • ordered categorical data
Types of data - 2 • The last question in the survey • was a sentence or two that was written • This is also an example of qualitative data • It is an open-ended response • These data can be reported • and reporting the sentences can be very useful • So it is good if they are entered as they stand • To summarise perhaps the responses can be coded?
Coding open-ended questions – Question 4 • This is question 4 in the practical sheet • Looking at the responses in your groups • Could you code them? • What different codes would you have? • How would you enter the codes? • Might you lose anything by coding? • For a quick analysis • Could you enter the complete texts • And analyse the other columns • And then code later? • What might you lose by coding?
Coding and entering open-ended data • Discuss the suggestions for the codes. • If some points are made by many students then prepare a summary, • how many as a frequency • and as a percentage • With the small number of responses • there is no need to enter them into the computer • But discuss how it could be done • It is an example of a multiple response question • because respondents may give no points • or more than one point • If you ask for the most important observation • then it becomes a single qualitative response
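The frequency and percentage summary for a coded multiple-response question could be sketched in Python as follows. This is only an illustration: the code names and the five responses are hypothetical, invented to show the layout, and are not taken from the CAST survey.

```python
from collections import Counter

# Hypothetical coded responses: one list of codes per respondent.
# An empty list means the respondent gave no codable point.
coded_responses = [
    ["useful_examples", "needs_internet"],
    ["useful_examples"],
    [],
    ["needs_internet", "too_long"],
    ["useful_examples"],
]

n_respondents = len(coded_responses)

# Count each code at most once per respondent.
frequency = Counter(
    code for codes in coded_responses for code in set(codes)
)

for code, count in frequency.most_common():
    percent = 100 * count / n_respondents
    print(f"{code}: {count} ({percent:.0f}% of respondents)")
```

Because a respondent may give none, one, or several points, the percentages are out of the number of respondents and need not add up to 100.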
Other data sets • Zambia rainfall data • Tanzania agriculture survey • Look for the layout of the data • is it the same as for the simple CAST survey? • Look for the types of data • Which are the qualitative variables? • are they ordered? • Which are the quantitative variables? • which of them are discrete? • and which are continuous? • have any been coded to become qualitative?
Discussion - Question 5 • The layout of the data • Was always the same! • In a rectangle • Each row is a record • There are as many records (rows of data) • as there were respondents, or students, or units • Each column is a variable • Variables can be qualitative • or they can be quantitative • Discuss which type they are • For each data set • complete the tables in the practical sheet, question 5
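The rectangular layout can be pictured as a small Python structure: a list of records (rows), each holding one value per variable (column). The column names and values below are hypothetical and only illustrate the shape of the data.

```python
# Hypothetical records: names and values are invented for illustration.
# Each dictionary is one row (one respondent); each key is one variable.
records = [
    {"record": 1, "q1": 2, "comment": "More examples please"},
    {"record": 2, "q1": 1, "comment": "Very clear"},
    {"record": 3, "q1": 4, "comment": ""},
]

n_records = len(records)            # number of rows
variables = list(records[0].keys())  # the column names

# Analysing a variable means working down one column.
q1_values = [row["q1"] for row in records]

print(f"{n_records} records, variables: {variables}")
print(f"q1 column: {q1_values}")
```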
Qualitative variables • They are categorical • They may be nominal, (which implies there is no ordering) • Give some examples from the Tanzania survey • They may be ordered – as in the CAST survey • Give an ordered example from the Tanzania survey
Examples of analysis – Tanzania survey Question 6 • There are 3223 records, • but just take the 18 you can see in the figure • Count the values for Q0123 – head of household • There were 6 Females and 12 Males • So 2/3 of the 18 households had a male head • That’s about 67%, or roughly 70% • but percentages are a bit misleading with so few numbers • Now you give a similar summary for Q021 • type of agricultural household • And also Q3464 • how often did the household have food problems
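The head-of-household summary can be reproduced in a few lines of Python. This is a minimal sketch: the list below simply encodes the 6 Female and 12 Male counts stated on the slide (the order is arbitrary), and the variable names are illustrative.

```python
from collections import Counter

# 18 visible records for Q0123 (head of household): 6 Female, 12 Male.
# Only the counts come from the slide; the order of the codes is arbitrary.
head_of_household = ["F"] * 6 + ["M"] * 12

counts = Counter(head_of_household)
n = len(head_of_household)

# Whole-number percentages, because n is small.
percentages = {
    category: round(100 * count / n) for category, count in counts.items()
}

for category in sorted(counts):
    print(f"{category}: {counts[category]} of {n} ({percentages[category]}%)")
```

Reporting the count alongside the percentage ("12 of 18") keeps the small sample size visible to the reader.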
Add a simple chart • A simple chart can also be sketched • Here is one by Excel • But a sketch can be “by hand” • Excel will be used for these tasks from Session 3
Examples of analysis – CAST survey Question 7 • Do a similar analysis of the CAST survey • To make it quick • each group could initially process just one question • then report the results to the class • Include a hand drawn chart • Sketch a simple bar chart • and include the numbers on the chart • as shown earlier
Quantitative variables - Question 8 • They may be discrete (whole numbers) • Give examples from the climatic data • And the Tanzania survey • They may be (conceptually) continuous • Give examples from the data sets • Also they may be coded into (ordered) categories • Give an example from the Tanzania survey
Examples of analysis – Tanzania survey • An analysis of the 18 values in Q3462 • The number of times meat was eaten last week • minimum = 0 • maximum = 5 • adding the values: total = 31, • so the mean = 31/18 about 1.7 times per week • Note: the mean does not have to be an integer • just because the individual values are whole numbers • Repeat this analysis • for Q3463 – times fish eaten last week • and HHsize
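The same summaries can be computed in Python. Note the assumption: the 18 values below are hypothetical, invented so that they match the figures on the slide (minimum 0, maximum 5, total 31). They are not the real Q3462 values from the survey.

```python
# Hypothetical values: chosen so that n = 18, minimum = 0, maximum = 5
# and total = 31, as on the slide. They are NOT the real survey values.
meat_times = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 5]


def summarise(values):
    """Return basic summaries for a quantitative variable."""
    total = sum(values)
    n = len(values)
    return {
        "n": n,
        "minimum": min(values),
        "maximum": max(values),
        "total": total,
        "mean": total / n,  # need not be a whole number
    }


summary = summarise(meat_times)
print(summary)
print(f"mean = {summary['mean']:.1f} times per week")
```

The same function could be applied to the Q3463 and HHsize columns once their values are typed in.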
Data analysis • As the layout of the data is always the same • Once you know how to analyse one data set • You will have the principles to analyse them all • And we have just done one analysis! • You have seen that • The appropriate analysis depends on the type of data • So what are the principles • of analysing (summarising) data • of the different types?
The methods of analysis • How many? • are questions for qualitative variables • for example the CAST survey, the Tanzania survey • You used summaries • Like counts, or proportions or percentages • How large? • How variable? • are questions for quantitative variables • for example the climatic data or the Tanzania survey • We used summaries • Like averages, extremes and measures of spread
A toolkit for analysis • Different types of graph are also used • Qualitative data • “how many” • Quantitative data • how large • how variable
Statistics and variation • In the CAST survey - why not just ask one student? • In the climatic data - why not just use one year? • In the agriculture survey - why not just use one household? • Because there is variation between the responses • Remember this definition? • “Statistics is making decisions • when there is uncertainty.”
Variation is everywhere! • In the book “Statistics: A guide to the unknown” • “Variation is everywhere. • Individuals vary • Repeated measurements on the same individual vary • The science of statistics • provides tools for dealing with variation” • So statistics is concerned with making sense from data, when there is variation
Fighting the curse of variation • To do good statistics you must • tame variation • fight the curse of variation • You have 2 main strategies for overcoming variation • 1. Take enough observations • In the Tanzania survey there were 3223 households just from this one region • 2. Measure characteristics that explain variation • Variation itself is not necessarily the problem • Variation you do not understand is the problem
An example: explaining variation • Take the CAST survey • Add a new record for an imaginary student • Make it VERY DIFFERENT to the existing records • So if most students were positive about CAST • Then make this record very negative, etc • You have added variation • Now what could you (should you) have measured • to explain this variation?
What you could have measured • This little survey only asked about CAST • It did not ask about you, e.g. • male/female • experience • age • computer access • etc • These measurements could help • to understand the difference with this new student • The Tanzania survey also asked about • Education • Possessions, etc • Why – to be able to understand/explain variation
Analysis and variation together • For statistical analysis you have: • summarised columns of data • i.e. summarised individual variables • You did this for qualitative and quantitative variables • To fight the curse of variation • You take measurements • So you add to the rows of data • That helps you to explain the variation • That’s statistics for you! • You analyse the columns, i.e. the variables • And you understand variability by looking at the rows
Types of statistics • Wikipedia says roughly: • Statistical methods can be used to summarize • or describe a collection of data; • this is called descriptive statistics. • In addition, patterns in the data may be modelled • and then used to draw inferences about the process or population being studied; • this is called inferential statistics. • Both descriptive and inferential statistics • comprise applied statistics.
Descriptive and inferential statistics • We have just done descriptive statistics • We will only do descriptive statistics in this module • The sample in the Tanzania agricultural survey • was 3223 households • That’s just under 1% of the households in the region • See the column called WT – with values like 137 • So each observation “represents” 137 households • But with such a large sample • The inferences for the whole region • Will be quite precise • So most of what we need now is descriptive tools • In the later modules • we add ideas of inferential statistics
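The "just under 1%" figure can be checked with a little arithmetic. The sketch below assumes, for simplicity, that every record has a weight of 137; the slide only says the WT column has values like 137, so the real weights vary and the result is a rough guide only.

```python
sample_size = 3223    # households in the sample (from the slide)
typical_weight = 137  # a typical value of the WT column (from the slide)

# Assumption: every record carries the same weight. The real WT values vary,
# so this gives only an approximate size for the region.
approx_households = sample_size * typical_weight

# If one sampled household represents 137, the sampling fraction is 1/137.
sampling_fraction = 1 / typical_weight

print(f"Approximate households in the region: {approx_households}")
print(f"Sampling fraction: {100 * sampling_fraction:.2f}%")
```

Under this assumption the sample is about 0.73% of roughly 440,000 households, which agrees with "just under 1%".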
Glossary of statistical terms • Each subject becomes easier • when you understand the terms • A glossary is supplied • Called the SSC Statistical Glossary • It explains most of the terms • For the 3 levels of this course • So some terms may be new to you now • An example is on the next slide • You can print the glossary if you wish • But it is good to look on-line • Then all the terms in blue are links • So you can easily move about in the document
Example from the glossary • Descriptive statistics • If you have a large set of data, then descriptive statistics provides graphical (e.g. boxplots) and numerical (e.g. summary tables, means, quartiles) ways to make sense of the data. • The branch of statistics devoted to the exploration, summary and presentation of data is called descriptive statistics. • If you need to do more than descriptive summaries and presentations, it is usually to use the data to make inferences about some larger population. • Inferential statistics is the branch of statistics devoted to making generalizations.
Learning objectives Can you now: • Define statistics • Enter simple datasets once the data entry form is set up • Recognise the type of each variable in a dataset • Know some ways to summarise data of each main type • Explain how statistical investigations deal with variability • Differentiate between descriptive and inferential statistics