SAMPLING OUTLINE: • sampling and census • sampling surveys • sample frame and sample size • probability and non-probability sampling methods • census
SAMPLING AND CENSUS • methods for collecting data Sampling • any data collection that is not a controlled experiment, e.g. the percentage of greenhouse gases in the atmosphere above Winnipeg
SAMPLING AND CENSUS Census • a survey whose domain is the characteristics of an entire population • any study of an entire population of a particular set of ‘objects’, e.g. female polar bears in western Hudson Bay; human residents of Heidelberg; the number of Epacris impressa plants on a single hillside in Riding Mountain National Park
SAMPLING AND CENSUS • if we collect, analyse or study only some members of a population, then we are carrying out a survey • the aim is to make observations at a limited number of carefully chosen locations that are representative of a distribution • we use the sample to predict the overall character of the population – accuracy will depend on the quality of the sample
SAMPLING SURVEYS • carried out for several reasons: • they cost less than a census of the equivalent population • they can be designed to answer specific questions • a sample survey will usually offer greater scope than a census (larger geographical area, greater variety of questions)
SAMPLING SURVEYS Developing a sample survey: • state the objectives of the survey • define the target population • define the data to be collected • define the required precision and accuracy • define the measurement ‘instrument’ • define the sample frame, sample size and sampling method, then select the sample
SAMPLING SURVEYS • generating a sample requires several critical decisions: • sample frame • sample size • sampling method • an error in any of these will compromise the entire survey
SAMPLE FRAME • if the frame is wrongly defined, the sample may not be representative of the target population • a frame might be ‘wrong’ in three ways: • it contains too many individuals (membership is under-defined) • it contains too few individuals (membership is over-defined) • it contains the wrong set of individuals (membership is ill-defined)
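The three ways a frame can be ‘wrong’ can be illustrated by comparing the frame with the target population as sets. The sketch below is hypothetical (the function name and the string unit IDs are illustrative assumptions); it reports which of the three cases applies.

```python
def diagnose_frame(frame: set[str], target: set[str]) -> str:
    """Classify a sample frame against the target population (illustrative only)."""
    extra = frame - target    # units listed in the frame that are not in the target population
    missing = target - frame  # target units that the frame fails to list
    if extra and missing:
        return "ill-defined"    # wrong set of individuals
    if extra:
        return "under-defined"  # too many individuals
    if missing:
        return "over-defined"   # too few individuals
    return "correct"

print(diagnose_frame({"a", "b", "c", "d"}, {"a", "b", "c"}))  # under-defined
```

In practice the true target population is rarely known as a complete list, which is exactly why frame errors are hard to detect; the set comparison only makes the definitions concrete.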
SAMPLE FRAME Two-stage process: • divide the target population into sampling units, e.g. households, trees, light bulbs, soil samples, cities, individuals • create a finite list of the sampling units that make up the target population, e.g. names, addresses, identity numbers, numbered 50 mL sample bottles
SAMPLING UNITS • a member of a sample or sample frame • in geomatics: points, lines (transects) and areas (quadrats), e.g. measuring snow depth at 10 cm intervals along a 10 m line; measuring all features that fall within 10 m of a line
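As a small sketch of line (transect) sampling, the hypothetical helper below lists the distances along a 10 m line at which snow depth would be measured at 10 cm intervals. It assumes measurements are taken at both ends of the line, which is a choice rather than something the slide specifies.

```python
def transect_points(length_m: float, interval_m: float) -> list[float]:
    """Return the distances (m) along a transect at which to take measurements."""
    n_intervals = round(length_m / interval_m)  # number of gaps between sample points
    # Multiply rather than accumulate, so floating-point error does not build up
    return [round(i * interval_m, 6) for i in range(n_intervals + 1)]

points = transect_points(10.0, 0.1)
print(len(points))  # 101 points, from 0.0 m to 10.0 m inclusive
```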
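SAMPLE SIZE • quantity is no substitute for quality: a large but poorly chosen sample is not better than a smaller, well-chosen one • in statistics, a common rule of thumb is that a sample size of 30 or greater is adequate • in geomatics, the appropriate sample size is directly related to a distribution’s variability: the more variable the distribution, the larger the sample needed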
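One standard way to see why sample size grows with variability is the textbook formula for the sample size needed to estimate a mean to within a margin of error E, n = (zσ/E)². This formula is swapped in here for illustration; the slide does not state it. The sketch assumes simple random sampling and a known (or pilot-estimated) standard deviation σ, and `sample_size_for_mean` is a hypothetical name.

```python
import math

def sample_size_for_mean(sigma: float, margin: float, z: float = 1.96) -> int:
    """Minimum n to estimate a mean within +/- margin (z = 1.96 for ~95% confidence).

    Assumes simple random sampling and a known or pilot-estimated standard deviation sigma.
    """
    return math.ceil((z * sigma / margin) ** 2)

print(sample_size_for_mean(sigma=10, margin=2))  # 97
print(sample_size_for_mean(sigma=20, margin=2))  # 385
```

Doubling the standard deviation roughly quadruples the required sample size, which is the sense in which sample size is tied to a distribution’s variability.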
SAMPLING METHOD • the aim is to obtain a sample that is representative of the target population • when selecting a sampling method, we need some minimal prior knowledge of the target population • how we actually decide which sampling units will be chosen makes up the sampling method
SAMPLING METHOD • most sampling methods attempt to select units such that each has a definable probability of being chosen – probability sampling methods • alternatively, we can ignore the probability of selection and choose samples on some other criterion – non-probability sampling methods
NON-PROBABILITY SAMPLING • units that make up the sample are collected with no specific probability structure in mind, e.g. units are self-selected; units are the most easily accessible; units are selected on economic grounds; units are considered to be typical of the population; units are chosen without an obvious design
NON-PROBABILITY SAMPLING • considered inferior to probability methods – there is no statistical basis upon which the success of the sampling method can be evaluated • may be unavoidable, but should be regarded as a ‘last resort’ when designing a sampling scheme
PROBABILITY SAMPLING • based on selecting the sampling units that make up the sample by defining the chance that each unit in the sample frame will be included, e.g. we have 100 students and need 10 to fill out a survey, so each student has a 1 in 10 chance of being selected (probability of selection is 10/100 = 0.1)
PROBABILITY SAMPLING • each time we apply the same method to the same frame, we will generally generate a different sample • we are concerned with the probability of each sample being chosen, rather than with the probability of choosing individual units • there are a number of probability sampling strategies
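The student example can be checked by simulation. The sketch below (with the hypothetical helper `inclusion_rate`) repeatedly draws 10 of 100 students at random without replacement: each draw is a different sample, yet any one student turns up in roughly 10/100 = 0.1 of them. It also prints the probability of drawing any one particular sample, since there are C(100, 10) equally likely samples when units are drawn this way.

```python
import math
import random

def inclusion_rate(population_size: int, n: int, unit: int, trials: int, seed: int = 0) -> float:
    """Estimate how often one unit is included across repeated random samples."""
    rng = random.Random(seed)
    frame = range(1, population_size + 1)
    hits = 0
    for _ in range(trials):
        sample = rng.sample(frame, n)  # a different sample each time the method is applied
        if unit in sample:
            hits += 1
    return hits / trials

# 100 students, 10 chosen: the rate should be close to 10 / 100 = 0.1
rate = inclusion_rate(population_size=100, n=10, unit=42, trials=20_000)
print(rate)

# Each of the C(100, 10) possible samples is equally likely when drawn this way
print(1 / math.comb(100, 10))
```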
PROBABILITY SAMPLING Simple random sampling • the simplest method • select n units such that every one of the possible samples of size n has an equal chance of being chosen • generate the sample by selecting from the sample frame by any method that guarantees that each sampling unit has the same specified probability of being included • the mechanism used to do the sampling is of no significance (e.g. random number tables, dice, …)
PROBABILITY SAMPLING Simple random sampling
PROBABILITY SAMPLING e.g. a sample frame of 14 units: 94407382 94409687 93535459 94552345 94768091 93732085 94556321 94562119 93763450 94127845 94675420 94562119 93763450 94127845 Use a random number table to generate six random numbers between 1 and 14: 4, 6, 7, 9, 11, 13. The units in those positions form the sample.
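A pseudo-random number generator can stand in for the random number table. In this sketch the hypothetical `simple_random_sample` draws six distinct positions between 1 and 14 from the example frame; Python’s `random.sample` draws without replacement, so every set of six positions is equally likely. A seeded generator will not reproduce the table’s 4, 6, 7, 9, 11, 13, but gives an equally valid sample.

```python
import random
from typing import Optional

# Sample frame of 14 units from the example above (some values repeat in the source list)
frame = [94407382, 94409687, 93535459, 94552345, 94768091, 93732085, 94556321,
         94562119, 93763450, 94127845, 94675420, 94562119, 93763450, 94127845]

def simple_random_sample(size: int, n: int, seed: Optional[int] = None) -> list[int]:
    """Choose n distinct 1-based positions from a frame of `size` units.

    random.sample draws without replacement, so every set of n positions is equally likely.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, size + 1), n))

positions = simple_random_sample(len(frame), 6, seed=1)
sample = [frame[p - 1] for p in positions]  # look up the units at the chosen positions
print(positions)
print(sample)
```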
PROBABILITY SAMPLING Stratified Sampling • used when you suspect the target population actually consists of a series of separate ‘sub-populations’ • stratification is the process of splitting the population, and hence the sample, to take account of these possible sub-populations • in stratified sampling, the total population is first divided into a set of mutually exclusive sub-populations (strata) • the samples drawn from the sub-populations may be of equal size, or sized according to the sub-populations’ relative sizes
PROBABILITY SAMPLING Stratified Sampling • within each stratum, select a sample, usually ensuring that the probability of selection is the same for each unit in each sub-population – a stratified random sample, e.g. national polls and rating surveys
PROBABILITY SAMPLING e.g. 94407382 94409687 94535459 94552345 94768091 94732085 94556321 93562119 93763450 93127845 93675420 93562119 93763450 93127845 First split the population into sub-populations (based on the second digit in this example: 4 or 3, giving two strata of seven units each). Then sample from these sub-populations (three units from each, using a random number table: positions 1, 2, 5)
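A sketch of the same two steps, assuming (as the example implies) that positions 1, 2, 5 are counted within each stratum in the original frame order. The helpers `stratify` and `stratified_random_sample` are hypothetical names; the second function draws a fresh random sample of equal size from every stratum instead of using the table.

```python
import random
from collections import defaultdict
from typing import Callable, Optional

frame = [94407382, 94409687, 94535459, 94552345, 94768091, 94732085, 94556321,
         93562119, 93763450, 93127845, 93675420, 93562119, 93763450, 93127845]

def stratify(units: list[int], key: Callable[[int], str]) -> dict[str, list[int]]:
    """Split units into mutually exclusive strata, keeping frame order within each stratum."""
    strata: dict[str, list[int]] = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)
    return dict(strata)

def stratified_random_sample(strata: dict[str, list[int]], n_per_stratum: int,
                             seed: Optional[int] = None) -> dict[str, list[int]]:
    """Draw a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

def second_digit(unit: int) -> str:
    return str(unit)[1]  # the stratification variable used in the example

strata = stratify(frame, second_digit)  # strata '4' and '3', seven units each

# Reproduce the table-based choice: positions 1, 2 and 5 within each stratum
table_choice = {name: [units[p - 1] for p in (1, 2, 5)] for name, units in strata.items()}

# Or draw a fresh stratified random sample of three units per stratum
sample = stratified_random_sample(strata, 3, seed=7)
print(table_choice)
print(sample)
```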
PROBABILITY SAMPLING Systematic Sampling • decide the sample size from the population size; the population has to be ordered in some way, e.g. points along a river, simple numerical order • simpler in design and easier to administer
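PROBABILITY SAMPLING Systematic Sampling • choose a starting point along the sequence by selecting the rth unit from one end of the sequence • then take the rest of the sample by repeatedly adding a fixed sampling interval k to r (units r, r + k, r + 2k, …), where k is usually the population size divided by the sample size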
PROBABILITY SAMPLING e.g. 94407382 94409687 94535459 94552345 94768091 94732085 94556321 93562119 93763450 93127845 93675420 93562119 93763450 93127845 First order the sampling units (in this case in decreasing numerical order). Next, select the starting point (r value): 2. Then take every third unit after this (positions 2, 5, 8, 11, 14)
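A minimal sketch of the systematic procedure on the example frame, using the hypothetical helper `systematic_sample`. It fixes r = 2 and k = 3 to match the example; in a real survey r would normally be drawn at random between 1 and k.

```python
frame = [94407382, 94409687, 94535459, 94552345, 94768091, 94732085, 94556321,
         93562119, 93763450, 93127845, 93675420, 93562119, 93763450, 93127845]

def systematic_sample(ordered: list[int], r: int, k: int) -> list[int]:
    """Take the r-th unit (1-based) of an ordered frame and then every k-th unit after it."""
    if not 1 <= r <= k:
        raise ValueError("the start r should lie between 1 and the interval k")
    return ordered[r - 1::k]

ordered = sorted(frame, reverse=True)          # order the units (decreasing, as in the example)
sample = systematic_sample(ordered, r=2, k=3)  # positions 2, 5, 8, 11, 14
print(sample)  # [94732085, 94535459, 93763450, 93562119, 93127845]
```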
CENSUS • the aim is to identify and record all members of a population • most countries routinely carry out a census of their population, e.g. Canada performs a census every 5 years (1981, 1986, 1991, 1996, 2001) • originally its function was to enumerate the population for electoral purposes, but a census now encompasses a large range of information about national populations
CENSUS • collects important information about the social and economic situation of people living in an area • Population Counts • Age, Sex, Marital Status, Families (number, type and structure) • Structural Type of Dwelling and Household Size • Immigration and Citizenship, Education, Mobility, Migration • Mother Tongue, Home Language and official/Non-Official Languages • Ethnic Origin and Population Group (visible minorities) • Labor Market Activities, Household Activity, Place of Work and Mode of Transportation • Sources of Income, Total Income and Family and Household Income • Families: Social and Economic Characteristics, Occupied Dwellings and Household Costs
CENSUS Disadvantages of a census: • time consuming – requires years of planning • laborious – requires thousands of workers/volunteers • costly – millions of dollars to survey everyone
CENSUS Errors in census data: • people respond dishonestly due to a lack of confidence in confidentiality • a full accounting of residences is difficult to document (e.g. the homeless) • substandard people may be recruited to conduct the survey
CENSUS REGIONS • a census consists of “enumeration” data • counts are tabulated or ‘aggregated’ by geographic area • census regions/enumeration areas are not distributed uniformly and vary in shape, size and orientation • Canada is divided into 51,500 enumeration areas • census regions are defined by political boundaries and by natural and cultural landmarks
CENSUS REGIONS Enumeration Area (EA) • smallest reported census area • canvassed by one census representative • 125–440 dwellings, depending on whether the area is rural or urban Census Tract (CT) • represents urban or rural communities in CMAs and CAs • population ranges between 2,500 and 8,000 Census Subdivision (CSD) • term applied to municipalities or their equivalent
CENSUS REGIONS Census Division (CD) • areas intermediate between the municipality (CSD) and province level • represent counties, regional districts, regional municipalities Census Metropolitan Area/Census Agglomeration (CMA/CA) • a CMA or CA is a large urban core together with adjacent integrated urban and rural areas • urban core population >100,000 for a CMA, >10,000 for a CA • a CMA may be combined with adjacent CAs to form a ‘consolidated CMA’ Federal Electoral Districts (FED) • area entitled to elect a representative member to the House of Commons
CENSUS REGIONS • census information is aggregated within the boundaries of the data collection regions to: • reduce costs • protect confidentiality • GIS concerns: • census region totals are more abstract and spatially inaccurate • they mask the true nature of the population distribution
REPORTING METHOD • aggregated data are reported as census region totals – the data presentation is a count by region • census totals are also reported at region centroids: • center of area – the balance point of the census region shape • center of population – found by averaging the x and y coordinates of the individual members of the population
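The sketch below makes the two reporting points concrete under simplifying assumptions: regions are simple (non-self-intersecting) polygons, individuals are (x, y) points in the same planar coordinates, and all names and data are hypothetical. The center of area uses the standard shoelace centroid formula; the center of population is the mean of the individual coordinates; region totals are a count by region ID.

```python
from collections import Counter

def count_by_region(region_ids: list[str]) -> Counter:
    """Aggregate individual records into census-region totals (a count by region)."""
    return Counter(region_ids)

def area_centroid(vertices: list[tuple[float, float]]) -> tuple[float, float]:
    """Center of area of a simple polygon (shoelace formula); vertices in either winding order."""
    a = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        a += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    a *= 0.5  # signed area; its sign cancels in the divisions below
    return cx / (6 * a), cy / (6 * a)

def population_centre(people: list[tuple[float, float]]) -> tuple[float, float]:
    """Center of population: mean x and mean y of the individuals' locations."""
    xs, ys = zip(*people)
    return sum(xs) / len(xs), sum(ys) / len(ys)

region = [(0, 0), (4, 0), (4, 2), (0, 2)]                  # a hypothetical 4 x 2 region
people = [(0.5, 0.5), (1.0, 0.5), (0.5, 1.5), (1.0, 1.0)]  # clustered in one corner
totals = count_by_region(["EA1", "EA1", "EA2"])

print(area_centroid(region))      # (2.0, 1.0)
print(population_centre(people))  # (0.75, 0.875)
print(totals)
```

When the population is clustered, the two centres can lie far apart, which is one way the region totals and centroids mask the true population distribution.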
CENSUS AND GIS • the census represents a very important source of data for GIS because: • it provides data of use in many areas of human geography: social, economic, political • the census goes back to Confederation, so historical analyses can be performed • the census provides data in a large variety of readily mapped spatial zones (e.g. CMA, county)