Data Mining & Business Planning of Engineering/Research Projects
Presentation 4
Dr. Gábor Pauler, Associate Professor
Faculty of Sciences, University of Pécs
Tel: 30/9015-488
E-mail: pauler@t-online.hu
Content of the Presentation
• Primary quantitative research: Survey
  • Definition, Purpose, Methods, Problems
  • Sampling Plan
    • Practical examples
      • Miracle Roof Ltd.
      • CarSculpturers
    • Computing Gross and Net Sample Size, Sampling Rate
  • Format design of Questionnaires
    • Types of questions
      • Open question
      • Closed question
        • Single-choice question
        • Multi-choice question
      • Cardinal question
    • Question matching
      • Field names method
      • Question numbering method
    • Conditions of valid response
      • Understanding the question
      • Knowing the answer
      • Remembering the information
      • Thoughtful filling
      • Willingness to answer
    • Page break rules
    • Technical question sequence
    • Auxiliary parts of mail-in survey
• References
Survey: Definition, Purpose, Methods 1
• Besides the split into qualitative and quantitative methodology, Marketing Research tools can also be classified by the origin of the data:
  • Primary Research (Elsődleges Kutatás): data collection is designed by ourselves: more exact, more expensive
  • Secondary Research (Másodlagos Kutatás): data collection was designed by somebody else: less exact for our purpose, less expensive
• Survey (Kérdőíves lekérdezés) is the most important quantitative primary research tool:
  • Based on a Sampling Plan (Mintavételi terv), we query the Target Market (Célpiac) defined in the Marketing Mix as the Base population (Alapsokaság), taking a Representative Sample without replacement (Reprezentatív, nem visszatevéses minta): the sample is proportional to the base population in terms of the most important Socio-Demographic features, and everybody can be queried only once
  • This is important because only from this kind of sample can we Infer (Következtet) the behavior of the base population, which is the main purpose of a Survey
• Why do we ask a sample even if we could make a Full query (Teljes lekérdezés)?
  • Querying everybody would take too much time and money
  • A full query would not give much more exact information than a 10% sample, because each further respondent gives less and less new information: this is called the Diminishing marginal return of information (Az információ csökkenő határhaszna)
  • A typical industrial market survey in Hungary (population 10 million) has 1000 respondents (1/10000)
Survey: Definition, Purpose, Methods 2
• In the sample, we ask the Observed Objects/Cases (Megfigyelt egyedek) to fill in a Questionnaire (Kérdőív) with standardized format and content:
  • Content design (Tartalmi tervezés) of the questionnaire is based on product idea Brainstorming, secondary data collected about competitors, and results of the Focus Group: there can be Obligatory (Kötelező) questions for everybody or Optional (Opcionális) questions depending on conditions
  • Format design (Formai tervezés) of the questionnaire depends on the Survey medium (Lekérdezési eszköz) selected. There is no single best survey medium; each has its own advantages and disadvantages
• Answers given to the questions form the Variables/Fields (Változó/Mező) of a Statistical/Database Table (Statisztikai/Adatbázis tábla):
  • With Electronic Forms (Elektronikus űrlap), Recording (Adatrögzítés) and Verification (Adatellenőrzés) are automatic
  • With a Paper-based Survey (Papír-alapú lekérdezés), they may still be manual
Problems of questionnaire design
• The purpose of questionnaire design is to collect the maximum quantity and quality of information at minimal cost. Several problems stand in the way:
  • A questionnaire can be designed by the boss's secretary on a piece of paper in 20 minutes: „Nusika, honey, write a couple of questions!” (typical Hungarian method)
  • Alternatively, it can be designed by well-paid professionals over two weeks before it goes out to survey
  • The problem is that in the first case it won't work, and the HUF 2.5-3 million cost of a 1000-respondent survey is wasted in 20 minutes!
• A questionnaire is not a flimsy set of questions written on a piece of paper, but a Form (Űrlap), which is a Graphic User Interface (Grafikus felhasználói felület) of database table(s): even if it is paper-based, it must comply with strict data consistency requirements, otherwise the database will not work at all!
• Therefore it is important to involve database and statistics professionals in its design besides psychologists and sociologists: it is almost mission impossible for data miners to figure out from a bad survey's data what the respondents' original opinion was
• An example of such an improvised questionnaire:
  1. Do you like when it is hot and red? yes/no
  2. Size does matter for you also, or it can be small? yes/no
  3. What type you would like? ___
  4. Wouldn't you rather buy Fiat 500 thats so cutie? yes/no
• Let's see what is necessary for good design.
Sampling Plan: Practical example: Miracle Roof
• As already mentioned in Presentation 3, the demand estimation necessary for quantitative business planning should be based on a Representative Sample (Reprezentatív minta) of prospective customers identified by brainstorming and focus group. This can be ensured by a correct Sampling Plan (Mintavételi terv)
• E.g. in the Miracle Roof Case Study, the target market is identified as roof spaces above 100 m2 in area, owned by public or private real estate companies (Ingatlancég) with capitalization between EUR 18M and 30M
• The aim of the sampling plan is to create a sample in which the most important properties of the respondents (Válaszadók) are representative of the base population:
  • First, we research specialized databases of the given area (e.g. Hungarian Chamber of Architects (Magyar Építészkamara), http://www.mek.hu) to get the total number of prospective customers in the target area (e.g. 10,175 buildings) and their frequency distribution (Gyakorisági eloszlás) by their most important properties: Ownership: Public|Private (Tulajdonlás: önkormányzati|magán); Capitalization, M EUR (Alaptőke, Millió EUR); Roof area, m2 (Tető alapterület, m2)
  • Then we define a filter describing our target market (e.g. EUR 18M ≤ Capital ≤ EUR 30M AND 100 m2 ≤ Area)
  • Then we sum up the number of prospective customers complying with the filter: this is the Target Market Size (Célpiacméret) (e.g. 504 buildings)
  • Then we compute the Frequencies of property values inside the filter (Szűrőn belüli arányok); for each property they sum up to 100%
  • Then we compute the Cross-products (Descartes-szorzat) of the frequencies of property values to get the quotas (Kvóták) that should be kept in the sample (e.g. 49% (Public) × 35.7% (EUR 21M-25M) × 42% (500-750 m2) ≈ 7.3% of the sample)
  • To reach at least a 10% sampling rate (Mintavételi arány), interviewers should collect at least 504 × 0.1 ≈ 51 valid responses from house owners (which means the cost of several times more contacts)!
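The quota step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three shares 49%, 35.7% and 42% come from the slide, while every other category name and share below is a made-up placeholder chosen so that each property sums to 100%. Multiplying the shares also silently assumes that the properties are independent of each other inside the filter.

```python
from itertools import product
from math import prod

# Relative frequencies inside the filter, per property.
# Only Public=0.49, EUR21M-25M=0.357 and 500-750m2=0.42 are from the slide;
# all other categories and shares are hypothetical placeholders.
shares = {
    "Ownership": {"Public": 0.49, "Private": 0.51},
    "Capital": {"EUR18M-20M": 0.30, "EUR21M-25M": 0.357, "EUR26M-30M": 0.343},
    "RoofArea": {"100-500m2": 0.33, "500-750m2": 0.42, "750m2+": 0.25},
}

def quotas(shares):
    """Return {combination of category labels: quota} as the cross-product of shares."""
    properties = list(shares)
    result = {}
    for combo in product(*(shares[p].items() for p in properties)):
        labels = tuple(label for label, _ in combo)
        result[labels] = prod(share for _, share in combo)
    return result

quota_table = quotas(shares)
cell = quota_table[("Public", "EUR21M-25M", "500-750m2")]
print(f"{cell:.1%}")  # share of the sample reserved for this combination

# Number of valid responses this cell should get in a net sample of 51
net_sample_size = 51
print(round(net_sample_size * cell))
```

Because each property's shares sum to 100%, the quotas of all combinations together also sum to 100%, which is a quick sanity check on the input table.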
Sampling Plan: Practical example: CarSculpturers Inc.
• They defined the Target Market (Célpiac) of their products (see Practice 3): „Hungarian, 18-30 years old, min. secondary educated males/females”
• From the Hungarian Bureau of Statistics (Központi Statisztikai Hivatal, http://www.ksh.hu) they queried the full population of the target area (10,175,000) and the distribution of the basic Socio-Demographic (Szocio-demográfiai) features: Gender (Nem), Age (Kor), Education (Képzés)
• Then they set up a Filter (Szűrőfeltétel) stating which combinations of feature Values (Értékek) are in the target market (e.g. 18 ≤ Age ≤ 30 AND Educ ≠ Primary)
• Then they determined the Target Market Size (Célpiac Méret) as the number of people falling into the filter from the database table (example SQL code):
  SELECT SUM(Nepesseg) FROM KSHTable WHERE Age BETWEEN 18 AND 30 AND Educ <> 'Primary'; (4.1)
  Result: 504000
• Then they computed the Partial Relative Frequencies (Változónkénti szűrőn belüli relatív gyakoriságok) of values within the filter (their sum is 100% for every feature)
• Then they combined all selected values of the features, and as the Cross-Product (Keresztszorzat) of their relative frequencies they computed the Quotas (Kvóta):
  49% (Male) × 35.7% (21-25 years) × 42% (Mature) ≈ 7.3% (4.2)
• Quota proportions should be kept in the sample! (Respondents over quota are put into reserve data storage: if a response turns out to be very inconsistent, it can be replaced with one from the reserve)
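To try the target market query without a real statistical database, a minimal sketch with Python's built-in sqlite3 module could look as follows. The table and field names KSHTable, Age, Educ and Nepesseg follow the slide; the Gender column layout and all rows are invented sample data, so the resulting numbers have nothing to do with the real 504000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE KSHTable (Gender TEXT, Age INTEGER, Educ TEXT, Nepesseg INTEGER)"
)
# Hypothetical population cells: (gender, age, education, number of people)
rows = [
    ("Male", 17, "Primary", 60000),
    ("Male", 22, "Secondary", 120000),
    ("Female", 25, "College", 90000),
    ("Female", 30, "Primary", 40000),
    ("Male", 45, "Secondary", 200000),
]
conn.executemany("INSERT INTO KSHTable VALUES (?, ?, ?, ?)", rows)

FILTER = "Age BETWEEN 18 AND 30 AND Educ <> 'Primary'"

# Target market size: number of people falling into the filter
(target_market_size,) = conn.execute(
    f"SELECT SUM(Nepesseg) FROM KSHTable WHERE {FILTER}"
).fetchone()
print(target_market_size)

# Partial relative frequencies of one feature (Gender) within the filter
gender_shares = dict(
    conn.execute(
        f"SELECT Gender, SUM(Nepesseg) * 1.0 / ? FROM KSHTable "
        f"WHERE {FILTER} GROUP BY Gender",
        (target_market_size,),
    )
)
print(gender_shares)
```

The same GROUP BY query repeated for Age bands and Educ would give the remaining relative frequencies needed for the quota cross-product.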
Sampling Plan: Gross and Net Sample Size, Sampling Rate
• As quotas are only relative measures, in the next step we have to determine the Net Sample Size (Nettó mintaméret): the minimal number of valid responses necessary. There are three methods to determine it:
  • Simple minimum criteria: in most statistical methods the number of observations should be at least 5-10 times the number of variables used (e.g. if you analyze 13 variables together, you need 13 × 10 = 130 observations)
  • Statistical methods setting up Critical values (Hibahatár): these methods are more exact but require preliminary sample data from the targeted base population, which is usually not available or too expensive in most market research surveys for small businesses
  • Rules of thumb (Hüvelykujjszabályok): the standard market research sample size in Hungary (population 10M) is 1000 (1/10000 rate)
• Gross Sample Size (Bruttó mintaméret): the number of Respondent Contacts (Válaszadói kontaktus), each with a Unit Cost (Egységköltség), necessary to collect the valid responses of the Net Sample Size. Gross is much more than Net because of severe losses, which make the survey an even more expensive tool:
  • Refused responses (Válaszmegtagadás): depending on the survey medium, the response ratio can be very low
  • Invalid responses (Érvénytelen válaszok): even if there is a response, it can be so inconsistent and carelessly filled that it is unusable
  Gross Sample Size = Net Sample Size × (1 / Response ratio) × (1 / Correct filling ratio) (4.3)
  Total Cost of Survey, $ = Fixed Cost of Infrastructure, $ + Unit Cost, $/unit × Gross Sample Size (4.4)
• You should also record the Sampling Rate (Mintavételi arány) for further use:
  Sampling rate, % = Net Sample Size / Size of Target Market (4.5)
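Formulas (4.3)-(4.5) can be checked with a short Python sketch. The net sample size 51 and target market size 504 are taken from the Miracle Roof example; the response ratio, correct filling ratio and all cost figures are hypothetical values chosen only to show the arithmetic, and rounding the gross size up to a whole contact is an assumption of this sketch.

```python
import math

def gross_sample_size(net_size, response_ratio, correct_filling_ratio):
    """Formula (4.3); ratios are fractions in (0, 1]. Rounded up to whole contacts."""
    return math.ceil(net_size / (response_ratio * correct_filling_ratio))

def total_survey_cost(fixed_cost, unit_cost, gross_size):
    """Formula (4.4): fixed infrastructure cost plus unit cost per contact."""
    return fixed_cost + unit_cost * gross_size

def sampling_rate(net_size, target_market_size):
    """Formula (4.5), returned as a fraction."""
    return net_size / target_market_size

net = 51                                  # from the Miracle Roof example
gross = gross_sample_size(net, 0.30, 0.90)  # hypothetical: 30% respond, 90% fill correctly
cost = total_survey_cost(fixed_cost=500, unit_cost=4, gross_size=gross)  # hypothetical $
rate = sampling_rate(net, 504)

print(gross, cost, f"{rate:.1%}")
```

With these assumed ratios, 51 valid responses require 189 contacts, which illustrates why the gross size, not the net size, drives the budget.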
Format Design of Questionnaires: Types of questions 1
• To show the different question types, we will use the CarSculpturers Questionnaire as an example, with the underlying database CarSculpturersDatabase.mdb
• Open Question (Nyílt kérdés): the respondent can answer anything
  • Example: „If you buy a car, which brand and type do you select?: _______”
  • On a paper-based questionnaire: it is marked with underscore characters: _____
  • On electronic forms: it is a TextBox (Szövegdoboz) control
  • In the database: it is a Text/String type field of 128 or 256 characters length
  • Scale type: Nominal (Nominális): matching values can be counted (#), but we cannot sort them meaningfully (↓↑) or compute their difference (±) or ratio (/)
• We should minimize open questions, as they are laborious to process:
  • Someone has to read through all answers given in the database table
  • Then create groups from very similar answers: e.g. Fiat Stilo, Audi
  • Then correct misspellings, spaces and capitals to enable automatic processing: e.g. Fiat stilo, fiat stiol, FiatStilo, fiat stilo → Fiat Stilo
  • Even after this you only have a less informative nominal-scaled field!
• There are 2 exceptions, where you should definitely use an open question:
  • Motivational (Motiváló) question: if the questionnaire is lengthy and filling it is exhausting, or we ask for confidential data, you can motivate the respondent with: „Please, write your own ideas about it:_____” But do not process this! (In most cases they just repeat ideas already there)
  • Extra Alternative (Extra Alternatíva) for a Closed Question (Zárt kérdés): if you are not fully familiar with the topic at questionnaire design time (e.g. latest alternative music styles), and even the focus group could not identify all important alternative answers for a question, then you can add an extra „Other:____” alternative as an open question. Be careful: in the database it will form a string field separate from the numeric fields of the other answers!
    37. What is your favourite music? (You can check more) 1□Jazz 2□Techno 3□Blues 4□Funky 5□Other:___________
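The cleanup of open answers described above can be partly automated. The following Python sketch uses the standard difflib module to map misspelled answers onto a hand-made list of canonical values; the canonical list, the 0.8 similarity cutoff and the helper names are assumptions for illustration, and someone still has to review answers that match nothing.

```python
import re
from difflib import get_close_matches

# Hand-made list of canonical answers (the groups created by the human reader)
CANONICAL = ["Fiat Stilo", "Audi"]

def key(text):
    """Normalize an answer: lower-case, drop spaces and punctuation.
    Note: this simple pattern also drops accented letters."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

KEYS = {key(name): name for name in CANONICAL}

def canonicalize(answer, cutoff=0.8):
    """Return the canonical answer closest to `answer`, or None if nothing is similar."""
    match = get_close_matches(key(answer), list(KEYS), n=1, cutoff=cutoff)
    return KEYS[match[0]] if match else None

for raw in ["Fiat stilo", "fiat stiol", "FiatStilo", "fiat  stilo", "Trabant"]:
    print(repr(raw), "->", canonicalize(raw))
```

Even after such processing the result is still only a nominal-scaled field, so this does not change the advice to keep open questions to a minimum.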
Format Design: Types of questions: Closed questions 1
• Closed/Alternative Question (Zárt/Eldöntendő kérdés): the respondent can select from a finite number of Alternatives (Alternatíva) identified by numeric codes 1..n
• Single Response (1×-es választós) question: only 1 alternative can be chosen:
  • Example: „How important is a powerful engine? (Please check one)” 1○ Non-important 2○ Slightly 3○ Moderate 4○ Rather 5○ Fairly 6○ Absolute
  • On paper: numbered rounded boxes with Value Labels (Értékcímke)
  • On a form: it is a group of Radio Button (Rádiógomb) controls: if one is turned on, the others are turned off. E.g. in Visual Basic the group is identified by the .Group property of the controls; in Borland Delphi they should be put in the same invisible Radio Button Group (Rádiógomb-Csoport) frame
  • In the database: it is an Integer type field: each linked radio button in the group writes its own numeric code there when it is switched on
  • Numeric codes and their value labels are stored in a separate small Code/Lookup/Master Table (Kód/Kinézegető/Törzs Tábla)
  • Scale type: most of the time it is Ordinal (Ordinális): values can be counted (#) and sorted (↓↑), but their difference (±) or ratio (/) does not make any sense, even though they are numeric values: e.g. 6: Absolute important − 4: Rather important ≠ 2: Slightly important; such arithmetic is madness!
• Always code and list alternatives in strictly ascending order, e.g. Residence: 1○ Village, 2○ Township, 3○ Town, 4○ City. This way you can compute their average for a group (e.g. 3.2 means living mostly in a town), although it is only a rough estimation (the correct solution would be to average the number of residents of their hometowns, but most of the time respondents cannot give that)
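A small Python sketch of how an integer-coded single-response field and its lookup table might be summarized. The value labels follow the engine-power example on the slide; the list of responses is invented. Since the scale is ordinal, the sketch reports frequencies, the mode and the median, which only rely on counting and sorting, instead of differences or ratios.

```python
from collections import Counter
from statistics import median_low

# Code/lookup table: numeric code -> value label (labels as on the slide)
LABELS = {
    1: "Non-important",
    2: "Slightly",
    3: "Moderate",
    4: "Rather",
    5: "Fairly",
    6: "Absolute",
}

# Hypothetical contents of the integer field, one code per respondent
responses = [5, 6, 4, 5, 2, 5, 3]

frequencies = Counter(responses)                  # counting (#) is always allowed
mode_code, mode_count = frequencies.most_common(1)[0]
median_code = median_low(responses)               # sorting (↓↑) is allowed on ordinal data

for code in sorted(LABELS):
    print(code, LABELS[code], frequencies.get(code, 0))
print("mode:", LABELS[mode_code], "median:", LABELS[median_code])
```

median_low is used rather than the plain median so that the result is always one of the existing codes, even with an even number of respondents.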
Format Design: Types of questions: Closed questions 2
• If you ask for a degree of dis/agreement or an evaluation of something, always use an Even Numbered Scale (Páros Fokszámú Skála) (1..4, 1..6) to avoid the Centering Effect (Középre húzási effektus): as people dislike expressing radical opinions, if the scale has a middle value, a disproportionately high number of responses will be concentrated there, distorting all later statistical analysis. Example of a scale with a middle value:
  81. How much do you like 0% Customer Loans? (Please check one) 1○ I hate them 2○ Very dislike 3○ It depends on 4○ Very like 5○ I'm absolute fan
• There is only one exception to the even-numbered-scale rule: if we ask a large number of evaluation questions (e.g. 10 competing products × 7 features = 70 fields!), there is a big danger that the respondent finds it difficult, gets bored and skips it. It is much easier to fill in if we use the well-known School Grades (Iskolai Osztályzatok). Be aware that they depend on the country (e.g. Hungary: 1: bad..5: good, Germany: 5: bad..1: good, USA: F: bad..A: good), so you should clarify the scale for the respondent!
Format Design: Types of questions: Closed questions 3
• Never use 8, 10, 12 or more scale values, even if it would be more exact, because most people cannot differentiate between them and will just use the very extremes and the middle of the scale:
  81. How much do you like 0% Customer Loans? (Please check one) 1○ Absolute dislike 2○ Mostly dislike 3○ Less dislike 4○ Mildly dislike 5○ Tiny dislike 6○ Tiny like 7○ Mildly like 8○ More like 9○ Mostly like 10○ Absolute like
• This is because people cannot easily process more information at the same time than the length of their Short Term Memory, STM (Rövid Távú Memória, RTM):
  • A part of the brain that can store 5-7-9 complex objects (sounds, pictures, chess game status, etc.)
  • As electric signals
  • Valid for 5-8 secs
  • With a fast retrieve time of 0.05-0.1 sec
  • Storing requires low energy consumption
• In contrast to Long Term Memory, LTM (Hosszú Távú Memória, HTM):
  • Storing 0.5-1M objects
  • As chemical signals
  • For 20-50 years
  • With a retrieve time of 1-3 secs
  • Storing requires high energy consumption
• An object in STM can be associated with the content of both STM and LTM, but objects in LTM can be associated with each other only through STM, not directly!
Format Design: Types of questions: Closed: Multi-response
• Multiple Response (Többszörös feleletválasztós) question: the respondent can select several alternatives; this should be stated very clearly. Example:
  29. Which media do you follow regularly? (You can check more) 1□TV 2□Radio 3□Internet 4□Newspapers 5□Magazines 6□Friends 7□Other
• On paper: alternatives are numbered boxes with a value label: 2□Radio
• On a form: alternatives are separate Checkbox (Bejelölő Doboz) controls writing 0/1 or True/False values into the database
• In the database: alternatives form a group of separate binary {0,1} fields called a Multi Response Variable Set (Többszörös feleletválasztós változóhalmaz):
  • For easier processing, we start their field names with the same Prefix (Előtag) derived from the question text (e.g. Media usage: MediTv, MediRadi, MediInte, MediNewp). This is necessary because some database and statistics tools list all fields of the database in alphabetic order: if they do not share the same prefix, they are scattered all over a lengthy field list
• Scale type (Skálatípus): binary fields are special, as they can be treated as any type of scale. The operations allowed on each scale type are:
  • Nominal (Nominális): # only. E.g. names or ID numbers: not a quantity!
  • Ordinal (Ordinális): #, ↓↑. E.g. education levels: unequal stages
  • Interval (Intervallum): #, ↓↑, ±. E.g. time: equally paced but no 0 point
  • Ratio (Arányskála): #, ↓↑, ±, /. E.g. milk in pints: it has an absolute 0 point
  So with binary fields we can perform any of these operations
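A minimal Python sketch of how one multi-response question turns into a set of binary fields. The field names MediTv, MediRadi, MediInte and MediNewp come from the slide; MediMaga, MediFrie and MediOthe are hypothetical names made up in the same style, and the three sample respondents are invented.

```python
# Alternative code -> binary field name, all sharing the prefix "Medi"
MEDIA_FIELDS = {
    1: "MediTv",
    2: "MediRadi",
    3: "MediInte",
    4: "MediNewp",
    5: "MediMaga",  # hypothetical name
    6: "MediFrie",  # hypothetical name
    7: "MediOthe",  # hypothetical name
}

def encode_multi_response(checked, fields=MEDIA_FIELDS):
    """Turn the set of checked alternative codes into one 0/1 value per field."""
    unknown = set(checked) - set(fields)
    if unknown:
        raise ValueError(f"unknown alternative codes: {sorted(unknown)}")
    return {name: int(code in checked) for code, name in fields.items()}

# Three hypothetical respondents and the boxes they checked
checked_boxes = [{1, 3}, {1}, {2, 3, 4}]
records = [encode_multi_response(c) for c in checked_boxes]

# Because the fields are binary, their mean is simply the share of respondents
tv_share = sum(r["MediTv"] for r in records) / len(records)
print(records[0])
print(f"TV share: {tv_share:.0%}")
```

Averaging a 0/1 field is one example of the arithmetic that binary fields permit: the mean is directly interpretable as a proportion.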
Format Design: Types of questions: Cardinal
• Cardinal Question (Mennyiségi kérdés): asks for a quantity in a given Measure Unit (Mértékegység); don't forget to state the unit:
  • Example: 46. Base price of car selected: ________ HUF
  • On paper: it is a filling space marked with underscore characters: _____
  • On a form: it is a TextBox (Szövegdoboz) control that automatically checks that the given value is numeric (usually an .IsNumeric = True property or an in-code function IsNumeric()) and within the pre-set range (set with the .UBound/.Max and .LBound/.Min properties)
  • In the database: it generates a single numeric field:
    • Discrete (Diszkrét): it can have only integer values, e.g. FamilySize, ChildrenNumber. It can be stored as field types:
      • Byte: 0..255, consumes 1 byte
      • Integer: -32768..32767, consumes 2 bytes
      • Long: -2147M..2147M, consumes 4 bytes
    • Continuous (Folytonos): it can have fractional values, e.g. CarBudget, MHUF. It can be stored as field types:
      • Single/Real: 7 significant digits, consumes 4 bytes
      • Double/Float: 15 significant digits, consumes 8 bytes
    • DateTime (DátumIdő): a special field type storing a full date and time from 1753.01.01 to 2900.12.31 in 1/300 sec steps, packed into a fractional number consuming 8 bytes, where the integer part shows the number of days passed since 1900.01.01 and the fraction part shows the elapsed proportion of the day: e.g. 1900.01.02 12:00:00 = 1.5, 1899.12.31 12:00:00 = -0.5
      • Values are always auto-converted between text and number format
      • Dates can be subtracted from each other to show the physical time passed: 1900.01.02 18:00:00 − 1900.01.01 12:00:00 = 1.25
  • Scale type: Interval: #, ↓↑, ± (e.g. you cannot divide by a date) or Ratio: #, ↓↑, ±, /
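The DateTime encoding can be reproduced with a few lines of Python. This sketch assumes exactly the convention stated on the slide (day 0 is 1900.01.01 00:00, fractions are parts of a day); real database products use differing epochs and rules for dates before the epoch, so the function is an illustration of the idea rather than a converter for any particular product.

```python
from datetime import datetime

EPOCH = datetime(1900, 1, 1)  # day 0, as assumed on the slide
SECONDS_PER_DAY = 86400

def to_serial(moment):
    """Days (with fraction) elapsed since the epoch."""
    return (moment - EPOCH).total_seconds() / SECONDS_PER_DAY

noon_jan_2 = to_serial(datetime(1900, 1, 2, 12, 0, 0))
print(noon_jan_2)

# Subtracting two serial numbers gives the physical time passed, in days
elapsed = to_serial(datetime(1900, 1, 2, 18, 0, 0)) - to_serial(datetime(1900, 1, 1, 12, 0, 0))
print(elapsed)
```

Differences of such values are meaningful (interval scale), while dividing one date by another is not, which is why dates are not ratio-scaled.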
Format Design: Question matching 1
• In case of Manual Recording (Manuális Adatrögzítés) of survey data into the database (e.g. from a paper-based questionnaire), it is very important that the person recording the data (Nusika) can safely, easily and quickly match questions with fields, because writing answer codes into the wrong field terribly messes up the database. It is surprisingly exhausting to continuously cross-match a big questionnaire and a huge table
• Therefore, at manual recording it is recommended that one person reads the answer codes aloud question by question and a second person types them, but this doubles the labour cost!
• There are 2 methods to make manual cross-matching safe and secure. The first is the Field names method:
  • The field names of the database are printed in red on the paper-based questionnaire, close to the corresponding questions (this is used mainly in clinical medicine research surveys). However, this can disturb inexperienced respondents during self-filling (e.g. mail-in survey):
    How big was the blood sedimentation rate after 2 days of treatment? (mm/h):___ PostTreatSediRate
Format Design: Question matching 2
• Question Numbering (Kérdés Számozásos) method (used mainly in market research): questions are sequentially numbered and matched with the field numbers in the database table
• However, multi-response questions can complicate this, as they generate a group of binary fields. Moreover, the number of fields inside the group can change if we add/remove alternatives, which would require renumbering the rest of the questions!
• Therefore, to avoid all this hassle, questions are numbered in increments of 10: Question 1 = Field 10, Question 2 = Field 20, etc.
• Fields of alternatives are numbered within the range of 10 belonging to the multi-response question:
  • Question 1, Alternative 1 = Field 11
  • Question 1, Alternative 2 = Field 12
  • Question 2, Alternative 1 = Field 21
  • Question 2, Alternative 2 = Field 22
• After the Test Query (Teszt Lekérdezés), when the questionnaire is finalized, we prepare a Variable List (Változólista) with:
  • Field number: e.g. 291
  • Field name: e.g. MediTv
  • Field type: e.g. Binary
  • Field description: e.g. Media usage - TV
  • Value labels: e.g. 1: yes, 0: no
• It is usually made in Excel (see CarSculpturersFieldList.xls) and used as documentation when designing the database table storing the survey results (see CarSculpturersDatabase.mdb)
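The numbering rule can be written down as a one-line function. The sketch below assumes, as the examples on the slide suggest, that a field number is the question number times 10 plus the alternative number, which leaves room for at most 9 alternatives per question; the dictionary layout of the variable list entry is only an illustration.

```python
def field_number(question, alternative=0):
    """Field number for a question (alternative=0) or for one of its alternatives (1..9)."""
    if question < 1:
        raise ValueError("question numbers start at 1")
    if not 0 <= alternative <= 9:
        raise ValueError("only alternatives 1..9 fit into the range of 10")
    return question * 10 + alternative

print(field_number(1))      # Question 1
print(field_number(2, 2))   # Question 2, Alternative 2
print(field_number(29, 1))  # Question 29 (media usage), Alternative 1 (TV)

# One entry of the variable list, using the example values from the slide
variable_list_entry = {
    "number": field_number(29, 1),
    "name": "MediTv",
    "type": "Binary",
    "description": "Media usage - TV",
    "value_labels": {1: "yes", 0: "no"},
}
```

Adding or removing an alternative now changes only the numbers inside one block of 10, so the remaining questions keep their field numbers.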
Format Design: Conditions of valid response 1
• There are 5 conditions of a valid response, building on each other sequentially:
• Can the respondent understand the question and express a response?
  • 46. Are integration proceedings plan of EU compatible with your Ethnical identity? (Is it really understood by the target population?)
  • We should word questions simply:
    • There should be Subject (Alany) + Predicate (Állítmány) + Object (Tárgy)
    • Avoid Compound Sentences (Összetett mondat)
    • Start with question words: How much? Where? Whom? When?
    • Do not use Technical Terms (Szakkifejezések), but describe them with common words, e.g. instead of „46. Do you have ejaculatio praecox?” ask „46. Are you usually already gone when the girl just starts to spin up?”
  • Avoid Implicit Assumptions (Beágyazott feltételezés): 46. Will you buy it for your children? (But does she have any children?)
  • Do not combine Interdependent (Összefüggő) questions into one because of a lack of content design or to save space (a very typical beginner's error!). The result will be Self-Contradictory (Önellentmondó) and totally useless:
    46. If I have time, I often read books at the weekend, but only if I dont get tired weekdays: 1○ Not at all 2○ Only Action novels 3○ Moderately yes 4○ Yes 5○ Absolute
• Did the respondent ever know the answer?
  • 46. How much % more butter spread you would buy, if its price had lowered by 1%?_ (One may react to a price change, but cannot tell exactly by how much. That's why we need quantitative market research and business planner software!)
  • Don't ask what the respondent certainly does not know, because he/she will just Guess (Találgat) to complete the questionnaire (e.g. to take part in the sweepstakes), even if we state clearly that it is not compulsory to answer!
  • Don't suggest the „correct answer” with your question: 46. Do you also think that our electric toilet paper roller is a groundbreaking idea?__ (Responses will show our dreams, not reality!)
Format Design: Conditions of valid response 2
• Does the respondent remember the information?
  • 46. How many butter spreads did you buy on Feb 21, 2004?___ (He certainly knew it that day, but now…)
  • Don't ask what respondents cannot remember, because they will also just guess. Avoid the Tunnel Vision Effect (Csőlátás Effektus): for us, the topic of the questionnaire is much more important than for the respondent, and we are more knowledgeable about it. So be careful with „even a complete idiot should know this” thoughts. E.g. if this issue comes up, always think about Johann Sebastian Bach: he was the greatest composer who ever lived, but who from your generation cares or even knows about him, when you have Ice-T…
  • Don't force the respondent to make computations in their head: 46. How much was the average $ off by product unit?___ (To answer this, one should know that the total $ off should be divided by the number of product units purchased, and one should know how to divide…)
  • Ask it in two separate questions:
    46. How much $ off did you get in total?___
    47. How many product units did you purchase in total?___
  • And let the database software compute it, which is definitely easier:
    SELECT DollarsOff / UnitsBought AS AvgOffPerUnit FROM SurveyTable; (4.6)
• Will the respondent fill in the questionnaire carefully?
  • E.g. your interviewer goes to the nearest pub, orders a brandy, and creates „virtual people” in his mind to fill in questionnaires and cash the money…
  • E.g. the respondent checks the middle answer at every question to get through fast, just to participate in the sweepstakes…
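Query (4.6) can be tried out with Python's sqlite3 module as sketched below. The table and field names are those of the slide; the RespID column and the three rows are invented. The NULLIF wrapper is an addition not found on the slide: it makes the result NULL instead of failing when a respondent reports zero units, which some database engines would otherwise treat as a division-by-zero error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SurveyTable (RespID INTEGER, DollarsOff REAL, UnitsBought INTEGER)"
)
# Hypothetical answers to questions 46 and 47
conn.executemany(
    "INSERT INTO SurveyTable VALUES (?, ?, ?)",
    [(1, 12.0, 4), (2, 5.0, 2), (3, 0.0, 0)],
)

avg_off = conn.execute(
    "SELECT RespID, DollarsOff / NULLIF(UnitsBought, 0) AS AvgOffPerUnit "
    "FROM SurveyTable ORDER BY RespID"
).fetchall()
print(avg_off)
```

Respondent 3 bought nothing, so the average is left undefined (None in Python) rather than guessed.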
Format Design: Conditions of valid response 3
• Careless filling can be checked by a Consistency Control (Konzisztencia ellenőrző) Question: later in the questionnaire we ask the Logic Negation (Logikai Tagadás) of a previous question with different wording. Their distance should exceed the STM length (5-7 questions)!
  46. Did you take the sour pill daily? yes/no
  97. I forget to take that damn pill on: ____ days
• Inconsistencies can be found automatically with database queries:
  SELECT RespID FROM SurveyTable WHERE DailyPill = 'Yes' AND ForgetDays > 0; (4.7)
• However, consistency control questions:
  • eat up filling time
  • eat up extra paper/form space
  • do not provide any new information
• Therefore it is better to check the mathematical interdependencies of the cardinal fields queried: e.g. base price plus extra spending should not exceed the car budget
• Does the respondent want to answer at all?
  • 46. How frequently do you feel pedophile desires? ____ days (He knows it exactly, but won't tell, because he could be arrested for that)
  • Sensitive topics (sex, money, power) can be queried by Indirect (Körüljáró) questions scattered randomly all over the questionnaire (also keep at least one STM length of distance between them in the question sequence):
    9. I like surfing on internet: 1: no..6: yes
    16. I frequently share photos via internet with my friends: 1: no..6: yes
    25. Government should not check content of web pages: 1: no..6: yes
    33. In this dirty world, only children are really pure: 1: no..6: yes
    41. Mature women are merciless, manipulative, and selfish: 1: no..6: yes
  • Standing alone, these are harmless-looking questions. However, there is a statistical method called Factor Analysis (Faktor Analízis) capable of showing the hidden common motives behind them. This is where the Art of Data Mining starts: you make somebody tell, unnoticed, what he did not want to tell!
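Both kinds of automatic check can be sketched with sqlite3. RespID, SurveyTable, DailyPill and ForgetDays are the names used in query (4.7); BasePrice, ExtraSpending and CarBudget are hypothetical field names for the car budget rule, all rows are invented, and putting the pill and car fields into one table is done only to keep the demonstration short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SurveyTable ("
    "RespID INTEGER, DailyPill TEXT, ForgetDays INTEGER, "
    "BasePrice REAL, ExtraSpending REAL, CarBudget REAL)"
)
# Hypothetical responses; money fields in million HUF
conn.executemany(
    "INSERT INTO SurveyTable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Yes", 0, 3.0, 0.5, 4.0),  # consistent
        (2, "Yes", 3, 2.5, 0.2, 3.0),  # claims daily use but forgot on 3 days
        (3, "No", 5, 3.5, 1.0, 4.0),   # spending exceeds the stated budget
        (4, "No", 2, 2.0, 0.5, 3.0),   # consistent
    ],
)

# Check 1: control question contradicts the earlier answer, as in (4.7)
contradictory = [
    row[0]
    for row in conn.execute(
        "SELECT RespID FROM SurveyTable "
        "WHERE DailyPill = 'Yes' AND ForgetDays > 0 ORDER BY RespID"
    )
]

# Check 2: interdependency of cardinal fields, no extra question needed
over_budget = [
    row[0]
    for row in conn.execute(
        "SELECT RespID FROM SurveyTable "
        "WHERE BasePrice + ExtraSpending > CarBudget ORDER BY RespID"
    )
]
print(contradictory, over_budget)
```

The second check costs no questionnaire space at all, which is the reason the slide prefers it to dedicated control questions.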
Format Design: Page break rules
• On electronic forms, page breaking is more flexible:
  • We can use conventional page breaks, creating screen-sized pages
  • We can use one continuous form with a Scrollbar (Gördítősáv): this is good when the respondent needs to cross-check answers (e.g. clinical surveys), but it requires a more skilled respondent
  • We can give the questions one by one with Previous/Next buttons: this way filling is under total machine control, which can help inexperienced respondents, but they can get bored with it
• On paper, page breaks are dictated by the paper size (e.g. Europe: A4 210 × 297 mm, USA: Letter 8.5 in × 11 in), which imposes serious format design problems:
  • A question and its responses should be on the same page; you cannot place a page break between them, and this wastes a certain amount of paper
  • Page Top Effect (Lap Teteje Effektus): respondents instinctively focus their attention on the top of the pages, so you should place difficult questions there
  • If a question has more than 2..3 alternatives, they should be broken into columns, otherwise paper is wasted (usual in psychological surveys):
    46. How frequently the baby showed this behavior on last week? (Please check one) 1○ Never 2○ Very infrequently 3○ Less than half of the time 4○ Almost all time 5○ All time
  • The questionnaire (with all auxiliary parts) should be broken into an even number of pages, otherwise you waste one full page!
  • The questionnaire should not have an overly colorful design, because it is more expensive and diverts the respondent's attention: this is not a celebrity magazine with an „Is your husband cheating on you?” test!
Format Design: Technical question sequence
• The sequence is important because an incorrect sequence increases the chance of:
  • Response Denial (Válaszmegtagadás)
  • Aborted Fill (Félbehagyott Kitöltés)
  • Corrupted Fill (Hibás Kitöltés)
• Technical question sequence (A kérdések technikai sorrendje):
  • First we ask introductory, easy-to-answer questions: 46. Do you agree that most politicians are stupid and steal as rat? yes/no
  • The more specific, difficult and sensitive a question is, the later we place it: if the respondent has already spent a considerable amount of time filling in the easy-to-answer part, it is less likely that he/she will refuse the more sensitive questions at the end: 46. How frequently you have sexual contact in one month? ___ times (She knows she does not have to answer it, but she fears that then she cannot win the wonder cruise ship ride for 2 persons)
  • Therefore, we always ask for socio-demographic data at the end of the questionnaire (unlike official forms, where they are at the beginning: a typical beginner's error!), and income (asked only in 4-6 categories, not as an exact amount) is always strictly the last question:
    100. How much is your monthly net income? (Please check one) 1○ 0-300€ 2○ 301-600€ 3○ 601-1000€ 4○ 1001-2000€ 5○ 2001-5000€ 6○ Above 5000€
  • Questions should be grouped into Sections (Szekció) by topic
  • We should put easy (even fake) questions between hard sections, e.g.:
    29. Which media do you follow regularly? (You can check more) 1□TV 2□Radio 3□Internet 4□Newspapers 5□Magazines 6□Friends 7□Other
    29. Please, write your own ideas about it:________________________
Format Design: Auxiliary parts of Mail-in Survey
• Cover Letter (Fedlap/Motivációs Levél):
  • A personally toned motivation letter, based on the psychological reaction that if someone is asked in a friendly way, he may help
  • It is written in the name of the market research firm, so that the client company of the market researcher is not revealed
  • It promotes filling in and mailing back with sweepstakes or small product sampler gifts
  • A bigger prize or gift is NOT BETTER! It just tempts fake fillers. The main prize should be:
    • Hungary: 3 times the monthly minimum salary
    • USA: 50% of the monthly minimum salary
  • We should state how long filling in will take (85% of the real average time)
  • At the end, there is a personal data security statement and a reminder of the mail-in deadline
• At the footer of all pages: TURN NEXT!
• Last page footer: mail-in reminder again
• We enclose a pre-addressed, pre-stamped envelope: do not just enclose a separate stamp and envelope, because people will use them for other purposes!
References
• Theory of questionnaire design:
  • http://www.hik.hu/tankonyvtar/site/books/b156/ch03s05s01.html
  • http://www.statpac.com/surveys/
  • http://www.quickmba.com/marketing/research/qdesign/
  • http://www.cc.gatech.edu/classes/cs6751_97_winter/Topics/quest-design/
  • http://www.surveysystem.com/sdesign.htm
  • http://piackutatas.lap.hu/
• Questionnaire planner software:
  • Pocket survey: http://www.pocketsurvey.info/ (30 days shareware)