How can we determine if Dr. Eric’s Amazing Wonder Tonic is any good? A bad test:

4. Designing an Experiment Dr. Eric’s Amazing Wonder Tonic • How can we determine if Dr. Eric’s Amazing Wonder Tonic is any good? • A bad test: • Wait until you have one of these symptoms • Take Dr. Eric’s tonic • See if your symptoms go away • We need to compare it with the null hypothesis • The question is not if you get better, but whether you get better than you would if you had not taken the tonic

Regression to the Mean “With proper treatment, one can cure a cold in seven days, but if you don’t do anything, it will hang on for a week” • Most illnesses (not all) will get better without treatment over time • If you are sick, and you wait a while, you will probably get better • If you get treated, you may associate the cure with the treatment • Which it might be, but it might not be also • Many forms of alternative medicine and healing rely on testimony from those who have been ill and recovered • Unfortunately, such testimonials have almost no place in science • Other than forming hypotheses

Controls • In order to test Dr. Carlson’s elixir, we need to have an experimental group and a control group To test if Dr. Carlson’s elixir cures headaches: • Find a group of people with headaches • Divide them into two groups • For the first group, the experimental group, we give them the Tonic • For the second group, the control group, we don’t give them Tonic • We then determine if their headache went away • By comparing the rate of decrease in headaches between the experimental and control groups, we can determine if headaches are treated by Dr. Eric’s Tonic

What Makes a Good Control Group? The control group must be made as similar as possible to the experimental group • We aren’t necessarily talking medicine here • Choose the control group from the same population • Assign members of the population randomly to the two groups • If people (or even animals) are involved, members of the population should not know which group they are in • The experimenter (the one judging success or failure) should also not know which people are in which group • Experiments that fit these last two criteria are called “double blind” experiments
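The random, blinded assignment described above can be sketched in a few lines of Python. This is only an illustration: the function name `assign_groups`, the participant names, and the use of opaque codes held by a third party are assumptions, not part of the lecture.

```python
import random

def assign_groups(participants, seed=None):
    """Randomly split participants into experimental and control groups.

    Returns (codes, key). `codes` maps each participant to an opaque code,
    which is all the judge ever sees. `key` maps each code to its group and
    is meant to be held by a third party until judging is finished.
    """
    rng = random.Random(seed)
    people = list(participants)
    rng.shuffle(people)                       # random assignment to groups
    code_numbers = list(range(1, len(people) + 1))
    rng.shuffle(code_numbers)                 # codes must not reveal the group
    half = len(people) // 2
    codes, key = {}, {}
    for i, person in enumerate(people):
        code = f"P{code_numbers[i]:03d}"
        codes[person] = code
        key[code] = "experimental" if i < half else "control"
    return codes, key

# Hypothetical participants, for illustration only.
codes, key = assign_groups(["Ann", "Ben", "Cal", "Dee", "Eve", "Fay"], seed=42)
print(codes)
```

Shuffling the code numbers separately from the people is the design choice that matters here: if codes were handed out in group order, the code itself would leak the assignment.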

The Placebo Effect – An Anecdote • Many years ago, I was visiting my girlfriend (later my wife) • While at the vacation home, I got pretty severely sick • Probably allergies to mold in the house • Enough that I felt my health was in danger • Eventually, I asked her to take me to a hospital • Almost immediately I felt better WHY? • Allergies are affected by a lot of factors, many of them hormonal • Being sick and having no treatment created anxiety • Once I was making plans for treatment, my anxiety diminished substantially • My expectation of getting treatment made me feel better

The Placebo Effect • Treatment can make us feel less anxious – we feel like we are being taken care of • These mental states can improve our health • It is especially effective in situations where the measures of success are subjective • Pain, anxiety, etc. • But it also has real effects on treatment of many other illnesses • Heart disease, allergies, even cancer • Improvement could be due to expectation of improvement, rather than the remedy itself • This is the placebo effect • Expectations of worsening can similarly cause us to get sicker • The nocebo effect

Is the Placebo Effect Real? • There is lots of evidence that the placebo effect is real • Example: Patients were put in two categories randomly • “You’ll feel better in a few days” (64% felt better) • “Not sure what is wrong” (39% felt better) • Improvement was substantially better for the first group • Clearly, the words themselves have no direct effect • 1950s: patients felt pain in their heart due to poor blood flow • Heart surgery performed on one group (10/13 felt better) • Incision (sham surgery) on second group (5/5 felt better) • The sham surgery group actually did better • Almost any medicine can have a placebo effect

How Does it Work? • Is it plausible that non-medical treatments will work? • Telling someone they are likely to get better can reduce stress • Reduced stress leads to • Lower blood pressure • Relaxed muscles • Reduced pain • Stronger immune system • A stronger immune system can help you fight many diseases • Even diseases like cancer

Harnessing the Placebo Effect – Homework 2 • There are a variety of “treatments” that calm you down • Hypnosis, Listening to music, Meditation, Chiropractic • If your goal is to reduce stress, anxiety, or pain, any of these could help a lot • Even for other diseases, they could help • Doctors cannot effectively treat all diseases • Sometimes, patients come to them seeking help from alternative medicine that may have no scientific validity (according to the doctor) • This leads to a difficult ethical question – should doctors encourage placebos? • Is it dishonest? • Could it decrease our trust in doctors? • If expensive, should insurance pay for them?

Demonstrations for Homework 1 Benny Claire Leo Yi Guest Lecturer Sarah Jeong “Fake News”

Blinding the Patient • It is important in an experiment that the patient not know if they are getting the real treatment or not (they are “blinded”) • Typically, you must give a similar looking treatment so they won’t know • Example 1: Efficacy of medicine • Experimental group gets a pill • Control group gets placebo pill (typically a sugar pill) • Example 2: Efficacy of surgery • Experimental group gets surgery • Control group gets sham surgery (incisions, etc., but nothing done inside) • Example 3: Efficacy of acupuncture • Experimental group gets needles inserted in correct places for acupuncture • Control group gets needles inserted in wrong places for acupuncture

Blinding the Doctor / Double Blinding • It is also important that the one judging effectiveness doesn’t know which group is which • Evaluations of health can be partly subjective • The judge is also blinded • It is often necessary to have a third party that assigns the groups and keeps track of data, but does no judging themselves • Medical tests that have both the patient and doctor blinded are called double blind experiments • All medical tests are supposed to be double blinded • Sometimes, it is difficult to do truly double blind experiments • The treatment may have side effects noticed by the doctor or the patient • No experiment is truly perfect

What Makes a Good Experiment? Some Things that Make Experiments Better • Good Control Groups • Similar groups • Random Assignments • Blind Assignments • Anticipate and make clear rules for throwing out “bad data” • Possibly impartial “blind” observer • Objective measuring criteria • When possible • If not, have blinded “measurers” of success • Reliable data recording • Do what you can to maximize “signal” • Do what you can to minimize “noise” • Good sample size (helps decrease statistical errors)

The Wrong Way to Test Things • Most people, if told something extraordinary, don’t know how to test it Typical reaction when you tell someone something extraordinary: • Think of examples that confirm or disconfirm the hypothesis • Immediately see if they can think of a way to test it • Judge the results (often by subjective criteria) • Draw conclusions based on the number of successes What’s wrong with this? • Typically, people have a bias to remember things that confirm or don’t confirm what they want to believe • They often rationalize failures (or successes) in an ad-hoc manner • They use vague success criteria that are difficult to evaluate • They don’t (or can’t) evaluate the likelihood that success is due to luck

Example 1: Projection of Thoughts (1) • I was contacted by C.J. and the James Randi Educational Foundation • C.J. claimed he could project his thoughts into another • I asked for clarification: • He could send images from his mind to another • As evidence, he talked about how he would look at things and other people could describe what he was looking at without seeing it • He didn’t have any records, etc., of doing so How would you go about testing this? • What questions should you ask to clarify his claim? • What could you use as experimental and control groups for this? • What counts as success? What counts as failure? • What provisions (if any) should be made to prevent cheating?

How is This the Scientific Method? • It started with observation – people seemed to be able to see images C.J. held in his mind • Not a very well supported hypothesis, but let’s let that go • We need an alternative hypothesis • Null hypothesis – these events were coincidences due to luck • We now need to pick a population – a set of experimental subjects that are very similar that can be used for testing • I chose ten cards from a deck of cards – A, 2, 3, 4, 5, 6, 7, 8, 9, 10 of hearts • For each round of the experiment, we randomly choose one card from these ten, and this becomes the “experimental” group • The remaining nine cards are the control group • Under the primary hypothesis, the Receiver is more likely to pick the target card • Under the null hypothesis, the Receiver is as likely to pick any one of the control cards as the target card
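Under the null hypothesis, each guess has a 1-in-10 chance of matching the target card. A rough sketch of what that implies over ten rounds is given below. It treats the rounds as independent, which is an approximation, and the function name `prob_at_least` is chosen here for illustration.

```python
from math import comb

def prob_at_least(k, n=10, p=0.1):
    """Probability of k or more hits in n independent rounds,
    each with hit probability p (binomial tail)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

expected_hits = 10 * 0.1   # about one hit per ten rounds from luck alone

# How surprising would k or more hits be if only luck were at work?
for k in range(1, 6):
    print(f"{k} or more hits: {prob_at_least(k):.4f}")
```

A result only counts as evidence for the primary hypothesis if the number of hits is one that luck alone would rarely produce.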

Projection of Thoughts Testing Protocol • C.J., the “Sender,” sat in my office with me • The “Receiver” sat in the library down the hall • Closed door between us • I turned over 10 ordinary playing cards (A through 10 of hearts), one per minute • We used synchronized stopwatches to make sure the Sender was sending at the same time the Receiver was receiving • After ten cards/ten minutes, we stopped

Results of Tests of Thought Projection
• C.J., the first time he tried it
Target: 2 4 8 3 5 7 10 A 6 9
Rec’d: 8 2 5 3 10 7 9 4 A 6
• One year later, he tried it again
Target: 8 7 3 10 5 6 4 2 A 9
Rec’d: 7 3 10 8 A 3 9 2 4 5
• Someone else tried it
Target: A 8 10 5 7 6 3 9 2 4
Rec’d: 8 A 2 4 9 5 3 10 6 7
• C.J. Again
Target: 4 8 2 A 6 10 7 3 5 9
Rec’d: 5 2 7 10 8 2 4 9 3 A
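Scoring these trials is just a matter of counting the positions where the received card equals the target. The sketch below does that for the four sequences as transcribed. Pairing each label with a Target/Rec’d row in the order listed is an assumption, and `count_hits` is a name chosen for illustration.

```python
# (label, target sequence, received sequence), paired in the order listed.
TRIALS = [
    ("C.J., first attempt",  "2 4 8 3 5 7 10 A 6 9", "8 2 5 3 10 7 9 4 A 6"),
    ("C.J., one year later", "8 7 3 10 5 6 4 2 A 9", "7 3 10 8 A 3 9 2 4 5"),
    ("Someone else",         "A 8 10 5 7 6 3 9 2 4", "8 A 2 4 9 5 3 10 6 7"),
    ("C.J. again",           "4 8 2 A 6 10 7 3 5 9", "5 2 7 10 8 2 4 9 3 A"),
]

def count_hits(target, received):
    """Count rounds in which the received card matches the target card."""
    return sum(t == r for t, r in zip(target.split(), received.split()))

hits = {label: count_hits(t, r) for label, t, r in TRIALS}
for label, n in hits.items():
    print(f"{label}: {n} of 10")
```

Run on the sequences as transcribed, this gives 2, 1, 1, and 0 hits, which averages to the one hit per ten rounds expected from luck alone.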

Example 2: Cloud Busting • I was contacted by a claimant and the James Randi Educational Foundation • Claimant said when he stared at clouds he could make them get smaller or even vanish • I asked for clarification: • Works on typical summer clouds • Takes about a minute How would you go about testing this? • What questions should you ask to clarify his claim? • What could you use as experimental and control groups for this? • What counts as success? What counts as failure? • What provisions (if any) should be made to prevent cheating?

Cloud Busting Testing Protocol • I decided I needed a (blinded) judge to make the decision on “successes” • I would choose two clouds in the sky of approximately equal size, and all three of us would agree on which is A and which is B • Judge would leave us • I would randomly select one of the two as the target, and tell the demonstrator • He had two minutes to make it go away • At the end of two minutes, judge decided which one diminished the most • We repeated the experiment ten times

Cloud Busting Results
Target: B A A A B B A B A B B
Rec’d: B A A A A B B A ? A A
• In one of the rounds, the judge said that both clouds completely disappeared • We ultimately decided that we had to “throw out” this data and do another round • In the end, he got 5 out of 10 right – no more than chance • According to the judge, out of the 22 clouds used in the experiment, 16 of them got smaller • Maybe clouds mostly get smaller, and we just don’t notice because we rarely stare at clouds • It’s easy to see how the claimant got the impression he could bust clouds • It’s also clear why you need a control to do this test right
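The "5 out of 10" figure can be checked directly from the two rows of results. The sketch below assumes that the `?` entry marks the thrown-out round, and it enumerates every possible sequence of random targets to see how often luck alone does at least as well.

```python
from itertools import product

target = "B A A A B B A B A B B".split()
judged = "B A A A A B B A ? A A".split()

# Drop the round the judge could not score (assumed to be the "?" entry).
scored = [(t, j) for t, j in zip(target, judged) if j != "?"]
hits = sum(t == j for t, j in scored)
rounds = len(scored)

# Enumerate all 2**rounds equally likely target sequences and count those
# that would agree with the judge's calls at least `hits` times.
judged_valid = [j for _, j in scored]
at_least = sum(
    1
    for targets in product("AB", repeat=rounds)
    if sum(t == j for t, j in zip(targets, judged_valid)) >= hits
)
p_chance = at_least / 2 ** rounds

print(f"{hits} hits in {rounds} scored rounds")
print(f"chance of {hits} or more by luck: {p_chance:.3f}")
```

With two clouds per round, a claimant with no ability is expected to be right about half the time, so five hits in ten rounds is exactly what guessing would produce.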

Example 3: Remote Viewing • Members of the Hawaii Remote Viewing Group contacted the James Randi Educational Foundation • Who then contacted me • They had been practicing remote viewing • One member of the group would secretly select a target picture • Over the course of a week, the other members would try to draw it • They would then compare the drawing with the target picture • And congratulate themselves on how many similarities there were between the target and the drawings How would you go about testing this? • What questions should you ask to clarify their claim? • What could you use as experimental and control groups for this? • What counts as success? What counts as failure? • What provisions (if any) should be made to prevent cheating?

Remote Viewing – the Test • They chose a set of ten pictures, which were sent to me • They said some kinds of pictures work better than others, so I let them choose • They said I could be the one choosing the picture • I had to type up an encrypted description of it (keeping the password), which would focus my mind on it • They had one week to draw sketches of it • One of the group then chose which of the ten targets was the “winner” • Sent in encrypted form to me, without the password • We repeated it for three weeks • At the end of which we exchanged passwords and counted how many times they got it right

Remote Viewing – the Results • For example, the first week, this was one of the drawings made by the viewers • Their judge, based on this, thought it bore a remarkable resemblance to one of the target pictures, a statue of King Kamehameha • I agree • Unfortunately, the actual target was an astronomical observatory in Hawaii • In three rounds, they got zero right
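For comparison, it helps to know how likely zero correct picks would be from guessing alone. The short sketch below assumes one pick per round from ten equally likely targets and independent rounds. The function name `prob_no_hits` is chosen for illustration.

```python
def prob_no_hits(rounds, choices):
    """Chance of missing every round when each round is a blind
    pick among `choices` equally likely targets."""
    return (1 - 1 / choices) ** rounds

p_zero = prob_no_hits(rounds=3, choices=10)
print(f"chance of zero hits in three rounds: {p_zero:.3f}")
```

Under those assumptions, missing all three rounds is the most likely outcome of pure guessing, so the result is consistent with the null hypothesis.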

Outline of the Scientific Method: Dr. Carlson’s Description of the Scientific Method
1. Observation or other source of inspiration
2. Form two or more hypotheses
3. Discard those contradicted by previous experience
4. Design experiment
5. Perform experiment and record data
6. Analyze data
7. Check for errors and repeat steps 4 – 6 as needed
8. Report your results for critical review by others
9. Repeat steps 2 – 8 • Once it has survived several cycles, it becomes a theory
10. Repeat in a wide variety of situations over time • Once it has survived a large variety of tests, it becomes a law
