
Data Mining, Truth, Justice, the American Way, and the Flying Spaghetti Monster

Data Mining, Truth, Justice, the American Way, and the Flying Spaghetti Monster. tim@menzies.us Ph.D. LCSEE, WVU, 20 Sept 2007. "Part of education is to expose people to different schools of thought.” President George Bush, August 1, 2005.


Presentation Transcript


  1. Data Mining, Truth, Justice, the American Way, and the Flying Spaghetti Monster tim@menzies.us Ph.D. LCSEE, WVU, 20 Sept 2007

  2. “Part of education is to expose people to different schools of thought.” President George Bush, August 1, 2005 • “Part of science is to expose people to the critical and continual (re)evaluation of ideas.” Some guy called Timm, September 20, 2007 • Expose, and hose

  3. “Look up in the sky! It's a bird! It's a plane! It's Superman!” “Yes, it's Superman, strange visitor from another planet who came to Earth with powers and abilities far beyond those of mortal men.” “Superman, who can change the course of mighty rivers, bend steel in his bare hands; and who, disguised as Clark Kent, mild-mannered reporter for a great metropolitan newspaper, fights a never-ending battle for truth, justice, and the American way.” • Why a never-ending battle? • How to ensure justice? • How to make lottsa $$? • How to find truth?

  4. So, tonight • Notions of certainty • Standards for debate • Surprises • Nothing is “truth” • but many more things are false • And some things are useful • Implications for humility • And for justice

  5. God gave me a brain. I take it (s)he wants me to use it. • Mark of the rational • while not dead; do • Review and revise assumptions; • Done • Entertain a wide range of ideas • But don’t necessarily accept them • Demand evidence • that lets you repeat / refute / improve prior conclusions • But what of faith? • That is another talk • There is room for the divine in my universe • But in my test tubes? • Not too much

  6. Data miners: agents that automate the creation and review of new ideas • Mountains of data • Tablespoons of knowledge

     Data:

         @relation weather.symbolic
         @attribute outlook {sunny, overcast, rainy}
         @attribute temperature {hot, mild, cool}
         @attribute humidity {high, normal}
         @attribute windy {TRUE, FALSE}
         @attribute play {yes, no}
         @data
         sunny,hot,high,FALSE,no
         sunny,hot,high,TRUE,no
         overcast,hot,high,FALSE,yes
         rainy,mild,high,FALSE,yes
         rainy,cool,normal,FALSE,yes
         rainy,cool,normal,TRUE,no
         overcast,cool,normal,TRUE,yes
         sunny,mild,high,FALSE,no
         sunny,cool,normal,FALSE,yes
         rainy,mild,normal,FALSE,yes
         sunny,mild,normal,TRUE,yes
         overcast,mild,high,TRUE,yes
         overcast,hot,normal,FALSE,yes
         rainy,mild,high,TRUE,no

     Learned tree:

         outlook = sunny
         |   humidity = high: no
         |   humidity = normal: yes
         outlook = overcast: yes
         outlook = rainy
         |   windy = TRUE: no
         |   windy = FALSE: yes
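As a minimal sketch of what the learned tree on slide 6 says, the tree can be hand-coded as a Python function and replayed over the 14 weather instances. The names `ROWS`, `predict`, and `training_accuracy` are illustrative and do not come from the slides. Note that this only measures fit to the training data, which slide 26 warns is not enough.

```python
# The 14 instances from slide 6: outlook, temperature, humidity, windy, play
ROWS = """sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no""".splitlines()


def predict(outlook, temperature, humidity, windy):
    """Hand-coded copy of the slide's decision tree (temperature is never tested)."""
    if outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    if outlook == "overcast":
        return "yes"
    # Remaining branch: outlook == "rainy"
    return "no" if windy == "TRUE" else "yes"


def training_accuracy(rows):
    """Fraction of rows whose 'play' value matches the tree's prediction."""
    hits = 0
    for row in rows:
        *features, actual = row.split(",")
        hits += predict(*features) == actual
    return hits / len(rows)


print(training_accuracy(ROWS))  # 1.0: the tree fits all 14 training rows
```

Seven lines of tree summarize fourteen rows of data, which is the "tablespoons of knowledge" point in miniature.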

  7. Data doubling every 20 months • Internet, Radio Frequency Identification (RFID) tracking, on-line shopping (patterns of sales tracked at Amazon) • So now we can automatically learn answers to many questions; e.g. • What eggs to select for IVF? • What will software cost to develop? • What diseases does a patient have? • Which loan applications to fund? • What houses will have the best resale value? • Which parts of the program need more inspection? • What products are best to sell to what markets? • What cows to keep and which to send to the abattoir? • How to teach a satellite to distinguish between cloud shadows and oil spills? • How much electricity will be needed in two hours? • i.e. what coal-powered generators to fire up?

  8. Same data, different data miners, different conclusions • Every miner is biased by • Evaluation bias • Language • What is the “shape” of the models we can learn? • Decision trees, equations, etc. • Search • Pruning the possibly infinite space of candidate models • What not to explore • Over-fitting avoidance • How to stop the learner fixating on noise • E.g. pruning back decision trees • More fundamentally, what can we say about the world, with any certainty?

  9. Any learning scheme has many biases • Bias lets us ignore “stuff”. • Without it, we don’t know what is important or dull, we can’t summarize, generalize. • Without bias, we can’t learn from the past • Bias blinds us but lets us see the future • But changing biases changes what we best believe • No wonder truth is a never-ending battle

  10. Generalizing from the past works • Sometimes, very clearly • Heavy smokers have a 2000% to 3000% higher chance of lung cancer • Learned theories perform very well on new data • But ... • the “best” learned theory can be a moveable feast.

  11. So, a relativistic soup? • No certainty? • No way to plan effective actions? • No way to rule out absurd notions?

  12. … I think that once … there were no cell phones or iPods, or clothes, or countries, or language, or human society, or 4-valved hearts, or homeostasis, or organs, or brains, or planets, or stars, or matter. • Where the net energy in-flow is positive… the universe selects for self-perpetuating systems, an exponentially decreasing number of which are of exponentially increasing complexity • Should I even say this in a public place? • “Part of education is to expose people to different schools of thought.” President George Bush, August 1, 2005 • Shouldn’t I have to give credence to all theories? • Evolution, Intelligent design • Pirates cause global warming? • I don’t want to offend anyone, but…

  13. The Church of the Flying Spaghetti Monster (FSM) • Founded in 2005 • OSU physics graduate Bobby Henderson • A protest against the decision by the Kansas State Board of Education • That required the teaching of intelligent design as an alternative to biological evolution. • Henderson wrote to the board • professing belief in a supernatural Creator called the Flying Spaghetti Monster • Demanded that his "Pastafarian" theory of creation be taught in science classrooms.

  14. FSM is not about religion • It is a mistake to view FSM as anti-religion • Rather, FSM is anti-anti-scientific rigor • No one in their right mind would ever believe this nonsense • And that’s the point • Truth is a never-ending battle • We must have standards to assess scientific theories, to reject absurdities • Or any nonsense can be released on this world • E.g. “Global warming is caused by pirates.”

  15. FSM: an invisible, undetectable Flying Spaghetti Monster • Evidence for evolution was planted by FSM to test Pastafarians' faith • FSM changes the results of measurements, like radiocarbon dating, via His Noodly Appendage. • Heaven contains beer volcanoes and a stripper factory. Hell is similar, but with stale beer and diseased strippers. • Pirates are "absolute divine beings" and the original Pastafarians. Their image as "thieves and outcasts" is misinformation spread by Christian theologians in the Middle Ages and Hare Krishnas. • Pirates are "peace-loving explorers and spreaders of good will" who distributed candy to small children. • Global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of pirates since the 1800s. • (Source: Wikipedia on FSM)

  16. FSM “proof” of the divinity of pirates • A case study on how not to present data • X-axis deliberately misleading. • Crazy? Yes! • But would you recognize such craziness if you saw it again?

  17. What is the “best” weight-loss diet?

  18. “How lucky for those in power that people don't think.” - Adolf Hitler • i.e. people trying to sell you their diet book

  19. What is the “best” programming language?

  20. To our peril, we trust old ideas too much • Columbia ice strike: • Size: 1200 in³ • Speed: 477 mph (relative to vehicle) • Certified as “safe” by the CRATER micro-meteorite model • A typical experiment in CRATER’s test database • Size: 3 in³ piece of debris • Speed: under 150 mph.

  21. 1990s: American Heart Association recommends hormone replacement therapy for older women to ward off heart disease and osteoporosis. • 2001: 15 million Americans filling H.R.T. prescriptions annually • 2002: estrogen therapy exposed as a hazard, not a benefit, for health • Failure of scientific method • Benefits of estrogen reported from large observational studies, not randomized trials • Repeated epidemiological finding: randomized trials rarely support conclusions from observational studies. • So forget what you’ve read about • Anti-oxidants like vitamins E & C & beta carotene preventing heart disease • Fiber prevents colon cancer • Value of estrogen • (NYT magazine, Sept 16, 2007)

  22. So, why is FSM silly? • And please, rest assured, • it is very very silly stuff indeed. • Theories need an entrance exam • Many possible theories • one for each bias • Demand that a theory has passed at least some operational test before we condone it, act on it. • If no reason to accept the new, don’t • Trust the most what has been challenged the most • Karl Popper

  23. No things are “right”, but some things are “useful” • Sure, one data set supports many theories. • But there are many many more theories that are unsupported. • No model is right, but some things are useful • (perform well on test data) • George Box • And many many many more ideas are useless • Can’t make predictions • Not defined enough to support (possible) refutation

  24. Wolfgang Pauli • The "conscience of physics", • the critic to whom his colleagues were accountable. • Scathing in his dismissal of poor theories • often labeling them ganz falsch, utterly false. • But “ganz falsch” was not his most severe criticism, • He hated theories so unclearly presented as to be • untestable • unevaluatable, • Worse than wrong because they could not be proven wrong. • Not properly belonging within the realm of science, • even though posing as such. • Famously, he wrote of one such unclear paper: • “This paper is not right. It is not even wrong.”

  25. Believe those who seek the truth; doubt those who find it -Andre Gide.

  26. Don’t test once on just the training data • Study more than the average performance • Also look at the variance • E.g. here, no significant change on new data after X=8
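One way to act on slide 26 is to score a learner many times on held-out rows and report the spread as well as the mean. The sketch below is an illustration, not the experiment behind the slide's (missing) plot: the one-attribute rule learner `one_rule`, the helper `repeated_holdout`, and the settings `repeats`, `test_fraction`, and `seed` are all assumptions chosen for brevity, and the data is the weather table from slide 6.

```python
import random
from collections import Counter, defaultdict
from statistics import mean, pvariance

# Weather data from slide 6: outlook, temperature, humidity, windy, play
ROWS = [line.split(",") for line in """sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no""".splitlines()]


def one_rule(train, col=0):
    """Learn 'value of column col -> majority class' from the training rows."""
    default = Counter(r[-1] for r in train).most_common(1)[0][0]
    by_value = defaultdict(Counter)
    for r in train:
        by_value[r[col]][r[-1]] += 1
    rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
    # Values never seen in training fall back to the overall majority class
    return lambda row: rule.get(row[col], default)


def repeated_holdout(rows, learner, repeats=20, test_fraction=0.33, seed=1):
    """Return one held-out accuracy per random train/test split."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * test_fraction)
        test, train = shuffled[:cut], shuffled[cut:]
        model = learner(train)
        scores.append(sum(model(r) == r[-1] for r in test) / len(test))
    return scores


scores = repeated_holdout(ROWS, one_rule)
print(f"mean={mean(scores):.2f} variance={pvariance(scores):.3f}")
```

With only 14 rows the variance is large, which is exactly why a single train-and-test run says little.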

  27. If something works, poke it till it breaks • i) Sort attributes on “infogain” • ii) Learn using first N attributes • (Plots for data sets: labor, soybean, diabetes, anneal) • A few variables are (often) enough
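Step (i) of slide 27 can be sketched on the weather data from slide 6. The code below computes information gain (class entropy minus the expected class entropy after splitting on an attribute) and ranks the attributes; `top_attributes` then picks the first N for step (ii). The function names are illustrative, and the slide's own experiments used other data sets (labor, soybean, diabetes, anneal) that are not reproduced here.

```python
from collections import Counter
from math import log2

NAMES = ["outlook", "temperature", "humidity", "windy"]

# Weather data from slide 6; the last column is the class (play)
ROWS = [line.split(",") for line in """sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no""".splitlines()]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def info_gain(rows, col):
    """Class entropy minus the weighted entropy after splitting on column col."""
    remainder = 0.0
    for value in {r[col] for r in rows}:
        subset = [r[-1] for r in rows if r[col] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([r[-1] for r in rows]) - remainder


def rank_attributes(rows):
    """Attributes sorted by information gain, best first."""
    gains = {name: info_gain(rows, i) for i, name in enumerate(NAMES)}
    return sorted(gains.items(), key=lambda kv: kv[1], reverse=True)


def top_attributes(rows, n):
    """Names of the first n attributes in the info-gain ranking."""
    return [name for name, _ in rank_attributes(rows)[:n]]


for name, gain in rank_attributes(ROWS):
    print(f"{name:12s} {gain:.3f}")
```

On this data the ranking is outlook, humidity, windy, temperature. The tree on slide 6 never tests temperature, the lowest-ranked attribute, which fits the slide's point that a few variables are often enough.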

  28. Living with Uncertainty • Check how training set size affects the theory

  29. Living with Uncertainty • Launch learners with anomaly detection and repair tools

  30. Living with uncertainty: count, alert, fix • An incremental discretizer + a Bayes classifier where all inputs are mono-classified • Track average max likelihood for data processing in “eras” of X instances • Count: stuff seen in past • Alert: if new counts different • Fix: find delta new to old • Very, very fast • Contrast set learning • Linear time inference • Tiny memory footprint • And, it works [Orrego, 2004] • F15 simulator data [courtesy B. Cukic] • Five flights: a, b, c, d, e • each with a different off-nominal condition imposed at “time” 15 • Off-nominal condition not present in prior data • In all cases, massive change detected
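The count / alert / fix loop on slide 30 can be illustrated with a much-simplified sketch. This is not the algorithm of [Orrego, 2004]: it assumes the stream is a single, already-discretized attribute, it replaces the Bayes classifier with smoothed frequency counts, and the names `monitor`, `era_size`, and `drop` (the alert threshold, in log-likelihood units) are hypothetical.

```python
from collections import Counter
from math import log


def average_log_likelihood(era, counts):
    """Mean Laplace-smoothed log-likelihood of an era's values, given past counts."""
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen values
    return sum(log((counts[v] + 1) / (total + vocab)) for v in era) / len(era)


def monitor(stream, era_size=50, drop=1.0):
    """Process the stream in eras; return (start_index, new_values) per alert."""
    counts = Counter()   # Count: stuff seen in the past
    history = []         # average log-likelihood of earlier, accepted eras
    alerts = []
    for start in range(0, len(stream) - era_size + 1, era_size):
        era = stream[start:start + era_size]
        if counts:  # the very first era has nothing to be compared against
            score = average_log_likelihood(era, counts)
            # Alert: the new era is much less likely than earlier eras were
            if history and score < sum(history) / len(history) - drop:
                # Fix (crudely): report the values that are new relative to the past
                delta = sorted(v for v in set(era) if counts[v] == 0)
                alerts.append((start, delta))
                continue  # do not absorb the anomalous era into the counts
            history.append(score)
        counts.update(era)
    return alerts


# Three nominal eras of alternating "a"/"b", then one off-nominal era of "z"
stream = ["a", "b"] * 75 + ["z"] * 50
print(monitor(stream))  # [(150, ['z'])]
```

Each value is touched once and only counts are stored, which is the sense in which this style of detector is linear-time with a small memory footprint.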

  31. Living with uncertainty • Life is a balance between • Policy #1: exploration • Tolerate the sub-optimal, a little • Doing crazy things to learn new things • Policy #2: exploitation • Fix your theories and base your work on those fixed ideas. • Kuhn: • most “science” is puzzle solving… • … within existing paradigms. • Sometimes the paradigm breaks down…. • …prompting revolutionary research • Human young: • Do crazy things (take long trips) • Less craziness as we grow older

  32. Tolerance of “exploration” • Critical to the American way • America: history of tolerance and acceptance • 1945: • 400 German rocket scientists chose to surrender to the Yankees, not the Russians • They chose their post-war life based on their perceptions of American ideology • Hence,

  33. Tolerance = hi-tech = $$$ • R. Florida: The Economic Geography of Talent, 2002 • Annals of the Association of American Geographers 92(4), 2002, pp. 743-755 • Best predictor for hi-tech industry • R² 0.42 to “coolness” • R² 0.49 to cultural amenities • R² 0.50 to median house value • R² 0.77 to “diversity” index

  34. Data Mining, Truth, Justice, the American Way & Flying Spaghetti Monsters • To make $$, institutionalize exploration and tolerance • “Superman, fights a never-ending battle for truth, justice, and the American way.” • Old conclusions must be constantly re-assessed • No “truth”, all is biased. • A healthy hi-tech sector needs tolerance to support exploration • And the FSM is silly, but I would consider revising that view if new evidence emerges

  35. “Part of education is to expose people to different schools of thought.” President George Bush, August 1, 2005 • “Part of science is to expose people to the critical and continual (re)evaluation of ideas.” Some guy called Timm, September 20, 2007 • Expose, and hose
