
Data Mining Disasters

A report by Mary McGlohon, SIGBOVIK Commission for Workplace Safety. Data mining disasters are a hazard to the progress of scientific research. We will review some common mining disasters and make recommendations for prevention.


Presentation Transcript


  1. Data Mining Disasters • A Report • Mary McGlohon • SIGBOVIK Commission for Workplace Safety

  2. Data Mining Safety • Data mining disasters are a hazard to the progress of scientific research. • We will review some common mining disasters and make recommendations for prevention.

  3. Numeric Overflow • “In 2007, numeric floods were responsible for over $600 million in property damages.” - Department of Made-Up Statistics

  4. Numeric Overflow ERROR::NUMERICOVERFLOW Nobody expected the breach of the levees

  5. Numeric Overflow • Also caused loss of several hundred nerd-hours. • 1 nerd-hour = 1 grad-student-hour = 0.25 faculty-hours = 6 undergrad-hours

  6. Numeric Overflow • Recommendation: A drowning researcher’s best bet is to grab onto a floating log.
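
Read literally, the floating log is sound advice: keeping quantities as logarithms stops sums and products of very large numbers from breaching the levees. The following is a minimal sketch of the log-sum-exp trick in Python. The function name `log_sum_exp` is illustrative and does not come from the slides, and the sketch assumes no input is positive infinity.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v) for v in log_values)) without overflowing.

    Hypothetical helper for illustration; assumes no input is +inf.
    """
    if not log_values:
        return -math.inf  # log of an empty sum, i.e. log(0)
    peak = max(log_values)
    if peak == -math.inf:
        return -math.inf  # every term is exp(-inf) == 0
    # Shifting by the largest value keeps every exponent <= 0,
    # so math.exp never overflows.
    return peak + math.log(sum(math.exp(v - peak) for v in log_values))


if __name__ == "__main__":
    flood = [1000.0, 1000.0]
    # The naive math.log(sum(math.exp(v) for v in flood)) raises
    # OverflowError, because math.exp(1000.0) exceeds the float range.
    print(log_sum_exp(flood))
```

The shift by the maximum is the whole trick: the largest term becomes exp(0) = 1, and every other term is at most that.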

  7. Power Law Failures • Occur when heavy-tailed distributions are confused with one another, such as: • Power Law (incl. Pareto, Zipf) • Lognormal • Weibull • Burr • Log-gamma • Log-Log-Log-Log-Mushroom-Mushroom

  8. Power Law Failures • Many natural phenomena have heavy tails. • Magnitude of earthquakes • Size of human settlements • Degree distribution of “real” graphs • Time-to-response in CS professors’ email • Your mom • However, confusing heavy-tailed distributions results in confused results...

  9. Power Law Failures • Related danger: Statisticians, computer scientists, and physicists wasting valuable nerd-hours in religious arguments over which heavy-tailed distribution is being followed.

  10. Power Law Failures • Statisticians get mean when they get religious. (SIGBOVIK07) • Recommendation: Calm the hell down.
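
One calmer way to settle such an argument is to fit the candidate distributions by maximum likelihood and compare their log-likelihoods on the same data. The sketch below does this for a continuous power law and a lognormal, in plain Python. The function names, the choice of a known `x_min`, and the synthetic Pareto sample are all assumptions made for illustration; none of them come from the slides, and a careful comparison would also truncate the lognormal at `x_min` and test whether the likelihood difference is significant.

```python
import math
import random


def fit_power_law(data, x_min):
    """MLE for p(x) = ((alpha - 1) / x_min) * (x / x_min) ** -alpha, x >= x_min.

    Returns (alpha, log_likelihood). Assumes x_min is known and that at
    least one observation is strictly greater than x_min.
    """
    logs = [math.log(x / x_min) for x in data]
    n = len(logs)
    total = sum(logs)
    alpha = 1.0 + n / total
    log_lik = n * math.log((alpha - 1.0) / x_min) - alpha * total
    return alpha, log_lik


def fit_lognormal(data):
    """MLE for a lognormal on (0, inf). Returns (mu, sigma, log_likelihood)."""
    logs = [math.log(x) for x in data]
    n = len(logs)
    mu = sum(logs) / n
    var = sum((v - mu) ** 2 for v in logs) / n
    # At the MLE the quadratic term of the Gaussian log-density sums to n / 2.
    log_lik = -sum(logs) - 0.5 * n * math.log(2.0 * math.pi * var) - 0.5 * n
    return mu, math.sqrt(var), log_lik


if __name__ == "__main__":
    rng = random.Random(0)
    # paretovariate(a) has density a / x ** (a + 1) on x >= 1,
    # so the density exponent estimated above is a + 1.
    sample = [rng.paretovariate(1.5) for _ in range(5000)]
    alpha, ll_power = fit_power_law(sample, x_min=1.0)
    mu, sigma, ll_lognormal = fit_lognormal(sample)
    print("power law:  alpha =", alpha, " log-likelihood =", ll_power)
    print("lognormal:  mu =", mu, " sigma =", sigma, " log-likelihood =", ll_lognormal)
```

A higher log-likelihood only says which of the two candidates describes this sample better; it does not show that either one is the true distribution.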

  11. Decision Tree Forest Fires • Pruning is used to prevent overfitting. • When overpruning occurs, trees are burned to stumps. • This spreads, torching entire forests. :( (Aww...)

  12. Decision Tree Forest Fires • Recommendation: Researchers should obtain a burning permit before pruning with fire. • Smoking while researching is not recommended; if you choose to do so, make sure your “butts are out”.

  13. Voting Fraud by One-Armed Bandits • Cascading failures from other fields may cause disasters in data mining. • Fatal mistake: combining the related subfields of voting mechanisms and one-armed bandit problems.

  14. Voting Fraud by One-Armed Bandits • One-armed bandits commit voting fraud by: • Impersonating real voting machines. • Cramming cake into voting machines. • (The cake is a lie.)

  15. Other safety measures • Cool mining helmets

  16. Conclusion • The Commission for Workplace Safety hopes this has raised awareness of potential data mining disasters. • When faced with data-mining disasters: • Remain Calm. :) • Blame it on one-off errors, lack of rigor in proofs of correctness, or whatever government agency is funding the project.
