Attacking Intermittent Failures: Mozilla's War on Orange Mark Côté Software Artisan


Presentation Transcript


  1. Attacking Intermittent Failures: Mozilla's War on Orange Mark Côté Software Artisan Mozilla Automation & Tools

  2. Overview • Background: • Intermittent failures • Mozilla's automated-testing infrastructure • The War on Orange: • Data sources • Metric • Basic UI

  3. Overview • The War on Orange, continued: • Advanced UI • Actions • Conclusion

  4. Background – Intermittent failures • In any testing system of sufficient size, errors crop up that • occur infrequently but consistently • are not reliably reproducible • cannot be tied to a particular changeset • These are known as “intermittent failures”, and they're awful.

  5. Background – buildbot • Buildbot: • Complex continuous-integration system • Triggered by commits to mozilla-central and other key branches • A few hundred builder slaves • Around a thousand tester slaves • Hundreds of thousands of tests executed against each build

  6. Background – tbpl • Tinderbox Push Log (tbpl) presents buildbot results • Results are colour coded: • Green: passes • Red: fatal errors, including crashes • Orange: nonfatal errors • Blue: test restarted due to infrastructure error • Purple: unrecoverable infrastructure error

  7. Background – tbpl

  8. Background – Intermittent oranges • When an orange or a red occurs, the changeset is usually backed out... • Except that the orange might indicate an intermittent failure. • Intermittent failures are “starred” and marked with a comment, usually a Bugzilla bug ID.

  9. Background – Intermittent oranges • Starring updates the Bugzilla bug with a comment about the occurrence. • Ultimately it has to be done by a human. • For the rest of this presentation, we refer to an “intermittent orange” as just an “orange”.

  10. Background – Intermittent oranges

  11. The War on Orange • Predictably, more and more oranges occurred over time • We had no way to know even how many oranges were occurring, let alone any of their characteristics • We needed a system to track oranges over time and extract data about their occurrences

  12. The War on Orange • We created a web tool, known as both the War on Orange (WOO) and OrangeFactor (OF) • Rich HTML/CSS/JS client • Python back-end powered by web.py • Assorted Python helper scripts and modules

  13. The War on Orange – Data sources

  14. The War on Orange – Data sources • Using two distinct sources of data means they sometimes fall out of sync • This is noted on the UI • Could fall back to orange data, but it isn't completely accurate

  15. The War on Orange – Metric • Basic metric is referred to as the “orange factor” (OF) • The orange factor is the ratio of oranges to test runs in a given period of time • An OF of 5 means 5 oranges per test run, on average • Ideal OF is 0!
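
The metric on this slide can be sketched in a few lines of Python. This is only an illustration of the ratio as the slide defines it, not the OrangeFactor code itself: the `DailyCounts` record and the `orange_factor` function are hypothetical names, and the per-day input format is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class DailyCounts:
    """Hypothetical per-day tally; the real data layout is not given in the slides."""
    day: date
    oranges: int     # starred intermittent failures recorded that day
    test_runs: int   # test runs recorded that day


def orange_factor(counts: Iterable[DailyCounts], start: date, end: date) -> Optional[float]:
    """Return oranges per test run over [start, end], or None if there were no runs."""
    in_range = [c for c in counts if start <= c.day <= end]
    total_runs = sum(c.test_runs for c in in_range)
    if total_runs == 0:
        return None  # the ratio is undefined without any test runs
    return sum(c.oranges for c in in_range) / total_runs
```

Summing oranges and runs over the whole period before dividing, rather than averaging daily ratios, keeps quiet days from being weighted the same as busy ones.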

  16. The War on Orange – Basic UI

  17. The War on Orange – Basic UI

  18. The War on Orange – Basic UI

  19. The War on Orange – Basic UI

  20. The War on Orange – Basic UI

  21. The War on Orange – Basic UI

  22. The War on Orange – Basic UI

  23. The War on Orange – Basic UI

  24. The War on Orange – Advanced UI • Data is great; information is (way) better • Implement some of the common analyses • JSON data available via web API for further analysis

  25. The War on Orange – Advanced UI

  26. The War on Orange – Advanced UI • Orange Seed: estimate when an orange was introduced • Calculate average interval between occurrences • Extrapolate to point in the past • Estimate probable range based on interval variance
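
One possible reading of the three Orange Seed steps is sketched below. The slide does not say how far back to extrapolate or how the variance becomes a range, so this sketch assumes the simplest choice: step back one average interval from the first recorded occurrence, and widen by one standard deviation of the intervals. The function name `estimate_seed` is hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import Sequence, Tuple


def estimate_seed(occurrences: Sequence[datetime]) -> Tuple[datetime, datetime, datetime]:
    """Return (earliest, estimate, latest) for when an orange was introduced.

    Assumption: the failure was introduced roughly one average interval
    before it was first observed.
    """
    times = sorted(occurrences)
    if len(times) < 2:
        raise ValueError("need at least two occurrences to compute an interval")

    # Intervals between consecutive occurrences, in seconds.
    intervals = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    avg = mean(intervals)
    spread = pstdev(intervals)  # population standard deviation; 0.0 for a single interval

    first = times[0]
    estimate = first - timedelta(seconds=avg)
    earliest = first - timedelta(seconds=avg + spread)
    # The introduction cannot come after the first observed occurrence.
    latest = first - timedelta(seconds=max(avg - spread, 0.0))
    return earliest, estimate, latest
```

With occurrences on 10, 12, and 16 January, the intervals are 2 and 4 days, so the estimate lands 3 days before the first occurrence, with a range of one day either side.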

  27. The War on Orange – Taking Action • Augment passive data interface with active alerts • Keeps project visibility and focus • Weekly progress reports • Notifications of significant events • Large increases/decreases in OF, new oranges
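
The "significant events" on this slide could be detected along the following lines. This is a minimal sketch under stated assumptions: the slides give no threshold for a "large" change, so the 25% default is invented for illustration, and both function names are hypothetical.

```python
from typing import Iterable, List, Optional


def significant_change(previous_of: float, current_of: float,
                       threshold: float = 0.25) -> Optional[str]:
    """Classify a change in orange factor between two periods.

    Returns "increase" or "decrease" when the relative change reaches the
    threshold (an assumed value), otherwise None.
    """
    if previous_of == 0:
        # Any oranges at all after a clean period count as an increase.
        return "increase" if current_of > 0 else None
    change = (current_of - previous_of) / previous_of
    if change >= threshold:
        return "increase"
    if change <= -threshold:
        return "decrease"
    return None


def new_oranges(known_bug_ids: Iterable[int], seen_bug_ids: Iterable[int]) -> List[int]:
    """Return bug IDs starred in the current period that were not seen before."""
    return sorted(set(seen_bug_ids) - set(known_bug_ids))
```

A weekly report could call both functions on the previous and current week's data and send a notification only when either returns something.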

  28. Conclusion • Intermittent failures essentially unavoidable • Cannot solve the problem without data • Automatic analysis even better than tracking data • Notifications to maintain visibility and focus

  29. Links • Application: • http://brasstacks.mozilla.com/orangefactor/ • Project page: • https://wiki.mozilla.org/Auto-tools/Projects/WarOnOrange • Mozilla Automation & Tools' home: • https://wiki.mozilla.org/Auto-tools
