Graphics Stability

  1. Graphics Stability Steve Morrow Software Design Engineer WGGT stevemor@microsoft.com Microsoft Corporation Gershon Parent Software Swordsman WGGT gershonp@microsoft.com Microsoft Corporation

  2. Session Outline • Stability Benchmark History • CRASH (Comparative Reliability Analyzer for Software and Hardware) • The CRASH Tool • The CRASH Plan • The Experiments • CDER (Customer Driver Experience Rating) • Program Background and Description • High-level Statistics of the Program • Factors Examined in the Crash Data • Normalized Ratings • Customer Experience and Loyalty

  3. Stability Benchmark History • WinHEC – May ‘04 • CRASH 1.0 released. • Web portal has 52 non-MS members from 16 companies • November ’04 • CRASH 1.1 released to the web. Includes DB backend • December ’04 • Stability Benchmark components ship to 8,000 customers and normalizable OCA data begins flowing in • CRASH Lab completes first data collection pass • Web portal has over 60 non-MS members from 17 companies

  4. CRASH Tool • CRASH is a new dynamic software-loading tool designed to expose and easily reproduce reliability defects in drivers/hardware • Answers the call from IHVs and OEMs for more reliability test tools • Enables a wide range of endurance/load/stress testing • Configurable load profiles • Scheduled cycling (starting and stopping) of test applications • Replay-ability • Automatic failure cause determination • Scripting for multiple passes with different scenarios • Creation of a final “score”
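
To make the feature list above concrete, here is a purely hypothetical sketch of how a load profile with scheduled application cycling and looped scenarios might be described as data. This is not the actual CRASH profile format; every name and value below is an assumption made for illustration only.

    # Hypothetical illustration of a configurable load profile with scheduled
    # cycling of test applications. This is NOT the real CRASH profile format;
    # all field names and test names are invented for this sketch.
    example_profile = {
        "name": "example_36hr_benchmark",
        "scenarios": [
            {
                "name": "moderate_load",
                "max_load": 9,          # peak number of concurrent test apps
                "min_load": 3,          # test apps running during the quiet phase
                "cycle_minutes": 15,    # how often apps are started and stopped
                "loops": 24,            # how many times to repeat this scenario
                "tests": ["d3d_fill", "d3d_texture", "video_playback"],
            },
        ],
    }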

  5. CRASH Demo

  6. CRASH Demo

  7. CRASH Demo

  8. CRASH: 4 Phase Plan • Phase 1 • Produce CRASH documentation for review by partners • Release 1.0 to our partners for feedback • Phase 2 • Release 1.1 with database functionality to our partners • Execute controlled baseline experiments on a fixed set of HW and SW to evaluate the tool’s effectiveness • Phase 3 • Execute series of experiments and use results to increase accuracy and usefulness of the tool • Phase 4 • Create a CRASH-based tool for release to a larger audience

  9. Experiment 1 Objectives • Determine if the CRASH data collected is sufficient to draw meaningful conclusions about part/driver stability differences • Determine how machine configuration affects stability • Evaluate how the different scenarios relate to conclusions about stability • Find the minimum data set needed to make meaningful conclusions about part/driver stability • Create a “baseline” from which to measure future experiments • Identify other dimensions of stability not exposed in the CRASH score

  10. Experiment 1 Details • Standardize on one late-model driver/part from four IHVs • Part/Driver A, Part/Driver B, Part/Driver C, Part/Driver D • Test them across 12 different flavors of over-the-counter PCs from 4 OEMs • OEM A, OEM B, OEM C, OEM D • High End and Low End • Include at least two motherboard types • MB Type 1, MB Type 2 • Clean install of XP SP2 plus latest WHQL drivers • Drivers snapped 8/16/04 • Use the 36 hr benchmark profile shipped with CRASH 1.1

  11. Important Considerations • Results apply only to these Part/Driver/System combinations • Extrapolating these results to other parts, drivers, or systems is not possible with this data

  12. CRASH Terminology • Profile • Represents a complete “run” of the Crash tool against a driver • Contains one or more scenarios • Scenario • Describes a session of CRASH testing • Load intensity/profile • What tests will be used • How many times to run this scenario (loops) • Score • Score is always a number that represents the percentage of the testing completed before a system failure (hang or kernel-break)
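
As a rough illustration of the scoring idea above, the sketch below (in Python) computes a score as the percentage of planned test loops completed before the first failure, and averages scenario scores into a profile-level score. This is not the actual CRASH implementation; the function and field names are assumptions for illustration.

    # Minimal sketch of the "percentage of testing completed before failure"
    # scoring idea. NOT the actual CRASH scoring code; names are illustrative.

    def scenario_score(loops_planned, loops_completed_before_failure):
        """Return the percentage of planned loops completed before the first
        failure (hang or kernel-break). A clean run scores 100."""
        if loops_planned <= 0:
            raise ValueError("loops_planned must be positive")
        completed = min(loops_completed_before_failure, loops_planned)
        return 100.0 * completed / loops_planned

    def profile_score(scenario_results):
        """Average scenario scores into a profile-level score.
        scenario_results: list of (loops_planned, loops_completed) tuples."""
        scores = [scenario_score(p, c) for p, c in scenario_results]
        return sum(scores) / len(scores)

    # Example: three scenarios, one of which failed halfway through its loops.
    print(profile_score([(10, 10), (10, 5), (10, 10)]))  # -> 83.33...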

  13. Profile Score Averages

  14. CRASH Terminology: Failures • Failure • Hang • No minidump found and loop did not complete • Targeted Failure • Minidump auto-analysis found failure was in the display driver • Non-Targeted Failure • Minidump analysis found failure was not in display driver • Does not count against the score

  15. Percentage of Results by Type

  16. Average Profile Score by Machine Group

  17. Average Profile Score by OEM and MB

  18. Effect of MB Type on Profile Score

  19. Score Distribution for Part/Driver C & D (MB Type 1)

  20. Experiment 1 Test Profile • Real Life • Moderate load and application cycling • 9 max and 3 min load • Tractor Pull • No load cycling • Moderate application cycling • Incrementally increasing load • Intense • High frequency load and application cycling • 9 max and 0 min load

  21. Average Scenario Score by Part/Driver

  22. Statistical Relevance Questions • Question: How do I know that the difference between the averages of Result Set 1 and Result Set 2 is meaningful? • Question: How can I find the smallest result set size that will give me 95% confidence? • Answer: Use the “Randomization Test”

  23. Randomization Test • Delta 1 is the difference between the averages of Set 1 and Set 2 • Combine both sets, randomly re-split the combination set into Random Set 1 and Random Set 2, and compute their difference, Delta 2 • Repeat the random split 10,000 times. If Delta 1 is greater than Delta 2 95% of the time, you are assured the difference is meaningful. • Try smaller sample sizes until the confidence drops below 95%. That is your minimum sample size. • Information on the “Randomization Test” can be found online at: http://www.uvm.edu/~dhowell/StatPages/Resampling/RandomizationTests.html
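
A minimal sketch of the randomization test described above, in Python, assuming the two result sets are simple lists of per-run scores. It follows the general resampling procedure from the reference linked on the slide rather than any internal implementation; the names and example numbers are illustrative.

    import random

    def randomization_test(set1, set2, trials=10_000, seed=0):
        """Estimate how often the observed difference in means (Delta 1)
        exceeds the difference obtained after randomly re-splitting the
        combined results (Delta 2). A value at or above 0.95 suggests the
        observed difference is meaningful at roughly 95% confidence."""
        rng = random.Random(seed)
        observed_delta = abs(sum(set1) / len(set1) - sum(set2) / len(set2))
        pooled = list(set1) + list(set2)
        n1 = len(set1)
        exceed = 0
        for _ in range(trials):
            rng.shuffle(pooled)
            rand1, rand2 = pooled[:n1], pooled[n1:]
            random_delta = abs(sum(rand1) / len(rand1) - sum(rand2) / len(rand2))
            if observed_delta > random_delta:
                exceed += 1
        return exceed / trials

    # Made-up profile scores for two part/driver combinations.
    scores_a = [92, 88, 95, 90, 85, 97, 91, 89]
    scores_b = [70, 74, 68, 80, 72, 69, 75, 71]
    print(randomization_test(scores_a, scores_b))  # close to 1.0 for these data

To find the minimum sample size, the same routine can be rerun on progressively smaller subsets of each result set until the returned fraction drops below 0.95.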

  24. Scores and Confidence Intervals for Part/Driver/MB Combinations

  25. The Experiment Matrix • With three experiments completed, we can now compare: • One driver across two OS configurations • Two versions of one driver across a single OS configuration

  26. Old vs. New Drivers • This table compares the profile scores for old drivers vs. new drivers on the OEM Image • New drivers were noticeably better for Parts/Drivers C & D • Parts/Drivers A and B were unchanged

  27. OEM Image vs. Clean Install • This table compares profile scores for the OEM Image vs. a Clean Install with old drivers • Clean Install scores were universally better than the OEM Image for Parts/Drivers C and D • Parts/Drivers A and B were unchanged • Experiment matrix: Experiment 1 = Aug ’04 driver on a Clean Install, Experiment 3 = Aug ’04 driver on the OEM Image, Experiment 2 = Jan ’05 driver on the OEM Image

  28. Future Plans • Collate with OCA data • CRASH failure to OCA bucket correlations • What buckets were fixed between the 1st and 2nd driver versions? • Do our results match field data? • Customer machines typically have hardware that is several years old • Can we find the non-display failure discrepancy in the field? • Begin to tweak other knobs • Content • Driver versions • HW versions • Windows codenamed “Longhorn” Test Bench • PCIe cards

  29. Suggested Future Experiments • Include more motherboard types • Use newer drivers or a “Control Group” driver (Reference Rasterizer?) • Disable AGP to isolate chipset errors from AGP errors • Enable Driver Verifier • Add non-graphics stress tests to the mix • Modify loop times

  30. IHV Feedback • “There are definitely unique [driver] problems exposed through the use of CRASH and it is improving our driver stability greatly” • “[CRASH is] producing real failures and identifying areas of the driver that we are improving on” • “Thanks for a very useful tool”

  31. CRASH 1.2 Features • RunOnExit • User-specified command run upon completion of a CRASH profile • More logging • Logging to help troubleshoot problems with data flow • More information output in XML • More system information • More failure details from minidumps • More control over where files are put • More robust handling of network issues

  32. Customer Device Experience Rating (CDER) Program Background • Started from a desire to rate display driver stability based on OCA crashes • A controlled program addresses the shortcomings of raw OCA data: • Unknown market share • Unknown crash reporting habits • Unknown info on non-crashing machines • This allows OCA data to be normalized into an accurate “number of crashes per machine” stability rating

  33. CDER Program Description & Status • Program & Tools • A panel of customers (Windows XP only) • User opt-in allows extensive data collection, unique machine ID • System Agent/scheduler • System Configuration Collector • OCA Minidump Collector • System usage tool (not yet in the analysis) • Status • All tools for Windows XP in place and functioning • First set of data collected, parsed, analyzed

  34. Overall Crash Statistics of Panel • Machines • 8927 in panel • 49.9% experience no crashes • 50.1% experience crash(es) • 8580 have valid device & driver info • 82.2% have no display crashes • 17.8% have display crashes • Crashes • 16.1% of valid crashes are in display • Note: Crashes occurred over 4 yr period

  35. Crash Analysis Factors • Examined several factors that may have an impact on stability ratings • Processor • Display Resolution • Bit Depth • Monitor Refresh Rate • Display Memory • Note: Vendor & part naming does not correspond to that in the CRASH presentation • Note: Unless otherwise noted, data for these analyses are from the last 3 years

  36. Display Resolution Crashes & Distribution

  37. Bit Depth Crashes & Distribution

  38. Refresh Rate Crashes & Distribution

  39. Display Memory Crashes & Distribution

  40. Display Crashes By Type (Over Last Year)

  41. Normalized Crash Data • The following data is normalized by program share of crashing and non-crashing machines
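
A minimal sketch of the kind of normalization the slide above describes, assuming per-vendor counts of panel machines (crashing and non-crashing) and of display crashes. The vendor names and numbers are invented for illustration and are not real CDER data.

    # Sketch of turning raw crash counts into a "crashes per machine" rating by
    # dividing by each vendor's share of the panel (crashing AND non-crashing
    # machines). All names and numbers here are illustrative assumptions.

    panel_machines = {"Vendor A": 3200, "Vendor B": 2900, "Vendor C": 2480}
    display_crashes = {"Vendor A": 410, "Vendor B": 530, "Vendor C": 260}

    def crashes_per_machine(crashes, machines):
        """Divide each vendor's crash count by its total panel machines, so
        vendors with a large installed base are not penalized for raw volume."""
        return {v: crashes[v] / machines[v] for v in machines}

    ranking = sorted(crashes_per_machine(display_crashes, panel_machines).items(),
                     key=lambda kv: kv[1])
    for vendor, rate in ranking:
        print(f"{vendor}: {rate:.3f} crashes per machine")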

  42. Crashes per Machine Ranking by Display Vendor for Last Year (2004)

  43. ‘Vendor A’ Normalized Crashes By Part/ASIC Family Over Last 3 Years

  44. ‘Display Vendor B’ Normalized Crashes By Part/ASIC Family Over Last 3 Years

  45. ‘Display Vendor C’ Normalized Crashes By Part/ASIC Family Over Last 3 Years

  46. Normalized Crashes Ranked by Part - 2004

  47. Ranking and Rating Conclusions • This is a first look • Need to incorporate system usage data • Need to continue collecting configuration data to track driver and hardware changes • Need more panelists, and a higher proportion of newer parts • With that said: • This is solid data • This demonstrates our tools work as designed • It shows the viability of a crash-based rating program

  48. Customer Experience & Loyalty • A closer look at the segment of panelists who: • Experienced display crashes, and • Switched or upgraded their display hardware or driver

  49. Experience & Loyalty Highlights • 19.4% of users who experienced display crashes upgraded their drivers or hardware, or changed to a different display vendor • 7.9% of users (nearly 41% of the 19.4%) who experienced display crashes switched to a competitor’s product • ALL users who switched to a competitor’s product had the same or better experience • Only 91.3% of those who upgraded had the same or better experience afterwards, based on crashes • Time clustering of crashes

  50. Overall Experience of Users After Changing Display System
