Windows: Thoughts on dependability of Windows. Rob Short, Corporate Vice President, Windows architecture and kernel. Today’s talk: end user view of dependability; Windows overview (some stats); progress we’ve made already; tough challenges. Expectations have changed.
Windows: Thoughts on dependability of Windows • Rob Short • Corporate Vice President, Windows architecture and kernel
Today’s talk • End user view of dependability • Windows overview – some stats • Progress we’ve made already • Tough Challenges
Expectations have changed • Computers are now appliances • People no longer accept the blame when the computer fails or when they can’t find out how to use a feature • Highest cost issues are complex design problems, rather than traditional “software failure” issues • Complexity of diagnosis is beyond most users • Microsoft must be responsible for everything running on the system • The end user doesn’t know or care if they loaded a virus; they see “Windows crashed”
Our approach needs to change • Chart from pre-reading shows a reduction in hardware and software failures as a cause of downtime • The system management and user interference component has increased • BUT • The computer is an appliance: the system manager is no longer an expert • We must find approaches to deal with this
Windows – some stats • Windows is an entire family of products • Almost 1 billion copies sold! • ~200 million per year, >500K a day • Over 1,000,000 devices supported • 100,000s of applications available • Embedded in phones and handhelds with 32 MB of memory • 64-CPU, 64-bit systems with > 1 TB of RAM
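As a quick consistency check on the sales figures above, the per-year number can be converted to per-day and per-second rates. This is only back-of-the-envelope arithmetic on the slide's rounded "~200 million per year"; the variable names are illustrative and the per-second figure does not appear in the talk.

```python
# Rounded figure taken from the slide: ~200 million copies per year.
copies_per_year = 200_000_000

# Average copies per day, assuming sales are spread evenly over 365 days.
copies_per_day = copies_per_year / 365

# Average copies per second (86,400 seconds in a day).
copies_per_second = copies_per_day / 86_400

print(f"per day:    {copies_per_day:,.0f}")
print(f"per second: {copies_per_second:.1f}")
```

The daily average comes out a little under 550,000, which agrees with the slide's ">500K a day".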
Developer view of product range • Embedded/specific purpose: • As small as possible, but composable by an expert • Tool kit to add/remove components • Devices chosen by engineer, not end user • Builds a run-time, not a general purpose OS • Server: • Just the files for a particular role, others available when needed • GUI/wizards allow the user to choose a role • Administrator may want to hand configure some devices such as SANs • How to add support such as NUMA without slowing the client? • Client – most complex Windows system: • Everything on by default • Full automatic plug and play, streaming media, audio, etc. • Most systems have hundreds of drivers and extensions running • Corporate IT wants complete control over devices and installed software
Huge progress • Systems are much more reliable and functional than a few years ago • Newer hardware and better driver tools have helped with hardware issues • We’re very good at the “hard faults”, i.e. crashes etc., where we have real information • Hangs, slowdowns, etc. still haunt users • Security may be the worst problem, since the solution is social as much as technical
Progress in Vista • Reliability and security were a top priority • Significantly strengthened performance and reliability teams • Source level analysis tools • Diagnosis and tracing infrastructure • Auto diagnosis for common problem areas • User mode driver frameworks • Hang detection infrastructure • Cancelable synchronous I/Os • Hardware error architecture
Using feedback to improve quality (The nice marketing slide) • [Diagram labels: Customers and Community; Partners; Internet; Problems, crashes, annoyances; Analysis used to prioritize Dev work; Windows Update: fixes, patches, updates, etc.]
Online Crash Analysis (OCA) • Automatically takes dumps from customers, analyzes them, and sends a solution back • Stores the output of the analysis in the OCA database • Dev and MSR worked together to create analysis tools and to mine the data • Search for common devices/software and themes • We save dumps for further analysis
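The "search for common devices/software and themes" step can be pictured as grouping analyzed crash reports into buckets and ranking the buckets by frequency. The following is a minimal sketch of that idea only, not the actual OCA implementation: `CrashReport`, its fields, the bucketing key, and the sample module names are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashReport:
    """Hypothetical, simplified stand-in for the output of dump analysis."""
    stop_code: str        # error code reported by the crash
    faulting_module: str  # driver or component blamed by the analysis


def bucket_key(report: CrashReport) -> tuple:
    # Reports with the same stop code and module (case-insensitive)
    # are assumed to share a root cause and land in the same bucket.
    return (report.stop_code, report.faulting_module.lower())


def top_buckets(reports, n=3):
    # Count reports per bucket and return the n most frequent buckets.
    return Counter(bucket_key(r) for r in reports).most_common(n)


reports = [
    CrashReport("0xD1", "netdrv.sys"),
    CrashReport("0xD1", "NetDrv.sys"),
    CrashReport("0x50", "gfxdrv.sys"),
    CrashReport("0xD1", "netdrv.sys"),
]

for key, count in top_buckets(reports):
    print(key, count)
```

Ranking buckets this way is one simple route to "analysis used to prioritize dev work": the largest buckets point at the drivers or components that affect the most customers.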
Architecture challenges • Windows has grown explosively, but organically • System became intertwined and complex • Organization is also large and complex • Sharing code base across products is great, but can serialize development • Adding new product variants is too hard • Servicing it all is a challenge
Architectural focus areas in Windows • Application model • State • Extensions, both user and kernel mode • Application compatibility • Layering and partitioning • Top management added security
State • State is everything persistent • Schema to identify system, user, and application state • Users should be able to move to new machines • How to really understand impact of changes • Rules for developers
Extensibility • Microsoft is successful because we’re a platform company – it's what we do • Providing the ultimate platform means well thought-out extensibility points throughout the system • The system needs a common way to identify, load and enumerate extensions • We need to make extensibility consistent and robust so customers feel comfortable using software, all of which includes extensions
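One way to picture "a common way to identify, load and enumerate extensions" is a single registry that every extension goes through. The sketch below is purely illustrative and is not how Windows does it: `ExtensionInfo`, `ExtensionRegistry`, and the example identifiers are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ExtensionInfo:
    """Identity of an extension (hypothetical schema)."""
    ext_id: str   # unique identifier
    version: str
    kind: str     # "user" or "kernel"


class ExtensionRegistry:
    def __init__(self) -> None:
        self._entries: Dict[str, Tuple[ExtensionInfo, Callable[[], object]]] = {}
        self._loaded: Dict[str, object] = {}

    def register(self, info: ExtensionInfo, factory: Callable[[], object]) -> None:
        # Identify: every extension declares who it is before it can run.
        if info.ext_id in self._entries:
            raise ValueError(f"duplicate extension id: {info.ext_id}")
        self._entries[info.ext_id] = (info, factory)

    def enumerate(self, kind: Optional[str] = None) -> List[ExtensionInfo]:
        # Enumerate: list known extensions, optionally filtered by kind.
        infos = [info for info, _ in self._entries.values()
                 if kind is None or info.kind == kind]
        return sorted(infos, key=lambda i: i.ext_id)

    def load(self, ext_id: str) -> object:
        # Load: instantiate on first use, then reuse the same instance.
        if ext_id not in self._loaded:
            _, factory = self._entries[ext_id]
            self._loaded[ext_id] = factory()
        return self._loaded[ext_id]


registry = ExtensionRegistry()
registry.register(ExtensionInfo("example.shell.preview", "1.0", "user"), dict)
registry.register(ExtensionInfo("example.storage.filter", "2.1", "kernel"), list)

print([i.ext_id for i in registry.enumerate()])
print([i.ext_id for i in registry.enumerate(kind="kernel")])
```

Routing everything through one entry point is what makes consistency possible: the system can always answer which extensions exist and which are loaded, which is a precondition for diagnosing failures that involve them.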
Drivers – extensibility example • Windows driver model designed for performance first and extensibility second • Wrong choice for today: 100,000 drivers, 1,000,000 versions • Created driver “frameworks” for Vista • Re-architecting the boundaries • Huge effort on tools for developers • Static driver verifier • Joint MSR and development effort
Software Engineering Research challenges • Large systems are too complex to fully analyze • How to think about full impact of design? • Ways to think about interactions more formally • What should the extension model be? • Component model with cross-component tools • Focus on entire lifecycle • Requirements, specification and architecture, failure analysis • Design/coding/test and verification/maintenance, patching, etc. • Help with education? • Raise awareness of value of correctness, test and verification, etc.
Summary • Huge improvements in capability and reliability in the past decade • Requirements and expectations increased faster than improvements in dependability • System complexity has increased faster than our ability to manage it • Development teams are very good at evolutionary improvements • We need new, end to end, approaches to help entire product lifecycle
The right people are here • Let’s do something about it • Questions?
Security • More than just a technology issue • Worldwide network of hackers spreads the word on vulnerabilities; most attacks take advantage of more than one • Patches get reverse engineered: race to get the patch out before the hack • Attacks are increasingly sophisticated • Most issues are design problems, not simple coding errors • Threat models, design reviews, code reviews, tools, etc. • Tradeoff between usability and security
PC Hardware capabilities drive entire industry • [Chart; time axis: 2002, 2003, 2005, 2007] • Clock speed: 3 GHz, 10 GHz, 20-30 GHz • Storage: 40 / 100 GB, 80 / 200 GB, 213 / 500 GB, 568 GB / 1 TB • Network: 100 Mb/s wired and 11 Mb/s wireless; 100 Mb/s wired and 11 / 54 Mb/s wireless; 1 Gb/s wired and 54 Mb/s wireless