An Overview of Systems and Networking Research at Microsoft Research Michael B. Jones Systems and Networking Research Group, Microsoft Research April 1999
Microsoft Research: A quick primer • Founded in 1991 • Goal: Pursue strategic technologies for Microsoft • Original research groups: • Natural Language Processing • Operating Systems • Programming Languages
Microsoft Research • Over 300 researchers in 27 areas • From Speech, Decision Theory, Graphics, and Databases to Statistical Physics • Research lab locations: • Redmond, San Francisco, Cambridge (UK), Beijing • Internationally recognized research teams • Hundreds of publications, presentations • Leadership roles in professional societies, journals, conferences
Fastest Growing CS Research Organization In The World • Grew by factor of four from ’94 to ’97 • Decided in ’97 to grow by a factor of three in three years • 200 in FY ’97 => 600 in FY ’00, primarily in Redmond • Major impact on Microsoft products • Virtually all MS products shipped today use technology from Microsoft Research
Systems and Networking Research Group • One of the original three research groups at Microsoft Research in Redmond • Formerly called the “Operating Systems Research Group” • Name changed in 1998 to explicitly include networking • Group presently 15 members • Working in four areas
Past Projects • Tiger • Scalable, fault-tolerant multimedia file system using commodity hardware • Rialto • Real-time kernel enabling predictable concurrent execution of independent real-time programs • Both were used in Microsoft's Interactive TV trial in 1996-1997 with NTT in Yokosuka, Japan
Current Research Areas • Networking • Distributed Computing • Operating Systems • Real-Time Systems
Group Members and Current Research Areas • Victor Bahl – Net • Bill Bolosky – OS • Gerald Cermak – Dist. Sys. • Scott Cutshall – OS • Rich Draves – Net • John Douceur – OS • Alessandro (Sandro) Forin – Net • Johannes Helander – OS • Galen Hunt – Dist. Sys. • Mike Jones – Real-Time Sys. • Steve Levi – Dist. Sys. • Venkat Padmanabhan – Net • Marvin Theimer – OS • Yi-Min Wang – Dist. Sys. • Brian Zill – Net
Networking Projects • Location Aware Systems and Services • Hardware Adapter for Light-Weight Mobile Networking • IPv6 • Automatic Network Configuration • High Performance & Sys. Area Networking • DCOM over SAN • TCP Fast Start, Network Performance Improvement • Multicast-based Data Dissemination
Distributed Computing Projects • Millennium • Distributed, Fault-Tolerant Applications • Automatic Application Partitioning • Distributed Java Virtual Machine
Operating Systems Projects • Componentized System Architecture • Single-Instance Store Filesystem • Unobtrusive Background Computation • Transactional Filesystem
Real-Time Systems Projects • Real-Time Scheduling • Real-Time Latency Measurement
Current Projects Grouped by Research Areas
Location Aware Systems and Services • In-building location-aware system • Wireless mobile nodes precisely compute their geographic location • Enable new class of mobile applications • E.g., use nearest printer, etc. • Victor Bahl, Venkat Padmanabhan, Turner Whitted, Josh Broch (CMU)
Hardware Adapter for Light-Weight Mobile Networking • MCoM (Mobile Communicator) Project • Light-weight devices network in both ad-hoc and controlled manner • Investigates protocol and systems issues: • Energy conservation • Multi-hop routing • In presence of link failures, mobility • Victor Bahl, Turner Whitted
IPv6 • Internet Protocol Version 6 (IPv6) implementation for Windows NT • Freely downloadable • Numerous v6 utilities also available • Multi-homing issues • Rich Draves, Brian Zill, ISI (Allison Mankin, etc.) • Published in ’98 USENIX NT
Automatic Network Configuration • Algorithms for auto-configuring IP networks • Address and subnet assignment that optimize the network’s efficiency • Rich Draves, Chris King (Northeastern), Cheenu Venkatachary (WUSTL) • Published in InfoCom ’99
High Performance & System Area Networking • High-performance networking under NT • VIA-like and memory-like interconnects • It’s WinSock! No need to rewrite apps • No loss of performance • Easily extensible (RDMA, registration, …) • Gigabit Ethernet Jumbo Frames • TCP Switch • Layered WSP over SAN vendor’s WSP • Sandro Forin, Johannes Helander, NT • Published at DARPA NT Workshop
Hybrid SAN-TCP/IP Architecture • [Diagram of the protocol stack. Components shown, top to bottom: Winsock App, Winsock, Winsock SPI, Winsock Switch, SAN WS Provider, MsAfd (user/kernel boundary), AFD, TDI App, SAN WS Driver, TDI Switch, TCP/IP, TDI, SAN TDI Provider, NDIS, SAN NDIS MiniPort, SAN MiniPort, SAN NIC]
DCOM Over SAN • Millennium Falcon project • Implement high-performance distributed object systems • For clusters of servers • Connected by SANs • Take full advantage of user-mode nets • Current implementation based on DCOM and VIA • Yi-Min Wang
TCP Fast Start, Network Performance Improvement • Reuse information learned in past • Rather than rediscover it each time • E.g., TCP congestion window • Venkat Padmanabhan, Randy Katz (Berkeley) • Published at Globecom ’98 Internet Mini-Conference
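The idea of reusing learned state rather than rediscovering it can be sketched as a small per-destination cache. This is only an illustration, not the published Fast Start mechanism: the class name `CwndCache`, the default window of one segment, and the staleness timeout are all assumptions made for the example.

```python
import time

INITIAL_CWND = 1  # segments; assumed default window for this sketch


class CwndCache:
    """Remembers the last congestion window seen per destination (hypothetical helper)."""

    def __init__(self, max_age_s=300.0, clock=time.monotonic):
        self.max_age_s = max_age_s  # assumed staleness limit
        self.clock = clock          # injectable so the cache can be tested
        self._entries = {}          # destination -> (cwnd, time recorded)

    def remember(self, dest, cwnd):
        """Record the window in use when a connection to dest goes idle or closes."""
        self._entries[dest] = (cwnd, self.clock())

    def initial_cwnd(self, dest):
        """Return the cached window if it is fresh enough, else the default."""
        entry = self._entries.get(dest)
        if entry is None:
            return INITIAL_CWND
        cwnd, recorded = entry
        if self.clock() - recorded > self.max_age_s:
            # Too old to trust: forget it and start from scratch.
            del self._entries[dest]
            return INITIAL_CWND
        return cwnd
```

A new connection would call `initial_cwnd(dest)` instead of always starting from the default; the age limit reflects the fact that cached network state can go stale.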
Multicast-based Data Dissemination • Quantify potential benefits of multicast for information dissemination • Based on HTTP logs • Evaluate algorithms/heuristics for deciding which data should be multicast • Venkat Padmanabhan
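One way to picture the log-based evaluation is a simple popularity threshold. The sketch below is an assumed heuristic, not the project's algorithm: the log is reduced to hypothetical `(client, url)` pairs, a URL is chosen for multicast once it has at least `min_requests` requests, and the benefit is counted under the simplifying assumption that one multicast transmission replaces every unicast transfer of that URL.

```python
from collections import Counter


def choose_multicast_set(requests, min_requests=3):
    """Pick URLs requested at least min_requests times (assumed heuristic)."""
    counts = Counter(url for _client, url in requests)
    return {url for url, n in counts.items() if n >= min_requests}


def unicast_transfers_saved(requests, multicast_urls):
    """Transfers avoided if each chosen URL is sent once instead of once per request."""
    counts = Counter(url for _client, url in requests)
    return sum(counts[url] - 1 for url in multicast_urls)
```

A real evaluation would also have to account for when requests arrive and how long clients can wait, which this sketch ignores.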
Distributed, Fault-Tolerant Applications • Millennium Project • Unifying vision behind several individual prototype projects • Galen Hunt, Yi-Min Wang, Gerald Cermak, Johannes Helander, Rick Rashid • Initial position paper published at HotOS-VI, 1997
Millennium • Problem • Building distributed, fault tolerant applications is too hard, costs too much • Goal • Raise the level of abstraction provided by the operating system • Individual computers, file systems, networks unimportant to component builders
Millennium: Raise the Level of Abstraction • [Diagram labels: App, Application, Millennium, COM+, NT] • Maintain single system image. • Transparent invocation, migration, and recovery. • Individual computers, file systems, and networks become unimportant to application developers.
Automatic Application Partitioning • Millennium Coign Project • Galen Hunt • Published in OSDI ’99
Coign: Automatic Distributed Partitioning • Converts local COM applications into distributed client-server applications without requiring source code • [Figures: application before and after partitioning]
The Plan: 1. Find Components in Application Binaries 2. Identify Interfaces and Measure Communication 3. Partition and Distribute Components
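Step 3 of the plan can be illustrated with a toy partitioner. The sketch below is not Coign's algorithm: it enumerates every placement by brute force, which only works for a handful of components, and the function name, the `pinned` constraint table, and the byte-count profile are assumptions for the example.

```python
from itertools import product


def best_partition(components, traffic, pinned):
    """Brute-force two-way split that minimizes client/server communication.

    components: list of component names
    traffic:    dict mapping frozenset({a, b}) -> bytes exchanged (measured profile)
    pinned:     dict mapping some names to "client" or "server" (fixed placements)
    """
    free = [c for c in components if c not in pinned]
    best_cost, best_assign = None, None
    for choice in product(("client", "server"), repeat=len(free)):
        assign = dict(pinned)
        assign.update(zip(free, choice))
        # Only traffic between components on different machines crosses the network.
        cost = sum(
            nbytes
            for pair, nbytes in traffic.items()
            if len({assign[c] for c in pair}) == 2
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign
```

For example, with a UI component pinned to the client and a storage component pinned to the server, the function places the remaining components wherever the measured cross-machine traffic is smallest.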
COP: Component Object Proxy • Transparently remote Win32 API calls • Factor Win32 interface • Automatically create DCOM interfaces • Transparently insert proxy objects • Galen Hunt, Gerald Cermak
Millennium Continuum • Provides single system image for Windows API • Automatic object placement and migration at run-time • Language neutral • At least Visual Basic, C, C++, Java • Based on COM+ • Galen Hunt, Gerald Cermak, Rick Rashid
Distributed Java Virtual Machine • Millennium Borg project • Makes multiple JVMs appear to be one • Unmodified Java programs may run as distributed applications • Transparent distribution, migration • Johannes Helander
Componentized System Architecture • MMLite Project • Kernel object architecture stressing adaptability, minimalism, reusability • Many normally “built-in” components selectable, loadable • E.g., Virtual Memory, IPC • Johannes Helander, Sandro Forin • Published at ’98 SigOps European Workshop
Single-Instance Store Filesystem • Enables single on-disk instance of files with multiple logical copies • Sharing transparent to applications • Replicas found in background, coalesced • Bill Bolosky, Scott Cutshall, John Douceur, NT filesystem group • Planned to ship with Windows 2000
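The single-instance idea can be sketched in memory as follows. This is an illustration under stated assumptions, not the NT implementation: identical contents are detected with a SHA-256 hash, and duplicates are coalesced at write time for brevity, whereas the slide describes replicas being found and coalesced in the background. The class and method names are hypothetical.

```python
import hashlib


class SingleInstanceStore:
    """Many logical files, one stored copy per distinct content (toy model)."""

    def __init__(self):
        self._blobs = {}  # content hash -> [data, reference count]
        self._files = {}  # logical file name -> content hash

    def write(self, name, data):
        self.delete(name)  # drop any previous content for this name
        key = hashlib.sha256(data).hexdigest()
        blob = self._blobs.setdefault(key, [data, 0])
        blob[1] += 1
        self._files[name] = key

    def read(self, name):
        # Callers see an ordinary file; the sharing is invisible to them.
        return self._blobs[self._files[name]][0]

    def delete(self, name):
        key = self._files.pop(name, None)
        if key is None:
            return
        blob = self._blobs[key]
        blob[1] -= 1
        if blob[1] == 0:
            del self._blobs[key]  # last logical copy gone: free the instance

    def stored_instances(self):
        return len(self._blobs)
```

Writing to one logical copy gives it its own content without disturbing the others, which is what keeps the sharing transparent to applications.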
Unobtrusive Background Computation • “How to be Really Nice” • Background processes that don’t interfere with foreground work • Even if neither CPU-bound • Based on progress metrics • Back off when statistically significant slowdown observed • John Douceur, Bill Bolosky
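The back-off behavior can be sketched as a small regulator that a background task consults between units of work. This is a simplified stand-in: where the slide calls for a statistically significant slowdown, the sketch uses a fixed tolerance against a known baseline rate, and the class name, doubling policy, and default limits are assumptions.

```python
class BackoffRegulator:
    """Decides how long a background task should pause (hypothetical helper)."""

    def __init__(self, baseline_rate, tolerance=0.8, min_sleep=0.1, max_sleep=60.0):
        self.baseline_rate = baseline_rate  # progress per second when uncontended
        self.tolerance = tolerance          # assumed fixed threshold, not a statistical test
        self.min_sleep = min_sleep
        self.max_sleep = max_sleep
        self.sleep_s = 0.0

    def next_sleep(self, work_done, elapsed_s):
        """Return seconds to pause, given progress made over elapsed_s (> 0)."""
        rate = work_done / elapsed_s
        if rate < self.tolerance * self.baseline_rate:
            # Progress is slower than expected, so something else wants the
            # resource: back off, doubling the pause each time.
            self.sleep_s = min(self.max_sleep, max(self.min_sleep, self.sleep_s * 2))
        else:
            self.sleep_s = 0.0
        return self.sleep_s
```

Because the signal is the task's own progress rather than CPU load, the same loop reacts to contention for disk or other resources, which matches the "even if neither CPU-bound" point above.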
Transactional Filesystem • Research version of NTFS with transactional semantics • Marvin Theimer
Real-Time Scheduling • Scheduling abstractions enabling predictable concurrent execution of independent real-time programs • Mike Jones, John Regehr (Virginia), formerly Daniela Roşu (GA Tech), Marcel Roşu (GA Tech), George Candea (MIT) • Published in ’96 SigOps, ’97 SOSP, ’98 & ’99 USENIX Windows NT
Real-Time Latency Measurement • Understand, fix sources of long thread scheduling latencies in NT • Mike Jones, John Regehr (Virginia) • Published in ’98 NOSSDAV & ’99 HotOS
Problem: “Unimportant” Background Work • DEC dc21x4 PCI Fast 10/100 Ethernet • 6ms periodic DPC every 5s • Autosense processing • Most of 6ms in five 0.88ms calls to routine that reads device register that: • Writes a HW register – 1.5µs • Stalls for 5µs • Writes HW register again – 1.5µs • Stalls for 5µs • Reads a HW register – 1.5µs • Stalls for 5µs • And does this 16 times! (once per bit)
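The figures on this slide can be put side by side with a little arithmetic. The sketch below uses only the numbers quoted above; the helper names are made up for the example, and the per-step sum covers only the steps the slide lists, so it need not equal the measured 0.88 ms per call.

```python
DPC_MS = 6.0       # DPC duration from the slide
PERIOD_S = 5.0     # DPC period from the slide
CALLS = 5          # calls to the register-reading routine per DPC
CALL_MS = 0.88     # measured duration of each call
STEP_US = [1.5, 5.0, 1.5, 5.0, 1.5, 5.0]  # write, stall, write, stall, read, stall
BITS = 16          # the sequence repeats once per bit


def cpu_percent(dpc_ms, period_s):
    """Share of CPU time the periodic DPC consumes."""
    return 100.0 * (dpc_ms / 1000.0) / period_s


def listed_step_time_us(step_us, repeats):
    """Time accounted for by the listed register operations in one call."""
    return sum(step_us) * repeats


print(f"CPU share: {cpu_percent(DPC_MS, PERIOD_S):.2f}%")
print(f"Time in register routine per DPC: {CALLS * CALL_MS:.1f} ms")
print(f"Listed steps per call: {listed_step_time_us(STEP_US, BITS):.0f} us")
```

The CPU share is tiny, which is why the work looks unimportant; the problem for real-time threads is the 6 ms during which the DPC cannot be preempted by them.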
Another Long DPC: Intel EE 16 • Intel EtherExpress 16 ISA Ethernet • 17ms DPC every 10s • Card reset for no received packets
Amusing Observation • Unplugging Ethernet makes latency worse! • Despite conventional wisdom to the contrary
Even Worse: Video Cards • Video cards and drivers conspire to hog the PCI bus • Dragging large window locks out interrupts for up to 30ms • Obliterates sound I/O, for instance • Can set registry key to ask drivers to behave, but not default • No problem when set correctly • Manufacturers’ motivation: WinBench ~ 5% improvement
Video Card Misbehavior Details • Don’t check if card FIFO full before write • Eliminates one PCI read • Stalls PCI bus if full to prevent overflow • Uses “PCI disconnect” feature
For More Information • Systems and Networking Research Group web pages: • http://research.microsoft.com/sn/