Windows NT 4.0 Setup and Debugging Joseph West Sr Technology Specialist
Agenda • Setup (build overview) • Three phases of Setup • Character-Based Setup • Boot from Character-Based to GUI-Based Setup • GUI-Based Setup • Troubleshooting • (Blue Screens & Stop Codes) • Latest information for NT 4.0 • SP4
Hardware Compatibility List • How important is it • Support parameters http://www.microsoft.com/hwtest/ http://support.microsoft.com/
Character-Based Setup Gathering of System Architecture Information • CPU Type • Motherboard Architecture • Hard Drive Controllers • File Systems • Disk Free Space • Memory
Info Gathered is Required for Basic System Initialization • ‘Failure to Detect’ will lead to failure of Setup • Unsupported components and enhancements • PCI 2.1 • Special Bus Drivers • Caching Chips for Burst Mode
Boot from Character-Based to GUI-Based Setup • Windows NT Kernel is loaded completely for the first time • Finds a valid Hard Drive • Polls Adapters and tests Bus • Most likely point of failure • Drivers are loaded into Memory and Multi-threading is initialized
GUI-Based Setup • Install secondary Drivers • Create Accounts • Machine and Administrator • Configure Network Settings • Build final System Tree and Registry
Troubleshooting Character-Based Setup NTHQ Tool • Located in Support Directory • Purpose is to show all hardware peripheral settings • Works with PCI, PnP and Legacy peripherals
Troubleshooting Character-Based Setup NTHQ Demo
Troubleshooting Character-Based Setup Unsupported Controller and BIOS Enhancements • 32-bit I/O • Enhanced Drive Access • Multiple Block Access or Rapid IDE • Power Management Features
Troubleshooting Character-Based Setup Setup Hangs During Initial Boot • Disable CD-Boot capability before installing • Needs to be done at both the Controller and BIOS levels
Troubleshooting Character-Based Setup Setup Cannot Find Hard Drive • Scan System for Viruses • Make certain there is a valid Boot Sector on the Hard Drive
Troubleshooting Character-Based Setup Setup Cannot Find Hard Drive • If Hard Drive Controller is SCSI • Are devices properly terminated • Is SCSI BIOS enabled - first Controller (if at all) • On secondary Controllers, make certain BIOS is disabled • Partition and format using current Controller
Troubleshooting Character-Based Setup Setup Cannot Find Hard Drive • If Hard Drive Controller is IDE or EIDE • Make certain drive is on primary Controller Channel • Make certain drive is jumpered correctly • (i.e., Master, Slave, or Independent)
Troubleshooting Character-Based Setup Setup Does Not Detect Hard Drive Controller Correctly • Manually select Controller type • Make certain that an NT 4.0 driver is being loaded • Use NTHQ Tool to check for correct IRQ and Memory addressing
Troubleshooting Character-Based Setup Setup Cannot Find a Valid Partition • If Windows 95 is on the system, back up and Fdisk the Hard Drive (no support for FAT32) • Recreate Partitions and Format with DOS 6.22 • Restore Windows 95 and proceed with Windows NT installation • Make certain that correct HAL is being loaded
Troubleshooting Failure to Reboot From Character-Based to GUI-Based Setup Stop Messages • Record Hex Value, 0x1e, 0x7b, etc. • Record Values in parentheses • Record component where failure occurred • Note where in Boot Process error occurred • Call PSS (installation support)
Troubleshooting Failure to Reboot Stop Messages Which can be Solved in the Field • 0x7b, (0x4,0,0,0), or 0x8b • Indicates problem with Master Boot Record • Scan for Viruses • Confirm correct Controller driver is loaded • Refresh Master Boot Record
Troubleshooting Failure to Reboot After Reboot, Video Remains “Black” • Check for devices using IRQs 2, 9 or 12 (PCI) • Scan Hard Drive for Viruses
Troubleshooting Failure to Reboot Stop Messages Which can be Solved in the Field • 0x1e or 0xa • Disable any Third-party services or drivers which were loaded prior to Upgrade • Use NTHQ to confirm appropriate Memory and IRQ settings
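The field-solvable cases on the two "Stop Messages Which can be Solved in the Field" slides can be gathered into a small lookup. This is only a sketch: the names `FIELD_ACTIONS` and `field_actions` are hypothetical, the codes and steps are copied from the slides, and falling back to the PSS call for every other code is an assumption based on the earlier "Call PSS" bullet.

```python
# Hypothetical helper (not part of any NT tool): maps the Stop codes the
# slides call field-solvable to the steps the slides list for them.
MBR_STEPS = [
    "Scan for viruses",
    "Confirm correct controller driver is loaded",
    "Refresh Master Boot Record",
]
DRIVER_STEPS = [
    "Disable third-party services or drivers loaded prior to upgrade",
    "Use NTHQ to confirm memory and IRQ settings",
]
FIELD_ACTIONS = {
    0x7B: MBR_STEPS,
    0x8B: MBR_STEPS,
    0x1E: DRIVER_STEPS,
    0x0A: DRIVER_STEPS,
}

def field_actions(stop_code):
    """Return the slides' field steps, or the PSS escalation (assumed default)."""
    return FIELD_ACTIONS.get(stop_code, ["Call PSS (installation support)"])
```

For example, `field_actions(0x7B)` returns the three Master Boot Record steps, while a code that is not on these slides, such as `0x50`, returns only the PSS escalation.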
Troubleshooting GUI-Based Setup Issues Setup Will Not Read From CD-ROM Drive • Make certain CD is on HCL • Copy I386 directory to the Hard Drive and start again from the beginning • Make certain that the Controller and/or Hard Drive is correctly configured
Troubleshooting GUI-Based Setup Issues If Setup Fails During Copy of Files to Hard Drive • Disable all external Caches in BIOS • Make certain Hard Drives are terminated correctly; Active Preferred
Setup Enhancements in Windows NT 4.0 Bootable CD-ROM • Supports only El Torito Specification • Can only be used in ‘No Emulation Mode’ • Must be supported by both System and SCSI BIOS
Setup Enhancements in Windows NT 4.0 Winnt Character-Based Setup Logging • Using Winnt or Winnt32 /L: • Logs all actions during character-based setup to find last successful action • Helps to isolate where setup halted without requiring special DLLs
Setup Enhancements in Windows NT 4.0 Restartable GUI-Based Setup • If the machine fails during GUI-mode Setup, the problem can be fixed and Setup will continue after the reboot
Agenda • Setup (build overview) • Three phases of Setup • Character-Based Setup • Boot from Character-Based to GUI-Based Setup • GUI-Based Setup • Troubleshooting • (Blue Screens & Stop Codes) • Latest information for NT 4.0 • SP4
Debugging (the connection) • Connect • Modem, Null-modem cable, LAN • Boot.ini • /Debug /Debugport=com1 /Baudrate=19200 • Symbols • Retail NT CD (in the) support\debug\[platform]\symbols sub-directory
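As a rough illustration of the Boot.ini step, the sketch below appends the three debug switches from the slide to one operating-system entry line. The function name `add_debug_switches` and the sample ARC path and menu text are assumptions made for the example; only the switches themselves come from the slide.

```python
def add_debug_switches(entry, port="com1", baud=19200):
    """Append the slide's kernel-debug switches to one boot.ini entry line.

    Switches already present on the line are not added twice.
    """
    switches = ["/Debug", f"/Debugport={port}", f"/Baudrate={baud}"]
    # Switches follow the closing quote of the menu text, if there is one.
    tail = entry.rsplit('"', 1)[-1]
    existing = {tok.split("=")[0].lower() for tok in tail.split()}
    missing = [s for s in switches if s.split("=")[0].lower() not in existing]
    return " ".join([entry.rstrip()] + missing)

# Illustrative entry only; the ARC path and menu text are assumed.
entry = 'multi(0)disk(0)rdisk(0)partition(1)\\WINNT="Windows NT Server Version 4.00"'
debug_entry = add_debug_switches(entry)
```

Running the function a second time on `debug_entry` leaves the line unchanged, so it is safe to apply to a file that may already carry the switches.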
Interpreting Blue Screens • The error code and parameters at the top of the screen • The list of modules that have successfully loaded and initialized in the middle of the screen • The list of modules that are currently on the stack at the bottom of the screen
Stop Codes Note: For a complete listing of stop codes, see the Windows NT Workstation 4.0 Resource Kit, Chapter 39, “Windows NT Debugger”, or article Q142657 on http://support.microsoft.com
Common Stop Codes • 0xA • 0x1E • 0x24 • 0x3F • 0x50 • 0x7B • 0x7F • 0xC000021A
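The codes in this list can be kept as a small lookup table for quick reference. The sketch below is hypothetical (the names `STOP_NAMES` and `stop_name` are not part of any NT tool); the symbolic names are the ones given on the per-code slides that follow.

```python
# Symbolic names as given on the per-code slides of this deck.
STOP_NAMES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000001E: "KMODE_EXCEPTION_NOT_HANDLED",
    0x00000024: "NTFS_FILE_SYSTEM",
    0x0000003F: "NO_MORE_SYSTEM_PTES",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007B: "INACCESSIBLE_BOOT_DEVICE",
    0x0000007F: "UNEXPECTED_KERNEL_MODE_TRAP",
    0xC000021A: "FATAL_SYSTEM_ERROR",
}

def stop_name(code_text):
    """Translate a hex string such as '0x7b' into its symbolic name."""
    code = int(code_text, 16)  # accepts short forms like '0xA' as well
    return STOP_NAMES.get(code, "unknown (see the Resource Kit listing)")
```

Because the lookup converts the text to an integer first, the short forms used on the slides (`0xA`, `0x7b`) and the full eight-digit forms resolve to the same entry.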
0xA • 0x0000000A IRQL_NOT_LESS_OR_EQUAL • Description • An attempt was made to touch paged-out memory at a process interrupt request level (IRQL) that is too high. Code that runs at higher interrupt levels can’t touch paged-out memory because paging would be too expensive. If a pageable page is not committed but its virtual address range is still in the translation buffer, high-IRQL code can get away with touching it. But if the system is stressed, the memory manager will likely have paged that page out, and the bugcheck will occur when an in-page is attempted. This is why certain bugs tend not to show up on developers’ machines, which are less stressed than production systems. • Typical Scenarios • System configuration changes, virus scanners, other file I/O filters.
0x1E • 0x0000001E KMODE_EXCEPTION_NOT_HANDLED • Description • Essentially, this bugcheck identifies an error that occurred in a section of code where no error detection routines were in place. Most exceptions are generated directly in the section of code that is executing. In this case, the error was not trapped in the middle of the code that was executing. Therefore, the error was allowed to fall through to this default error handler. This makes the error a very common exception. The actual instruction fault is usually similar to a STOP 0xA, that is, a memory access violation. • Typical Scenarios • Invalid or obsolete third-party driver or system service, Microsoft driver or system service bug, file I/O filter drivers.
0x24 • 0x00000024 NTFS_FILE_SYSTEM • Description • A STOP 0x24 is the result of NTFS code that detects a problem with the structure of the NTFS file system. This is not a cut and dried exception code and debugging it is sometimes difficult. Disk corruption can generate a STOP 0x23 (FAT_FILE_SYSTEM) and 0x24. However, any processes involved in reading or writing data from a FAT or NTFS file system could cause the disk data to appear corrupted. Therefore SCSI and IDE drivers as well as the disk structure itself (hard errors, i.e. bad blocks) can be suspect. The file system calls this bug check in multiple places and this will help us identify the actual source line that generated the bug check. Also, this bugcheck can be caused by I/O filter drivers (resource hangs, race conditions, etc.). After the above is eliminated, more low-level constructs such as file system synchronization objects, SCB attributes, etc. need to be examined by the debug engineer. • Typical Scenarios • This bugcheck is encountered when the NTFS file system has a corruption, or the hard drive has a bad block.
0x3F • 0x0000003F NO_MORE_SYSTEM_PTES • Description • This stop isn’t as common as most of the others in this section, but a good explanation is warranted. A STOP 0x3F is the result of a system doing lots of I/O, thereby fragmenting the system PTEs. The bugcheck occurs not because the system is out of PTEs, but because a driver requests a huge chunk of memory that can’t be satisfied because a contiguous block that big isn’t available. • Typical Scenarios • Often video drivers will allocate large amounts of kernel memory that must succeed. Also, some backup programs do the same. • For these situations, consult a PSS engineer for the Registry hack that allows the increase of total system PTEs.
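The distinction between "out of PTEs" and "no contiguous block big enough" can be shown with a toy model. This is not how the NT memory manager is implemented; the list of booleans and the names `largest_free_run` and `can_allocate` are assumptions made purely for illustration.

```python
def largest_free_run(ptes):
    """Length of the longest run of free entries (False = free, True = in use)."""
    best = run = 0
    for in_use in ptes:
        run = 0 if in_use else run + 1
        best = max(best, run)
    return best

def can_allocate(ptes, count):
    """A contiguous request succeeds only if a single free run is large enough."""
    return largest_free_run(ptes) >= count

# Toy PTE array after heavy I/O: every other entry is still in use.
fragmented = [i % 2 == 0 for i in range(16)]
free_total = fragmented.count(False)  # half of the entries are free
```

In this model half of the entries are free, yet a request for four contiguous entries fails because no free run is longer than one entry, which mirrors the situation the slide describes.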
0x50 • 0x00000050 PAGE_FAULT_IN_NONPAGED_AREA • Description • A STOP 0x50 is caused when a memory region that is not supposed to be paged out (usually for performance reasons) is paged out. This stop can be caused by a variety of problems including corrupt NTFS volumes, bad network packet data, and in general kernel mode drivers that corrupt memory. Another cause is a driver that frees an MDL but doesn’t communicate this to all portions of the driver. Others include Disk, Controller, and Disk Driver problems. • Typical Scenarios • Usually third-party kernel mode drivers munging memory, or reading beyond allowable memory. Also, when the file system is pushed to the tested limits (large Mac volumes), bugs in NTFS are exposed that result in this STOP. This STOP can occur due to interaction problems between SCSI Controller firmware and Hard Drive firmware.
0x7B • 0x0000007B INACCESSIBLE_BOOT_DEVICE • Description • During the initialization of the I/O system, the driver for the boot device may have failed to initialize the device that the system is attempting to boot from, or the file system that is supposed to read that device may have either failed its initialization or simply not recognized the data on the boot device as a file system structure. • If this is the initial setup of the system, this error may have occurred because the system was installed on an unsupported Hard Disk or SCSI Controller. • This error can also be caused by the installation of a new SCSI Adapter or Hard Disk Controller or by repartitioning the Hard Disk with the System Partition. • Typical Scenarios • VIRUS • LBA type problems, MBR type problems, SCSI Controller/Hard Drive geometry issues, etc.
0x7F • 0x0000007F UNEXPECTED_KERNEL_MODE_TRAP • Description • This error means a trap occurred in kernel mode, either a kind of trap that the kernel is not allowed to have or catch (a bound trap), or a kind of trap that is always instant death (double fault). • Typical Scenarios • Hardware, kernel mode drivers that manipulate critical system data in an untimely fashion. • This STOP most often is the result of the processor taking a double fault: 0x7f (8,0,0,0). Note that these parameters can also show up for a modern software issue involving Netmon (bhnt.sys).
0xC000021A • 0xC000021A FATAL_SYSTEM_ERROR • Description • This is a typical description that accompanies this error: The Windows Subsystem System process terminated unexpectedly with a status of (0x6130F2B6 0x01B6FBA4). The system has been shut down. • The failing process sometimes is listed in the blue screen itself. • This bugcheck occurs when a user-mode subsystem such as Winlogon or CSRSS is fatally compromised such that security cannot be guaranteed. The Operating System makes a transition into kernel mode and throws this exception. • Typical Scenarios • A typical cause of this crash would be an extensible perfmon counter that overwrites its Winlogon shared data buffer (Q171033), and in general any access violation that compromises a user-mode subsystem.
Agenda • Setup (build overview) • Hardware Compatibility List • Three Phases of Setup • Character-Based Setup • Boot from Character-Based to GUI-Based Setup • GUI-Based Setup • Troubleshooting • (Blue Screens & Stop Codes) • Latest Information for NT 4.0 • SP4
A Day in the Life Video
NT4 Service Pack 4 • Contents • Hotfixes for important customer-reported problems • Resource and memory leak bugfixes from NT5 • 30+ support, diagnostic and repair tools from the NT Resource Kit are included on the SP4 CDROM • Event log entries for clean and dirty shutdown • Process Improvements • Dedicated Service Pack test team • Beta Program for Service Packs • Improving the Knowledge Base, depth and ease of use • Slipstreaming Service Packs into OEM releases
Resource / Memory Leaks • Problem • Leaks lead to hung systems and bluescreen crashes • Some customers do “preventive reboots” • Difficult to stop or kill the offending process • Solutions • Fix leaks: several hundred in NT5, key fixes in NT4 SP4 • Job objects in NT5, set memory limits on a collection of processes • Visual Studio adding leak checking to MFC and CRT • Next Work Items • Better leak detection • Logging in under low resource conditions • Stopping and killing processes
Bugchecks (Blue Screens) • Kernel mode code detected a serious error • Blue screens are still frequent and very hard to diagnose • Crash dumps take too long on large memory systems • Prevention • Find and fix bugs in our code • Review all calls to KeBugCheck by NT5 RTM • Improve diagnosis • Reduced clutter on the blue screen, focus on key data, and add hints • Crash dumps are now dramatically faster in NT5 • Developing comprehensive crashdump analysis tools for NT4 and NT5
Bugchecks (Blue Screens) • Stop 0x0000001E (0xC0000005, 0xFDE38AF9, 0x00000001, 0x7E8B0EB4) KMODE_EXCEPTION_NOT_HANDLED • Address <x> has base at <x> - <filename> <manufacturer> <version> • If this is the first time you've seen this Stop error screen, restart your computer. If this screen appears again, follow these steps: • Check to make sure any new hardware or software is properly installed. If this is a new installation, ask your hardware or software manufacturer for any Windows NT updates you might need. • If problems continue, disable or remove any newly installed hardware or software. Disable BIOS memory options such as caching or shadowing. If you need to use Safe Mode to remove or disable components, restart your computer, press F8 to select Advanced Startup Options, and then select Safe Mode. • Refer to your Getting Started manual for more information on troubleshooting Stop errors.
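When recording a Stop message as advised earlier (hex value, values in parentheses, symbolic name), the first line of the screen can be split mechanically. The parser below is a minimal sketch that assumes the line layout shown on this slide; `STOP_RE` and `parse_stop_line` are hypothetical names, not part of any Microsoft tool.

```python
import re

# Matches "Stop <code> ( p1, p2, p3, p4 ) NAME"; colon and name are optional.
STOP_RE = re.compile(
    r"Stop:?\s+(0x[0-9A-Fa-f]+)\s*\(\s*([^)]*?)\s*\)\s*(\w+)?",
    re.IGNORECASE,
)

def parse_stop_line(line):
    """Return the Stop code, its parameters and symbolic name, or None."""
    m = STOP_RE.search(line)
    if m is None:
        return None
    params = [int(p.strip(), 16) for p in m.group(2).split(",") if p.strip()]
    return {"code": int(m.group(1), 16), "params": params, "name": m.group(3)}

sample = ("Stop 0x0000001E ( 0xC0000005, 0xFDE38AF9, 0x00000001, 0x7E8B0EB4 ) "
          "KMODE_EXCEPTION_NOT_HANDLED")
report = parse_stop_line(sample)
```

For the sample line, `report["code"]` is `0x1E` and the first parameter is `0xC0000005`, the NT status code for an access violation, which fits the earlier remark that a 0x1E is usually a memory access violation.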
3rd Party Drivers • Problem • One of the most common complaints from PSS • Source of pool corruption - difficult to diagnose • Solution • DDK driver samples and documentation are improved in NT5 • Enhanced driver testing in NT4 and NT5, including pool corruption tests • NT5 will have driver signing, “warning” level by default • WDM drivers will drive higher quality • We are testing major third-party anti-virus software regularly
Unnecessary Reboots in NT5 • Problem • Hardware and software configuration and maintenance • Solutions • Fixed 50 software configuration cases which required a reboot in NT4. Key fixes include: • Adding, removing and configuring network protocols; changing IP addresses • Reconfiguring settings on PCI and other PnP hardware • Reboots still required for some rare cases • Machine name change, domain membership changes, system locale and system font changes, service pack installation • Hardware reconfiguration by clustering solutions in NTS/E • Where possible, hotfixes will avoid requiring a reboot
Diagnosis and Recovery • Recovery Involves • Detection (hard with a hung application or server) • Diagnosis (need good tools, need parallel installs, bad error messages) • System Recovery (chkdsk, crash dump biggest time hits) • Application recovery (SQL, Exchange Store, etc) • We are delivering • 30+ of the most critical support, diagnostic, and repair tools in SP4 and NT5 B2 • Fixing 35 worst error messages by B2+30, then next 200 as time allows • NT5 Safe-mode Boot today and Floppy Boot by NT5 RTM • Both support NTFS • Web-based trouble-shooter for most common bluescreens • Online chkdsk post NT5
NT Test Initiatives • Long duration Server stress • 10 Servers running stress for a month+ starting at NT5 Beta 2 • Mix of stress including BackOffice, IIS, Client/Server, etc • Specifically watching for memory and resource leaks • Improved driver testing for NT4 and NT5 • Catch pool corruption • Fault injection • Better integration testing of Server applications • BackOffice applications: Exchange, SQL Server • Using automated scripts from BackOffice teams • Testing with Oracle, SAP R/3, Lotus Notes • 100 Top Server Applications from Tier 1 RDP customers • Expanded tests for customer configurations • RDP Customer configurations, ISP