SCO Unix Diagnostics and Troubleshooting
Alexander Sack (alexs@sco.com)
Senior Software Engineer

Agenda
- Intro
- Initial System Load (ISL)
- Common Hardware and Driver Issues
- System Tuning
- Networking Tips
- Reporting Problems
- Q & A

ISL: Overview
Before installing…
- Has the system itself been certified by the OEM?
- Is the motherboard in the CHWP? (Intel whitebox)
- Is it compatible kinda sorta maybe?
- Do I need a third-party HBA diskette?
- Is the network card supported?
- Does X support my graphics chipset?
- Disk layout issues, multi-boot?

ISL: Debugging
- "Alt-SysReq-H" or "Alt-Ctrl-H" to enter console mode
- "Alt-SysReq-F1" or "Alt-Ctrl-F1" to go back to the install screens
- Access to resmgr and the ISL scripts (/isl/ui_modules); note any console messages during install
- IVAR_DEBUG_ALL=1
  - Dumps log files in /tmp/log
- Transfer logs to floppy via cpio
  - Write, e.g.: find /tmp/log/* | cpio -oc -O /dev/dsk/f03ht
  - Read back: cpio -ic -I /dev/dsk/f03ht

ISL: Issues
Problem: Installation sees more processors than actually present
Reasons:
- Bad MPS tables
- Cores listed as physical CPUs in BIOS
- Limited ACPI support (OSR5 only)
Solution:
- Boot in single processor mode (ATUP) and apply the latest MP/SMP pack
- ACPI=Y, USE_XAPIC=Y, ENABLE_JT=Y, MULTICORE=N
- Flash BIOS

ISL: Issues
Problem: Kernel hangs on boot-up
Reasons:
- Missing interrupts
- Mixed stepping processors
Solution:
- Boot in single processor mode (ATUP)
- Swap mixed stepping processors so that the LOWER stepping processor is in slot 1
- Check BIOS settings, ACPI vs. MPS
- Move add-on PCI card to a different slot
- Set PnP to OFF in BIOS

ISL: Issues
Problem: Cannot load an HBA from USB floppy
Reasons:
- BIOS does not support legacy mode (OSR5 only)
- "Device enumeration timeout"
- USB is disabled in the BIOS
- ISL CD left in tray
Solution:
- Check USB BIOS settings
- Re-plug USB floppy device, verify sdiconfig output on console
- Follow TA article on renaming disk nodes
- Remove CD before load
- Make sure the disk was created correctly: dd the image to p0, not s0
- Try a different USB floppy device

ISL: Issues
Problem: Root HBA not found after the DCU runs
Reasons:
- Didn't load the right third-party HBA
- Software based RAID issues
- Valid media kit USB floppy wasn't really picked up (ISL will use CD1 for HBA drivers from an ATAPI drive)
Solution:
- Disconnect USB floppy after HBA loads
- Bind third-party resmgr entry to HBA driver manually via DCU
- Check resmgr entry BOARDID and verify that the HBA driver really supports the card
- Download a later driver from the IHV website

ISL: Issues
Problem: SATA or IDE hangs after loading or fails to recognize my devices
Reasons:
- Missed interrupts (polling messages)
- DMA incompatibility
- Driver in slave only configuration (OSR6/UW7)
- SATA/PATA card uses custom third-party driver (e.g. Adaptec, Silicon Image, Marvell)
Solution:
- Check cables and jumpers
- Change mode in BIOS: Legacy, Compatible, Enhanced, AHCI
- ATAPI_DMA_DISABLE=Y
- Avoid cable select (legacy PATA)

ISL: Issues
Problem: Red screen during mount of CD
Reasons:
- Missed interrupts (polling messages)
- DMA incompatibility
- Driver in slave only configuration (OSR6/UW7)
- SATA/PATA card uses custom third-party driver (e.g. Adaptec, Silicon Image, Marvell)
Solution:
- Check cables and jumpers
- Change mode in BIOS: Legacy, Compatible, Enhanced, AHCI
- ATAPI_DMA_DISABLE=Y
- Avoid cable select (legacy PATA)

ISL: Issues
Problem: NIC is not auto-detected
Reasons:
- Driver on ISL media is older than the card
- Driver issues with the card: driver loads but fails
Solution:
- Defer networking and pkgadd the drivers after install
- After install, use SCOadmin Network to configure the card
- If the card is within the same family, bind its entry to that NIC driver via DCU
- Stick in another card!

ISL: Issues
Problem: vfs_mountroot() failure
Reasons:
- Driver on ISL media is older than the card
- Driver issues with the card: driver loads but fails
- "$static" not added to ROOT HBA sdevice file
Solution:
- Follow TA to mount disk from ISL
- Use the RECUT media
- Make sure you are using the latest HBA driver

ISL: Issues
Problem: Screen goes blank after logo appears
Reasons:
- VESA mode is not supported by the card
- On-board chipset uses system memory for framebuffer
Solution:
- AGP GART is now supported; install the latest maintenance pack
- USE_VESA_BIOS=Y
- Use a supported graphics chipset!

ISL: Issues
Problem: Filesystem is left dirty after ISL and every reboot
Reasons:
- Aggressive BIOS Power Management
- RAID battery failure
- Target issues: CHECK CONDITIONS
- Older driver and the write cache
Solution:
- Check RAID battery levels
- Check HBA and target firmware revision
- Update to latest driver

ISL: Issues
Problem: Installed one OS and another one won't boot
Reasons:
- OSR5 8GB limit
- UW7/OSR6 128GB limit
- OSR5 on the first partition of a drive is recommended
- MBR rewritten
Solution:
- Boot from CD1 and run the UW7/OSR6 fdisk to rewrite the MBR
- Use a third-party boot loader like GRUB

ISL: Issues
Problem: Failing to create large logical volumes
Reasons:
- VXFS technical 2TB limit
- OSR6/UW7 1TB physical capacity limit
- HTFS has issues with greater than 1TB filesystems (slow)
- RAID utility issues
Solution:
- Use VXFS and ODM
- Split volumes in 1TB chunks
- Use RAID BIOS or OEM utility if possible to always set up volumes

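The "split volumes in 1TB chunks" advice is simple arithmetic, and a small planning helper can make it concrete before opening the RAID utility. This is only a sketch: the function name split_volumes is hypothetical, sizes are in GB, and 1TB is assumed to mean 1024GB. If the limit on a given release is strictly below 1TB, pass a smaller max_gb.

```python
# Sketch: plan logical volume sizes so that none exceeds a per-volume cap.
# Assumption: sizes are in GB and "1TB" means 1024 GB; adjust max_gb if the
# real limit on your release is lower.

def split_volumes(total_gb, max_gb=1024):
    """Return a list of volume sizes (GB) that add up to total_gb."""
    if total_gb <= 0 or max_gb <= 0:
        raise ValueError("sizes must be positive")
    full, rest = divmod(total_gb, max_gb)
    # 'full' volumes at the cap, plus one smaller volume for the remainder.
    return [max_gb] * full + ([rest] if rest else [])

# A 2.5TB array becomes two full volumes and one half-size volume.
print(split_volumes(2560))
```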
ISL: Issues
Problem: ISL load time is very slow
Reasons:
- ATAPI DMA is disabled
- Write caching is disabled
- Media errors
- Faulty hardware
Solution:
- Check IDE/SATA settings
- Some OEMs disable write caching, which makes the install slow (future boot parameter)
- Check hardware and BIOS settings

ISL: Issues
Problem: Kernel link failure at end of ISL
Reasons:
- IRQ conflicts in System driver file
- Driver configuration build error
Solution:
- Check BIOS settings
- Disable serial or legacy devices you don't need
- Chroot into the fresh install and check build files
- Update HBA drivers if available

ISL: Issues
Problem: Kernel panics on boot-up
Reasons:
- Full moon out
- You weren't nice to the machine that day
- The customer is out to get you
Solution:
- Boot in single processor mode
- Disable USB via boot parameter or BIOS
- Take note, if possible, of the stack trace to discern the error
- Cry to the OEM
- Cry to SCO support

Hardware and Driver Issues: Disk migration
Migrating OSR5 disk to OSR6
- Install wd supplement before migration!
- Administer the disk at the source system FIRST before migration
- OSR6 divvy now works on OSR5 (wd) and OSR6 disks
- Limitations:
  - There is no conversion for UW VTOC disks to dual format OSR6
  - OSR6 does not support extended VTOC slices
- Always back up your data before migration!

Hardware and Driver Issues: Multi-core
- All Intel based processors are multi-core!
- ACPI is required to fully support multi-core (OSR6/UW7)
- OSR5 supports multi-core provided the MPS tables are sane; it has some ACPI support (HT)
- OEMs have stopped testing MPS tables!
- SCO licenses per CPU package, not per core (industry standard)
- Mixed stepping headaches

Hardware and Driver Issues: HBAs
What driver to use?
- If in doubt, always use the driver diskette with the higher IHVVERSION in it!
- Supported cards can be found in the Drvmap files of the HBA driver/btld package
  - http://pciids.sourceforge.net/
  - Sometimes adding an OEM branded BOARDID will work; sometimes it will panic your system!
  - "echo pcilong | ndcfg"
- Management utilities are packaged with the driver if available
- Recut media and maintenance packs include the latest drivers
- Read the README posted on the SCO download area!

System Tuning: General
- Migrating from OSR5 to OSR6
  - DO NOT BLINDLY import OSR5 tunables into OSR6
  - E.g. buffer cache has a different use on OSR6
- Identify the performance problem you are trying to solve first! [ GOLDEN RULE ]
- Take measurements
- /etc/conf/bin/idtune
  - SCOadmin has a wrapper for idtune

System Tuning: Performance
Performance Tuning
- Identify bottleneck
  - rtpm, prfstat, sar, prof, lprof
- CPU performance
  - sar -u
      00:00:00 %usr %sys %wio %idle %intr
      00:00:01   30   10   10    46     4
  - high usr: investigate with truss, prof
  - high sys, intr: investigate with prfstat
  - high wio: storage throughput

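The triage rule on this slide (which column is high decides which tool to reach for) can be sketched as a small script run over captured sar -u output. This is a rough illustration, not SCO guidance: the function names and the 25% threshold are assumptions chosen for the example, and the column order is taken from the sample row above.

```python
# Sketch: map one `sar -u` data row to the follow-up tools named on the slide.
# Assumption: the 25% threshold is illustrative only; pick your own baseline.

COLUMNS = ["%usr", "%sys", "%wio", "%idle", "%intr"]

def parse_sar_u(line):
    """Split a data row such as '00:00:01 30 10 10 46 4' into a dict."""
    fields = line.split()
    timestamp, values = fields[0], fields[1:]
    if len(values) != len(COLUMNS):
        raise ValueError("unexpected sar -u row: %r" % line)
    sample = {name: float(v) for name, v in zip(COLUMNS, values)}
    sample["time"] = timestamp
    return sample

def suggest(sample, threshold=25.0):
    """Return the slide's hints for every column at or above the threshold."""
    hints = []
    if sample["%usr"] >= threshold:
        hints.append("high usr: investigate with truss, prof")
    if sample["%sys"] >= threshold or sample["%intr"] >= threshold:
        hints.append("high sys/intr: investigate with prfstat")
    if sample["%wio"] >= threshold:
        hints.append("high wio: check storage throughput")
    return hints

# The sample row from the slide: only %usr crosses the assumed threshold.
row = parse_sar_u("00:00:01 30 10 10 46 4")
print(suggest(row))
```

In practice the threshold matters less than comparing against measurements taken while the system was healthy.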
System Tuning: Storage
Storage Performance
- Hardware configuration
  - Device topology
    - don't connect slow devices and fast devices on the same bus
    - e.g. put your slow tape drive on a separate controller
  - Cabling
    - ensure your cables are up to specifications
  - Hardware RAID
    - performance (RAID 0) vs integrity (RAID 1, RAID 5)
- Filesystem tuning
  - fsadm, block size, increase logsize (@ mkfs only)
  - mount options; tmplog
- ODM
  - dramatic performance boost for $99

System Tuning: Memory
Memory
- Avoid swapping
- DEDICATED_MEMORY, use if using shared memory
  - mkdev dedicated
- Dedicated memory
  - Reserves physical memory
  - Saves kernel virtual address space
  - Reduces paging, uses large mappings (PSE)
  - SEGKMEM_PSE_BYTES
- Add more memory!

System Tuning: Filesystem
Tuning for largefile support
- HDATLIM, SDATLIM, HVMMLIM, SVMMLIM, HFSZLIM, SFSZLIM
  - set to 0x7fffffff (unlimited)
  - /etc/conf/bin/idbuild -B && init 6
- fsadm /mountpoint or raw device
  - fsadm -o largefiles /
  - OSR6 defaults to largefiles, UW7 does not
- Building large file aware applications
  - -D_FILE_OFFSET_BITS=64

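To avoid mistyping six tunables by hand, the command lines can be generated and reviewed first. The sketch below only prints text and runs nothing. It assumes the common "idtune NAME VALUE" invocation form, which should be checked against the idtune man page on the target release; the helper name idtune_commands is hypothetical.

```python
# Sketch: print the idtune command lines for the largefile limits.
# Assumption: idtune accepts "NAME VALUE" arguments; verify on your release.

LIMITS = ["HDATLIM", "SDATLIM", "HVMMLIM", "SVMMLIM", "HFSZLIM", "SFSZLIM"]
UNLIMITED = 0x7FFFFFFF  # 2**31 - 1, the value the slide calls "unlimited"

def idtune_commands(names=LIMITS, value=UNLIMITED):
    """Return one idtune command line per tunable name."""
    return ["/etc/conf/bin/idtune %s 0x%x" % (name, value) for name in names]

for cmd in idtune_commands():
    print(cmd)

# The slide's rebuild-and-reboot step follows once the tunables are set.
print("/etc/conf/bin/idbuild -B && init 6")
```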
Networking Tips: Configuration
Network configuration
- netconfig
  - drivers installed in /etc/inst/nd/
  - bcfg files are parsed by ndcfg
  - /etc/confnet.d/inet/interface is configured at boot
  - /etc/tcp (c.f. S69inet on UW) is run to link the driver into dlpi
    - initialize -U
- STREAMS based network stack
- ndcfg
  - useful for displaying info about the system
  - geared toward network device driver writers

Networking Tips: Tuning and Tools
Network monitoring & tuning tools
- netstat, ifconfig, inconfig, ndstat, ndcfg, traceroute, ping, tcpdump
- dlpid logging
  - dlpid -l <logfile> /etc/inst/nd/dlpidPIPE
  - or edit /etc/default/dlpid: LOG=<logfile>
- NIC failover
  - automatically and transparently switches to a backup NIC if the primary fails
  - Chains of backup NICs supported

Networking Tips: Common Issues
- Network is UP but can't connect to other systems
  - is DNS configured correctly?
  - netstat -rna: do you have a default route?
- Network performance is poor
  - check cabling
  - ndstat -l: collisions
  - inconfig
  - nfsstat

Networking Tips: Common Issues
- Network responds to pings but can't log in
  - are the daemons running? licensed?
- Multiple hosts with the same IP or MAC
  - arp -an (-n disables name resolution)
      ? (132.147.103.1) at xx:xx:xx:xx:xx:xx (802.3)
      ? (132.147.103.9) at xx:xx:xx:xx:xx:xx (802.3)
- Stopping and starting the interface
  - ifconfig net0 down
  - /etc/tcp stop: daemons stopped, NIC is UP
  - /etc/tcp shutdown: everything down
  - /etc/nd stop, /etc/nd start

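Scanning arp -an output by eye gets tedious on a busy segment, so one option is to save the output and group entries by MAC address. The sketch below assumes lines shaped like the sample on the slide; the MAC addresses in the example are made up, and the function name duplicate_macs is hypothetical. A MAC that answers for several IPs is only a hint (IP aliases and proxy ARP look the same), so treat any hit as something to investigate rather than as proof of a conflict.

```python
import re
from collections import defaultdict

# Matches lines shaped like the slide's sample:
#   ? (132.147.103.1) at 00:11:22:33:44:55 (802.3)
ARP_LINE = re.compile(r"\((?P<ip>[0-9.]+)\)\s+at\s+(?P<mac>[0-9A-Fa-f:]+)")

def duplicate_macs(arp_output):
    """Return {mac: [ips]} for every MAC seen with more than one IP."""
    by_mac = defaultdict(set)
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match:
            by_mac[match.group("mac").lower()].add(match.group("ip"))
    return {mac: sorted(ips) for mac, ips in by_mac.items() if len(ips) > 1}

# Hypothetical capture: the first two entries share one (made-up) MAC.
sample = """\
? (132.147.103.1) at 00:11:22:33:44:55 (802.3)
? (132.147.103.9) at 00:11:22:33:44:55 (802.3)
? (132.147.103.20) at 00:aa:bb:cc:dd:ee (802.3)
"""
print(duplicate_macs(sample))
```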
Reporting Problems
- crash
  - Primarily used for panic analysis
  - /var/spool/dump
  - dumpmemory to generate a crash dump on a live system
  - crash -a <dumpfile> will produce a listing suitable for SCO support
  - provide dumpfile, /stand/unix, all of /etc/conf/mod.d, /usr/sbin/crash
- Useful crash commands
  - ps, as, trace, u, eng, od, addstruct, help
  - walk data structures using od
    - od -f
  - ksh style history buffer
- lsof, can save hours of fun on a live system

Reporting Problems
When reporting problems to support:
- Establish a reproducible case (if possible)
- Save any crash related files
  - Note stack trace, crash -a
- Save system log files
  - /var/adm/
- Include hardware specs when filing a bug
  - run sysinfo
- Be aware of changes made to /stand/boot
  - bootparam