Kapil Ramlal (KappA) Escalation Engineer. Troubleshooting Tools and Methodology in a Citrix XenApp 5.0 Environment. Agenda. XenApp troubleshooting. The right tool, right place at the right time. Troubleshooting scenarios. Top utilities. Case studies. Additional resources/Q&A. Agenda.
Kapil Ramlal (KappA) Escalation Engineer Troubleshooting Tools and Methodology in a Citrix XenApp 5.0 Environment
Agenda XenApp troubleshooting The right tool, right place at the right time Troubleshooting scenarios Top utilities Case studies Additional resources/Q&A
XenApp troubleshooting Understanding the infrastructure The anatomy of a XenApp farm • Information: Static and Dynamic • Components: Where to focus troubleshooting Understanding what happens from logon to launch • Types of issues: Denial of service, bottlenecks • Troubleshooting: Medevac, performance monitoring, CDF…
Types of Information • Static: Data Store • Does not change frequently • Farm configuration • Changes made in the Management Console • Dynamic: Dynamic Store • Constantly changing information • Load management • Information required for application launch [Diagram labels: LHC, Data Store]
Logon to launch [Diagram: Client, Web Interface, XML Broker, Active Directory, Zone Data Collector, Data Store, Least Loaded Server]
MedEvac (CTX107935) • The XML Broker tests • Verifies that the XML Service is able to respond to an XML / client request • XML is able to contact the Zone Data Collector • Zone Data Collector tests • Verifies that the ZDC can provide the address of the least loaded server for the requested app • The IMA Service is able to respond • The IMA Service can read the Local Host Cache • The IMA Service can read its Dynamic Store • Least Loaded Server tests • Verifies that Terminal Service is able to respond • Verifies that the RPC Service is able to respond
How to Monitor Farm Health using MedEvac? • See knowledge center article CTX119899
Monitoring [Diagram labels: RSOP, CDF, Active Directory, XML Broker, XML Threads, Client, Web Interface, ASP Requests, Zone Data Collector, IMA Work Item Queues, IMA %CPU time, Zone Elections Won]
XenApp 5.0 Health Monitoring and Recovery • Enterprise & Platinum Editions of XenApp • Performs tests to monitor state and identify health risks • Terminal Services tests • XML Service test • Citrix IMA Service test • Logon Monitor test • Check DNS test • Local Host Cache test • XML threads test • Citrix Print Manager Service test • Microsoft Print Spooler test • ICA Listener test • See page 307 of the XenApp 5.0 Administrator’s Guide (CTX115519) for information
Free the ZDC! Large Farm Tips • Limit additional roles on Zone Data Collectors • Limit the number of zones in the environment • Do not run management consoles on or pointed to the ZDCs • Read the Key Infrastructure Tuning article: CTX116492
The evolution continues! • Citrix XenApp 5.0 opens the door for delivering resources on Windows Server 2008 • Customers are also moving more users to Windows Vista • Say hello to the next generation troubleshooting artillery for the XenApp 5 environment • Existing tools have been updated, and new tools introduced • The evolution continues!
The right tool, right place at the right time • DON'T • Use troubleshooting tools just because you can • Recommend tools that are not relevant to the problem • Use troubleshooting tools without understanding their impact on the environment • DO • Use tools to help automate time-consuming tasks • Use tools at the right time, such as when the problem is occurring and not afterwards • Understand what the tool is trying to accomplish, so that the right data is obtained • Use tools with a clear purpose • Maintain a local toolkit, so that the right tools are always available in times of crisis
Common Diagnostic Facility (CDF) • Provides the ability to collect traces for problem diagnosis on Citrix binaries without disrupting the services or users • Citrix’s standard debug tracing facility • Efficient and non-intrusive data collection process • Enabled without stopping and starting services • Faster & easier tracing for retail modules • Flexible & customizable troubleshooting facility • Consistency across most Citrix products
CDF Basics • To better understand what a CDF trace message is, let’s look at the following pseudo code example • In the example, the function belongs to a service, which can be considered to be a Trace Provider (more on this later)
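The pseudo code this slide refers to is not reproduced in the transcript. As a rough stand-in, here is a minimal sketch in Python (not the native code Citrix binaries are written in) of a service function that emits a trace message at each step. The `cdf_trace` helper, the `MySvc` provider name and the in-memory `TRACE_LOG` are all hypothetical; only the DLL name comes from the deck.

```python
import ctypes

TRACE_LOG = []  # illustrative stand-in for a trace session buffer


def cdf_trace(module, message):
    """Hypothetical trace call: record a message for the named trace provider."""
    TRACE_LOG.append(f"{module}: {message}")


def load_feature(dll_name="CitrixFeatureDLL.dll", loader=ctypes.CDLL):
    """Service function that traces each step of loading a feature DLL."""
    cdf_trace("MySvc", f"Attempting to load {dll_name}")
    try:
        handle = loader(dll_name)
    except OSError as err:
        # The failure reason ends up in the trace, not just in a return code.
        cdf_trace("MySvc", f"Failed to load {dll_name}: {err}")
        return None
    cdf_trace("MySvc", f"Loaded {dll_name} successfully")
    return handle
```

With tracing like this in place, a captured trace shows whether the load succeeded and, if not, why, without attaching a debugger.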
The moral of the story • We could capture a CDF trace to determine if the CitrixFeatureDLL.dll loaded successfully • How difficult would it be to debug without having this tracing? • You need special symbol files to be able to read the trace messages (TMF files) • This allows certain information to remain private as needed (similar to .pdb files) • You get more by default!
CDF Internals • To better understand CDF, let’s take a quick look at how the operating system supports Event Tracing for Windows (ETW)
[Diagram: Event Tracing for Windows. Labels: Controller (CDFControl), Consumer (CDFControl), Enable/Disable, Events, Windows Events, Buffers, Trace File; providers shown: CDM.sys, RadeSvc.exe, WFShell.exe]
ETW Components • Providers: • Modules containing tracing, that can be enabled or disabled • Example: MF_Driver_Cdm (Cdm.sys) • Controllers: • Enables/Disables a provider • Configures trace capture settings • Starts/Stops a trace • Consumer: • Reads trace events from log file • Reads trace events real-time from a trace session
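To make the three roles concrete, the sketch below models them as plain Python objects. It is a toy model under stated assumptions, not the ETW API: the class and function names are invented for illustration, and the "session" is just an in-memory list.

```python
class Provider:
    """A module that contains tracing and can be switched on or off."""

    def __init__(self, name):
        self.name = name
        self.enabled = False
        self.session = None

    def trace(self, message):
        # Events are only written while a controller has enabled the provider.
        if self.enabled and self.session is not None:
            self.session.append((self.name, message))


class Controller:
    """Owns a trace session and enables/disables providers for it."""

    def __init__(self):
        self.session = []

    def enable(self, provider):
        provider.session = self.session
        provider.enabled = True

    def disable(self, provider):
        provider.enabled = False


def consume(session):
    """Consumer: read events back from the session (here, an in-memory list)."""
    return [f"{name}: {message}" for name, message in session]
```

A provider that has not been enabled drops its messages, which is why tracing costs next to nothing until a controller switches it on.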
CDFControl v2.5 • CDFControl is a hybrid controller and consumer • It can start/stop/enable and configure an ETW/CDF trace session • It can consume (read) trace events from a log file, or from a live real-time trace session • The original version operated only as an ETW Controller, and was published under CTX111961
Troubleshooting scenarios • Application Streaming • Seamless/Multi-Monitor • 3rd Party Applications • CPU Spikes • Deadlocks/Hangs • Database • Network • Black Hole Effect • XenApp Plugin (PNA) • Debugging
Application Streaming What happens on the client side? [Diagram: End User, Streaming Client and AIE, Network, File Servers; the RAD file points to the manifest file, executable, AIE rules, .dll’s, data files and other .exe’s] • End user launches app from WI or PN Agent • RAD file is downloaded • RAD file launches client Application Isolation Environment (AIE) • RAD file instructs streaming client to download: • Manifest file | AIE rules | Application executable | Pre and post execution scripts • Streaming client launches executable according to instructions in manifest file and AIE rules including pre and post execution scripts and registers with the ctxsbx.sys (redirector) • Application is available to user • Streaming Client requests additional files as required, checking first in the client cache, then if necessary, downloading additional files from the file server
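The last step, cache first and then the file server, can be sketched as follows. This is an illustration only: `fetch_file`, the dictionary used as the cache and the `download` callable are hypothetical, and storing the downloaded copy back into the cache is an assumption rather than something the slide states.

```python
def fetch_file(name, cache, download):
    """Return file contents, preferring the client cache over the file server."""
    if name in cache:
        return cache[name]
    data = download(name)  # hypothetical callable that reads from the file server
    cache[name] = data     # assumption: keep a copy so later requests stay local
    return data
```

When a streamed application fails partway through launch, knowing which of the two paths served a file helps decide whether to look at the client cache or at file server access.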
Application Streaming • Isolate the Issue • When? • Profiling • Publishing • Streaming • How? • Streaming to Server • Streaming to Client • Versions? • WI 4.5, 5.0 • License server 4.5,5.0 • Client
Application Streaming Streaming Client Troubleshooting: • Client installation is required on workstations • Verify the Citrix Streaming Service is started or restart • Reference CTX116483 – required permissions • Enable debug console • HKEY_LOCAL_MACHINE\Software\Citrix\Rade • REG_DWORD: “EnableDebugConsole” • Value: 1 to switch on, 0 to switch off
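One way to flip the debug console switch without opening the Registry Editor is to script the standard `reg add` command. The sketch below only builds the command line; the function name is made up for illustration, and the key and value names are the ones given on the slide.

```python
def debug_console_command(enable):
    """Build a reg.exe command line that sets EnableDebugConsole to 1 or 0."""
    return [
        "reg", "add", r"HKLM\Software\Citrix\Rade",
        "/v", "EnableDebugConsole",
        "/t", "REG_DWORD",
        "/d", "1" if enable else "0",
        "/f",  # overwrite an existing value without prompting
    ]
```

On the client, the returned list could be passed to `subprocess.run(..., check=True)` from an elevated prompt; remember to set the value back to 0 once the data has been collected.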
Application Streaming Leverage realtime CDF tracing! • Run CDFControl on the client (where client is installed) • Choose the Application Streaming category • Enable realtime tracing • Provide a TMF path (CTX106233) • Start tracing and reproduce the launch failure
Seamless/Multi-Monitor SEAMLESS HOST COMPONENTS Winlogon Default winlogon.exe sehook20.dll sehook20.dll wfshell.exe seamls20.dll icactls.dll icast.exe TWIWorker TWIReader TWISysTrayAgent ICA Client
Seamless/Multi-Monitor SEAMLESS CLIENT COMPONENTS wfica32.exe vdtwin30.dll vdtwn.dll ctxsrcc.lib GAI LVB
Seamless/Multi-Monitor Multi-Monitor • An optional component • The client provides a monitor layout via the thinwire channel; the layout is shared, via shared memory, by all processes that load mmhook.dll • A work area change is always posted to the host. This could be due to a change in the work area of the existing area, or a change in virtual screen size due to the addition or removal of monitors. • API hooks are controlled by flags and can be customized per process. Refer to CTX115637 for various configuration options
Seamless/Multi-Monitor • Press Shift+F2 to change to Full Screen mode • Reconnect as a fixed-size window session • Set global flags, 0x26DEA7, to see if it fixes the issue. • This is a combination of the following flags (see CTX101644 for details of each bit) • 0x1 (Disable session sharing), 0x2 (Disable modality check), 0x4 (Disable AA hook) • Analyze CDF trace for MF_DLL_CTXNOTIF and MF_SESSION_TWI • Analyze window information using SPY++/Window History/Message History • Try per-window exception flags • Analyze application logic (API flow) using TracePlus utility
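Because the global flags value is a bitmask, it can help to check which bits a given value actually sets. The sketch below decodes only the three bits named on the slide and reports everything else as a remainder to look up in CTX101644; the function and table names are illustrative.

```python
NAMED_FLAGS = {
    0x1: "Disable session sharing",
    0x2: "Disable modality check",
    0x4: "Disable AA hook",
}


def decode_flags(value):
    """Split a flag value into the bits named on the slide and the remainder."""
    named = [label for bit, label in NAMED_FLAGS.items() if value & bit]
    remainder = value & ~sum(NAMED_FLAGS)  # clear the three named bits
    return named, remainder
```

For 0x26DEA7 all three named flags are set, and the remaining bits (0x26DEA0) correspond to flags documented in the article rather than on the slide.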
Seamless/Multi-Monitor • Get the window class name of the window exhibiting the problem • Collect CDF traces for the modules concerned ONLY • CTXNOTIF, MMHOOK, TWCDS, TWI, TWI_HOOK • Analyze which aspects of the behavior could be affected by hooks • Does it happen on a single monitor too? If yes, chances are very little that the multi-monitor hook is involved. Disable mmhook and see what happens • Compare the window styles at host and client • For seamless-specific issues, verify whether the problem also happens in an ICA Desktop or RDP session.
3rd Party Applications • How does the application work? • Is it Native, or does it run on a Framework, such as .NET or Java? • Do you have the right versions of the Framework installed? • Are the correct dependencies present, and does it work at the console? • Does it require certain file and registry access? (Does it need Write permissions etc. ?) • Does it require component registration? • Inspect core functionality • View the application/process under an analysis tool such as ProcessExplorer or WinDbg • Inspect all loaded modules (DLLs) by the application • Validate any dependencies (missing DLL's?) • Inspect named events and handle usage (synchronization/resource problems?) • Validate file and registry access using ProcessMonitor • Run application under the AppVerifier utility to check for a multitude of issues
3rd Party Applications • Leverage the Global Flags for user-mode applications using the Gflags utility • Set the 3rd party application to run under Image File Execution Options • Configure a debugger to invoke the application (such as WinDbg) • When the application launches, the debugger will automatically attach to the process and halt its execution! • This gives the opportunity to explore all application threads from process initialization (~*kb) • From here the internals of the application can be understood at the Native Windows API level (i.e. which Windows APIs are being used)
3rd Party Applications • Use ProcessExplorer to view the loaded modules for a process, and check for the presence of any hook modules (hooking DLL's) • Hook modules can alter the natural behavior of applications, which can sometimes cause problems • Try excluding the problem application from all Citrix hooks (CTX107825)
CPU Spikes • Try to define a pattern (leverage perfmon) • Determine offending Thread ID causing the spike (Process Explorer, QSlice) • Obtain userdump of offending process immediately after (Userdump.exe, WinDbg.exe) • Check CDF trace for repeated (looping) messages (if Citrix component) • Use application spy to look at what the application is doing (TracePlus, Logger)
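The idea behind finding the offending thread is to compare per-thread CPU time between two samples and pick the largest increase. The sketch below shows only that arithmetic; how the samples are collected (Process Explorer, perfmon counters, and so on) is left out, and the function name and data layout are assumptions.

```python
def busiest_thread(before, after):
    """Return (thread_id, cpu_delta) for the thread that used the most CPU
    between two samples, each a dict of thread ID -> cumulative CPU seconds."""
    deltas = {
        tid: after[tid] - before.get(tid, 0.0)  # new threads start from zero
        for tid in after
    }
    tid = max(deltas, key=deltas.get)
    return tid, deltas[tid]
```

Once the thread ID is known, its stack in the userdump is the first place to look.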
Deadlocks • Windows Vista and Server 2008 offer the new Wait Chain Traversal (WCT) API! • This gives applications a mechanism to check internally for wait conditions, and also allows custom tools to be created that check for application hangs – LIVE! • No cool WCT tools available? The debugger is your friend! • Attach to the hung process/service and generate a dump for post-mortem analysis: • .dump /ma c:\PathToDump\DeadlockedApp.dmp • Manually inspect thread states, and get the debugger's opinion with: • !analyze -hang -v • Note: the Windows Task Manager can capture user dumps in Vista & 2008!
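Conceptually, a wait chain is a sequence of threads each blocked on something the next one owns; a deadlock is a chain that loops back on itself. The sketch below illustrates that idea on a simple dictionary. It does not call the WCT API, and the function name and input format are invented for illustration.

```python
def find_deadlock(waits_for):
    """Follow each thread's wait chain; return the cycle as a list, or None.

    waits_for maps a thread ID to the thread ID that owns the resource it is
    blocked on (threads that are not waiting are simply absent).
    """
    for start in waits_for:
        chain = [start]
        current = start
        while current in waits_for:
            current = waits_for[current]
            if current in chain:
                # The chain looped back: everything from here on is the cycle.
                return chain[chain.index(current):]
            chain.append(current)
    return None
```

The same reasoning applies when reading thread stacks in a dump by hand: note what each blocked thread is waiting on, find the owner, and repeat until the chain ends or loops.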
Slow logons • Understand the logon process and identify the slowdown! • Validate via network trace that the connection between server and client is good • If the connection makes it to the server, check which processes exist • Use TaskManager and sort by session ID • Gather userdumps for each process for the slow session to try to identify any synchronization problems, such as LPC and ALPC wait chain conditions • Ensure Terminal Services is running (svchost.exe) and that the thread count appears normal • Ensure critical Citrix processes are okay, such as IMA, CpSvc and XML
The XenApp client • PNAgent.exe starts up and communicates with PNAMain.exe to share application launch, and shortcut details • PNAMain.exe initiates communication with the Web Server for application requests and config.xml settings • WFCRun32.exe works with WFICA32.exe to launch an application • Best to use a live-debug approach as there is no inherent tracing readily available on the client
The XenApp client For single sign-on problems ensure: • PNSSON is at the top of the network provider list • SSONSVR is running • Nothing is causing any logon delays (such as 3rd party monitoring applications etc.) as this would cause the SSON ticket to expire, therefore causing SSONSVR to exit • Enable a default debugger to look out for any unexpected termination of the client processes
Debugging • User Mode versus Kernel Mode • The Windows operating system can be conceptually divided into 2 parts: • User Space (User Mode) • Kernel Space (Kernel Mode) • Applications run in User Mode • System drivers run in Kernel Mode (Privileged Mode)
USER SPACE USER APPLICATION USER APPLICATION USER MODE USER APPLICATION USER APPLICATION USER APPLICATION KERNEL SPACE rusb2w2k.sys keyboard.sys win32k.sys tcpip.sys […]