Operations Orchestration Use Cases at Halliburton
CloseIncidentFromCauseCode
• This flow is launched from OpenView on an end policy.
• It finds all open Peregrine (Service Center) tickets with related cause codes (where the cause code is <$MSG_NODE_NAME>:<$MSG_OBJECT> from OpenView) and closes them. After successfully closing these tickets, it annotates the original message in OpenView and acknowledges it as well, even if it is in the acknowledged group of OpenView.
• Inputs:
  • msgID - ID of the OpenView message that launched this flow
  • causecode - <$MSG_NODE_NAME>:<$MSG_OBJECT> of the OpenView message that launched this flow
CloseIncidentFromCauseCode
• This flow is used to eliminate redundant Service Center tickets that have been opened by our NOC or even automatically by OO. It closes related tickets automatically. It is one of our high-value (ROI) flows, because it can take a NOC specialist a long time to go through these tickets in Service Center.
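The behaviour described for CloseIncidentFromCauseCode can be sketched in a few lines of Python. This is only an illustration of the matching and ordering logic, not the OO flow itself: the `Ticket` class, the function names, and the `annotate` / `acknowledge` callables are hypothetical stand-ins for the Service Center and OpenView operations.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical stand-in for a Service Center (Peregrine) ticket."""
    number: str
    cause_code: str
    is_open: bool = True


def close_incidents_from_cause_code(tickets, cause_code):
    """Close every open ticket whose cause code matches; return their numbers."""
    closed = []
    for ticket in tickets:
        if ticket.is_open and ticket.cause_code == cause_code:
            ticket.is_open = False
            closed.append(ticket.number)
    return closed


def handle_end_policy(tickets, msg_id, cause_code, annotate, acknowledge):
    """Close related tickets, then annotate and acknowledge the message.

    `annotate` and `acknowledge` are caller-supplied callables (assumed
    names) standing in for the OpenView operations. They are called only
    after the close step has finished.
    """
    closed = close_incidents_from_cause_code(tickets, cause_code)
    annotate(msg_id, f"Closed tickets: {', '.join(closed) or 'none'}")
    acknowledge(msg_id)
    return closed
```

Passing the OpenView operations in as callables keeps the sketch independent of any particular API and makes the ordering (close first, then annotate, then acknowledge) easy to check.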
CreateRtoTicket_proc
• At Halliburton, we have a special group called “Real Time Operations – RTO”. This group is responsible for monitoring progress made at a remote oil well drilling site. These “virtual” (VMware) servers can be found on land, on board a drilling ship, or on an oil rig. A few very critical processes have been identified that can consume too much Windows processor time. From an OpenView trigger, we can identify whether one or more of these critical processes are in a runaway mode. If that is the case, this flow creates a Service Center (Peregrine) trouble ticket that tracks the problem and sends emails and escalation emails to support engineers at a special 24x7 NOC. This is another high-value flow that saves us a lot of NOC operator time.
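As a rough illustration of the decision this flow makes, the sketch below flags critical processes as runaway and builds a ticket payload. Everything specific here is an assumption: the 90% threshold, the rule that every sample must exceed it, the function names, and the payload fields are hypothetical, since the actual detection is done by the OpenView trigger.

```python
def find_runaway_processes(cpu_samples, critical_names, threshold_pct=90.0):
    """Return critical processes whose CPU use stayed above the threshold.

    cpu_samples maps a process name to a list of CPU percentages taken at
    successive polls. A process counts as runaway only if every sample is
    above threshold_pct, so a single spike does not open a ticket.
    """
    runaway = []
    for name in critical_names:
        samples = cpu_samples.get(name, [])
        if samples and all(pct > threshold_pct for pct in samples):
            runaway.append(name)
    return runaway


def build_rto_ticket(node_name, runaway):
    """Build a ticket payload, or None when nothing is in runaway mode."""
    if not runaway:
        return None
    return {
        "node": node_name,
        "title": "Runaway process: " + ", ".join(runaway),
        "notify": ["support engineers", "24x7 NOC"],
    }
```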
Flow step within the flow “CreateRtoTicket_proc” called “RelaceStringsWithNodeName”
• Several of the steps in this flow had to be executed using scripts that get specific data from the OpenView database. These scripts use WMI to retrieve this information. We do not connect directly to the OpenView database; instead, the scripts use well-documented WMI queries to get the information we need about a particular OpenView alarm.
Example VBScript used to retrieve information about a particular OpenView alarm for use in the flow

'============================================================================
' This VB script gets the server name, causecode and ticket title for a PAS
' flow.
' The inputs of this script are:
'   msgID, causecode, title
' The outputs of this script are:
'   ServerName, causecode, title
' The script will run on management server (HOUHPOV001H).
'----------------------------------------------------------------------------
' Date        Name     Modification Details
'----------------------------------------------------------------------------
' 05/02/2008  Jin Liu  initial implementation
'============================================================================
Const WMIMsg = "WinMgmts:{impersonationLevel=impersonate}!root/HewlettPackard/OpenView/Data:OV_Message.Id="
Const WMINode = "WinMgmts:{impersonationLevel=impersonate}!root/HewlettPackard/OpenView/Data:OV_ManagedNode.Name="

' Read the three command-line arguments
Set oArgs = WScript.Arguments
If (oArgs.Count < 3) Then
    WScript.Echo "Missing arguments!"
    WScript.Quit -1
End If
MsgId = oArgs.Item(0)
causecode = oArgs.Item(1)
title = oArgs.Item(2)
'WScript.Echo "MsgID = " & MsgId

' Look up the OpenView message through WMI
MsgPath = WMIMsg & """" & MsgId & """"
Set OV_Message = GetObject(MsgPath)
NodeId = OV_Message.NodeName

' Getting node properties
Dim Caption, Primary, NodePath, OV_ManagedNode
NodePath = WMINode & """" & NodeId & """"
Set OV_ManagedNode = GetObject(NodePath)
NodeName = OV_ManagedNode.PrimaryNodeName
If (NodeName = "rto_urls.corp.halliburton.com") Then
    NodeName = "HOUHPOV200.corp.halliburton.com"
End If
WScript.Echo "ServerName: " & NodeName

' get the causecode: replace the first colon-separated field with the node name
tempArr = Split(causecode, ":")
count = UBound(tempArr)
tempArr(0) = NodeName
causecode = NodeName
For i = 1 To count Step 1
    causecode = causecode + ":" + tempArr(i)
Next
If (Len(causecode) > 64) Then
    causecode = Left(causecode, 64) ' Peregrine field limit
End If
WScript.Echo "causecode: " & causecode

' get the title: replace the second colon-separated field with the node name
tempArr2 = Split(title, ":")
count2 = UBound(tempArr2)
tempArr2(1) = NodeName
title = tempArr2(0)
For i = 1 To count2 Step 1
    title = title + ":" + tempArr2(i)
Next
If (Len(title) > 100) Then
    title = Left(title, 100) ' Peregrine field limit
End If
WScript.Echo "title: " & title
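For readers less familiar with VBScript, the string handling in the script above can be restated compactly in Python. This is a sketch of the same field-replacement and truncation logic only; it does not perform the WMI lookups, and the function and constant names are introduced here for illustration.

```python
CAUSECODE_MAX = 64   # Peregrine field limit noted in the script
TITLE_MAX = 100      # Peregrine field limit noted in the script


def replace_field(value, index, node_name, max_len):
    """Swap one colon-separated field for node_name, then truncate."""
    parts = value.split(":")
    if index >= len(parts):
        raise ValueError(
            f"expected at least {index + 1} colon-separated fields"
        )
    parts[index] = node_name
    return ":".join(parts)[:max_len]


def rewrite_causecode(causecode, node_name):
    """The cause code carries the node name in its first field."""
    return replace_field(causecode, 0, node_name, CAUSECODE_MAX)


def rewrite_title(title, node_name):
    """The ticket title carries the node name in its second field."""
    return replace_field(title, 1, node_name, TITLE_MAX)
```

One difference worth noting: when the title contains no colon, this sketch raises a `ValueError` with an explicit message, whereas the VBScript assignment to `tempArr2(1)` would stop with a subscript-out-of-range runtime error.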
UpdateVirusDefFiles_new
• The trigger for this flow comes from an OpenView policy that detects when a virus definition file is older than 8 days. Out-of-date virus definition files are an exception; most servers update automatically on a configured schedule. The HP OpenView policy is a scheduled policy that runs once per day and can discover whether a particular server has this problem. This flow re-checks the dates on the Symantec AntiVirus virus definition files. If the dates are older than 8 days, the flow copies the self-extracting file symcdefsx86.exe (or the 64-bit equivalent) from a central location, where it is downloaded every day from the Symantec FTP site. The flow then runs this self-extracting file silently to update the virus definition files on the specified server. When the extraction is done, the flow checks the two Symantec services (Symantec AntiVirus and DefWatch) to make sure that they are running.
• The steps in this flow are:
  1. Run a VB script to check if the virus definition files are current.
  2. If the files are current, acknowledge the alert (in OpenView).
  3. If the files are not current, create a Service Center (Peregrine) ticket. If there is an error, terminate the flow and report the error.
  4. Copy symcdefsx86.exe from a central location. If there is an error, terminate the flow and report the error.
  5. Extract the virus definition files from symcdefsx86.exe silently. If there is an error, terminate the flow and report the error.
  6. Check if the Symantec AntiVirus service is running. If the service is not running, start the service.
  7. Check if the DefWatch service is running. If the service is not running, start the service.
  8. If there is no error in steps 6 and 7, close the Peregrine ticket.
  9. If there is no error in step 8, acknowledge the alert.
  10. Complete the flow successfully.
• Responses:
  • success - the flow completed successfully.
  • failure - an error occurred.
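The numbered steps and the two responses can be sketched as a small Python function. This is a hedged outline of the control flow only: the `actions` dictionary and its keys are hypothetical, each callable is assumed to return True on success, and treating "files are current, alert acknowledged" as a success response is an assumption, since the slide does not state it.

```python
from datetime import date, timedelta

MAX_AGE_DAYS = 8  # threshold stated in the flow description


def definitions_are_stale(definition_date, today):
    """True when the definition files are older than MAX_AGE_DAYS."""
    return (today - definition_date) > timedelta(days=MAX_AGE_DAYS)


def update_virus_definitions(definition_date, today, actions):
    """Walk the numbered steps; return 'success' or 'failure'.

    `actions` is a dict of caller-supplied callables (hypothetical names)
    that each return True on success.
    """
    # Steps 1-2: re-check the dates; if current, just acknowledge the alert.
    if not definitions_are_stale(definition_date, today):
        return "success" if actions["acknowledge_alert"]() else "failure"

    # Steps 3-5: open a ticket, copy the extractor, run it silently.
    # An error in any of these terminates the flow.
    for step in ("create_ticket", "copy_definitions", "extract_definitions"):
        if not actions[step]():
            return "failure"

    # Steps 6-7: check both services (started if needed by the callable).
    # Building a list first means both services are always checked.
    service_results = [
        actions["ensure_service"]("Symantec AntiVirus"),
        actions["ensure_service"]("DefWatch"),
    ]

    # Steps 8-10: close the ticket, then acknowledge the alert.
    if (all(service_results)
            and actions["close_ticket"]()
            and actions["acknowledge_alert"]()):
        return "success"
    return "failure"
```

In this sketch a failed service check leaves the ticket open, which matches the description of unresolved systems being passed on through Service Center tickets.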
UpdateVirusDefFiles_new
• This is another high-value OO flow. It used to take support engineers or NOC operators a considerable amount of time to fix all the servers that had some problem with their automatic virus definition file updates. We are able to automatically fix the systems that do not have the current files in over 90% of the cases we encounter. Some systems are too low on free disk space to load the files or have some problem that prevents the Norton daemons from running. These systems end up being identified as problems that must be passed on to server support specialists via Service Center trouble tickets and direct emails.
The next slide is from our Dashboard screen and shows this past week's runs. I forgot to add an ROI value on one of the flows, “CreateRTOTicket_proc”. I added that today while preparing this use case presentation.