Regular Expressions and XML Parsing
Objectives After this session you should be able to: • Understand and write Regular Expressions • Create XML code that will use Regular Expressions to parse data from providers into parameters
Regular Expression Parsing In Application Log/Syslog Provider
Regular Expressions • Used to parse and analyze fields • Designed for matching text items • Require extremely precise syntax
Regular Expression Overview • Can be used in: • Rule criteria • View criteria • Computer Group formulas • XML / parameter parsing • Uses the popular Boost.org regular expression parser • Perl-like regular expression syntax • Features include: • Advanced text pattern matching • Timestamp conversion • UI support • Syslog IP filtering Note: The regex syntax used in Rules, Views and Computer Groups is not the same as the syntax used in parsers
Regular Expression Example • Regular Expression = ^World\s+.* • “^” means “Start of Line” • “^World” means the text line must begin with “World” • “\s+” means one or more whitespace characters • “.*” is a wildcard that matches the rest of the line • Matches: “World Wide Web Publishing Service” “World with lots of space” “World Class” “World War” • Does not match: “WorldWide” “Wide World of Sports” “Wayne’s World” “War of the Worlds”
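The match lists above can be checked mechanically. The sketch below uses Python's `re` module as a stand-in for the Boost Perl-syntax engine (an assumption; the two agree on the constructs used here) and tests each sample string from the slide against the expression.

```python
import re

# Expression from the slide: the line starts with "World",
# followed by at least one whitespace character.
PATTERN = re.compile(r"^World\s+.*")

matches = [
    "World Wide Web Publishing Service",
    "World   with lots of space",
    "World Class",
    "World War",
]
non_matches = [
    "WorldWide",             # no whitespace after "World"
    "Wide World of Sports",  # "World" is not at the start of the line
    "Wayne's World",
    "War of the Worlds",
]

def is_match(line: str) -> bool:
    """Return True when the expression matches the line."""
    return PATTERN.search(line) is not None

for line in matches + non_matches:
    print(f"{line!r}: {is_match(line)}")
```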
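Regular Expression Example #2 • Regular Expression = ^\s+TCP|ICMP\s+\d+.\d+.\d+.\d+:?\d+? • “^” means “Start of Line” • “\s+” means “one or more whitespace characters” • “TCP|ICMP” is intended to mean that the literal word “TCP” or “ICMP” must be present • “\d+.\d+.\d+.\d+:?\d+?” is intended to match an IP address with an optional port: four numeric parts separated by periods, optionally followed by a colon and a fifth numeric part • Since each numeric part has a “+”, it can be any number of consecutive digits • Intended matches (each line begins with whitespace): TCP 192.168.1.1:80 ICMP 192.168.1.1 Note: As written, the expression is looser than this description. Without parentheses, “|” splits the whole expression into “^\s+TCP” or “ICMP\s+…”; an unescaped “.” matches any character; and “\d+?” is a non-greedy “one or more digits”, not an optional field. A stricter form is ^\s+(TCP|ICMP)\s+\d+\.\d+\.\d+\.\d+(:\d+)?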
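To see how much precision matters in this example, the sketch below compares the expression exactly as written with a stricter variant. The stricter variant (grouped alternation, escaped periods, optional `(:\d+)` group) is an assumption about the intent, not the author's text, and Python's `re` module stands in for the Boost engine. The sample lines are hypothetical.

```python
import re

# Expression exactly as written on the slide.
ORIGINAL = re.compile(r"^\s+TCP|ICMP\s+\d+.\d+.\d+.\d+:?\d+?")
# Stricter variant (an assumption about the intent, not the author's text).
STRICT = re.compile(r"^\s+(TCP|ICMP)\s+\d+\.\d+\.\d+\.\d+(:\d+)?")

# Hypothetical sample lines.
samples = [
    "  TCP 192.168.1.1:80",     # intended match
    "  ICMP 192.168.1.1",       # intended match, no port
    "  TCP is not an address",  # should be rejected
    "  ICMP 192x168y1z1:80",    # should be rejected
]

def matches(pattern: re.Pattern, line: str) -> bool:
    """Return True when the pattern is found in the line."""
    return pattern.search(line) is not None

for line in samples:
    print(f"{line!r:28} original={matches(ORIGINAL, line)} strict={matches(STRICT, line)}")
```

The ungrouped `|` lets any line beginning with whitespace and `TCP` through, and the non-greedy `\d+?` still demands a fifth digit run, so the port-less ICMP line is rejected by the original form.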
Regular Expression Example #3 • Regular Expression = [^\,]* • Matches every character up to, but not including, the next “,”. Any other delimiter character can be used in place of the comma. • Useful for matching data within a given sub-expression that can vary greatly • Matches: applied at the start of the line below, it matches 250606001E05 250606001E05,25,2,8,HOUAV03
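A minimal sketch of the negated character class in use, again with Python's `re` module standing in for the Boost engine (an assumption). It pulls the first field from the sample line and then, using the `+` variant, every comma-separated field.

```python
import re

line = "250606001E05,25,2,8,HOUAV03"

# Expression from the slide: everything up to the first comma.
# (The backslash before the comma is harmless but not required.)
first_field = re.match(r"[^\,]*", line).group(0)

# Using "+" instead of "*" skips empty matches, giving every field in turn.
all_fields = re.findall(r"[^,]+", line)

print(first_field)
print(all_fields)
```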
Special Characters • Special Characters include \ ^ $ * . [ ] | + ? ( ) { } • Any time you want to use a special character as a literal, it must be escaped with a backslash • Example: The path c:\myfile.txt would need to be entered as c:\\myfile\.txt • Example: The User-ID $ExchangeService would need to be entered as \$ExchangeService
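The two escaping examples can be verified with a short sketch. Python's `re` module is used as a stand-in for the Boost engine (an assumption), and `re.escape` is a Python convenience shown only for illustration; it is not part of the product described here.

```python
import re

path = r"c:\myfile.txt"
user = "$ExchangeService"

# Escaped forms from the slide.
escaped_path = r"c:\\myfile\.txt"
escaped_user = r"\$ExchangeService"

path_ok = re.fullmatch(escaped_path, path) is not None
user_ok = re.fullmatch(escaped_user, user) is not None

# An unescaped "$" is an end-of-line anchor, so the literal text never matches.
user_unescaped = re.search(r"$ExchangeService", user) is not None

# An unescaped "." accepts any character, not just a period.
loose_dot = re.fullmatch(r"c:\\myfile.txt", r"c:\myfileXtxt") is not None

# re.escape builds an escaped pattern from a literal string automatically.
auto_ok = re.fullmatch(re.escape(path), path) is not None

print(path_ok, user_ok, user_unescaped, loose_dot, auto_ok)
```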
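Taking Apart the Regular Expression • Each element of the expression lines up with one part of the text:
  ^\s+    (leading whitespace)
  TCP     TCP
  \s+     (whitespace)
  \d+     192
  .       .
  \d+     168
  .       .
  \d+     106
  .       .
  \d+     134
  :?      :
  \d+?    80
Note: “\d+?” is a non-greedy match that is satisfied by a single digit; use “(\d+)?”, as in the Event node RegEx, to capture a whole optional port.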
Syntax Must Be Precise • Regular Expression = ^\d\.\d\.\d\.\d:?\d? • Without “+”, each “\d” matches exactly one digit, so multi-digit parts are rejected • Matches: 1.2.3.4:5 1.2.3.5 • Does Not Match: 192.168.1.20:25 192.168.1.20
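The effect of the missing `+` can be shown with a short comparison. This sketch escapes the periods so the result does not depend on whole-line matching, and the multi-digit variant is an assumption added for contrast; Python's `re` module stands in for the Boost engine.

```python
import re

# One digit per part: no "+" quantifier.
ONE_DIGIT = re.compile(r"^\d\.\d\.\d\.\d:?\d?")
# Contrast variant (an assumption): one or more digits per part, optional port.
MULTI_DIGIT = re.compile(r"^\d+\.\d+\.\d+\.\d+(:\d+)?")

short_addresses = ["1.2.3.4:5", "1.2.3.5"]
real_addresses = ["192.168.1.20:25", "192.168.1.20"]

def matches(pattern: re.Pattern, text: str) -> bool:
    """Return True when the pattern is found in the text."""
    return pattern.search(text) is not None

for text in short_addresses + real_addresses:
    print(f"{text:18} one_digit={matches(ONE_DIGIT, text)} multi_digit={matches(MULTI_DIGIT, text)}")
```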
Regular Expression Tools & Links • Expresso http://www.ultrapico.com/Expresso.htm • Helps with the actual writing of RegEx expressions • Regular expression syntax help http://www.boost.org/libs/regex/doc/syntax_perl.html • Timestamp format http://icu.sourceforge.net/userguide/formatDateTime.html
Three Major Sections • Date • Filters • Events
Date Section
<DateTimeMap>
  <TimeStamp>
    <TimeStampSample>2005-9-11T14:18:11 GMT</TimeStampSample>
    <TimeStampFormat>yyyy-MM-dd'T'HH:mm:ss z</TimeStampFormat>
    <TimeStampRE>\d+-\d+-\d+T\d+:\d+:\d+\w+[^|]*</TimeStampRE>
  </TimeStamp>
</DateTimeMap>
<DateTimeFormat>yyyy-MM-dd'T'HH:mm:ss z</DateTimeFormat>
When using a DateTimeMap, your regex code should include the following comment tags: <!--TimeStampStartTag--><!--TimeStampEndTag-->
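As a rough illustration of what the Date section asks the provider to do, the sketch below applies the TimeStampRE to a line and converts the matched text to a timestamp. The pipe-delimited log line is hypothetical, Python's `re` and `datetime` stand in for the Boost and ICU components, and only the GMT zone from the sample is handled; how the provider itself performs the conversion is not shown in the source.

```python
import re
from datetime import datetime, timezone

# TimeStampRE from the Date section.
TIMESTAMP_RE = re.compile(r"\d+-\d+-\d+T\d+:\d+:\d+\w+[^|]*")

# Hypothetical log line that starts with the TimeStampSample.
line = "2005-9-11T14:18:11 GMT|host1|service started"

def extract_timestamp(text: str):
    """Return the timestamp found in text as an aware datetime, or None."""
    m = TIMESTAMP_RE.search(text)
    if m is None:
        return None
    stamp, _, zone = m.group(0).partition(" ")
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
    # Assumption: only the GMT zone from the sample is supported in this sketch.
    if zone.strip() != "GMT":
        raise ValueError(f"unsupported time zone: {zone!r}")
    return parsed.replace(tzinfo=timezone.utc)

print(extract_timestamp(line))
```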
Filter Section • Used to pre-filter high volume Events or unwanted Events • Used to improve Provider performance • Should be as efficient and specific as possible • Sample filter section:
<Filters>
  <RegEx>.*last message repeated\s+\w+\s+times.*</RegEx>
</Filters>
This Filter drops UNIX Syslog messages that only report that the previous message was repeated X times.
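The pre-filtering step can be sketched as follows: any line matching a Filter expression is discarded before event parsing. The syslog lines are hypothetical, and Python's `re` module stands in for the Boost engine.

```python
import re

# Filter expression from the sample filter section.
FILTERS = [re.compile(r".*last message repeated\s+\w+\s+times.*")]

# Hypothetical syslog lines.
lines = [
    "Sep 11 14:18:11 host1 sshd[42]: Accepted password for admin",
    "Sep 11 14:18:12 host1 last message repeated 3 times",
    "Sep 11 14:18:15 host1 cron[99]: job finished",
]

def keep(line: str) -> bool:
    """Return False when any filter matches, i.e. the line is dropped."""
    return not any(f.search(line) for f in FILTERS)

kept = [line for line in lines if keep(line)]
print(kept)
```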
Event Section • Contains one or more Event matching nodes • An Event node is used to match a particular message and format it in a specific way • Each Event node contains 3 sections: • Regular Expression section – the RegEx itself • Instruction section – parameter mapping • Message section – SM description definition
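Event Node Mapping – RegEx Section
<RegEx>^\s+TCP\s+(\d+.\d+.\d+.\d+):?(\d+)?\s+(\d+.\d+.\d+.\d+):?(\d+)?\s+(\w+)</RegEx>
• Each parenthesized sub-expression becomes a numbered field, counted from left to right: (F 0) local address • (F 1) local port • (F 2) foreign address • (F 3) foreign port • (F 4) status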
Event Node Mapping – Instruction Section
<Instructions>
  <Field name="$EventSource" source="MYEVTSRC" />
  <Field name="1" source="%0%" />
  <Field name="2" source="%1%" />
  <Field name="3" source="%2%" />
  <Field name="4" source="%4%" />
  <Field name="5" source="" />
  <Field name="6" source="%3%" />
</Instructions>
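To make the parameter mapping concrete, the sketch below reads the Instructions block and fills each Field from a list of captured sub-expressions, where `%n%` refers to sub-expression n. The `resolve` helper and the captured values are hypothetical, and the way the provider actually resolves fields is an assumption.

```python
import re
import xml.etree.ElementTree as ET

INSTRUCTIONS_XML = """
<Instructions>
  <Field name="$EventSource" source="MYEVTSRC" />
  <Field name="1" source="%0%" />
  <Field name="2" source="%1%" />
  <Field name="3" source="%2%" />
  <Field name="4" source="%4%" />
  <Field name="5" source="" />
  <Field name="6" source="%3%" />
</Instructions>
"""

def resolve(captures: list) -> dict:
    """Map each Field name to its source, substituting %n% with captures[n]."""
    root = ET.fromstring(INSTRUCTIONS_XML)
    params = {}
    for field in root.findall("Field"):
        source = field.get("source", "")
        params[field.get("name")] = re.sub(
            r"%(\d+)%", lambda m: captures[int(m.group(1))], source
        )
    return params

# Hypothetical captured sub-expressions 0..4.
captures = ["192.168.106.134", "80", "10.0.0.5", "1042", "ESTABLISHED"]
params = resolve(captures)
print(params)
```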
Event Node Mapping – Message Section
<Message><![CDATA[
Protocol: TCP
Local Address: %0%
Local Port: %1%
Foreign Address: %2%
Foreign Port: %3%
Status: %4%
]]></Message>
Note: The <![CDATA[ ]]> tags tell the code that interprets the XML to treat the enclosed text as literal character data rather than XML markup.
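The three sections of an Event node can be tied together in a short sketch: the RegEx captures sub-expressions, and the Message template substitutes them by number. Python's `re` module stands in for the Boost engine, the input line is hypothetical, and the `render` helper is an illustration rather than the product's implementation.

```python
import re

# RegEx section of the Event node (Python groups are 1-based; %0% is group 1).
EVENT_RE = re.compile(
    r"^\s+TCP\s+(\d+.\d+.\d+.\d+):?(\d+)?\s+(\d+.\d+.\d+.\d+):?(\d+)?\s+(\w+)"
)

# Message section, flattened to one line for this sketch.
MESSAGE = (
    "Protocol: TCP Local Address: %0% Local Port: %1% "
    "Foreign Address: %2% Foreign Port: %3% Status: %4%"
)

def render(line: str):
    """Return the formatted message for a matching line, or None."""
    m = EVENT_RE.search(line)
    if m is None:
        return None
    groups = m.groups(default="")  # optional ports become empty strings
    return re.sub(r"%(\d+)%", lambda p: groups[int(p.group(1))], MESSAGE)

# Hypothetical netstat-style line.
line = "  TCP    192.168.106.134:80    10.0.0.5:1042    ESTABLISHED"
print(render(line))
```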
Message Example • This is an acceptable way to break down the event into details, but is not necessary. A better way will be explained shortly.
<Message><![CDATA[
Protocol: TCP
Local Address: %0%
Local Port: %1%
Foreign Address: %2%
Foreign Port: %3%
Status: %4%
]]></Message>
Where are the Parameters? • Parameters are not stored separately unless SM is specifically instructed to do so.
Preventing Data Loss • Adding additional “Catch-all” parsers will allow you to collect anything that slipped through the cracks. • Examples:
<Event id="">
  <RegEx>.*snort.*:.*</RegEx>
  <Instructions>
    <Field name="$EventSource" source="Snort IDS" />
    <Field name="$EventSeverity" source="1" />
  </Instructions>
  <Message></Message>
</Event>
<Event id="">
  <RegEx>.*</RegEx>
  <Instructions>
    <Field name="$EventSource" source="Syslog" />
    <Field name="$EventSeverity" source="1" />
  </Instructions>
  <Message></Message>
</Event>
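The catch-all idea can be sketched as an ordered list of parsers in which the first matching expression decides the event source. That evaluation order is an assumption (the source does not state how the provider orders Event nodes), the log lines are hypothetical, and Python's `re` module stands in for the Boost engine.

```python
import re

# Catch-all parsers from the examples, most specific first.
# Assumption: Event nodes are tried in order and the first match wins.
PARSERS = [
    (re.compile(r".*snort.*:.*"), "Snort IDS"),
    (re.compile(r".*"), "Syslog"),
]

def classify(line: str):
    """Return the $EventSource of the first parser that matches the line."""
    for pattern, source in PARSERS:
        if pattern.search(line):
            return source
    return None

# Hypothetical log lines.
snort_line = "host1 snort[311]: portscan detected from 10.0.0.9"
other_line = "host1 cron[99]: job finished"

print(classify(snort_line))
print(classify(other_line))
```

Because `.*` matches every line, it belongs last; placed first, it would shadow every more specific parser under the first-match assumption.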
Putting it All Together • Change the Provider to XML • Click on the Configure XML button • Cut and Paste XML code from Editor
Custom Alert Descriptions – The right way to create alert messages! • The default is to use $Description$ for the Alert Description • This causes the alert to look like this:
Custom Alert Descriptions – The right way to create alert messages! • By creating a descriptive alert description, you can make the alert look like this: • This is accomplished by modifying the event processing rule that generates the alert so that it has a more detailed alert description
Limitations of Regular Expression Parsing • It is a lexical parser and it works only for sequence-based regular expression parsing • Does not support XML format messages, e.g., IDMEF messages • Sub-expressions are limited to 25 (numbered 0–24)
XML Tools & Links • SciTE • http://scintilla.sourceforge.net/ScintillaDownload.html • Small fast text editor with color coding for XML • Notepad++ • http://notepad-plus.sourceforge.net/uk/site.htm • Slightly larger text editor, but more robust than SciTE
Module Review In this session you learned how to: • Understand and write Regular Expressions • Create XML code that will use Regular Expressions to parse data from providers into parameters