Regular Expressions and XML Parsing

Presentation Transcript


  1. Regular Expressions and XML Parsing

  2. Objectives After this session you should be able to: • Understand and write Regular Expressions • Create XML code that will use Regular Expressions to parse data from providers into parameters

  3. Regular Expressions

  4. Regular Expression Parsing In Application Log/Syslog Provider

  5. Regular Expressions • Used to parse and analyze fields • Designed for matching text items • Require extremely precise syntax

  6. Regular Expression Overview • Can be used in: • Rule criteria • View criteria • Computer Group formulas • XML / parameter parsing • Uses the popular Boost.org regular expression parser • Perl-like regular expression syntax • Features include: • Advanced text pattern matching • Timestamp conversion • UI support • Syslog IP filtering Note: The regex syntax used in Rules, Views and Computer Groups is not the same as the syntax used in parsers.

  7. Regular Expression Example • Regular Expression = ^World\s+.* • “^” means “Start of Line” • “^World” means the text line must begin with “World” • “\s+” means one or more whitespace characters • “.*” is a wildcard that matches any remaining characters (including none) • Matches: “World Wide Web Publishing Service” “World with lots of space” “World Class” “World War” • Does not match: “WorldWide” “Wide World of Sports” “Wayne’s World” “War of the Worlds”
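
The example above can be tried outside the product. The following is a minimal sketch in Python, whose `re` module uses a Perl-like syntax; it is an assumption that Python behaves like the Boost engine for this simple pattern, and the names `PATTERN` and `matches` are illustrative only.

```python
import re

# Pattern taken from the slide. Python's re module stands in for the
# Boost parser here; that substitution is an assumption.
PATTERN = re.compile(r"^World\s+.*")

should_match = [
    "World Wide Web Publishing Service",
    "World      with lots of space",
    "World Class",
    "World War",
]
should_not_match = [
    "WorldWide",             # no whitespace after "World"
    "Wide World of Sports",  # does not start with "World"
    "Wayne's World",
    "War of the Worlds",
]


def matches(line: str) -> bool:
    """Return True if the pattern matches at the start of the line."""
    return PATTERN.match(line) is not None


for line in should_match + should_not_match:
    print(f"{line!r}: {matches(line)}")
```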

  8. Regular Expression Example #2 • Regular Expression = ^\s+(TCP|ICMP)\s+\d+\.\d+\.\d+\.\d+(:\d+)? • “^” means “Start of Line” • “\s+” means “one or more whitespace characters” • “(TCP|ICMP)” means the literal word “TCP” or “ICMP” must be present; the parentheses limit the alternation to those two words • \d+\.\d+\.\d+\.\d+(:\d+)? means an IP address with 4 numeric parts separated by periods, optionally followed by a colon and a 5th numeric part (the port) • “\.” is an escaped period; an unescaped “.” would match any character • Since each numeric component has a “+” it can be any number of consecutive digits (at least one) • Matches (when the line starts with whitespace): TCP 192.168.1.1:80 ICMP 192.168.1.1
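
A pattern for this kind of line can be checked quickly in Python. This is a sketch only: it uses a form with grouped alternation, escaped periods and an optional port group, it relaxes the leading whitespace to optional (`^\s*`) so both sample lines can be tested as typed, and `parse_line` is a hypothetical helper name. Python's `re` module is assumed to be close enough to the Boost syntax for this purpose.

```python
import re

# Grouped alternation, escaped periods, optional ":port" suffix.
# "^\s*" (optional leading whitespace) is an assumption made for this demo.
LINE_RE = re.compile(r"^\s*(TCP|ICMP)\s+(\d+\.\d+\.\d+\.\d+)(?::(\d+))?")


def parse_line(line):
    """Return (protocol, address, port) or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    protocol, address, port = m.groups()
    port_number = int(port) if port is not None else None
    return (protocol, address, port_number)


print(parse_line("  TCP    192.168.1.1:80"))
print(parse_line("ICMP 192.168.1.1"))
print(parse_line("UDP 192.168.1.1"))
```

Escaping the periods matters: with an unescaped `.`, a string such as `TCP 192x168x1x1` would also be accepted.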

  9. Regular Expression Example #3 • Regular Expression = [^\,]* • Matches every character up to, but not including, the next comma. Any other delimiter character can be used in place of the comma. • Useful for matching data within a given sub-expression that can vary greatly • Example: in the line below, it matches one comma-delimited field at a time, such as 250606001E05: 250606001E05,25,2,8,HOUAV03
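
The "everything up to the delimiter" idiom can be illustrated with a short Python sketch. The names `FIELD_RE` and `FIVE_FIELDS_RE` are illustrative, and the five-field pattern is an assumption about how the idiom would be repeated inside sub-expressions; it is not taken from the slides.

```python
import re

line = "250606001E05,25,2,8,HOUAV03"

# A single use of the idiom: match everything before the first comma.
FIELD_RE = re.compile(r"[^,]*")
first_field = FIELD_RE.match(line).group(0)

# The same idiom repeated inside capturing groups, one group per field.
FIVE_FIELDS_RE = re.compile(r"^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)$")
fields = FIVE_FIELDS_RE.match(line).groups()

print(first_field)
print(fields)
```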

  10. Regular Expression Operators and Their Definitions

  11. Special Characters • Special Characters include \ ^ $ * . [ ] | + ( ) • Any time you want to use a special character as a literal, it must be escaped with a backslash • Example: The path c:\myfile.txt would need to be entered as c:\\myfile\.txt • Example: The User-ID $ExchangeService would need to be entered as \$ExchangeService
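
The effect of escaping can be seen in a small Python sketch. Python's `re.escape` is used here only as a convenience for producing an escaped pattern; it belongs to Python, not to the product or to Boost, and the variable names are illustrative.

```python
import re

path = r"c:\myfile.txt"
hand_escaped = r"c:\\myfile\.txt"   # escaped by hand, as on the slide
auto_escaped = re.escape(path)      # Python helper that escapes special characters

# Both escaped patterns match the literal path.
hand_ok = re.fullmatch(hand_escaped, path) is not None
auto_ok = re.fullmatch(auto_escaped, path) is not None

# With an unescaped ".", an unintended string is accepted as well.
loose_accepts_wrong = re.fullmatch(r"c:\\myfile.txt", r"c:\myfileXtxt") is not None

# "$" is an end-of-line anchor unless escaped.
user = "$ExchangeService"
escaped_user_ok = re.search(r"\$ExchangeService", user) is not None
unescaped_user_ok = re.search(r"$ExchangeService", user) is not None

print(hand_ok, auto_ok, loose_accepts_wrong, escaped_user_ok, unescaped_user_ok)
```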

  12. Taking Apart the Regular Expression • Pattern pieces: ^\s+ TCP \s+ \d+ . \d+ . \d+ . \d+ :? \d+? • Matching text: TCP 192 . 168 . 106 . 134 : 80

  13. Syntax Must Be Precise • Regular Expression = ^\d\.\d\.\d\.\d:?\d? • Without a “+”, each “\d” matches exactly one digit • Matches: 1.2.3.4:5 1.2.3.5 • Does Not Match: 192.168.1.20:25 192.168.1.20
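
The difference a single `+` or an unescaped `.` makes can be checked with a Python sketch. The three pattern names are illustrative; the trailing `$` on the first two patterns is an assumption added so the whole string must match, and whether the product matches whole lines or prefixes is not stated in the slides.

```python
import re

# One digit per part, periods escaped, "$" added (assumption) to anchor the end.
ONE_DIGIT = re.compile(r"^\d\.\d\.\d\.\d:?\d?$")
# "+" after each \d allows one or more digits per part.
MANY_DIGITS = re.compile(r"^\d+\.\d+\.\d+\.\d+:?\d*$")
# Unescaped "." matches any character, so this form is looser than it looks.
LOOSE = re.compile(r"^\d.\d.\d.\d:?\d?")

samples = ["1.2.3.4:5", "1.2.3.5", "192.168.1.20:25", "192.168.1.20"]

for s in samples:
    print(
        f"{s:>16}  one-digit={ONE_DIGIT.match(s) is not None}"
        f"  many-digits={MANY_DIGITS.match(s) is not None}"
        f"  loose={LOOSE.match(s) is not None}"
    )
```

In Python's prefix-matching semantics the loose form accepts the start of `192.168.1.20`, because each unescaped `.` is free to consume a digit.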

  14. Examples of Regular Expressions and Matches

  15. Regular Expressions with XML

  16. Regular Expression Tools & Links • Expresso http://www.ultrapico.com/Expresso.htm • Helps with the actual writing of RegEx expressions • Regular expression syntax help http://www.boost.org/libs/regex/doc/syntax_perl.html • Timestamp format http://icu.sourceforge.net/userguide/formatDateTime.html

  17. Sample XML File

  18. Three Major Sections • Date • Filters • Events

  19. Date Section
      <DateTimeMap>
        <TimeStamp>
          <TimeStampSample>2005-9-11T14:18:11 GMT</TimeStampSample>
          <TimeStampFormat>yyyy-MM-dd'T'HH:mm:ss z</TimeStampFormat>
          <TimeStampRE>\d+-\d+-\d+T\d+:\d+:\d+\w+[^|]*</TimeStampRE>
        </TimeStamp>
      </DateTimeMap>
      <DateTimeFormat>yyyy-MM-dd'T'HH:mm:ss z</DateTimeFormat>
      When using a DateTimeMap, your regex code should include the following comment tags: <!--TimeStampStartTag--><!--TimeStampEndTag-->
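
The idea behind the Date section (locate the timestamp with a regular expression, then convert it with a format string) can be sketched in Python. This is an analogy only: the product uses ICU-style format strings, while the sketch uses Python's `strptime` codes, ignores the time zone suffix, and uses a hypothetical helper name and sample line.

```python
import re
from datetime import datetime

# Date and time portion of the sample timestamp; the zone suffix is
# deliberately left out of this sketch.
TIMESTAMP_RE = re.compile(r"\d+-\d+-\d+T\d+:\d+:\d+")


def extract_timestamp(line):
    """Find the first timestamp in the line and convert it to a datetime."""
    m = TIMESTAMP_RE.search(line)
    if m is None:
        return None
    # Python equivalent of the date/time part of yyyy-MM-dd'T'HH:mm:ss
    return datetime.strptime(m.group(0), "%Y-%m-%dT%H:%M:%S")


# Hypothetical log line built around the sample timestamp.
print(extract_timestamp("2005-9-11T14:18:11 GMT|service started"))
print(extract_timestamp("no timestamp here"))
```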

  20. Filter Section • Used to pre-filter high volume Events or unwanted Events • Used to improve Provider performance • Should be as efficient and specific as possible • Sample filter section: <Filters> <RegEx>.*last message repeated\s+\w+\s+times.*</RegEx> </Filters> This particular Filter removes UNIX Syslog messages reporting that the previous message was repeated X times.
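
Pre-filtering can be pictured as dropping every line that matches a filter pattern before any event parsing takes place. The sketch below assumes that behavior; the function name and the sample log lines are hypothetical.

```python
import re

# Filter pattern from the sample filter section.
FILTER_RE = re.compile(r".*last message repeated\s+\w+\s+times.*")


def prefilter(lines):
    """Return only the lines that the filter does not match."""
    return [line for line in lines if FILTER_RE.match(line) is None]


# Hypothetical syslog lines for illustration.
log_lines = [
    "Sep 11 14:18:11 host1 sshd[311]: Accepted password for admin",
    "Sep 11 14:18:12 host1 last message repeated 3 times",
    "Sep 11 14:18:15 host1 su: session opened for user root",
]

kept = prefilter(log_lines)
for line in kept:
    print(line)
```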

  21. Event Section • Contains one or more Event matching nodes • An Event node is used to match a particular message and format it in a specific way • Each Event node contains 3 sections: • Regular Expression section – the RegEx itself • Instruction section – parameter mapping • Message section – SM description definition

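  22. Event Node Mapping – RegEx Section <RegEx>^\s+TCP\s+(\d+\.\d+\.\d+\.\d+):?(\d+)?\s+(\d+\.\d+\.\d+\.\d+):?(\d+)?\s+(\w+)</RegEx> • Each parenthesized sub-expression captures one field, numbered from left to right as (F 0) (F 1) (F 2) (F 3) (F 4) and referenced later as %0% through %4%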
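
How sub-expressions turn into numbered fields can be sketched in Python. Note that Python numbers capture groups from 1, so the sketch re-labels them from 0 to mirror the %0% through %4% convention used in the slides. The sample line and the periods escaped in the pattern are assumptions made for the demo.

```python
import re

EVENT_RE = re.compile(
    r"^\s+TCP\s+(\d+\.\d+\.\d+\.\d+):?(\d+)?"
    r"\s+(\d+\.\d+\.\d+\.\d+):?(\d+)?\s+(\w+)"
)

# Hypothetical netstat-style line; only the first address appears in the slides.
line = "  TCP    192.168.106.134:80    10.0.0.5:1042    ESTABLISHED"

m = EVENT_RE.match(line)
# Re-label Python's 1-based groups as %0% .. %4%.
fields = {f"%{i}%": value for i, value in enumerate(m.groups())}

for name, value in fields.items():
    print(name, "=", value)
```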

  23. Event Node Mapping – Instruction Section
      <Instructions>
        <Field name="$EventSource" source="MYEVTSRC" />
        <Field name="1" source="%0%" />
        <Field name="2" source="%1%" />
        <Field name="3" source="%2%" />
        <Field name="4" source="%4%" />
        <Field name="5" source="" />
        <Field name="6" source="%3%" />
      </Instructions>

  24. Event Node Mapping – Message Section
      <Message><![CDATA[
      Protocol: TCP
      Local Address: %0%
      Local Port: %1%
      Foreign Address: %2%
      Foreign Port: %3%
      Status: %4%
      ]]></Message>
      Note: <![CDATA[ ]]> tags tell the code that interprets the XML to ignore the enclosed contents from an XML syntax standpoint.
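
The substitution of %N% placeholders in the Message section can be imitated in a few lines of Python. This is a sketch of the idea, not the product's implementation: `render` is a hypothetical helper, the sample values are invented, and treating a missing field as an empty string is an assumption.

```python
import re

MESSAGE_TEMPLATE = (
    "Protocol: TCP\n"
    "Local Address: %0%\n"
    "Local Port: %1%\n"
    "Foreign Address: %2%\n"
    "Foreign Port: %3%\n"
    "Status: %4%"
)


def render(template, values):
    """Replace each %N% placeholder with values[N] (empty string if missing)."""
    def substitute(match):
        index = int(match.group(1))
        if index < len(values) and values[index] is not None:
            return values[index]
        return ""
    return re.sub(r"%(\d+)%", substitute, template)


# Hypothetical captured fields, in sub-expression order.
captured = ["192.168.106.134", "80", "10.0.0.5", None, "ESTABLISHED"]
rendered = render(MESSAGE_TEMPLATE, captured)
print(rendered)
```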

  25. Message Example • This is an acceptable way to break down the event into details, but is not necessary. A better way will be explained shortly.
      <Message><![CDATA[
      Protocol: TCP
      Local Address: %0%
      Local Port: %1%
      Foreign Address: %2%
      Foreign Port: %3%
      Status: %4%
      ]]></Message>

  26. Where are the Parameters? • Parameters are not stored separately unless SM is specifically instructed to do so.

  27. Preventing Data Loss • Adding “Catch-all” parsers will allow you to collect anything that slipped through the cracks. • Examples:
      <Event id="">
        <RegEx>.*snort.*:.*</RegEx>
        <Instructions>
          <Field name="$EventSource" source="Snort IDS" />
          <Field name="$EventSeverity" source="1" />
        </Instructions>
        <Message></Message>
      </Event>
      <Event id="">
        <RegEx>.*</RegEx>
        <Instructions>
          <Field name="$EventSource" source="Syslog" />
          <Field name="$EventSeverity" source="1" />
        </Instructions>
        <Message></Message>
      </Event>
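
The catch-all idea can be modeled as an ordered list of parsers in which the most general pattern comes last. The sketch below assumes that the first matching Event node wins; the slides do not state the evaluation order, and `classify` and the sample lines are hypothetical.

```python
import re

# (pattern, event source) pairs, most specific first.
# First-match-wins ordering is an assumption, not documented behavior.
PARSERS = [
    (re.compile(r".*snort.*:.*"), "Snort IDS"),
    (re.compile(r".*"), "Syslog"),  # catch-all: matches any line
]


def classify(line):
    """Return the event source of the first parser that matches the line."""
    for pattern, source in PARSERS:
        if pattern.match(line) is not None:
            return source
    return None


print(classify("host1 snort[4021]: portscan detected"))
print(classify("host1 cron[88]: job finished"))
```

Because `.*` matches every line, placing it anywhere but last would shadow the more specific parsers under this assumed ordering.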

  28. Putting it All Together • Change the Provider to XML • Click on the Configure XML button • Cut and Paste XML code from Editor

  29. Custom Alert Descriptions – The right way to create alert messages! • The default is to use $Description$ for the Alert Description • This causes the alert to look like this:

  30. Custom Alert Descriptions – The right way to create alert messages! • By creating a descriptive alert description, you can make the alert look like this: • This is accomplished by modifying the event processing rule that generates the alert so that it has a more detailed alert description

  31. Limitations of Regular Expression Parsing • It is a lexical parser and it works only for sequence-based regular expression parsing • Does not support XML format messages, e.g., IDMEF messages • Sub-expression numbers are limited to the range 0–24

  32. XML Tools & Links • SciTE • http://scintilla.sourceforge.net/ScintillaDownload.html • Small fast text editor with color coding for XML • Notepad++ • http://notepad-plus.sourceforge.net/uk/site.htm • Slightly larger text editor, but more robust than SciTE

  33. Module Review In this session you learned how to: • Understand and write Regular Expressions • Create XML code that will use Regular Expressions to parse data from providers into parameters
