
Cosc 2150


thuyet

Presentation Transcript


  1. Cosc 2150: String parsing in C++ with regular expressions.

  2. String parsing
     • One of the main tasks a program may need to do is take a string and parse it to determine the next step in the program. For example:
        • Command-line applications
        • Search applications (like Bing and Google)
        • Most network applications, which send and receive data as strings
        • Too many more to even begin to name

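  3. How to parse
     • As with all things in C/C++, you can do it any number of ways:
        • Develop your own functions and algorithms to break the string up.
        • Use the member functions of the string class (string parsing).
        • Use the sscanf function, whose format strings work somewhat like simple patterns.
        • Use the regex library, which provides regular expressions.
           • Originally part of TR1 (std::tr1::regex); standard since C++11 as std::regex.
           • Requires Visual Studio 2010 or later, or GCC 4.9 or later (earlier libstdc++ versions shipped an incomplete <regex>).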

  4. Reading a line of input
     • The standard cin >> reads up to the next whitespace and then stops.
        • This is not always the behavior we want.
     • There are two getline functions:
        • cin.getline(c_str, 256)
           • Reads until the end-of-line mark or until 255 characters have been read (one slot is kept for the terminating '\0'), whichever comes first.
           • Requires a C string (char array) instead of a string.
           • Example:
                 char stuff[256];
                 cin.getline(stuff, 256);
        • But this is still not the method we want, since it requires C strings.

  5. Reading a line of input (2)
     • The second getline is the one we want to use, since it reads into a string.
        • It is a non-member function declared in <string>, alongside the string class.
        • getline(cin, string)
        • Example:
              string stuff;
              getline(cin, stuff);

  6. Regular expressions
     • Regex for short.
     • Likely the most powerful way to do string processing.
     • Use:
        • Create a pattern that you want to match against.
        • Run the "match".
        • If it returns true, the string matched the pattern.
        • You can also collect all the matches into an array-like object to use.
     • Problem:
        • Regex patterns can be very complex, and we don't have the time (it would take about 6 lectures) to cover the entire regex syntax. This covers only the very basics.

  7. Code for regex
     • Include the regex library, which is part of TR1:
           #include <regex>
     • Define the pattern (note: the pattern is a variable!):
           std::tr1::regex pattern( … string pattern … );
     • Object that will contain the sequence of sub-matches (optional):
           std::tr1::match_results<std::string::const_iterator> result;

  8. Code for regex (2)
     • regex_match matches the full string:
           if (std::tr1::regex_match(str, result, pattern))
        • If true, there was a match.
        • If capturing matches, result holds them: check result.size() > 0 or !result.empty().
     • regex_search matches any part of a string:
           if (std::tr1::regex_search(str, result, pattern))
        • Same as regex_match, including result.

  9. Patterns
     • Assume we are using regex_search unless otherwise noted.
     • Matching text
           regex ex1("ello");   // matches anything containing "ello", such as "hello world"
     • Alternation
           regex ex2("Fred|Wilma|Pebbles");   // true if the string contains Fred, Wilma, or Pebbles
     • Alternation and grouping
           regex ex3("(p|g|m|s|b)et");   // true if the string contains pet, get, met, set, or bet
           // note: () are also used to capture the match

  10. Patterns (2)
     • Single-character "or" matching, using []
           regex ex4("[0-9]");   // match a single digit
        • Note: the dash is a range operator, i.e. 0 to 9.
           regex ex5("[a-zA-Z0-9.]");
        • Matches any one character: a through z, A through Z, 0 through 9, or the period.
     • Match quantifiers
        • +  one or more times
        • ?  zero or one time
        • *  zero or more times
           regex ex6("[0-9]+");   // find 1 or more digits
           regex ex7("[a-z]*");   // find zero or more lowercase letters

  11. Patterns (3)
     • Matching quantifiers {}
        • {min number, max number}
           regex ex8("[0-9]{1,3}");   // find 1 to 3 digits
           regex ex9("fo*ba?r{1,2}");
        • Matches f, 0 or more o's, b, 0 or 1 a, then 1 or 2 r's.
        • Matches: fobar, fbr, fbrr, fooobr, fooobarr, etc.

  12. Patterns (4)
     • Metasymbols
     • Match anything using the period
           regex ex10(".+");   // find 1 or more characters (any character except a line break)
        • e.g. "123", " atr", "\t there" all match.
        • Unless the string is empty (or contains only line breaks), this will match.
     • \d  match a digit [0-9]
           regex ex11("\\d+");   // match 1 or more digits
     • \D  match a non-digit [^0-9]
     • \s  match whitespace [ \t\n\r\f\v]
     • \S  match a non-whitespace [^ \t\n\r\f\v]
     • \w  match a word character [a-zA-Z0-9_]
           regex ex12("\\w+");   // match 1 or more word characters
     • \W  match a non-word character [^a-zA-Z0-9_]

  13. Patterns (5)
     • Capturing the matches
        • Use () around the part you want to capture.
           regex ex13("(\\w+)");
        • Find 1 or more word characters and capture the resulting match.
           regex ex14("(\\w+)\\s+(\\w+)");
        • Find 1 or more word characters, then whitespace, then 1 or more word characters. Capture the word-character matches.
        • Example: "hi there"
           • result[1] = "hi", result[2] = "there"
           regex ex15("(\\d+) (.*)");
        • What does this capture? How might this be useful with regex_match?

  14. Examples
        tr1::regex pattern1("(\\d+) (.*)");
        tr1::regex pattern2("load M\\((\\d+)\\)");
        tr1::regex_match(str, result, pattern1);
           result[1] =
           result[2] =
        tr1::regex_match(str, result, pattern2);
           result[1] =

  15. Regex reference
     • http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339
     • http://www.codeproject.com/KB/string/TR1Regex.aspx
     • Patterns: http://msdn.microsoft.com/en-us/library/bb982727.aspx

  16. Converting strings to integers (1)
     • You can use the sscanf function:
           #include <cstdlib>
           #include <cstdio>
           #include <string>
           using namespace std;

           int GetIntVal2(string strConvert) {
               int intReturn = 0;
               // If sscanf fails (no leading digits), intReturn keeps its initial value of zero.
               sscanf(strConvert.c_str(), "%d", &intReturn);
               return intReturn;
           }

  17. Converting strings to integers (2)
     • Use the atoi function:
           #include <cstdlib>
           #include <cstdio>
           #include <iostream>
           #include <string>
           using namespace std;

           int GetIntVal(string strConvert) {
               int intReturn;
               // NOTE: You should probably do some checks to ensure that
               // this string contains only numbers. atoi converts any leading
               // digits and stops at the first non-digit ("1d2" gives 1);
               // if there are no leading digits, zero is returned.
               intReturn = atoi(strConvert.c_str());
               return intReturn;
           }

  18. Converting integers to strings
     • Uses ostringstream (in <sstream>).
     • Put the integer into the stream, then get it back out as a string.
           #include <sstream>
           #include <iostream>
           #include <string>
           using namespace std;

           string GetStrVal(int intConvert) {
               ostringstream cstr;    // create the stream
               cstr << intConvert;    // put the integer into the stream
               return cstr.str();     // get the contents out as a string
           }

  19. Converting string to integer example
           int main() {
               string str, str2;
               str = "12";
               str2 = "1d2";
               cout << "atoi method str: " << GetIntVal(str) << endl;       // prints 12
               cout << "atoi method str2: " << GetIntVal(str2) << endl;     // prints 1
               cout << "sscanf method str: " << GetIntVal2(str) << endl;    // prints 12
               cout << "sscanf method str2: " << GetIntVal2(str2) << endl;  // prints 1
               return 0;
           }

  20. Q & A
