Cosc 2150: String parsing in C++ with regular expressions
String parsing
• One of the main tasks a program may need to do is take a string and parse it to determine the next step in the program. For example:
  • Command-line applications
  • Search applications (like Bing and Google)
  • Most network applications, which send and receive data as strings
  • Too many more to even begin to name
How to parse
• As with all things in C/C++, you can do it any number of ways:
  • Develop your own functions and algorithms to break a string up.
  • Use the member functions of the string class (find, substr, etc.).
    • Plain string parsing.
  • Use the sscanf function.
    • Closer to regular expressions.
  • Use the regex library from the STL.
    • Which is regular expressions.
    • Requires Visual Studio 2010 or later, or GCC 4.9 or later (older GCC releases ship a <regex> header that is not fully implemented).
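To make the string-class and sscanf options concrete, here is a minimal sketch that parses a hypothetical command line such as "load 12" two ways. The helper names splitOnSpaces and parseCommand, and the "command word + integer" input format, are assumptions for illustration, not part of the course code.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Option 1: string class member functions only (find and substr).
// Splits on single spaces; consecutive spaces produce empty pieces.
std::vector<std::string> splitOnSpaces(const std::string& text) {
    std::vector<std::string> pieces;
    std::string::size_type start = 0;
    std::string::size_type pos;
    while ((pos = text.find(' ', start)) != std::string::npos) {
        pieces.push_back(text.substr(start, pos - start));
        start = pos + 1;
    }
    pieces.push_back(text.substr(start));  // last piece after the final space
    return pieces;
}

// Option 2: sscanf with a format string, assuming input like "load 12".
// Returns false unless both the word and the integer were read.
bool parseCommand(const std::string& line, std::string& cmd, int& arg) {
    char word[32];
    if (std::sscanf(line.c_str(), "%31s %d", word, &arg) != 2) {
        return false;
    }
    cmd = word;
    return true;
}
```

The string-class version gives full control but needs more code; the sscanf version is shorter but requires a fixed-size char buffer (hence the %31s width limit).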
Reading a line of input
• Reading with cin >> stops at the first whitespace character (space, tab, or newline).
  • This is not always the functionality we want.
• getline function (2 methods):
  • cin.getline(c_str, 256)
    • Reads until the end-of-line marker or until 255 characters have been read (one slot is reserved for the terminating '\0'), whichever comes first.
    • Requires a C string (char array) instead of a string.
    • Example:
        char stuff[256];
        cin.getline(stuff, 256);
  • But this is still not the method we want, since it requires C strings.
Reading a line of input (2)
• The second getline method is the one we want to use, since it reads into a string.
  • Declared in the <string> header as a free function (not a member of the string class).
  • getline(cin, string)
  • Example:
        string stuff;
        getline(cin, stuff);
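A minimal sketch of the getline(cin, string) approach, wrapped in hypothetical helpers (readAllLines, readOneWord, linesFromText) so the difference from cin >> is easy to see. It accepts any std::istream, so an istringstream can stand in for cin when testing.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads every line from the stream with std::getline.
// Spaces inside a line are kept; the '\n' itself is discarded.
std::vector<std::string> readAllLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// For contrast: operator>> stops at the first whitespace character.
std::string readOneWord(std::istream& in) {
    std::string word;
    in >> word;
    return word;
}

// Convenience wrapper: a string-backed stream behaves like cin here.
std::vector<std::string> linesFromText(const std::string& text) {
    std::istringstream in(text);
    return readAllLines(in);
}
```

Calling readAllLines(std::cin) reads keyboard input until end-of-file (Ctrl+D on Linux/macOS, Ctrl+Z on Windows).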
Regular expressions
• Regex for short.
• Likely the most powerful way to do string processing.
• Use:
  • Create a pattern that you want to match against.
  • Run the "match".
  • If it returns true, the string matched the pattern.
  • You can also get all the matches into an array to use.
• Problem:
  • Regex patterns can be very complex, and we don't have the time (about 6 lectures) to cover the entire regex syntax. This covers only the very basics.
Code for regex
• Include the regex library, which was part of TR1 (and is std::regex in C++11):
    #include <regex>
• Define the pattern. Note that the pattern is a variable!
    std::tr1::regex pattern( … string pattern … );
• Object that will contain the sequence of sub-matches (optional):
    std::tr1::match_results<std::string::const_iterator> result;
Code for regex (2)
• regex_match matches the full string:
    if (std::tr1::regex_match(str, result, pattern))
  • If true, there was a match.
  • If capturing matches, result holds them: check result.size() > 0 or !result.empty().
• regex_search matches any part of a string:
    if (std::tr1::regex_search(str, result, pattern))
  • Same as regex_match, including result.
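The calls above, put together into a compilable sketch. This assumes a C++11 compiler, where the TR1 names live directly in namespace std (std::regex, std::regex_match, std::regex_search); std::smatch is the standard shorthand for std::match_results<std::string::const_iterator>. The wrapper function names are hypothetical.

```cpp
#include <regex>
#include <string>

// True only if the WHOLE string fits the pattern.
bool matchesWhole(const std::string& str, const std::regex& pattern) {
    std::smatch result;  // same type as std::match_results<std::string::const_iterator>
    return std::regex_match(str, result, pattern);
}

// True if ANY part of the string fits the pattern.
bool matchesPart(const std::string& str, const std::regex& pattern) {
    std::smatch result;
    return std::regex_search(str, result, pattern);
}

// Returns the text regex_search found (result[0]), or "" if there was none.
std::string firstMatch(const std::string& str, const std::regex& pattern) {
    std::smatch result;
    if (std::regex_search(str, result, pattern) && !result.empty()) {
        return result[0].str();
    }
    return "";
}
```

With the pattern "ello", matchesWhole("hello world", p) is false because regex_match must consume the whole string, while matchesPart is true.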
Pattern
• Assume we are using regex_search unless otherwise noted.
• Matching text:
    regex ex1("ello");  // matches anything containing "ello", such as "hello world"
• Alternation:
    regex ex2("Fred|Wilma|Pebbles");  // true if the string contains Fred, Wilma, or Pebbles
• Alternation and grouping:
    regex ex3("(p|g|m|s|b)et");  // true if the string contains pet, get, met, set, or bet
    // note: () are also used to capture the match
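A short sketch applying ex2 and ex3 with regex_search, including reading back the group that ex3 captures. Assumes C++11 std::regex; captureEtWord and mentionsFlintstone are hypothetical names.

```cpp
#include <regex>
#include <string>

// Returns the letter captured by the group in "(p|g|m|s|b)et", or "" if no match.
std::string captureEtWord(const std::string& text) {
    static const std::regex ex3("(p|g|m|s|b)et");
    std::smatch result;
    if (std::regex_search(text, result, ex3)) {
        return result[1].str();  // group 1: the letter before "et"
    }
    return "";
}

// Alternation: true if any of the three names appears anywhere in text.
bool mentionsFlintstone(const std::string& text) {
    static const std::regex ex2("Fred|Wilma|Pebbles");
    return std::regex_search(text, ex2);
}
```

Note that "jet" does not match ex3, because j is not one of the alternatives inside the group.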
Pattern (2)
• Single-character "or" matching, using []:
    regex ex4("[0-9]");  // match a single digit
  • Note the dash is a range operator, i.e., 0 to 9.
    regex ex5("[a-zA-Z0-9.]");
  • Matches any one character: a through z, A through Z, 0 through 9, or the period.
• Match quantifiers:
  • + 1 or more times
  • ? 0 or 1 time
  • * 0 or more times
    regex ex6("[0-9]+");  // find 1 or more digits
    regex ex7("[a-z]*");  // find 0 or more lowercase letters
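The slides mention getting all the matches into an array; one way to do that, sketched here under the assumption of C++11, is std::sregex_iterator, which walks every non-overlapping match of ex6 from left to right. The second helper shows a consequence of *: because "[a-z]*" may match zero characters, regex_search succeeds even on a string with no letters. Helper names are hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every run of digits in text, left to right.
std::vector<std::string> allNumbers(const std::string& text) {
    static const std::regex ex6("[0-9]+");
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), ex6), end; it != end; ++it) {
        found.push_back(it->str());
    }
    return found;
}

// "[a-z]*" can match the empty string, so this returns true for ANY input.
bool starMatchesAnything(const std::string& text) {
    return std::regex_search(text, std::regex("[a-z]*"));
}
```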
Pattern (3)
• Match quantifiers with {}:
  • {min number, max number}
    regex ex8("[0-9]{1,3}");  // find 1 to 3 digits
    regex ex9("fo*ba?r{1,2}");
  • Matches f, 0 or more o's, b, 0 or 1 a, then 1 or 2 r's.
  • Matches: fobar, fbr, fbrr, fooobr, fooobarr, etc.
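A sketch that checks ex8 and ex9 with regex_match (whole string), so the quantifier limits are visible. Assumes C++11 std::regex; the function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Whole-string check against ex9: f, o*, b, optional a, then 1 or 2 r's.
bool isFooBar(const std::string& s) {
    static const std::regex ex9("fo*ba?r{1,2}");
    return std::regex_match(s, ex9);
}

// Whole-string check against ex8: between 1 and 3 digits.
bool isOneToThreeDigits(const std::string& s) {
    static const std::regex ex8("[0-9]{1,3}");
    return std::regex_match(s, ex8);
}
```

With regex_search instead, ex8 would still report a match in "1234", because it finds the substring "123".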
Pattern (4)
• Metasymbols
• Match any character using the period:
    regex ex10(".+");  // find 1 or more characters (any character except a line break)
  • e.g., "123", " atr", "\t there" all match.
  • Unless the string is empty (or contains only line breaks), this will match.
• \d matches a digit [0-9]
    regex ex11("\\d+");  // match 1 or more digits
• \D matches a non-digit [^0-9]
• \s matches whitespace [ \t\n\r\f\v]
• \S matches a non-whitespace character [^ \t\n\r\f\v]
• \w matches a word character [a-zA-Z0-9_]
    regex ex12("\\w+");  // match 1 or more word characters
• \W matches a non-word character [^a-zA-Z0-9_]
• In C++ source code the backslash must be doubled ("\\d") so that the regex library receives \d.
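A sketch using \d and \w from C++ source, where each backslash is written twice. Assumes C++11; isAllDigits and countWords are hypothetical helpers.

```cpp
#include <iterator>
#include <regex>
#include <string>

// True if s is one or more digits and nothing else.
bool isAllDigits(const std::string& s) {
    static const std::regex ex11("\\d+");  // the regex engine sees \d+
    return std::regex_match(s, ex11);
}

// Counts runs of word characters [a-zA-Z0-9_] in s.
int countWords(const std::string& s) {
    static const std::regex ex12("\\w+");
    std::sregex_iterator begin(s.begin(), s.end(), ex12);
    return static_cast<int>(std::distance(begin, std::sregex_iterator()));
}
```

C++11 raw string literals avoid the doubling: std::regex(R"(\d+)") is the same pattern as std::regex("\\d+").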
Pattern (5)
• Capturing the matches
• Use () around the part you want to capture:
    regex ex13("(\\w+)");
  • Find 1 or more word characters and capture the resulting match.
    regex ex14("(\\w+)\\s+(\\w+)");
  • Find 1 or more word characters, then whitespace, then 1 or more word characters. Capture both word-character matches.
  • Example: "hi there"
    • result[1] = "hi", result[2] = "there"
    regex ex15("(\\d+) (.*)");
  • What does this capture? How might this be useful with regex_match?
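A sketch of ex14 with regex_match, showing that result[0] holds the whole match while result[1] and result[2] hold the captured groups. Assumes C++11 std::regex; twoWords and its {"", ""} failure value are illustrative choices, not part of the slides.

```cpp
#include <regex>
#include <string>
#include <utility>

// Splits "word1 <whitespace> word2" into its two captured words.
// Returns {"", ""} if the whole string does not have that shape.
std::pair<std::string, std::string> twoWords(const std::string& s) {
    static const std::regex ex14("(\\w+)\\s+(\\w+)");
    std::smatch result;
    if (std::regex_match(s, result, ex14)) {
        // result[0] is the entire match; groups start at index 1.
        return {result[1].str(), result[2].str()};
    }
    return {"", ""};
}
```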
Examples
    tr1::regex pattern1("(\\d+) (.*)");
    tr1::regex pattern2("load M\\((\\d+)\\)");
• tr1::regex_match(str, result, pattern1);
  • result[1] = ____    result[2] = ____
• tr1::regex_match(str, result, pattern2);
  • result[1] = ____
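The slide leaves the results as an exercise. As a sketch, here are pattern1 and pattern2 applied to two hypothetical inputs, "42 hello world" and "load M(7)", using C++11 std::regex. For those inputs, pattern1 gives result[1] = "42" and result[2] = "hello world", and pattern2 gives result[1] = "7". The helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// pattern1: a number, one space, then the rest of the line.
bool splitNumberAndRest(const std::string& str, std::string& number, std::string& rest) {
    static const std::regex pattern1("(\\d+) (.*)");
    std::smatch result;
    if (!std::regex_match(str, result, pattern1)) return false;
    number = result[1].str();
    rest = result[2].str();
    return true;
}

// pattern2: literal "load M(", digits, ")". The escaped \\( and \\)
// match real parentheses; the unescaped pair captures the digits.
bool parseLoad(const std::string& str, std::string& address) {
    static const std::regex pattern2("load M\\((\\d+)\\)");
    std::smatch result;
    if (!std::regex_match(str, result, pattern2)) return false;
    address = result[1].str();
    return true;
}
```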
Regex reference • http://www.codeguru.com/cpp/cpp/cpp_mfc/stl/article.php/c15339 • http://www.codeproject.com/KB/string/TR1Regex.aspx • Patterns http://msdn.microsoft.com/en-us/library/bb982727.aspx
Converting strings to integers (1)
• Can use the sscanf function:
    #include <cstdio>
    #include <string>
    using namespace std;

    int GetIntVal2(string strConvert) {
        int intReturn = 0;
        // If sscanf fails (no leading digits), intReturn is still zero.
        sscanf(strConvert.c_str(), "%d", &intReturn);
        return intReturn;
    }
Converting strings to integers (2)
• Use the atoi function:
    #include <cstdlib>
    #include <string>
    using namespace std;

    int GetIntVal(string strConvert) {
        // NOTE: You should probably check that this string contains
        // only digits. atoi converts any leading digits and ignores
        // the rest ("1d2" gives 1); with no leading digits it returns 0.
        int intReturn = atoi(strConvert.c_str());
        return intReturn;
    }
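The note in GetIntVal suggests checking that the string really is a number. One way, sketched here as a hypothetical helper tryParseInt, swaps atoi for std::strtol, whose end pointer shows where parsing stopped, so "1d2" can be rejected instead of silently becoming 1. Like atoi, strtol skips leading whitespace; this sketch keeps that behavior.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <string>

// Sets value and returns true only if the WHOLE string is a base-10 int.
// Returns false (value unchanged) for "", "1d2", or out-of-range input.
bool tryParseInt(const std::string& s, int& value) {
    if (s.empty()) return false;
    errno = 0;
    char* end = nullptr;
    long parsed = std::strtol(s.c_str(), &end, 10);
    if (*end != '\0') return false;                          // leftover characters, e.g. "d2"
    if (errno == ERANGE) return false;                       // does not fit in a long
    if (parsed < INT_MIN || parsed > INT_MAX) return false;  // does not fit in an int
    value = static_cast<int>(parsed);
    return true;
}
```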
Converting integers to strings
• Uses ostringstream (in <sstream>).
• Put the integer into the stream, then get it back out as a string.
    #include <sstream>
    #include <string>
    using namespace std;

    string GetStrVal(int intConvert) {
        ostringstream cstr;   // create the stream
        cstr << intConvert;   // put the integer into the stream
        return cstr.str();    // get the contents out as a string
    }
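Since C++11, the standard library also provides std::to_string, which does the same job as GetStrVal in one call. The template toStr below is a hypothetical generalization of the ostringstream approach to any type that supports <<. This sketch assumes a C++11 compiler.

```cpp
#include <sstream>
#include <string>

// C++11 library call: same result as GetStrVal for an int.
std::string intToString(int n) {
    return std::to_string(n);
}

// ostringstream approach for any type with an operator<<.
template <typename T>
std::string toStr(const T& value) {
    std::ostringstream out;
    out << value;
    return out.str();
}
```

For floating-point values the two differ: std::to_string(3.5) yields "3.500000", while the stream version uses the stream's default formatting and yields "3.5".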
Converting string to integer example:
    #include <iostream>
    #include <string>
    using namespace std;

    // GetIntVal and GetIntVal2 as defined on the previous slides
    int main() {
        string str, str2;
        str = "12";
        str2 = "1d2";
        cout << "atoi method str: " << GetIntVal(str) << endl;      // prints 12
        cout << "atoi method str2: " << GetIntVal(str2) << endl;    // prints 1
        cout << "sscanf method str: " << GetIntVal2(str) << endl;   // prints 12
        cout << "sscanf method str2: " << GetIntVal2(str2) << endl; // prints 1
        return 0;
    }
Q & A