Parsing Strings
Fred Kuhns
Computer Science and Engineering, Applied Research Laboratory
Washington University in St. Louis
Working Example
• As software developers, we frequently find ourselves needing to parse strings.
• We will focus on character strings using the STL string class, but the techniques have broader applicability.
• A common scenario: we have a file containing tabulated data that we must read, process, and then store the results back in a file.
• A popular file format is CSV, or Comma-Separated Values.
  • Commas separate fields and newlines separate records.
CS422 – Operating Systems Concepts
CSV
    # Registration table
    # Last <fs> First <fs> MI <fs> ID <fs> Email <fs> Comments
    # Each line represents the record for one registered person
    Smith , John , M , 1001 , john@someplace.com , needs receipt
    Jackson, Mary , I , 2010 , mary@thatplace.edu ,
    Mitchel, Mark, L, 4000, mm@candy.com, must call
    Hicks,,, 2110, , must get missing information
• It is convenient to think of the file as a two-dimensional array of records and fields.
• Be specific about your assumptions concerning the data format, and whether comments and escape sequences are permitted.
• Don't assume that all fields will have values or that the proper number of field separators is present, especially if people are permitted to edit the file.
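If comments are permitted, each line has to be classified before it is split into fields. The sketch below assumes, as in the sample file above, that a comment line is one whose first non-blank character is '#'. The function name isCommentOrBlank and its parameters are hypothetical, not part of the original slides.

```cpp
#include <string>

// Hypothetical helper: returns true when `line` holds no data, i.e. it is
// empty, all whitespace, or its first non-blank character is `commentChar`.
// Assumption: a '#' that appears later in a data line is ordinary field text.
bool isCommentOrBlank(const std::string& line,
                      char commentChar = '#',
                      const std::string& ws = " \t\r") {
    // Index of the first character that is not whitespace, or npos if none.
    std::string::size_type first = line.find_first_not_of(ws);
    return first == std::string::npos || line[first] == commentChar;
}
```

A reader loop could call this on every line and skip those for which it returns true. Whether a trailing "# ..." on a data line should also count as a comment is a format decision this sketch deliberately leaves out.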
Accessing the file
• C++ gives you access to both the C and C++ standard libraries for I/O. See the required text for details.
• I assume you need an input file and an output file open:
    // needs <fstream>, <iostream>, <cstdlib>; assumes "using namespace std;"
    char ch;
    ifstream fin("data.csv");
    if (!fin) { cerr << "open fin failed"; exit(1); }
    ofstream fout("result.csv");
    if (!fout) { cerr << "open fout failed"; exit(1); }
    while (fin.get(ch)) { … fout.put(ch); }
    // the loop must have ended at end of input, with the output still usable
    if (!fin.eof() || !fout) { cerr << "File IO error"; exit(1); }
• You may also open a file explicitly: fin.open("filename");
• The stream destructor closes the file, or you may close it explicitly: fin.close();
Operations
• Stream objects have state:
  • good() – the next operation is expected to succeed
  • eof() – end of file (input) reached
  • fail() – the next operation will fail
  • bad() – the stream is corrupted
• An operation on a stream that is not in a good state is a null op.
• bool operator!() const on a stream returns fail().
• operator void*() const returns fail() ? 0 : -1; that is, a null pointer on failure and a non-null pointer otherwise. (C++11 replaced this conversion with explicit operator bool() const, which returns !fail().)
• Character-oriented I/O uses get, put, read, write, getline, and the operators << and >>.
• get(char*, …) leaves the '\n' in the stream, but getline(char*, …) removes it.
• You can also use the non-member function getline, which takes a string.
Reading CSV
    ifstream fin(argv[1]);
    string line;
    vector<string> lines;
    vector<FieldList> records;
    while (getline(fin, line)) {
        lines.push_back(line);   // example of using vectors
        FieldList fl(line);
        records.push_back(fl);
    }
• Questions:
  • Is it OK to add fl to the vector records?
  • Does the line read retain all whitespace?
Reading the fields, one of many ways
    FieldList::FieldList(const string &rec, …)
    {
        // you can fill in the missing pieces
        string fld;
        string::size_type indx, fend, tmp, end = rec.size();
        for (indx = 0; indx <= end; indx = fend + 1) {
            // skip over any initial white space
            indx = rec.find_first_not_of(ws_, indx);
            ???
            flds_.push_back(fld);
        }
    }
• To solve this, consider the edge cases.
• Make sure you explicitly address each case.
• Draw a picture.
• Do you allow comments?
• What about quoted text with embedded field separators?
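For comparison, here is one possible standalone splitter. It is not the intended solution to the exercise above. It is a sketch under stated assumptions: fields are separated by a single character, whitespace around a field is trimmed, empty fields are kept, and quoted text and comments are not handled. The name splitFields and its parameters are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `rec` on `sep`, trimming characters in `ws`
// from both ends of each field. Empty fields are preserved, so "a,,b"
// yields three fields and an empty record yields one empty field.
std::vector<std::string> splitFields(const std::string& rec,
                                     char sep = ',',
                                     const std::string& ws = " \t\r") {
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        // The raw field occupies [start, fend); fend is the next separator
        // or, if there is none, the end of the record.
        std::string::size_type fend = rec.find(sep, start);
        if (fend == std::string::npos) fend = rec.size();

        std::string::size_type first = rec.find_first_not_of(ws, start);
        if (first == std::string::npos || first >= fend) {
            fields.push_back(std::string());  // empty or all-blank field
        } else {
            // Search backward from the last character of the raw field.
            std::string::size_type last = rec.find_last_not_of(ws, fend - 1);
            fields.push_back(rec.substr(first, last - first + 1));
        }

        if (fend == rec.size()) break;  // no separator left
        start = fend + 1;               // continue after the separator
    }
    return fields;
}
```

On the sample record "Hicks,,, 2110, , must get missing information" this produces six fields, three of them empty, which matches the advice not to assume every field has a value.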
Simple Examples
• You can use the find family of string member functions to split up this line: find(), find_first_of(), find_first_not_of(), find_last_of(), find_last_not_of()
• Record as it appears in the file:
    a,b,c\n
• String representation of the record after a call to getline(fin, line):
    char:   a   ,   b   ,   c
    index:  0   1   2   3   4   5
• line.size() == 5
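To connect the picture to the find family, the sketch below walks a line with find_first_of and records the index of every separator. The function name separatorPositions is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the index of every character in `line`
// that appears in `seps`, in increasing order.
std::vector<std::string::size_type>
separatorPositions(const std::string& line, const std::string& seps = ",") {
    std::vector<std::string::size_type> positions;
    std::string::size_type i = line.find_first_of(seps);
    while (i != std::string::npos) {
        positions.push_back(i);
        // Resume just past the match; a start index equal to line.size()
        // is valid and simply yields npos.
        i = line.find_first_of(seps, i + 1);
    }
    return positions;
}
```

For a five-character line "a,b,c" the separators are found at indices 1 and 3, so the fields occupy [0,1), [2,3), and [4,5).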