Finding & Eliminating Rogue Hex Characters in Text Fields

Presentation Transcript

  1. Finding & Eliminating Rogue Hex Characters in Text Fields
     Martha Cox, Cancer Outcomes Research Program, CDHA / Dalhousie

  2. The Problem
     • Chart abstraction data containing several comment fields (255 chars each)
     • Some values with "random" line feeds

  3. Patient ID   Comments
     ------------------------------------------------------------
     013          Found hyperplastic polyp
     ------------------------------------------------------------
     017          colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
     ------------------------------------------------------------
     028          Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
     ------------------------------------------------------------
     031          -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
     ------------------------------------------------------------
     038          report not present.
     ------------------------------------------------------------
     040          office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
     ------------------------------------------------------------
     056          colonscopy confirmed the presence of a low-lying carinoma of the rectum.
     ------------------------------------------------------------
     084          lap attempted X 2 but resection could not be carried out. Most questions N A for laparatomy.
     ------------------------------------------------------------
     155          had a hemicolectomy
     ------------------------------------------------------------
     157          3 4 tumor above reflection, 1 4 was below reflection
     ------------------------------------------------------------

  4. So I emailed my SAS buddies...

  5. Lots of suggestions
     • compress? kcompress? The returns seem to be between words, so compress would smash 2 words together.
     • translate or tranwrd? Should work, but these wouldn't take a hex value for me. Besides, which character(s) is the problem?
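The difference between the two kinds of suggestion can be sketched outside SAS. The Python below is an analogue, not SAS code: the hypothetical `delete_chars` mimics deleting the characters outright (as COMPRESS does), and the hypothetical `blank_chars` mimics mapping each one to a blank (as TRANSLATE does). The sample text is made up, loosely based on patient 017's comment.

```python
# Python analogue (not SAS) of the two suggestions: delete the control
# characters (COMPRESS-style) or map each one to a blank (TRANSLATE-style).
# Function names and sample text are hypothetical.
CR, LF = "\r", "\n"

def delete_chars(text: str, chars: str) -> str:
    """Drop every character listed in `chars`."""
    return text.translate({ord(c): None for c in chars})

def blank_chars(text: str, chars: str) -> str:
    """Replace every character listed in `chars` with one blank."""
    return text.translate({ord(c): " " for c in chars})

sample = "multiple polyps" + CR + LF + "throughout the colon."

print(delete_chars(sample, CR + LF))  # multiple polypsthroughout the colon.
print(blank_chars(sample, CR + LF))   # multiple polyps  throughout the colon.
```

Deleting glues the neighbouring words together, while mapping character by character leaves one blank per control character, so a CR/LF pair becomes two blanks.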

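  6. How to find the Bad Word
        data charlist;
           set shrug.sample1 (where=(PATIENT in (28)));  /* one record with breaks */
           length single singlhex $1;
           format singlhex $hex2.;                  /* display the copy as 2 hex digits */
           loopx = length(trim(COMMENT));           /* position of last non-blank char */
           do i = 1 to loopx;
              single = substr(COMMENT, i, 1);       /* one character at a time */
              singlhex = single;
              output;                               /* one row per character */
           end;
           keep single singlhex;
        run;

        proc print data=charlist;
        run;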

  7. Patient 28's comment, one char at a time
        Obs   single   singlhex
         20     g         67
         21     e         65
         22     r         72
         23     y         79
         24     .         2E
         25               20
         26               0D
         27               0A
         28               0D
         29               0A
         30     B         42
         31     i         69
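A minimal Python analogue of the CHARLIST step and its listing, assuming the comment is already available as a Python string. Each character is paired with its two-digit hex code, and control characters are shown as blanks, as in the SAS output. The function `char_listing` and the text fragment are hypothetical; the fragment is modelled on Obs 20 to 31 above.

```python
# Python analogue (not SAS) of the CHARLIST step: list each character of a
# comment with its hex code so that invisible characters become visible.
def char_listing(comment: str) -> list[tuple[int, str, str]]:
    """Return (position, printable char or "", 2-digit hex) per character."""
    rows = []
    # rstrip(" ") plays the role of LENGTH(TRIM(COMMENT)): ignore trailing blanks
    for pos, ch in enumerate(comment.rstrip(" "), start=1):
        shown = ch if ch.isprintable() else ""  # control chars print as nothing
        rows.append((pos, shown, format(ord(ch), "02X")))
    return rows

# Hypothetical fragment with the same bytes as Obs 20-31 above.
fragment = "gery. \r\n\r\nBi"
for pos, shown, hexcode in char_listing(fragment):
    print(f"{pos:3d}  {shown:1s}  {hexcode}")
```

The blank-looking rows are the ones worth reading in the hex column: 20 is an ordinary space, while 0D and 0A are a carriage return and a line feed.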

  8. Repair Program
        data shrug.sample2;
           set shrug.sample1;
           badword  = trim('0D'x) || left('0A'x);   /* carriage return followed by line feed */
           goodword = ' ';                          /* each pair becomes one blank */
           COMMENT  = tranwrd(COMMENT, badword, goodword);
           drop badword goodword;
        run;
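The repair can be sketched the same way. This Python analogue replaces each carriage-return/line-feed pair as a unit, the way TRANWRD treats BADWORD, so a comment like patient 28's, with two pairs, gets two blanks. The names `repair`, `BADWORD` and `GOODWORD` (borrowed from the SAS variables) and the sample comment are hypothetical.

```python
# Python analogue (not SAS) of the repair step. Names are hypothetical.
BADWORD = "\r\n"   # carriage return + line feed, i.e. '0D0A'x in SAS
GOODWORD = " "     # each pair becomes one blank

def repair(comment: str) -> str:
    """Replace every CR/LF pair, as a unit, with a single blank."""
    return comment.replace(BADWORD, GOODWORD)

# Made-up comment containing two CR/LF pairs.
before = "Pt did not have surgery.\r\n\r\nBiopsy from endoscopy"
print(repr(repair(before)))
print(repr(repair("lone line feed\nstays")))  # a single LF is not a pair
```

Because the target is the two-character pair, a stray CR or LF on its own is left in place; the character listing is what tells you whether the pair is the only culprit.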

  9. Results
     Patient ID   Comments
     ------------------------------------------------------------
     013          Found hyperplastic polyp
     ------------------------------------------------------------
     017          colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
     ------------------------------------------------------------
     028          Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
     ------------------------------------------------------------
     031          -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
     ------------------------------------------------------------
     038          report not present.
     ------------------------------------------------------------
     040          office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
     ------------------------------------------------------------
     056          colonscopy confirmed the presence of a low-lying carinoma of the rectum.
     ------------------------------------------------------------
     084          lap attempted X 2 but resection could not be carried out. Most questions N A for laparatomy.
     ------------------------------------------------------------
     155          had a hemicolectomy
     ------------------------------------------------------------
     157          3 4 tumor above reflection, 1 4 was below reflection
     ------------------------------------------------------------

  10. Hmm...
      • Noticed that the breaks seemed to be occurring where one might have used a slash (“/”).
      • Working in a VMS batch environment; no Display Manager.
      • Looking at the data via PROC REPORT with “flow” for the comments column.
      So, is this a data problem or a reporting problem?
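One way to answer that question is to scan every comment, not just one patient's, and look separately for control characters (a data problem) and for the slash (a possible reporting problem). A Python sketch, assuming the comments have been exported into a dict keyed by patient ID; the function `classify` and the sample records are hypothetical, with wording borrowed from the listing and made-up control characters.

```python
import re

# Python sketch (not SAS): triage comments into "data" vs "reporting" suspects.
SPLIT_CHAR = "/"                          # PROC REPORT's default split character
CONTROL = re.compile(r"[\x00-\x1F\x7F]")  # ASCII control characters

def classify(comments: dict[str, str]) -> dict[str, list[str]]:
    """Collect the IDs whose comment contains each kind of suspect character."""
    found = {"control_chars": [], "split_char": []}
    for pid, text in comments.items():
        if CONTROL.search(text):
            found["control_chars"].append(pid)
        if SPLIT_CHAR in text:
            found["split_char"].append(pid)
    return found

# Hypothetical records, not the study data.
sample = {
    "028": "Pt did not have surgery.\r\n\r\nBiopsy from endoscopy",
    "084": "Most questions N/A for laparatomy.",
    "155": "had a hemicolectomy",
}
print(classify(sample))
```

Records in the first list need a data fix; records in the second only look broken if the report treats the slash specially.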

  11. After much digging through SAS manuals...

  12. The Answer! Split character (SPLIT= option) in PROC REPORT
      • not just for column headers
      • also used to split long text values in the body of the report
      • default character is slash
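A simplified Python model of that behaviour, for illustration only: it breaks a value at every split character and drops the character itself, as PROC REPORT does with the slash, but it ignores the wrapping at column width that FLOW also performs. `flow_value` is a hypothetical name, and `*` is an arbitrary alternative split character, not necessarily the one used for the final report.

```python
# Simplified model (not SAS): how a split character breaks a flowed value.
# Wrapping at the column width, which FLOW also does, is ignored here.
def flow_value(value: str, split_char: str = "/") -> list[str]:
    """Return the display lines produced by breaking at each split character."""
    return value.split(split_char)

comment = "3/4 tumor above reflection, 1/4 was below reflection"
print(flow_value(comment))
# ['3', '4 tumor above reflection, 1', '4 was below reflection']
print(flow_value(comment, split_char="*"))  # '*' never occurs, so one line
```

With the default slash the fraction disappears into a line break, consistent with the "3 4" and "1 4" in the earlier listings; choosing a split character that never appears in the data keeps the value whole.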

  13. Final Results
      Patient ID   Comments
      ------------------------------------------------------------
      013          Found hyperplastic polyp
      ------------------------------------------------------------
      017          colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
      ------------------------------------------------------------
      028          Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
      ------------------------------------------------------------
      031          -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
      ------------------------------------------------------------
      038          report not present.
      ------------------------------------------------------------
      040          office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
      ------------------------------------------------------------
      056          colonscopy confirmed the presence of a low-lying carinoma of the rectum.
      ------------------------------------------------------------
      084          lap attempted X 2 but resection could not be carried out. Most questions N/A for laparatomy.
      ------------------------------------------------------------
      155          had a hemicolectomy
      ------------------------------------------------------------
      157          3/4 tumor above reflection, 1/4 was below reflection
      ------------------------------------------------------------

  14. Any questions?