Finding & Eliminating Rogue Hex Characters in Text Fields
Martha Cox, Cancer Outcomes Research Program, CDHA / Dalhousie
The Problem
• Chart abstraction data containing several comment fields (255 chars each)
• Some values with "random" line feeds
Patient ID  Comments
----------  --------------------------------------------------------------------------------
013         Found hyperplastic polyp
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
028         Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
038         report not present.
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
084         lap attempted X 2 but resection could not be carried out. Most questions N A for laparatomy.
155         had a hemicolectomy
157         3 4 tumor above reflection, 1 4 was below reflection
Lots of suggestions
• compress? kcompress? Returns seem to be between words. Compress would smash 2 words together.
• translate or tranwrd? Should work, but these wouldn't take a hex value for me. Besides, which character(s) is the problem?
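The worry about compress can be seen in a small sketch. It assumes the stray characters are a carriage return and line feed ('0D'x and '0A'x); the test string and the variable names raw and squeezed are made up for illustration and are not from the chart data.

```sas
/* Hypothetical demo string, not the author's data */
data _null_;
   length raw squeezed $40;
   /* two words separated only by CR + LF (assumed culprits) */
   raw = 'surgery.' || '0D0A'x || 'Biopsy';
   /* COMPRESS deletes every listed character and inserts nothing in its place */
   squeezed = compress(raw, '0D0A'x);
   /* write the result to the log */
   put squeezed=;
run;
```

Because nothing replaces the deleted characters, the two words end up adjacent, which is why a replace-with-blank approach looks more attractive than a delete here.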
How to find the Bad Word

   data charlist;
      set shrug.sample1 (where=(PATIENT in (28)));
      length single $1 singlhex $2;
      /* walk through the comment one character at a time */
      loopx = length(trim(COMMENT));
      do i = 1 to loopx;
         single = substr(COMMENT, i, 1);
         /* two-digit hex code of the character */
         singlhex = put(single, $hex2.);
         output;
      end;
      keep single singlhex;
   run;
Patient 28's comment, one char at a time

   Obs  single  singlhex
    20  g       67
    21  e       65
    22  r       72
    23  y       79
    24  .       2E
    25          20
    26          0D
    27          0A
    28          0D
    29          0A
    30  B       42
    31  i       69
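The same idea can be pointed at every record to find which comments contain a non-printable character at all. This is a minimal sketch, assuming the NOTPRINT function is available (it is part of SAS 9 and later; older releases may not have it). It reuses shrug.sample1, PATIENT and COMMENT from the slides; the data set name suspects and the variables badpos and badhex are hypothetical.

```sas
data suspects;
   set shrug.sample1;
   length badhex $2;
   /* position of the first non-printable character, 0 if there is none */
   badpos = notprint(COMMENT);
   if badpos > 0 then do;
      /* hex code of the offending character */
      badhex = put(substr(COMMENT, badpos, 1), $hex2.);
      output;
   end;
   keep PATIENT badpos badhex;
run;
```

Only the first offending character per comment is reported, so a comment with several different rogue characters would need repeated passes or a loop like the one used for patient 28.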
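Repair Program

   data shrug.sample2;
      set shrug.sample1;
      /* carriage return followed by line feed; equivalent to '0D0A'x */
      badword = trim('0D'x) || left('0A'x);
      goodword = ' ';
      /* replace each CR+LF pair with a blank */
      COMMENT = tranwrd(COMMENT, badword, goodword);
      drop badword goodword;
   run;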
Results

Patient ID  Comments
----------  --------------------------------------------------------------------------------
013         Found hyperplastic polyp
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
028         Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
038         report not present.
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
084         lap attempted X 2 but resection could not be carried out. Most questions N A for laparatomy.
155         had a hemicolectomy
157         3 4 tumor above reflection, 1 4 was below reflection
Hmm...
• Noticed that the breaks seemed to be occurring where one might have used a slash (“/”).
• Working in a VMS batch environment; no Display Manager.
• Looking at the data via PROC REPORT with “flow” for the comments column.
So, is this a data problem or a reporting problem?
The Answer!
Split character in PROC REPORT
• not just for column headers
• also used to split long text values in the body of the report
• default character is slash
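One way to act on this is to set the SPLIT= option to a character that never appears in the comments, so that slashes print as ordinary text. The sketch below is an assumption about how the report might be written, not the author's actual program: the tilde as split character, the column labels and the widths are all illustrative choices.

```sas
/* NOWD runs PROC REPORT without the interactive window, as needed in batch */
proc report data=shrug.sample2 nowd headline split='~';
   column PATIENT COMMENT;
   define PATIENT / display 'Patient ID' width=10;
   /* FLOW wraps long values within the column; with SPLIT='~' a slash no longer forces a break */
   define COMMENT / display 'Comments' flow width=80;
run;
```

Whichever character is chosen, it is worth checking first that it does not occur in any comment, otherwise the same unexpected breaks will reappear at that character instead.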
Final Results

Patient ID  Comments
----------  --------------------------------------------------------------------------------
013         Found hyperplastic polyp
017         colonscopy performed in Bridewater - showed a large rectal tumor as well as multiple polyps throughout the colon.
028         Pt did not have surgery. Biopsy from endoscopy came back as moderatley differentiated adneocarcinoma
031         -Pt. had moderately severe sigmoid diverticulosis and agulated sigmoid colon and hepatic flexure. -No evidence of intraluminal tumor at this point.
038         report not present.
040         office sigmoidscopy done in April, 2003 and was found to be normal. A second sigmiodscopy was done in sx.
056         colonscopy confirmed the presence of a low-lying carinoma of the rectum.
084         lap attempted X 2 but resection could not be carried out. Most questions N/A for laparatomy.
155         had a hemicolectomy
157         3/4 tumor above reflection, 1/4 was below reflection