Banal: Because Format Checking is So Trite
Geoffrey M. Voelker
University of California, San Diego
Workshop on Organizing Workshops, Conferences, and Symposia for Computer Systems (WOWCS’08)
This Talk is Not Very Interesting
• Banal is a format checker for PDF documents
• Deduces how a document was formatted
• Optionally compares it with a specification
• Intended for conference management systems
• Now being used in HotCRP and EDAS
• Seemed timely to document its genesis and implementation
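The "optionally compares it with a specification" step can be pictured roughly as follows. This is a minimal sketch, not Banal's actual code or option syntax: the `Spec` and `Measured` classes, their field names, and the units (inches and points) are all hypothetical, chosen only to illustrate checking a deduced layout against a set of limits.

```python
from dataclasses import dataclass


@dataclass
class Spec:
    """Hypothetical formatting limits a conference might impose."""
    max_pages: int
    min_margin_in: float      # smallest allowed margin, in inches
    min_body_font_pt: float   # smallest allowed body font, in points
    max_columns: int


@dataclass
class Measured:
    """Hypothetical layout facts deduced from a submitted PDF."""
    pages: int
    margins_in: dict          # side name -> margin in inches
    body_font_pt: float
    columns: int


def check(measured: Measured, spec: Spec) -> list:
    """Return one human-readable message per violated limit."""
    problems = []
    if measured.pages > spec.max_pages:
        problems.append(f"too many pages: {measured.pages} > {spec.max_pages}")
    for side, value in sorted(measured.margins_in.items()):
        if value < spec.min_margin_in:
            problems.append(f"{side} margin too small: {value} < {spec.min_margin_in}")
    if measured.body_font_pt < spec.min_body_font_pt:
        problems.append(
            f"body font too small: {measured.body_font_pt} < {spec.min_body_font_pt}"
        )
    if measured.columns > spec.max_columns:
        problems.append(f"too many columns: {measured.columns} > {spec.max_columns}")
    return problems


# Made-up numbers for illustration only.
spec = Spec(max_pages=14, min_margin_in=1.0, min_body_font_pt=10.0, max_columns=2)
paper = Measured(
    pages=15,
    margins_in={"left": 1.0, "right": 1.0, "top": 1.0, "bottom": 1.0},
    body_font_pt=9.0,
    columns=2,
)
problems = check(paper, spec)
for message in problems:
    print(message)
```

An empty list means the paper satisfies the spec; returning messages rather than a single pass/fail flag lets a submission system tell authors exactly what to fix.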
Why?
• Preserving reviewer anonymity
  • Acrobat JavaScript that calls home when the PDF is loaded
• Assisting conference management tasks
  • Ensuring anonymity rules
  • Possibly helping do initial assignments by mining the bib
• Fairness
  • Everyone else obeyed the rules…
• Time
  • Already enough time spent on reviewing
  • Frustrated that abuse meant taking even more of my time
How?
• Convert PDF to XML (with pdftohtml)
• Track the locations of all segments of text, essentially forming bounding boxes
• Compute margins, columns, body font, etc.
• Heuristics for page numbers, headers, footers, etc.
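The pipeline above can be sketched as follows. This is an illustration under stated assumptions, not Banal's implementation: it assumes XML shaped like the output of `pdftohtml -xml` (`<page>` elements with `width` and `height`, `<fontspec>` elements with `id` and `size`, and `<text>` elements with `top`, `left`, `width`, `height`, and `font`), and it runs on a small hand-written sample rather than a real conversion. The function name `analyze_page` and the column heuristic are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hand-written sample in the assumed pdftohtml -xml shape (not real output).
SAMPLE_XML = """<pdf2xml>
<page number="1" top="0" left="0" height="1100" width="850">
<fontspec id="0" size="10" family="Times" color="#000000"/>
<fontspec id="1" size="17" family="Times" color="#000000"/>
<text top="100" left="100" width="650" height="20" font="1">A Title</text>
<text top="150" left="100" width="300" height="12" font="0">left column line one</text>
<text top="165" left="100" width="300" height="12" font="0">left column line two</text>
<text top="150" left="450" width="300" height="12" font="0">right column line one</text>
<text top="988" left="450" width="300" height="12" font="0">right column last line</text>
</page>
</pdf2xml>"""


def analyze_page(page):
    """Deduce margins, body font size, and a column count for one page."""
    page_w = float(page.get("width"))
    page_h = float(page.get("height"))
    font_size = {f.get("id"): float(f.get("size")) for f in page.iter("fontspec")}

    # One bounding box per text segment: (x1, y1, x2, y2, font id, char count).
    boxes = []
    for t in page.iter("text"):
        left, top = float(t.get("left")), float(t.get("top"))
        width, height = float(t.get("width")), float(t.get("height"))
        n_chars = len("".join(t.itertext()))
        boxes.append((left, top, left + width, top + height, t.get("font"), n_chars))

    # Margins are the gaps between the page edges and the union of all boxes.
    margins = {
        "left": min(b[0] for b in boxes),
        "top": min(b[1] for b in boxes),
        "right": page_w - max(b[2] for b in boxes),
        "bottom": page_h - max(b[3] for b in boxes),
    }

    # Body font: the size that covers the most characters on the page.
    chars_by_size = Counter()
    for b in boxes:
        chars_by_size[font_size[b[4]]] += b[5]
    body_font = chars_by_size.most_common(1)[0][0]

    # Crude column heuristic: distinct left edges among body-font segments.
    columns = len({b[0] for b in boxes if font_size[b[4]] == body_font})

    return {"margins": margins, "body_font": body_font, "columns": columns}


root = ET.fromstring(SAMPLE_XML)
report = analyze_page(root.find("page"))
print(report)
```

Coordinates here are in whatever units the converter emits, so a real checker would still need to scale them to inches or points, and would need the page-number, header, and footer heuristics to keep such segments from distorting the margins.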
Where?
• A handful of SIGOPS/SIGCOMM conferences
  • OSDI’06, SIGCOMM’07, SIGCOMM’08
• Eddie Kohler has integrated it into HotCRP
• Henning Schulzrinne also integrated banal with EDAS
• Since 2006, used for over 800 events
So?
• What are our community goals for having formatting requirements?
  • Evil: Annoying trifles that negatively impact our ability to communicate our results and ideas?
  • Helpful: Reflect practicalities of publishing costs and community time?
• Not surprisingly, I’m in the practical camp