PIPING HOT: Little Bins in big workflows • Alex Garnett, Digital Preservation & Data Curation, SFU Library
Thesis: I am a terrible programmer • 20% of you are thinking “no kidding!” • The other 80% of you are thinking “uh huh. Stupid false-modest shmuck.” • Who needs impostor syndrome when you have a bash shell?
For the record, this is the payoff from all those colonoscopy jokes. Yep.
But how does it apply to libraries? [If MJ Suhonos is here this year, this is his cue to groan audibly]
LIBRARY PROBLEM #1: PDF/A • ProQuest wants PDF/A submissions from now on • “now on” apparently = the past five years’ backlog • We have to convert five years of theses! • This is now also being used at the UofA.
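The slides don't say which tool did the conversion, so treat this as a minimal sketch that assumes Ghostscript. It only builds the `gs` command line for each thesis; the function names, the `pdfa` output folder and the `PDFA_def.ps` path are placeholders. The flags are Ghostscript's documented PDF/A options, which can differ between versions, and `PDFA_def.ps` (a file that ships with Ghostscript) typically needs an ICC profile path edited into it first.

```python
from pathlib import Path


def pdfa_command(src, dest, pdfa_def="PDFA_def.ps"):
    """Build (not run) a Ghostscript command that rewrites src as PDF/A-2."""
    return [
        "gs",
        "-dPDFA=2",                        # target PDF/A-2
        "-dBATCH", "-dNOPAUSE",            # run non-interactively
        "-sDEVICE=pdfwrite",
        "-sColorConversionStrategy=RGB",   # PDF/A needs a single colour space
        "-dPDFACompatibilityPolicy=1",     # drop features PDF/A forbids
        f"-sOutputFile={dest}",
        str(pdfa_def),                     # assumed: edited copy of PDFA_def.ps
        str(src),
    ]


def backlog_commands(filenames, out_dir="pdfa"):
    """One command per PDF in the backlog, in a stable (sorted) order."""
    out = Path(out_dir)
    return [
        pdfa_command(name, out / Path(name).name)
        for name in sorted(filenames)
        if name.lower().endswith(".pdf")
    ]


if __name__ == "__main__":
    for cmd in backlog_commands(["b.pdf", "notes.txt", "a.PDF"]):
        print(" ".join(cmd))
```

Passing each list to `subprocess.run(cmd, check=True)` would run the conversion. Building the commands separately makes it easy to eyeball them before unleashing them on five years of theses.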
LIBRARY PROBLEM #2: ARCHIVES PROBLEM: LIBRARY HARDER STARRING BRUCE WILLIS CRAP, I USED UP THE WHOLE SLIDE ON THE TITLE
Archives needed a GUI tool to be able to create restrictive FTP accounts for donors.
LIBRARY PROBLEM #3: PDF REDACTION (IT’S LIKE THE FIRST ONE BECAUSE NO ONE LIKED THE SEQUEL, DOES ANYONE WANT TO WATCH TEMPLE OF DOOM LATER, OH HELL I’VE DONE IT AGAIN)
We learned we had some poorly redacted PDFs • Blackout meant to obscure text; still selectable
Solution: • Detect offending pages with Ghostscript… • (this is the hard part; dumping PDF guts is appalling)
… and then: • Snip offending pages with pdftk • Convert them to images with ImageMagick • OCR back into PDF (minus obscured text) with Tesseract and fix up the dimensions with gs again • Paste back in with pdftk • 5 lines, all free tools! Documentation & piping.
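The slides don't show the five lines themselves, so this is one guess at what they could look like for a single offending page, not the commands actually used. It assumes detection has already produced the page number; the function names, intermediate file names, 300 dpi and US Letter size (612×792 pt) are placeholders. ImageMagick 7 calls the binary `magick` rather than `convert`.

```python
import shlex


def splice_ranges(page, num_pages):
    """pdftk page ranges that put cleaned page B1 where page `page` of A was."""
    if not 1 <= page <= num_pages:
        raise ValueError("page must be between 1 and num_pages")
    ranges = []
    if page > 1:
        # Everything before the offending page, taken from the original (A).
        ranges.append("A1" if page == 2 else f"A1-{page - 1}")
    ranges.append("B1")
    if page < num_pages:
        # Everything after it, again from the original.
        ranges.append(f"A{page + 1}-end")
    return ranges


def redaction_fix_commands(src, page, num_pages, out,
                           width_pt=612, height_pt=792, dpi=300):
    """Build (not run) the snip, rasterise, OCR, resize and paste commands."""
    stem = f"page{page}"
    return [
        # 1. Snip the offending page out with pdftk.
        ["pdftk", src, "cat", str(page), "output", f"{stem}.pdf"],
        # 2. Rasterise it, so text under the blackout is gone for good.
        ["convert", "-density", str(dpi), f"{stem}.pdf", f"{stem}.png"],
        # 3. OCR the image back into a searchable PDF ({stem}_ocr.pdf).
        ["tesseract", f"{stem}.png", f"{stem}_ocr", "pdf"],
        # 4. Fix up the page dimensions with Ghostscript.
        ["gs", "-o", f"{stem}_fixed.pdf", "-sDEVICE=pdfwrite",
         f"-dDEVICEWIDTHPOINTS={width_pt}", f"-dDEVICEHEIGHTPOINTS={height_pt}",
         "-dFIXEDMEDIA", "-dPDFFitPage", f"{stem}_ocr.pdf"],
        # 5. Paste the cleaned page back into the original.
        ["pdftk", f"A={src}", f"B={stem}_fixed.pdf", "cat",
         *splice_ranges(page, num_pages), "output", out],
    ]


if __name__ == "__main__":
    for cmd in redaction_fix_commands("thesis.pdf", 5, 10, "clean.pdf"):
        print(shlex.join(cmd))
```

Each list can go straight to `subprocess.run(cmd, check=True)`. Printing them first with `shlex.join` gives lines you can paste into a terminal, which is closer to the spirit of the thing.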
Takeaway • If you find yourself doing a very bad job of learning PHP and feeling like you have something to prove: it doesn’t have to be this way • There is a huge amount of useful space you can occupy as a barely-programmer if you’re comfortable using a terminal for problem solving (less so on Windows). Stack Overflow and Google are your friends.
Takeaway • Open-source command line tools are really good these days! They are powerful, they are straightforward, and they are often cutting edge.
Surprise: Everybody gets a free colonoscopy after all! • Thanks! garnett@sfu.ca ; @axfelix