
Text Editing

LIANZA ITSIG webinar series: Text Editing (tools, tips, tricks). Kim Shepherd, k.shepherd@auckland.ac.nz, Digital Development Team, The University of Auckland Library. Summary: we manage and manipulate general (large) text files daily, and it’s tedious and time consuming.


Presentation Transcript


  1. LIANZA ITSIG webinar series
     Text Editing: Tools, tips, tricks
     Kim Shepherd, k.shepherd@auckland.ac.nz
     Digital Development Team, The University of Auckland Library

  2. Summary
     • General (large) text files
       • We manage and manipulate text data daily
       • It’s tedious and time consuming
       • Find & Replace is too limited and dangerous
       • We know there must be a better way...
     • Tabular data files (e.g. spreadsheets)
       • We work with these all the time, usually in Excel
       • What tools can help us clean messy data?
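
One way to see why plain Find & Replace is "dangerous": it rewrites every occurrence of the search text, including occurrences inside longer words. The sketch below uses a made-up sentence and Python's standard `re` module; the sample text and the search term are illustrative only, not from the webinar.

```python
import re

# Made-up sample text for illustration.
text = "The cat sat in the category section of the catalogue."

# Naive find-and-replace touches every occurrence, even inside other words.
naive = text.replace("cat", "dog")

# A regex with word boundaries (\b) only matches "cat" as a whole word.
careful = re.sub(r"\bcat\b", "dog", text)

print(naive)    # The dog sat in the dogegory section of the dogalogue.
print(careful)  # The dog sat in the category section of the catalogue.
```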

  3. Topics
     • Regular Expressions
     • Text Editors
     • Operating on lines, not entire files
     • Google Refine

  4. Regular Expressions
     /^\s+[a-zA-Z0-9](?:\W+)/
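
The slide shows the pattern without comment. Read left to right, it asks for: the start of the string (`^`), one or more whitespace characters (`\s+`), exactly one ASCII letter or digit (`[a-zA-Z0-9]`), then one or more non-word characters in a non-capturing group (`(?:\W+)`). A small sketch to try it out in Python, assuming the `/.../` delimiters are dropped and using sample strings invented for this example:

```python
import re

# The pattern from the slide, without the surrounding / delimiters.
pattern = re.compile(r"^\s+[a-zA-Z0-9](?:\W+)")

# Invented test strings: indented list markers and some near misses.
samples = [
    "  a) first item",      # indented, single letter, then ") "
    "a) no leading space",  # fails: \s+ needs at least one whitespace character
    "   ab) two letters",   # fails: "b" is a word character, not \W
    "\t7. seventh",         # tab, single digit, then ". "
]

for s in samples:
    print(repr(s), bool(pattern.search(s)))
```

Only the first and last samples match, so the pattern behaves like a detector for indented single-character list markers such as `a)` or `7.`.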

  5. Regular Expressions
     • A way to describe a set of strings and capture parts of them
     • Originated in old UNIX/POSIX tools
     • Now used all over the place
     • Test your regexes out on the web:
       • http://gskinner.com/RegExr/
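
"Capture parts of them" refers to groups: parentheses in a pattern remember the text they matched so it can be reused. A minimal sketch in Python, assuming a hypothetical clean-up task (turning "Family, Given" names into "Given Family"); the function name `flip_name` and the pattern are inventions for this example.

```python
import re

# Two named groups: everything before the first comma, and everything after it.
name_pattern = re.compile(r"^(?P<family>[^,]+),\s*(?P<given>.+)$")

def flip_name(value):
    """Return 'Given Family' for 'Family, Given'; leave other values unchanged."""
    m = name_pattern.match(value)
    if m is None:
        return value  # no comma, so nothing to reorder
    return f"{m.group('given')} {m.group('family')}"

print(flip_name("Shepherd, Kim"))  # Kim Shepherd
print(flip_name("Madonna"))        # Madonna
```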

  6. Text Editors & Useful Languages
     sed, grep, awk

  7. Text Editors
     • Word processors aren’t text editors
     • Shop around, compare features
     • My favourite: Vim (UNIX, Windows, Mac)
     • Wikipedia comparison of editor features
     • Wikipedia list of regex software

  8. Useful Languages / Interpreters
     • Perl
       • An old favourite, great for string manipulation
     • Python
       • The cool kids tell me it’s better than Perl
     • GREL
       • We’ll get to this later...

  9. Line-by-line processing
     while(<STDIN>) { .... }

  10. Line-by-line processing
     • Large files are large!
       • If they’re big on disk, they’ll be big in memory
     • Lines are (usually!) small
       • Read a line
       • Do something with it
       • Output the modified line
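
The read / modify / output loop above can be sketched in Python as a rough counterpart to the Perl `while(<STDIN>)` idiom on the previous slide. The function name `process_lines` and the whitespace clean-up it performs are assumptions chosen for illustration; the point is only that one line is held in memory at a time.

```python
import io
import re

def process_lines(infile, outfile):
    """Read one line at a time, transform it, and write it straight out."""
    count = 0
    for line in infile:  # file objects yield one line per iteration
        # Example transformation: collapse runs of spaces/tabs, trim both ends.
        cleaned = re.sub(r"[ \t]+", " ", line.rstrip("\r\n")).strip()
        outfile.write(cleaned + "\n")
        count += 1
    return count

# Demo on small in-memory "files" so the sketch runs anywhere.
source = io.StringIO("  first   line \nsecond\t\tline\n")
result = io.StringIO()
n = process_lines(source, result)
print(n, repr(result.getvalue()))  # 2 'first line\nsecond line\n'
```

To use it as a command-line filter, pass `sys.stdin` and `sys.stdout` (after `import sys`) instead of the in-memory objects, for example `process_lines(sys.stdin, sys.stdout)`.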

  11. Google Refine
     • Cleans messy tabular data
       • Easy faceting and filtering of columns/values
       • Easy transformation of values
     • Google Refine Expression Language (GREL)
       • Extensive use of regular expressions and other standard string manipulation techniques
     • Other features
       • Perform web service calls directly, reconcile row IDs
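
For readers who have not seen faceting, the idea can be sketched outside Refine: a facet is a count of the distinct values in a column, and a transformation rewrites each value so that near-duplicates collapse together. The Python below is an analogy only; it is not GREL and does not use any Refine API. The rows, column name, and helper names (`normalise`, `facet`) are all made up for this example.

```python
import re
from collections import Counter

# Made-up rows with inconsistent spacing and capitalisation.
rows = [
    {"title": "Text Editing", "publisher": " Auckland University Press"},
    {"title": "Regex Basics", "publisher": "auckland university press"},
    {"title": "Clean Data", "publisher": "Auckland  University Press "},
    {"title": "Vim Tips", "publisher": "Penguin"},
]

def normalise(value):
    """Collapse internal whitespace, trim the ends, and title-case the value."""
    return re.sub(r"\s+", " ", value).strip().title()

def facet(records, column):
    """Count how often each distinct value appears in a column."""
    return Counter(record[column] for record in records)

before = facet(rows, "publisher")
cleaned = [{**row, "publisher": normalise(row["publisher"])} for row in rows]
after = facet(cleaned, "publisher")

print(len(before), "distinct values before")  # 4 distinct values before
print(dict(after))  # {'Auckland University Press': 3, 'Penguin': 1}
```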

  12. Conclusion
     • Our problems are solvable!
       • Regular expressions
       • Decent text editors for general/unformatted text
       • Google Refine for tabular data
     • Contact me
       • Please feel free to contact me with questions, corrections or ideas
       • k.shepherd@auckland.ac.nz
       • Twitter: @kimshepherd
       • Google+: kim.shepherd@gmail.com
