Case Study: RFA Migration. How I migrated 208,566 news stories from Bricolage to Plone. Alex Clark • http://aclark.net • March 12, 2008 • Plone Symposium East
Who Am I? • Plone Consultant • Non-profits in DC • Foundation Member • Zope/Python Users Group of DC (ZPUGDC) Events Organizer • “UNIX guy”, sysadmin, Bachelor of Science in Computer Science, not really a programmer.
What is this? • An example of a “successful” migration; YMMV (your mileage may vary). • Inspiration-a-palooza! If I can do it, anyone can. • An opportunity to learn from my mistakes. • Analyses at the end. • XXX: news ‘story’, not ‘news item’ ;-) • i.e. the rfasite product’s ‘story’ content type, not Plone’s default ‘news item’ content type. • A medium-to-large migration.
What this is not • Plone vs. Bricolage. • How to: <your migration>. • Best practice (OK, maybe some best practice.)
Radio Free Asia • RFA is a private, nonprofit corporation that broadcasts news and information in nine native Asian languages to listeners who do not have access to full and free news media. The purpose of RFA is to provide a forum for a variety of opinions and voices from within these Asian countries. • Our Web site adds a global dimension to this objective. If you have comments, questions or suggestions, please contact us…
After • Not yet! ;-)
Pre-migration decisions, i.e. how do we get the data out of the old site? • Relational database “content”? • No one understood the Bricolage data model. • HTTP? • I didn’t want to crawl the website. • “Baked” content on the filesystem? • Provided the clearest migration path. • find /var/www/rfa -name index.html
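In Python, the same search can be done with os.walk. A minimal sketch (Python 3; the original migration ran under Zope 2’s Python 2) that yields every index.html under a root directory; find_index_files is an illustrative name:

```python
import os


def find_index_files(root):
    """Yield every index.html under root, like `find root -name index.html`."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable, sorted order
        if "index.html" in filenames:
            yield os.path.join(dirpath, "index.html")
```

Because it is a generator, a loop such as `for path in find_index_files("/var/www/rfa/english"):` handles one story at a time instead of building a list of every path first.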
zopectl run, then what? • Need a way to structure the migration of 10 different language services, • e.g. zopectl run mandarin.py. • Need to ‘walk’ the filesystem, • i.e. how do we find the stories? • Need a way to parse the HTML on the filesystem, • i.e. we can’t shove the entire index.html into the body via setText(). • Need to do Unicode conversions, • e.g. from Big5, euc_kr, gb2312, and ascii to Unicode.
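A minimal sketch of the Unicode step: decode each page’s raw bytes with the codec for its language service. The service-to-codec table is hypothetical (the slides name the codecs but not which service used which), and the sketch is Python 3, where bytes.decode() takes the place of Python 2’s unicode(raw, encoding).

```python
# Hypothetical mapping from language service to the codec of its baked HTML;
# the real assignments would come from inspecting each service's pages.
SERVICE_ENCODINGS = {
    "english": "ascii",
    "mandarin": "gb2312",
    "cantonese": "big5",
    "korean": "euc_kr",
}


def to_unicode(raw, service, errors="strict"):
    """Decode raw page bytes using the codec registered for this language service.

    errors="strict" raises UnicodeDecodeError on bad bytes instead of hiding them.
    """
    encoding = SERVICE_ENCODINGS[service]
    return raw.decode(encoding, errors)
```

Defaulting to "strict" means a mis-guessed codec shows up as an error rather than as quietly mangled text; pass errors="replace" only as a deliberate choice.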
zopectl run, then what? • Use Stepper, a framework for performing asynchronous tasks: http://www.simplistix.co.uk/software/zope/stepper • Use os.walk: http://docs.python.org/lib/os-file-dir.html (in particular cb2_examples/cb2_2_16_sol_1.py) • Use an HTML parser: http://docs.python.org/lib/module-sgmllib.html (in particular diveintopython-5.4/py/BaseHTMLProcessor.py) • Use Python’s standard encodings for the Unicode conversions: http://docs.python.org/lib/standard-encodings.html
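sgmllib and BaseHTMLProcessor are Python 2 only. A rough Python 3 equivalent using html.parser.HTMLParser is sketched below: it pulls out the &lt;title&gt; text and the paragraphs inside the story body, so only the story (not the whole index.html) goes into setText(). The &lt;div id="storytext"&gt; wrapper is a hypothetical marker; the real Bricolage templates would determine what to look for.

```python
from html import escape
from html.parser import HTMLParser


class StoryExtractor(HTMLParser):
    """Collect the <title> text and the <p> text inside a hypothetical <div id="storytext">."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities arrive already decoded
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._story_depth = 0  # > 0 while inside the story div (counts nested divs)
        self._current = None   # text pieces of the <p> being collected, or None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "div":
            if self._story_depth:
                self._story_depth += 1
            elif dict(attrs).get("id") == "storytext":
                self._story_depth = 1
        elif tag == "p" and self._story_depth:
            self._current = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "div" and self._story_depth:
            self._story_depth -= 1
        elif tag == "p" and self._current is not None:
            self.paragraphs.append("".join(self._current).strip())
            self._current = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._current is not None:
            self._current.append(data)  # inline tags (<b>, <i>) are dropped, text kept


def extract_story(html_text):
    """Return (title, body_html) where body_html is the story's paragraphs re-escaped."""
    parser = StoryExtractor()
    parser.feed(html_text)
    parser.close()
    body = "".join("<p>%s</p>" % escape(p) for p in parser.paragraphs if p)
    return parser.title.strip(), body
```

Rebuilding the body from plain paragraph text keeps navigation and template markup out of the new story, at the cost of dropping inline formatting.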
Stepper Basics • Allows you to break your migration into pieces. • Commits transactions for you. • zopectl run run.py site-object steps-or-chains
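Stepper’s own API isn’t shown in the slides, so the following is only a hypothetical illustration of the idea: named steps, chains that expand into lists of steps, and a commit after each step. Nothing here is Stepper’s actual interface; run_steps, steps and chains are made-up names, and commit is passed in so the sketch runs outside Zope.

```python
def run_steps(site, steps, chains, requested, commit):
    """Hypothetical Stepper-like runner (NOT Stepper's real API).

    requested: names of steps or chains, as given on the command line.
    chains: maps a chain name to the list of step names it expands to.
    commit: called with each step name after that step finishes.
    """
    names = []
    for name in requested:
        names.extend(chains.get(name, [name]))
    # Check every name up front so a typo fails before any work is done.
    unknown = [n for n in names if n not in steps]
    if unknown:
        raise KeyError("unknown step(s): %s" % ", ".join(unknown))
    for name in names:
        steps[name](site)
        commit(name)  # one transaction per step
    return names
```

Inside zopectl run, commit could be `lambda name: transaction.commit()`, using the transaction package that ZODB provides.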
Basic Results • The ‘create’ step creates the site structure from a list of categories defined in categories.py. • The ‘migrate’ step walks the filesystem looking for index.html files, then • extracts the contents, • invokes the factory (invokeFactory) in the context of the category to create the new object, and • calls mutators to insert content into fields, • e.g. obj.setTitle(title_extracted).
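The two steps might look roughly like this. invokeFactory(type_name, id) is the usual CMF/Plone way to create content inside a folder, and setTitle()/setText() are the Archetypes-style mutators these slides mention. To keep the sketch runnable outside Zope, StubFolder is a hypothetical stand-in implementing only those calls; the CATEGORIES list and the ‘Story’ type name are assumptions.

```python
# Hypothetical contents of categories.py.
CATEGORIES = ["news", "features", "commentary"]


class StubFolder:
    """Hypothetical stand-in for a Plone folder or story, just enough to run outside Zope."""

    def __init__(self, id, portal_type="Folder"):
        self.id = id
        self.portal_type = portal_type
        self.title = ""
        self.text = ""
        self.children = {}

    def invokeFactory(self, type_name, id, **kw):
        # Like CMF's invokeFactory: create a child of type_name and return its id.
        self.children[id] = StubFolder(id, type_name)
        return id

    def setTitle(self, value):
        self.title = value

    def setText(self, value):
        self.text = value

    def __getitem__(self, key):
        return self.children[key]

    def __contains__(self, key):
        return key in self.children


def create(site, categories):
    """The 'create' step: one folder per category, skipping any that already exist."""
    for name in categories:
        if name not in site:
            site.invokeFactory("Folder", id=name)


def migrate_story(category, story_id, title, body):
    """The 'migrate' step for one story: create it in its category, then fill its fields."""
    new_id = category.invokeFactory("Story", id=story_id)
    story = category[new_id]
    story.setTitle(title)
    story.setText(body)
    return story
```

Making ‘create’ skip existing folders lets the step be re-run safely after a partial migration.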
Intermediate Results (How to: Promise Too Much) • Slug-i-fication: turning • /english/news/symposium_talks_rfa/2008/03/12/index.html into • /english/news/20080312-symposium_talks_rfa.html • Change “category” names, e.g. from • /english/news to • /english/exciting_news. • Import audio and image files from the filesystem and • insert them into story fields and/or story folders (stories are folderish). • Featured audio or image vs. inline audio or image.
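Slug-i-fication and category renaming need nothing more than string methods. A minimal sketch, assuming every story path has the /&lt;category&gt;/&lt;slug&gt;/&lt;YYYY&gt;/&lt;MM&gt;/&lt;DD&gt;/index.html shape shown above; slugify and CATEGORY_RENAMES are illustrative names, and the rename table holds only the slide’s own example.

```python
# Category renames, keyed by old category path; only the slide's own example.
CATEGORY_RENAMES = {"/english/news": "/english/exciting_news"}


def slugify(path, renames=None):
    """Turn /<category...>/<slug>/<YYYY>/<MM>/<DD>/index.html into
    /<category...>/<YYYYMMDD>-<slug>.html, optionally renaming the category."""
    parts = path.strip("/").split("/")
    if len(parts) < 6 or parts[-1] != "index.html":
        raise ValueError("not a story path: %r" % path)
    *category, slug, year, month, day, _ = parts
    if not (year + month + day).isdigit():
        raise ValueError("no YYYY/MM/DD in story path: %r" % path)
    category_path = "/" + "/".join(category)
    if renames:
        category_path = renames.get(category_path, category_path)
    return "%s/%s%s%s-%s.html" % (category_path, year, month, day, slug)
```

For example, `slugify("/english/news/symposium_talks_rfa/2008/03/12/index.html", CATEGORY_RENAMES)` gives `/english/exciting_news/20080312-symposium_talks_rfa.html`. Raising on unexpected paths, rather than guessing, keeps odd stories visible.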
Advanced Results (How to: Really Promise Too Much) • Related links • At the bottom of each story are related links. • Slug-i-fy them, then insert them inline. • Slug-i-fy them, change the category, then insert them inline.
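Inlining related links means finding each link’s href and running it through the same path rewriting. A sketch with html.parser: it collects every &lt;a href&gt; in a related-links fragment, passes each href through a rewrite callable (for instance a slug-i-fier plus category rename), and emits a list of links to append to the story body. The function names and the related-links class are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a href> in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")  # None for anchors without href
            self._text = []

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)


def inline_related_links(fragment, rewrite):
    """Return a <ul> of the fragment's links, with each href passed through rewrite()."""
    collector = LinkCollector()
    collector.feed(fragment)
    collector.close()
    items = "".join(
        '<li><a href="%s">%s</a></li>' % (escape(rewrite(href)), escape(text))
        for href, text in collector.links
    )
    return '<ul class="related-links">%s</ul>' % items
```

Taking rewrite as a parameter keeps the link handling separate from the path rules, so external links can simply be passed through unchanged.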
No, Really… • I promised too much.
The RFA Migration Story • 10 language services • 208,566 stories • 5 different encodings • 70GB of content on the filesystem • Hundreds of categories
The RFA Migration - E! True Hollywood Story • Images everywhere • /english/category/story/2008/01/01/index.html has image • /english/category/story/2008/01/01/foo.jpg and • /english/images/foo.jpg • Audio everywhere • Duplicate stories everywhere • Stories published as • /english/category/story/2008/01/01/index.html were also published as • /english/category2/story/2008/01/01/index.html.
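One way to catch the duplicates: treat stories with the same language, slug and date as the same story regardless of category, and keep the first copy. This is an assumption, not necessarily what the migration did, and two genuinely different stories could share a slug and a date, so comparing file contents (e.g. with hashlib) before discarding anything would be safer.

```python
def dedupe_stories(paths):
    """Split story paths into (kept, duplicates).

    Assumes paths look like /<language>/<category...>/<slug>/<YYYY>/<MM>/<DD>/index.html
    and that language + slug + date identifies a story across categories.
    """
    seen = {}
    duplicates = []
    for path in sorted(paths):  # sorted, so the same copy wins on every run
        parts = path.strip("/").split("/")
        key = (parts[0],) + tuple(parts[-5:-1])  # language, slug, year, month, day
        if key in seen:
            duplicates.append((path, seen[key]))  # (duplicate, copy that was kept)
        else:
            seen[key] = path
    return list(seen.values()), duplicates
```

Returning the duplicate pairs, instead of just dropping them, leaves a record to check afterwards or to turn into redirects.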
Sidebar: Buildout vs. Buildit • Shortly after this project began, Buildout became the de facto standard for deploying a Plone site. • Deploy migration code and sample data with your buildout. • e.g. bin/buildout -c migration.cfg • where migration.cfg installs your migration code and sample data • Even better: bin/migrate
And now the moment you have all been waiting for! • Run buildout • Add site • Configure migration • Run migration
Wrap up • Unexpected results • Avoidable problems • General wrap up
Unexpected results • Missing content • Wrong content • Silent failures
Avoidable problems • Don’t promise too much • Don’t write bad code (read: bare try/excepts, etc.) • Don’t write slow code (use string methods over regular expressions, etc.)
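The first two pieces of advice, sketched: catch only the per-story errors you expect, log and record them so nothing fails silently, and let anything else (a real bug) stop the run. Which exceptions count as “expected” is an assumption here, as are the names migrate_all, is_story and the logger name. The last helper shows a string method doing a job often handed to a regular expression.

```python
import logging

log = logging.getLogger("rfa.migration")  # hypothetical logger name

# Per-story problems we expect and can skip (ValueError also covers
# UnicodeDecodeError); anything else is treated as a bug and propagates.
EXPECTED_ERRORS = (ValueError, KeyError)


def migrate_all(paths, migrate_one):
    """Run migrate_one(path) for each path, recording (not hiding) expected failures."""
    migrated, failed = [], []
    for path in paths:
        try:
            migrate_one(path)
        except EXPECTED_ERRORS as exc:
            log.warning("skipping %s: %s", path, exc)
            failed.append((path, exc))
        else:
            migrated.append(path)
    return migrated, failed


def is_story(path):
    # str.endswith instead of re.search(r"/index\.html$", path)
    return path.endswith("/index.html")
```

Unlike a bare `except:`, this still reports every skipped story and never swallows a TypeError or AttributeError caused by the migration code itself.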
General Wrap-up • Client is happy • May actually launch soon • Huge rewards • Great learning experience • This talk • Help others • Things I would do differently? • Use unrestrictedTraverse instead of app.rfa['english']['news']['20080101-slug.html']
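unrestrictedTraverse takes a single '/'-separated path, e.g. app.unrestrictedTraverse('rfa/english/news/20080101-slug.html'), without security checks, so a computed path (such as a slug-i-fied URL) can be used directly instead of chaining item lookups. The stand-in below, with the hypothetical name traverse, only illustrates that idea on nested dicts so it runs outside Zope; the real method does more (for example attribute lookup and '..' segments).

```python
from functools import reduce


def traverse(root, path):
    """Follow a '/'-separated path one item at a time, e.g. 'rfa/english/news/...'.

    A stand-in for the idea behind Zope's unrestrictedTraverse, on plain mappings.
    """
    names = [name for name in path.split("/") if name]  # ignore leading/trailing '/'
    return reduce(lambda obj, name: obj[name], names, root)
```

The path is data rather than code, so it can be logged, stored, or built from a slug function.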
Questions/Comments? • Email me: aclark@aclark.net • http://aclark.net • ACLARK.NET, LLC