The Orlando Project
An Integrated History of Women’s Writing in the British Isles
The University of Alberta
The University of Guelph
http://www.ualberta.ca/ORLANDO/

The Orlando Project
• is producing the first scholarly history of women’s writing in the British Isles
• is in the 4th of 6 years as a Major Collaborative Research Initiative funded by SSHRC and the Universities of Alberta and Guelph
• is using SGML to mark up its own research rather than tagging pre-existing texts

SGML
• 3 DTDs that blend structural markup with tags that foreground the interpretive nature of our research
• 252 unique tags
• 114 unique attributes
• 640,000 element occurrences across all documents
  • e.g. 22,459 uses of the <name> tag

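Figures like the ones above can be approximated with a simple tally of start tags. The following is a minimal sketch in Python, not the project's actual tooling: the function name `count_elements`, the sample documents, and the attribute names `standard` and `value` are all assumptions made for illustration. It also assumes every start tag is written out in full, so documents that use SGML tag minimization, or attribute values containing `>`, would need a real parser.

```python
import re
from collections import Counter

# Matches a start tag and captures the element name.
# End tags (</name>) and declarations (<!...>) are not matched.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9._-]*)\b[^>]*>")

def count_elements(documents):
    """Return a Counter of element names across an iterable of document strings."""
    counts = Counter()
    for text in documents:
        # Element names are case-insensitive in SGML's default syntax, so fold case.
        counts.update(name.lower() for name in START_TAG.findall(text))
    return counts

# Hypothetical sample documents, for illustration only.
sample_docs = [
    '<p><name standard="Woolf, Virginia">Virginia Woolf</name> wrote <title>Orlando</title>.</p>',
    '<p><NAME>V. Woolf</NAME> was born in <date value="1882-01-25">1882</date>.</p>',
]

counts = count_elements(sample_docs)
print("uses of <name>:", counts["name"])
print("unique tags:", len(counts))
print("element occurrences:", sum(counts.values()))
```
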
Why strive for consistency?
• Delivery needs / end-user needs
  • Consistency of presentation
    • titles, quotations, foreign words
  • Adequate search and retrieval
    • standard expressions for people, places, organizations, texts
  • Chronological sorting
    • standard expressions for dates

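To illustrate why chronological sorting depends on standard date expressions, here is a minimal Python sketch. It assumes, purely for illustration, that dates appear in prose as "D Month YYYY" and that the standard form is `YYYY-MM-DD`; the helper name `to_standard` is hypothetical and the source does not say which standard form the project adopted. Anything the pattern does not recognise is returned as `None` so that a person can review it.

```python
import re

MONTH_NAMES = ["january", "february", "march", "april", "may", "june", "july",
               "august", "september", "october", "november", "december"]
MONTHS = {name: number for number, name in enumerate(MONTH_NAMES, start=1)}

# Assumed prose pattern: day, month name, four-digit year.
PROSE_DATE = re.compile(r"^(\d{1,2}) ([A-Za-z]+) (\d{4})$")

def to_standard(prose):
    """Convert 'D Month YYYY' to 'YYYY-MM-DD'; return None if not recognised."""
    match = PROSE_DATE.match(prose.strip())
    if not match:
        return None
    day, month, year = match.groups()
    number = MONTHS.get(month.lower())
    if number is None:
        return None
    # No calendar validation is done here (e.g. "31 February" would pass).
    return f"{year}-{number:02d}-{int(day):02d}"

events = ["28 March 1941", "25 January 1882", "11 October 1928"]

# Sorting the prose strings gives alphabetical, not chronological, order.
print(sorted(events))
# Sorting the standard forms gives chronological order.
print(sorted(to_standard(e) for e in events))
```

Because the standard form puts the year first and pads the month and day, plain string sorting matches chronological order for four-digit years.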
Tag Cleanup Pilot
• Step 1: Analyze data to locate common inconsistencies
• Step 2: Prioritize tag types for cleanup
• Step 3: Establish workflow protocols
• Step 4: Generate assignments
• Step 5: Update user documentation

Division of Labour
• Batch changes: fix errors that regular expressions can easily find and replace
• Undergraduates: fix problems that are predictable but not machine processable or that require minimal research
• GRAs and PDFs: fix problems that require an experienced tagger or that need further research
• Volume Authors: act as consultants on research and practice issues

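The batch-change category can be sketched as a regular-expression find and replace. The Python example below assumes a hypothetical inconsistency (stray whitespace inside `<name>` elements) and a hypothetical helper `tidy_names`; the source does not list which errors were actually fixed in batch. It also assumes `<name>` elements are not nested inside one another.

```python
import re

# Hypothetical inconsistency: stray or doubled whitespace inside <name> elements.
NAME_ELEMENT = re.compile(r"(<name\b[^>]*>)\s*(.*?)\s*(</name>)",
                          re.IGNORECASE | re.DOTALL)

def tidy_names(text):
    """Normalise whitespace inside <name> elements.

    Returns (new_text, number_of_elements_changed) so the batch run can be logged.
    """
    changes = 0

    def fix(match):
        nonlocal changes
        start_tag, content, end_tag = match.groups()
        # Collapse internal runs of whitespace and drop leading/trailing space.
        tidied = start_tag + " ".join(content.split()) + end_tag
        if tidied != match.group(0):
            changes += 1
        return tidied

    return NAME_ELEMENT.sub(fix, text), changes

before = "<p><name> Virginia  Woolf </name> and <name>Vita Sackville-West</name></p>"
after, changed = tidy_names(before)
print(after)
print(changed, "element(s) changed")
```

Returning a change count alongside the new text makes it easier to check a batch run before committing it, which fits the checking step in the workflow.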
Assignments and the tools that make them happen
• SGML-aware full-text search engine: helps find “prosey” tags in context; helps find incorrect or missing attributes
• Document-wide statistics: reports odd sub-elements and content that varies from the norm
• Tag cleanup reporting: organizes “index” tags to make cleanup easier

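One way to report "content that varies from the norm" is to flag elements whose text is much longer than average for that tag, which tends to surface "prosey" content. The Python sketch below is an assumption about how such a report could work, not a description of the project's tools; the function name `long_content`, the threshold `factor`, and the sample text are all hypothetical.

```python
import re
from statistics import mean

def long_content(text, tag, factor=2.5):
    """Return contents of <tag> elements longer than `factor` times the mean length."""
    pattern = re.compile(
        rf"<{re.escape(tag)}\b[^>]*>(.*?)</{re.escape(tag)}>",
        re.IGNORECASE | re.DOTALL,
    )
    contents = [content.strip() for content in pattern.findall(text)]
    if not contents:
        return []
    average = mean(len(content) for content in contents)
    return [content for content in contents if len(content) > factor * average]

# Hypothetical sample: four short dates and one "prosey" one.
sample = (
    "<date>1882</date> <date>1915</date> <date>1928</date> <date>1941</date> "
    "<date>sometime in the early spring, after the move</date>"
)
flagged = long_content(sample, "date")
for content in flagged:
    print("unusually long <date> content:", content)
```

A single very long element raises the mean it is compared against, so on small samples a median-based threshold may be a better choice.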
Conclusions
• Establish priorities early on
• Weigh priorities against needs, wants, and total available resources
• Divide tasks according to expertise
• Train “experts” to fix major tags
• Develop a good checking system
• Revisit workflow models regularly to make sure the workload is equal to the resources available

Reflections on Consistency
• Consistency -- is it possible? Is it possible across projects?
• SGML -- does it help?
  • ... “that's not a date”
  • ... “no, that's not a date, either”
  • ... “nope, still not a date”
  • ... “there you go: a date at last!”
  • ... “your text is getting pretty long”
  • ... “the text for this tag is usually not this long”
  • ... “you have exceeded the average length of text in this tag by 250%!”
  • ... “congratulations, you have created the world's longest tag!”
