Applying the LIPARM Schema to legacy content
Paul S Ell, David Hardy
Centre for Data Digitisation and Analysis
LIPARM Project Workshop, 28 January 2013
Backdrop
• Significant investment in British Isles Parliamentary content – BOPCRIS, Stormont Papers, Cobbett’s Parliamentary Papers
• Generally each resource has its own interface and metadata standards
• Systematic research using disparate resources is hampered by this
• Consequently the impact of the digital resources was reduced
CDDA’s Role
• To take the standardised Parliamentary Metadata Language (PML) developed by the project and apply it to sample legacy datasets
• To examine existing authority files/controlled vocabularies and assess the degree to which they need augmentation
• To advise on the challenges of applying the schema to legacy materials
• To identify methodologies that reduce the capital cost of implementing the schema
• To establish the time and investment needed to convert existing content
• To advise on applying the schema to born-digital content
Capturing what?
• Members of parliament – John Smith, Lord Smith, Viscount Smith, Member for Manchester South, the Prime Minister, the Chancellor, etc.
• Parliamentary constituencies – changes of name over time; names presented in different ways (South Manchester/Manchester South); varying boundaries where the name remains the same; differentiating John Taylor (UU MP), Lord Kilclooney and Kilclooney (the place in Donegal)
• Calendar objects – parliaments (e.g. 1979-1983), sessions (e.g. 1/9/79-1/6/80), sittings (e.g. 15/1/80)
• Functions – PM, Speaker, Chancellor
• Proceeding objects – debates, reading of bills, reading of acts
• Divisions – and the members who cast votes in them
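One small part of the constituency problem, word-order variants of the same name, can be sketched in Python. This is a minimal illustration under stated assumptions: the function `constituency_key` and the `QUALIFIERS` set are hypothetical, not part of PML, and the approach deliberately ignores name changes and boundary changes, which need dated authority-file entries rather than string matching.

```python
import re

# Hypothetical qualifier list for illustration only; real constituency
# names use many more patterns ("North West", "City of", "and", ...).
QUALIFIERS = {"north", "south", "east", "west", "central"}


def constituency_key(name: str) -> str:
    """Return a comparison key for a constituency name.

    Lower-cases the name, drops punctuation and moves compass-point
    qualifiers after the place name, so word-order variants such as
    'South Manchester' and 'Manchester South' share one key.
    """
    words = re.findall(r"[a-z]+", name.lower())
    place = [w for w in words if w not in QUALIFIERS]
    qualifiers = [w for w in words if w in QUALIFIERS]
    return " ".join(place + qualifiers)


print(constituency_key("South Manchester"))  # manchester south
print(constituency_key("Manchester South"))  # manchester south
```

A key like this only proposes matches; it cannot separate constituencies that share a name but not a boundary, so candidate matches would still need checking against the authority files.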
Authority files/Controlled vocabularies
• The schema is highly dependent on authority files, such as: members of parliament and the dates they were in parliament; offices of state and the individuals associated with them; constituency lists for each parliament; and associations between a person and a constituency
• Although authority files could to a degree be populated automatically, in practice they still required manual amendment
• Authority files also had to cope with differing parliamentary models at Westminster and in Northern Ireland – for example, in NI a single constituency could be served by more than one member
• Ideally, controlled vocabularies/authority files should facilitate links to non-parliamentary e-resources
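The date-dependence of the authority files can be sketched as a lookup over dated membership records. The `Membership` record, its fields and the `P001`-style identifiers are hypothetical illustrations rather than the project's actual authority-file format. The sketch shows only that resolving a person or constituency needs a date, and that a constituency query returns a list so multi-member constituencies fit naturally.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Membership:
    """Hypothetical authority-file record: one person's service for one
    constituency between two dates (end=None means still serving)."""
    person_id: str
    name: str
    constituency: str
    start: date
    end: Optional[date] = None


def serving(record: Membership, on: date) -> bool:
    """True if the membership covers the given date (inclusive)."""
    return record.start <= on and (record.end is None or on <= record.end)


def members_for(records: List[Membership], constituency: str, on: date) -> List[str]:
    """Return the IDs of everyone serving `constituency` on date `on`.

    A list rather than a single ID, because a multi-member constituency
    (as in the Northern Ireland model) can have several sitting members.
    """
    return [r.person_id for r in records
            if r.constituency == constituency and serving(r, on)]


def resolve_name(records: List[Membership], name: str, on: date) -> List[str]:
    """Return candidate IDs for a name as it appears in the text on a date."""
    target = name.strip().lower()
    return [r.person_id for r in records
            if r.name.lower() == target and serving(r, on)]
```

With placeholder records for a two-member constituency, `members_for` returns both IDs for a date inside both terms, while `resolve_name` can tell apart two different people who share a name by the date of the sitting.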
Issues to consider
• Initially the schema was applied manually, which was both very time-consuming and error-prone. A number of steps were introduced to automate the process
• The amount of work involved in retro-conversion varies from one parliamentary year to another: new administrations tend to have more legislation, administrations with slim majorities tend to have more divisions, etc.
• The schema needs to be sufficiently flexible or adaptable to cope with differences between parliaments – such as multi-member constituencies
• It would be useful to establish to what degree existing XML could be used to apply the schema
• A pick-and-mix approach to elements of the schema would be good: such is the level of detail at present that tagging is highly complex
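The pick-and-mix idea might be sketched as a registry of independent taggers, of which a project switches on only the ones it needs. The element types, regular expressions and function names below are hypothetical and far simpler than real PML tagging; they only illustrate how optional schema elements could be applied separately.

```python
import re
from typing import Callable, Dict, List, Tuple

# A tag is (element_type, start, end, matched_text). The element types
# and regular expressions are hypothetical, not actual PML elements.
Tag = Tuple[str, int, int, str]


def tag_divisions(text: str) -> List[Tag]:
    return [("division", m.start(), m.end(), m.group())
            for m in re.finditer(r"Division No\. \d+", text)]


def tag_readings(text: str) -> List[Tag]:
    return [("reading", m.start(), m.end(), m.group())
            for m in re.finditer(r"(?:First|Second|Third) Reading", text)]


# Registry of independent taggers; a project enables only the ones it needs.
TAGGERS: Dict[str, Callable[[str], List[Tag]]] = {
    "division": tag_divisions,
    "reading": tag_readings,
}


def tag(text: str, selected: List[str]) -> List[Tag]:
    """Run only the selected taggers and return their tags in text order."""
    tags: List[Tag] = []
    for element_type in selected:
        tags.extend(TAGGERS[element_type](text))
    return sorted(tags, key=lambda t: t[1])
```

Calling `tag(text, ["division"])` marks divisions only and leaves bill readings untagged; adding `"reading"` to the list enables the second tagger without changing the first.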
Lessons learnt
• Real-time conversion of content, as proposed to the Welsh and Northern Ireland Assemblies, is likely to be far less problematic than retro-conversion
• In total only 14 years of Hansard have been converted during the project. Even as the PML was honed and staff became more familiar with the content, this remained a very slow process
• Hence there is a need to make the best possible use of any existing XML and to automate as much of the process as possible
• The project has primarily addressed the application of PML to Hansard. Other content – parliamentary reports, for example – will raise additional challenges
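Reusing existing XML might look something like the following sketch, which uses Python's standard `xml.etree.ElementTree` to harvest speaker names already marked up in a legacy file and match them against an authority file. The `<sitting>` and `<speech>` elements, the `speaker` attribute, the `AUTHORITY` dictionary and `harvest_speakers` are all hypothetical; real legacy Hansard XML varies by source.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical legacy markup; element and attribute names vary by source.
LEGACY_XML = """<sitting date="1980-01-15">
  <speech speaker="Mr John Smith">Text of speech.</speech>
  <speech speaker="The Prime Minister">Text of speech.</speech>
</sitting>"""

# Hypothetical authority file keyed on normalised (lower-case) names.
AUTHORITY: Dict[str, str] = {"mr john smith": "P001"}


def harvest_speakers(xml_text: str, authority: Dict[str, str]
                     ) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Map speaker attributes already present in legacy XML to authority IDs.

    Names that cannot be matched automatically (e.g. office titles such as
    'The Prime Minister', which need a date-aware lookup) are returned
    separately for manual review.
    """
    root = ET.fromstring(xml_text)
    resolved, unresolved = [], []
    for speech in root.iter("speech"):
        name = speech.get("speaker", "").strip()
        person_id = authority.get(name.lower())
        if person_id is None:
            unresolved.append(name)
        else:
            resolved.append((name, person_id))
    return resolved, unresolved


print(harvest_speakers(LEGACY_XML, AUTHORITY))
# ([('Mr John Smith', 'P001')], ['The Prime Minister'])
```

Splitting the output into resolved and unresolved names keeps automatic matching where it is safe and routes the rest, such as office titles, to manual checking.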
Examples of stages in the process
Creating a unique name and date range for each volume
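A unique name and date range for a volume could be derived from the sittings it contains, as in this minimal sketch. The `volume_record` function and its `series-vNNN` identifier scheme are hypothetical assumptions, not the scheme used in the project.

```python
from datetime import date
from typing import Dict, Iterable


def volume_record(series: str, volume: int, sitting_dates: Iterable[date]) -> Dict[str, str]:
    """Build an identifier and date range for one volume.

    The ID scheme (series + zero-padded volume number) is hypothetical and
    is only unique if volume numbers are unique within a series; the date
    range runs from the earliest to the latest sitting in the volume.
    """
    dates = sorted(sitting_dates)
    if not dates:
        raise ValueError("volume contains no sittings")
    return {
        "id": f"{series}-v{volume:03d}",
        "start": dates[0].isoformat(),
        "end": dates[-1].isoformat(),
    }


print(volume_record("example-series", 12,
                    [date(1980, 1, 15), date(1980, 1, 14), date(1980, 1, 18)]))
# {'id': 'example-series-v012', 'start': '1980-01-14', 'end': '1980-01-18'}
```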
The fields are pre-populated from the existing authority files. Some skilled data-entry staff have sufficient access privileges to create new roles, people, etc.
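The pre-population and access-privilege arrangement might be sketched as follows. `User`, `can_create_people`, `prefill_choices` and `add_person` are hypothetical names chosen for illustration. The sketch only shows operators choosing from existing authority-file entries, while creating a new person is restricted to privileged users.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class User:
    name: str
    can_create_people: bool = False  # hypothetical privilege flag


def prefill_choices(authority: Dict[str, str], typed: str) -> List[str]:
    """Return authority-file names beginning with what the operator typed,
    so a field can be filled from existing entries rather than retyped."""
    prefix = typed.strip().lower()
    return sorted(name for name in authority if name.lower().startswith(prefix))


def add_person(authority: Dict[str, str], name: str, user: User) -> str:
    """Add a new person to the authority file if the user is privileged."""
    if not user.can_create_people:
        raise PermissionError(f"{user.name} may not create new people")
    if name in authority:
        raise ValueError(f"{name!r} is already in the authority file")
    # Sequential IDs are a simplification; assumes no entries are deleted.
    new_id = f"P{len(authority) + 1:03d}"
    authority[name] = new_id
    return new_id
```

Keeping creation behind a privilege check means most operators can only select existing entries, which limits duplicate or misspelt people entering the authority file.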