
Applying the LIPARM Schema to legacy content

Applying the LIPARM Schema to legacy content. Paul S Ell David Hardy Centre for Data Digitisation and Analysis. LIPARM Project Workshop 28 January 2013. Backdrop. Significant investment in British Isles Parliamentary content – BOPCRIS, Stormont Papers, Cobbett’s Parliamentary Papers


Presentation Transcript


  1. Applying the LIPARM Schema to legacy content. Paul S Ell, David Hardy, Centre for Data Digitisation and Analysis. LIPARM Project Workshop, 28 January 2013

  2. Backdrop • Significant investment in British Isles Parliamentary content – BOPCRIS, Stormont Papers, Cobbett’s Parliamentary Papers • Generally each resource has its own interface and metadata standards • Systematic research using disparate resources is hampered by this • Consequently the impact of the digital resources is reduced

  3. CDDA’s Role • To take the standardised Parliamentary Metadata Language (PML) developed by the project and apply it to sample legacy datasets • To examine existing authority files/controlled vocabularies and see the degree to which they need augmentation • To advise on the challenges of applying the schema to legacy materials • To identify methodologies to reduce the capital cost of implementing the schema • To establish the time and investment needed to convert existing content • To advise on the application of the schema to born-digital content

  4. Capturing what? • Members of parliament – John Smith, Lord Smith, Viscount Smith, Member for Manchester South, the Prime Minister, the Chancellor etc. • Parliamentary constituencies – changes of name over time; names presented in different ways (South Manchester/Manchester South); varying boundaries where the name remains the same; differentiating John Taylor (UU MP), Lord Kilclooney and Kilclooney the place in Donegal • Calendar objects – Parliaments 1979-1983, sessions 1/9/79-1/6/80, sittings 15/1/80 • Functions – PM, Speaker, Chancellor • Proceeding objects – debates, reading of bills, reading of acts • Divisions – and members who cast votes
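The nesting of calendar objects on this slide (a parliament contains sessions, a session contains sittings) can be sketched in a few lines of Python. This is only an illustration using the dates from the slide; the class and function names are assumptions made for the example and are not element names taken from the PML schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Sitting:
    day: date


@dataclass
class Session:
    start: date
    end: date
    sittings: List[Sitting] = field(default_factory=list)

    def contains(self, day: date) -> bool:
        # A sitting belongs to a session if its date falls inside the range.
        return self.start <= day <= self.end


@dataclass
class Parliament:
    start_year: int
    end_year: int
    sessions: List[Session] = field(default_factory=list)

    def session_for(self, day: date) -> Optional[Session]:
        # Return the first session whose date range covers the given day.
        for session in self.sessions:
            if session.contains(day):
                return session
        return None


def parse_uk_date(text: str) -> date:
    """Parse d/m/yy as written on the slide, assuming the 20th century."""
    day, month, year = (int(part) for part in text.split("/"))
    return date(1900 + year, month, day)


# The example values from the slide: Parliament 1979-1983,
# session 1/9/79-1/6/80, sitting 15/1/80.
parliament = Parliament(
    1979, 1983, [Session(parse_uk_date("1/9/79"), parse_uk_date("1/6/80"))]
)
sitting_day = parse_uk_date("15/1/80")
session = parliament.session_for(sitting_day)
if session is not None:
    session.sittings.append(Sitting(sitting_day))
```

Attaching each sitting to its session by date range, rather than by hand, is one way a conversion workflow could catch sittings that fall outside any recorded session.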

  5. Authority files/Controlled vocabularies • The schema is highly dependent on authority files such as members of parliament and the dates they were in parliament, offices of state and individuals associated with them, constituency lists for each parliament and an association between a person and a constituency • Whilst to a degree authority files could be populated automatically, in practice there was work in manually amending them • Authority files also had to cope with differing parliamentary models between Westminster and Northern Ireland – for example in NI single constituencies had more than one member serving them • Ideally controlled vocabularies/authority files should facilitate links to non-parliamentary e-resources
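As a rough sketch of what an authority-file lookup has to do, the Python below maps names as printed to a single identifier and returns a list of members for a constituency on a given date, so that multi-member constituencies can be represented. All identifiers, the `Membership` record and the dates in the usage lines are hypothetical placeholders invented for the example; they are not drawn from the project's actual authority files.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Membership:
    person_id: str
    constituency_id: str
    start: date
    end: date


# Hypothetical identifiers; the surface forms come from the slides.
NAME_VARIANTS = {
    "john smith": "person:smith-j",
    "lord smith": "person:smith-j",
    "manchester south": "constituency:manchester-south",
    "south manchester": "constituency:manchester-south",
}


def resolve(surface_form):
    """Map a name as printed in the text to an authority-file identifier."""
    key = " ".join(surface_form.lower().split())  # fold case and whitespace
    return NAME_VARIANTS.get(key)  # None means "needs manual review"


def members_for(memberships, constituency_id, on):
    """Return every person sitting for a constituency on a given day.

    A list is returned rather than a single person so that
    multi-member constituencies can be represented.
    """
    return sorted(
        m.person_id
        for m in memberships
        if m.constituency_id == constituency_id and m.start <= on <= m.end
    )


# Placeholder data: two members serving the same constituency at once.
memberships = [
    Membership("person:a", "constituency:x", date(1950, 1, 1), date(1955, 1, 1)),
    Membership("person:b", "constituency:x", date(1952, 1, 1), date(1958, 1, 1)),
]
serving = members_for(memberships, "constituency:x", date(1953, 6, 1))
```

A lookup that returns `None` for an unknown name gives a natural point at which to route an entry to staff for manual amendment.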

  6. Issues to consider • Initially the schema was applied manually, which was both very time consuming and error-prone. A number of steps were introduced to automate the process • The amount of work involved in retro-conversion varies from parliamentary year to year. New administrations tend to have more legislation, administrations with slight majorities tend to have more divisions etc. • The schema needs to be sufficiently flexible or adaptable to cope with differences between parliaments – such as multi-member constituencies • It would be useful to see to what degree existing XML could be used to apply the schema • A pick and mix approach to elements of the schema would be good. Such is the level of detail at present that tagging is highly complex

  7. Lessons learnt • Real-time conversion of content – as proposed to the Welsh and Northern Ireland Assemblies – is likely to be far less problematic than retro-conversion • In total only 14 years of Hansard have been converted during the project. Although the PML was honed and staff became more familiar with the content, this is a very slow process • Hence there is a need to make the best possible use of any existing XML and to automate as much of the process as possible • The project has primarily addressed the application of the PML to Hansard. Other content – parliamentary reports for example – will result in additional challenges
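One way existing XML might be reused is to harvest the speaker strings it already contains, so they can be matched against authority files in bulk instead of being tagged one by one. The sketch below uses Python's standard `xml.etree.ElementTree`; the `<debate>` and `<speech>` markup and the `speaker` attribute are invented for illustration, and real legacy Hansard XML will be structured differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical legacy markup, invented for this example.
LEGACY_XML = """
<debate date="1980-01-15">
  <speech speaker="Mr. John Smith">First contribution.</speech>
  <speech speaker="The Prime Minister">Second contribution.</speech>
  <speech speaker="Mr. John Smith">Third contribution.</speech>
</debate>
"""


def speaker_counts(xml_text):
    """Count how often each distinct speaker string occurs in the XML."""
    root = ET.fromstring(xml_text.strip())
    counts = {}
    for speech in root.iter("speech"):
        name = speech.get("speaker", "").strip()
        counts[name] = counts.get(name, 0) + 1
    return counts


counts = speaker_counts(LEGACY_XML)
```

Working from the list of distinct strings means each variant only has to be resolved once, however many times it appears in a volume.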

  8. Examples of stages in the process: creating a unique name and date range for each volume

  9. Development of a function/job list

  10. The fields are pre-populated from the existing authority files. Some skilled data entry staff have sufficient access privileges to create new roles/people etc.

  11. Entering divisions
