Automated Benchmarking Of UK Museum Web Sites: With An Introduction to UKOLN and UK Web Focus • Brian Kelly, UK Web Focus, UKOLN, University of Bath, Bath, BA2 7AY • Email: B.Kelly@ukoln.ac.uk • URL: <http://www.ukoln.ac.uk/> • UKOLN is supported by JISC and Resource: The Council for Museums, Archives and Libraries
Contents • About UKOLN • UKOLN’s WebWatch Work For UK HEIs • Benchmarking UK Museum Web Sites • Comparison With “6 Of The Best” • Limitations Of Approach • Where To From Here?
UKOLN • UKOLN: • National focus of expertise in digital information management • Based at University of Bath • Funded by JISC (HE and FE sector) and Resource: The Council for Museums, Archives and Libraries, together with project funding (e.g. from the EU and JISC) • About 25 FTEs • Carries out applied research (e.g. in metadata) and software development, and provides policy and advisory services
UKOLN’s Dissemination Work • UKOLN carries out dissemination activities, including work by UKOLN’s Policy and Advice Team: • Interoperability Focus: close links with Resource and the museums community (member of CIMI Executive Committee); involved in e-GIF standards work; see <http://www.ukoln.ac.uk/interop-focus/> • Collection Description Focus: funded by JISC, RSLP and the British Library; coordinates work on collection description methods, schemas & tools with the goal of ensuring consistency across projects, disciplines, institutions and sectors; see <http://www.ukoln.ac.uk/cd-focus/> • Bibliographic Management • UK Web Focus – myself
UK Web Focus • UK Web Focus: • Funded by JISC to provide advice on Web developments • Organises events (e.g. annual Institutional Web Management Workshop), writes articles (e.g. regular columns in Ariadne e-journal), gives talks, etc. • A member of UKOLN’s Policy and Advice Team (which also includes Interoperability Focus, Collection Description Focus and Public Library Networking Focus) • Managed the original WebWatch project and continues to publish results of WebWatch surveys
Community Building • An important part of my work is community building within the UK HE / FE Web management communities: • An annual 3-day workshop which provides an opportunity for Web managers to: • update their technical skills and their approaches to managerial and strategic thinking • discuss and share problems and solutions with peers • Active participation in JISCMail mailing lists, e.g.: • web-support: “My home page doesn’t look right in Netscape 4. Can anyone help?” • website-info-mgt: “A Web site has stolen text and images from my Web site. What should I do?”, “How should I impose a consistent look-and-feel across all departmental Web sites?” • Comparing approaches across the community and sharing best practices
WebWatch Project • WebWatch project: • Initially funded for 1 year in 1997 by BLRIC to develop and use automated robot software to analyse Web developments across various UK communities • Once funding finished the work continued, but made use of (mainly) freely available Web services to analyse various features of Web site communities • Supports community-building work across UK HE/FE Web managers (sharing, not flaming) • See <http://www.ukoln.ac.uk/web-focus/webwatch/>
WebWatch Surveys • Search Engines Used To Index UK HE Web Sites: • ht://Dig is the most popular and is growing in popularity, followed by a Microsoft solution • Interest in the licensed Ultraseek/Inktomi solution • Interest in externally hosted indexers (e.g. Google) • Surprising number of institutions with no search facility • See <http://www.ukoln.ac.uk/web-focus/surveys/uk-he-search-engines/> • Nos. of Links • Cambridge has the most (231,000 links to all servers) • Sheffield has the most to a single server (46,000) • See <http://www.ariadne.ac.uk/issue23/web-watch/> • Nos. of Web Servers • Cambridge has the most (200+) • See <http://www.ariadne.ac.uk/issue25/web-watch/>
Update On Search Engines • Sept 1999: ht://Dig: 25; Excite: 19; Microsoft: 12; Harvest: 8; Ultraseek: 7; SWISH: 5; Other: 23; None: 59 • Today: ht://Dig: 48; Microsoft: 17; Ultraseek/Inktomi: 12; Google: 11; Excite: 5; Webinator: 5; Others: 22; None: 29 • NOTE: The growth in popularity of ht://Dig, the unexpected appearance of the externally-hosted Google service and the move away from SWISH and Harvest would not have been noticed without the snapshots. The discussion of the surveys informed decision-making.
WebWatch Activities • As well as these metrics, a number of observations of features have been carried out: • 404 Error Page: the appearance of, and functionality provided by, the institution's 404 error page • Appearance of Main Entry Point: the appearance of the institution's entry point, identifying the main types (menu-style vs news) and the use of technologies (Java, DHTML, etc.) • A "rolling demo" of these features has been provided, allowing interested parties to quickly get a feel for the approaches taken within the community • These have proved very popular – see <http://www.ukoln.ac.uk/web-focus/site-rolling-demos/>
Benchmarking • The WebWatch approach of monitoring UK HE Web sites can be extended into a benchmarking exercise: • Making comparisons with peers • Checking compliance with standards • Checking compliance with community or funders' guidelines (e.g. e-GIF guidelines) • This has advantages for organisations: • Observing best practices and learning from them • Ditto for bad practices • Community building • and some potential disadvantages: • Establishment of league tables • Inappropriate comparisons • Penalty clauses for failure to comply with standards
Benchmarking Museum Web Sites • The WebWatch approach to benchmarking has been applied to a small number of UK Museum Web sites: • A small selection was chosen in order to: • Keep resource requirements to a minimum • Validate the methodology • Gauge interest in this approach • The selected resources were: • A sample of museum Web sites • The Guardian's six best museum Web sites • If the methodology is felt to be valid and there is sufficient interest, the approach could be taken more widely across the museum community • Details of the survey are available from <http://www.ukoln.ac.uk/web-focus/events/conferences/museums-2001/>
Benchmarking Activity • Choosing the sample: • The mda list of UK Museum Web sites was used as the master source <http://www.mda.org.uk/vlmp/> • Web sites beginning with the letter "A" were chosen <http://www.mda.org.uk/vlmp/#A> • The Andrew Carnegie Birthplace Museum was removed from the sample as its Web site was unavailable • The 13 selected museum Web sites: Abbot Hall Art Gallery; Aberdeen Art Gallery & Museums; AccessArt; Aerospace Museum; Allhallows Museum; Althorp House; Amberley Museum; American Museum in Britain; Armagh Planetarium; Arnolfini Gallery; Ashmolean Museum of Art & Archaeology; Astley Hall Museum and Art Gallery; Avoncroft Museum of Historic Buildings
Approaches • Approaches taken: • Use of freely-available Web sites which provide analysis capabilities (e.g. <http://www.netmechanic.com/toolbox/html-code.htm>) • A page of "live links" was provided, enabling all users to reproduce the findings • Complemented by manual inspection • Benefits of this approach: • Openness, reproducibility and objectivity of the survey
Domain Names • Reminder – findings are for a small, non-random sample • Findings • 11 museums (85%) have an entry point which is the domain name and 2 (15%) have an entry point which is one level beneath the domain name • 6 (46%) have a .co.uk domain; 3 (23%) have .org.uk; 2 (15%) have .com; 1 (8%) has .org; 1 (8%) has .ac.uk • Discussion • Most of the museums have a short, memorable URL • The variety of top-level domains may be confusing for end users • How will the new .museum domain be deployed? Is there an opportunity for a major advertising campaign?
Server Software • Netcraft <http://www.netcraft.com/> was used to analyse Web server software • Findings • 7 hosted on a Unix platform (4 on Linux, 2 on Solaris and 1 on BSD) • 6 hosted on a Microsoft platform (4 on NT 4 or Windows 98, 2 on Windows 2000) • Issues • Security, scalability, ease-of-use, …
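The Netcraft service reports the server software it observes; a minimal sketch of the same kind of check, using only the Python standard library, is simply to read the Server response header (the URL in the comment is hypothetical, and servers are free to omit or disguise this header):

```python
import urllib.request

def server_software(url):
    """Return the Server header reported by a Web site, or None if absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get("Server")

# Hypothetical example:
# print(server_software("http://www.example-museum.org.uk/"))
```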
Standards Compliance • Entry point examined for compliance with HTML and CSS standards using the NetMechanic and W3C Validator Web-based tools: • Findings • 0 pages were HTML compliant (according to W3C) • Of the 5 sites which contained a CSS style sheet, 0 had errors (according to W3C) • 3 pages were HTML compliant (according to NetMechanic) • Issues • HTML-compliance is important for ensuring wide accessibility and for repurposing content
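The survey used the NetMechanic and W3C validation services through their Web forms. As an illustration of how such a check could be automated, the sketch below queries the W3C Nu HTML Checker's JSON interface, which post-dates the service used in the survey and is assumed here purely for illustration:

```python
import json
import urllib.parse
import urllib.request

def html_errors(url):
    """Validate a page with the W3C Nu HTML Checker and return its error messages.
    Assumes the checker's JSON interface at https://validator.w3.org/nu/
    (a later service than the validator used in the original survey)."""
    query = urllib.parse.urlencode({"doc": url, "out": "json"})
    request = urllib.request.Request(
        "https://validator.w3.org/nu/?" + query,
        headers={"User-Agent": "webwatch-benchmark-sketch"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        report = json.load(response)
    # Keep only messages flagged as errors (informational messages are dropped)
    return [message for message in report.get("messages", []) if message.get("type") == "error"]
```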
Accessibility • Entry point examined for compliance with W3C WAI guidelines for accessibility using the Bobby Web-based tool: • Findings • Only 2 pages had no WAI Priority 1 error • Issues • Compliance with accessibility standards is important for ensuring access to resources for people with disabilities • Compliance with accessibility standards may be an organisational requirement • Compliance with accessibility standards may be a legal requirement
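Bobby is a Web-based service, but some individual WAI checkpoints are easy to test directly. The sketch below, using only the Python standard library, checks one Priority 1 requirement (that images carry alternative text); it is a crude illustration rather than a full accessibility audit:

```python
from html.parser import HTMLParser
import urllib.request

class MissingAltChecker(HTMLParser):
    """Collects <img> elements that lack an alt attribute (one WAI Priority 1 check)."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.missing.append(attributes.get("src", "(no src)"))

def images_without_alt(url):
    """Return the src of every image on the page that has no alt text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        page = response.read().decode("utf-8", errors="replace")
    checker = MissingAltChecker()
    checker.feed(page)
    return checker.missing
```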
Size Of Entry Point Using Bobby • Findings (Bobby) • Largest entry point initially appeared to be 159 Kb • On further analysis of framed sites the largest entry point was found to be 236.91 Kb • The smallest appeared to be 1 Kb – but this was a FRAMES page (and not the individual linked pages) • On further analysis of framed sites the smallest entry point was found to be 15.45 Kb • Issues • Bobby flagged pages which used frames but further manual analysis and calculations were needed
Size Of Entry Point Using NetMechanic • Findings (NetMechanic) • Largest entry point initially appeared to be 237,107 b (231 Kb) • The smallest appeared to be 16,045 b (15.7 Kb) • Issues • NetMechanic flagged pages which used frames but further manual analysis and calculations were needed Bobby and NetMechanic identified the same largest and smallest sites – but this is not always the case
Comments On Size Measurements • Use of tools to analyse the size of Web pages has indicated several issues: • Need for manual inspection of results (normally outliers) in order to spot invalid comparisons • Different ways of treating redirects, frames, user-agent negotiation, etc., and inconsistencies in handling the robot exclusion protocol, external files (e.g. CSS and JavaScript), etc., may result in inconsistent findings • Changes in the content of a page (e.g. inclusion of news items, personalised interfaces, etc.) • Output is generated for viewing on the Web, not for further processing • Sub-parts currently have to be summed manually (a sketch of one explicit size definition follows below)
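The inconsistencies above largely come down to what is counted. The sketch below adopts one explicit definition – the HTML file plus its inline images, ignoring frames, external CSS/JavaScript and robots.txt – to show that any automated size figure embodies such choices:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
import urllib.request

class ImageCollector(HTMLParser):
    """Records the src of every <img> element on a page."""
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

def entry_point_size(url):
    """Size in bytes of the HTML page plus its inline images.
    Deliberately ignores frames, external CSS/JavaScript and robots.txt --
    one of several equally defensible definitions discussed above."""
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read()
    collector = ImageCollector()
    collector.feed(html.decode("utf-8", errors="replace"))
    total = len(html)
    for src in collector.images:
        try:
            with urllib.request.urlopen(urljoin(url, src), timeout=10) as image:
                total += len(image.read())
        except OSError:
            pass  # missing or unreachable image: counted as zero bytes
    return total
```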
Link Popularity • The numbers of links to the Web site was found using LinkPopularity (which has an interface to AltaVista): • Findings • The most linked-to Web site had 2,731 links • The least linked-to Web site had 45 links • Issues • Links can drive traffic to your Web site • Links can be used by citation-based search engines (such as Google) to boost the ranking of your site (many links to your page means Google will give it a higher ranking than a similar page with fewer links) • Snapshots of link popularity can help gauge effectiveness of publicity campaigns
Search Engine Coverage / Size Of Web Site • AltaVista and Netscape's What's Related tool were used to measure the size of the museum Web sites (i.e. the numbers of pages they had indexed): • Findings • Most pages indexed by AltaVista: 2,037 • Most pages indexed by Netscape: 1,919 • Least pages indexed by AltaVista: 0 • Least pages indexed by Netscape: 0 • Issues • The no. of pages indexed should be ≥ 0 and ≤ the no. of pages on the Web site • If significantly fewer pages are indexed than exist, this may indicate a Web site which is not search-friendly (e.g. use of frames, splash screens, etc.)
Search Facility • Information on each museum's search facility was gathered (a sketch of an automated check follows below): • Findings • 10 sites have no search facility • 3 have a search facility: • 1 uses the FreeFind externally-hosted search engine • 1 uses a Microsoft search engine • 1 uses a Perl script (to search an online catalogue) • 1 search facility was not working (over a 1-month period) • Issues • Users expect to be provided with search facilities • It can take under 30 minutes (and little technical expertise) to make an externally hosted search engine available, suitable for simple static Web sites (but not many people know this)
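Detecting whether a site offers a search facility can itself be partly automated. The sketch below uses a crude heuristic – a form containing a text input whose name looks like "search", "q" or "query" – and would still need manual confirmation:

```python
from html.parser import HTMLParser
import urllib.request

class SearchFormDetector(HTMLParser):
    """Flags pages containing a form with an input named something like 'search' or 'q'."""
    def __init__(self):
        super().__init__()
        self.in_form = False
        self.has_search_form = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self.in_form = True
        elif tag == "input" and self.in_form:
            name = (attributes.get("name") or attributes.get("id") or "").lower()
            if name in ("q", "query") or "search" in name:
                self.has_search_form = True

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

def has_search_facility(url):
    with urllib.request.urlopen(url, timeout=10) as response:
        page = response.read().decode("utf-8", errors="replace")
    detector = SearchFormDetector()
    detector.feed(page)
    return detector.has_search_form
```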
404 Error Page • Information on the 404 error page was found (a sketch of an automated check follows below): • Findings • 10 sites use the default 404 error message • 3 have a lightly branded error message, but with little additional functionality • Issues • The 404 error page is (sadly) likely to be widely accessed • It is desirable that it: • Reflects the Web site's look-and-feel • Provides functionality to assist a user who is 'lost': • Provides access to a search facility / site map • Provides contact details • The 404 page can also be context-sensitive (e.g. different pages for users following a local link / remote link / no link)
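A site's 404 handling can be probed automatically by requesting a page that should not exist. The sketch below (the probe filename is arbitrary) reports the status code and the size of the returned body; judging whether the page is branded or helpful still needs a manual look:

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

def inspect_404(base_url, probe="webwatch-no-such-page-xyz.html"):
    """Request a page that should not exist and report the status code and body size.
    A tiny body usually indicates the server's default 404 message; a 200 response
    means the site never returns a proper 404 status (both worth noting)."""
    url = urljoin(base_url, probe)
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as error:
        return error.code, len(error.read())
```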
Robots.txt • Information on the Web site’s robots.txt file was found: • Findings • 12 sites have no robots.txt file • 1 site has a simple robots.txt file • Issues • robots.txt file can be used to control indexing of your Web site e.g. stop robots from indexing: • Pre-release versions of pages • Test areas • …
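Checking for a robots.txt file is straightforward with the Python standard library's robot-exclusion parser; a minimal sketch:

```python
import urllib.robotparser
from urllib.parse import urljoin

def entry_point_indexable(base_url):
    """Fetch the site's robots.txt and report whether a generic robot may index
    the entry point. Sites with no robots.txt file are treated as allowing everything."""
    parser = urllib.robotparser.RobotFileParser(urljoin(base_url, "/robots.txt"))
    parser.read()
    return parser.can_fetch("*", base_url)
```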
Other Surveys • Additional surveys were carried out: • Cacheability Of Entry Point • The Cacheability Engine was used <http://www.mnot.net/cacheability/> • 11 entry points were cacheable and 2 were not • What's Related To Web Site • Netscape's What's Related? facility <http://home.netscape.com/escapes/related/> was used to record: • Popularity, nos. of pages and nos. of links • Relationships with other sites
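The Cacheability Engine inspects HTTP headers; a rough equivalent, shown below as a sketch using only the standard library, is to look at the freshness and validator headers returned for the entry point:

```python
import urllib.request

# Headers most relevant to whether proxies and browsers can cache the page
CACHE_HEADERS = ("Cache-Control", "Expires", "Last-Modified", "ETag")

def cache_headers(url):
    """Return the cache-related response headers for a page.
    A page with no freshness information or validators is unlikely to be cached."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return {name: response.headers.get(name) for name in CACHE_HEADERS}
```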
Six of the Best: Museums • The Guardian's Online supplement (18 Oct 2001) published its list of the six best museum Web sites: • The Hermitage in St Petersburg at <http://www.hermitagemuseum.org/> • Metropolitan Museum at <http://www.metmuseum.org/> • SCRAN at <http://www.scran.ac.uk/> • Tate Modern at <http://www.tate.org.uk/modern/> • The Louvre at <http://www.louvre.fr/> • Design Museum at <http://www.designmuseum.org/>
Comparisons • Automated Surveys • 3 had a search facility • Nos. of links to sites ranged from 723 to 18,366 • All surveyed entry points had P1 accessibility errors • All surveyed entry points had HTML errors • Observations • 3 were providing a search facility • Most were providing a simple robots.txt file • Some of the 404 error messages were slightly better
Accessible to Browsers • How do the Web sites look in different browsers? • The Lynx text browser and an emulation of the Mosaic browser were used in order to investigate how the Web sites would look to: • Users of old browsers • Users of browsers with no JavaScript support • Users of text browsers (or an indexing robot)
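Where Lynx is installed, its rendering can also be captured programmatically, which makes it easy to file a text-mode snapshot of each entry point alongside the other survey results; a minimal sketch (assuming the lynx binary is on the PATH):

```python
import subprocess

def text_rendering(url):
    """Render a page as the Lynx text browser would display it.
    -dump writes the formatted page to standard output instead of opening
    the interactive browser; assumes lynx is installed and on the PATH."""
    result = subprocess.run(
        ["lynx", "-dump", url],
        capture_output=True, text=True, timeout=30, check=True,
    )
    return result.stdout
```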
Limitations Of Survey • Limitations of this type of benchmarking approach include: • Lack of standards • Limitations of the tools • Resources needed to carry out surveys • Scoping of Museum sites and invalid comparisons • Automated approach fails to address content issues which require a manual approach
Limitations - Standards • There is a lack of standards (or there are conflicting standards) to support benchmarking work. For example: • Size of a page: how do you measure the size of the museum's entry point? You need this in order to make comparisons and to check against, say, guidelines on maximum file size. • Problems • What do you measure (HTML file, inline images, external CSS and JavaScript files, …)? • Changes in file content (e.g. user-agent negotiation, news content, frames and refresh elements, etc.) • How do you handle the robot exclusion protocol (REP)? • NOTE: Bobby and NetMechanic work differently: the former only measures HTML and images, while the latter obeys the REP
Limitations - Tools • Issues: • Auditing tools tend to make implicit definitions (e.g. when measuring the size of a page). Different results may be obtained when using different tools for the same purpose (or if a vendor changes its definition) • Use of Web-based auditing services: this talk has described the use of (mainly free) Web-based services; the providers may change their policy, and use of the URL interface to pass parameters (rather than direct use of the form on the Web page) may not be allowed • Use of desktop auditing tools: using desktop tools avoids the problems of change control of Web-based services; however, it may make it difficult for others to reproduce the findings
Limitations - Resources • It can be time-consuming to: • Maintain URL of entry point to museum Web sites (need to have close links with provider of central portal) • Manage the input to the variety of Web-based services • Process the output from the Web-based services (current need to initiate inquiry, wait for results and manually copy and paste results)
Limitations – Scope of Web Site • Scope • What is a museum Web site? • What is not part of a museum Web site? • It can be difficult to answer these questions • There are no standard ways to define a “Web site” other than by use of domain names and directory structures • Even directory structures can be inadequate if they are not used correctly • Comparisons • It may not be sensible to make comparisons between museums of different types and sizes
Limitations – Automated Only • Use of an automated approach: • Would not (easily) address content issues • Has been supplemented with manual observations (e.g. home page, 404 page & search engine page) • However: • An automated approach can be more objective and reproducible • An automated approach should be less resource-intensive (once software has been set up to maintain links to resources, survey sites and process the results) • An automated approach could be used in conjunction with a manual survey (of a representative sample set of resources)
Beyond A Pilot • Despite the limitations which have been described, would a comprehensive and systematic benchmark of UK Museum Web sites be of benefit? • Can we address the resource issues? • Are the lack of standards being addressed? • Can we find someone to do the work? • Should the focus be developmental? • Can the work be extended to provide notification of problems (e.g. search engine not working)? What may happen if we don’t do this? Might we find that funders set up inappropriate or flawed performance indicators?
A Model For Implementation • The benchmarking process can be made less time-consuming if a more flexible model for managing the data is used • At present we have an HTML page with links to the museum Web sites • Unfortunately HTML pages are difficult to repurpose • A better model is to store the links in a neutral database, and to generate from it both pages for viewing by end users and input for the benchmarking Web services (a sketch follows below) • The database could also be reused for other purposes, e.g. checking links and sending email notifications of problems
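A minimal sketch of this model, assuming nothing more than SQLite from the Python standard library: the same table of museum names and URLs drives both the page of live links for end users and the list of URLs fed to the auditing services. All table and function names are illustrative.

```python
import sqlite3

def create_link_store(path="museums.db"):
    """A neutral store of museum entry points from which both the human-readable
    links page and the input to benchmarking services can be generated."""
    connection = sqlite3.connect(path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS museum (name TEXT NOT NULL, url TEXT NOT NULL UNIQUE)"
    )
    return connection

def links_page(connection):
    """Generate the HTML list of live links for viewing by end users."""
    rows = connection.execute("SELECT name, url FROM museum ORDER BY name").fetchall()
    items = "\n".join(f'<li><a href="{url}">{name}</a></li>' for name, url in rows)
    return f"<ul>\n{items}\n</ul>"

def audit_input(connection):
    """Generate the plain list of URLs to feed into the auditing Web services."""
    return [url for (url,) in connection.execute("SELECT url FROM museum ORDER BY name")]
```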
Towards “Web Services” • Background • The Web was initially implemented for the provision of information • CGI allowed users to input data and provided integration with backend applications • The techniques described use a URL as input to an auditing service; however, this provides limited functionality and is susceptible to the vagaries of the marketplace • Future • “Web Services” will support machine integration by providing a standard messaging infrastructure which uses the HTTP protocol • XML output (e.g. EARL) will provide a neutral format for benchmarking output, and can describe the benchmarking environment (EARL is RDF); a hedged sketch follows below
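As a very rough illustration of EARL-style output, the sketch below serialises a single benchmarking result as an RDF assertion in Turtle. The class and property names follow the later W3C EARL schema rather than the drafts available at the time of this talk, and the URIs in the usage comment are hypothetical:

```python
# Turtle template for one EARL assertion; property names are taken from the
# later W3C EARL schema and are illustrative rather than authoritative.
EARL_TEMPLATE = """@prefix earl: <http://www.w3.org/ns/earl#> .
@prefix dc:   <http://purl.org/dc/terms/> .

[] a earl:Assertion ;
   earl:assertedBy <{assertor}> ;
   earl:subject    <{subject}> ;
   earl:test       <{test}> ;
   earl:result     [ a earl:TestResult ; earl:outcome earl:{outcome} ] ;
   dc:date         "{date}" .
"""

def earl_assertion(assertor, subject, test, passed, date):
    """Serialise one benchmarking result as an EARL assertion in Turtle."""
    return EARL_TEMPLATE.format(
        assertor=assertor, subject=subject, test=test,
        outcome="passed" if passed else "failed", date=date,
    )

# Hypothetical usage:
# print(earl_assertion("http://example.org/webwatch",
#                      "http://www.example-museum.org.uk/",
#                      "http://www.w3.org/TR/WCAG10",
#                      passed=False, date="2001-11-01"))
```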
Need For Standard Definitions • There is a need for standard definitions of terminology such as Web page, visit, unique visit, session, etc. in order to ensure that meaningful and objective comparisons can be made • The marketplace is addressing current deficiencies within the Web advertising and Web auditing communities (and there are financial incentives for this to be solved) • With the growth of e-government internationally, and governments setting targets (X% of government work to be carried out electronically by 2005), the pressure for agreed definitions will increase
Doing The Work • If there is further interest, who should do the work? • Who: a researcher; a funding body; an auditing body; another central body; other(s) • Why: as a volunteer; as a student project; as part of a current remit; as a new remit; as a research interest; because it provides benefits to the community • What: benchmarking work; dissemination; maintaining a central database; software development; producing reports
What Next? • To summarise: • An approach to the automated benchmarking of a small set of museum Web sites has been shown • The implications of the findings have been discussed • There are limitations to the methodology • It is suggested that: • Despite the limitations, benchmarking of museum Web sites can be beneficial: • Community building • Learning from successes and mistakes • There may be advantages in carrying out this work within the community
Questions • Any questions? • Questions For You • Would further work be useful? • Who would do the work? • Is there a need for a portal for use by the community of museum Web managers as well as for end users? • Anyone interested in joint work in this area (possibilities of a paper for a conference - e.g. Museums and the Web 2002 conf. - proposals needed by 30 Nov)