Developing the Monash Research Directory • Prepared by: Stephen Edmonds • December 2004
What is it? • A searchable web-based directory of research publications and researchers at Monash University. • Developed using Perl and open source modules.
Why? • Each year the research activities at Monash University produce a significant amount of output in the form of: • Journal articles • Books • Conference papers • and more… • Unfortunately only a limited number of people are aware of the full range of output.
Why? • A publicly available directory could potentially raise the profile of research activities at the University. • Additionally the Monash Research Directory would be the first of a series of research oriented tools for: • Researchers at Monash • People interested in research
Initial requirements • Publicly available through the Monash website. • Restricted access interface through the my.monash staff and student portal. • Utilise existing information from systems around the University. • Present the most up to date information possible. • Only display research output generated by current staff members of the University.
Research Master • A commercial product used to track research activities around the University. • Information regarding the research activities is entered by representatives from each faculty within the University. • Within Research Master one module contains details of the research output.
Research Master • … and another contains details of the authors of the research output. • 30,000 publications covering 8 years. • 25,000 distinct authors. • The information is stored in an Oracle database for use with a client application.
Monash Directory Service • Contains an entry for each current student or member of staff of the University. • Automatically updated from a number of sources such as the payroll system or the internal telephone directory. • Staff members have the ability to enter additional information into their entry such as: • Research interests • Professional associations • Biography • Photograph (as a JPEG) • A standard LDAP service.
Public Monash website • Farm of Linux boxes running Apache web servers. • Perl CGI is one of many technologies available.
my.monash portal • An integrated view of the University for both staff members and students. • Uses HTML::Mason, a dynamic web site authoring system written in Perl.
The problem so far… • Two backend systems: • Research Master (Oracle database) • Monash Directory Service (LDAP service) • Two frontend environments: • my.monash portal (Perl through HTML::Mason) • Public website (Perl CGI)
The problem so far… • Some kind of glue is required between these four systems:
And the answer was… • A module or set of modules. • Written in Perl.
But how? • The preliminary analysis showed that an author: • Has a variety of details. • Relates to one or more publications. • While a publication: • Has a variety of details. • Relates to one or more authors.
But how? • This data can be represented by a simple hierarchy:
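The hierarchy can be sketched as two small classes that point at each other. This is only an illustration of the many-to-many relationship described above: the package names `Sketch::Author` and `Sketch::Publication` and their constructors are hypothetical and are not the classes used in the Monash Research Directory.

```perl
use strict;
use warnings;

# Hypothetical author class: holds details and a list of publications.
package Sketch::Author;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, publications => [] }, $class;
}
sub name         { return $_[0]{name} }
sub publications { return @{ $_[0]{publications} } }

sub add_publication {
    my ($self, $publication) = @_;
    push @{ $self->{publications} }, $publication;
    return $self;
}

# Hypothetical publication class: holds details and a list of authors.
package Sketch::Publication;

sub new {
    my ($class, %args) = @_;
    return bless { title => $args{title}, authors => [] }, $class;
}
sub title   { return $_[0]{title} }
sub authors { return @{ $_[0]{authors} } }

sub add_author {
    my ($self, $author) = @_;
    push @{ $self->{authors} }, $author;
    return $self;
}

package main;

# Link one author and one publication in both directions.
my $author = Sketch::Author->new(name => 'John Smith');
my $paper  = Sketch::Publication->new(title => 'An example title');
$author->add_publication($paper);
$paper->add_author($author);

print $author->name(), "\n";
print "  ", $_->title(), "\n" for $author->publications();
```

The two-way links form reference cycles. That is harmless in a short-lived CGI request, but in a long-running process one side of each link could be weakened with `Scalar::Util::weaken`.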
But how? • This complete encapsulation of business logic within classes means that the usage code is simply:
    my $research = Monash::ResearchDirectory->new( ... );
    if ($research->search('name' => 'john smith')) {
        foreach my $author ($research->authors()) {
            print $author->name(), "\n";
            foreach my $publication ($author->publications()) {
                print $publication->title(), "\n";
            }
        }
    }
Publication data issues • The data contained within the Monash Directory Service is clearly defined. • However the data stored in Research Master for a publication can vary from category to category • … and even from year to year.
Publication data issues • A solution was to retrieve the field labels from the database and then generalise the access methods on the publication class:
    foreach my $field ($publication->fields()) {
        my ($label, $value) = $publication->field($field);
        if ($value) {
            print $label, "\t", $value, "\n";
        }
    }
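One way such generalised access could look inside the publication class is sketched below. The storage layout (`order`, `label_for`, `value_for`) and the sample field names are assumptions made for illustration; only the `fields()` and `field()` method names come from the slide.

```perl
use strict;
use warnings;

package Sketch::Publication;

# order:     field names in display order
# label_for: field name => label (assumed to be read from the database)
# value_for: field name => value for this particular publication
sub new {
    my ($class, %args) = @_;
    return bless {
        order     => $args{order},
        label_for => $args{label_for},
        value_for => $args{value_for},
    }, $class;
}

# List of field names available for this publication.
sub fields { return @{ $_[0]{order} } }

# Label and value for one field, whatever the category or year.
sub field {
    my ($self, $name) = @_;
    return ($self->{label_for}{$name}, $self->{value_for}{$name});
}

package main;

my $publication = Sketch::Publication->new(
    order     => [qw(title journal volume)],
    label_for => { title => 'Title', journal => 'Journal', volume => 'Volume' },
    value_for => { title => 'An example title', journal => 'An example journal', volume => undef },
);

my @lines;
foreach my $field ($publication->fields()) {
    my ($label, $value) = $publication->field($field);
    next unless defined $value && length $value;    # skip empty fields only
    push @lines, "$label\t$value";
}
print "$_\n" for @lines;
```

The sketch tests `defined` and `length` rather than plain truth, so a legitimate value of `0` would still be displayed.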
Internals • As already stated the act of encapsulating as much business logic as possible in the classes means that the CGI script and HTML::Mason component aspects become trivial. • At first it appeared to be the opposite case for the internals of the classes • … however it fortunately did not become as complicated as feared.
Publication title search • Walkthrough of some of the interesting parts of the publication title search process when the following call is made:
    $research->search('name' => 'john smith');
Querying Research Master • Simplified by being able to query the backend Oracle database directly. • A compromise between performance and maintenance resulted in a single SQL query. • Unfortunately information is now duplicated in the results …
Querying Research Master • … which can be selectively ignored during processing:
    while (my $row = $sth->fetchrow_hashref('NAME_lc')) {
        my $author      = $self->_find_or_create_author($row);
        my $publication = $self->_find_or_create_publication($row);
        $author->add_publication($publication);
        $publication->add_author($author);
    }
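The find-or-create step can be illustrated without a database. In the sketch below the rows stand in for what `fetchrow_hashref` would return from a single joined query; the column names (`author_id`, `author_name`, `publication_id`, `title`) are assumptions, and plain hashes are used in place of the real author and publication classes.

```perl
use strict;
use warnings;

# Stand-in for the joined result set: author and publication details
# repeat whenever an author has several papers or a paper several authors.
my @rows = (
    { author_id => 1, author_name => 'A. Author', publication_id => 10, title => 'First paper'  },
    { author_id => 1, author_name => 'A. Author', publication_id => 11, title => 'Second paper' },
    { author_id => 2, author_name => 'B. Writer', publication_id => 10, title => 'First paper'  },
);

my (%author_for, %publication_for);

foreach my $row (@rows) {
    # Reuse the existing object if this id has been seen, otherwise create it.
    my $author = $author_for{ $row->{author_id} }
        ||= { name => $row->{author_name}, publications => [] };
    my $publication = $publication_for{ $row->{publication_id} }
        ||= { title => $row->{title}, authors => [] };

    # Each row is one author/publication pairing, so link both directions.
    push @{ $author->{publications} }, $publication;
    push @{ $publication->{authors} }, $author;
}

foreach my $id (sort { $a <=> $b } keys %author_for) {
    my $author = $author_for{$id};
    print $author->{name}, ": ", scalar @{ $author->{publications} }, " publication(s)\n";
}
```

Keying the lookup tables by id is what lets the duplicated columns be ignored: only the first row for each id creates an object.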
Querying the Monash Directory Service • A filter is constructed from the results obtained by querying Research Master:
    my @numbers = map { $_->employeenumber() || () } $self->authors();
    my $ldap_filter = q{(|}
        . join(q{}, map { qq{(employeenumber=$_)} } @numbers)
        . q{)};
• This filter is then used to query the Monash Directory Service using Net::LDAP.
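A slightly more defensive version of the filter construction is sketched below. The function names `escape_filter_value` and `build_employee_filter` are hypothetical helpers written for this example; the escaping follows the LDAP string filter rules of RFC 4515, and the empty-list case is handled explicitly because `(|)` is not a valid filter.

```perl
use strict;
use warnings;

# Escape the characters that are special inside an LDAP filter value
# (backslash, asterisk, parentheses and NUL) as a backslash plus two hex digits.
sub escape_filter_value {
    my ($value) = @_;
    $value =~ s/([\\*()\0])/sprintf('\\%02x', ord $1)/ge;
    return $value;
}

# Build an OR filter over employee numbers; returns nothing for an empty list.
sub build_employee_filter {
    my @numbers = @_;
    return unless @numbers;

    my @terms = map { '(employeenumber=' . escape_filter_value($_) . ')' } @numbers;
    return @terms == 1 ? $terms[0] : '(|' . join('', @terms) . ')';
}

print build_employee_filter(101, 202), "\n";
```

The resulting string is the kind of value that can be supplied as the `filter` argument of the `search` method in Net::LDAP.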
Correlating results • Results from the Monash Directory Service are then attached to the appropriate author object:
    foreach my $author ($self->authors()) {
        my $entry = $self->_get_ldap_entry($author->employeenumber());
        $author->set_ldap_entry($entry) if $entry;
    }
Correlating results • The publications which do not have at least one current staff member of the University as an author are now removed from the results:
    foreach my $publication ($self->publications()) {
        unless (grep { $_->is_monash() } $publication->authors()) {
            $self->destroy_publication($publication);
        }
    }
Correlating results • Finally all the authors without any publications are removed from the results:
    foreach my $author ($self->authors()) {
        unless ($author->publications()) {
            $self->remove_author($author);
        }
    }
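The three correlation steps can be followed end to end on a small in-memory data set. The sketch below uses plain hashes instead of the real classes, and a hash named `%ldap_entry_for` stands in for the entries returned by the directory; all names and sample values are assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in for directory results, keyed by employee number.
my %ldap_entry_for = ( 1001 => { cn => 'A. Author' } );

my @authors = (
    { name => 'A. Author', employeenumber => 1001,  publications => [] },
    { name => 'B. Writer', employeenumber => undef, publications => [] },
    { name => 'C. Former', employeenumber => 3003,  publications => [] },
);
my @publications = (
    { title => 'Joint paper', authors => [ @authors[0, 1] ] },
    { title => 'Old paper',   authors => [ $authors[2] ] },
);
foreach my $publication (@publications) {
    push @{ $_->{publications} }, $publication for @{ $publication->{authors} };
}

# Step 1: attach a directory entry to every author that has one.
foreach my $author (@authors) {
    my $number = $author->{employeenumber};
    next unless defined $number && exists $ldap_entry_for{$number};
    $author->{ldap_entry} = $ldap_entry_for{$number};
}

# Step 2: keep only publications with at least one current staff author.
my @kept = grep {
    my $publication = $_;
    grep { $_->{ldap_entry} } @{ $publication->{authors} };
} @publications;

# Step 3: drop removed publications from each author, then drop
# authors left with no publications at all.
my %is_kept = map { $_ => 1 } @kept;    # keyed by stringified reference
foreach my $author (@authors) {
    $author->{publications} = [ grep { $is_kept{$_} } @{ $author->{publications} } ];
}
@authors = grep { @{ $_->{publications} } } @authors;

print "$_->{title}\n" for @kept;
print "$_->{name}\n"  for @authors;
```

A co-author without a directory entry stays in the results as long as one of their publications survives step 2, which matches the order of the steps on the slides.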
Results • At this point the search object holds enough author and publication objects for the search results to be displayed:
    $research->search('name' => 'john smith');
    foreach my $author ($research->authors()) {
        print $author->name(), "\n";
        foreach my $publication ($author->publications()) {
            print $publication->title(), "\n";
        }
    }
Limitations • At no point do the author or publication objects in existence represent the entire Research Directory. • This means that a fresh search is required for each of the various pages in the interface. • Not much of an issue, given the stateless nature of the web.
Complicated scientific formulae in titles • Plain text: • 2] • Rich text formatted: • {\rtf1\ansi\deff0{\fonttbl{\f0\fswiss Arial;}{\f1\fnil\fcharset2 Symbol;}} \viewkind4\uc1\pard\lang1033\f0\fs24 2] \fs18 Unprecedented \f1\fs24 m-h\up5\fs14 2:\up0\fs24 h\up5\fs14 2\up0\f0\fs18 - pyrazolate coordination in [\{Yb(\f1\fs24 h\up5\f0\fs14 2\up0\fs18 - \f1\fs24\'a6\f0\fs18 Bu\dn5\fs14 2\up0\fs18 pz)(\f1\fs24 m\f0\fs18 -\f1\fs24 h\up5\f0\fs14 2\up0\fs18 :\f1\fs24 h\up5\f0\fs14 2\up0\fs18 -\f1\fs24\'a6\f0\fs18 Bu\dn5\fs14 2\up0\fs18 pz)(thf)\}\dn5\fs14 2\up0\fs18 ] \par } • Correctly rendered: • 2] Unprecedented μ-η²:η²- pyrazolate coordination in [{Yb(η²- ƒBu₂pz)(μ-η²:η²-ƒBu₂pz)(thf)}₂]
Complicated scientific formulae in titles • Unfortunately this cannot be reliably rendered using HTML. • The Perl module RTF::HTML::Converter is able to convert the RTF above to: • 2] Unprecedented m-h2:h2- pyrazolate coordination in [{Yb(h2 - ¦Bu2pz)(m-h2:h2 -¦Bu2pz)(thf)}2] • While not perfect, it is a significant improvement and was deemed satisfactory.
Conclusion • A practical example of how Perl can be used to draw information from two sources, one of them a commercial application, and present it in two similar but distinct environments. • All by using two widely used modules: • DBI (and DBD::Oracle) • Net::LDAP • And a third publicly available module: • RTF::HTML::Converter
Thank you • Any questions? • The publicly available version of the Monash Research Directory is available at: • http://monash.edu/research/directory/