Building Greenstone Collections from the Command Line


  1. Building Greenstone Collections from the Command Line

  2. Basic commands
     • From the Greenstone installation directory, run “setup.bat” (Windows users) or “setup.sh” (Unix/Linux users)
     • To create a collection, type “perl -S mkcol.pl -creator youremail@somewhere.com collection_name”
     • To import documents into a collection, type “perl -S import.pl collection_name”
     • To build a collection, type “perl -S buildcol.pl collection_name”
     • For further details, read pages 9 to 19 of the developer’s guide

  3. Building A Collection In Greenstone
     [Diagram: Documents (Import) → import.pl (plugins) → XML documents (Archives) → build.pl (classifiers) → Index → Browsing and full text on the Web]

  4. Importing documents
     • Plugins process source documents in different formats and associate the corresponding metadata with them
     • The output of this process is a set of XML documents encoded in the Greenstone Archive format, which is specified by the following DTD:
         <!DOCTYPE GreenstoneArchive [
           <!ELEMENT Section (Description,Content,Section*)>
           <!ELEMENT Description (Metadata*)>
           <!ELEMENT Content (#PCDATA)>
           <!ELEMENT Metadata (#PCDATA)>
           <!ATTLIST Metadata name CDATA #REQUIRED>
         ]>

  5. Automating collection building tasks
     • Batch files can automate many of the tasks
     • You can create a batch file to import and rebuild a collection
     • Try copying and pasting the following lines into a batch file named “rebuild.bat”:
         perl -S import.pl -removeold %1
         perl -S buildcol.pl %1
     • Execute the batch file by typing “rebuild.bat collection_name”
     • There are many commands that you can combine in a batch file
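For readers who are not on Windows, the same two steps can be sketched as a small Python script. This is only an illustration, not part of Greenstone: the function names `rebuild_commands` and `rebuild` are hypothetical, and the sketch assumes the Greenstone setup script has already been run so that `perl` can find `import.pl` and `buildcol.pl`.

```python
import subprocess


def rebuild_commands(collection):
    """Return the two commands from rebuild.bat as argument lists."""
    return [
        ["perl", "-S", "import.pl", "-removeold", collection],
        ["perl", "-S", "buildcol.pl", collection],
    ]


def rebuild(collection):
    """Run the import step, then the build step.

    check=True raises CalledProcessError if a step fails, so the build
    is not attempted after a failed import.
    """
    for command in rebuild_commands(collection):
        subprocess.run(command, check=True)
```

Calling `rebuild("collection_name")` would then play the role of “rebuild.bat collection_name”.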

  6. Importing documents (cont.)
     • An example:
         <Section>
           <Description>
             <Metadata name="gsdlsourcefilename">ec158e.txt</Metadata>
             <Metadata name="Title">Freshwater Resources in Arid Lands</Metadata>
             <Metadata name="Identifier">HASH0158f56086efffe592636058</Metadata>
             <Metadata name="gsdlassocfile">cover.jpg:image/jpeg:</Metadata>
             <Metadata name="gsdlassocfile">p07a.png:image/png:</Metadata>
           </Description>
         </Section>
     • Note: gsdlsourcefilename is the original file from which the Greenstone archive file was generated, and gsdlassocfile is a file associated with the document (e.g. an image file)
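To check what a plugin actually wrote into an archive file, the metadata can be listed with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `read_metadata` is hypothetical, and the sample string reuses the values from the example above with a closing `</Section>` tag so that it parses.

```python
import xml.etree.ElementTree as ET

# Sample section modelled on the slide's example (closed so it is well-formed).
ARCHIVE_SECTION = """<Section>
  <Description>
    <Metadata name="gsdlsourcefilename">ec158e.txt</Metadata>
    <Metadata name="Title">Freshwater Resources in Arid Lands</Metadata>
    <Metadata name="Identifier">HASH0158f56086efffe592636058</Metadata>
    <Metadata name="gsdlassocfile">cover.jpg:image/jpeg:</Metadata>
    <Metadata name="gsdlassocfile">p07a.png:image/png:</Metadata>
  </Description>
</Section>"""


def read_metadata(section_xml):
    """Return (name, value) pairs for the top-level section's metadata."""
    root = ET.fromstring(section_xml)
    pairs = []
    for meta in root.findall("Description/Metadata"):
        pairs.append((meta.get("name"), meta.text or ""))
    return pairs


for name, value in read_metadata(ARCHIVE_SECTION):
    print(name, "=", value)
```

A list of pairs is used rather than a dictionary because a name such as gsdlassocfile can occur more than once.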

  7. Document Metadata
     • Greenstone plugins recognize only a small set of metadata tags
     • There are three ways to assign metadata to documents in a collection: 1) index.txt, 2) metadata.xml and 3) modifying an existing Greenstone plugin
     • An index.txt file is a space-separated file that assigns a list of metadata to documents in a collection. It should be placed in the collection’s import directory

  8. Document Metadata (cont.)
     • To inform Greenstone about the existence of this file, include the IndexPlug plugin in your collect.cfg file or add this plugin to your plugin list in GLI
     • An example of the index.txt file is as follows:
         key: Title Date Cast Director
         "analyze.html" "Analyze That" "2002" "Robert De Niro, Billy Crystal, Lisa Kudrow" "Harold Ramis"
         "majestic.html" "Majestic, The" "2001" "Jim Carrey, Bob Balaban, Jeffrey DeMunn" "Frank Darabont"
     • Each field in this file is separated by a space and enclosed in double quotes. The fields are matched by position with the field names listed in the first line of the file
     • Note that the first field of each entry must be the filename of a document
     • The trailers collection uses this approach to assign metadata to documents in a collection
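The positional matching described above can be illustrated with a short Python sketch. It is not how IndexPlug itself is implemented; `parse_index` is a hypothetical helper that only assumes the layout shown on the slide (a `key:` line followed by one quoted record per line).

```python
import shlex

INDEX_TXT = '''key: Title Date Cast Director
"analyze.html" "Analyze That" "2002" "Robert De Niro, Billy Crystal, Lisa Kudrow" "Harold Ramis"
"majestic.html" "Majestic, The" "2001" "Jim Carrey, Bob Balaban, Jeffrey DeMunn" "Frank Darabont"
'''


def parse_index(text):
    """Map each filename to a dict of metadata, matched by position."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0]
    if not header.startswith("key:"):
        raise ValueError("first line must start with 'key:'")
    field_names = header[len("key:"):].split()
    records = {}
    for line in lines[1:]:
        # shlex.split honours the double quotes, so spaces and commas
        # inside a quoted field stay together.
        values = shlex.split(line)
        filename, rest = values[0], values[1:]
        records[filename] = dict(zip(field_names, rest))
    return records


records = parse_index(INDEX_TXT)
print(records["analyze.html"]["Director"])
```

Running a file through a check like this before importing can catch a record whose number of quoted fields does not match the key line.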

  9. Document Metadata (cont.)
     • The second approach uses an XML file to assign metadata to documents in a collection
     • To tell Greenstone that you would like to use the metadata.xml file, include the line “plugin RecPlug -use_metadata_files” in your collect.cfg file, or check the use_metadata_files flag after clicking the configure plugin button in the GLI
     • The benefit of using an XML file over the previous approach is that a browser can perform tag checking for you

  10. Document Metadata (cont.)
         <?xml version="1.0" ?>
         <DirectoryMetadata>
           <FileSet>
             <FileName>MARTYN_DR_02002066.html</FileName>
             <Description>
               <Metadata name="PlayerID">MARTYN_DR_02002066</Metadata>
               <Metadata name="PlayerProfile"></Metadata>
               <Metadata name="PlayerName">Damien Richard Martyn</Metadata>
               <Metadata name="FullSizeImage">http://www-usa.cricket.org//perl/picture.cgi/030730</Metadata>
               <Metadata name="ThumbnailImage">http://www-usa.cricket.org//perl/picture.cgi/030730/inline?alt=1</Metadata>
               <Metadata name="CoverImage">MARTYN_DR_02002066.jpg</Metadata>
               <Metadata name="Country">Australia</Metadata>
               <Metadata name="BattingStyle">Right Hand Bat</Metadata>
               <Metadata name="BowlingStyle">Right Arm Medium</Metadata>
             </Description>
           </FileSet>
           <FileSet>
             <FileName>POTHECARY_JE_03001137.html</FileName>
             <Description>
               <Metadata name="PlayerID">POTHECARY_JE_03001137</Metadata>
               <Metadata name="PlayerProfile"></Metadata>
               <Metadata name="PlayerName">James Edward Pothecary</Metadata>
               <Metadata name="Country">South Africa</Metadata>
               <Metadata name="BattingStyle">Right Hand Bat</Metadata>
               <Metadata name="BowlingStyle">Right Arm Medium</Metadata>
             </Description>
           </FileSet>
         </DirectoryMetadata>
     • Can you recognize the XML structure this uses?

  11. Document Metadata (cont.)
     • Here’s the answer:
         <DirectoryMetadata>
           <FileSet>
             <FileName>text</FileName>
             <Description>
               <Metadata name="name1">some text</Metadata>
               <Metadata name="name2">some text</Metadata>
               other Metadata tags…
             </Description>
           </FileSet>
           other FileSet tags…
         </DirectoryMetadata>
     • Note that XML is case sensitive
     • The cricket collection uses metadata.xml to assign metadata to its documents
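Because the structure is regular, a metadata.xml file can also be generated rather than typed by hand. The following is a minimal Python sketch under that assumption; `build_directory_metadata` is a hypothetical helper, and the sample values are taken from the cricket example.

```python
import xml.etree.ElementTree as ET


def build_directory_metadata(files):
    """Build a DirectoryMetadata document from {filename: {name: value}}."""
    root = ET.Element("DirectoryMetadata")
    for filename, metadata in files.items():
        fileset = ET.SubElement(root, "FileSet")
        ET.SubElement(fileset, "FileName").text = filename
        description = ET.SubElement(fileset, "Description")
        for name, value in metadata.items():
            element = ET.SubElement(description, "Metadata", {"name": name})
            element.text = value
    # encoding="unicode" returns a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


xml_text = build_directory_metadata({
    "MARTYN_DR_02002066.html": {
        "PlayerName": "Damien Richard Martyn",
        "Country": "Australia",
    },
})
print(xml_text)
```

Generating the file this way keeps the element names in the exact case Greenstone expects and escapes characters such as `&` automatically.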

  12. Document Metadata (cont.)
     • We can also customize a plugin to extract metadata from a document
     • We will look at modifying the TextPlug to extract Ratings, Genre and Subject from a few documents in the trailers collection

  13. Structuring Documents into Sections
     • Sometimes source documents have to be structured into sections and subsections
     • This can be done easily by incorporating the following HTML comments into your documents:
         <!--
         <Section>
           <Description>
             <Metadata name="Title">Realizing human rights for poor people: Strategies for achieving the international development targets</Metadata>
           </Description>
         -->
         (text of section goes here)
         <!--
         </Section>
         -->
     • You can also create subsections by embedding another level of <Section> before the </Section> tag
     • Look at one of the HTML files in the demo collection for an example
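When many documents need the same markup, the comment blocks can be produced by a small helper. This Python sketch is purely illustrative: `section` is a hypothetical function that wraps a piece of HTML in the comment pair shown above, and nesting is obtained by passing one call's output as part of another's body.

```python
from html import escape


def section(title, body):
    """Wrap body in Greenstone section comments with a Title."""
    return (
        "<!--\n<Section>\n  <Description>\n"
        f'    <Metadata name="Title">{escape(title)}</Metadata>\n'
        "  </Description>\n-->\n"
        f"{body}\n"
        "<!--\n</Section>\n-->\n"
    )


# A subsection is embedded before the outer </Section> comment.
inner = section("Background", "<p>Text of the subsection.</p>")
outer = section("Realizing human rights for poor people",
                "<p>Text of the section.</p>\n" + inner)
print(outer)
```

One limitation of this sketch: a title containing `--` would end the HTML comment early, so such titles would need separate handling.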

  14. Browsing Indexes

  15. Types of Browsing Indexes
     • SectionList
     • AZList
     • AZSectionList
     • DateList
     • Hierarchy

  16. Creating Browsing Indexes
     • Certain classifiers generate browsing structures that are hierarchical
     • They are useful for subject classifications and organizational hierarchies
     • For these classifiers, the hierarchy has to be supplied with the flag -hfile <filename> when the classifier is defined in the collect.cfg file
     • For example:
         classify Hierarchy -hfile sub.txt -metadata Subject -sort Title

  17. Creating Browsing Indexes (cont.)
     • Note that sub.txt has to reside in the collection’s etc directory
     • Certain classifiers don’t require explicit hierarchies to be defined. For instance, the AZList, DateList and List classifiers generate a selection list from the corresponding metadata:
         classify List -metadata Howto
         classify AZList -metadata Title

  18. Creating Browsing Indexes (cont.)
     • Explicit hierarchies have to be defined according to the following format:
         <identifier> <position in hierarchy> <name>
     • For example:
         1 1 "General reference"
         1.2 1.2 "Something else"
         2 2 "…."
     • This means that a document is placed in the first classification if the metadata field associated with the classifier has the value 1 in that document
     • Look at the demo collections for examples
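The three-column format can be read with a short Python sketch, which also shows how the dotted position encodes depth in the hierarchy. The helper `parse_hierarchy` is hypothetical, and the third sample entry's name ("Another subject") is a made-up placeholder, since the slide leaves it elided.

```python
import shlex

# Sample hierarchy file; "Another subject" is a placeholder name.
HIERARCHY_TXT = '''1 1 "General reference"
1.2 1.2 "Something else"
2 2 "Another subject"
'''


def parse_hierarchy(text):
    """Parse '<identifier> <position> <name>' lines into dicts."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        identifier, position, name = shlex.split(line)
        entries.append({
            "identifier": identifier,
            "position": position,
            # "1" is a top-level entry, "1.2" sits one level below "1".
            "depth": position.count(".") + 1,
            "name": name,
        })
    return entries


for entry in parse_hierarchy(HIERARCHY_TXT):
    print("  " * (entry["depth"] - 1) + entry["name"])
```

A document whose Subject metadata is `1.2` would, under the rule described above, be filed under the entry whose identifier is `1.2`.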

  19. Creating Browsing Indexes (cont.)
     • Documents are treated internally as tree nodes by Greenstone
     • There are three types of nodes: VList, HList and DateList
     • For example, an AZList consists of a collection of VList nodes that represent documents
     • The arguments accepted by the various classifiers are listed on page 48 of the developer’s guide

  20. Formatting Browsing Indexes
     • Each classifier has an implicit name derived from its position in the collect.cfg file. For example, the third classifier specified in the file is called CL3
     • Tags in the formatting strings:
         [Text] – document text
         [link] … [/link] – link to the document itself
         [icon] – icon representing the resource
         [metadata-name] – value of the metadata associated with this document
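The naming rule can be made concrete with a short Python sketch that numbers the `classify` lines of a configuration in order of appearance. The helper `classifier_names` is hypothetical, and the sample configuration simply combines lines quoted elsewhere in these slides.

```python
SAMPLE_CFG = """plugin RecPlug -use_metadata_files
classify Hierarchy -hfile sub.txt -metadata Subject -sort Title
classify List -metadata Howto
classify AZList -metadata Title
"""


def classifier_names(config_text):
    """Map implicit names (CL1, CL2, ...) to their classify lines."""
    names = {}
    count = 0
    for line in config_text.splitlines():
        words = line.split()
        # Only lines that start with "classify" take part in the numbering.
        if words and words[0] == "classify":
            count += 1
            names[f"CL{count}"] = " ".join(words[1:])
    return names


for name, definition in classifier_names(SAMPLE_CFG).items():
    print(name, "->", definition)
```

Because the names depend on position, inserting or reordering classifiers changes which format statements apply to them.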

  21. Formatting Browsing Indexes (cont.)
     • For example:
         format CL4VList "<br>[link][Howto][/link]"
     • Conditional statements are supported in the formatting string. They are enclosed by the ‘{’ and ‘}’ characters in these formats:
         {If}{[metadata], then clause, else clause}
         {Or}{action, another-action, another-action, etc}
     • The {If} statement works like the if statement in most programming languages
     • The {Or} statement evaluates the items in the list in order and stops at the first one that is non-null. That value is sent to the output
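The behaviour of the two statements, as described above, can be mimicked in Python. This is only a model of the semantics, not Greenstone's format-string parser: `evaluate_if` and `evaluate_or` are hypothetical helpers, and document metadata is represented as a plain dictionary.

```python
def evaluate_if(metadata, name, then_text, else_text=""):
    """Model of {If}{[name], then clause, else clause}."""
    return then_text if metadata.get(name) else else_text


def evaluate_or(metadata, names):
    """Model of {Or}{[a], [b], ...}: first non-empty value wins."""
    for name in names:
        value = metadata.get(name, "")
        if value:
            return value  # stop at the first non-null item
    return ""


document = {"Title": "Analyze That", "HasAudio": "1"}
print(evaluate_or(document, ["Howto", "Title"]))
print(evaluate_if(document, "HasAudio", "audio available", "no audio"))
```

Here the document has no Howto value, so the {Or} model falls through to Title, and the {If} model takes the then clause because HasAudio is set.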

  22. Formatting Browsing Indexes (cont.)
     • For example:
         format VList "<td valign=top>[link]<img src=_httpprefix_/collect/cricket/images/[PlayerID].jpg border=0>[/link]</td><td>[link][Title][/link]</td><td>{If}{[HasAudio],<a href=[audioURL]><img src=_httpprefix_/collect/cricket/images/wav.jpg border=0></a>}</td>"

  23. Customizing the look and feel of Greenstone

  24. Customizing the look and feel of Greenstone
     • The files involved are in the gsdl/macros directory:
         Base.dm – global macros, such as custom buttons
         English.dm – text for the corresponding language
         Home.dm – the main GSDL page
         Gsdl.dm – About Greenstone page
         Style.dm – page layout
         Query.dm – query form layout

  25. Customizing the look and feel of Greenstone (cont.)
     • Background image (chalk.gif), in Base.dm:
         _httpiconchalk_ {_httpimg_/chalk.gif}
         _widthchalk_ {2000}
         _heightchalk_ {10}
     • Custom button, in Base.dm:
         _Genrewidth_ {_widthtGenrex_}
         _imageGenre_ {_gsimage_(_httpbrowseGenre_,_httpicontGenreof_,_httpicontGenreon_,Genre,_textimageGenre_)}
         _icontabGenregreen_ {<img src="_httpicontGenregr_" width=_widthtGenrex_ border=0>}
         _icontabGenregreen_[v=1] {_texticontabGenregreen_}

  26. Customizing the look and feel of Greenstone (cont.)
     • Document.dm:
         _textGenrepage_ {_texticonhGenre_}
         _iconGenrepage_ {<img src="_httpiconhGenre_" width="_widthhGenre_" height="_heighthGenre_">}
         _iconGenrepage_ [v=1] {<h2>_texticonhGenre_</h2>}

  27. Customizing the look and feel of Greenstone (cont.)
     • English.dm:
         _textimageGenre_ {Browse by Genre}
         _texticontabGenregreen_ {Genre}
         _httpicontGenregr_ {_httpimg_/tGenregr.gif}
         _httpicontGenreon_ {_httpimg_/tGenreon.gif}
         _httpicontGenreof_ {_httpimg_/tGenreof.gif}
         _widthtGenrex_ {114}
         _texticonhGenre_ {Genre}
         _httpiconhGenre_ {_httpimg_/h_Genre.gif}
         _widthhGenre_ {250}
         _heighthGenre_ {57}
         _textGenreshort_ {access publications by Genre}
         _textGenrelong_ { <p>You can <i>access my documents by whatever I have defined</i> by pressing the <i>Genre</i> button. This brings up a list of documents. }
