630 likes | 787 Views
LIS65 4 lecture 1 Introduction to the course, the ssh protocol. Thomas Krichel 2012-09-07. what's up doc?. This lecture is to introduce the topic of digital libraries. First part: we look at the nature of digital libraries. This part is informed by the first chapter of Arms book.
E N D
LIS654 lecture 1Introduction to the course, the ssh protocol Thomas Krichel 2012-09-07
what's up doc? • This lecture is to introduce the topic of digital libraries. • First part: we look at the nature of digital libraries. This part is informed by the first chapter of Arms book. • Second part: we talk ssh. A sorry excuse to play with computers.
first part contents • digital libraries • digital librarianship • a course on digital libraries • with the aim of training digital librarians
digital libraries • Generally, we can think about digital libraries are • information stored on a computer • delivered via a network • mimics existing libraries • As Arms puts it “a managed collection of information, with associated services, where the information is stored in digital formats and accessible over a network”.
prospects • We are at the start of digital libraries. • The problem is that the technology is still expensive, the cost is still coming down. • The opportunity is that we can build pioneering systems now, that will have a lasting social impact.
example • ISI journal citation report is based on two years of data of citations to journals. • When Eugene Garfield founded it, he published the report in the second year of getting data. • For the next issue, he chose the same horizon of data. • Citation rankings of journals still use 2 years, almost 50 years after.
benefits: availability • Digital libraries bring the information closer to the user than physical libraries can • physically • temporarily • Even when you are in the physical library you still get faster access to digital library items.
benefits: findability • Information can be more easily found in digital than in print. • Some non-textual information is still only findable via metadata. • But computer scientists are working on that.
benefits: sharing • Information can be shared. • Items can not be damaged. • Items can not be stolen.
benefits: updating • Information can be kept up-to-date more easily. • To update a book, you have to reprint all copies, and replace them.
benefits: new media • Information can be created and manipulated in completely new ways. • For example location information can be mixed up with subject information.
issue: costs • The cost of storing print information is very high. It is a multiple of acquisition costs. • Digital storage devices decline in price. • But digital information manipulation requires skills that are not easy to procure. • The overall cost comparison is difficult to assess.
drawback: preservation • Preserving information is easy on paper. • Preserving digital information looks very hard. • We will not look at this issue in the course, because there is a specialized Palmer School course dealing with this.
drawbacks: monopoly dangers • Since the information only needs to be kept in one copy, and others can access it, there are inherent dangers of the build-up of monopolies. • One example is Google search engine.
drawbacks: free information • Since the information is more easy to copy it is harder to police illegal sharing. • Some creators and intermediaries are feeling the pinch. • The newspaper industry is one. • Physical libraries are one potential victim.
drawbacks: professional upheaval • Digital librarianship is as yet, largely undefined. • This leads me to the next topic.
digital librarianship • Librarianship has always been a bicephal occupation. • Libraries always have a collection and service aspect them. • Digital libraries are no different.
collection aspect • The collection has to be managed and organized. • The organizers deal with dead matter, documents. • This organization is a scientific activity. • Librarianship is a natural science. • The librarian is a cataloger in a corner.
service aspect • Users have to be shown how the library works. • Librarians have to understand users’ needs to build services users want. • All these are social activities. • Librarianship is a social science. • The librarian is a people service person.
digital information was hard to use • Computers had to be driven by esoteric commands. • Screens were hard to read from. • Telephone lines where hard to get to work to transmit information • Access costs to digital information was high. • The service aspect was important.
digital information is becoming easier • Computers are more and more easy to use. • Digital information providers tend to communicate directly with customers, bypassing libraries. • Subject literacy becomes relatively more important than information literacy. • The service aspect is being reduced over time.
an important caveat • Most items in the modern (19th, 20th century) are mass-produced. • There is no mass production or mass storage in the digital library. • The difference between publishers, archives and libraries become very blurred.
a course on digital libraries? • My initial thought is that a course on digital libraries is nonsensical. • In the recent future, all libraries will be digital.
digital libraries course • Literacy and use of digital media. • The idea is to look at what digital libraries exist and how to use them. • This is really already done in LIS511. • The course has the “building” theme to it.
building aspect • Building a digital library can basically take three for • electronic resource management • repository building • cross-repository services
electronic resource management • Libraries license digital contents from providers and make them available. • There are some minor technical issue • authentication • integration with ILS • legal issues with the licensing • minor training issues with users
repository building • Libraries are building repositories of local digital or digitized contents. • This is firmly on the technical side. • It is the main focus of the LIS654 course as it has been developed in the past. • We cover digitization as part of repository building.
cross-repository services • I think of repositories as publishers, rather than libraries. • Digital libraries are cross-repository datasets and services attached to them. • This is where I have done almost all my work. • It can not be done without custom computer programming.
course syllabus • It draws on Brian Hoffman’s syllabus in for his Manhattan, Spring 2011 section. • It is quiet non-technical. I will tune up the technology over the years. • One can argue that without computer programming, one can not be a digital librarian. • But most digital libraries fail because of non-technical issues.
my expertise • My main expertise is in setting up completely new open-access digital library services and collections. • In non-technical terms, I can discuss how to set up these service and how they run. • But I am reluctant to appear like a self-promoting pompous git.
the wider environment • Since 2008 I have been trying to build a special digital information concentration in the Palmer MSLIS program. • The current version is at http://openlib.org/ home/krichel/proposals/wic.html. • The LIS654 course is not part of the proposed concentration.
second part: ssh • a bit about the computer • the Internet • ssh and our host • a brief discussion of the operating system of the host
a few words about a computer • To use a computer you need to know something about its operating system (o/s). • The o/s sets out how the computer behaves. • There are two o/s flavors that are widely used • MS Windows (in various version) • UNIX-like operating system • Our server runs an o/s called “Debian GNU/Linux”
some generalities about Debian Debian is an open-source computer operating system developed and maintained by a large group of volunteer. Debian packages together a very large set of pieces of software into a coherent system. It provides a version of the UNIX operating system using Linux. The following notes hold for all (?) Unix flavor operating systems.
files • Files are continuous chunks data on disks that are required for software applications. • Files have names. • Files have permissions attached to them, discussed in "permissions model". • Files have times attached to them. Usually the mtime (time of contents modification) is the only one shown.
directories • Directories are files that contain other files. Microsoft calls them folders. • They have names, permissions and times like other files. • In UNIX, the directory separator is “/” • The top directory is “/” on its own.
links • Links are files that contain the address of other files. • In MS Windows, links are called shortcuts. • The times and permissions of links are kept but they are of no importance.
users and groups • “root” is the user name of the superuser. • The superuser has all privileges. • There are other physical users, i.e. persons using the machine • There are users that are virtual, usually created to run a daemon. For example, the web sever in run by a user www-data. • Arbitrary users can be put together in groups.
permission model • Permission of files are given • to the owner of the file • to the group of the file • and to the rest of the world • A group is a grouping of users. Unix allows to define any number of groups and make users a member of it. • The rest of the world are all other users who have access to the system. That includes www-data!
user name & password • To work with our server, you need a use name and a password. • You can choose your user name as a short form of your own name. • It should be all lowercases and can not have spaces. • Please don’t choose an insecure password.
the Internet • The Internet is an interconnected set of physically disparate networks. • Each computer, when connected to the Internet has an IP address. It's a four-byte number written as four decimal number from 0 to 255 connected by dots. Example: 148.4.2.231. • Once a computer has an address, it can communicate with others using a protocol known as IP.
Internet application protocols • Once we have the Internet, we need protocols to work with it. • They are called Internet application protocols. • Their king is the domain name system. • Two other protocol we will work with are http and ssh.
Domain Name System • Domain Name System allows us to associate human-friendly names with IP addresses. These names are called domains names. • Domain names can be leased from domain name registrars. • A machine with a domain name on the Internet is called a host. • When we know the domain name of the host, we can communicate with the host.
protocols to communicate with hosts • There are two protocols we use in this class. • We use http to work with the omeka web interface • We use ssh for some special operations. • Both protocols are client/server protocols. • You run as ssh or http client on your local machine. • You communicate with a machine that runs ssh or http server software.
the ssh protocol • ssh is protocol that uses public key cryptography to encrypt a stream of communication between client and server. • This allows us to privately manipulate the server. Or “manipulations” are really just changes to files on the server that contain our web pages. • The ssh client software we use on the PC is called WinSCP. It is a file transfer program.
our server • Is a machine called tiu. • The machine has been given a domain name dlib.info. • We also say it is a “host” on the Internet. • It is a rented machine. It cost me money to rent it, about 50 euros a month. • It runs the testing version of Debian/GNU Linux. • It runs both http and ssh server software. • It is maintained by Thomas Krichel.
tiuvs dlib.info • Tiu is a rented machine. It is a physical box somebody can actually touch. • dlib.info is a name. Nobody can touch it. • When the name is resolved into an IP address, that IP address, at this point in time, is the same as tiu’s. • In the future it may be a different machine.
the web site • As part of the course, you are being provided with web space on the host dlib.info, at the URL • http://dlib.info/home/user • where user is a user name that you have chosen. • This shows a list of available fails as prepared by the web server at tiu. • This is a page that Thomas has prepared for you.
ssh protocol • The ssh protocol implements a secure connection to the server over which we can • send instructions to it • store files on it. • wotan run an ssh server. • On your machines, you run ssh client software.
ssh client software • On MS Windows machines, we run • putty for interactive use • WinSCP for file storage and retrieval. • Usually, students in this class only need to understand WinSCP. • On the Mac, you can use • Cyberduck • Fugu • For interactive use on the Mac use Terminal.