The new Web Infrastructure and Services Alberto Pace, Per Hagen, Information Technology Division - CERN alberto.pace@cern.ch http://cern.ch/alberto.pace per.hagen@cern.ch http://cern.ch/per.hagen
Agenda • Review of the Web Namespace • Review of the Central Web Services (Basic) • Review of the Central Web Services (Advanced) • Conclusions, Discussion and Questions
Yesterday’s namespace • http://www.cern.ch/… • http://wwwinfo.cern.ch/… • http://network.cern.ch/… • http://home.cern.ch/… • http://nicewww.cern.ch/… • http://wwwas.cern.ch/… • http://wwwlhc.cern.ch/… • …
The solution … • A Database of Web sites • A Unique Web namespace for CERN • A Translation / Redirection Service • Generic address (global namespace domain): http://[www.]cern.ch/SiteName -> Physical address (physical host location): http://host.cern.ch/path
Similar to the MAIL solution • Translation Service: Generic address -> Physical address • Mail: First.last@cern.ch -> user@host.cern.ch • Web: http://cern.ch/SiteName -> http://host.cern.ch/site
The redirection is recursive • Once a ‘SiteName’ is registered, the redirection works for all subdocuments Example: http://cern.ch/SiteName -> http://myhost.cern.ch/ http://cern.ch/SiteName/data/a.html -> http://myhost.cern.ch/data/a.html
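The recursive redirection above can be sketched in a few lines. This is only an illustration, not the redirector's actual implementation: the `REGISTRY` table, the `resolve` function and the case-insensitive lookup are assumptions, and the single entry mirrors the example on the slide.

```python
from urllib.parse import urlsplit

# Hypothetical registry: site name -> physical base URL.
# The entry mirrors the slide's example; it is not real registry data.
REGISTRY = {"sitename": "http://myhost.cern.ch/"}

def resolve(generic_url, registry=REGISTRY):
    """Map a generic cern.ch URL to its physical URL, keeping the sub-path.

    Query strings are ignored in this sketch.
    """
    parts = urlsplit(generic_url)
    if parts.hostname not in ("cern.ch", "www.cern.ch"):
        return None  # not a generic address
    # First path segment is the site name, the rest is the subdocument path.
    site, _, rest = parts.path.lstrip("/").partition("/")
    base = registry.get(site.lower())  # case-insensitive lookup is an assumption
    if base is None:
        return None  # site name not registered
    return base.rstrip("/") + "/" + rest
```

Because only the first path segment is looked up, one registration covers every subdocument of the site.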
Advantages • Sites can be reorganised and rearranged • Possible migration from local servers to central servers • Possible migration from central servers to local servers But not all problems are solved!
Back to the MAIL architecture • The GENERIC address is useful for the Mail DELIVERY only • Unfortunately, “Mail Composing” tools are unable to resolve the generic address to the physical address when connecting to the user’s mailbox • Therefore, for the central mail services, given a user’s mailbox, the mail host can be found using the ‘mailbox’ DNS domain • First.last@cern.ch -> user@user.mailbox.cern.ch
The Web has the same problem • The GENERIC address is useful for HTML reading only • Unfortunately, “Web Authoring” tools are unable to resolve the generic address to the physical address when reading from the generic address • Therefore, we should also register (for the author’s use only) a web host alias that can be found using the ‘web’ DNS domain • http://cern.ch/SiteName -> http://sitename.web.cern.ch/SiteName
Not limited to the central servers • In the mail architecture, the Generic Address (first.last@cern.ch) can point to the central mail servers (xxxx@mail.cern.ch), to locally managed servers (xxxx@dxcoms.cern.ch), or even to servers outside CERN (xxxx@fnal.gov) • Very similar for the web: http://cern.ch/xxxx -> http://xxxx.web.cern.ch/xxxx can point to the central web servers or to any locally managed server
Once registered … • The user can use any form: • http://cern.ch/SiteName - Read-only • http://SiteName.web.cern.ch/SiteName - Read/Author/Write • http://host.cern.ch/SiteName - Read/Author/Write • And in addition … all machines over which we have control are aware of the namespace! This means that, once registered, the following URLs will also work, whatever registered SiteName is used: • http://cern.ch/SiteName - Read-only • http://www.cern.ch/SiteName - Read-only • http://web.cern.ch/SiteName - Read-only • http://nicewww.cern.ch/SiteName - Read-only • http://home.cern.ch/SiteName - Read-only
More on cern.ch subdomains … • mailbox.cern.ch: mail server load balancing • print.cern.ch: print server load balancing • web.cern.ch: official web sites load balancing • home.cern.ch: personal web sites load balancing • webtest.cern.ch: test web sites load balancing • IMPORTANT: webtest domain visible only within CERN
Site Aliases • The user can register alternative Site Names (ALIAS) • More descriptive names, old names, nicknames • http://cern.ch/CERN.Web.Services (descriptive name) • http://cern.ch/web (nickname) • http://cern.ch/WebOffice (old name) • http://cern.ch/Alberto.Pace (descriptive name - Personal site) • Aliases are mapped to existing sites • http://cern.ch/Alberto.Pace -> http://cern.ch/pace • A site can have an unlimited number of aliases • Aliases have fewer restrictions on the ‘allowed characters’ (tildes, dots, underscores, …)
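Alias resolution as described above amounts to one extra lookup before the site lookup. The sketch below is illustrative only: the `ALIASES` and `SITES` tables and the `canonical_site` function are hypothetical, the alias entry copies the slide's example, and case-insensitive matching is an assumption.

```python
# Hypothetical tables; the alias entry mirrors the example on the slide.
ALIASES = {"alberto.pace": "pace"}  # alias -> registered site name
SITES = {"pace"}                    # registered site names

def canonical_site(name, aliases=ALIASES, sites=SITES):
    """Return the registered site name for a site name or alias, else None."""
    key = name.lower()               # case-insensitive matching is an assumption
    key = aliases.get(key, key)      # an alias always maps to an existing site
    return key if key in sites else None
```

Since a site can have any number of aliases, renaming a site only needs a new entry in the alias table; the old name keeps resolving.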
The new architecture … • … is compatible with the existing infrastructure • Existing servers and existing URLs integrate smoothly into the namespace • No broken links
The new architecture … • … allows the evolution of the service • Migrations from local servers to central servers • Migrations from central servers to local servers • Migrations between central servers • Split of large servers into smaller ones • Group small servers into bigger ones • Suppression of local servers • Multiple central servers (differentiation possible, if necessary) • Server Load balancing • Stable HTML only service versus full CGI-BIN interfaces • Multiple server platforms • Test versus production sites • Personal versus official sites • …
Renaming sites • The Site Alias feature allows web authors to rename web sites when necessary without breaking existing hyperlinks that have been bookmarked or hardcoded in an unknown number of html documents worldwide
Physical Architecture • Heterogeneous pool of Web servers • Multiple OS (NT4, Win2000, Solaris, Linux) • Web Sites can also be hosted on servers not managed by the Web Services team (locally managed servers) • [Diagram labels: Web Redirector (WEBR), Database, WEB0, WEB1, WEB2, WEB3, ...]
A frequent question … • Question: • I do not like the syntax http://cern.ch/xxxx. Can I use http://xxxx.cern.ch ? • Answer: • YES, you have to register xxxx.cern.ch to point to the same address as the cern.ch domain. This can be done from the http://cern.ch/register web interface or by sending an e-mail to us (www.support@cern.ch) • Note that you DO NOT need a special or dedicated server
Another frequent question … • Question: • How should I write my http links in my html pages ? Which is the recommended form ? • Answer: • Within your web site, always use relative addressing. This will make your site ‘relocatable’ • <a href="../foo/bar.html"> • From site to site, use absolute addressing using “cern.ch” or “www.cern.ch”. You can also use the form “xxxx.web.cern.ch” which can be slightly faster. • <a href="http://cern.ch/xxxx/foo/bar.html">
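Why relative links make a site ‘relocatable’ can be checked with the standard URL resolution rules, here via Python's `urllib.parse.urljoin`. The page locations and the host name `otherhost.cern.ch` are made up for the illustration.

```python
from urllib.parse import urljoin

# Hypothetical locations of the same page before and after a site move.
old_page = "http://cern.ch/xxxx/docs/index.html"
new_page = "http://otherhost.cern.ch/xxxx/docs/index.html"

relative_link = "../foo/bar.html"
absolute_link = "http://cern.ch/xxxx/foo/bar.html"

# A relative link is resolved against the page that contains it,
# so it follows the site wherever the site is hosted.
rel_old = urljoin(old_page, relative_link)
rel_new = urljoin(new_page, relative_link)

# An absolute link ignores the containing page and always points
# at the generic address.
abs_new = urljoin(new_page, absolute_link)
```

Here `rel_old` stays under `cern.ch` and `rel_new` under `otherhost.cern.ch`, while `abs_new` is unchanged, which is the behaviour wanted for site-to-site links.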
More questions … • Question: • I do not like the syntax http://xxxx.web.cern.ch/xxxx because the site name is repeated twice. Couldn’t you use just http://xxxx.web.cern.ch/ ? • Answer: • Yes, we could. However, this was incompatible with the existing infrastructure (based on virtual directories) and would have required a significant effort to revise existing html. • However, we can switch to it at any time in the future.
Yet another frequent question … • Question: • What URL should I print on my business card ? • Answer: • Firstname.Lastname@cern.ch for the mail • http://cern.ch/Firstname.Lastname for the web • Ensure that the alias http://cern.ch/Firstname.Lastname has been created and works properly.
Agenda • Review of the Web Namespace • Review of the Central Web Services (Basic) • Review of the Central Web Services (Advanced) • Conclusions and Questions
Site Hosting • Site hosting means offering disk space to store web files and then making them available via http • In addition to sites stored on AFS or DFS, web sites can (should) be saved on the “central web servers”. This is a pool of (cheap) servers to host web sites of customers who do not want to maintain their own server • Load balanced using the web/home sub-domains • Every web site has one (and only one) owner • Owners are responsible for the site content • Owners can manage or delegate the site security • Owners are able to grant/delegate authoring rights to other people
Site Hosting • Flat Namespace (part of the CERN web namespace) • http://cern.ch/sitename • Subsites are possible but managed by owners of upper sites • http://cern.ch/mainsite/subsite1: subsite1 is managed by the owner of mainsite • Example: http://www.cern.ch/CERN/Divisions/ …
Authoring interfaces • Authoring interfaces • HTTP-PUT http://sitename.web.cern.ch/sitename • FTP ftp://sitename.web.cern.ch/sitename ftp://user@sitename.web.cern.ch/sitename • OSE = Microsoft Office Server Extensions. See http://www.microsoft.com/office/ork/2000/five/75t5.htm • DAV = Distributed Authoring and Versioning. See http://www.w3.org/Protocols/ • No direct support for authoring tools through the file system • Authoring URL (file://xxxxx) different from Publishing URL (http://cern.ch/…) • Endless user support problem • Platform Specific – important effort duplication • (Users should start considering web sites on AFS and NOVELL as legacy)
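As an illustration of the HTTP-PUT interface, the sketch below builds (but does not send) a PUT request against the authoring URL form shown above. The `build_put` helper, the file name and the page body are hypothetical, and authentication, which a real server would require, is left out.

```python
import urllib.request

def build_put(site_name, rel_path, body, content_type="text/html"):
    """Build, without sending, an HTTP PUT request to the authoring URL."""
    # Authoring goes through the sitename.web.cern.ch host alias,
    # not through the read-only generic address.
    url = f"http://{site_name}.web.cern.ch/{site_name}/{rel_path}"
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": content_type},
    )

req = build_put("sitename", "index.html", b"<html><body>Hello</body></html>")
# urllib.request.urlopen(req) would actually send the request;
# credentials and error handling are deliberately omitted here.
```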
Site Registration / Creation Services • Site Registration Service • Allows the registration of existing sites in the web namespace • It is the service to manage the web namespace • Useful only for “locally managed servers” – central sites are always pre-registered on their creation • http://cern.ch/webredirect • Site Creation Service • Allows the creation of new central sites in the web namespace • It is the service to manage the web sites (creation/deletion/…) • http://cern.ch/webregister • Users can create/delete web sites or registrations themselves • Users are authenticated on AFS or NICE • Ownership of the site or of the registration is always checked
Site Registration Services • [Diagram labels] • Namespace mgt client (Web Browser): connects to cern.ch/webredirect (using the redirector) • WebRedirect site (hosted on web0) • Oracle Database (cerndb1.cern.ch) • Web Redirector (webr.cern.ch) • DNS of the web.cern.ch subdomain (wgs01.cern.ch, wgs02.cern.ch) • Arrow labels: Update, Reload Database, Dynamic Update
Sites Creation & Mgt Services • [Diagram labels] • Client Computer, Site Management (Web Browser): connects to cern.ch/webregister (using the redirector) • WebRegister site (hosted on web0) • Site Creation And Management • Web Site on Webx.cern.ch, X=1,2,… n • Database • WebRedirect site (hosted on web0) • Web Redirector • web.cern.ch subdomain • Arrow label: Update Redirector
Site Naming • The shared listbox/web naming convention is proposed when sites or aliases are created • http://cern.ch/Web/AuthoringDoc/Addressing/NamingRules.htm • Site owners are allowed to register sites that do not follow the convention • A moderator entity is notified of all official sites that are created • The moderator has the necessary tools to block a site judged to have an improper name, description or content • Site creation, moderator notification and site blockage all happen in real time
Agenda • Review of the Web Namespace • Review of the Central Web Services (Basic) • Review of the Central Web Services (Advanced) • Conclusions and Questions
Support for Electronic Forms • All central web sites are form ready • Users can create Electronic forms themselves • The user can have form results • sent to a user-written form handler • (sent to a generic form handler) • saved to a file on the web server* • in various formats including HTML and comma separated • sent by E-mail to the user or to any specified address • saved to a database* (see next page) • used to generate dynamic queries to databases* *: Non-FrontPage users will need some programming skills to use this service
Support for Databases • All central web pages can be connected to databases • Reduced programming effort for user-written scripts • Databases can be “Access” files stored in the Web site • Databases can be remote Oracle databases on the network • Read and Write access • FrontPage users have all these services without writing code* *: Non-programmatic database access is supported only for FrontPage 2000 users
Searching and Indexing • Two independent methods: client-side (Infoseek) and server-side (web server) • The Infoseek server indexes all ‘http-reachable’ documents. • Ideal for CERN global searches • It takes 3-4 days before a page gets revisited, so “What’s new” requests miss recent documents • Server-side indexing service • Instantaneous index update • Very fast queries for subsearches • No CERN-wide search possible • The Infoseek and the server-side indexing are complementary services
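For authors who write their own form handler, saving results in comma-separated format can look like the sketch below. It is a generic illustration and not the central servers' handler: the `append_form_result` function, the field names and the sample submission are all made up.

```python
import csv
import io
from urllib.parse import parse_qs

def append_form_result(post_body, fields, out):
    """Append one submitted form (urlencoded body) as a CSV row to `out`."""
    values = parse_qs(post_body, keep_blank_values=True)
    # Keep a fixed column order; fields missing from the submission stay empty.
    row = [values.get(name, [""])[0] for name in fields]
    csv.writer(out).writerow(row)
    return row

# Hypothetical submission with two of the three expected fields filled in.
buf = io.StringIO()
row = append_form_result(
    "name=Jane+Doe&division=IT", ["name", "division", "comment"], buf
)
```

In a real handler `out` would be a file opened in append mode on the web server rather than an in-memory buffer.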
Access Control & Security [callout: Recommended solution] • Web Site Owners can manage access to their pages stored in the central Web servers • IP address restrictions • (example: pages accessible only within cern.ch) • Access Control List • Authorized users must be registered in the account database (CCDB) • Will soon benefit from the new centralized authentication service (Kerberos 5) • Custom authentication mechanisms available • login forms and username/password pairs to authenticate users unknown in CCDB • Requires little programming skill. Fully documented • They can password-protect a document • Everybody uses the same password to access the document
Secure connections and SSL • All sites on the central servers support SSL and have a world-wide recognized server certificate signed by Verisign • Installation of the certificate root not necessary on the client • (this may change as CERN could soon become a certification authority) • Every Web page on the central Web server* can be accessed using HTTP *or* HTTPS *: SSL support is limited to the central servers and is not available to sites on AFS or NOVELL
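An IP address restriction boils down to testing whether the client address falls inside an allowed network. The sketch below uses Python's `ipaddress` module; the `is_allowed` function is hypothetical, and the network shown is a documentation placeholder (RFC 5737), not CERN's real address range.

```python
import ipaddress

# Placeholder range from the RFC 5737 documentation block,
# NOT the real CERN network.
ALLOWED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if the client address lies inside an allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)
```

With the placeholder range, `is_allowed("192.0.2.17")` is true and `is_allowed("198.51.100.5")` is false.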
CSS and Themes • Cascading Style Sheets are supported for both FrontPage and Dreamweaver authors • FrontPage Themes are supported for FrontPage users • Similar functionality but does not require a CSS-compatible browser • There are conversion tools between the two • We now have the necessary tools to • Separate content from look and feel • Make available Corporate / Divisional look and feel policies
Source Control • Every site hosted on the central servers can activate source control, for sites authored by several people simultaneously* *: Source control is a feature provided by the access protocol and does not exist for the FTP protocol. It is available only through the Office Server Extensions protocol or Distributed Authoring and Versioning (DAV). For this reason, this feature is available to Office/FrontPage users only
CGI-Interface and Scripting* • Every site on the central servers has a cgi-bin directory • Standard CGI-Interface • Executables stored in that directory are executed when requested via http. Parameters posted by the requestor are available as environment variables • Within this context, the executable is run with no write privileges on the server in a separate address space (as a fork) • Scripting • Two interpreters: *.PL (Perl scripts) and *.ASP (Active Server Pages) • For ASP, there are three possible scripting languages: vbscript, jscript and perlscript *: CGI and scripting support is not available to sites on NOVELL. AFS sites support scripting only using the Perl interpreter
Accounting and Statistics • We are storing usage statistics for all sites hosted on the central servers • We plan to offer a service where Web Owners can consult accounting data for their web sites • The service is already available for sites stored in the Sun cluster (currently named web1.cern.ch)
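The CGI mechanism can be illustrated with a minimal handler that reads its parameters from the environment. Under the standard CGI convention the query string arrives in the `QUERY_STRING` environment variable (a POST body arrives on standard input instead). Python is used here only for illustration, since the interpreters offered on the central servers are Perl and ASP; the `cgi_response` function and the `name` parameter are hypothetical.

```python
import os
from urllib.parse import parse_qs

def cgi_response(environ=os.environ):
    """Build a CGI response from the request parameters in the environment."""
    # The web server passes the part of the URL after '?' in QUERY_STRING.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    name = params.get("name", ["world"])[0]
    # A CGI response is header lines, a blank line, then the body.
    return "Content-Type: text/plain\r\n\r\n" + f"Hello, {name}\n"

# A real CGI executable would write cgi_response() to standard output.
```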
Authoring tools for Web Sites • Microsoft FrontPage • Built-in Forms and Database Access services • Site management includes Access control, sub-webs, and Source control • Integrated in the Microsoft Office suite, a serious advantage for the occasional web author • Macromedia Dreamweaver • High-end HTML editor for web professionals • Programming skills required to implement Forms, Database connections • Access Control not possible within the program
Web Servers versus File Systems • Web Servers are a real cross-platform alternative to file servers • No need for platform specific solutions (AFS, Appleshare, DFS, IFS, NFS, Netbios, Novell, SMB, …) • This does not mean that there is no future for file systems, but only that file systems can remain local to the platform • Web Sites as project space • A Web site can be used as a world-wide file system • End-users can create read-protected web sites with an unlimited number of authors • This will give them the possibility of getting at their files from any internet access point in the world
Web sites as home directories ? • Web Sites as home directories • Users can create read-protected web sites with only one author • This gives them the possibility of getting at their files from any web browser worldwide • The personal web site can play the role of the current home directory • We are seriously investigating offering full WebDAV http access to the WIN/DFS file space • Web Storage can be the file system of the future
Agenda • Review of the Web Namespace • Review of the Central Web Services (Basic) • Review of the Central Web Services (Advanced) • Conclusions, Questions and Discussion
Before ending • Supported Authoring tools • Microsoft Office (Windows + Macintosh only) • FrontPage / Word / PowerPoint / Access • FrontPage is the recommended tool for beginners • Macromedia Dreamweaver (Windows + Macintosh only) • Netscape Composer (All platforms) • Getting at your web files • Internet Explorer 5 (Win + Mac + HP-UX + Solaris only) • Any good FTP client • Note • FrontPage and Dreamweaver are supported by the Web services only when the web site is hosted on the “Central Web Servers”. Sites stored on locally managed servers or on AFS or on NOVELL are not supported for FrontPage and Dreamweaver authoring
Accessing your site - Summary • HTTP (get) • File System (not recommended) • HTTP Read/Write (recommended) • FTP (recommended)
Conclusion • An important set of pending requirements for the web services has been addressed with the new architecture • The new architecture in place allows us to move forward and open a wide set of new services which will have several implications on the ‘way of working’