
NUWeb System

NUWeb System. sw@gais.cs.ccu.edu. WWW Architecture. Web Server (e.g., Apache, IIS) Browser (e.g., IE, Firefox) Addressing and Information Channel (DNS, URL, SearchEngine) Abstract Model: Provider (server), Consumer (client), Channel Client-Server architecture, Centralized Service.



  1. NUWeb System sw@gais.cs.ccu.edu

  2. WWW Architecture • Web Server (e.g., Apache, IIS) • Browser (e.g., IE, Firefox) • Addressing and Information Channel (DNS, URL, SearchEngine) • Abstract Model: • Provider (server), Consumer (client), Channel • Client-Server architecture, Centralized Service

  3. Problems of the WWW due to its fundamental design • Naming/Addressing problem: • Physical naming/addressing • Static binding through DNS • URLs may not be a good design (hard to remember) • DNS could be slow • Information flow organization was not designed in from the start: • Hotspot bottleneck problem, bandwidth waste problem • Cache and proxy technologies were added separately afterwards • Linkrot problem • Dead links, wrong links, faked links • Affects up to approximately 15% of links • Need a static IP, need to apply for a URL, need knowledge of building and managing websites • Creating and maintaining a website is costly • Webpage creation is not easy • Divides the computer world into two hierarchies • Server: website owners, service providers • Client: ordinary users

  4. Weaving the Web (quoted from Wikipedia) • In Berners-Lee's book, Weaving the Web, several recurring themes are apparent: • It is just as important to be able to edit the Web as browse it. Wikis are a step in this direction, although Berners-Lee considers them merely a shadow of the WYSIWYG functionality of his first browser. • Computers can be used for background tasks that enable humans to work better in groups. • Every aspect of the Internet should function as a Web, rather than a hierarchy. Notable current exceptions are the Domain Name System and the domain naming rules managed by ICANN. • Computer scientists have a moral responsibility as well as a technical responsibility.

  5. What Is NUWeb? • Marriage of WWW with P2P • Technologically: • NUWeb = WebServer + Browser + WNS + SearchEngine + Proxy/Cache + WebBuilder + Blog + CommunityEngine + KIM + P2P – URL – DNS – Cost • Logically: • A new Web system that lets any net user build his/her own web in an extremely easy-to-use way. • A platform for web-building, information sharing, information management, community, and service management • A platform for Webilization • A project to pursue Wemocracy

  6. NUWeb Functions • A platform for Public Sharing and Publishing • Personal website/blog • Public community • Search Engine, • A platform for Private Sharing and Community • Personal community builder • Sharing management • A platform for personal information / knowledge management, content engine,

  7. NUWeb Software Architecture • The NUWeb system is composed of three subsystems • NUWeb.CC CyberCenter • WNS (web name service) • Search engine, Cache • Community services (Photo, Blog, Video…) • NUWeb CP (Community Portal) • Community services (Blog, Photo, Video…) • Search engine service • Proxy and Cache • NUWeb PP (Personal Portal) • NUWeb browser, KIM • NUWeb server • NUWeb personal portal/blog builder

  8. How it works • Personal Web server on the Windows platform • Auto indexing, thumbnails • Auto page generation and run-time rendering • Auto caching • Bundled with a PHP/Perl platform • Registration with WNS during setup • Site name, user account, SiteKey, … • UPnP to handle firewall/NAT • Packet-forwarding proxy to handle the cases where UPnP does not work correctly.

  9. How it works (2) • Each time a client comes online, it sends its current IP and name/key info to the WNS center. • A connection request to a personal site first sends the site name to the WNS to obtain the current IP of the target site (dynamic binding). • If the requested site is not online, the center redirects the request to the cache server. • If the site is connected through a proxy, the connection goes through the relay proxy.
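The dynamic binding described on this slide can be sketched as a small in-memory name service. This is only an illustration under stated assumptions: the class `WNS`, its methods `announce`, `go_offline` and `resolve`, and the host names used for the cache and relay are all hypothetical, not part of NUWeb itself.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SiteRecord:
    ip: str                   # last IP reported by the site
    key: str                  # SiteKey chosen at registration
    online: bool = True
    via_proxy: bool = False   # True when the site needs the relay proxy


class WNS:
    """Toy name service: maps a site name to wherever it can be reached now."""

    def __init__(self, cache_host: str, relay_host: str) -> None:
        self.cache_host = cache_host   # hypothetical cache server address
        self.relay_host = relay_host   # hypothetical relay proxy address
        self.sites: Dict[str, SiteRecord] = {}

    def announce(self, name: str, key: str, ip: str, via_proxy: bool = False) -> bool:
        """Called each time a site comes online; a wrong key is rejected."""
        rec = self.sites.get(name)
        if rec is not None and rec.key != key:
            return False
        self.sites[name] = SiteRecord(ip=ip, key=key, via_proxy=via_proxy)
        return True

    def go_offline(self, name: str) -> None:
        if name in self.sites:
            self.sites[name].online = False

    def resolve(self, name: str) -> Optional[str]:
        """Return the address a client should connect to, or None if unknown."""
        rec = self.sites.get(name)
        if rec is None:
            return None
        if not rec.online:
            return self.cache_host    # offline site: fall back to the cache
        if rec.via_proxy:
            return self.relay_host    # proxied site: go through the relay
        return rec.ip                 # direct connection to the current IP
```

Because the binding is refreshed on every `announce`, a site that reconnects with a new IP is reachable under the same name without any DNS change.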

  10. Naming and Dynamic Addressing • A page is a textual web document. It contains UltraLinks or tags, and displaying a page may trigger the display of other objects such as included images. • An object is a rich-text document (PDF, MS Word, MS PowerPoint, etc.), a multimedia file, or any single file that can be accessed in the web space. • A resource is either a page or an object. • GRN (global resource naming): • SiteUniqName#objectname[#class#type#location] • A fixed IP is not necessary • ABN (AddressByName), ABI (AddressById), ABC (AddressByContent) • USI (UniversalSiteId)
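As a rough illustration of the GRN syntax above, the following sketch splits a name into its fields. It assumes that names never contain `#` and that the optional `class`, `type` and `location` fields appear all together or not at all; the slide does not state either rule. The names `GRN` and `parse_grn` are hypothetical.

```python
from typing import NamedTuple, Optional


class GRN(NamedTuple):
    site: str                        # SiteUniqName
    obj: str                         # objectname
    cls: Optional[str] = None        # optional class field
    typ: Optional[str] = None        # optional type field
    location: Optional[str] = None   # optional location field


def parse_grn(text: str) -> GRN:
    """Parse 'SiteUniqName#objectname[#class#type#location]'."""
    parts = text.split("#")
    # Assumption: either the two mandatory fields or all five fields are given.
    if len(parts) not in (2, 5) or not parts[0] or not parts[1]:
        raise ValueError(f"malformed GRN: {text!r}")
    return GRN(*parts)
```

For example, `parse_grn("alice#photos/cat.jpg")` yields a `GRN` whose `site` is `"alice"` and whose optional fields are `None`.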

  11. NUWeb CyberCenter • GRI: Global Resource Index • A distributed index structure for objects/pages on the NUWeb space • Use hash data structure • Search engine, Community Service, Portal for NUWeb • Proxy & Caching • Auto backup and versioning • Info filtering, content switching • Packet forwarding, center relay • Relay casting, media streaming • Hierarchical search • Collaborative cache (super cache)

  12. Site Initialization • When a new site is installed: • Register the following info • SiteUniqName, used in interactions with the center • Title of the site (at most T bytes) • Abstract of the site (at most P bytes) • Tags (inappropriate tags, such as those infringing others' rights, will be removed by the center) • Country/city/county, real-world geographic info • Profile of personal info • Residents: SUN.resident identifies a user • Decide which directories to open to the public • Decide which directories to open to private connections • Decide whether to allow caching of the public directory

  13. Site Initialization • The server builds an index of the pages/objects covered by the site. The indices for the public and private areas are kept separate so that privacy is protected. • The index covers names and signatures, plus the content of pages; content indexing of objects such as MS Word and PDF files is optional. • After the site is set up, the user is asked to provide a list of friends to whom the system will send invitation letters.

  14. NUWeb Services • NUSite, NUBlog • NUSearch, NUSM • NUCommunity, NUBBS, • NUBot, NUWatch, NUPush • NUCache, NUProxy • NUPedia, knowledge authoring/manager • NUMail, P2P secure mail system • NUJournal

  15. Searching • The search in the NUWeb center includes: • Search pages/objects by name (WNS) • Page content search • * Attribute search, for example, searching for pages authored by Hamming • The indexer in each NUSite sends a raw index to the center, and the center builds an index from it. The raw index is a record containing the indexable text for each page or object. A text extractor is used to extract text from rich-text documents such as MS Word/PowerPoint documents. The raw index is uploaded only after the user approves it. • Before rendering the search result to the user, the searcher needs to check whether the result page/object exists at that moment. • It uses the SSN to check the SiteDB and see whether the site is available. It also uses the GRN to check where the resource is available in the cache.
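The availability check before rendering results can be sketched as follows. The function name `annotate_results`, the `(ssn, grn)` hit format and the source labels are illustrative assumptions; in the real system the two sets would come from the SiteDB and the cache index rather than being passed in directly.

```python
from typing import Iterable, List, Set, Tuple


def annotate_results(
    hits: Iterable[Tuple[str, str]],   # (ssn, grn) pairs from the index
    online_sites: Set[str],            # SSNs currently alive per the SiteDB
    cached_grns: Set[str],             # GRNs known to have a cached copy
) -> List[Tuple[str, str]]:
    """Label each hit with where it can be fetched right now."""
    out: List[Tuple[str, str]] = []
    for ssn, grn in hits:
        if ssn in online_sites:
            out.append((grn, "origin"))          # the original site is up
        elif grn in cached_grns:
            out.append((grn, "cache"))           # serve the cached copy
        else:
            out.append((grn, "not available"))   # flag it, as in the NameDB
    return out
```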

  16. Caching • Every site page is cached automatically unless caching is explicitly disabled • In the first phase, caching is done in the center and the NUWeb CP cache spaces. Objects are cached when accessed • The client caches the object in its cache spool and sends an index entry to the center to notify it that the object is in its cache. • In the second phase, caching is also done collaboratively in the P2P space, assuming that some of the personal sites are willing to participate. • Cached objects are indexed by GRN and MD5 • Note that modifying an object triggers an update to the global cache space that removes the original cache entry indexed by GRN • Each cached object records a timestamp of the content (the time the content was created).
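A minimal sketch of a cache indexed by both GRN and MD5, with the invalidation-on-modify rule from this slide. The class `GlobalCache` and its methods are hypothetical, and keeping everything in two dictionaries is an assumption made for brevity; the slides describe a distributed cache.

```python
import hashlib
from typing import Dict, Optional, Tuple


class GlobalCache:
    """Toy cache: content stored once per MD5, with a GRN -> MD5 mapping."""

    def __init__(self) -> None:
        self.by_md5: Dict[str, Tuple[bytes, float]] = {}  # md5 -> (content, created)
        self.by_grn: Dict[str, str] = {}                  # grn -> md5

    def put(self, grn: str, content: bytes, created: float) -> str:
        digest = hashlib.md5(content).hexdigest()
        old = self.by_grn.get(grn)
        if old is not None and old != digest:
            self.invalidate(grn)          # object was modified: drop the old copy
        self.by_md5[digest] = (content, created)
        self.by_grn[grn] = digest
        return digest

    def invalidate(self, grn: str) -> None:
        digest = self.by_grn.pop(grn, None)
        # Keep the content if another GRN still points at the same bytes.
        if digest is not None and digest not in self.by_grn.values():
            self.by_md5.pop(digest, None)

    def get_by_grn(self, grn: str) -> Optional[bytes]:
        digest = self.by_grn.get(grn)
        return self.get_by_md5(digest) if digest is not None else None

    def get_by_md5(self, digest: str) -> Optional[bytes]:
        entry = self.by_md5.get(digest)
        return entry[0] if entry is not None else None
```

Storing content under its MD5 means identical objects published by different sites share one cached copy, while the GRN mapping tracks which version a given name currently refers to.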

  17. GRI & Collaborative Proxy • GRI: • Object indexed by MD5-signature & GRN • Home page indexed by GRN • Instance indexed by MD5 • Syntax: • GRN: SUN#OBN • Distributed/Collaborative GRI • Multi-tier Collaborative Proxy

  18. Indices (1) • In the NUWeb center, there are several indices: • SiteDB: indexed by SSN • Last live time, access count, data size • While alive, each site periodically sends alive info to the center (every K minutes) • NameDB: indexed using gaisindex • Each name is associated with an SSN by which we can check whether the page/object exists. • Each name has a record containing an SSN value and a GRN cache flag • In a NameDB search result, a record that has no online instance (neither the original site nor a cached copy) carries a flag indicating “not available”
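The periodic alive messages can be turned into an online/offline decision roughly as follows. The slide only says sites report every K minutes; treating a site as offline after missing two reporting periods (`grace=2.0`) is an assumption, as are the names `SiteDB`, `SiteEntry`, `heartbeat` and `is_online`.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SiteEntry:
    last_live: float = 0.0   # time of the most recent alive message (seconds)
    access_cnt: int = 0
    data_size: int = 0


class SiteDB:
    """Toy SiteDB keyed by SSN, tracking liveness from periodic heartbeats."""

    def __init__(self, k_minutes: float, grace: float = 2.0) -> None:
        # Assumption: a site counts as offline after `grace` missed periods.
        self.timeout = k_minutes * 60 * grace
        self.entries: Dict[str, SiteEntry] = {}

    def heartbeat(self, ssn: str, now: float, data_size: int = 0) -> None:
        entry = self.entries.setdefault(ssn, SiteEntry())
        entry.last_live = now
        entry.data_size = data_size

    def is_online(self, ssn: str, now: float) -> bool:
        entry = self.entries.get(ssn)
        return entry is not None and now - entry.last_live <= self.timeout
```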

  19. Indices (2) • MD5 index: objects/pages indexed by MD5 signature. Each site produces an MD5 signature for each object, and the (GRN, MD5) info is sent to the center to be indexed. An MD5 lookup returns the source SSN/IP or the IP of the cache site(s) • Page/document content index • Indexed through the GAIS search engine
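The flow on this slide, where a site computes (GRN, MD5) pairs and the center indexes them, might look like the sketch below. `site_signatures` and `MD5Index` are hypothetical names, and representing a holder as a plain string (an IP or cache-site label) is an assumption.

```python
import hashlib
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set, Tuple


def site_signatures(site: str, objects: Dict[str, bytes]) -> List[Tuple[str, str]]:
    """Site side: produce (grn, md5) pairs for every object the site holds."""
    return [
        (f"{site}#{name}", hashlib.md5(data).hexdigest())
        for name, data in sorted(objects.items())
    ]


class MD5Index:
    """Center side: map an MD5 signature to everyone holding that content."""

    def __init__(self) -> None:
        self.holders: DefaultDict[str, Set[str]] = defaultdict(set)

    def add(self, holder: str, pairs: Iterable[Tuple[str, str]]) -> None:
        for _grn, digest in pairs:
            self.holders[digest].add(holder)

    def lookup(self, digest: str) -> List[str]:
        # Returns the source and cache holders, or an empty list if unknown.
        return sorted(self.holders.get(digest, ()))
```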

  20. NUWeb Portal Service • Search engine for the NUWeb cyberspace • Websites, pages, pictures, videos, documents, articles, etc., … • Browsing and Viewing • What’s hot, what’s new, what’s cool, • Automatically generated through page rendering tool based on a CountDB and list manager.

  21. NUWeb DB • The NUWeb cache is implemented on the NUWeb DB system. • NUWeb DB stores Web objects and their relationships and provides search functions. • Web DB: • ODB (Object DB) • NDB (Name DB) • IDB (Index DB) • TDB (Term DB) • UDB (User DB) • SDB (Site DB) • Page Engine • Access Log DB (PV DB) • Access Control • Query Interface (including SQL) *

  22. Web DB implementation • ODB and NDB are the kernel storage DBs • The key technique used in ODB and NDB is the Hash DB, which needs to minimize disk seeks and maximize memory usage. • PV DB (Access Log DB) is implemented on top of ODB and NDB. • Term DB is also implemented on top of ODB. Term DB records term frequency, term score, and similar information.

  23. Web DB implementation (2) • Site DB records the site info such as access frequency, size, dynamics, etc. • IDB is a real time index engine for all the objects stored in Web DB. • Access Control: • Authorization: permission list based • Authentication: through an authentication center in WNS server. • SQL is not supported yet, on the todo list.

  24. NUDB • Net User’s DataBase • Easy to use, • No background of database is needed. • No need to program • Define the spec and start to use, • Spec can be adjusted flexibly • Scalable • Combine the advantages of Table processing software such as Excel and Database systems • Portable, computable, mergeable

  25. NUDB implementation • Physical DB Kernel • Hash DB • Inverted Index • Pattern Matching • Schema Layer and Query Processing • User Interface Layer • Data Presentation Management • DUA (Database User Agent, similar to an MUA)
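The slide lists an inverted index as part of the physical DB kernel without describing it further. As a generic illustration of the technique (not NUDB's actual design), a minimal in-memory inverted index with AND queries could look like this; the class name and the word-splitting rule are assumptions.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class InvertedIndex:
    """Maps each term to the set of record ids that contain it."""

    def __init__(self) -> None:
        self.postings: DefaultDict[str, Set[int]] = defaultdict(set)

    def add(self, record_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(record_id)

    def search(self, query: str) -> Set[int]:
        """Return ids of records containing every term in the query."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```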

  26. NUBlog • AJAX Based Blog System • Personal Blog Home Base • Can have multiple copies in the web • Creation, Management, Posting • Import, Export: • XMLRPC • Robot, simulating Browser behaviour

  27. NUWatch • Personal Web Agent • Event Watch, News Watch • Service Watch, • Site Watch, • Commerce Watch,

  28. NUWatch Implementation • Personal Profile Manager • Matching Platform • On the fly matching • Batch mode matching through searching • Data Source Agents • Per user agent • Centralized agent (can reduce overhead) • Notification Agent • Relay casting to speed up • Gateway to message system

  29. NUCommunity • Personal and Regional Community Engine • Forum, Vote • Calendar, File Sharing • Address Book, DB, … • Interaction mechanism (auto notification, …) • A community is conceptually given a NUWeb site • A community is treated like a user in the NUWeb space's authentication and authorization

  30. Access Control • Supports both password-based and membership-based protection. • Each directory is associated with a protection data structure • Authentication is done in the WNS server • Uses the permission list technique for membership-based protection • Protection is per directory; no inheritance is assumed.
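The per-directory protection without inheritance can be sketched as below. This assumes the user identity has already been authenticated elsewhere (the slide places authentication in the WNS server) and that a directory with no protection record is open; both are assumptions, as are the names `Protection` and `AccessControl`. Passwords are compared in plain text only to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Protection:
    password: Optional[str] = None                   # password-based protection
    members: Set[str] = field(default_factory=set)   # permission list


class AccessControl:
    """Each directory has its own protection record; nothing is inherited."""

    def __init__(self) -> None:
        self.rules: Dict[str, Protection] = {}

    def protect(self, directory: str, protection: Protection) -> None:
        self.rules[directory.rstrip("/")] = protection

    def allowed(
        self,
        directory: str,
        user: Optional[str] = None,
        password: Optional[str] = None,
    ) -> bool:
        # Exact match only: a rule on /photos says nothing about /photos/2024.
        rule = self.rules.get(directory.rstrip("/"))
        if rule is None:
            return True   # assumption: unprotected directories are open
        if rule.password is not None and password == rule.password:
            return True
        return user is not None and user in rule.members
```

Because lookups use exact directory matches, a subdirectory must be given its own protection record if it should be restricted.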

  31. NUJournal • Why is publication done through paper?! • Traditionally, work HAD TO BE published on paper • A journal is both a channel and a barrier • Most papers enter a dead state once published • A new model of publication • Separate the concepts of publication and evaluation • Publication is the author's autonomous decision and can happen through the author's own website, where the work is reviewed and commented on by readers or reviewers. • A journal is a marketplace that gathers and guides access to publications and comments on and evaluates them • A publication can be a long-lived object • Other authors can join a published work over time if they make substantial contributions to it. • A publication is evaluated by its contribution and impact.

  32. Thanks!
