Databases in Internet Applications: Case Studies
Anil Nori, CTO, AserA Inc., Palo Alto, USA
anori@asera.com
Acknowledgements
Sources for some of the material:
• Oracle Corporation
• CNN Custom News
• Excite
• Cisco
Database Technology Timeline
[Figure: timeline running from "Simple Data Management" to "Global Enterprise Management".]
• Periods: Early 80s; Late 80s; Early - Mid 90s; Late 90s - 21st C
• Generations: Pre-relational; Early Relational; Client-server Relational; Enterprise-capable Relational; Internet Computing
• Labels: Simple OLTP; Packaged & Vertical Applications; Data Warehouse & Hi-end OLTP; Active Database
• Capabilities listed:
  • Simple transactions, on-line backup & recovery
  • Stored procedures, triggers
  • Scalable OLTP, parallel query, partitioning, cluster support, row-level locking, high availability
  • Support for all types of data, extensibility, objects
  • Middleware (messaging, queues, events)
  • Java, CORBA, Web interfaces
Current State of DBMSs
• OLTP applications
  • Large amounts of data
  • Simple data, simple queries and updates
  • Update statement from debit/credit transaction: UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid;
  • Typically update intensive
  • Large number of concurrent users (transactions)
• Data warehousing applications
  • Large amounts of data
  • Simple data but complex querying
  • Typically read intensive
  • Large number of users
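The debit/credit update above can be tried end to end with a minimal sketch. It assumes SQLite (through Python's standard `sqlite3` module) as a stand-in for the DBMS; the table layout and the helper names `apply_delta` and `balance` are illustrative assumptions, not part of the original benchmark.

```python
import sqlite3

def apply_delta(conn, aid, delta):
    """Run the debit/credit UPDATE for one account inside a transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid",
            {"delta": delta, "aid": aid},
        )
        if cur.rowcount != 1:
            raise LookupError(f"no such account: {aid}")

def balance(conn, aid):
    """Return the current balance of one account."""
    row = conn.execute(
        "SELECT abalance FROM accounts WHERE aid = ?", (aid,)
    ).fetchone()
    return row[0]

# Hypothetical one-row schema, only for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (aid INTEGER PRIMARY KEY, abalance INTEGER NOT NULL)"
)
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

apply_delta(conn, 1, -30)  # debit 30 from account 1
```

The statement touches a single row by primary key, which is what makes this kind of workload "simple data, simple updates" while still being update intensive.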
Current State of DBMSs
• These applications require:
  • Support for large numbers of users/transactions
  • High performance
  • High availability (7x24 operations)
  • Scalability
  • High levels of security
  • Administrative support
  • Good utilities
Internet Applications: Challenges
[Figure: shifts in Transaction Processing and Data Warehousing.]
• Dimensions shown: Users, Network, Systems, Systems Management, Operations (Hours), Size, Usage, Importance
• Contrasting pairs shown: Trained / Self-Service (larger user populations); Local / Global; Independent / Integrated; Simple / Intelligent; Analysts / Every Employee; Gigabytes / Terabytes; Batch / Immediate; Useful / Business-Critical
Internet Applications: Challenges
[Figure: shifts in E-commerce/Apps, Information Management, and Site Operation.]
• Dimensions shown: APIs, Type, Applications, Delivery, Access, Content, Management, Availability
• Contrasting pairs shown: Proprietary / Open; Tabular / Heterogeneous; Standalone / Integrated; Generic / Personalized; Read/write / Lots of read-only; Direct / Search; Occasional / 24X7
• Management: low TCO, mission critical
Internet Challenges
• Availability
  • Need near 100% availability
  • Must be easy to manage
  • Replication, hot standby, foolproof system?
• Scalability
  • Number of users is orders of magnitude higher
• Security
  • Global users
  • Managing millions of users
  • Encryption
• Performance
  • Internet user expectations
  • Speed vs correctness (e.g. search engines vs blade/cartridge/extender)
  • Availability vs correctness
Internet Application Architecture: Today
[Figure: three-tier architecture diagram.]
• Client tier: browsers, authoring tools, etc., connected over HTTP
• Physical middle tier: Web/app server; applications exchanging messages and remote messages; gateways; data integration, storage, query, management
• Data sources: ORDBMS, OLE/DB, other data sources
Case Studies
• CNN Custom News
• Excite
• Cisco Internet Applications
CNN Custom News
• On-line news service
• Allows users to personalize the news they receive
• Offers a variety of news categories (e.g. national, international, business)
Custom News Application Architecture
[Figure: three-tier architecture diagram.]
• Client tier: browsers, connected over HTTP
• Hardware load balancing in front of the middle tier
• Physical middle tier: multiple Web servers and application servers
• Database tier: Oracle DBMS instances (OPS)
CNN Custom News
• Backend:
  • SUN Solaris enterprise servers
  • Oracle Parallel Server 7.3.4
• Middle tier (9 machines)
  • Web servers
  • Oracle Application Servers
  • PL/SQL cartridges
• Load balancing
  • Hardware-based
  • DNS router
  • Round-robin
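The round-robin policy named above is simple to state in code. The sketch below is a software illustration only: the slide describes hardware-based and DNS-level balancing, and the class name `RoundRobinBalancer` and the server names are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Hand out servers in a fixed rotating order."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("need at least one server")
        self._cycle = itertools.cycle(servers)

    def next_server(self):
        """Return the server that should receive the next request."""
        return next(self._cycle)

# Hypothetical middle-tier machines.
balancer = RoundRobinBalancer(["web1", "web2", "web3"])
picks = [balancer.next_server() for _ in range(5)]
```

Round-robin ignores how busy each machine is, which keeps the balancer stateless apart from its position in the rotation.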
[Figure: Oracle Application Server diagram. Labels: three cartridges, Adapter, CORBA, Backend.]
CNN Custom News
• Data feeds into the database
• Keeps text in the database
• Images in files
• Images accessed in the middle tier
• PL/SQL cartridge
PL/SQL Cartridge
[Figure: PL/SQL cartridge inside OAS, calling PL/SQL in the Oracle DBMS.]
• Functions shown: connection pooling, session caching, parameter marshalling, validation, result processing
PL/SQL
• Server-side
• Used to generate HTML
• Suited for database logic
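As a rough analogue of generating HTML on the server from database rows, here is a small sketch. It uses Python with SQLite rather than PL/SQL, and the `news` table, its columns, and the function `render_headlines` are assumptions made purely for illustration.

```python
import html
import sqlite3

def render_headlines(conn, category):
    """Build an HTML list of headlines for one news category."""
    rows = conn.execute(
        "SELECT title FROM news WHERE category = ? ORDER BY id", (category,)
    ).fetchall()
    # Escape each title so stored text cannot inject markup into the page.
    items = "".join(f"<li>{html.escape(title)}</li>" for (title,) in rows)
    return f"<ul>{items}</ul>"

# Hypothetical schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE news (id INTEGER PRIMARY KEY, category TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO news (category, title) VALUES (?, ?)",
    [("business", "Markets & rates"), ("national", "Election update")],
)
page = render_headlines(conn, "business")
```

Keeping the query and the page generation next to each other is the "database-centric" style the slides attribute to this application.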
Searching
• Uses Oracle ConText cartridge
• Content-based searching
• Uses bitmap indexes
CNN Custom News: Observations
• Database-centric
• Uses PL/SQL-based scripting
• Application server for scalability
Excite
• Personalized online service that gives Web users everything they want, all in one place
• Builds tools that manage the vast amounts of information available on the internet
• Provides a variety of user services (apps):
  • News
  • Money and Investing -- stock quotes
  • Message boards and Chat
  • Mail
  • Communities
  • Classifieds
  • Jobs
Excite
• Supports a suite of applications
• Each application uses a three-tier architecture
• Federated approach
  • Many databases
  • Databases specific to applications
• Application logic in the middle tier as multi-threaded embedded C programs (Pro*C programs)
Excite: An Application Architecture
[Figure: three-tier architecture diagram.]
• Client tier: browsers, connected over HTTP
• Physical middle tier: Web servers and middle-tier applications
• Database tier: Oracle DBMS
Excite - PFP Application
• Personalized front page application
• Application is deployed as 50 middle-tier daemon processes
• The middle-tier application daemons perform:
  • Application logic in C
  • Connection pooling
    • Each daemon keeps about 40 connections to the database (about 2000 total connections to the database)
  • Load balancing
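The connection arithmetic above (50 daemons times about 40 connections each) and the pooling idea can be sketched as follows. This is a minimal illustration under stated assumptions: SQLite stands in for the real database, and `ConnectionPool` is a hypothetical class, not Excite's Pro*C implementation.

```python
import queue
import sqlite3

class ConnectionPool:
    """Fixed-size pool: connections are opened once and then reused."""

    def __init__(self, size, connect):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self, timeout=None):
        """Take an idle connection, blocking until one is free."""
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        """Return a connection to the pool for reuse."""
        self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()

# Figures taken from the slide.
DAEMONS = 50
CONNECTIONS_PER_DAEMON = 40
total_connections = DAEMONS * CONNECTIONS_PER_DAEMON

# One daemon's pool; check_same_thread=False lets threads share connections.
pool = ConnectionPool(
    CONNECTIONS_PER_DAEMON,
    lambda: sqlite3.connect(":memory:", check_same_thread=False),
)
conn = pool.acquire()
value = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
```

Pooling caps the number of database sessions at a known figure regardless of how many Web requests arrive, which is why the total stays near 2000.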
Excite - PFP Database Configuration
• Oracle8 on SUN Solaris servers
• 2 SUN 6500s -- 28-way SMP
• PFP database is split into multiple databases for load balancing and scalability
• Scalar data stored in the database in relational tables
• About 20 tables for storing user profiles; 100 tables for content
Excite - PFP Database Configuration
• Multi-media content (e.g. stock quotes or news items) stored in memory-mapped files for fast access. File references stored in the database
• Much of the content is read-only; it need not be backed up and can be reconstructed from the original sources
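The split described above, content in memory-mapped files with only a file reference in the database, can be sketched like this. The table `content_refs`, the file naming, and both helper functions are hypothetical, and SQLite stands in for the real database.

```python
import mmap
import os
import sqlite3
import tempfile

def store_content(conn, content_key, payload, directory):
    """Write the payload to a file and record only its path in the database."""
    path = os.path.join(directory, f"{content_key}.bin")
    with open(path, "wb") as f:
        f.write(payload)
    with conn:
        conn.execute(
            "INSERT INTO content_refs (content_key, path) VALUES (?, ?)",
            (content_key, path),
        )

def read_content(conn, content_key):
    """Look up the file reference, then read the bytes through a memory map."""
    (path,) = conn.execute(
        "SELECT path FROM content_refs WHERE content_key = ?", (content_key,)
    ).fetchone()
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            return view[:]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content_refs (content_key TEXT PRIMARY KEY, path TEXT NOT NULL)"
)
with tempfile.TemporaryDirectory() as tmp:
    store_content(conn, "item_1", b"example content payload", tmp)
    fetched = read_content(conn, "item_1")
```

Because such content can be rebuilt from its original source, the files can sit outside the database backup regime while the small reference table stays transactional.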
Excite - Scalability
• By partitioning the application across multiple databases
• Each application partition supported by multiple middle-tier daemon processes
• Multiple web servers to reduce traffic congestion
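One common way to partition an application across databases is to route each user to a partition by a stable hash of the user ID. The slides do not say how Excite chose partitions, so the hashing scheme, the function `partition_for`, and the partition names below are all assumptions.

```python
import zlib

def partition_for(user_id, partitions):
    """Map a user to one database partition using a stable CRC32 hash."""
    index = zlib.crc32(user_id.encode("utf-8")) % len(partitions)
    return partitions[index]

# Hypothetical partition names.
PARTITIONS = ["pfp_db_0", "pfp_db_1", "pfp_db_2", "pfp_db_3"]
target = partition_for("alice", PARTITIONS)
```

A stable hash means any middle-tier daemon computes the same target for the same user without consulting a central directory; the trade-off is that changing the number of partitions remaps most users.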
Excite - Availability
• Using replication and hot standby
• Uses the Oracle8 hot standby feature
• Uses asynchronous replication. Data is replicated with a latency of about 10 seconds
• Almost every database is replicated for failover
• Replication is preferred over hot standby, because a hot standby database cannot serve normal usage
Excite - Other Applications
• Most of the Excite applications have a similar three-tier architecture
Excite - Observations
• Some content (especially for communities applications) could be stored in the database. The management benefits are attractive, but if content is stored in the database, access performance is critical
• Need fast replication
• Currently not using middle-tier caching. Caching could be quite useful, but coherency is an issue
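The coherency concern with middle-tier caching can be made concrete with a small time-to-live cache. This is a sketch under stated assumptions: `TTLCache` is a hypothetical class, a plain dictionary stands in for the database, and the clock is injected so that the example runs deterministically.

```python
import time

class TTLCache:
    """Cache loaded values for a fixed time; entries can go stale meanwhile."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, time loaded)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # served from cache, possibly out of date
        value = self._loader(key)
        self._entries[key] = (value, now)
        return value

    def invalidate(self, key):
        """Drop an entry so the next get() reloads it."""
        self._entries.pop(key, None)

database = {"headline": "v1"}  # stand-in for the real database
now = [0.0]                    # fake clock controlled by the example
cache = TTLCache(database.__getitem__, ttl_seconds=10, clock=lambda: now[0])

first = cache.get("headline")   # loads from the database
database["headline"] = "v2"     # the database changes underneath the cache
stale = cache.get("headline")   # still the old value: the cache is incoherent
now[0] = 11.0
fresh = cache.get("headline")   # entry expired, so the new value is reloaded
```

Between the update and the expiry, readers see the old value; shrinking the TTL narrows that window at the cost of more database reads, and explicit invalidation requires the writer to know about every cache.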
Cisco
• Successfully implemented applications for the internet
• Internet commerce
  • Order placement
  • Checking order status
  • On-line, guided product configuration
  • Price quotes
• Employee self-service
  • Provides all employee services electronically
  • Employee directories
  • Employee benefits
  • Expense reports
Cisco
• Supply chain management
  • Networked suppliers, resellers and customers
  • Enables business partners to manage and operate major portions of its supply chain
  • Entire supply chain works off one central demand forecast
• Customer care
  • Exchange of technical information
  • Software upgrades (90% of software upgrades via internet)
  • On-line support (70% of support on-line)
  • On-line, assisted trouble-shooting
Cisco
• Communications and collaboration
  • Sales and technical training
  • Virtual classrooms
  • Company-wide meetings and broadcasts
Cisco Commerce Server Architecture
[Figure: three-tier architecture diagram.]
• Client tier: browsers, connected over HTTP
• Physical middle tier: Web server, commerce server
• Database tier: Oracle DBMS (shown twice), Oracle Applications
Cisco Commerce Server
• Typical three-tier architecture
• Proprietary web server
  • Performs content aggregation
  • Encryption
  • Accesses Oracle DBMS
  • Runs on a dedicated SUN server
• Proprietary commerce server
  • Proprietary application server
  • Performs a variety of commerce functions
Cisco Commerce Server
• Scalability and availability
  • Big servers for scalability
  • Multiple commerce server processes for load balancing
  • Databases replicated
  • Hot standby for availability
Case Studies: Observations
• Database is being used mostly for storage
• Application logic runs in the middle tier
• Middle tier also provides:
  • Scalability
  • Load balancing
  • Support for a large number of users
Analyzing Internet Applications
• Web integration
• Web publishing
• Application integration
• E-commerce
Web Integration
• Heterogeneous data sources
• Heterogeneous data types
• 1000s of data sources
• Dynamic data
• Warehousing
Web Publishing
• Problem: the internet is placing new requirements on content management
  • Heterogeneity: access different types of content from browsers, e.g. email, data warehouses, reports, HTML files
  • Personalized: structured, dynamic, customized content
  • Transactive: content blending with application
  • Aggregation: portalization via major “gateways”
Application Integration
• Integrating multiple applications (e.g. ERP/Front Office)
• Application workflow specification
• Asynchronous communication
  • Queuing and propagation
  • Message tracking
  • Message warehouse (persistence)
  • Message broker/server
• Data transformation
  • Transforming messages to different application formats (e.g. SAP, CLARIFY, …)
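Queuing, message persistence, and format transformation can be combined in one small sketch. Everything named here is an assumption for illustration: the `MessageStore` class, the `messages` table, the target name `billing`, and the field names in `to_billing_format` do not correspond to any real SAP or CLARIFY format.

```python
import json
import sqlite3

class MessageStore:
    """Database-backed queue; delivered rows are kept as a message warehouse."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " target TEXT NOT NULL,"
            " body TEXT NOT NULL,"
            " delivered INTEGER NOT NULL DEFAULT 0)"
        )

    def enqueue(self, target, message):
        """Persist a message addressed to one target application."""
        with self.conn:
            self.conn.execute(
                "INSERT INTO messages (target, body) VALUES (?, ?)",
                (target, json.dumps(message)),
            )

    def dequeue(self, target):
        """Return the oldest undelivered message for a target, or None."""
        with self.conn:
            row = self.conn.execute(
                "SELECT id, body FROM messages"
                " WHERE target = ? AND delivered = 0 ORDER BY id LIMIT 1",
                (target,),
            ).fetchone()
            if row is None:
                return None
            # Mark as delivered instead of deleting, so it stays trackable.
            self.conn.execute(
                "UPDATE messages SET delivered = 1 WHERE id = ?", (row[0],)
            )
            return json.loads(row[1])

    def tracked_count(self):
        """Number of messages retained, delivered or not."""
        return self.conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]

def to_billing_format(order):
    """Transform an order message into a hypothetical target format."""
    return {"ORDER_NO": order["order_id"], "AMOUNT": order["total"]}

store = MessageStore()
store.enqueue("billing", to_billing_format({"order_id": 7, "total": 120}))
received = store.dequeue("billing")
nothing_left = store.dequeue("billing")
```

The sender and receiver never call each other directly, which is the asynchronous decoupling the slide lists; the retained rows give the tracking and persistence.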
Electronic Commerce
• Automating business-to-business and business-to-consumer interactions
• Selling and buying
• Order management
• Product catalogs
• Product configuration
• Sales and marketing
• Education and training
• Service
• Communities
Database Technology Uses
• Business/workflow transactions
  • Support across multiple database/ERP systems
  • Transactional
  • Tools to generate compensating actions
  • Transformations
• Queuing
  • Support for heterogeneous messages
  • Transactional
  • Querying, e.g. on attribute-value pairs
  • Indexing, e.g. on attribute-value pairs
  • Publish/subscribe
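Publish/subscribe with matching on attribute-value pairs can be sketched in a few lines. The `Broker` class and the message fields below are hypothetical, and the sketch is in-memory only; it is not transactional and uses no index, both of which the slide lists as separate requirements.

```python
class Broker:
    """Deliver each published message to subscribers whose criteria match."""

    def __init__(self):
        self._subscriptions = []

    def subscribe(self, criteria, callback):
        """Register a callback for messages containing all given pairs."""
        self._subscriptions.append((dict(criteria), callback))

    def publish(self, message):
        """Send a message to every matching subscriber; return the count."""
        delivered = 0
        for criteria, callback in self._subscriptions:
            if all(message.get(k) == v for k, v in criteria.items()):
                callback(message)
                delivered += 1
        return delivered

broker = Broker()
business_inbox = []
all_inbox = []
broker.subscribe({"category": "business"}, business_inbox.append)
broker.subscribe({}, all_inbox.append)  # empty criteria match everything

sent_business = broker.publish({"category": "business", "title": "Earnings"})
sent_sports = broker.publish({"category": "sports", "title": "Final score"})
```

The linear scan over subscriptions is what indexing on attribute-value pairs would replace once the number of subscribers grows large.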
Database Technology Uses
• Rule engines
  • Complex business processing rules
  • Customization/profiling rules
  • Business domain rules
  • Presentation rules
• Repositories for application development
  • Managing Java objects, interfaces, etc.
  • A must for application integration
  • Standardized object models and protocols
  • Directories vs repositories
Database Technology Uses
• XML support
  • XML schema/storage
  • XML caching
  • XML querying
  • Coexistence with SQL -- current efforts seem disjoint
• Multiple caches
  • Consistency of middle-tier and database caches
• Data mining
  • Algorithms need to become more pragmatic
Database Technology Uses
• Internet user expectations
  • Speed vs correctness (e.g. search engines vs blade/cartridge/extender)
  • Availability vs correctness
• Component architecture
  • Caching
  • XML support
  • Querying
  • Transactions
  • Rule engines
  • Metadata management
  • Queueing
Database Technology Uses
• Availability
  • Need near 100% availability
  • Must be easy to manage
  • Replication, hot standby, foolproof system?
• Scalability
  • Number of users is orders of magnitude higher
• Security
  • Global users
  • Managing millions of users
  • Encryption
• Performance
Internet Applications Architecture: Future
[Figure: XML-based three-tier architecture diagram.]
• Client tier: XML-enabled tools (browsers, authoring tools, etc.), exchanging XML
• Logical middle tier: Web/app server; XML-enabled applications exchanging messages; XML Integration & Query Server; XML Database Warehouse Server; XML Transformer & Gateway
• Data sources: ORDBMS, OLE/DB, other data sources (labelled XML enabled); XML documents on the Web; other documents on the Web, e.g. HTML, WORD
XML in the Database
• XML has the potential to impact four important markets
  • Web integration
  • Web publishing
  • Application integration
  • Electronic commerce
• XML-enable the DBMS
XML-enabled DBMS
• “XML-enable” the database system
  • Store XML data/documents in the database server
  • Querying and searching of structured and unstructured XML
  • Generate XML data from the database server
  • Add XML capabilities in supporting database facilities
[Figure: DBMS with three labels: Store XML, Generate XML, Integrate with other facilities.]
Store XML Data
• Enhance XML storage facilities in the database with support in utilities
  • Facilities to load XML data into the database
  • Provide more efficient database storage (componentized storage, compression, indexing, …)
  • XML export facilities from the server
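Loading XML into relational storage and exporting it again can be sketched with standard-library tools. This is only an illustration of "componentized" storage, where each element becomes a row: the `items` table, the document shape, and the functions `load_xml` and `export_xml` are assumptions, and SQLite stands in for the DBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET

def load_xml(conn, xml_text):
    """Shred <item> elements into rows; return how many were loaded."""
    root = ET.fromstring(xml_text)
    rows = [
        (int(item.get("id")), item.findtext("title"))
        for item in root.iter("item")
    ]
    with conn:
        conn.executemany("INSERT INTO items (id, title) VALUES (?, ?)", rows)
    return len(rows)

def export_xml(conn):
    """Rebuild an XML document from the stored rows."""
    root = ET.Element("items")
    for item_id, title in conn.execute(
        "SELECT id, title FROM items ORDER BY id"
    ):
        item = ET.SubElement(root, "item", id=str(item_id))
        ET.SubElement(item, "title").text = title
    return ET.tostring(root, encoding="unicode")

# Hypothetical input document.
source = (
    "<items>"
    '<item id="1"><title>Router</title></item>'
    '<item id="2"><title>Switch</title></item>'
    "</items>"
)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
loaded = load_xml(conn, source)
exported = export_xml(conn)
```

Once shredded, the data can be indexed and queried with ordinary SQL, while the export path regenerates XML for consumers that expect documents.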