Data Inglorious • Atlas: “All this data sure is heavy.” • Data: “Indeed, may I suggest moving it to the cloud?”
database defined • A database is a collection of data, which is organized into files called tables. • These tables provide a systematic way of accessing, managing, and updating data. • A relational database is one that contains multiple tables of data that relate to each other through special key fields. • Relational databases are far more flexible (though harder to design and maintain) than what are known as flat file databases, which contain a single table of data.
overview, the payload • Oracle Internet Directory (OID) • Zynga Games/Farmville • Facebook • bioinformatics • Calmail
ex. oracle OID • Oracle Internet Directory: 400,000 operations per second on a 500 million user database
ex. zynga games • 65 million players a day, millions of web browsers open, millions of farms (Farmville game), millions of frontiers, millions of objects bought and sold…all recorded on a database • 500,000 operations-per-second database behind Farmville • http://www.readwriteweb.com/cloud/2010/08/membase-the-database-powering.php
ex. facebook • 60,000 servers • 1,800 MySQL servers • 400 million active users • 200 million a day • 50 million operations per second
ex. bioinformatics • DNA sequence data = prime candidate for study with database systems • Homologous strings • Nucleobases: Adenine, Guanine, Cytosine, Thymine • 3.4 billion base pairs in the human genome, expressed as a string of A, G, C, and T • Human Genome Project: 3.4 billion letters of the human genome • Sanger Institute: 1 billion on MySQL
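A minimal sketch of why sequence data fits a database: each sequence is just a string over A, G, C, and T, so it can be stored in a table and searched. The table name `sequences`, the sample sequences, and the helper names are illustrative assumptions; SQLite (Python's standard `sqlite3` module) stands in for MySQL, and the exact-substring search shown here is far simpler than a real homology search.

```python
import sqlite3

BASES = set("AGCT")  # the four letters used to spell a DNA sequence


def is_valid_sequence(seq):
    """Return True if seq is non-empty and uses only A, G, C, and T."""
    return bool(seq) and set(seq) <= BASES


def load_sequences(conn, named_sequences):
    """Store each named sequence as one row (hypothetical table layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequences ("
        "name TEXT PRIMARY KEY, bases TEXT NOT NULL)"
    )
    for name, seq in named_sequences.items():
        if not is_valid_sequence(seq):
            raise ValueError(f"{name}: not a string of A, G, C, T")
        conn.execute("INSERT INTO sequences VALUES (?, ?)", (name, seq))


def find_containing(conn, motif):
    """Return names of sequences that contain motif as an exact substring."""
    rows = conn.execute(
        "SELECT name FROM sequences WHERE instr(bases, ?) > 0 ORDER BY name",
        (motif,),
    )
    return [name for (name,) in rows]


# Made-up sample data, for illustration only.
conn = sqlite3.connect(":memory:")
load_sequences(
    conn,
    {"seq_a": "AGCTTAGC", "seq_b": "TTTTGGCC", "seq_c": "CCAGCTAA"},
)
matches = find_containing(conn, "AGCT")
print(matches)
```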
ex. calmail • Calmail: 4 million e-mails offered a day, 1 million served • MySQL backend, which just failed
flat file v. relational • Imagine the needs of two small companies that take customer orders for their products. Company A uses a flat file database with a single table named orders to record orders they receive, while Company B uses a relational database with two tables: orders and customers. • When a customer places an order with Company A, a new record (or row) in the table orders is created. Because Company A has only one table of data, all the information pertaining to that order must be put into a single record. This means that the customer's general information, such as name and address, is stored in the same record as the order information, such as product description, quantity, and price. If customers place more than one order, their general information will need to be re-entered and thus duplicated for each order they place. • Whenever there is duplicate data, as in the case above, many inconsistencies may arise when users try to query the database. Additionally, a customer's change of address would require the database manager to find all records in orders that the customer placed, and change the address data for each one. • Company B is much better off with its relational database. Each of its customers has one and only one record of general information stored in the table customers. Each customer's record is identified by a unique customer code which will serve as the relational key. When a customer orders from Company B, the record in orders need contain only a reference to the customer's code, because all of the customer's general information is already stored in customers. • This approach to entering data solves the problems of duplicate data and making changes to customer information. The database manager need change only one record in customers if someone changes addresses. • Source: Indiana University Knowledge Base, document ahrp, last modified April 24, 2006. http://kb.iu.edu/data/ahrp.html
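Company B's two-table design can be sketched as follows. This is an illustration under stated assumptions, not the layout from the Knowledge Base article: the column names, the sample customer, and the helper functions are made up, and SQLite (Python's standard `sqlite3` module) stands in for whatever database Company B would actually run.

```python
import sqlite3


def build_company_b(conn):
    """Create customers and orders, linked by customer_code (the relational key)."""
    conn.execute(
        "CREATE TABLE customers ("
        "customer_code TEXT PRIMARY KEY, "
        "name TEXT NOT NULL, "
        "address TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, "
        "customer_code TEXT NOT NULL REFERENCES customers(customer_code), "
        "product TEXT NOT NULL, "
        "quantity INTEGER NOT NULL, "
        "price REAL NOT NULL)"
    )
    # One record of general information per customer.
    conn.execute(
        "INSERT INTO customers VALUES ('C001', 'Ada Example', '1 Old Street')"
    )
    # Each order stores only the customer's code, not the name or address.
    conn.executemany(
        "INSERT INTO orders (customer_code, product, quantity, price) "
        "VALUES (?, ?, ?, ?)",
        [("C001", "widget", 2, 9.50), ("C001", "gadget", 1, 20.00)],
    )


def change_address(conn, customer_code, new_address):
    """Update the single customers record; return how many rows changed."""
    cur = conn.execute(
        "UPDATE customers SET address = ? WHERE customer_code = ?",
        (new_address, customer_code),
    )
    return cur.rowcount


def orders_with_address(conn):
    """Join orders to customers so every order shows the current address."""
    return conn.execute(
        "SELECT o.product, c.address "
        "FROM orders AS o "
        "JOIN customers AS c ON c.customer_code = o.customer_code "
        "ORDER BY o.order_id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_company_b(conn)
changed = change_address(conn, "C001", "2 New Avenue")
rows = orders_with_address(conn)
print(changed, rows)
```

Only one row is updated, yet both orders report the new address through the join. In Company A's flat file, the same change would have to be repeated in every order record.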
flat file v. relational • Single table (flat file) v. multiple tables (relational)
web Connection • Example: Plone Content Management System connection to a MySQL database
go graphic, phpMyAdmin • A graphic interface tool for working with MySQL
phpMyAdmin • GSPP and phpMyAdmin • localhost
other database systems • Hadoop: distributed processing of large data sets • http://code.zynga.com/2011/06/deciding-how-to-store-billions-of-rows-per-day/ • Membase: new for games and other apps • http://www.readwriteweb.com/cloud/2010/08/membase-the-database-powering.php • CouchDB: no schema • http://couchdb.apache.org/docs/intro.html
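To make "no schema" concrete, here is a rough sketch of schema-free documents. It does not use CouchDB or its API; plain Python dictionaries and the standard `json` module stand in for stored documents, and the field names and sample records are invented for illustration.

```python
import json

# Documents in the same collection need not share the same fields.
documents = [
    {"_id": "farm-1", "type": "farm", "owner": "alice", "crops": ["corn", "wheat"]},
    {"_id": "player-1", "type": "player", "name": "alice", "level": 12},
    {"_id": "farm-2", "type": "farm", "owner": "bob"},  # no "crops" field at all
]


def find_by_type(docs, doc_type):
    """Return the documents whose 'type' field equals doc_type."""
    return [doc for doc in docs if doc.get("type") == doc_type]


def to_json_lines(docs):
    """Serialize each document to a JSON string, one per document."""
    return [json.dumps(doc, sort_keys=True) for doc in docs]


farms = find_by_type(documents, "farm")
farm_ids = [doc["_id"] for doc in farms]
print(farm_ids)
```

A relational table would force every farm row to have a crops column (possibly empty). Here each document simply carries whatever fields it has.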