
MySQL 5.1 for Advanced Business Intelligence Applications

Learn how to exploit MySQL 5.1 for advanced business intelligence applications, including data integration, partitioning, clustering, and metadata.


Presentation Transcript


  1. Exploiting MySQL 5.1 For Advanced Business Intelligence Applications Matt Casters: Chief Architect, Data Integration and Kettle Project Founder MySQL User Conference, Tuesday April 24, 2007

  2. Agenda • Great news • Pentaho Data Integration introduction • Pentaho Data Integration MySQL support • Table partitioning • Database partitioning • Clustering • Metadata to the rescue • Pentaho company overview & platform demo • Questions & closing

  3. Great news • The San Francisco Giants beat the Arizona Diamondbacks last Saturday • Barry Zito pitched a great game • The hot-dogs were great

  4. Pentaho news • Mondrian 2.3 was released a few weeks ago • Detailed cache control (clearing) • Sun Microsystems & Pentaho announced that Pentaho Reporting will be used to drive Business Intelligence requirements in OpenOffice 2.3, to be released in the second half of this year • Tomorrow we’ll make another announcement

  5. Pentaho Data Integration introduction • From source systems … • to the data warehouse … • to reports … • to analyses … • to dashboard reports … • to better information

  6. Data Transformation and Integration Examples • Data filtering • Is not null, greater than, less than, includes • Field manipulation • Trimming, padding, upper- and lowercase conversion • Data calculations • + - × /, average, absolute value, arctangent, natural logarithm • Date manipulation • First day of month, last day of month, add months, week of year, day of year • Data type conversion • String to number, number to string, date to number • Merging fields & splitting fields • Looking up data • Look up in a database, in a text file, an Excel sheet, …

  7. Pentaho Data Integration (Kettle) Components • Spoon • Connect to data sources • Define transformation rules and design target schema(s) • Graphical job execution workflow engine for defining multi-stage and conditional transformation jobs • Pan • Command-line execution of single, pre-defined transformation jobs • Kitchen • Scheduler for multi-stage jobs • Carte • Remote execution “slave” server with embedded web server • Pentaho BI Platform • Integrated scheduling of transformations or jobs • Ability to call real-time transformations and use output in reports and dashboards

  8. Data Transformation and Integration Examples • Demo time: load 1M rows into a database table

  9. Case Study: Pentaho Data Integration • Organization: Flemish Government Traffic Centre • Use case: Monitoring the state of the road network • Application requirement: Integrate minute-by-minute data from 570 highway locations for analysis (1550 lanes) • Technical challenges: Large volume of data, more than 2.5 billion rows • Business Usage: Users can now compare traffic speeds based on weather conditions, time of day, date, season • Best practices: • Clearly understand business user requirements first • There are often multiple ways to solve data integration problems, so consider the long-term need when choosing the right way

  10. Summary and Resources • Pentaho and MySQL can help you manage your data infrastructure • Extraction, Transformation and Loading for Data Warehousing and Data Migration • kettle.pentaho.org • Kettle project homepage • kettle.javaforge.com • Kettle community website: forum, source, documentation, tech tips, samples, … • www.pentaho.org/download/ • All Pentaho modules, pre-configured with sample data • Developer forums, documentation • Ventana Research Open Source BI Survey • www.mysql.com • White paper - http://dev.mysql.com/tech-resources/articles/mysql_5.0_pentaho.html • Kettle Webinar - http://www.mysql.com/news-and-events/on-demand-webinars/pentaho-2006-09-19.php • Roland Bouman blog on Pentaho Data Integration and MySQL • http://rpbouman.blogspot.com/2006/06/pentaho-data-integration-kettle-turns.html

  11. Pentaho Data Integration MySQL support • Reading: MySQL Result Streaming (cursor emulation) support • Writing: MySQL dialects for data types • Job entry: Bulk Loader of text files for MySQL • Job entry: Bulk writer to a text file for MySQL
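The text-file bulk-loader job entry mentioned above leaves the heavy lifting to MySQL itself. As a minimal sketch of the kind of statement such a bulk load boils down to (the file path, delimiters and target table here are hypothetical):

LOAD DATA LOCAL INFILE '/tmp/sales.csv'
INTO TABLE sales
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;   -- skip a header line in the file

Pushing the parsing into the server this way is typically far faster than row-by-row INSERT statements over JDBC.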

  12. Table partitioning • Challenges • More data is being gathered all the time • Data is coming from more sources than ever • Faster access to stored information is becoming more important • More people require concurrent access to the data • The old solution • Split up the data into different tables • SALES_2003 • SALES_2004 • … • SALES_2006 • Hardcode reports to use one or the other table (see the sketch below) • Lots of database management issues • High report maintenance costs • Nothing is automatic, everything is manual
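To make the maintenance cost concrete: with per-year tables, every report has to stitch the years together by hand, and someone has to edit the query each January. A sketch (column names are hypothetical):

SELECT customernr, COUNT(*) AS orders
FROM (
  SELECT customernr FROM SALES_2003
  UNION ALL
  SELECT customernr FROM SALES_2004
  UNION ALL
  -- ... one SELECT per yearly table ...
  SELECT customernr FROM SALES_2006
) all_years
GROUP BY customernr;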

  13. Table partitioning • The new solution: • MySQL 5.1 partitioned tables • Automatically splits data into smaller chunks • The SQL engine automatically pulls results back together

  14. Table partitioning • (Diagram: a single Sales table split into Year 2003, Year 2004, Year 2005 and Year 2006 partitions)

  15. Table partitioning • Advantages • Reduces query time by reducing the amount of data to “plough” through • Increases performance by • Automatically “pruning” the list of partitions to search; the MySQL query optimizer does this based on the query being issued • Massive reduction in I/O • Smaller partitioned indexes, leading to faster index tree traversal • Allowing parallel access to the different partitions

  16. Table partitioning • How do I create a partitioned table?

CREATE TABLE sales (
  sales_date   TIMESTAMP,
  ordernr      INT,
  customernr   INT,
  productnr    INT,
  discount_pct TINYINT
)
PARTITION BY RANGE( YEAR(sales_date) ) (
  PARTITION P2000 VALUES LESS THAN (2001),
  PARTITION P2001 VALUES LESS THAN (2002)
);
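To check that pruning actually kicks in on the table above, MySQL 5.1’s EXPLAIN PARTITIONS shows which partitions a query will touch; with a filter on sales_date, only P2001 should appear in the partitions column:

EXPLAIN PARTITIONS
SELECT COUNT(*)
FROM sales
WHERE sales_date >= '2001-01-01'
  AND sales_date <  '2002-01-01';   -- prunes down to partition P2001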

  17. Table partitioning • How do I add another partition to the table?

ALTER TABLE sales ADD PARTITION (
  PARTITION P2002 VALUES LESS THAN (2003)
);

  18. Table partitioning • How do I drop a partition from the table?

ALTER TABLE sales DROP PARTITION P2002;
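Note that DROP PARTITION discards the rows stored in that partition along with it, which makes it handy for purging old data. A related variation, sketched here assuming the table was created with a hypothetical catch-all partition PMAX (VALUES LESS THAN MAXVALUE) so inserts for unforeseen years never fail, is to split the catch-all when a new year needs its own partition:

ALTER TABLE sales
REORGANIZE PARTITION PMAX INTO (
  PARTITION P2003 VALUES LESS THAN (2004),  -- new year gets its own home
  PARTITION PMAX  VALUES LESS THAN MAXVALUE -- catch-all stays in place
);

REORGANIZE PARTITION rewrites the affected rows, so it is cheapest when done before the catch-all grows large.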

  19. Table partitioning • Retrieving partition information

mysql> desc information_schema.partitions;
+-------------------------------+--------------+------+-----+---------+-------+
| Field                         | Type         | Null | Key | Default | Extra |
+-------------------------------+--------------+------+-----+---------+-------+
| TABLE_CATALOG                 | varchar(512) | YES  |     | NULL    |       |
| TABLE_SCHEMA                  | varchar(64)  | NO   |     |         |       |
| TABLE_NAME                    | varchar(64)  | NO   |     |         |       |
| PARTITION_NAME                | varchar(64)  | YES  |     | NULL    |       |
| SUBPARTITION_NAME             | varchar(64)  | YES  |     | NULL    |       |
| PARTITION_ORDINAL_POSITION    | bigint(21)   | YES  |     | NULL    |       |
| SUBPARTITION_ORDINAL_POSITION | bigint(21)   | YES  |     | NULL    |       |
| PARTITION_METHOD              | varchar(12)  | YES  |     | NULL    |       |
| SUBPARTITION_METHOD           | varchar(12)  | YES  |     | NULL    |       |
| PARTITION_EXPRESSION          | longtext     | YES  |     | NULL    |       |
| SUBPARTITION_EXPRESSION       | longtext     | YES  |     | NULL    |       |
| PARTITION_DESCRIPTION         | longtext     | YES  |     | NULL    |       |
| TABLE_ROWS                    | bigint(21)   | NO   |     | 0       |       |
| AVG_ROW_LENGTH                | bigint(21)   | NO   |     | 0       |       |
| DATA_LENGTH                   | bigint(21)   | NO   |     | 0       |       |
| MAX_DATA_LENGTH               | bigint(21)   | YES  |     | NULL    |       |
| INDEX_LENGTH                  | bigint(21)   | NO   |     | 0       |       |
| DATA_FREE                     | bigint(21)   | NO   |     | 0       |       |
| CREATE_TIME                   | datetime     | YES  |     | NULL    |       |
| UPDATE_TIME                   | datetime     | YES  |     | NULL    |       |
| CHECK_TIME                    | datetime     | YES  |     | NULL    |       |
| CHECKSUM                      | bigint(21)   | YES  |     | NULL    |       |
| PARTITION_COMMENT             | varchar(80)  | NO   |     |         |       |
| NODEGROUP                     | varchar(12)  | NO   |     |         |       |
| TABLESPACE_NAME               | varchar(64)  | YES  |     | NULL    |       |
+-------------------------------+--------------+------+-----+---------+-------+

  20. Table partitioning • Retrieving partition information

SELECT partition_name
FROM information_schema.partitions
WHERE table_name = 'sales';

+----------------+
| partition_name |
+----------------+
| P2000          |
| P2001          |
| P2002          |
| P2003          |
| P2004          |
| P2005          |
| P2006          |
| P2007          |
| P2008          |
| P2009          |
| P2010          |
| P2011          |
+----------------+
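The same table also answers “how big is each partition?”; keep in mind that TABLE_ROWS is only an estimate for some storage engines:

SELECT partition_name, table_rows, data_length
FROM information_schema.partitions
WHERE table_schema = DATABASE()     -- current schema only
  AND table_name   = 'sales'
ORDER BY partition_ordinal_position;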

  21. Table partitioning • Use-case: Flemish Traffic Centre • Medium-sized data set: 514M rows • Partitioned by week (see the DDL sketch below) • PARTITION BY RANGE ( YEAR(mtime)*100 + WEEK(mtime) ) • 80GB of data in hundreds of partitions • 1 row of data: speed, number of vehicles (5 classes), etc. • Per minute • Per lane of traffic • For 1550 detectors • → 815M rows of data per year • A test query aggregating speed & counts per road position per minute • Returned 642,319 rows • 9 seconds for MySQL to get the result • 29 seconds to pass the data to the client over JDBC
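A rough sketch of the weekly-partitioned table described above, reusing the PARTITION BY RANGE clause quoted on this slide; every column name except mtime is hypothetical:

CREATE TABLE traffic_minute (
  mtime      DATETIME,   -- the measurement minute
  detectornr INT,
  lanenr     TINYINT,
  speed      SMALLINT,
  vehicles   SMALLINT
)
PARTITION BY RANGE ( YEAR(mtime)*100 + WEEK(mtime) ) (
  PARTITION P200701 VALUES LESS THAN (200702),
  PARTITION P200702 VALUES LESS THAN (200703)
  -- ... one partition per week ...
);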

  22. Table partitioning • Should we do table partitioning? Yes! • However… • Even though you can do sub-partitioning, you need to partition the data in one certain way: by range, by list, … • The entry point for optimization is always going to be that partitioning method • Pruning is always going to be based on that partitioning method • Queries across the whole data set don’t always benefit (nor incur a penalty); see the example below • More detailed information: Partitioning in MySQL 5.1 and Onwards, Mikael Ronström, Senior Software Architect, MySQL AB, tomorrow 11:50am – 12:35pm, Ballroom A • How can we improve upon this?
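The pruning caveat is easy to demonstrate on the sales table from earlier: filter on a column that is not part of the partitioning expression and EXPLAIN PARTITIONS lists every partition, meaning the whole table gets scanned:

EXPLAIN PARTITIONS
SELECT COUNT(*)
FROM sales
WHERE customernr = 42;   -- not the partitioning column: no pruning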

  23. Database partitioning • Challenge: • Dealing with even greater data sets • The need to present results even quicker • The old solution • Large boxes • Huge amounts of RAM • Very fast disk systems (Fibre attached, SAN, …) • Multi-CPU monsters • → $$$ Very expensive and complicated systems $$$ • The new solution: • 2 or more MySQL servers • Basic, off-the-shelf hardware • Standard software • Simple configurations • Pentaho Data Integration & MySQL 5.1

  24. Database partitioning • (Diagram: the Sales table spread across four databases, DB1 to DB4, each database holding its own slice of the Year 2003, 2004, 2005 and 2006 partitions)

  25. Database partitioning • Split the data over several databases • Pick a splitting mechanism in advance • Divide the load over the databases • Make the databases smaller • Increase the query performance • Increase the load performance • (a conceptual sketch follows below)
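A conceptual sketch, not Kettle’s actual configuration: suppose a hypothetical modulo rule such as MOD(detectornr, 3) decides which of three databases a row lands in. Every detector’s rows then live on exactly one node, so an aggregate grouped on that key can run unchanged on each node and the partial results simply concatenated:

-- Runs identically on DB1, DB2 and DB3; PDI merges the streams.
-- Because MOD(detectornr, 3) routed each detector to one node,
-- no cross-node re-aggregation is needed for this GROUP BY.
SELECT detectornr, AVG(speed) AS avg_speed
FROM traffic_minute
GROUP BY detectornr;

Choosing the split key so that common GROUP BY keys stay local to one node is what keeps the merge step trivial.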

  26. Database partitioning • How do you set it up? • First create a partitioned / clustered database connection • Then create a Partitioning Schema

  27. Database partitioning • Demo time: • Creating partitioned tables • Loading data partitioned • Reading back data partitioned • Reading back data partitioned and ordered

  28. Database partitioning • Use-case: Flemish Traffic Centre • Medium-sized data set: 514M rows • Same partitioning as before • Same table layout, engine and configuration • Partitioned over 3 databases on detector (per lane) • A test query aggregating speed & counts per road position per minute • Returned 642,319 rows • 3 seconds for MySQL to get the result • 10 seconds to pass the data to the client over JDBC • → Demonstrating almost linear scalability!

  29. Database partitioning • Should we do database partitioning? Yes! • However… • The scalability is limited to the raw CPU power that is available on a single system • This poses a scalability problem • when dealing with larger numbers of partitions • when reading back large volumes of data • → Enter Pentaho Data Integration: Clustering

  30. Pentaho Data Integration : Clustering • (Diagram: a slave server, Server X, handling the 2003, 2004, 2005 and 2006 partitions of the Sales table in database DBx)

  31. Pentaho Data Integration : Clustering • The challenge is to eliminate CPU bottlenecks • Pentaho Data Integration “Carte” comes to the rescue • Can be started up on any system, turning it into a “Slave Server” • The combination of database partitioning and PDI Clustering: • Will split up the number of database partitions among the slave servers • Will split the total CPU load over the slave servers • Will split the total I/O load over the database partitions (MySQL instances)

  32. Pentaho Data Integration : Clustering • Demo time • Start up 2 slave servers • Run a step across the 2 servers • Monitor

  33. Pentaho Metadata to the rescue • The problem: • Reporting becomes harder on a database-partitioned system • Instead of 1 database, you read from a bunch of them • This is where our new project, Pentaho Metadata, comes in • Due in a few months • Auto-generates SQL now • Will auto-generate PDI transformations in the near future • Will auto-deliver the data to your report in the platform • Will make it very easy for your users to create drag-and-drop reports

  34. Pentaho Introduction • World’s most popular enterprise open source BI suite • 2 million lifetime downloads, averaging 100K / month • Founded in 2004: pioneer in professional open source BI • Key Projects • JFreeReport (Reporting) • Kettle (Data Integration) • Mondrian (OLAP) • Pentaho BI Platform • Weka (Data Mining) • Management and Board • Proven BI veterans from Business Objects, Cognos, Hyperion, SAS, Oracle • Open source leaders - Larry Augustin, New Enterprise Associates, Index Ventures • MySQL Gold Partner

  35. Pentaho Introduction • (Diagram: the BI stack from operational sources, Sales, Marketing, Inventory, Production, Financial, through departmental reports, aggregates and analysis, up to strategic scorecards)

  36. Pentaho Introduction • Demo and overview

  37. Questions and Closing?

  38. Birds of a Feather • MySQL Data Warehousing and BI • Tonight at 7:30 • Camino Real • With: • Chief Geek James Dixon: Senior architect & CTO, Pentaho • Julian Hyde, OLAP Architect and Mondrian Project Founder, Pentaho • Brian Miezejewski, Principal Consultant, MySQL • Roland Bouman, Certification Developer, MySQL • MySELF™ • A few bottles of beer
