An introduction to Databricks: what is it, how does it work, and what can it do?
Databricks
• What is Databricks?
• Cloud services used
• Functionality
• Languages
• Spark Usage
• 3rd Party Apps
• Architecture
• Books
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
Databricks – What is it?
• A cloud-based Apache Spark cluster service
• Offers scalable Spark clusters running on AWS
• Developed by the same people who created Spark
• Supports management of multiple clusters
• Provides job scheduling and library import
• Offers access to all Spark modules
Databricks – Cloud Services
• Currently uses Amazon AWS
• Uses EC2 and has access to S3 buckets
• Uses a minimum of 2 EC2 instances
• Attempts to optimise EC2 usage
• Plans to extend to other cloud providers
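As a rough illustration of the S3 access mentioned above, the sketch below counts the lines of text files stored in a bucket. It assumes that the notebook supplies a SparkContext (passed in here as `sc`), that the cluster already holds credentials for the bucket, and that the `s3n` scheme is the right one for the Hadoop version in use. The bucket name, key, and both helper names are hypothetical.

```python
def s3_path(bucket, key, scheme="s3n"):
    """Build an S3 URI, e.g. s3n://example-bucket/logs/2015/06/*.log.

    The scheme is an assumption: s3n was common with Spark 1.x,
    while later Hadoop versions favour s3a.
    """
    return "{0}://{1}/{2}".format(scheme, bucket.strip("/"), key.lstrip("/"))


def count_lines(sc, bucket, key):
    """Count the lines under an S3 path using an existing SparkContext."""
    rdd = sc.textFile(s3_path(bucket, key))  # lazy: nothing is read yet
    return rdd.count()                       # action: triggers the read
```

In a notebook this might be called as `count_lines(sc, "example-bucket", "logs/2015/06/*.log")`, where the bucket and key are placeholders.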
Databricks – Functionality
• Architecture based on Notebooks and folders
• Has a cluster manager for
  • Defined (min 54 GB) clusters
  • Spot clusters
  • On-demand clusters
• Has a job manager and scheduler
• Has user management
• Has full Spark functionality
• Has strong data visualisation capability
• Can export reports and dashboards
Databricks – Languages
• Can have Notebooks in
  • Scala
  • Python
  • SQL
• SQL can be executed in non-SQL Notebooks
• Markdown comments can be placed in Notebooks
• Notebooks can be shared by multiple sessions
• Libraries can be imported and called in Notebooks
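To make the point about running SQL from a non-SQL Notebook concrete, here is a minimal Python sketch. It assumes the notebook provides a Spark 1.3-style SQLContext (passed in as `sql_context`). The table name, column names, and both helper functions are hypothetical. `registerTempTable` is the Spark 1.x call; later Spark versions replaced it with `createOrReplaceTempView`.

```python
def language_count_query(table_name):
    """Return SQL that counts notebooks per language in the given table."""
    return ("SELECT language, COUNT(*) AS notebooks "
            "FROM {0} GROUP BY language ORDER BY language").format(table_name)


def notebooks_per_language(sql_context, rows, table_name="notebook_langs"):
    """Register (name, language) rows as a temporary table and query it with SQL."""
    df = sql_context.createDataFrame(rows, ["name", "language"])
    df.registerTempTable(table_name)  # makes the DataFrame visible to SQL
    return sql_context.sql(language_count_query(table_name))
```

In a Python Notebook this might be used as `notebooks_per_language(sqlContext, [("etl", "Scala"), ("report", "SQL")]).show()`, with the sample rows standing in for real data.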
Databricks – Spark Usage
• Latest Spark version available
  • e.g. Databricks 1.3.4 uses Spark 1.3.1 as of June 2015
• All Spark modules available
  • SQL, GraphX, MLlib, Streaming
• Strong integration between modules and visualisation
• Extensive use of tables to import data
• Tables available via SQL
Databricks – 3rd Party Apps
• Currently available, with more to come
  • Pentaho
  • Qlik
  • Tableau
  • TIBCO Jaspersoft
  • PanTera
  • ZoomData
Databricks – Architecture
Available Books
• See our Hadoop book from Apress / Springer
  • “Big Data Made Easy”
• Look out for our Apache Spark-based book from Packt in 2015
Contact Us
• Feel free to contact us at
  • www.semtech-solutions.co.nz
  • info@semtech-solutions.co.nz
• We offer IT project consultancy
• We are happy to hear about your problems
• You pay only for the hours you need to solve your problems