Building an intelligent big data app in 30 minutes

A session from Strata Barcelona (Nov 2014) on designing and deploying intelligent big data applications. It covers architecture, unified security and configuration, REST APIs, Spark SQL, and analytics, on a stack that includes Mesos, ZooKeeper, Tachyon, Cassandra, Spark, and Hadoop. The session demos a data pipeline and model generation, and lists Atigeo's open-source contributions. Tech blog: xpatterns.com.

rodriguezf

Presentation Transcript


  1. Building an intelligent big data app in 30 minutes • Strata Barcelona, Nov 2014

  2. Agenda • Demo: FindFraud application • Demo: data pipeline, APIs • Architecture • BDAS++

  3. Infrastructure
     Design Principles • Build, run & monitor end-to-end workflows • Same code for batch and streaming modes • Abstract away direct access to infrastructure • Unified REST APIs for cross-cutting concerns
     Whole System • Unified Security • Unified Config. • Spark SQL REST API • Monitoring & Logging • Data Workflows • Ingestion
     Starting Point • Mesos • ZooKeeper • Tachyon • Cassandra • Spark • Hadoop
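
The "same code for batch and streaming modes" principle can be illustrated with a minimal plain-Python sketch. It is not taken from the talk: the function name `flag_large_transactions`, the record fields, and the threshold are all hypothetical. The idea is only that the transformation accepts any iterable, so the same logic serves a finite dataset and an open-ended stream.

```python
from typing import Iterable, Iterator


def flag_large_transactions(
    records: Iterable[dict], threshold: float = 1000.0
) -> Iterator[dict]:
    """Yield each record with an added 'flagged' field.

    Hypothetical transformation: it only iterates over its input, so it
    does not care whether the input is a finite batch or a live stream.
    """
    for record in records:
        yield {**record, "flagged": record["amount"] > threshold}


# Batch mode: the whole dataset is available up front.
batch = [{"id": 1, "amount": 50.0}, {"id": 2, "amount": 2500.0}]
batch_result = list(flag_large_transactions(batch))


# Streaming mode: records arrive one at a time (simulated with a generator).
def event_stream() -> Iterator[dict]:
    yield {"id": 3, "amount": 1200.0}
    yield {"id": 4, "amount": 10.0}


stream_result = [r["flagged"] for r in flag_large_transactions(event_stream())]
```

In a real deployment the two inputs would presumably come from a batch engine and a streaming engine rather than a list and a generator; the sketch only shows why keeping the transformation independent of the source makes that reuse possible.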

  4. Analytics
     Design Principles • Declarative feature & model generation • Same code for local, distributed & serving layers • Experimentation & model optimization is key • Provide a very broad algorithmic toolbox
     Whole System • Proprietary Modeling Algorithms • Publish Models as REST APIs • Distributed Featurization & Training • Online Learning REST APIs • Visualized Metrics • Feature Engineering Framework
     Starting Point • Scikit-learn, pybrain, nltk, … • Pandas • Hive, Shark, Spark SQL • IPython
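
One way to read "declarative feature & model generation" is that features are described as data and a generic function turns the description into values. The sketch below assumes that reading; `FEATURE_SPEC`, `build_features`, and the record fields are hypothetical names, not the framework presented in the talk.

```python
import math
from typing import Callable, Dict, List

# Hypothetical declarative spec: feature name -> function of one raw record.
FEATURE_SPEC: Dict[str, Callable[[dict], float]] = {
    "log_amount": lambda r: math.log1p(r["amount"]),
    "is_foreign": lambda r: 1.0 if r["country"] != r["home_country"] else 0.0,
}


def build_features(
    record: dict, spec: Dict[str, Callable[[dict], float]] = FEATURE_SPEC
) -> List[float]:
    """Evaluate every declared feature, in sorted name order, for one record."""
    return [spec[name](record) for name in sorted(spec)]


example = {"amount": 0.0, "country": "ES", "home_country": "US"}
vector = build_features(example)  # order: is_foreign, log_amount
```

Because `build_features` works on a single record and has no dependency on where the record comes from, the same function could in principle be called in a local loop, mapped over a distributed dataset, or invoked inside a serving endpoint, which is the spirit of "same code for local, distributed & serving layers".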

  5. Application
     Design Principles • Build app UI against REST APIs from Day One • Separate servers for analytics & app end users • Abstract away direct access to infrastructure
     Whole System • Distributed Feature Generation • Distributed Training • Visualized Metrics • Publish Models as REST APIs • Modeling Workflows • Feature Engineering Framework
     Starting Point • PostgreSQL • Lucene & SolrCloud • Tomcat, Spray.io, Node.js • Cassandra
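
As a rough illustration of "publish models as REST APIs" and of building the UI against an HTTP contract from the start, the sketch below exposes a placeholder scoring function over HTTP using only the Python standard library. Everything specific here is an assumption: the `score` rule is a stand-in rather than a trained model, and the names `handle_score_request`, `ScoreHandler`, and `serve`, the JSON shape, and the port are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


def score(features: dict) -> float:
    """Placeholder model (hypothetical): a fixed rule, not a trained model."""
    return min(1.0, features.get("amount", 0.0) / 10000.0)


def handle_score_request(body: bytes) -> Tuple[int, bytes]:
    """Map a JSON request body to an (HTTP status, JSON response body) pair."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("expected a JSON object")
        payload = {"score": score(features)}
        return 200, json.dumps(payload).encode("utf-8")
    except (ValueError, TypeError):
        return 400, json.dumps({"error": "invalid request"}).encode("utf-8")


class ScoreHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper; all logic lives in handle_score_request."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        status, response = handle_score_request(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port: int = 8080) -> None:
    """Start the server on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ScoreHandler).serve_forever()
```

Keeping the request handling in a pure function separate from the HTTP wrapper means the UI team can code against the JSON contract while the model behind `score` is still changing, and the contract can be exercised without starting a server.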

  6. Let’s Build Something

  7. Open Source Contribs
     • Jaws, an HTTP Spark SQL REST service: http://github.com/Atigeo/http-spark-sql-server (backward compatible with Shark and the Spark 0.x stack)
     • Spark Job Server: multiple Spark contexts in the same JVM, job submission in Java + Scala: https://github.com/Atigeo/spark-job-rest
     • Mesos framework starvation bug: submitted patch… detailed Tech Blog link soon at http://xpatterns.com
     • Tachyon patch (https://github.com/amplab/tachyon/pull/482)

  8. Thank you! @atigeo • linkedin.com/company/atigeo • Blog: xpatterns.com
