Spark Development Bootcamp

By Big Data Partnership

Date and time

Mon, 8 Feb 2016 09:00 - Wed, 10 Feb 2016 17:00 GMT

Location

Imparando

56 Commercial Road London E1 1LP United Kingdom

Refund Policy

Contact the organiser to request a refund.

Description




Overview

On this course you will learn how to build and manage Spark applications using Spark's core programming APIs and its standard libraries. Spark is a unified framework for big data analytics. It provides one integrated API that developers, data scientists, and analysts can use to perform diverse tasks, such as batch analytics, stream processing, and statistical modeling, that would previously have required separate processing engines. Spark supports a wide range of popular languages, including Python, R, Scala, SQL, and Java. It can read from diverse data sources and scale to thousands of nodes.


This training course is the first of its kind, developed by Databricks with close guidance from the original Spark team at UC Berkeley. You will receive a free Databricks account for the duration of the training.


Duration

3 days


Who is the course for

Engineers, Data Scientists, and Analysts


Prerequisites

Prerequisites for this course:

  • Basic understanding of software development

  • Some familiarity with coding in Python, Java, SQL or Scala

  • Modern operating system (Windows, OS X, Linux), browser (Internet Explorer not supported), and Internet access


What you will learn

  • Build a data pipeline using Spark DataFrames and Spark SQL

  • Understand Spark concepts, architecture, and applications

  • Execute SQL queries on large-scale data using Spark

  • Explore and visualize your data by entering and running code in Notebooks

  • Train and use an ML model on real data with MLlib, Spark's machine learning library

  • Tune Spark job performance and troubleshoot errors using logs and administration UIs

  • Find answers to common questions using Spark documentation and discussion forums

  • Write and monitor a Spark Streaming job to analyze data with sub-second latency

  • Understand common use-cases and business applications of Spark

  • Recognize all of the topics tested by the Spark Developer Certification and know what further preparation is required to take and pass the exam


Course Outline

Day 1

  • History of Big Data & Apache Spark

  • Introduction to the Spark Shell and the training environment

  • Just enough Scala for Spark

  • Intro to Spark DataFrames and Spark SQL

  • Introduction to RDDs

    • Lazy Evaluation

    • Transformations and Actions

    • Caching

    • Using the Spark UIs

Day 2

  • Data Sources: reading from Parquet, S3, Cassandra, HDFS, and your local file system

  • Spark's Architecture

  • Programming with Accumulators and Broadcast variables

  • Debugging and tuning Spark jobs using Spark's admin UIs

  • Memory & Persistence

Day 3

  • Advanced programming with RDDs (understanding the shuffle phase, partitioning, etc.)

  • Visualization: matplotlib, ggplot, dashboards, exploration and visualization in notebooks

  • Introduction to Spark Streaming

  • Introduction to MLlib and GraphX


Hands-on Labs

  • Introduction to the Databricks notebook and running Spark exercises in it

  • Manipulating RDDs

  • Transformations and actions

  • Using accumulators and broadcast variables

  • Using Spark SQL and DataFrames to query and transform data from multiple sources

  • Building predictive models with Spark

  • Writing a Spark Streaming job


Format

50% Lecture/Discussion

50% Hands-on Labs



Refreshments

Teas, coffees & water are provided. Lunch is not included; however, there is a wide range of cafes and amenities in the local area.

Cancellation & Reschedule Policy

If you cannot attend this class, you must provide written notice to Big Data Partnership at least 2 weeks prior to the start of the class. Big Data Partnership will transfer your registration to a future class of equal or lesser value.

Students who fail to cancel at least 2 weeks before the start of the class, or who do not attend the class, will not receive a refund and will be charged the full amount.

Big Data Partnership can cancel or reschedule at any time at its discretion. In the event that the class is cancelled or rescheduled, we will work with you to apply your registration to another date or refund your fee in full. Big Data Partnership is not responsible for non-refundable travel or other expenses incurred by the student.


Contact Information

If you have any questions concerning this class, please do not hesitate to contact training@bigdatapartnership.com.
