Walkthrough Data Manipulation using Sparklyr (TCH)

Event Information

Date and Time

Location

ICT Suite 1

Office for National Statistics

Titchfield

PO15 5RR

United Kingdom

Event description

This is an ONS-only event.

This course aims to introduce the CDSW environment and give an overview of working with Sparklyr within DAP.

During the session we will focus on using Spark's DataFrame API to perform a number of common data manipulation tasks as we walk through the analysis of an example dataset. The course will include several short exercises to build familiarity, and will also leave attendees with some more in-depth exercises to explore on their own after the course.

As Spark and distributed computing are themselves broad and complex topics, this course has been designed as a targeted introduction that gives participants an initial taste of working with CDSW and of what the sparklyr library can do. We aim to provide participants with enough understanding and experience to get going, along with pointers on how to find help and further their own understanding after the course.

Prerequisites

We will be focusing on R’s sparklyr library in this course, so to get the most out of it, participants should have at least a basic familiarity with R’s syntax.

In addition, we will touch briefly on the following areas. Prior experience with them is not essential but would be beneficial.

• SQL

• The dplyr R library

Learning Outcomes

* Familiarity with the CDSW environment.

* How to access data on HDFS with Spark.

* Basic data manipulations with sparklyr’s DataFrame API.


Examples include:

o Reading and writing data.

o Column creation / renaming / dropping.

o Selecting by column and filtering rows.

o Handling missing values.

o Group by operations and aggregations.

o Joining DataFrames.

o Using SQL with Spark.


IMPORTANT NOTE

If you have any accessibility requirements or queries regarding this event, please contact vicky.pickering@ons.gov.uk.

Registration for tickets will close four days in advance of the session, because we need to put technology requests in place ready for your attendance. This means that if you are unable to attend, you cannot give your ticket to someone else.

Please make sure that you bring your ONS laptop, BMS token and charger along for the session.

Please be aware that training events with fewer than six attendees will need to be reorganised. Anyone who has purchased a ticket for an affected event will be contacted directly.


