Running Serverless HPC Workloads on Top of Kubernetes and Jupyter Notebooks

Event Information

Location

COM-G12-Main Lewin

Computer Science Department

Regent Court, 211 Portobello St

Sheffield

S1 4DP

United Kingdom

Event description

As part of RSE@Sheffield, we are hosting a seminar series for sharing practical issues relating to research software. The next talk will take place on the 27th of March at 13:00 in COM-G12-Main Lewin, Computer Science Department (ground floor). The invited speaker is Dr Christopher Woods from the University of Bristol. Christopher's abstract and bio for the talk are below.

Talk Abstract:

The cloud holds the promise of a new way to perform digital science: interactive, elastically scaling, open data, open compute, and sharing reproducible workflows to collaboratively solve global grand challenge problems. The Research Software Engineering group at Bristol work with most of the major public cloud companies (Amazon, Google, IBM, Microsoft and Oracle) on a range of projects creating everything from elastically scaling Slurm clusters, through workflows for Cryo-EM image refinement, to national platforms for tracking UK greenhouse gas emissions. Through this, we’ve recognised that comparing the cloud to on-premises HPC is like comparing a fixed-line telephone to a smartphone. Just as an iPhone is more than just a mobile telephone, so the cloud is more than just an “on-demand cluster”.

To this end, via all of our projects, we have been gradually building Acquire. Acquire provides multi-cloud identity and access management for cloud-based storage and compute services. Acquire is designed to make it easy for HPC jobs to be run interactively within Jupyter notebooks. Jupyter notebooks, deployed on top of Kubernetes (k8s), are finding rapid adoption in universities and industry. While k8s can spawn new pods for each notebook session, launching high performance computing (HPC) jobs during dynamic workflows, and then managing access to the resulting output data, is complicated. Acquire builds on top of the Fn serverless framework (https://fnproject.io) to deploy individual simulations as Fn functions that are called dynamically from workflows run within Jupyter notebooks. A notebook running on a lightweight k8s cluster can burst HPC workloads via Fn serverless calls to a dynamically provisioned cluster running on a bare metal or VM-based HPC/GPU cloud. Using Fn, we are constructing a distributed identity, access, and accounting layer around dynamically scaling compute resources and globally distributed object stores. This adds security and accountability, thereby making it easy for end users to manage complex multi-cloud workflows. Researchers can control costs by translating billing into units of “simulation” rather than “core hours”, and will be able to publish and share the results via access-controlled DOIs. Altogether, Acquire will help us realise the potential of the cloud as a truly planetary supercomputer. Put more succinctly, Acquire is helping us build the Netflix of simulation.
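As a rough illustration of the pattern described in the abstract (a notebook cell handing one simulation to a serverless function over HTTP), the following is a minimal sketch and not Acquire's actual interface. The endpoint URL, the JSON field names, and the helper functions `build_simulation_request` and `run_simulation` are all hypothetical, and the remote function is replaced by an in-memory stand-in so that the example runs without a network connection.

```python
import io
import json
from urllib import request


def build_simulation_request(endpoint, simulation, parameters):
    """Build (but do not send) an HTTP POST carrying one simulation's inputs.

    The payload layout is an assumption made for this sketch.
    """
    body = json.dumps(
        {"simulation": simulation, "parameters": parameters}
    ).encode("utf-8")
    return request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_simulation(endpoint, simulation, parameters, send=request.urlopen):
    """Send the request and decode the JSON reply.

    `send` is injectable so the remote call can be replaced in tests.
    """
    req = build_simulation_request(endpoint, simulation, parameters)
    with send(req) as response:
        return json.loads(response.read().decode("utf-8"))


def fake_send(req):
    """Stand-in for the remote function: echoes the simulation name back."""
    payload = json.loads(req.data.decode("utf-8"))
    reply = {"status": "finished", "simulation": payload["simulation"]}
    return io.BytesIO(json.dumps(reply).encode("utf-8"))


# Hypothetical endpoint; a real deployment would supply its own URL.
ENDPOINT = "http://fn.example.org/hypothetical/run-simulation"

result = run_simulation(
    ENDPOINT, "protein_md", {"steps": 1000}, send=fake_send
)
print(result)
```

In a real notebook the `send` argument would be left at its default so that the request goes to the deployed function; identity, access control, and accounting, which the abstract describes as the core of Acquire, are deliberately left out of this sketch.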

About the speaker:

Christopher is an EPSRC Research Software Engineering (RSE) Fellow, managing the RSE Group in the Advanced Computing Research Centre at the University of Bristol. Christopher started his research career as a computational chemist, developing new methods and software for biomolecular simulation (https://protoms.org, https://siremol.org). This software is now sold and used in the pharmaceutical industry (https://www.cresset-group.com/flare/). Christopher’s aim is to improve the quality and sustainability of research software by raising awareness of the importance of software engineering skills, and by advocating the development of sustainable funding pathways and careers for people who develop research software. Christopher is joint-chair of the Research Software Engineering Association (https://rse.ac.uk). He provides software engineering training to researchers across the UK (https://chryswoods.com/main/courses), and regularly advises universities on how to set up and manage successful RSE teams.

From a start in computational chemistry, Christopher’s research now covers software engineering in everything from monitoring greenhouse gases, via manufacturing airplane components, to resolving Cryo-EM images and managing complex biomolecular simulation and data analysis workflows. Most of these projects now require developing and adapting software for the cloud. As such, the Bristol RSE group has a growing international reputation for being at the forefront of cloud software engineering research. Christopher contributes to long-term UK cloud strategy via close working relationships with engineers at many of the public cloud companies, and membership of the EPSRC eInfrastructure Strategic Advisory Team and UKRI eInfrastructure expert group.



FOR GENERAL SOFTWARE ENGINEERING TRAINING FOR RESEARCH PLEASE VISIT THE RSE SHEFFIELD WEBSITE.
