Data Engineering Across Secure Analytics Platforms

Online event
Thursday, June 4  •  1 PM - 1:30 PM GMT+1
Overview

Explore the practical realities of data engineering across multiple secure data research platforms, focusing on health and population data.

A webinar from the Data Analytics Team at the Leeds Institute for Data Analytics.

We will discuss how robust data engineering underpins reproducible, scalable, and trustworthy health data science.

The talk will draw on real examples from working across environments such as LASER and the UK Biobank Research Analysis Platform (RAP), highlighting differences in data models, tooling, governance, and platform constraints. A key focus will be cohort creation based on researcher‑defined phenotypes, using a combination of clinical coding systems, including Read codes, ICD‑9/10, Med codes, BNF codes, and dm+d.
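
As a rough illustration of what code-list-based cohort creation can look like, here is a minimal Python sketch. It is not taken from the talk or from either platform: the `ClinicalEvent` record, the `build_cohort` function, the coding-system labels (`"icd10"`, `"read_v2"`), and the example codes are all assumptions made for illustration, and the code list is not a validated phenotype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClinicalEvent:
    """One coded record for one patient (hypothetical structure)."""
    patient_id: str
    coding_system: str  # assumed labels, e.g. "icd10" or "read_v2"
    code: str


def normalise(coding_system: str, code: str) -> str:
    """Put a code into a comparable form for its coding system."""
    code = code.strip()
    if coding_system == "icd10":
        # Assumption: ICD-10 codes may arrive with or without the dot
        # and in either case, so "e11.9" and "E119" should compare equal.
        return code.replace(".", "").upper()
    # Other systems are left untouched here; Read codes in particular
    # are treated as case-sensitive in this sketch.
    return code


def build_cohort(events, phenotype):
    """Return the set of patient IDs with at least one code in the phenotype.

    `phenotype` maps a coding-system label to a collection of codes.
    """
    lookup = {
        system: {normalise(system, c) for c in codes}
        for system, codes in phenotype.items()
    }
    cohort = set()
    for event in events:
        wanted = lookup.get(event.coding_system, set())
        if normalise(event.coding_system, event.code) in wanted:
            cohort.add(event.patient_id)
    return cohort


# Illustrative code list only, not a validated phenotype definition.
example_phenotype = {"icd10": ["E11.9"], "read_v2": ["C10F."]}

example_events = [
    ClinicalEvent("p1", "icd10", "e11.9"),
    ClinicalEvent("p2", "icd10", "I10"),
    ClinicalEvent("p3", "read_v2", "C10F."),
    ClinicalEvent("p4", "read_v2", "c10f."),
]

print(sorted(build_cohort(example_events, example_phenotype)))
```

Keeping the normalisation rules per coding system in one place is a design choice in this sketch: it lets the same phenotype definition be applied to extracts from different platforms even when they store codes in slightly different forms.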

We will also cover common challenges when working with messy real‑world data, translating research questions into computable definitions, and building reusable pipelines for extraction, transformation, and cleaning. The aim is to provide insight into the often unseen but critical role of data engineering in bridging raw data and downstream analysis across multiple secure platforms.

Ifeanyi Chukwu is a Research Software Engineer in the Data Analytics team at the Leeds Institute for Data Analytics (LIDA). Ifeanyi supports a wide range of data science projects that use sensitive data, primarily by providing data engineering expertise across secure research environments such as LASER and the UK Biobank Research Analysis Platform. This work focuses on cohort extraction, phenotype definition using clinical coding systems, data cleaning, and building robust, reusable pipelines that enable reproducible and scalable research across multiple platforms and datasets.
