Data Engineering Across Secure Analytics Platforms
Explore the practical realities of data engineering across multiple secure data research platforms, focusing on health and population data.
A webinar from the Data Analytics team at the Leeds Institute for Data Analytics.
We will discuss how robust data engineering underpins reproducible, scalable, and trustworthy health data science.
The talk will draw on real examples from working across environments such as LASER and the UK Biobank Research Analysis Platform (RAP), highlighting differences in data models, tooling, governance, and platform constraints. A key focus will be cohort creation based on researcher‑defined phenotypes, using a combination of clinical coding systems, including Read codes, ICD‑9/10, Med codes, BNF codes, and dm+d.
We will also cover common challenges when working with messy real‑world data, translating research questions into computable definitions, and building reusable pipelines for extraction, transformation, and cleaning. The aim is to provide insight into the often unseen but critical role of data engineering in bridging raw data and downstream analysis across multiple secure platforms.
Ifeanyi Chukwu is a Research Software Engineer in the Data Analytics team at the Leeds Institute for Data Analytics (LIDA). Ifeanyi supports a wide range of data science projects that use many kinds of sensitive data, primarily by providing data engineering expertise across secure research environments such as LASER and the UK Biobank Research Analysis Platform. Ifeanyi's work focuses on cohort extraction, phenotype definition using clinical coding systems, data cleaning, and building robust, reusable pipelines that enable reproducible and scalable research across multiple platforms and datasets.
Good to know
Highlights
- 30 minutes
- Online