CASCADE Showcase event
Overview
Our Research Showcase on Tuesday, 10 March 2026, will feature a series of short talks by the CASCADE students. The event will take place in Lecture Theatre 2 at the Department of Computer Science and Technology.
Please join us for talks and light refreshments to catch up on some of the cutting-edge research in computer architecture being carried out by students in CASCADE and more widely in the Computer Architecture Group. We'll have five short talks from the first cohort of students in CASCADE and others researching computer architecture, presenting their work on topics from EDA verification and hardware for language runtimes to quantum computing and ML for systems.
Following this, you can meet the speakers and faculty in computer architecture at your leisure whilst enjoying an early evening drink and light snacks in the Department of Computer Science and Technology.
The confirmed speakers are:
Jiayi Nie - KernelCraft: Benchmarking for Agentic Close-to-Metal Kernel Generation on Emerging Hardware
Bio: Jiayi is a second-year PhD student in the Systems Architecture Group, working on co-designing foundation models and AI accelerators, supervised by Dr. Rika Antonova and Prof. Robert Mullins.
Abstract: New AI accelerators with novel instruction set architectures (ISAs) often require developers to manually craft low-level kernels, a time-consuming, laborious, and error-prone process that cannot scale across diverse hardware targets. This prevents emerging hardware platforms from reaching the market efficiently. While prior LLM-based code generation has shown promise in mature GPU ecosystems, it remains unclear whether agentic LLM systems can quickly produce valid and efficient kernels for emerging hardware with new ISAs. We present KernelCraft: the first benchmark to evaluate an LLM agent’s ability to generate and optimize low-level kernels for customized accelerators via a function-calling, feedback-driven workflow. Within KernelCraft, the agent refines kernels under ISA and hardware constraints using automated feedback derived from compilation checks, simulation, and correctness validation against ground truth. In our experiments, we assess agent performance across three emerging accelerator platforms on more than 20 ML tasks, each with 5 diverse task configurations, with special evaluation of task configuration complexity. Across four leading reasoning models, top agents produce functionally valid kernels for previously unseen ISAs within a few refinement steps, with optimized kernels that match or outperform template-based compiler baselines. With that, we demonstrate the potential for reducing the cost of kernel development for accelerator designers and kernel developers.
Luisa Cicolini - Bitblastable ISAs and where to find them
Bio: Luisa is a first-year PhD student researching how best to mechanize the semantics of hardware abstractions to improve the verification of compilers, particularly with interactive theorem provers.
Abstract: The verification of compilers with interactive theorem provers (ITPs) is hindered by scarce automation, requiring manual correctness proofs for every step of the compilation. In this work, we map part of the RISC-V Instruction Set Architecture (ISA) to the bitvector library of the Lean theorem prover, extending its verified bitblaster to automate reasoning about ISA primitives and automatically proving the correctness of the instruction selection pass in LLVM’s RISC-V backend.
Qianhui Wang - Enhancing temporal safety of CHERI-aware language runtimes with ARM MTE
Bio: Qianhui is a first-year PhD student working on hardware-assisted memory safety, in particular CHERI. She previously completed an MPhil in ACS at Cambridge, where she worked on securing the CPython runtime with CHERI capabilities, and a Bachelor in Advanced Computing at the ANU, where her focus was on cryptographic protocols and their applications to blockchains.
Abstract: Using capability instructions for memory access enables deterministic traps of out-of-bounds and use-after-reallocation errors in CHERI-aware languages. However, benchmarking the CHERI-CPython allocators reveals very prominent overheads associated with the current temporal safety mechanism, which discourages industrial adoption. While these overheads may stem from the currently less-than-optimal revoker design or from complex interactions between the quarantine and runtime allocator behaviours, we are motivated to explore adding ARM's Memory Tagging Extension (MTE) to recolour freed allocations for immediate reuse, reducing the amount of memory quarantined and the frequency of revocation sweeps, which currently incur the bulk of the memory and runtime overheads.
Sanaa Sharma - Space-time Optimisations for Early Fault-Tolerant Quantum Computation
Bio: Sanaa is a second-year PhD student in the CompSci department working in Prakash Murali's group. Sanaa works on resource estimation for fault-tolerant quantum computers.
Abstract: Fault tolerance is the future of quantum computing, ensuring error-corrected quantum computation that can be used for practical applications. Resource requirements for fault-tolerant quantum computing (FTQC) are daunting, and hence compilation techniques must be designed to ensure resource efficiency. There is a growing need for compilation strategies tailored to the early FTQC regime, which refers to the first generation of fault-tolerant machines operating under stringent resource constraints of fewer physical qubits and limited distillation capacity. Present-day compilation techniques are largely focused on overprovisioning of routing paths and make liberal assumptions regarding the availability of distillation factories. Our work develops compilation techniques that are tailored to the needs of early FTQC systems, including distillation-adaptive qubit layouts and routing techniques. In particular, we show that simple greedy heuristics are extremely effective for this problem, offering a significant reduction in the number of qubits compared to prior work. Our techniques offer results with an average overhead of 1.2X in execution time for a 53% reduction in qubits against the theoretical lower bounds. As the industry develops early FTQC systems with tens to hundreds of logical qubits over the coming years, our work has the potential to be widely useful for optimising program executions.
Good to know
Highlights
- 2 hours
- In person
Location
Computer Laboratory
15 JJ Thomson Avenue
Cambridge CB3 0FD United Kingdom
Organised by
Cambridge Computer Science Department