ChatGPT for proteins: Designing molecular machines using text prompts
A talk by Andrew Ferguson
Date and time
Location
International Centre for Mathematical Sciences (ICMS) G.03
Bayes Centre, 47 Potterrow, Edinburgh EH8 9BT, United Kingdom
About this event
- Event lasts 1 hour
About:
Proteins are the molecules of life. They are the essential building blocks of muscles in our bodies, but they also perform many other critical functions such as digesting food, fighting disease, and cellular signaling. Biological proteins have evolved to do these functions under natural selection, Darwin’s “survival of the fittest,” but scientists and engineers have developed synthetic proteins to perform other tasks – everything from acting as better stain fighters in laundry detergents to serving as new therapeutic drugs.
Designing proteins to perform these tasks is challenging since there is an astronomically large number of possible protein sequences and the rules linking the sequence to the function are not well understood. Advances in machine learning and artificial intelligence now touch nearly all corners of modern life, and these powerful tools are also opening new opportunities in the understanding and design of proteins. Many of us are now familiar with so-called “large language models” such as ChatGPT based on artificial neural networks – computational models loosely inspired by the human brain – capable of performing language processing tasks.
These networks, and others like them, are becoming proficient in interacting with humans through natural language to produce not only novel text responses, but also images (e.g., DALL-E), computer code (e.g., Copilot), and videos (e.g., Pika). Broadly speaking, these models learn the simplified rules and patterns underlying large training sets of text, images, computer code, or videos, and then use this learning to produce new examples with desired characteristics.
Protein language models have recently emerged that use similar mathematical and computational ideas to learn the rules and patterns in protein sequences that govern protein structure (e.g., AlphaFold2) and, more recently, protein function. This public lecture will describe recent advances in using protein language models to design synthetic proteins with desired functions, their mathematical and algorithmic foundations, and how they are being used today to help design new proteins and democratize protein design by serving as a user-friendly “ChatGPT for proteins.”
The Speaker:
Andrew Ferguson
Pritzker School of Molecular Engineering and Department of Chemistry, University of Chicago.
https://pme.uchicago.edu/faculty/andrew-ferguson | https://www.ferglab.com/