The development of ESM3 builds on earlier research conducted at Meta, the parent company of Facebook and Instagram, before EvolutionaryScale was founded in 2024. ESM3 is a generative AI model similar to OpenAI’s GPT-4, but it is focused on biology rather than text.
AI Develops Genetic Code
Proteins are made of chains of amino acids, and their sequences are encoded by genes.
Each protein folds into a unique shape that determines its function. To train ESM3, researchers fed it data from 2.78 billion naturally occurring proteins. The AI was tasked with filling in missing parts of protein sequences, much like completing a sentence with missing words.
“The same way a person can fill in the blanks in the soliloquy ‘to _ or not to _, that is the _,’ we can train a language model to fill in the blanks in proteins,” Rives explained.
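To make the fill-in-the-blanks idea concrete, here is a minimal Python sketch of how a masked training example might be built from a protein sequence. It is an illustration only, not EvolutionaryScale's code: the function names, the 15 percent masking rate, and the toy sequence are all assumptions made for this example, and the real model learns to predict the hidden residues rather than looking them up.

```python
import random

MASK = "_"  # placeholder for a hidden amino acid (assumed symbol)


def mask_sequence(sequence, mask_fraction=0.15, seed=0):
    """Hide a random subset of residues.

    Returns the masked string and a dict mapping each hidden
    position to the residue that was removed (the "answer key").
    The 15% default is an illustrative assumption.
    """
    rng = random.Random(seed)
    n_masked = min(len(sequence), max(1, round(len(sequence) * mask_fraction)))
    positions = sorted(rng.sample(range(len(sequence)), n_masked))
    masked = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = sequence[pos]
        masked[pos] = MASK
    return "".join(masked), targets


def fill_in(masked, targets):
    """Restore hidden residues from the answer key.

    A trained model would have to predict these; here we only
    check that the blanks and answers line up.
    """
    chars = list(masked)
    for pos, residue in targets.items():
        chars[pos] = residue
    return "".join(chars)


# Toy string of one-letter amino-acid codes, used purely as a placeholder.
toy = "MKTAYIAKQRQISFVKSHFS"
masked, targets = mask_sequence(toy)
print(masked)
print(fill_in(masked, targets) == toy)
```

During training, a model would see only the masked string and be scored on how well it guesses the hidden residues, much as a reader is scored on completing the Hamlet line.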
Green fluorescent proteins like esmGFP are widely used in research labs, where their genetic code is often fused to other DNA sequences to track proteins and cellular processes. According to Rives, ESM3 could accelerate protein engineering for applications like drug development.