AlphaFold 3, a model for predicting protein structure, is presented
Google DeepMind and Isomorphic Labs, both owned by Alphabet, have unveiled AlphaFold 3, a new version of their machine learning model for predicting the precise structure of proteins and their interactions with each other and with other substances. According to the developers, it is the first such model to surpass the accuracy of physics-based methods. A paper on the model has been accepted for accelerated publication in the journal Nature. The development is also covered in the journal's editorial podcast and news article, as well as in a Google press release.
Knowledge of protein structure is essential in a wide variety of areas of biology, from understanding the fundamental mechanisms of how living organisms function to describing disease pathogenesis and rational drug design. Before the advent of machine learning, determining it was an extremely difficult, time-consuming and costly task. This began to change in 2018, when DeepMind researchers introduced the first high-performance model, AlphaFold 1, which immediately won the CASP competition. A second, more efficient version, AlphaFold 2, was released in 2020 and has since served as the standard in protein structure determination research. It has been used to develop malaria vaccines, various drugs, enzymes and more.
Project leader John Jumper of Google DeepMind and a team of co-authors built AlphaFold 3 on the previous version, but each of its components has undergone significant modifications. At its core is an improved version of the Evoformer deep learning module used in AlphaFold 2. After processing the input data, AlphaFold 3 predicts the structure using a diffusion model similar to those used for image generation. The process starts with a disordered cloud of atoms, which is transformed over many iterations into the most accurate structure possible.
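The iterative refinement described above can be illustrated with a toy sketch. This is not AlphaFold 3's code or algorithm: in a real diffusion model a trained neural network predicts the denoised positions, whereas here a hypothetical stand-in simply nudges each atom toward a known target. The function names, step count, step size and noise schedule are all assumptions chosen for illustration.

```python
import math
import random


def rmsd(a, b):
    """Root-mean-square deviation between two equally sized lists of 3D points."""
    total = sum((p - q) ** 2 for pa, pb in zip(a, b) for p, q in zip(pa, pb))
    return math.sqrt(total / len(a))


def toy_denoise(target, steps=50, seed=0):
    """Refine a random atom cloud toward `target` over many small steps.

    Returns (start, final) coordinates. The pull toward `target` is a
    stand-in (assumption) for a learned denoising network.
    """
    rng = random.Random(seed)
    # Start from a disordered cloud of atoms.
    coords = [[rng.gauss(0.0, 10.0) for _ in range(3)] for _ in target]
    start = [row[:] for row in coords]
    for step in range(steps):
        # Injected noise shrinks linearly and reaches zero at the last step.
        noise_scale = 1.0 - (step + 1) / steps
        for atom, goal in zip(coords, target):
            for k in range(3):
                atom[k] += 0.2 * (goal[k] - atom[k]) + rng.gauss(0.0, noise_scale)
    return start, coords


if __name__ == "__main__":
    # Hypothetical target: eight points along a short helix.
    target = [(math.cos(i), math.sin(i), 0.5 * i) for i in range(8)]
    start, final = toy_denoise(target)
    print("RMSD before:", round(rmsd(start, target), 2))
    print("RMSD after: ", round(rmsd(final, target), 2))
```

Running the script prints the deviation from the target before and after refinement. The only point of the sketch is that many small, progressively less noisy steps can turn a random cloud into an ordered arrangement.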
The new model can predict the structure of complexes containing proteins, nucleic acids, low molecular weight compounds, ions and modified residues. According to the developers, AlphaFold 3 is 50 percent more accurate at predicting protein-ligand interactions than the best physics-based methods on the PoseBusters benchmark, and in some practically important cases its accuracy can be twice that of existing methods. It also significantly outperformed RoseTTAFold All-Atom, another popular machine learning tool. Isomorphic Labs is already collaborating with various pharmaceutical companies to use the model to design new drugs.
Unlike with AlphaFold 2 and RoseTTAFold, scientists will not receive the AlphaFold 3 code and will not be able to run the model in-house. To provide access to the new model, Google DeepMind has launched AlphaFold Server, which is for non-commercial research only and does not allow users to obtain the structures of proteins bound to potential drug molecules. The server is much faster than the AlphaFold 2 app, and access is free but limited to a quota of ten predictions per day.
Pharmaceutical chemist Brian Shoichet of the University of California, San Francisco, noted that because of these limitations, AlphaFold 3 will not be able to have the same impact on science and practice as AlphaFold 2. At the same time, evolutionary biologist Sergey Ovchinnikov of the Massachusetts Institute of Technology (MIT) told Nature he hoped that the information in the paper would be enough to develop open-source versions of the model, which might appear before the end of the year.
Earlier in 2024, the American company Profluent presented OpenCRISPR, a neural network tool designed to generate fully artificial CRISPR-Cas9 genome editing systems. The most successful system created so far, OpenCRISPR-1, was successfully tested on human cells and made publicly available.