29 April 2024

A CRISPR-Cas9 system designed by a neural network has edited the genome of human cells

American researchers have presented the first neural-network tool for designing fully artificial CRISPR-Cas9 genome editing systems. One of the resulting editors was successfully tested in human cells and released openly, according to a Profluent press release. A preprint describing the tool, called OpenCRISPR, is available on bioRxiv.org, and the work is briefly described on the project's website.

Genome editing systems based on microbial CRISPR-associated enzymes such as SpCas9 have opened new prospects in molecular biology, medicine, agriculture and biotechnology. However, natural bacterial systems placed in the unfamiliar environment of eukaryotic cells have significant functional drawbacks: chiefly insufficient editing efficiency, off-target activity and low stability. Researchers therefore have to either search microorganisms for new CRISPR-Cas variants or optimize existing ones for practical use, and both approaches are slow and labor-intensive.

To speed this up, Ali Madani and his colleagues at Profluent turned to ProGen2, a large language model (LLM) for protein design that they had developed earlier. First, they systematically mined 26.2 trillion base pairs of assembled microbial genomes and metagenomes from diverse genera and biomes. This allowed them to identify nearly 1.25 million CRISPR-Cas operons of various types, the largest such dataset to date, which they called the CRISPR-Cas Atlas. It includes Cas endonucleases, CRISPR arrays, trans-activating CRISPR RNAs (tracrRNAs) and protospacer-adjacent motifs (PAMs).

The ProGen2-based language model was then fine-tuned on the CRISPR-Cas Atlas and used to generate four million sequences balanced across protein families and cluster sizes (this took three days on 16 GPUs). The generated sequences were assigned to CRISPR-Cas types, and clearly non-functional variants were filtered out with BLAST and HMM tools. A comparison with natural CRISPR-Cas proteins using MMseqs2 showed that the generated sequences expanded the known diversity 4.8-fold. Most of them shared only 40-60 percent sequence identity with their closest natural counterpart, yet their structures predicted by AlphaFold2 were close to natural ones, suggesting they could be functional. Further experiments with selected sequences and more targeted prompts to the model yielded a variety of full-length type II CRISPR-Cas effectors together with compatible guide RNAs (gRNAs).
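The 40-60 percent figure is a measure of sequence identity. As a rough illustration only, the sketch below computes percent identity for two protein sequences that have already been aligned (gaps written as "-"). The function name is hypothetical, and normalizing by alignment columns that are not gaps in both sequences is an assumption: identity can be defined in several ways, and tools such as MMseqs2 use their own conventions.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences (hypothetical helper).

    Identical residues are counted over alignment columns in which at least
    one sequence has a residue; columns that are gaps in both are skipped.
    A gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y == "-":
            continue  # column carries no information about either sequence
        columns += 1
        if x == y:
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


if __name__ == "__main__":
    # 6 of 8 aligned positions are identical (T/S and I/L differ)
    print(percent_identity("MKTAYIAK", "MKSAYLAK"))  # 75.0
```

To find identity with the "closest natural sequence", one would align a generated protein against each natural candidate separately and keep the maximum value; that search step, which the study performed with dedicated tools, is left out here.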

To obtain Cas9-like proteins for experimental characterization, the researchers used a constrained generation strategy, prompting the model with either the N-terminal or the C-terminal sequence of natural SpCas9 so that the new proteins would remain compatible with SpCas9's PAM and gRNA. A total of 209 generated Cas9-like proteins were selected for functional analysis. Plasmids encoding them were transfected into the immortalized human cell line HEK293T together with plasmids encoding SpCas9 gRNAs against three known genomic sites. Some of the Cas9-like proteins showed editing efficiency comparable to or higher than that of SpCas9. A similar experiment was then performed with 48 Cas9-like proteins generated entirely by the model, including their N- and C-terminal sequences, and many of them showed high efficiency and specificity.
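PAM compatibility matters because SpCas9 cuts DNA only where a roughly 20-nucleotide protospacer matching the gRNA spacer is immediately followed by an NGG PAM (N being any base). The minimal sketch below lists candidate SpCas9-style target sites on the forward strand of a DNA string. The 20-nt length and the NGG motif follow the standard SpCas9 convention; the function name is hypothetical, and the reverse strand, mismatches and off-target scoring are deliberately ignored.

```python
import re


def find_spcas9_sites(dna: str, spacer_len: int = 20) -> list[tuple[int, str, str]]:
    """Return (start, protospacer, pam) for each forward-strand candidate site.

    Hypothetical helper: a site is `spacer_len` nucleotides immediately
    followed by an NGG PAM. The zero-width lookahead lets overlapping
    sites be reported, since neighboring sites can share bases.
    """
    dna = dna.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    # a 20-nt protospacer followed by the PAM "AGG"
    print(find_spcas9_sites("ACGTACGTACGTACGTACGTAGGTTT"))
    # [(0, 'ACGTACGTACGTACGTACGT', 'AGG')]
```

Because the designed proteins were anchored to SpCas9 sequence, a scan like this for SpCas9 would, under that assumption, also describe where they can target; editors with different PAM preferences would need a different motif.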

The best of these, PF-CAS-182, matched SpCas9 in on-target efficiency while being considerably more specific (its off-target editing was 95 percent lower). Its sequence was 71.7 percent identical to that of SpCas9. After successful tests on a wide range of genomic targets, the protein was named OpenCRISPR-1 and its sequence was made freely available, with the developers encouraging anyone to use and test it and to share feedback. OpenCRISPR-1 was also combined with adenine deaminases generated by a language model to produce a functional base editor that efficiently converts adenine to guanine at specified DNA sites.

Profluent has said it is willing to collaborate with teams that need to optimize OpenCRISPR-1 for specific applications. The company also noted that, in line with ethical norms, the license for the technology includes certain restrictions, for example a ban on germline editing.

In November 2023, the UK became the first country in the world to approve a CRISPR-based therapy for clinical use. The drug, Casgevy, is applied ex vivo to treat hereditary anemias. Soon afterwards the US approved it as well, and about a month later it licensed the drug for a second indication.
