11 February 2021

Virtual genomes

Machine learning generates realistic genomes of non-existent people

XX2 century

Thanks to new algorithms and advances in computing hardware, computers can now train complex artificial intelligence models and generate high-quality synthetic data, such as photorealistic images or résumés of fictional people. A study recently published in the international journal PLOS Genetics (Yelmen et al., Creating artificial human genomes using generative neural networks) presents a machine learning algorithm, trained on data from existing biobanks, that generates fragments of human genomes that belong to no real person yet have the characteristics of real DNA.

[Figure: The generator turns random noise into synthetic data, while the discriminator checks the generated data against a database of real data. At the end of the process, the algorithm produces artificial data that looks like real data but is in fact entirely new.]

"Existing genome databases are an invaluable resource for biomedical research, but for understandable ethical reasons they are either closed to the community or protected by long, exhausting application procedures. This creates a serious barrier for researchers. Machine-generated genomes, or artificial genomes as we call them, can help us overcome this problem within a safe ethical framework," said Burak Yelmen, first author of the study and a specialist in modern population genetics at the University of Tartu (Tartu Ülikool).
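To make the generator/discriminator scheme described above concrete, here is a deliberately tiny sketch of a generative adversarial network (GAN) in plain Python. It is not the study's model: the "real data" are single numbers drawn from a normal distribution with mean 4 (an arbitrary choice), the generator can only shift its input noise by a learned offset `b`, and the discriminator is a one-feature logistic classifier. All names and parameters (`train_toy_gan`, `real_mean`, `lr`, `steps`) are hypothetical.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(real_mean: float = 4.0, steps: int = 20_000,
                  lr: float = 0.02, seed: int = 0) -> float:
    """Toy 1-D GAN (hypothetical example, not the study's architecture).

    Real data ~ N(real_mean, 1). The generator is G(z) = z + b with noise
    z ~ N(0, 1), so it can only learn the offset b. Returns the learned b.
    """
    rng = random.Random(seed)
    b = 0.0            # generator parameter, deliberately started far from real_mean
    u, c = 0.0, 0.0    # discriminator D(x) = sigmoid(u * x + c)
    for _ in range(steps):
        x_real = rng.gauss(real_mean, 1.0)
        x_fake = rng.gauss(0.0, 1.0) + b          # generator turns noise into a sample

        # Discriminator step: gradient ascent on log D(x_real) + log(1 - D(x_fake)).
        g_real = 1.0 - sigmoid(u * x_real + c)    # d log D / d logit
        g_fake = -sigmoid(u * x_fake + c)         # d log(1 - D) / d logit
        u += lr * (g_real * x_real + g_fake * x_fake)
        c += lr * (g_real + g_fake)

        # Generator step: gradient ascent on log D(G(z)), i.e. try to fool
        # the freshly updated discriminator: d/db log D(z + b) = (1 - D) * u.
        b += lr * (1.0 - sigmoid(u * x_fake + c)) * u
    return b
```

Calling `train_toy_gan()` returns the learned offset, which should drift toward `real_mean` as the two players compete. With such a weak discriminator only the mean can be matched; models for genomes rely on multilayer networks over many genetic variants.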

A multidisciplinary team of scientists conducted many analyses to assess the quality of the generated genomes compared to real ones.

"Surprisingly, these genomes, generated from random noise, mimic the complexity we observe in real human populations, and in most properties they are indistinguishable from the genomes in the database we used to train our algorithm, except for one detail: they do not belong to any of the donors," explains Dr. Luca Pagani, one of the senior authors of the study.

The researchers also measured how close the artificial genomes are to the real ones, to check whether the privacy of the original donors is preserved.

"Although searching for leaks among thousands of genomes may seem like looking for a needle in a haystack, combining many statistical measures allowed us to test all the models thoroughly. A detailed study of complex leakage patterns can lead to better evaluation and design of generative models and will also contribute to the development of machine learning," said Flora Jay, coordinator of the work and a researcher at the Interdisciplinary Computing Laboratory of the University of Paris-Saclay.
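One simple way to look for such leaks, sketched here as an assumption rather than as the authors' exact statistics, is to encode each genome fragment as a vector of 0/1 alleles at shared SNP positions and measure, for every artificial fragment, the Hamming distance to its closest real (training) fragment. A distance of zero means a donor's sequence was copied outright, and artificial fragments that are systematically closer to the training data than real fragments are to one another hint at memorisation. The encoding and all function names are hypothetical.

```python
from typing import List, Sequence

# Hypothetical encoding: one 0/1 allele per SNP position, same positions in every haplotype.
Haplotype = Sequence[int]


def hamming(a: Haplotype, b: Haplotype) -> int:
    """Number of SNP positions at which two haplotypes carry different alleles."""
    if len(a) != len(b):
        raise ValueError("haplotypes must cover the same SNP positions")
    return sum(x != y for x, y in zip(a, b))


def nearest_real_distances(artificial: List[Haplotype], real: List[Haplotype]) -> List[int]:
    """For each artificial haplotype, the distance to its closest real (training) haplotype."""
    return [min(hamming(a, r) for r in real) for a in artificial]


def real_baseline(real: List[Haplotype]) -> List[int]:
    """For each real haplotype, the distance to its closest *other* real haplotype."""
    return [min(hamming(r, s) for j, s in enumerate(real) if j != i)
            for i, r in enumerate(real)]


def flag_possible_leaks(artificial: List[Haplotype], real: List[Haplotype],
                        threshold: int = 0) -> List[int]:
    """Indices of artificial haplotypes within `threshold` mismatches of some donor."""
    return [i for i, d in enumerate(nearest_real_distances(artificial, real))
            if d <= threshold]
```

For example, with real haplotypes `[0,1,1,0,1]`, `[1,1,0,0,0]`, `[0,0,1,1,1]` and artificial ones `[0,1,1,0,1]`, `[1,0,1,0,0]`, the nearest-real distances are `[0, 2]`, so the first artificial haplotype is flagged as a copy, while the real-to-real baseline is `[2, 3, 2]`.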

Machine learning is already being used to generate faces and biographies of people who do not exist, and now their genomes as well. These imaginary people with realistic genomes could serve researchers as stand-ins for real genomes that are hard to access.

Portal "Eternal youth" http://vechnayamolodost.ru
