09 June 2016

DNA instead of a hard drive

One day we will be able to encode all the information of the world in a few liters of DNA

Ilya Khel, Hi-News

Over the past few years, scientists have deciphered the genomes of mammoths and of a 700,000-year-old horse using DNA fragments extracted from fossils. DNA clearly outlasts the organisms whose genetic code it carries. Computer scientists and engineers have long dreamed of harnessing DNA's compactness and stability to store digital data. The idea is to encode zeros and ones as the molecules A, C, G and T, which make up the spiral staircase of the DNA polymer, and this decade's advances in DNA synthesis and sequencing have brought major progress. Recent experiments suggest that one day we will be able to encode all of the world's digital information in a few liters of DNA and read it back thousands of years later.

Interest from Microsoft and other technology companies is heating up the field. Last month, Microsoft Research announced that it would pay the synthetic biology startup Twist Bioscience an undisclosed sum to make 10 million DNA strands designed by Microsoft computer scientists for data storage. Leading memory manufacturer Micron Technology is also funding DNA data storage research to find out whether a nucleic acid-based system can push past the limits of electronic memory. This influx of money and interest could gradually bring down the exorbitant costs and make DNA data storage feasible within ten years, the researchers say.

People will generate more than 16 trillion gigabytes of digital data by 2017, and most of it will need to be archived: legal, financial and medical records and, of course, multimedia files. Today, data is stored on hard drives and optical disks in energy-hungry, warehouse-sized data centers. At best, data stored this way lasts thirty years; at worst, only a few. In addition, as Microsoft Research computer architect Karin Strauss says, "we produce much more data than the industry can produce devices to store it, and forecasts show that this gap will grow."

Now consider DNA. It lasts for centuries if kept in a dry, cold place. In theory, it can pack billions of gigabytes of data into the volume of a sugar cube. Magnetic tape, the densest of today's storage media, holds about 10 gigabytes in the same amount of space. "DNA is an incredibly dense, durable and non-volatile data carrier," says Olgica Milenkovic, professor of electrical and computer engineering at the University of Illinois at Urbana-Champaign.

This is because each of the four building blocks, adenine (A), cytosine (C), guanine (G) and thymine (T), occupies about a cubic nanometer. Using an encoding scheme in which, say, A represents the bits "00", C represents "01" and so on, scientists can take the strings of zeros and ones that make up a digital file and design a DNA strand that holds a photo or a video. Real encoding schemes are, of course, much more complicated than this. Synthesizing the designed DNA strand is the process of writing the data. Scientists can then read the data back by sequencing the strands.
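The toy mapping described above can be sketched in a few lines of Python. This is only an illustration, not the scheme any of the research groups actually use: the text gives A = "00" and C = "01", while G = "10" and T = "11" are assumed here to complete the table, and the function names are made up for this example.

```python
# Toy 2-bits-per-base mapping. A=00 and C=01 come from the text;
# G=10 and T=11 are assumptions made to complete the table.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a string of A/C/G/T, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Turn a string of A/C/G/T back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

In this sketch "writing" and "reading" are just string operations. In the laboratory they correspond to synthesizing and sequencing the strand, and practical schemes add addressing and error correction on top.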

Harvard University geneticist George Church pioneered this field in 2012 by encoding 70 billion copies of a book, one million gigabits in total, in a cubic millimeter of DNA. A year later, scientists from the European Bioinformatics Institute showed that they could read back 739 kilobytes of data stored in DNA without a single error.
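A back-of-the-envelope check puts these figures in perspective. The sketch below uses the article's figure of roughly one cubic nanometer per nucleotide. It assumes 2 bits per nucleotide with no overhead and a sugar cube of one cubic centimeter; both are assumptions made for this illustration, as are the names in the code.

```python
NM3_PER_MM3 = 10**18   # 1 mm = 10**6 nm, so 1 mm^3 = 10**18 nm^3
NM3_PER_NT = 1         # article's figure: about one cubic nanometer per nucleotide
BITS_PER_NT = 2        # assumption: 2 bits per nucleotide, no overhead


def max_bits(volume_mm3: int) -> int:
    """Theoretical upper bound on the bits that fit in a volume of DNA."""
    nucleotides = volume_mm3 * NM3_PER_MM3 // NM3_PER_NT
    return nucleotides * BITS_PER_NT


# Assumption: a sugar cube is about 1 cm^3, i.e. 1,000 mm^3.
sugar_cube_gigabytes = max_bits(1_000) // (8 * 10**9)
print(f"Sugar cube, upper bound: {sugar_cube_gigabytes:,} GB")

# Church's 2012 result: one million gigabits in one cubic millimeter.
church_bits = 10**6 * 10**9
fraction = church_bits / max_bits(1)
print(f"Church's result as a share of the bound: {fraction:.2%}")
```

Under these assumptions the bound for a sugar cube is 250 billion gigabytes, which is consistent with "billions of gigabytes", and the 2012 experiment used about 0.05 percent of the theoretical density.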

Last year, several teams of scientists demonstrated fully functioning systems. In August, scientists at ETH Zurich encapsulated synthetic DNA in glass, subjected it to conditions simulating the passage of 2,000 years, and recovered the encoded data intact. Around the same time, Milenkovic and her colleagues reported storing the Wikipedia pages of six American universities in DNA and, by giving the sequences special "addresses", selectively reading and editing parts of the stored text. Random access to data is very important to avoid having to "sequence an entire book to read just one paragraph," says Milenkovic.
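The idea of addressed strands can be illustrated with a small software analogy. This is not the researchers' laboratory method, which the article does not describe in detail. It is a hypothetical sketch in which each strand is a string that starts with a short address, so a reader can pick out or replace one block without touching the rest of the pool. The four-letter addresses and the function names are invented for this example.

```python
def store(pool: list[str], address: str, payload: str) -> None:
    """Add a strand to the pool: address first, then the data payload."""
    pool.append(address + payload)


def read(pool: list[str], address: str) -> list[str]:
    """Return only the payloads whose strand starts with the given address."""
    return [s[len(address):] for s in pool if s.startswith(address)]


def rewrite(pool: list[str], address: str, new_payload: str) -> list[str]:
    """Return a new pool in which strands at this address hold new data."""
    return [address + new_payload if s.startswith(address) else s for s in pool]


pool: list[str] = []
store(pool, "ACGT", "CAGACGGC")   # hypothetical 4-base addresses
store(pool, "TTAA", "GGCCAATT")

print(read(pool, "TTAA"))         # ['GGCCAATT']
pool = rewrite(pool, "TTAA", "AAAACCCC")
print(read(pool, "TTAA"))         # ['AAAACCCC']
print(read(pool, "ACGT"))         # ['CAGACGGC'] (untouched)
```

The point of the analogy is only that an address lets one "paragraph" be retrieved or edited without sequencing the whole "book".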

In April, Strauss and University of Washington scientists Georg Seelig and Luis Ceze reported that they had written three image files, each several tens of kilobytes in size, into 40,000 DNA strands using their own encoding scheme, and then read them back individually without errors. They presented the work at a conference of the Association for Computing Machinery. With the 10 million strands that Microsoft is buying from Twist Bioscience, the scientists plan to show that DNA data storage works on a much larger scale. "Our task is to demonstrate an end-to-end system in which we encode files in DNA, synthesize the molecules, store them for a long time, and then recover them by sequencing the DNA," says Strauss. "We start with the bits and go back to the bits."

Memory manufacturer Micron is studying DNA as a post-silicon technology. The company is funding work by Church and scientists at the University of Idaho to create an error-free DNA storage system. "The rising cost of data storage will stimulate alternative solutions, and DNA storage is one of the most promising solutions," says Gurtej Sandhu, Director of Advanced Technology Development at Micron.

Scientists are still looking for ways to reduce the number of errors in encoding and decoding data. But the main parts of the technology are already in place. So what's stopping us from moving from shoebox-sized data warehouses to glass DNA capsules? Cost. "The recording process is a million times more expensive," Seelig says.

Here's why: making DNA means stringing together nanometer-sized molecules one by one with high precision, which is no easy task. And although the cost of sequencing has fallen thanks to rapidly growing demand for the service, DNA synthesis has had no comparable market driver. Milenkovic paid about $150 to have a strand of 1,000 nucleotides synthesized. Sequencing a million nucleotides costs about a cent.
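The gap between those two prices is easier to see per megabyte. The sketch below takes the article's two price points at face value. It assumes 2 bits per nucleotide with no addressing or error-correction overhead, which is a simplification made for this illustration, so the results are rough orders of magnitude rather than real quotes.

```python
# Prices quoted in the article.
SYNTHESIS_USD_PER_NT = 150 / 1_000         # about $150 per 1,000 nucleotides
SEQUENCING_USD_PER_NT = 0.01 / 1_000_000   # about one cent per million nucleotides

# Assumption: 2 bits per nucleotide, no addressing or error-correction overhead.
BITS_PER_NT = 2


def nucleotides_needed(num_bytes: int) -> int:
    """Number of nucleotides needed to hold num_bytes of data."""
    total_bits = num_bytes * 8
    return -(-total_bits // BITS_PER_NT)  # ceiling division


def write_cost_usd(num_bytes: int) -> float:
    """Estimated synthesis (write) cost in US dollars."""
    return nucleotides_needed(num_bytes) * SYNTHESIS_USD_PER_NT


def read_cost_usd(num_bytes: int) -> float:
    """Estimated sequencing (read) cost in US dollars."""
    return nucleotides_needed(num_bytes) * SEQUENCING_USD_PER_NT


one_megabyte = 1_000_000
print(f"Write 1 MB: ${write_cost_usd(one_megabyte):,.0f}")
print(f"Read 1 MB:  ${read_cost_usd(one_megabyte):.2f}")
print(f"Write/read price ratio: {SYNTHESIS_USD_PER_NT / SEQUENCING_USD_PER_NT:,.0f}x")
```

Under these assumptions, writing one megabyte costs roughly $600,000 while reading it back costs about four cents, a ratio of about 15 million to one per nucleotide.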

The interest in data storage from Microsoft and Micron may be exactly the push needed to start bringing costs down, Seelig says. Smart engineering and new technologies such as microfluidics and nanopore DNA sequencing, which help to miniaturize and speed up the process, will also contribute. Today it takes several hours to sequence a few hundred base pairs, and days to synthesize them, using a large amount of equipment. Ideally, all of this would fit in a small box; otherwise the density advantage of DNA storage is lost.

If all goes well, Strauss envisions companies offering archival DNA storage services within the next decade. "You can open a browser and upload files to their website or take your bytes back, just like with the cloud," she says. Or you could buy a DNA drive instead of a hard drive.

Portal "Eternal youth" http://vechnayamolodost.ru  09.06.2016
