02 March 2018

DNA is like a flash drive

Why write digital data into the genome

RIA Novosti

The growing volume of digital information encourages scientists to look for more compact ways to record and store it. And what could be more compact than DNA? RIA Novosti, together with an expert, found out how to encode words with nucleotides and how much data one molecule holds.

Base codes

DNA is a sequence of nucleotides. There are only four of them: adenine, guanine, thymine and cytosine. To encode information, each is assigned a numeric code, for example thymine = 0, guanine = 1, adenine = 2, cytosine = 3. Encoding starts by translating all letters, numbers and images into binary code, a sequence of zeros and ones, which is then converted into a sequence of nucleotides, that is, a quaternary code.
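The scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the digit assignment (T = 0, G = 1, A = 2, C = 3) comes from the example in the text, while the function names, the use of UTF-8 and the bit order inside each byte are choices made here for the demonstration.

```python
# Digit-to-base assignment from the article's example: T=0, G=1, A=2, C=3.
BASE_FOR_DIGIT = "TGAC"
DIGIT_FOR_BASE = {base: digit for digit, base in enumerate(BASE_FOR_DIGIT)}


def encode(text: str) -> str:
    """Turn text into bytes (binary), then every pair of bits into one base."""
    bases = []
    for byte in text.encode("utf-8"):  # UTF-8 is an assumption of this sketch
        # One byte holds four 2-bit digits; read them from the high end down.
        for shift in (6, 4, 2, 0):
            bases.append(BASE_FOR_DIGIT[(byte >> shift) & 0b11])
    return "".join(bases)


def decode(dna: str) -> str:
    """Reverse of encode: every four bases rebuild one byte."""
    data = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | DIGIT_FOR_BASE[base]
        data.append(byte)
    return data.decode("utf-8")


print(encode("DNA"))          # GTGTGTCAGTTG
print(decode(encode("DNA")))  # DNA
```

Each character of plain English text takes one byte, so it becomes four bases in this sketch.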

You can also build the code from only three nucleotides (a ternary code) and use the fourth to split sequences into parts. Another option is a binary code in which two of the bases stand for zero and the other two stand for one.
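The binary variant, where two bases share each bit value, might look like the sketch below. The particular assignment (A or C for 0, G or T for 1) and the function names are assumptions made for illustration; the text does not say which bases are paired. One practical use of the spare base, shown here, is to avoid writing the same base twice in a row.

```python
# Assumed assignment (not specified in the article): A or C = 0, G or T = 1.
BASES_FOR_BIT = {"0": "AC", "1": "GT"}
BIT_FOR_BASE = {"A": "0", "C": "0", "G": "1", "T": "1"}


def bits_to_dna(bits: str) -> str:
    """Encode a bit string, never emitting the same base twice in a row."""
    dna = []
    for bit in bits:
        first, second = BASES_FOR_BIT[bit]
        # Fall back to the alternative base if the first would repeat the previous one.
        dna.append(second if dna and dna[-1] == first else first)
    return "".join(dna)


def dna_to_bits(dna: str) -> str:
    """Decode by looking up which bit each base stands for."""
    return "".join(BIT_FOR_BASE[base] for base in dna)


print(bits_to_dna("0001110"))  # ACAGTGA
print(dna_to_bits("ACAGTGA"))  # 0001110
```

The price of this flexibility is capacity: each base carries one bit instead of two.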

Several techniques are used for reading. In one of the most common, the DNA strand is copied using bases that each carry a color label. A very sensitive detector then reads the labels, and a computer reconstructs the nucleotide sequence from the colors.

"The DNA molecule has a very large capacity. Even in bacteria it usually contains about a million bases, and in humans as many as three billion. That is, each human cell carries a volume of information comparable to the capacity of a flash drive. And we have trillions of such cells. A huge amount of data can be recorded in DNA, but writing to and reading from such a medium is still too slow and expensive," says Alexander Panchin, Candidate of Biological Sciences, senior researcher at the A. A. Kharkevich Institute for Information Transmission Problems of the Russian Academy of Sciences.
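The comparison with a flash drive can be checked with rough arithmetic. The sketch below assumes that every base carries two bits (one of four possible values) and ignores any overhead; the function name is made up for the example.

```python
BITS_PER_BASE = 2  # four possible bases, so two bits each (assumption of this sketch)


def capacity_bytes(n_bases: int) -> int:
    """Raw capacity of a sequence of n_bases, in bytes."""
    return n_bases * BITS_PER_BASE // 8


print(capacity_bytes(1_000_000))      # bacterium: 250000 bytes, about 250 kilobytes
print(capacity_bytes(3_000_000_000))  # human: 750000000 bytes, about 750 megabytes
```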

The recording density is growing

In June 1999, the journal Nature published an article by American scientists who had developed a technique for sending secret messages using DNA. They synthesized a molecule containing a nucleotide sequence built with a quaternary code. The secret DNA, hidden in a mixture, was sent to another laboratory, whose staff used special chemical keys to find the right molecule and extract the information from it.

"In general, there are two approaches to recording data in DNA. The first is to synthesize completely new DNA using a chemical synthesizer. On the computer's command, nucleotides are added to the solution in a certain order, and the required chain of bases gradually 'grows'. In the second case, data is encoded in the already existing DNA of an organism," explains Panchin.

In May 2010, the group of Craig Venter, one of the first to sequence the human genome, published a paper on the creation of an artificial bacterium. They took a bacterial cell stripped of its own genome and inserted the synthesized sequence of bases into it. The result was a new bacterium, fully alive and active, differing from an ordinary one only in that its DNA had been made artificially. The team also showed a sense of aesthetics by writing their names and quotations from classic works into the bacterium's DNA using a quaternary code.

In 2012, a group led by molecular biologist George Church took a more thorough approach and encoded in DNA the book "Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves", 52 thousand words long, together with several pictures and one program written in JavaScript. They used a binary code. The total amount of data was 658 kilobytes. The information density came to almost 10^18 bytes per gram of molecules. For comparison, a 10^12-byte hard drive weighs about a hundred grams. The main disadvantage of the method is the instability of the recorded information.
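The comparison with a hard drive is easier to appreciate as a ratio. The sketch below reads the figures as powers of ten (10^18 bytes per gram for DNA, 10^12 bytes in a drive weighing 100 grams) and treats them as round numbers, so the result is an order-of-magnitude estimate only; the variable names are illustrative.

```python
dna_bytes_per_gram = 10**18          # "almost 10^18", rounded for the estimate
hdd_bytes_per_gram = 10**12 // 100   # 10^12 bytes spread over about 100 grams

# How many times denser DNA is than the hard drive, per gram.
density_ratio = dna_bytes_per_gram // hdd_bytes_per_gram
print(density_ratio)                 # 100000000, that is, 10^8

# Mass of DNA needed for the 658-kilobyte data set at that density.
data_bytes = 658 * 1000
print(data_bytes / dna_bytes_per_gram)  # about 6.58e-13 grams
```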

"The DNA molecule tends to mutate, which reduces the reliability of data storage. This is especially true if the DNA carrier is a living cell capable of division: errors creep in particularly often when DNA is replicated. Storage becomes more reliable if you have thousands of copies of the same message. Or you can simply keep the DNA in a freezer, say. At low temperatures the molecule is much less prone to mutation," the expert explains.
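Why many copies help can be shown with a simple majority vote: if mutations hit different positions in different copies, the most common base at each position is most likely the original one. This is a minimal sketch of that idea, not a description of any laboratory's actual error-correction method; the function name and sample sequences are invented for the example, and all copies are assumed to have the same length.

```python
from collections import Counter


def consensus(copies: list[str]) -> str:
    """Pick the most common base at each position across equal-length copies."""
    return "".join(
        Counter(column).most_common(1)[0][0]  # majority base in this column
        for column in zip(*copies)
    )


copies = [
    "GTTGAC",
    "GTTGAC",
    "GTAGAC",  # mutation at the third base
    "GTTGCC",  # mutation at the fifth base
    "GTTGAC",
]
print(consensus(copies))  # GTTGAC
```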

In addition, information is sometimes lost during reading. The errors can be chemical, when an incorrect base is attached to the chain, or purely computational, that is, arising in the computer.

Expensive, reliable

In March 2017, the journal Science published an article by American scientists who managed to record 2 × 10^17 bytes per gram of DNA. The researchers stress that not a single byte was lost: what was written was exactly what was read back.

For the average user, a "genetic flash drive" is not yet available: storing information on it is very expensive, and the read/write speed is low. According to scientists, reading just one megabyte costs about three and a half thousand dollars and takes several hours.

The undoubted advantages of recording information on DNA include the huge density of data storage, as well as the stability of the carrier – however, only at low temperatures.

Portal "Eternal youth" http://vechnayamolodost.ru

