DNA Storage Technology

Many people are familiar with the various forms of storage used for digital data. Data storage has evolved continuously over the years, from CDs to flash drives and hard disks, and storing data in the cloud has now become popular.

Because the Internet is now so widely used, the amount of data on it, including pictures, videos and other content, keeps growing. According to a data collection survey, at the end of 2020 there were approximately 44 zettabytes* of data worldwide on the Internet, or around 44 billion terabytes. (*A zettabyte (ZB) is a unit of digital storage capacity. One zettabyte is 10^21, or 1,000,000,000,000,000,000,000, bytes.)
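
As a quick check of the unit conversion, the short sketch below turns zettabytes into terabytes. It assumes decimal (SI) units, where one terabyte is 10^12 bytes and one zettabyte is 10^21 bytes. The function name is only an illustration.

```python
# Decimal (SI) storage units, expressed in bytes (assumption: SI, not binary, units).
TERABYTE = 10**12
ZETTABYTE = 10**21

def zettabytes_to_terabytes(zb):
    """Convert a whole number of zettabytes to terabytes."""
    return zb * ZETTABYTE // TERABYTE

print(zettabytes_to_terabytes(1))   # 1000000000  (one billion TB per ZB)
print(zettabytes_to_terabytes(44))  # 44000000000 (44 billion TB)
```

One zettabyte is therefore a billion terabytes, so 44 zettabytes is about 44 billion terabytes.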

It is estimated that by 2025, the total amount of data generated on the Internet each year will be approximately 175 zettabytes.

Given these numbers, you might wonder where all that data is stored and how big the storage must be to hold it. To get a sense of the scale, take a look at the Google data center in the picture below. You can see how huge it is.

An interesting new approach is to store data in DNA, known as DNA storage, which relies on technology for synthesizing DNA. The idea dates back to 1959, when Richard Feynman, one of the most influential physicists of the 20th century, presented his thoughts on how to make things smaller. When objects that perform the same functions become smaller, we can make more use of the same space. This later became the concept of nanotechnology. He also commented that we might be able to take advantage of biological matter as small as a cell.

This idea was later tested and refined until data could be stored successfully in DNA. Comparing capacity per unit of weight, the picture below shows that DNA can hold about 2 million times more data per gram than a hard drive.

This is because DNA is very small. Its main structure is a long polynucleotide double helix, and each nucleotide carries a small storage unit known as a nitrogenous base. There are four bases: adenine (A), thymine (T), cytosine (C) and guanine (G). The base is important because it holds the state of the data in the same way as a computer bit. Each base pairs only with one specific partner (A with T, C with G), which gives a fixed set of states comparable to 0 and 1. Because a base is only nanometers in size, a tiny amount of DNA can hold a huge amount of data.
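
The pairing rule can be sketched in a few lines of Python. DNA has four bases (A, T, C, G), where A pairs with T and C pairs with G. With four possible bases, one position can in principle carry log2(4) = 2 bits. The names PAIRING and complement are chosen here for illustration only.

```python
import math

# Base pairing rules: A pairs with T, and C pairs with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the partner strand, base by base."""
    return "".join(PAIRING[base] for base in strand)

# Four possible bases per position means 2 bits of information per base.
bits_per_base = math.log2(len(PAIRING))

print(complement("ATCG"))  # TAGC
print(bits_per_base)       # 2.0
```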

Storing data in DNA involves 3 steps, as described below.

Step 1: Data Encoding

Convert all the data you want to save into binary digits, or bits, which have only two states, 0 and 1. This is the digital form that a computer can read.

Step 2: DNA Synthesis

Once the data has been converted to bits, algorithms or special programs translate those bits into sequences of bases in a specific format, and DNA strands with those sequences are then synthesized.

Step 3: Data Decoding

After the data has been recorded in the synthesized DNA strands, you can read it back by running those strands through a DNA sequencing process. Sequencing turns the bases back into computer bits, which are then processed into the image, audio, or video data that you originally recorded.
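
The software side of these three steps can be sketched as a round trip in Python. This is a minimal illustration, not the scheme used by any particular system. The mapping of two bits to one base (00 to A, 01 to C, 10 to G, 11 to T) is a hypothetical choice, and the laboratory steps of synthesis and sequencing are simply represented by passing a string of letters from one function to the other.

```python
# Hypothetical 2-bit-per-base mapping, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Steps 1 and 2: bytes -> bit string -> base sequence."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(strand: str) -> bytes:
    """Step 3: base sequence -> bit string -> bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Each byte becomes four bases, so the two-byte message "Hi" turns into an eight-base strand. A practical encoding would also need error correction and limits on which sequences are allowed, both of which this sketch leaves out.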

Research and development of DNA storage technology continues to move forward. For example, in 2016 Microsoft and the University of Washington jointly developed the first DNA-based system to fully store digital data. Microsoft has since revealed that the research team is now able to store and retrieve data automatically. There are also projects that plan to use DNA storage, possibly starting with archiving large amounts of data in cloud services.

Imagine that, if DNA storage becomes practical in the future, a huge data center could shrink to the size of a small coin.