Meghan Bialt-DeCelie – ’19
Scientists have explored the concept of storing data in DNA, one of the most fundamental biological molecules in living things. According to the Shannon information capacity, a nucleotide can ideally hold 2 bits of data, because each position in a strand is one of four bases (A, C, G, or T).
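The 2-bit figure can be checked with a short calculation. The sketch below assumes the four bases are equally likely, and it uses an illustrative two-bit mapping (00, 01, 10, 11 to A, C, G, T); the function names are hypothetical and exist only for this example.

```python
import math

# Illustrative mapping from bit pairs to bases (an assumption for this example).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def shannon_bits_per_symbol(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def bits_to_dna(bits):
    """Translate an even-length string of 0s and 1s into bases, two bits per base."""
    if len(bits) % 2 != 0:
        raise ValueError("bit string must have even length")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


# Four equally likely bases carry log2(4) = 2 bits each.
ideal_capacity = shannon_bits_per_symbol([0.25, 0.25, 0.25, 0.25])
print(ideal_capacity)           # 2.0
print(bits_to_dna("00011011"))  # ACGT
```

Any constraint that forbids some sequences makes the bases less than perfectly free to vary, which pulls the usable capacity below this ideal.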
However, DNA does not currently reach this capacity in practice because of difficulties and errors associated with high GC content and with homopolymers, which are long runs of the same nucleotide. Retrieving data from DNA is also not always reliable, and the concept is hard to bring to a larger scale. Oligonucleotides (oligos) can also drop out during oligo synthesis and DNA amplification, or through damage to the DNA during storage.
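To make the two sequence problems concrete, here is a minimal sketch of how a candidate sequence could be checked for GC content and homopolymer length. The thresholds (runs of at most 3 bases, GC content between 45% and 55%) are illustrative assumptions, not values given in this article, and the function names are hypothetical.

```python
import re


def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base (0 if empty)."""
    runs = (len(match.group(0)) for match in re.finditer(r"(.)\1*", seq))
    return max(runs, default=0)


def is_synthesis_friendly(seq, max_run=3, gc_min=0.45, gc_max=0.55):
    """True if the sequence avoids long runs and has balanced GC content.

    The default thresholds are assumptions chosen for illustration.
    """
    return (longest_homopolymer(seq) <= max_run
            and gc_min <= gc_fraction(seq) <= gc_max)


print(is_synthesis_friendly("ACGTACGT"))  # True: no runs, 50% GC
print(is_synthesis_friendly("AAAAACGT"))  # False: run of five A's
print(is_synthesis_friendly("GCGCGCGA"))  # False: 87.5% GC
```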
Researchers Yaniv Erlich, PhD, and Dina Zielinski of the New York Genome Center address these problems with a method called DNA Fountain. As the name suggests, it uses fountain codes so that the stored data can survive oligo dropout. The method works in three steps: a binary file is preprocessed into smaller segments of binary sequence, the data are packaged into “droplets,” and each droplet is attached to a “seed.” The seeds are short binary sequences of a fixed length that help identify the data in the droplet. The droplets are then translated into DNA, and each DNA sequence is accepted or rejected depending on the biochemical constraints of DNA, including homopolymers and high GC content. Compared with other methods, this approach is less affected by oligo dropout and retrieves the data from the DNA more reliably. The research team was also able to store and retrieve data at a density of 215 million gigabytes (215 petabytes) per gram of DNA, which exceeds previous attempts at DNA storage. In the future, the cost-effectiveness and capacity of DNA data storage should be investigated further.
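The steps described above can be sketched in a few dozen lines. This is a simplified illustration, not the authors' implementation. It makes several assumptions: a 4-byte seed, a uniform random choice of how many segments to combine (real fountain codes use a carefully designed degree distribution), a two-bit mapping to bases, and screening thresholds of at most 3 repeated bases and 45% to 55% GC content. All function names are hypothetical.

```python
import random

BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}  # assumed mapping
SEED_BYTES = 4  # assumed fixed seed length


def to_dna(data):
    """Translate bytes into bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def passes_screen(seq, max_run=3, gc_min=0.45, gc_max=0.55):
    """Reject sequences with unbalanced GC content or long homopolymers."""
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    if not gc_min <= gc <= gc_max:
        return False
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_run:
            return False
    return True


def split_into_segments(data, segment_size):
    """Step 1: cut the file into equal segments, zero-padding the last one."""
    return [data[i:i + segment_size].ljust(segment_size, b"\0")
            for i in range(0, len(data), segment_size)]


def make_droplet(segments, seed):
    """Step 2: XOR a seed-determined subset of segments, then prepend the seed."""
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))  # simplification: uniform degree
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytes(len(segments[0]))  # all-zero starting payload
    for index in chosen:
        payload = bytes(a ^ b for a, b in zip(payload, segments[index]))
    return seed.to_bytes(SEED_BYTES, "big") + payload


def encode(data, segment_size, n_oligos, max_attempts=10_000):
    """Step 3: translate droplets to DNA and keep only those passing the screen."""
    segments = split_into_segments(data, segment_size)
    seed_source = random.Random(0)  # fixed so the example is repeatable
    oligos = []
    for _ in range(max_attempts):
        if len(oligos) == n_oligos:
            break
        seed = seed_source.getrandbits(8 * SEED_BYTES)
        oligo = to_dna(make_droplet(segments, seed))
        if passes_screen(oligo):
            oligos.append(oligo)
    return oligos
```

Because the seed fully determines which segments were combined, a decoder that reads the seed back from an oligo can regenerate that choice. Rejected droplets cost nothing beyond the attempt, since a fountain code can keep producing new droplets until enough valid ones exist. A real encoder would also need to avoid reusing seeds and to supply a matching decoder, both of which are omitted here.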
- Erlich, Yaniv, and Dina Zielinski. DNA Fountain enables a robust and efficient storage architecture. Science (2017). doi:10.1126/science.aaj2038
- Image retrieved from: http://www.publicdomainpictures.net/view-image.php?image=31530&picture=structure-of-dna