We live in an age of information overload. A recent estimate pegged the amount of digital data in the world at a whopping 1.8 zettabytes: that's 1.8 x 10^21 bytes, or 1,800 billion gigabytes. Indisputably, a lot of this information is extremely important and needs to be accessed frequently, such as hospital files on patients' clinical histories or data on poverty levels across the globe. However, I suspect that another part of this data is a little less important and needs to be accessed much less often: case in point (although the number of views of this video suggests that my sense of its importance is quite mistaken). Currently, the most popular and cost-effective means of long-term storage for all this data is to transfer it onto magnetic tape and keep the tape in repositories. However, because magnetic tape degrades and media formats change frequently, the data has to be rewritten from time to time to prevent its loss, which is a rather expensive exercise. An experiment performed by a handful of scientists from the European Bioinformatics Institute in England, led by Nick Goldman and Ewan Birney, and published recently in Nature, suggests that to solve this costly data-archiving problem, we may just have to look within ourselves. Where plastic fails, DNA may succeed!
The authors of the study played with the idea of using DNA as a medium for long-term data storage. The idea itself has been around for a while, having first been proposed in 1995. Indeed, Harvard synthetic biologist George Church and his colleagues even tried it out; the results of their experiment were published in Science last year. Although novel, the approach taken by Church's team was error-prone. Using a more sophisticated approach, Goldman and his team encoded ~750 kilobytes of digital data in 18 megabases of DNA. For their proof-of-principle experiment, the scientists chose 4 different types of digital files to demonstrate the utility of their approach: an ASCII text file with all 154 of Shakespeare's sonnets, a fitting PDF file of Watson and Crick's classic paper on the double helix structure of DNA, an MP3 audio file with an excerpt of Martin Luther King's "I Have a Dream" speech, and a JPEG picture file of the institute where the work was done. Using a code to convert binary data into the As, Gs, Cs, and Ts of the DNA code, they encoded these 4 computer files as long DNA sequences. The novelty of their approach is that the encoding avoids runs of the same nucleotide (AAAA, CCCC), which is what made Church's approach error-prone. They then synthesized this DNA in short overlapping fragments in the lab, flew it halfway around the world, and sequenced it to try to recover the original data. And they succeeded, with 100% accuracy. In the process, they used only 10% of the DNA they had synthesized to decode the data within, leaving plenty more to repeat the experiment several times.
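To make the run-avoidance trick concrete, here is a minimal Python sketch of one way to guarantee that no nucleotide ever repeats. It is a simplified illustration, not the authors' exact scheme: the fixed six-digit base-3 conversion, the function names, and the "virtual" starting nucleotide are all my assumptions, and the sketch leaves out the overlapping fragments entirely. The idea is to write the data in base 3, then let each base-3 digit pick one of the three nucleotides that differ from the previous one.

```python
NUCLEOTIDES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six base-3 digits cover one byte


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, digit = divmod(byte, 3)
            digits.append(digit)
        trits.extend(reversed(digits))
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Inverse of bytes_to_trits."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def encode(data: bytes, start: str = "A") -> str:
    """Map data to DNA so that no nucleotide is ever repeated.

    `start` is an assumed virtual previous nucleotide; it is not emitted.
    """
    prev = start
    dna = []
    for trit in bytes_to_trits(data):
        # Only the three nucleotides that differ from the previous one are
        # allowed, so a run like AAAA can never be produced.
        choices = [n for n in NUCLEOTIDES if n != prev]
        prev = choices[trit]
        dna.append(prev)
    return "".join(dna)


def decode(dna: str, start: str = "A") -> bytes:
    """Recover the original bytes from a sequence produced by encode."""
    prev = start
    trits = []
    for nucleotide in dna:
        choices = [n for n in NUCLEOTIDES if n != prev]
        trits.append(choices.index(nucleotide))
        prev = nucleotide
    return trits_to_bytes(trits)


if __name__ == "__main__":
    message = b"Shall I compare thee to a summer's day?"
    dna = encode(message)
    print(dna)
    print(decode(dna) == message)
```

The price of this guarantee is capacity: each nucleotide carries one of three choices instead of one of four, so the sketch spends six nucleotides per byte rather than the four that a plain 2-bits-per-base mapping would need.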
The authors then went a step further, estimating the costs of their DNA-based storage approach for larger amounts of data and comparing them with the current costs of magnetic tape. At their current estimate of ~$12,620 per megabyte for writing and reading, magnetic tape is clearly more cost-effective, with DNA-based storage breaking even only after 600 to 5,000 years of repeated tape rewriting. However, with the rapid advancement of DNA synthesis and sequencing technologies and the associated drop in costs, the authors predict that within a decade, DNA-based storage may become a viable option for data that needs to be kept for at least 50 years.
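To get a feel for what that per-megabyte figure means, here is a back-of-the-envelope calculation in Python. It simply scales the ~$12,620 estimate linearly, which is my simplifying assumption; the function name is made up for illustration, and I take 1 megabyte to be 1,000 kilobytes.

```python
COST_PER_MB_USD = 12_620  # combined writing + reading estimate quoted above


def dna_storage_cost_usd(megabytes: float) -> float:
    """Scale the per-megabyte estimate linearly (a simplifying assumption)."""
    return megabytes * COST_PER_MB_USD


experiment_mb = 0.75  # ~750 kilobytes, assuming 1 MB = 1000 kB
print(f"~750 kB: ${dna_storage_cost_usd(experiment_mb):,.0f}")
print(f"1 GB:    ${dna_storage_cost_usd(1000):,.0f}")
```

At that rate, a payload the size of the one in the experiment comes to roughly $9,500, and a single gigabyte to about $12.6 million, which makes it clear why the case for DNA rests on archives that would otherwise be rewritten for centuries.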
If you think about it, DNA makes a pretty attractive medium for data storage. That is, after all, its innate function, and it has been doing the job for well over a billion years. Considering that 6 gigabases, or 6.6 picograms (a picogram is 1 millionth of a microgram), of DNA is all it theoretically takes to encode one human being, it's a pretty dense storage medium. It's also no-fuss: it is stable even in extreme climatic conditions. Just take the case of the DNA sequencing projects of the woolly mammoth and the Neanderthal. So in another few decades, if DNA synthesis and sequencing costs plummet as expected, libraries may be hiring geneticists for their archiving needs!