From the beginning of civilization, the media used for storing data have evolved continuously, from stone tablets, animal bones and bamboo tablets to paper, with data density improving over time. Since the invention of electronics in the last century, the percentage of data stored in digital form has risen rapidly and is now close to 100%. Moreover, the amount of data generated has been increasing exponentially, from several ZB in 2008 to an expected 74 ZB in 2021, with a correspondingly large increase in the demand for data storage. Most digital data are stored on physical media such as hard drives. In addition, much of the data is rarely accessed and is archived on reels of magnetic tape. However, the physical thickness of the tape and the size of the magnetic domains limit the maximum data density, which is expected to reach a plateau soon. Furthermore, data on old tapes need to be copied onto new tapes regularly, as magnetic tapes normally last for only ten to twenty years. This process is time-consuming and expensive. Hence, next-generation media that can store digital data with much higher data density and durability are needed.
Here we report the use of peptide sequences for digital data storage, a method that has not been reported before. The data-bearing peptides were commercially synthesized, and the data retrieval process is described here. As an example, we stored two datasets: (i) 848 bits of ASCII-formatted text in 40 peptides, and (ii) 13752 bits comprising the music “Silent Night” in MIDI format together with its title in ASCII format, in 511 peptides. These files are available in the Supplementary Files section.
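To put the two dataset sizes in perspective, the average payload per peptide can be worked out directly from the figures quoted above. The following is a minimal sketch of that arithmetic only: the function name `summarize` and the dictionary keys are hypothetical, the byte count assumes 8 bits per byte, and the per-peptide value is a simple average that says nothing about how the bits are actually distributed among the peptides or whether any of them carry error-correction information.

```python
# Back-of-the-envelope arithmetic on the dataset sizes quoted in the text.
# `summarize` and the key names are illustrative, not part of the reported method.

def summarize(bits: int, peptides: int) -> dict:
    """Return the size in bytes and the average number of bits per peptide."""
    return {
        "bytes": bits // 8,                   # assumes 8 bits per byte
        "bits_per_peptide": bits / peptides,  # simple average, not the actual layout
    }

# Figures taken from the text: (bits stored, number of peptides).
datasets = {
    "ASCII text": (848, 40),
    "MIDI music plus ASCII title": (13752, 511),
}

if __name__ == "__main__":
    for name, (bits, peptides) in datasets.items():
        stats = summarize(bits, peptides)
        print(
            f"{name}: {stats['bytes']} bytes, "
            f"{stats['bits_per_peptide']:.2f} bits per peptide on average"
        )
```

Under these assumptions the text dataset averages 21.2 bits per peptide and the music dataset roughly 26.9 bits per peptide.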