The Internet Archive and Paper: Together At Last?

The Internet Archive has a new long-term solution for backing up digital books: printed copies. And while we often think about the differences between the printed and the digital, the Archive’s blog post (“Why Preserve Books? The New Physical Archive of the Internet Archive”) emphasizes some of the real, physical similarities between the two mediums:

“we have found that the digital versions have more and more in common with physical versions. The computer hard disks, while holding digital data, are still physical objects … hard drives are just another physical format that stores information. This connection showed us that physical archiving is still an important function in a digital era.”

There’s a lot to be said about the physical existence of digital books. Sure, they’re bits and bytes, or they exist in the cloud, and so on, but there is still a very real physical existence on some server, on some hard drive, somewhere. That physical existence, and the accompanying question of data degradation, is what motivates the Archive’s Physical Archive project. The project’s stated goal is a laudable one, even if it is a reach: to preserve one copy of every published work (which, by some estimates, is around 100 million items). Realistically, the Archive hopes to archive 10 million books (for comparison’s sake, Oxford’s Bodleian Library collection is somewhere around 11 million items).

I rather like the parallel (and semi-symbolism) between this physical book repository for the Internet Archive and a seed bank: “as an authoritative and safe copy that may be called upon in the future.” It might not seem like much now, but the book seed bank is humanity’s insurance policy after the inevitable apocalyptic cataclysm, when high technology is no more and mankind can start afresh with printed-book technology (A Canticle for Leibowitz?). OK, that’s enough of that.

All of this, of course, relates to the much broader issue of data preservation, and to the life expectancy and reliability of different forms of information storage. On a day-to-day basis, we’re all more than familiar with the fragility of digital storage, and with how (relatively) quickly one format or another becomes obsolete. The BBC Domesday Project, which in 1986 used the most advanced storage medium then available (the laserdisc), is a favorite example of exactly such digital obsolescence.* Multiple means of backing up our most precious cultural commodity (knowledge, in the form of books) certainly seem like the surest way to go, don’t they? The more preservation projects, the better, as far as I am concerned.

Why have a physical archive for material that already exists in digital form? Maybe it’s obvious, but the fact that digital copies exist on computers and can only be read by computers is both their greatest strength and their greatest potential weakness. And after all, ink-on-paper technology has proven to be the most durable and long-lasting form of information storage thus far.

Also worth browsing is the discussion in the comments section of an Ars Technica article on this topic (“Internet Archive starts backing up digital books on paper”). My favorite part debates the merits of different book preservation mediums, such as the susceptibility of a purely digital archive to EMP attacks by aliens versus the risk of fire for printed books in a Library of Alexandria scenario. From the specific issue of how digital book storage works to the broader philosophical question of how and where knowledge exists, this is, and will continue to be, a topic worth keeping an eye on.

* That said, the Domesday Project is making a comeback. Check out Domesday Reloaded.




I run the ThinkLab at the University of Cambridge, and research digital habits, productivity, and wellbeing.
