Category: Preservation

It Takes a Village to Save a Hard Drive

Doron in utter disbelief

In the final days of the XFR STN exhibition at the New Museum, we encountered what was hands-down the most challenging born-digital recovery to have occurred during the run of the exhibition. On August 30th, artist Phil Sanders arrived at the New Museum with an amalgam of floppy disks and two external hard disk drives. XFR STN technician Kristin MacDonough went into production mode with recovering the floppy disks. In just a few hours Kristin was able to recover 146 of Phil’s floppy disks. Prolific!

Kristin MacDonough rescuing floppy disks

Kristin MacDonough, Phil Sanders, and family

While Kristin tended to the sea of floppy disks, I investigated the hard drive situation. The first external disk drive was a peripheral used with an Amiga. The enclosure’s external interface was nothing we could use with our variety of adapters and forensic bridges, so I opened it up to take a look at the internal interface.

Phil Sanders' Amiga external hard disk drive

Phil Sanders' Amiga external hard disk drive

Luckily the drive inside was just a standard 3.5″ SCSI hard disk. Using a Tableau SCSI bridge (or “write-blocker”) and FTK Imager we made a raw (dd) disk image of the drive. Having worked previously with Amiga hard disk images I knew this wasn’t the end of the story. This raw image of the entire disk would only really be useful if it was a system disk, and if Phil wanted to emulate his old Amiga system. If all that Phil really wanted was the files on the disk, the image would be useless to him for two reasons: 1) the Amiga Fast File System (AFFS) is not supported in FTK Imager, so we would be unable to browse the file system or dump the files for him there, as was the workflow for most disk images at XFR STN. 2) AFFS is fortunately supported in Linux, but the partitioning scheme used on Amiga disks is not, meaning the raw image we produced of Phil’s disk cannot be mounted as-is. Michael Kohn made a brilliant tool that provides a solution to this – not only can you view partitions on the disk image, and browse around the file system, you can use his tool to dump a raw image of just one partition. This “dumped” raw image can then be mounted natively in Linux, allowing you to get the files and do whatever you please with them. We used this process to provide Phil with the full raw image of the disk, the raw image devoid of partitioning scheme, and a dump of all of the files. Start-to-finish this does not take much time at all… if you’ve used the tools before, maybe one hour tops.
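For the curious, the idea of dumping a single partition out of a full-disk image can be sketched in a few lines of Python. This is not Michael Kohn's tool, just an illustration of the principle: once you know where a partition starts and how long it is, the "dump" is a plain byte-range copy. The function name, the 512-byte block size, and the block numbers in the usage note are all assumptions made for the example; the real values have to come from whatever reads the Amiga partition table.

```python
def extract_partition(image_path, out_path, start_block, block_count, block_size=512):
    """Copy one partition's byte range out of a raw full-disk image.

    start_block and block_count are hypothetical inputs here; in practice
    they would be read from the disk's partition table by another tool.
    Returns the number of bytes written.
    """
    remaining = block_count * block_size
    with open(image_path, "rb") as src, open(out_path, "wb") as dst:
        # Jump past everything that precedes the partition.
        src.seek(start_block * block_size)
        while remaining > 0:
            # Read in chunks of at most 1 MiB so large images do not fill memory.
            chunk = src.read(min(remaining, 1024 * 1024))
            if not chunk:
                raise ValueError("image ended before the partition did")
            dst.write(chunk)
            remaining -= len(chunk)
    return block_count * block_size
```

Something like extract_partition("amiga_full.img", "amiga_part1.img", start_block=544, block_count=80000) would produce the partition-only image (those numbers are made up). On a Linux machine with AFFS support, an image like that can typically be mounted read-only with something along the lines of mount -t affs -o loop,ro amiga_part1.img /mnt/amiga.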

Phil Sanders' "Sider" external hard disk drive

The next external hard disk drive, originally used with an Apple //e, was a wholly different scenario. The external interface appeared at first glance to be SCSI, but after counting the pins it became apparent that we were dealing with something else. I posted pictures of the drive to the Digital Curation list, and Mark Matienzo was able to find the manual for the drive, confirming that the connection was in fact SASI, the interface that was the precursor to SCSI.

The Sider's external SASI interface

I opened up the enclosure, hoping that the internal interface was something that we could easily work with, only to find that not only was the internal interface equally obscure, but that the disk was a whopping 5.25″ form factor, as opposed to the standard 3.5″ encountered in most personal computer hard disk drives.

Pictured right: The Sider's hard disk. Left: 3.5" hard disk for scale

Phil’s hard drive is pictured above on the right. On the left is a 3.5″ SCSI hard disk for scale. As we had no simple way of interfacing with the Sider, either through its external or internal interface, we knew that given our limited time and expertise, the easiest way to work with the drive would be with the original computer it was used with. Luckily, Phil had held on to his Apple //e. Knowing that our lab setup at XFR STN could easily recover 5.25″ floppy disks, my plan was to essentially migrate files from the 10MB hard drive to 5.25″ floppies, which we could then recover and extract the files from. The following week, on September 6th, Phil brought in his Apple //e and all of its peripherals. By some crazy stroke of fate, Apple ][ expert and friend Jason Scott happened to stop by to visit XFR STN just as we were setting up Phil’s computer. This was a lifesaver, as my knowledge of Apple DOS leaves a bit to be desired. Phil informed us that the Sider contained the bulk of his artistic output from the 80’s, including much material that he produced while artist-in-residence at NYU. He also informed us that the Sider was in fact the boot disk for the //e. All of the hardware was plugged in and waiting. It was with great anticipation that we proceeded to power on Phil’s Apple //e, and listened to a 10lb, 10MB hard disk attempt to spin up for the first time in over two decades.

Jason Scott powering on The Sider, Doron and Phil looking on

Initially, nothing happened. The drive sounded horrible, and the //e informed us that there was an IO error. The drive sounded like it was spinning, but it just sounded bad. It sounded like a mechanical device that had not been properly exercised in two decades. Slowly though, it began to spin faster. After some coaxing, powering the //e on and off a few times, and perhaps a few prayers to the god of dead media, we heard some head activity coming from the drive, and miraculously, the CRT monitor offered up a menu.

Booting the Apple //e

Phil’s memory had been accurate. Not only was the Sider the //e’s boot disk, but it had multiple operating systems available, and a boot menu that allowed us to choose which OS to use. We booted into DOS and proceeded to take a look at what was on the disk.

Jason Scott, hard drive whisperer.

The disk was mainly cooperative, and we only had to reboot a few times due to IO errors. There was one area of the disk that appeared to be corrupt or unreadable. We found a lot of content on the healthy areas of the disk, but what we saw were mostly system files and software. No art, but we copied the files we found to 5.25″ floppy disk anyway just to be on the safe side. Phil then informed us that he hardly used Apple DOS, and primarily worked in ProDOS. We restarted the machine and booted into ProDOS. We then found the mother lode. What had appeared to be a corrupt part of the disk was in fact the area of the disk that only ProDOS could read – and it contained massive amounts of artwork made by Phil during the 80’s.

We had to act fast – there was no telling how long the Sider would spin before finally buying the farm. While in DOS we had simply copied files in batches to floppy disk, we found that in ProDOS we could use the Copy II Plus program to actually produce a backup of the entire Sider hard disk. Jason initiated the process.

L to R: Jason Scott, Doron Ben-Avraham, Walter Forsberg, Phil Sanders

We only happened to have seven 5.25″ floppy disks on hand, yet after indexing the entire hard disk, Copy II Plus told us that we would need 24 floppies in all. This was not initially any cause for alarm, as we realized that we could recover each floppy immediately as Copy II Plus produced it, and that once a floppy was imaged, it could be re-used. As Matthew Kirschenbaum so eloquently put it at the next day’s symposium, we were operating a veritable “bucket brigade” between the Apple //e and our floppy recovery station, bits sloshing over the side as we rescued Phil’s artwork from certain oblivion.

Copy II Plus backup on 5.25" Floppy disk

The post-it note pictured above was an indicator that this particular floppy was a backup of Slot 7, Volume 1, disk 1 of 24. Slot 7 was the Sider, and on the Sider there were in fact two volumes. After seven disks, we attempted to re-use the first one we created, only to come to the terrible discovery that Copy II Plus refuses to overwrite what it detects as a backup disk. We ran back to our Kryoflux, as I recalled that one can use the device not only for recovery, but for writing back to disk. Unfortunately DOS 3.3 is not yet one of the supported writing formats. We wrote an Amiga format disk image back to disk, hoping that the Apple //e would see that it was not in DOS 3.3 format, and attempt to reformat the floppy before backing up to it. Unfortunately it simply refused to acknowledge this disk, rather than offer to reformat. It was then that New Museum director of IT Doron Ben-Avraham posed the idea of erasing the floppy disks with a magnet. I was completely skeptical, figuring that if the //e refused to reformat an Amiga disk, why would it react differently to a disk whose geometry had been obliterated? Doron managed to find a tiny magnet in the office.

Pure ingenuity

Amazingly… it worked! Doron assumed floppy erasing duties, and our bucket brigade was back in action, writing to floppy with the //e, recovering, and then erasing and reusing the disk. We managed to back up the first volume of the disk. The day had come to an end, and we needed to call it quits, but there was a whole other volume remaining to be backed up. This was on a Friday, and the soonest we would be able to pick up where we left off would be the following Sunday. Not wanting to risk spinning down the Sider hard disk drive, we left the whole system set up and running for two days in the resource center.

Restoration in progress

On Sunday, Phil and I took a close look at the contents of the two volumes, and found that Volume 2 was simply a direct mirror of Volume 1. Our backup was complete! We took this as an opportunity to run and document Phil’s work. Walter Forsberg had the brilliant idea to do direct video capture from the //e, so we moved to one of the video preservation stations, and proceeded to do just that. I asked Phil questions about the fidelity of the image quality we were seeing.

Phil Sanders demoing his work during live capture

It is rather incredible that likely none of this recovery process would have been available or accessible to Phil without a resource like XFR STN. Nearly a decade of Phil’s born-digital artwork now lives on the Internet Archive in the form of floppy disk images, hard drive dumps, and an hour of 10-bit uncompressed direct video capture. It sets the stage for further work in restoring an operational emulation of Phil’s //e. This really drives home what was the core and fundamental principle of the XFR STN project. This is a level of care and preservation commonly only available to artists that have already been written into the canon. It is not simply rhetoric or an overstatement to say that this project did indeed turn the capitalist meritocracy of institutional preservation on its head. It is incredibly rewarding to know that we set the stage for allowing some lesser known artists to have the opportunity to be discovered decades from now. More important and more rewarding than seeing all of these fundamental ideals in action though, was getting the opportunity to witness Phil and his wife see his work for the first time in decades, and to share it with their daughter for the first time ever. Thanks Phil.

Phil Sanders and family

Authenticity is Relative


For those interested in video game preservation, I highly recommend giving the following article a careful read: “In Search of Scanlines: The Best CRT Monitor for Retro Gaming.” Considering the wave of acquisitions at MoMA, my colleagues and I have been spending a whole lot of time thinking about how display hardware shapes the visual experience of a game, and in each case, what should be considered the ideal rendering by which to judge any sort of emulation. Needless to say, I’ve been chatting a bit with Nick Montfort.

I found the article interesting not for its discussion of CRTs, but because Fudoh’s approach is a bit different than most I have encountered. He doesn’t care about CRTs out of concern for historically accurate hardware, and thus image quality. Rather, his desire is to achieve the best possible image. He is obsessed with the signal to the extent that he will modify a console that originally output composite, so that it offers RGB. The quality he is achieving is one that game designers and players would never have seen when designing and playing these games. This is the antithesis of the CRT emulation camp, whose concern is accurate reproduction of an image quality that bears fidelity to consumer grade CRTs of a given game’s period.

Fudoh’s work is impressive to be sure, but is he barking up the wrong tree? On the other hand, does CRT emulation preserve the wrong thing? Is there a hybrid approach that combines these two apparently opposing schools of thought? What do you think?

How to back up your Tumblr

By now you’ve likely heard that Yahoo! intends to acquire Tumblr. While the acquisition is not (at time of writing) confirmed, and even if it goes through it does not mean death to Tumblr, you are likely wondering how to take your stuff and run. Today. Here’s how.

Bad news: there is no official Tumblr backup tool. Some people have cooked up well intentioned tools, but none of them preserve the look and feel of your Tumblr. For those that have spent countless hours perfecting their own theme, this will simply not do. If you want a backup of your Tumblr that looks right, and that you can easily upload to your own server I’d recommend using HTTrack.

Head over to the official HTTrack site and download the distribution you need. If you’re on a Mac and use Homebrew, you can just do that instead. The Windows build provides a GUI, but the others do not. The command line interface is cross-platform however. If you’ve installed correctly, you can copy and paste the below, replacing the URL with the URL of your Tumblr, and let er rip.

httrack "http://yourblog.tumblr.com" -w -n -c8 -N0 -s0 -q -v -I0 -p3

This can take quite a while depending on the size of your Tumblr. If you use infinite scroll, this should work regardless, so long as you’ve maintained the “next” and “previous” pagination hyperlink markup in your template. If you haven’t (this would certainly be an edge case, but I’ve seen it with some artists’ themes), I’m sorry, but your site just isn’t crawlable. When all is said and done you’ll be left with flat HTML files, css, js, images, videos, audio, etc. with all hyperlinks to crawled content modified to relative paths – meaning it is a backup you can toss on any server. If you’re curious about the options I’ve used in the line above, here are their full descriptions from the documentation. Enjoy.

w *mirror web sites (--mirror)
n get non-html files 'near' an html file (ex: an image located outside) (--near)
cN number of multiple connections (*c8) (--sockets[=N])
NN structure type (0 *original structure, 1+: see below) (--structure[=N])
  or user defined structure (-N "%h%p/%n%q.%t")
q no questions - quiet mode (--quiet)
%v display on screen filenames downloaded (in realtime) (--display)
I *make an index (I0 don't make) (--index)
pN priority mode: (* p3) (--priority[=N])
  0 just scan, don't save anything (for checking links)
  1 save only html files
  2 save only non html files
 *3 save all files
  7 get html files before, then treat other files
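If you have more than one Tumblr to grab, it can be handy to assemble the same command from a script. Below is a minimal sketch in Python that only builds the argument list using the options from the line above. The function name, the example URL, and the output directory are made up for illustration; -O is HTTrack's option for choosing where the mirror is written, which the line above does not use.

```python
import shlex

# The same options as the one-liner above, kept in one place.
HTTRACK_OPTIONS = ["-w", "-n", "-c8", "-N0", "-s0", "-q", "-v", "-I0", "-p3"]

def build_httrack_command(url, output_dir=None):
    """Return the httrack argument list for mirroring one site.

    url and output_dir are supplied by the caller; nothing here is
    specific to Tumblr beyond the options chosen above.
    """
    cmd = ["httrack", url] + HTTRACK_OPTIONS
    if output_dir:
        # -O tells httrack where to write the mirror and its logs.
        cmd += ["-O", output_dir]
    return cmd

# Print the command rather than run it, so it can be checked first.
print(shlex.join(build_httrack_command("http://yourblog.tumblr.com", "backup/yourblog")))
```

Once the printed command looks right, the list can be handed to subprocess.run to actually start the crawl.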

Take a Picture, It’ll Last Longer…

Last week, early web folklorist, OG net artist, and friend of Rhizome, Olia Lialina wrote a post that dug at for how severely their image processing system had mangled an image of her piece My Boyfriend Came Back From The War. Despite (for the lulz) comparing the problem to the recent destruction of a 19th century fresco, Olia is correct: the image of her work, as processed by’s system does look pretty bad. This is just one manifestation of an underlying problem I have been pondering lately: how can documentation of works that are screen-based, and inherently low-resolution, exist within systems that are designed specifically for high-resolution documentation of works that exist in the physical world?

For a while now Rhizome has been sharing records and images of works from the ArtBase with a handful of carefully chosen fine art image databases. It’s a nice thing to see lesser known computer based works alongside more established artists and media, and we like the idea of exposing our collection and the history of art engaged with technology to a broader audience. Every time we begin one of these projects we are faced with the same conundrum: image specifications. Image collections such as ArtStor,, and Google Art Project all serve high resolution images of paintings, prints, photographs, and objects. The user experience of these platforms is engineered to best represent documentation of an object that exists in the physical world. However, nearly all artworks in the ArtBase are screen based – be they software, web sites, video, or animated gifs. This means that these works are inherently low-resolution. With computer or screen based works, there is often no finer grain of visual detail than native screen resolution. In documenting these works, we are not faced with the bottomless pursuit of capturing (or exceeding) human perception, as with the documentation of physical works of art; the pixel is the lowest level of detail. Furthermore, when endeavoring to capture images of authentic renderings (i.e. period specific web browser and operating system), the dimensions of the image are (or at least, in some cases should be) limited to the native resolution of displays of the time when the work was created.


Detail of My Boyfriend Came Back From The War


For example, the image of My Boyfriend Came Back From The War we shared with (seen here) is a 746 x 436 px lossless PNG screenshot of the website, as rendered by Netscape Navigator 3.0 (1996) running in Mac OS 9.0 (1999) emulated by SheepShaver. Although the image was cropped to remove the operating system’s graphical user interface, and the outer frame of the web browser, it still possesses inherent historic accuracy and artifactual and evidential quality. The dimensions of the image could have been slightly smaller or slightly larger, but they were defined by what was a comfortable browser window size within the emulation, which was sized to a resolution (800 x 600) appropriate to typical hardware of the time. As well, the images embedded in Olia’s HTML have variable percentage based widths, and adjust to the size of the browser window. This reinforces the importance of the size of the rendering, as modern browsers scale images with a blurry interpolation algorithm, unlike the browsers at the time of the work’s creation. The delicate and sensitive nature of screen capture images is significant. Any scaling or heavy handed compression can easily destroy the subtle artifactual qualities that the image was carefully designed to capture. With screen graphics, especially text and images from the early web, the difference of a few pixels can completely alter the feeling of a work.


Detail of My Boyfriend Came Back From The War, as processed by


It is unsurprising that’s system messed with the image so severely, as it is a system designed for down-scaling incredibly high resolution images, not upscaling low-res images. Here are a few thoughts on how the system could potentially handle intentionally low-res images of born-digital materials:

1) Do nothing: do not scale the images, use lossy compression with care.
2) Improve the image processing methodology to be adaptive to images that are intentionally low-res. I am guessing that when high-resolution images are uploaded to the cms, they derive a set of progressively smaller images that can be fed to the image-zooming viewer. A reverse/mirror image of this process could be developed, where instead of scaling down, the images are scaled up using nearest neighbor interpolation at each level. In theory the original image size would be the smallest, and zooming in the image viewer would appear to provide a strict enlargement of the original pixels.
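To make the second idea concrete, here is a rough sketch in Python of what a "reverse" zoom pyramid could look like. It makes no claim about how any real repository processes images; it only illustrates nearest neighbor enlargement, where each source pixel becomes a solid block and no new colors are invented. The function names, the list-of-rows image representation, and the choice of doubling the size at each level are assumptions made for the example.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a 2D grid of pixel values by an integer factor.

    Each source pixel becomes a factor-by-factor block of identical
    values, so no blending or blurring is introduced.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    enlarged = []
    for row in pixels:
        # Repeat every pixel horizontally...
        wide_row = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row vertically.
        for _ in range(factor):
            enlarged.append(list(wide_row))
    return enlarged

def zoom_levels(pixels, levels):
    """Build a pyramid whose smallest level is the original image.

    Level 0 is the untouched original; each later level doubles the
    previous size (1x, 2x, 4x, ...). Doubling is an assumption here.
    """
    return [upscale_nearest(pixels, 2 ** level) for level in range(levels)]
```

With a 746 x 436 screenshot and three levels, this would yield the original plus 1492 x 872 and 2984 x 1744 enlargements, each a strict magnification of the original pixels.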

Speaking realistically, is a unique entity among the image repositories we are talking about. They have an in-house team of talented and curious engineers constantly working on improving the platform, which of course is still very new. They are thinking about how they can attack this problem as I type. I seriously doubt that larger, older platforms with fewer resources, or a different engineering culture, would be able to invest in developing new image processing solutions for what is a very small subset of their content. In light of this, it behooves archivists and conservators of computer based works to consider how we can use documentation strategies that gel with these existing systems. Furthermore, although screenshots are the reigning paradigm in the documentation of computer based works, do they really do the work justice in these contexts? If not – why should platforms invest in accommodating them? A strategy used by SFMOMA when contributing documentation of Miranda July’s web based Learning to Love You More to Google Art Project was to tile many screenshots to compose one high-res image.



While on the one hand, this strategy solves the problem of resolution, the result just doesn’t feel right. It amplifies what I feel to be the problem with screenshot based documentation: it denies the work any broader context. While lossless screenshots of computer based works are immensely valuable for preservation purposes, this approach completely neglects the physical aspect of the works. Software is not experienced in a disembodied graphical space – we interact with it through machines. If one of the major driving forces behind sharing with these image repositories is education, it seems logical to employ a documentation strategy that is simple and effective in visually communicating the context of these works, not simply a strategy that meets the image specifications. We are beginning to employ a documentation strategy at Rhizome that will touch all of these bases. It’s quite simple really: take a picture.


Rafaël Rozendaal’s falling falling .com


The above two photos (the latter taken with my iPhone) are not suggested to be an example of quality documentation – I just happened to have these on hand. They are, however, exemplary of how instantly readable a still image of a web based work of art is, when it depicts the work from the perspective of the viewer, not the computer. Such documentation does not replace the role of lossless screenshots of authentic renderings, but in the context we are speaking of – image repositories that are designed for handling high resolution content, and which have a diverse audience – they are arguably far more evocative of the work, more educational in terms of historic context and technology, and finally, these images are inherently more durable in terms of image processing and compression. Of course there are significant setup costs involved in producing this type of documentation: camera, lighting, and period specific hardware. In some cases there are software shortcuts that can be taken if hardware isn’t your thing. For example, document the work displayed on a CRT display of the proper vintage, but rather than going to the trouble of setting up a vintage Mac or PC, connect it to a modern computer running a fullscreen emulation. This approach also requires less maintenance – a library of virtual machines is far more stable than a collection of vintage computers.



It will take some time as we go about collecting the hardware, purchasing a camera and lighting, and developing a workflow (computer displays, especially CRTs are a tricky thing to photograph), but Rhizome should be able to start producing documentation under this new rubric (high resolution, photographic, historically accurate hardware [not just software]) in the very near future. Until then, perhaps we’ll see something from that does a better job of handling sensitive pixel-perfect historic screenshots.


Storify Is Bad For Preservation

tl;dr: Storify is not a Twitter archiving tool, but it easily could be.

After the great conversation at #ArtsTech on 6/13, I collected tweets from the evening [see them here] using Storify. It was the first time I’d ever used it. My takeaway echoes most people who have used Storify: fantastic.

However: there is one major gap that Storify isn’t addressing. One that would be trivial for them to implement, but would have a major impact on the landscape of personal digital preservation tools. To summarize the issue: Storify is a black-box service. When they inevitably cease to exist, so too will all of the stories and narratives that people have documented.

First things first. If you’ve never used or seen Storify, it is a free service that lets you search for, and arrange tweets into a linear narrative. It’s good for documenting small-scale things like a conversation, and large-scale things like conference hashtags. It has been well documented that Twitter’s search index is very shallow chronologically speaking, hence the need for such tools. There is hardly a shortage of Twitter archiving tools. From ifttt recipes, to ThinkUp, and various homebrew solutions – there are options aplenty.

Where these all fall short (and where Storify excels) is in facilitating hand-selection, and producing a decent look and feel that is human readable, and in the style of a Twitter conversation. Storify makes it easy to hand-pick tweets, or start broad with an entire hashtag and edit down from there. The end result maintains the look of a content stream, including avatars, and a “prettified” timestamp (i.e. “3 days ago”). You can retweet or reply to tweets directly from a finished Storify, which facilitates continued conversation, rather than rendering a static archive.

The great thing about all of the other Twitter archiving tools I mentioned, is that they provide you with a local copy of the data. When you use these tools, you are essentially creating a backup. When the makers of those tools close up shop, you will still have your archive of tweets in a relatively platform agnostic format. Storify does not let you locally save and archive any of the content you create with it. They do provide an “export” feature, which embeds your Storify on a site powered by WordPress, Drupal, Tumblr, (and a few other platforms). While at first glance this looks great, it is entirely misleading.

Taking a look at what Storify actually posts to your site, every last bit of it (from js, to images, and css) is hotlinked. Meaning: when Storify goes down, so will the content you’ve “exported.” To boot – they use infinite scroll javascript, so web archiving with a web crawler is pretty much out of the question. Of course there are simple ways to mitigate this: print a PDF of the page, do a “save as webpage”, etc. This seems beside the point though. The point is that Storify has built what is essentially the most “human” tool for archiving and presenting interactions on Twitter. If they were to provide a true “export” feature that allowed users to locally back up their Storify content, they would be in the position of being one of the most comprehensive personal digital preservation tools for Twitter.
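If you want to check an embed like this for yourself, the hotlinking is easy to spot programmatically. The sketch below, in Python, lists every script, stylesheet, image, or iframe in a page that is served from a host other than your own. It is only an illustration: the function and class names are made up, and it makes no assumptions about what Storify's actual markup looks like.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class ResourceCollector(HTMLParser):
    """Collect the URLs of scripts, images, iframes, and linked files."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("script", "img", "iframe") and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            self.urls.append(attrs["href"])

def hotlinked_resources(html, own_host):
    """Return resource URLs in html that point at a host other than own_host.

    Relative URLs have no host and are treated as local.
    """
    collector = ResourceCollector()
    collector.feed(html)
    external = []
    for url in collector.urls:
        host = urlparse(url).netloc
        if host and host != own_host:
            external.append(url)
    return external
```

Feeding it the HTML of a post containing an exported story, along with your own site's hostname, returns the resources that would disappear along with the third-party service.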


ArtsTech: Digital Conservation

An Incomplete Introduction to Digital Preservation

Here are slides from a presentation I gave last night, providing an introduction to some basic digital preservation concepts. I focused on the Trustworthy Repositories Audit & Certification criteria, Archivematica as a manifestation of the OAIS model, some historic examples, and recent projects in web based emulation of obsolete systems. Nothing new here for practitioners, but ok intro for the curious.
PDF warning » download here