Storify Is Bad For Preservation

tl;dr: Storify is not a Twitter archiving tool, but it easily could be.

After the great conversation at #ArtsTech on 6/13, I collected tweets from the evening [see them here] using Storify. It was the first time I’d ever used it. My takeaway echoes that of most people who have used Storify: it’s fantastic.

However: there is one major gap that Storify isn’t addressing. One that would be trivial for them to implement, but would have a major impact on the landscape of personal digital preservation tools. To summarize the issue: Storify is a black-box service. When they inevitably cease to exist, so too will all of the stories and narratives that people have documented.

First things first. If you’ve never used or seen Storify, it is a free service that lets you search for and arrange tweets into a linear narrative. It’s good for documenting small-scale things like a conversation, and large-scale things like conference hashtags. It has been well documented that Twitter’s search index is very shallow, chronologically speaking, hence the need for such tools. There is hardly a shortage of Twitter archiving tools. From ifttt recipes to ThinkUp and various homebrew solutions, there are options aplenty.

Where these all fall short (and where Storify excels) is in facilitating hand-selection and producing a decent, human-readable look and feel in the style of a Twitter conversation. Storify makes it easy to hand-pick tweets, or start broad with an entire hashtag and edit down from there. The end result maintains the look of a content stream, including avatars and a “prettified” timestamp (e.g. “3 days ago”). You can retweet or reply to tweets directly from a finished Storify, which facilitates continued conversation, rather than rendering a static archive.

The great thing about all of the other Twitter archiving tools I mentioned is that they provide you with a local copy of the data. When you use these tools, you are essentially creating a backup. When the makers of those tools close up shop, you will still have your archive of tweets in a relatively platform-agnostic format. Storify does not let you locally save and archive any of the content you create with it. They do provide an “export” feature, which embeds your Storify on a site powered by WordPress, Drupal, Tumblr, and a few other platforms. While at first glance this looks great, it is entirely misleading.

Taking a look at what Storify actually posts to your site, every last bit of it (from JS, to images, to CSS) is hotlinked. Meaning: when Storify goes down, so will the content you’ve “exported.” To boot, they use infinite-scroll JavaScript, so web archiving with a web crawler is pretty much out of the question. Of course there are simple ways to mitigate this: print a PDF of the page, do a “save as webpage”, etc. This seems beside the point though. The point is that Storify has built what is essentially the most “human” tool for archiving and presenting interactions on Twitter. If they were to provide a true “export” feature that allowed users to locally back up their Storify content, they would be in the position of being one of the most comprehensive personal digital preservation tools for Twitter.
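If you want to check an embed like this yourself, here is a rough sketch of how one might list the hotlinked assets in a chunk of exported markup. It uses only the Python standard library. The sample HTML, the hostnames, and the names `AssetCollector` and `hotlinked_assets` are all made up for illustration; this is not Storify’s actual embed code.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tags that load a resource, and the attribute holding its URL.
ASSET_ATTRS = {"script": "src", "img": "src", "link": "href"}


class AssetCollector(HTMLParser):
    """Collect the URLs of scripts, images, and stylesheets in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = ASSET_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def hotlinked_assets(html, own_host):
    """Return asset URLs in `html` served from a host other than `own_host`."""
    collector = AssetCollector()
    collector.feed(html)
    remote = []
    for url in collector.urls:
        host = urlparse(url).netloc
        # Relative URLs have no host, so they are served by your own site.
        if host and host != own_host:
            remote.append(url)
    return remote


# Hypothetical exported markup; every hostname here is a placeholder.
SAMPLE = """
<div class="story">
  <script src="//cdn.storify.example/embed.js"></script>
  <link rel="stylesheet" href="https://cdn.storify.example/story.css">
  <img src="https://cdn.storify.example/avatar.png">
  <img src="/uploads/local-copy.png">
</div>
"""

if __name__ == "__main__":
    for url in hotlinked_assets(SAMPLE, "myblog.example"):
        print(url)
```

Anything this prints lives on someone else’s server, which means it disappears when they do. Only the relative `/uploads/` image in the sample would survive.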


ArtsTech: Digital Conservation

Keyboard Archeology

I came across an interesting question on Twitter a few days ago that sent me spiraling into a brief bout of research. Via Matthew Kirschenbaum, Matt Schneider posed the question of when the greater than (>) and less than (<) symbols first appeared on keyboards. I managed to come up with two contenders.



Above is the 1955 Olympia SM3 De Luxe (with science and math keys). It seems that typewriters beat computers to the <pun>punch</pun>: some early typewriter layouts included mathematical symbols at a time when computers were programmed in assembly languages that were coded/punched on keyboards whose layouts did not include scientific or mathematical symbols (or were programmed on colossal Univac keyboards). Looking at early-to-mid 20th century IBM card and key punches, keypunches with an alphanumeric “repertoire” emerged in 1933 with the IBM Type 032 Printing Punch, but this keyboard layout was strictly alphanumeric.



Above is the keyboard layout of the IBM 026 (1949). The earliest example I managed to dig up of a “computer” keyboard with the greater than (>) and less than (<) symbols was the IBM 029 Card Punch, from 1964.



In the category of “close, but no cigar, but nonetheless interesting” we have the Smith-Corona Classic 12 (Greek), which included ten keys of Greek symbols (but none useful for equations).



wget cheat-sheet

Hello Internet, I made you something. There seems to be a lack of a basic wget cheat-sheet. Today I got tired of referring back to the usual sources, which tend to include all possible flags, most of which I never use. Here’s a .pdf you can print and hang at your desk.

-e robots=off





-l depth
 (default maximum is 5)

-o logfile

-i file




 (appends .html)

-U agent-string

-A acclist
--accept acclist
 (comma-separated extensions)

-R rejlist
--reject rejlist
(comma-separated extensions)

-D domain-list
(domains to follow)

--exclude-domains domain-list


(follow only relative links)
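
To show how a few of these flags fit together, here is a small sketch that assembles a wget command line and prints it without running anything. Some assumptions to flag: `-r` (recursive) is not on the list above, but `-l` only applies to recursive retrieval, so I have added it. The function name, the default values, and the example.com URL are placeholders, not recommendations.

```python
import shlex


def build_wget_command(url, depth=2, logfile="wget.log",
                       accept=("html", "css", "js", "png", "jpg"),
                       agent="Mozilla/5.0"):
    """Assemble a recursive wget invocation as an argument list."""
    return [
        "wget",
        "-e", "robots=off",       # ignore robots.txt
        "-r",                     # recurse (assumed; not on the cheat-sheet above)
        "-l", str(depth),         # recursion depth
        "-o", logfile,            # write messages to a log file
        "-U", agent,              # user-agent string to send
        "-A", ",".join(accept),   # comma-separated extensions to accept
        url,
    ]


if __name__ == "__main__":
    cmd = build_wget_command("http://example.com/")
    # Print a copy-pasteable command line; nothing is downloaded here.
    print(shlex.join(cmd))
```

Keeping the arguments in a list means the same `cmd` can be handed to `subprocess.run(cmd)` later, if you actually want to kick off the crawl from a script.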


Interview on the LOC’s Digital Preservation Blog

Trevor Owens of NDIIPP and the Library of Congress recently interviewed me for The Signal about Rhizome & the ArtBase. Here’s a bit where he asks what exactly my title (digital conservator) means:

>>  full interview here

Trevor: I don’t think there are many people out there with the title of digital conservator. Could you tell us a bit about how you define this role? To what extent do you think this role is similar and different to analog art conservation? Similarly, to what extent is this work similar or different to roles like digital archivist or digital curator?

Ben: I drew the distinction with my title for two reasons: 1) I am at the service of an institution that lives within a museum, and 2) the digital objects I am cataloging and preserving access to are not “records” by the archival definition. They are artifacts – and as such require a different kind of care.

I am responsible for the stewardship of intellectual entities that are often inseparable from their digital carriers, due to the artist’s exploitation of the inherent characteristics of the material. It calls for a high degree of regard for the creator’s intent, and a thorough understanding of the subtleties of the materials. A digital archivist tasked with preserving the records of an office probably isn’t going to wonder if the use of Comic Sans in the accountant’s email signature has artifactual significance.

Of course the lines are much blurrier than that and there are plenty of examples of people with the title “digital archivist” or “digital curator” doing significant work on preserving the subtle artifactual quality of digital materials (not to mention the incredible people who are contributing to significant projects in their spare time). This is a new phenomenon though, where you have individuals with the title “archivist” or “curator” devoting a level of care to documents that, with paper materials, would be the work of a document conservator.

While I would hesitate to compare the two, I think that the conservation of digital artifacts, and the conservation of objects, documents and the like, at their essence hold many similarities. They both require an empathy for the artist, expertise with the medium, and understanding of the proper environment. Sometimes I go to the Greek and Roman galleries at the Met, and daydream about what net art from the 90’s will look like hundreds of years from now.

An Incomplete Introduction to Digital Preservation

Here are slides from a presentation I gave last night, providing an introduction to some basic digital preservation concepts. I focused on the Trustworthy Repositories Audit & Certification criteria, Archivematica as a manifestation of the OAIS model, some historic examples, and recent projects in web-based emulation of obsolete systems. Nothing new here for practitioners, but an OK intro for the curious.
PDF warning » download here

On My Own Ambivalence

I have a rocky relationship with the practice of maintaining a personal blog.
There are plenty of people I admire in academia, the arts, and tech who blog as a form of scholarly communication, yet I have been hesitant to throw my hat into the ring. Fermenting one’s ideas in private is important, and I don’t fancy a public archive of my own evolving naïveté. Yet I envy the masterful and careful bloggers in our midst who have amassed deep compendiums over the years. As someone who spends a good portion of his day pontificating on the web’s history, I hold the value of a public, long-term, personal knowledge repository in high regard. On Saturday, in a post celebrating his blog’s 10th birthday, Andy Baio shared the three simple ground rules that he laid out when he founded it:

1. No journaling, unless it’s relevant to people who don’t know me. Example: “Today I went down to 7-11 and bought a Slurpee. Strawberry is my favorite flavor!”

2. No tired memes, unless I have something to add. Example: “Take this quiz and find out which Smurf you are! I’m Jokey!”

3. Be original.

In the spirit of these three rules, and with the intent of having a place to share my research more frequently and freely, I am newly devoted to slowly cultivating this humble web log, with the hope of shaping it into a repository of careful, if somewhat infrequent dialog.

Jonathan Swift on Information Diets & Skimming

“The most accomplished way of using books at present is twofold: either first to serve them as some men do lords, learn their titles exactly, and then brag of their acquaintance; or, secondly, which is indeed the choicer, the profounder, and politer method, to get a thorough insight into the index by which the whole book is governed and turned, like fishes by the tail. For to enter the palace of learning at the great gate requires an expense of time and forms, therefore men of much haste and little ceremony are content to get in by the back-door.”