How to back up your Tumblr

May 19 2013

By now you’ve likely heard that Yahoo! intends to acquire Tumblr. The acquisition is not (at the time of writing) confirmed, and even if it goes through, it does not mean death for Tumblr. Still, you are likely wondering how to take your stuff and run. Today. Here’s how.

Bad news: there is no official Tumblr backup tool. Some people have cooked up well-intentioned tools, but none of them preserve the look and feel of your Tumblr. For those who have spent countless hours perfecting their own theme, this will simply not do. If you want a backup of your Tumblr that looks right, and that you can easily upload to your own server, I’d recommend using HTTrack.

Head over to the official HTTrack site and download the distribution you need. If you’re on a Mac and use Homebrew, you can install it through Homebrew instead. The Windows build provides a GUI, but the others do not; the command line interface, however, is cross-platform. If you’ve installed it correctly, you can copy and paste the line below, replacing the URL with the URL of your Tumblr, and let ’er rip.

httrack -w -n -c8 -N0 -s0 -q -v -I0 -p3 http://example.tumblr.com

This can take quite a while depending on the size of your Tumblr. If you use infinite scroll, this should work regardless, so long as you’ve kept the “next” and “previous” pagination links in your template’s markup. If you haven’t (this would certainly be an edge case, but I’ve seen it in some artists’ themes), I’m sorry, but your site just isn’t crawlable. When all is said and done, you’ll be left with flat HTML files, CSS, JS, images, videos, audio, etc., with all hyperlinks to crawled content rewritten as relative paths, meaning it is a backup you can toss on any server. If you’re curious about the options I’ve used in the line above, here are their full descriptions from the documentation. Enjoy.

w *mirror web sites (--mirror)
n get non-html files 'near' an html file (ex: an image located outside) (--near)
cN number of multiple connections (*c8) (--sockets[=N])
NN structure type (0 *original structure, 1+: see below) (--structure[=N])
  or user defined structure (-N "%h%p/%n%q.%t")
q no questions - quiet mode (--quiet)
%v display on screen filenames downloaded (in realtime) (--display)
I *make an index (I0 don't make) (--index)
pN priority mode: (* p3) (--priority[=N])
  0 just scan, don't save anything (for checking links)
  1 save only html files
  2 save only non html files
 *3 save all files
  7 get html files before, then treat other files

An overdue update

Feb 07 2013

Dear internet: over the course of the past week, a few people have mentioned that they heard I was leaving Rhizome. This is not the case. There have, however, been some wonderful changes in my professional life that I have not quite shared publicly. I’d like to update you, dear reader, on the particulars of these changes, lest misinformation befall you.

As of two weeks ago, I am officially splitting my time between Rhizome and the conservation department of the Museum of Modern Art. I have joined the fantastic team at MoMA to lead the development of the Digital Repository for Museum Collections, a suite of tools and services that will together form an infrastructure for the effective preservation and conservation management of born-digital materials in the museum’s permanent collection. It is an incredibly exciting project, and I am glad to have the opportunity to help shape its future and to work with the brilliant team at MoMA.

This is a half-time appointment: I have not left Rhizome. I’m fortunate enough to have colleagues who are open to a little institutional polyamory. I am grateful for this, as things have never been more exciting in Rhizome’s conservation department. We have been hard at work restoring The Thing BBS, one of the earliest online communities of contemporary artists, and I am pleased to say that a small portion of what we’ve dug up will be on display as part of the New Museum’s next exhibition, “1993: Experimental Jet Set, Trash and No Star.” The exhibition is already partially on view, but opens fully next Thursday, Feb 14th. In conjunction with the exhibition, we are hosting an event in March titled “The Internet Before the Web: Preserving Early Networked Cultures.” I will be in conversation with Wolfgang Staehle (artist and founder of The Thing BBS) and none other than Jason Scott. Needless to say, you might want to reserve your tickets ASAP.

So – that’s it. Lots of new things… more to come.

A Series of Things

Feb 05 2013

Dear internet: I am nearing the end of an MFA. Surprise! I would like to invite you to the opening of my thesis exhibition next Monday, Feb 11th.

Opening Reception: Monday, February 11th, 5 – 8 PM
On View: February 12th – February 15th, 2013
Gallery Hours: Tuesday – Thursday 10 AM – 8 PM
Friday 10 AM – 5 PM

Pratt Institute Digital Arts Gallery
Myrtle Hall, 4th Floor
536 Myrtle Avenue
Brooklyn, NY 11205

Photos from “⌘⇪S Preserving Digital Artifacts”, at NYPL Labs

Oct 16 2012

NYPL Labs signage

NYPL trustees room

Ben Vershbow, Manager of NYPL Labs, introducing the talk

Ben Fino-Radin

Yours truly, photo courtesy Neal Stimler

Post-talk dinner, watching the 2nd presidential debate

Analyzing Browser History

Sep 02 2012

I recently participated in First Five, a Tumblr where guests list the first five websites they visit daily (my five here). Like recent contributor Luke Robert Mason, I find the concept foreign. As a poster child for consumption via aggregation, apps, and streams, I do not pull up my bookmarks in the morning as though unfolding the daily newspaper. Rather than compensating for this by listing my five favorite sites (as many contributors seem to), I decided to give a strict, data-driven answer to the question: in a sample of the first five URLs I type into my browser every morning, which are the most common? Although my content consumption is divided heavily among apps on my mobile device and desktop and browser-based apps on my laptop, I chose, for time and feasibility’s sake, to focus on my browser history. I hypothesized that the data would show a few major content sources, mostly browser-based channels (such as Twitter and Prismatic), followed by a long tail of heterogeneous content they directed me to.

I use Chrome, so the clear route was to analyze the SQLite database where Chrome stores its history. On a Mac, this is located at ~/Library/Application Support/Google/Chrome/Default/History. Had I prior experience analyzing SQLite with Python, I could have written something that worked with this file directly. This not being the case, I exported a CSV of results from the following query:

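-- visit_time counts microseconds since 1601-01-01 UTC; 11644473600 s is the offset to the Unix epoch
SELECT datetime((visits.visit_time / 1000000) - 11644473600, 'unixepoch'), urls.url
FROM urls, visits WHERE urls.id = visits.url
ORDER BY visits.visit_time;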
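
For anyone who would rather skip the manual CSV export, Python’s standard-library sqlite3 module can run essentially the same query (ordered by visit time) against the History file directly. This is a sketch under a few assumptions, not what I actually ran: it copies the database to a temporary directory first, since Chrome may keep the live file locked while it is running, and the names export_history and HISTORY are hypothetical.

```python
import csv
import shutil
import sqlite3
import tempfile
from pathlib import Path

# Default Chrome profile location on a Mac, as given above.
HISTORY = Path.home() / "Library/Application Support/Google/Chrome/Default/History"

# visit_time is microseconds since 1601-01-01 UTC; 11644473600 s converts to Unix time.
QUERY = """
SELECT datetime((visits.visit_time / 1000000) - 11644473600, 'unixepoch'), urls.url
FROM urls, visits WHERE urls.id = visits.url
ORDER BY visits.visit_time
"""


def export_history(history_path, csv_path):
    """Copy the History database, run QUERY on the copy, write rows to csv_path, return the row count."""
    with tempfile.TemporaryDirectory() as tmp:
        # Work on a copy so a running Chrome instance doesn't block the read.
        copy = Path(tmp) / "History"
        shutil.copyfile(history_path, copy)
        conn = sqlite3.connect(copy)
        try:
            rows = conn.execute(QUERY).fetchall()
        finally:
            conn.close()  # close before the temporary directory is deleted
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return len(rows)
```

Calling export_history(HISTORY, "history.csv") should produce rows in the two-column format shown below. Note that datetime(..., 'unixepoch') yields UTC; adding SQLite’s 'localtime' modifier as a third argument converts to local time, which matters if you later filter by time of day.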

I cleared my history near the end of May, so this yielded about three months (a pithy 7.9MB) of data in the following format:

"2012-09-01 20:03:15","http://example.com/therest/ofthe/url"

I wrote a few lines of Python that look at each row of the CSV and add each day’s first five unique hostnames to a dictionary. At the end, each hostname is counted, and the results are printed to stdout.
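
The original script isn’t reproduced here, so the following is a hypothetical reconstruction of the logic just described, not the code I ran. It assumes the CSV rows are in chronological order with the two columns shown above, treats “morning” as visits from 6 AM up to (but not including) 11 AM (the window mentioned in the next paragraph), and uses urlparse to extract each hostname. All function names are made up for illustration.

```python
import csv
from collections import Counter, defaultdict
from datetime import datetime
from urllib.parse import urlparse

# Assumed "morning" window: 6 AM <= hour < 11 AM.
MORNING_START, MORNING_END = 6, 11


def first_five_hosts(rows, per_day=5):
    """Map each date to its first `per_day` unique hostnames visited in the morning.

    `rows` is an iterable of (timestamp, url) pairs in chronological order,
    with timestamps formatted like "2012-09-01 20:03:15".
    """
    days = defaultdict(list)
    for stamp, url in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if not (MORNING_START <= when.hour < MORNING_END):
            continue
        host = urlparse(url).hostname
        hosts = days[when.date()]
        if host and host not in hosts and len(hosts) < per_day:
            hosts.append(host)
    return days


def count_hosts(days):
    """Count on how many days each hostname appeared among that day's first five."""
    return Counter(host for hosts in days.values() for host in hosts)


def main(csv_path):
    """Read the exported CSV and print hostnames by frequency, most common first."""
    with open(csv_path, newline="") as f:
        days = first_five_hosts(csv.reader(f))
    for host, n in count_hosts(days).most_common():
        print(n, host)
```

Calling main("history.csv") prints one line per hostname, most frequent first, to stdout.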

The resulting data needed some tidying. There were analogous hostnames, such as drive.google.com and docs.google.com, which could be consolidated. As well, I use Twitter via a desktop client, not Twitter.com. T.co, the hostname of Twitter’s URL shortener, scored very highly, but rather than trace these back to their original URLs, I opted to count them as Twitter.com visits. Interestingly, Netflix ranked highly with a 15% share. I don’t watch Netflix in the morning; rather, it registered because Netflix is often the last website open in my browser at night. In the morning, when I wake my computer, the open tab refreshes, gaining a post-6 AM, pre-11 AM entry in the database. I chose to remove it from the results.
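
The cleanup described above amounts to a small mapping step applied to the hostname counts. In the sketch below, the alias table and exclusion set are illustrative, reconstructed from this paragraph rather than copied from my script, and the input is assumed to be a collections.Counter of hostnames; tidy is a hypothetical name.

```python
from collections import Counter

# Hypothetical consolidation rules based on the cleanup described above.
ALIASES = {
    "drive.google.com": "docs.google.com",  # analogous Google hostnames merged
    "t.co": "twitter.com",                  # shortened links counted as Twitter visits
}
# Tab left open overnight that refreshes on wake, not a real morning visit.
EXCLUDE = {"www.netflix.com", "netflix.com"}


def tidy(counts):
    """Merge aliased hostnames, drop excluded ones, and return percentage shares, largest first."""
    merged = Counter()
    for host, n in counts.items():
        if host in EXCLUDE:
            continue
        merged[ALIASES.get(host, host)] += n
    total = sum(merged.values())
    if not total:
        return {}
    return {host: 100 * n / total for host, n in merged.most_common()}
```

Keeping the rules in two small tables makes each judgment call (what to merge, what to drop) explicit and easy to revisit.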

The data seem to at least partly confirm my hypothesis: Twitter, email (mainly listservs and News.me), and Prismatic, all aggregators, lead the pack, followed by a long tail of diverse content sources. A clear next step would be to analyze the from_visit field of visits in the long tail, to see whether the referring visits do in fact trace back to the top aggregators. All in all, the exercise does seem to illustrate stream-based browsing habits, and the idea that more and more content is fluid, less and less tied to specific websites as vessels.
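
That from_visit follow-up could start from something like the sketch below. It assumes Chrome’s visits table stores the id of the referring visit in from_visit (0 when there is none) and walks that chain back to its first visit, returning that visit’s hostname. The function referring_root is hypothetical, and real histories may have broken chains (for example, after history is cleared), which the sketch treats as the end of the chain.

```python
import sqlite3
from urllib.parse import urlparse


def referring_root(conn, visit_id, max_hops=50):
    """Follow visits.from_visit back from visit_id and return the hostname of the earliest visit found.

    Stops when from_visit is 0/NULL, points at a missing row, or max_hops is
    reached (a guard against unexpected cycles). Returns None if visit_id itself is missing.
    """
    current = visit_id
    host = None
    for _ in range(max_hops):
        row = conn.execute(
            "SELECT visits.from_visit, urls.url FROM visits "
            "JOIN urls ON urls.id = visits.url WHERE visits.id = ?",
            (current,),
        ).fetchone()
        if row is None:
            break
        parent, url = row
        host = urlparse(url).hostname
        if not parent:
            break
        current = parent
    return host
```

Run against each long-tail visit, a tally of referring_root results would show how many of those visits began at one of the top aggregators.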

Take a Picture, It’ll Last Longer…

Aug 28 2012

Last week, early web folklorist, OG net artist, and friend of Rhizome Olia Lialina wrote a post that dug at Art.sy for how severely their image processing system had mangled an image of her piece My Boyfriend Came Back From The War. Despite comparing the problem (for the lulz) to the recent destruction of a 19th-century fresco, Olia is correct: the image of her work, as processed by Art.sy’s system, does look pretty bad. This is just one manifestation of an underlying problem I have been pondering lately: how can documentation of works that are screen-based, and inherently low-resolution, exist within systems that are designed specifically for high-resolution documentation of works that exist in the physical world?

For a while now Rhizome has been sharing records and images of works from the ArtBase with a handful of carefully chosen fine art image databases. It’s nice to see lesser-known computer-based works alongside more established artists and media, and we like the idea of exposing our collection and the history of art engaged with technology to a broader audience. Every time we begin one of these projects we are faced with the same conundrum: image specifications. Image collections such as ArtStor, Art.sy, and Google Art Project all serve high-resolution images of paintings, prints, photographs, and objects. The user experience of these platforms is engineered to best represent documentation of an object that exists in the physical world. However, nearly all artworks in the ArtBase are screen-based, be they software, websites, video, or animated GIFs. This means that these works are inherently low-resolution. With computer- or screen-based works, there is often no finer grain of visual detail than native screen resolution. In documenting these works, we are not faced with the bottomless pursuit of capturing (or exceeding) human perception, as with the documentation of physical works of art; the pixel is the lowest level of detail. Furthermore, when endeavoring to capture images of authentic renderings (i.e. in a period-specific web browser and operating system), the dimensions of the image are (or at least, in some cases, should be) limited to the native resolution of displays of the time when the work was created.

Detail of My Boyfriend Came Back From The War

For example, the image of My Boyfriend Came Back From The War we shared with Art.sy (seen here) is a 746 x 436 px lossless PNG screenshot of the website, as rendered by Netscape Navigator 3.0 (1996) running in Mac OS 9.0 (1999) emulated by SheepShaver. Although the image was cropped to remove the operating system’s graphical user interface and the outer frame of the web browser, it still possesses inherent historic accuracy and artifactual and evidential quality. The dimensions of the image could have been slightly smaller or slightly larger, but they were defined by what was a comfortable browser window size within the emulation, which was set to a resolution (800 x 600) appropriate to typical hardware of the time. As well, the images embedded in Olia’s HTML have variable, percentage-based widths and adjust to the size of the browser window. This reinforces the importance of the size of the rendering, as modern browsers scale images with a smoothing, blurry interpolation algorithm, unlike the browsers of the time of the work’s creation. The delicate and sensitive nature of screen capture images is significant: any scaling or heavy-handed compression can easily destroy the subtle artifactual qualities that the image was carefully designed to capture. With screen graphics, especially text and images from the early web, a difference of a few pixels can completely alter the feeling of a work.

Detail of My Boyfriend Came Back From The War, as processed by Art.sy

It is unsurprising that Art.sy’s system mangled the image so severely, as it is designed for downscaling very high-resolution images, not upscaling low-res ones. Here are a few thoughts on how the system could potentially handle intentionally low-res images of born-digital materials:

1) Do nothing: do not scale the images, and use lossy compression with care.
2) Improve the image processing methodology to be adaptive to images that are intentionally low-res. I am guessing that when high-resolution images are uploaded to the Art.sy CMS, the system derives a set of progressively smaller images that can be fed to the image-zooming viewer. A reverse (mirror-image) version of this process could be developed, where instead of scaling down, the images are scaled up using nearest-neighbor interpolation at each level. In theory the original image would be the smallest level, and zooming in the image viewer would appear to provide a strict enlargement of the original pixels.
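
The mirrored pyramid in option 2 can be sketched in a few lines. This is a hypothetical illustration, not anything Art.sy has described: it represents an image as a list of pixel rows and builds zoom levels by integer nearest-neighbor enlargement, so each original pixel becomes an exact block of identical pixels rather than a blurred gradient. A production system would do this with an imaging library’s nearest-neighbor resampling rather than pure Python; upscale_nearest and zoom_levels are made-up names.

```python
def upscale_nearest(pixels, factor):
    """Enlarge a 2-D grid of pixels by an integer factor using nearest-neighbor.

    Every source pixel is repeated `factor` times horizontally and vertically,
    so no new (interpolated) colors are introduced.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out = []
    for row in pixels:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))  # copy so rows are independent
    return out


def zoom_levels(pixels, levels=4):
    """Return the original image followed by 2x, 4x, 8x... enlargements.

    The original is the smallest level, the mirror image of a downscaling pyramid.
    """
    return [upscale_nearest(pixels, 2 ** i) for i in range(levels)]
```

Sticking to integer factors is the key design choice: a non-integer scale forces some pixels to be duplicated more than others, which is exactly the kind of few-pixel distortion described above.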

Speaking realistically, Art.sy is a unique entity among the image repositories we are talking about. They have an in-house team of talented and curious engineers constantly working on improving the platform, which of course is still very new. They are thinking about how to attack this problem as I type. I seriously doubt that larger, older platforms with fewer resources, or a different engineering culture, would be able to invest in developing new image processing solutions for what is a very small subset of their content. In light of this, it behooves archivists and conservators of computer-based works to consider documentation strategies that gel with these existing systems. Furthermore, although screenshots are the reigning paradigm in the documentation of computer-based works, do they really do the work justice in these contexts? If not, why should platforms invest in accommodating them? One strategy, used by SFMOMA when contributing documentation of Miranda July’s web-based Learning to Love You More to Google Art Project, was to tile many screenshots into one high-res image.

While this strategy solves the problem of resolution, the result just doesn’t feel right. It amplifies what I feel is the problem with screenshot-based documentation: it denies the work any broader context. Lossless screenshots of computer-based works are immensely valuable for preservation purposes, but this approach completely neglects the physical aspect of the works. Software is not experienced in a disembodied graphical space; we interact with it through machines. If one of the major driving forces behind sharing with these image repositories is education, it seems logical to employ a documentation strategy that is simple and effective in visually communicating the context of these works, not simply a strategy that meets the image specifications. We are beginning to employ a documentation strategy at Rhizome that will touch all of these bases. It’s quite simple, really: take a picture.

Rafaël Rozendaal’s falling falling .com

The two photos above (the latter taken with my iPhone) are not meant as examples of quality documentation; I just happened to have them on hand. They are, however, exemplary of how instantly readable a still image of a web-based work of art is when it depicts the work from the perspective of the viewer, not the computer. Such documentation does not replace lossless screenshots of authentic renderings, but in the context we are speaking of (image repositories that are designed for handling high-resolution content, and which have a diverse audience) such photos are arguably far more evocative of the work and more educational in terms of historic context and technology, and they are inherently more durable under image processing and compression. Of course, there are significant setup costs involved in producing this type of documentation: camera, lighting, and period-specific hardware. In some cases there are software shortcuts that can be taken if hardware isn’t your thing. For example, document the work displayed on a CRT display of the proper vintage, but rather than going to the trouble of setting up a vintage Mac or PC, connect the display to a modern computer running a fullscreen emulation. This approach also requires less maintenance: a library of virtual machines is far more stable than a collection of vintage computers.

It will take some time as we go about collecting the hardware, purchasing a camera and lighting, and developing a workflow (computer displays, especially CRTs, are tricky to photograph), but Rhizome should be able to start producing documentation under this new rubric (high-resolution, photographic, historically accurate hardware [not just software]) in the very near future. Until then, perhaps we’ll see something from Art.sy that does a better job of handling sensitive, pixel-perfect historic screenshots.

Media Archeology: The VODER

Aug 08 2012

Voder demonstration at the 1939 World's Fair

I wrote a piece for Rhizome about an object that is currently on display at the New Museum for the Ghosts in the Machine exhibition: Homer Dudley’s VODER. It’s a really fantastic piece of history that arguably ushered in the modern era of speech synthesis, and influenced culture in some very significant ways. Here’s the full article, and here, for your enjoyment, is a six-minute demonstration of the VODER.

Storify Is Bad For Preservation

Jun 16 2012

tl;dr: Storify is not a Twitter archiving tool, but it easily could be.

After the great conversation at #ArtsTech on 6/13, I collected tweets from the evening [see them here] using Storify. It was the first time I’d ever used it. My takeaway echoes that of most people who have used Storify: fantastic.

However, there is one major gap that Storify isn’t addressing, one that would be trivial for them to fix but would have a major impact on the landscape of personal digital preservation tools. To summarize the issue: Storify is a black-box service. When it inevitably ceases to exist, so too will all of the stories and narratives that people have documented with it.

First things first. If you’ve never used or seen Storify, it is a free service that lets you search for tweets and arrange them into a linear narrative. It’s good for documenting small-scale things like a conversation, and large-scale things like conference hashtags. It has been well documented that Twitter’s search index is very shallow, chronologically speaking, hence the need for such tools. There is hardly a shortage of Twitter archiving tools: from IFTTT recipes to ThinkUp and various homebrew solutions, there are options aplenty.

Where these all fall short (and where Storify excels) is in facilitating hand-selection and producing a decent, human-readable look and feel in the style of a Twitter conversation. Storify makes it easy to hand-pick tweets, or to start broad with an entire hashtag and edit down from there. The end result maintains the look of a content stream, including avatars and a “prettified” timestamp (i.e. “3 days ago”). You can retweet or reply to tweets directly from a finished Storify, which facilitates continued conversation rather than rendering a static archive.

The great thing about all of the other Twitter archiving tools I mentioned is that they provide you with a local copy of the data. When you use these tools, you are essentially creating a backup. When the makers of those tools close up shop, you will still have your archive of tweets in a relatively platform-agnostic format. Storify does not let you locally save and archive any of the content you create with it. It does provide an “export” feature, which embeds your Storify on a site powered by WordPress, Drupal, Tumblr (and a few other platforms). While at first glance this looks great, it is entirely misleading.

Taking a look at what Storify actually posts to your site, every last bit of it (from JS to images and CSS) is hotlinked. Meaning: when Storify goes down, so will the content you’ve “exported.” To boot, they use infinite scroll JavaScript, so web archiving with a web crawler is pretty much out of the question. Of course, there are simple ways to mitigate this: print a PDF of the page, do a “save as webpage”, etc. This seems beside the point, though. The point is that Storify has built what is essentially the most “human” tool for archiving and presenting interactions on Twitter. If they were to provide a true “export” feature that allowed users to back up their Storify content locally, they would be in a position to be one of the most comprehensive personal digital preservation tools for Twitter.
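
A quick way to see how much of an “export” would vanish with the service is to list the resources it pulls from other hosts. The sketch below is a hypothetical helper using Python’s standard-library HTML parser; it assumes you have the embed code as a string, and it only inspects src on script, img, and iframe tags and href on link tags, so it is a rough audit rather than a complete one. The function hotlinked_resources is a made-up name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tag -> attribute that loads an external resource (a deliberately small subset).
RESOURCE_ATTRS = {"script": "src", "img": "src", "iframe": "src", "link": "href"}


class _ResourceCollector(HTMLParser):
    """Collects resource URLs referenced by the tags listed in RESOURCE_ATTRS."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(value)


def hotlinked_resources(html, page_url):
    """Return absolute URLs of resources in `html` served from a host other than page_url's."""
    parser = _ResourceCollector()
    parser.feed(html)
    page_host = urlparse(page_url).hostname
    external = []
    for ref in parser.urls:
        absolute = urljoin(page_url, ref)  # resolves relative and protocol-relative URLs
        if urlparse(absolute).hostname not in (None, page_host):
            external.append(absolute)
    return external
```

Anything this reports is content your site does not actually hold; an empty list is a reasonable (if partial) sign that an export is self-contained.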

ArtsTech: Digital Conservation

Jun 14 2012

Keyboard Archeology

Jun 02 2012

I came across an interesting question on Twitter a few days ago that sent me spiraling into a brief bout of research. Via Matthew Kirschenbaum, Matt Schneider posed the question of when the greater-than (>) and less-than (<) symbols first appeared on keyboards. I managed to come up with two contenders.

Above is the 1955 Olympia SM3 De Luxe (with science and math keys). It seems that typewriters beat computers to the <pun>punch</pun>: some early typewriter keyboard layouts included mathematical symbols at a time when computers were programmed in assembly languages coded/punched on keyboards whose layouts did not include scientific or mathematical symbols (or were programmed on colossal Univac keyboards). Looking at early-to-mid-20th-century IBM card punches and keypunches, one can observe that while keypunches with an alphanumeric “repertoire” emerged in 1933 with the IBM Type 032 Printing Punch, this keyboard layout was strictly alphanumeric.

Above is the keyboard layout of the IBM 026 (1949). The earliest example I managed to dig up of a “computer” keyboard with the greater-than (>) and less-than (<) symbols was the IBM 029 Card Punch, from 1964.

(via http://www.columbia.edu/cu/computinghistory/029.html)

In the category of “close, but no cigar, but nonetheless interesting” we have the Smith-Corona Classic 12 (Greek), which included ten keys of Greek symbols (but none useful for equations).