Tumblr Backup Options: None of them do everything
Cheeky but true. I'll go through what's good and bad about each option though, so you can decide which trade-offs work for you.
Covered: native export, WordPress (kinda), TumblThree, tumblr-utils (kinda)
Native Export
If you go to "https://www.tumblr.com/settings/blog/yourblogname", there's an export option at the bottom of the page.
Once you hit the button to start the request, it will start processing. Feel free to log off; this is going to take a few hours, and you don't need to keep it open. ~22k posts took roughly a day for me. If you have a small number of posts and it gets stuck, your export is probably broken.
When it's done processing, you can hit the download backup button and then wait some more for the zip file to download. Mine failed the first time after about twenty minutes, and then I had to start over. I think it took 1-2 hours, and I'm almost certain that was on Tumblr and not my internet. And that was just the zip file! So make sure your computer can stay on for a while before getting this started.
So what do you get?
- A media folder, conversations folder, and posts folder
- Media folder: Every single photo, gif, and video that has ever been on your blog or in your DMs. There is no context data attached (except for DM images, which at least say which conversation they're from), but they seem to be in chronological order because they're titled by the post's ID (the string of numbers in the address bar after "/post/"). They look like "100868498227", "100868498228_0", "100868498228_1"
- When you see something end with "_0" and up that means the photos are in the same post, so _0 represents the first image in the post, _1 represents the second, etc (at least, I think).
- Conversations folder: HTML export files of every DM history you have on your blog. These are actually pretty well formatted, see example here.
- Posts folder: html subfolder and posts_index.html file
- posts_index.html: File listing every single post on your blog by post ID on its own line with no other context. Example of a line: "Post: 780053389730037760". The ID number will link to the post in the html folder
- html subfolder: contains a submissions subfolder and stripped html file versions of every post on your blog. See below first what the post looks like on Tumblr, and second what the post looks like in the html folder
- The way you seem to be intended to use this is to open the file index, select a post ID, and be jumped to where that post is saved as an html file, but I don't know why you would bother when the index doesn't provide any information about the posts inside it. The posts all have extremely minimal formatting. See a reblog chain below.
- Notice I said ALL posts on your blog. Photo posts without a caption will just have a broken image icon and then the date and tags. Theoretically, unzipping the entire export folder might let the posts automatically link to the images saved in your media folder. I have no fucking idea; unzipping the folder was estimated to take two hours, so I didn't do it. Let me know if you do though so I can update this post!
- The submissions folder is such a rabbit hole that I made a post just on it, but long story short, it's asks you haven't replied to
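If the two-hour unzip is what's stopping you, here's a minimal Python sketch that pokes around inside the zip without extracting everything. It's untested against a real export, so treat it as a starting point built on assumptions: that media files follow the post ID + "_0", "_1" pattern guessed at above, and that the media, conversations, and posts folders sit directly inside the zip you downloaded. The function names are my own.

```python
import re
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: media files are named "<postID>" or "<postID>_<n>" plus an
# extension (e.g. "100868498228_1.jpg"), going by the names seen in the export.
MEDIA_NAME = re.compile(r"^(\d+)(?:_(\d+))?$")


def group_media_by_post(names):
    """Map post ID -> that post's media files, ordered by their _N suffix."""
    groups = defaultdict(list)
    for name in names:
        match = MEDIA_NAME.match(PurePosixPath(name).stem)
        if not match:
            continue  # folders, DM images, or anything else named differently
        post_id, index = match.group(1), int(match.group(2) or 0)
        groups[post_id].append((index, name))
    # Sorting by numeric post ID should roughly follow posting order
    return {
        post_id: [name for _, name in sorted(files)]
        for post_id, files in sorted(groups.items(), key=lambda item: int(item[0]))
    }


def extract_folder(zip_path, folder, dest):
    """Extract only the zip members that live under a folder named `folder`."""
    with zipfile.ZipFile(zip_path) as zf:
        wanted = [n for n in zf.namelist() if folder in PurePosixPath(n).parts[:-1]]
        zf.extractall(dest, members=wanted)
    return wanted
```

For example, `extract_folder("tumblr_export.zip", "conversations", "dm_backup")` would pull out just your DM history (the zip name is a placeholder), and passing the names under media/ from `zipfile.ZipFile("tumblr_export.zip").namelist()` to `group_media_by_post` lists which images belong to which post without extracting anything.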
What do I see as the main reasons to opt for this option?
1) you don't want to download any programs or files from the internet just to back up your blog, 2) your blog is relatively small, so digging through the ID files isn't a big deal, 3) you mostly just want to download either the images (which will be browsable via thumbnail previews in the media folder if you unzip it) or your conversation history, which is fairly well formatted, 4) you don't need to update your export often/ever, because you'd have to request it from the start and download the entire thing all over again, 5) you want to be able to read your text posts clearly and don't care about preserving the full formatting, and/or 6) you don't plan to reupload this information elsewhere (say on... a WordPress blog)
WordPress Automatic Ex/Import
Move your posts from Matt's right hand to his left! WordPress (another product of Automattic) has a native Tumblr importer, found in your site's WP Admin dashboard under Tools > Import > Tumblr.
How does this work? No idea! I hit import 2 days ago and it has done nothing. Maybe I'm stuck, maybe it's permanently broken. It says to contact support if it's been over 24 hours but they don't make that easy. I disconnected from Tumblr (you can only port over a blog you have the login of) and reconnected and it still said it was importing. I don't think it's ever going to do anything.
Presumably it's supposed to 1:1 import every post on your blog onto the WordPress site, which will result in a whole lot of stolen art because there's no way to select just your original posts. Also, you'd need enough storage on your webhost to house all the posts (this honestly might be my problem, but I was planning to delete all the non-original posts once it imported.... anything and backfill what it didn't get to). The one thing I'll say about this option is that it's the only one I've seen so far that exports drafts and queues as well.
I mean, if it exported anything. If this ever does anything I'll update this post, but either my blog is too large or this tool isn't totally functional anymore.
TumblThree
(previously TumblTwo, etc)
TumblThree is an all-in-one program requiring no extra downloads beyond the main zip, and was last updated fairly recently at the time of this post. To run it, unzip it into one folder and run the main .exe. It has a full UI with lots of very descriptive helper text, so you can pick the right options without looking at the wiki. I think it's user-friendly for non-tech people.
There are a lot of options in TumblThree to change what output it gives you, but I'm going to start with the largely universal parts first:
- Everything from one blog will be exported to one folder, no subfolders or sorting. As a result, the output is very messy and difficult to wade through, but post metadata and the photos are named in the same way so you can scroll, see an image preview, and then click on the metadata txt for that post and read the caption.
- Depending on your settings, you can export all photos, videos, text posts, etc as their own files or exclude them from the export entirely. For the different types of media posts, you can independently select whether you want to download just the media, just its metadata (everything that surrounds the post when you see it on Tumblr, such as the caption, OP, tags, etc), or both.
- Master txt file: For every type of media metadata you export, a correspondingly named txt file will be created (images.txt, answers.txt, etc) that contains the text/metadata of every post of that type in one txt file. This is also the default behavior for exporting text posts.
- Note: for text posts (which include asks/answers), it only creates a master txt file if you do not select "Save texts as individual files"; if you do select it, it will only save each text as an individual txt file and won't make a master file.
- The formatting on these files is so brutal I won't even give examples; they're unreadable. A .txt file has no native formatting, so the text is exported with its raw HTML tags.
- Example: instead of a post that says "I want to go swimming", it exports "I want to go < b >swimming" (minus the spaces around the b) as the post body. All those raw tags are a big part of what makes it unreadable, especially since the header information listed below is full of hyperlinks.
- Each post in the master txt exports with: Post ID, date, post URL, slug, reblog key (no idea what that is), reblog URL, reblog name, title, [the text/caption itself], and tags.
- Theoretically this means you could ctrl+f "cybertrucks" in the master txt file and then browse all your posts making fun of Tesla owners by tabbing through the results. This is not possible with any of the previous options, and is only possible because it's all in one file, as ridiculous as it is, which is why getting that master file is so important.
- For the trick to get both the individual text posts and the master text.txt & answers.txt files, as well as my recommended settings and details on how updating backups works, see the read more at the end of this post.
- The images.txt includes all the information listed above, but with the following additions: photo url (NOTE: this is the url on Tumblr, not a link to where it is in your folder), photo set URLs, photo caption, and "downloaded files" (NOTE: this is the name of the file it has downloaded)
- The video.txt is similar to the above
- The use case for this would be similar to what I described for text posts above: search keywords from captions, tags, etc and when you find what you think is what you want, copy the name from "downloaded files" and search your folder to find the actual image
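Since the unreadable part is mostly raw HTML tags, one workaround is to strip them out. Here's a minimal Python sketch using the standard library's html.parser; it's my own approach, not something TumblThree does, and which tags count as line breaks (BREAK_TAGS) is an arbitrary choice you can adjust.

```python
from html.parser import HTMLParser

# Tags that should start a new line in the plain-text output (my own choice)
BREAK_TAGS = {"p", "br", "div", "li", "blockquote", "h1", "h2", "h3"}


class PlainText(HTMLParser):
    """Collects only the text between tags; entities like &amp; are decoded."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in BREAK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Turn an HTML-ish post body into readable plain text."""
    parser = PlainText()
    parser.feed(fragment)
    parser.close()
    lines = "".join(parser.parts).splitlines()
    # Collapse runs of whitespace inside each line and drop empty lines
    return "\n".join(" ".join(line.split()) for line in lines if line.strip())
```

Run over a whole master file (say, images.txt), links collapse to their visible text, so the header lines stop being a wall of URLs. It won't be pretty, but the captions become readable.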
I really hated TumblThree's output the first time I looked at it and then I realized the single file is the only way to make browsing tags workable, because otherwise you would have to have a folder for every tag, and posts with multiple tags would have to be duplicated between them. I'm not pressed on finding a txt to HTML converter right now but it could be an option in the future if you wanted to make things more readable.
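If you do want a converter, the simplest version I can think of doesn't really convert anything: since the text already contains HTML tags, you can wrap it in a bare HTML page, turn the line breaks into break tags, and let your browser render it (and ctrl+f still works there). A rough Python sketch, assuming the txt file is UTF-8; the function names are hypothetical, and a post with unclosed tags could spill its formatting into the posts after it.

```python
import html
from pathlib import Path


def txt_to_html_page(raw_text, title="Tumblr backup"):
    """Wrap text that already contains HTML tags in a minimal HTML page.

    Newlines become <br> so the header lines stay on separate lines;
    existing tags are passed through untouched so the browser renders them.
    """
    body = raw_text.replace("\r\n", "\n").replace("\n", "<br>\n")
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


def convert_file(txt_path):
    """Write e.g. images.txt next to itself as images.html and return the new path."""
    src = Path(txt_path)
    out = src.with_suffix(".html")
    page = txt_to_html_page(src.read_text(encoding="utf-8"), title=src.stem)
    out.write_text(page, encoding="utf-8")
    return out
```

Opening the resulting .html file in a browser should show bold, links, etc. the way they looked on Tumblr, minus the theme.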
Okay, let's get into the non-universal stuff you can customize in settings, because it's like, everything:
- File names: We've already established you can search with the downloaded file name for images, but what will that be? Whatever you fucking want. Post date, reblogger name, post ID, post title, original file name, you can make it any and all of these in any order you want! You can have actually useful file names! Personally I like %e_%p_%q_%i_%x which exports as DateTime_PostTitle_BlogOriginName_PostID_IteratingNumber (note: you need some kind of unique iterator to be valid so two files don't have the same name, such as multiple photos from one post). Look how much searchable information that gives me, in chronological order! It decreases your need for the master txt file.
- Tip I wish I'd thought of before doing my massive export: make one of the unique headers from the master txt file part of the exported file name so it's easy to search for it after identifying it in the master file.
- Files scanned: this is the only method I've found that lets you back everything up, remember what it backed up, and then lets you add any new posts since that date without having to download the whole thing again. That's a game changer, but see the read more below for limitations.
- You also have the option to rescan the entire thing if you want.
- Post type: T3 (I'm abbreviating it now) also lets you export just your original posts, just reblogs, etc - again, giving you the most control of any options. It also lets you export replies. I, uh, would not do this because if you have any popular post on your blog it might have hundreds, or thousands of replies but hey, you can do it!
- You also have the option to only download posts with a certain tag.
- Blog options: You can export literally any blog you have the URL of. In fact, if you copy a blog URL while it's open, it will automatically add that blog to its UI and create an empty folder for it. It makes it easy, no private key required. I do have mixed feelings about the concept of exporting someone else's blog... but I'm also planning to do it to some of Crew-ra's blogs so... my digital hoard must grow.
- You can also queue up blogs and leave it to run through a lot of them. It is a lot faster than Tumblr's native export: I started this export well after I started typing this post and it took a few hours, probably not all that much longer than just downloading Tumblr's export took (and that's while running it alongside other data copy operations, because I'm backing up a lot of stuff right now).
- I do recommend doing a test export with a sideblog. I was able to use wild-bitchofthenorthwoods as a test since it only has one post and that post has media, so it was super quick.
- (I do want to note: I think the number of downloadable items starts out matching the number of posts on your blog, since it doesn't scan them until you start the export. But if you choose to export everything as its own file, you're going to end up with way more than that, because a post with three images becomes multiple files.)
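On the file-name tip: if the post ID is one of the underscore-separated pieces of the file name (like it is in the %e_%p_%q_%i_%x template above), finding every file for a post you spotted in the master txt gets easy. A minimal Python sketch, assuming that kind of naming template; the function names and folder path are placeholders, not anything TumblThree provides.

```python
from pathlib import Path


def name_has_post_id(file_name, post_id):
    """True if post_id is one whole "_"-separated field of the name (extension ignored)."""
    return str(post_id) in Path(file_name).stem.split("_")


def files_for_post(folder, post_id):
    """List every file in a TumblThree output folder that belongs to one post."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and name_has_post_id(p.name, post_id)
    )
```

Matching the whole field (instead of just searching for the digits) avoids false hits on longer IDs that happen to contain yours, and the metadata .txt for the post shows up alongside its images since they share the same name.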
Things T3 cannot export:
- Since in its simplest form it's just accessing the public upload of your blog, it cannot export your drafts, queue, or conversations
- It cannot export posts as HTML files, and thus cannot export them with readable formatting natively
What do I see as the main reasons to opt for this option?
1) you don't care about exporting your DMs/conversations, 2) you want the ability to export only certain kinds of posts (original, photos, using a tag, etc), 3) you want to control the titles of the exported files, 4) you don't mind wading through massive folders, 5) you want the ability to search tags (using the txt files), 6) you want the ability to update your export without starting over from the beginning, 7) you either don't want to reupload this information somewhere else, or you want to upload it somewhere that supports automatic HTML conversion (for instance, you can switch a Tumblr post from rich text format to HTML, same with AO3, so you can put it in as HTML and then hit post to see it turn into rich formatting. This technically makes T3 the most versatile/useful export option if you're planning to do anything with it other than browse your own files).
tumblr-utils
Full disclosure: haven't tried this one. But others have! tumblr-utils is a no-UI, Python-based backup tool. This means in order to use it you have to type commands into the terminal. If you don't know what I just said, don't use this one.
If you do, you'll need to separately download Python and youtube-dl just to get this one running. You'll also need to give it your personal Tumblr API key and feed it commands deciphered from the wiki page I linked. Here are two different guides people have written on how to use it. Output:
- Obviously I'm guessing based on the documentation, but one thing that is nice is this tool allows you to save each post in its own folder. Presumably each post is multiple files like we saw with T3, so this would make it easy to group them, but it also means you'd have to look in every single folder to find anything.
- It seems to break posts up into timestamp folders by month, again, helping with management to narrow down where you have to search
- It allows you to save only certain kinds of posts at a time like T3
- It allows you to backup posts only from a certain time period (so if you keep a little .txt note of the last time you backed up, you can easily add only the new posts into your backup without having to start over from the beginning)
- It allows you to only save posts under a certain tag like T3
- It allows you to save only original posts
- It's the only one I've found that lets you back up your liked posts
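Since this one is all command line anyway, here's a rough Python sketch that builds the command for the options above. The flag names (--dirs, --likes, --no-reblog, --tags, --incremental) are as I remember them from the tumblr-utils README, and the script name tumblr_backup.py assumes the original repo; forks and newer versions may differ, so check --help on your copy before trusting any of it. It only prints the command (it doesn't run anything) and assumes your API key is already set up per one of the guides. The README also lists an incremental mode, which as I understand it only grabs posts newer than the ones already in your backup folder, so it might save you the little .txt note.

```python
import shlex


def tumblr_backup_command(blog, *, per_post_dirs=False, likes=False,
                          originals_only=False, tags=(), incremental=False):
    """Build (but don't run) a tumblr-utils command line as a list of arguments."""
    cmd = ["python", "tumblr_backup.py"]
    if per_post_dirs:
        cmd.append("--dirs")          # save each post in its own folder
    if likes:
        cmd.append("--likes")         # back up the blog's likes instead of its posts
    if originals_only:
        cmd.append("--no-reblog")     # skip reblogged posts
    if tags:
        cmd += ["--tags", ",".join(tags)]  # only posts with these tags
    if incremental:
        cmd.append("--incremental")   # only add posts newer than the existing backup
    cmd.append(blog)
    return cmd


# Print the command so it can be pasted into a terminal (shlex.join needs Python 3.8+)
print(shlex.join(tumblr_backup_command("yourblogname", originals_only=True, per_post_dirs=True)))
```

"yourblogname" is a placeholder, same as in the settings URL at the top of this post.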
What do I see as the main reasons to opt for this option?
1) you don't care about exporting your DMs/conversations, 2) you want the ability to export only certain kinds of posts (original, photos, using a tag, etc), (okay now we get to the points that aren't also covered by T3), 3) you want posts to export already broken into folders, whether by post or by month, 4) you want to back up your likes, 5) you don't care what file names look like, 6) you're comfortable with the command line/coding and don't need a UI.
Summary:
None of these options are ideal for reuploading your files anywhere (except WordPress), but I do think TumblThree is the best of the options, because the raw HTML in its txt files is useful for websites that support automatic conversion (or require HTML input).
For starting another blog, WordPress wins. If it works. I'm trying to be generous here.
For searchability, T3 wins again.
For versatility... yeah you know it's T3, but tumblr-utils has a lot of the same features, too!
For sentimentality (aka conversations), it has to be the native export. There literally is not any other option.
For queues and drafts, the only theoretical option is WordPress. If it works.
For likes, the only option is tumblr-utils.
Every option does something the others don't, so theoretically, to cover everything, you'd have to do all four. Actually, I would say do the native export if you don't have a lot of posts and aren't a freak like me, check it out, and if it doesn't work (I know it's finicky) or you don't like the export, go with TumblThree. This also means you'll at least have your conversations even if you don't end up using the native export any other way.
And I wish it could go without saying, but don't repost people's shit, y'all. I'm backing up everything for my records only and it will never be shared with anyone else, or even browsed as long as using Tumblr instead is an option.
TumblThree adding to old backup quirks, recommended settings, & master file backup solution: