Simple and Affordable Archival & Backup Solution (for Small Studios Dealing With Large Amounts of Data)

This is an overview of an archival + backup strategy I designed while working at a small (4-person) video production company back in 2009. There was absolutely no backup (let alone a system for archiving past work) in place when I started. Things were in a state of absolute chaos. Data and projects were literally everywhere: spread across 4 workstations, dozens of external harddrives, old PCs stuffed in a closet, a NAS or two. Just shit everywhere. And the kicker? Not a single file backed up anywhere to be seen. Finding a project meant hopping onto a bunch of different machines to hunt for it, then starting the painful process of working out which copy was the latest, most up-to-date version (there was no versioning in place either). There was kind of an archive of burned DVDs in a case, but it was next to impossible to use in any capacity as it was just a pile of hand-labelled miscellaneous discs. This was the only thing that remotely constituted a 'backup' of anything.

This is a look at the system I set up to handle the archival & backup of completed projects. In a future post, we'll take a look at how we handled the in-studio backup for day-to-day work, but first let's take a look at the archival method that involves creating a library of bare, dockable harddrives.

Archival & Backup is a very unglamorous and annoying part of what we do, nobody's going to deny that. But what's very important to understand is that the second a client trusts you with a job, it's your responsibility to ensure that you're taking the necessary precautions to protect their data. In this instance I'm referring more to event-type work, i.e. things that you shoot that only happen once. Failing to back up your design work and losing it isn't a complete disaster, as you can always just re-do it (albeit losing a lot of time in the process). But you can't redo someone's wedding, and you can't redo some corporate event that you shot last year if you lose it.

The particular requirements for this system were vague at the beginning. Basically there was nothing, except the understanding that something needed to be done about it. It was no way for any operation to run. Because there was such a backlog of projects from years of neglect and pileup, it wasn't really feasible or advisable to just buy a server and dump it all onto mass storage. We needed somewhat of a fresh start, and a plan to move forward.

This archive system is very simple to set up and implement, has an extremely low cost per GB, and should be feasible for almost everyone, especially if you have no system in place right now. The most important thing about backups is that they exist. Plain and simple, you either have them or you do not. There are of course varying degrees of how safe/secure your backups are, but we'll leave that discussion for another time and place. Just think of all the future jobs you'll probably lose if you have a crash, lose your client's data and can't recover it. It can be an expensive lesson to learn. When I was finishing up school I lost a harddrive FULL of work 1 month before graduation. On it was all the work I needed to prep for the year-end show and prepare my portfolio for the job hunt. I got the data back, with the help of cbltech.ca, but it was a very expensive way to learn how inexpensive backing up can be. If you haven't learned this lesson yet and you aren't prepared, you will at some point. Guaranteed.

So, what follows is an archival and backup strategy that I designed and implemented at said small 4-person studio a couple years ago. What we were dealing with was a fairly large amount of project data, and finding room for it was an issue. We needed a system for cleaning up and getting rid of (archiving) old completed projects.

Two things were needed. The first was a server setup to act as the central hub where all active projects are brought together; we will deal with that in a future post. The second was the archival system I'm about to describe, which assumes you have all your completed project data located in one central spot (likely on a server).

The key to this system was that it was a very inexpensive, safe, pay-as-you-grow solution for us. It is something that should work well for small (under 10 person) studios dealing with a large amount of data (video and/or photography). This probably isn't as applicable to individual freelancers, larger studios, design studios not doing any video or motion graphic work, or anyone with a lot of cash to invest in a more robust system. That being said, I don't currently think there is a better solution at this price, which is low to begin with, but literally gets cheaper to operate every week as drives come down in price.

What you need:

  • A "drive toaster", also known as a drive dock (ideally 1/workstation)
  • Bare harddrives (in pairs, at your desired size)
  • Drive Cases
  • Media Catalog (OS X only)
  • Optional:

    • eSATA PCIe card (Mac Pro only) + eSATA drive toaster

    Drive docks (preferably 1/workstation):

    These things aren't exactly invincible. I've had a few die, but they're so cheap to replace it's not really a huge concern. They come in a wide assortment of flavours now: 1-bay, 2-bay, 3 or 4-bay(!), card readers, one-touch backups (forget about this on the Mac though), USB 2.0, eSATA, FW800, etc. Pick your poison. I recommend a dual-bay dock so you can clone two drives simultaneously to save a bunch of time (do it overnight or on weekends in general though). These will run you anywhere from $25-$50, and are available almost anywhere online. All you do is put in a bare harddrive and it mounts on your computer, forever eliminating the need for annoying external enclosures, cables and power supplies.

    Depending on what you're using it for (in this case writing files to and retrieving files from archived drives), you can usually get away with a basic USB 2.0 one. If you think you'll be referencing data on your archives frequently or are concerned about speed, get one with eSATA and drop an eSATA card in your Mac Pro. If you're not using Mac Pros, just stick with USB 2.0 and you'll be fine, as you'll be used to waiting around anyway. :P

    Bare Harddrives:

    How much data you're churning out, how often you want (or need) to archive, and how little you want to spend will determine what size drives you buy to archive to. The beauty of this system though is that it easily scales over time, and basically gets cheaper to maintain as drive sizes go up and prices drop. At the time of writing this, the sweet spot in terms of price/GB is with 1TB drives, with the cost of 2TB dropping rapidly. I would stick with 1TB drives for now if that fits your production needs. This was ideal for us at the time (mainly XDCAM EX & HDSLR video with 1-3 shoot days/week, and on-location event photography). We were archiving on average about 1TB/month most of the time, with a per-project average of maybe 75GB. This meant archiving anywhere from 3-10 projects a month, depending on size, duration, completion date, etc. Another win for this system is the total flexibility to archive as much or as little as you need to each month. Busy month? Just buy an extra set of drives. Slow month? Just hold off and do a double batch next month.

    Notes: You don't really need the fastest drives for this system, since they're just sitting on a shelf most of the time acting as an archive/backup. The cheaper 'Green' drives (lower RPMs & power consumption) are fine for this type of backup, especially if you're concerned about keeping costs low. At the time I was mostly getting the Western Digital Caviar Greens, or the Blues. I avoided the Caviar Blacks as they're faster but more expensive. Remember, because you need a pair of drives, and each drive has an identical twin so a single drive failure is less of a concern, going with the cheapest harddrives is the way to go, so just buy whatever's on sale. I recommend www.CanadaComputers.com if you have one nearby. If you're not around the corner from a cheap computer supply place, I like to use Newegg.ca.

    Also, I wouldn't recommend buying a large supply of drives all at once, because of how quickly harddrives drop in price. I liked to always have a couple of pairs of empty drives ready to go, and would re-order about 4 at a time every 2 months or so as needed.

    Drive Cases:

    You need something to put the drives in once your archive drive & backup drive are created. At the time I went with these WiebeTech cases. I highly recommend these, but they're comically expensive for what is basically $0.04 worth of plastic. Packs of 10 of these at the time were running me like $60+. Recently I found these over at DealExtreme, which at about $2.88 each with bulk-rate pricing seem like a much better deal. I just ordered a bunch of these to try out, so I can't vouch for them yet, but I'm sure they'll do fine.

    Basically anything that protects your drives and is easily labelled and identified will do fine. There are a few other options; you can get fancy or not, it's really up to you. If you're going real ghetto you can just sharpie onto the drive itself and put it back in the package the drive came in. Or, I'm pretty sure they'll fit in old VHS cases, which would also get points for general awesomeness.

    Media Catalog:

    http://halfduplex.net - $24.99

    After looking at a few options, I chose to go with Media Catalog as the software used to track the data that is on the archives. It does this job very well, with minimal frills. This piece of the puzzle is crucial, because it allows you to create a tagged, searchable index of a drive's contents so you can locate projects or files fast. If there are any other options out there that you've found you like better, please let me know!

    So, Now What?

    Once you have all the pieces, it's time to get stuff ready to archive. Exciting stuff I know...

    Organizing your projects for archival will maybe be covered at another time, but for now we'll assume it's organized in some form of system, in a central location. Take the projects that are going to be archived and collect them together. If you're archiving to, say, a 1TB drive, pool roughly 1TB worth of projects together as a batch. Don't split projects over multiple drives, it's not worth the hassle. If projects are so large that they won't fit on one drive, you've got much bigger problems than this system is designed for.
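If you'd rather not eyeball the batch, here is a minimal Python sketch of the idea. It is not something we ran at the studio, just an illustration: the function names (`folder_size`, `pick_batch`) and the largest-first ordering are my own assumptions. It only ever picks whole projects, so nothing gets split across drives.

```python
from pathlib import Path


def folder_size(path: Path) -> int:
    """Total size in bytes of all regular files under a project folder."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def pick_batch(project_sizes: dict[str, int], capacity: int) -> list[str]:
    """Pick whole projects, largest first, whose combined size fits in capacity bytes.

    Projects that do not fit are simply left for the next batch.
    Largest-first is an assumed ordering, not part of the original system.
    """
    batch: list[str] = []
    used = 0
    by_size = sorted(project_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        if used + size <= capacity:
            batch.append(name)
            used += size
    return batch
```

To use it, build `project_sizes` by calling `folder_size` on each completed project folder, and pass a `capacity` a little under the drive's nominal size (a "1TB" drive holds slightly less than 10**12 bytes once formatted).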

    Once you've got your batch of projects ready to archive & backup, pop in your first drive and copy the batch over to it, then do the same with the second drive. Make sure you label each drive appropriately so it's easily identified not only on the shelf, but in Media Catalog. Depending on how much data you are copying this will take some time, so either get a dual-bay drive dock and write both drives at once, or do it overnight or over a weekend.

    The idea is that one drive is the Archive, the other is the Backup (they both have identical contents).

    The Archive goes on the shelf in the studio, and is used when needed to pull things out of the archive. The Backup goes off-site and is safely stored somewhere other than your office.

    Index your Archive drive in Media Catalog. Label the physical drive, put it in a case, label the case & put it on the shelf. Similarly, label the Backup, pack it up and send it to your offsite location (you don't need to index the Backup copy, since its contents are identical). Once this is done and you've verified that the data was copied to both drives successfully, you can delete this data from your server.
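How you verify the copy is up to you. One option, sketched below in Python as an illustration rather than the exact check we used, is to hash every file in the source batch and confirm the same file with the same hash exists on the drive. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large video files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(source: Path, dest: Path) -> list[str]:
    """Return relative paths of files under source that are missing from dest
    or whose contents differ. An empty list means the copy checks out."""
    problems: list[str] = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src_file.relative_to(source)
        dst_file = dest / rel
        if not dst_file.is_file() or sha256_of(src_file) != sha256_of(dst_file):
            problems.append(rel.as_posix())
    return problems
```

Run it once against the Archive and once against the Backup (for example with `dest` set to wherever the drive mounts, such as a hypothetical `/Volumes/Alpha`), and only delete from the server when both come back empty.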

    You'll need a naming system for your archives; you can use whatever you like. Obviously something that runs in chronological sequence works best. We just used a variation of the phonetic alphabet you hear on police radio (Alpha, Bravo, etc).
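As an illustration of a sequential naming scheme along these lines, here is a small Python sketch. The word list is the usual phonetic alphabet; what happens after Zulu (a numeric suffix) and the "Archive"/"Backup" label format are assumptions of mine, not necessarily the variation we used.

```python
# Phonetic alphabet code words, in order (common English spellings).
PHONETIC = [
    "Alpha", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf",
    "Hotel", "India", "Juliet", "Kilo", "Lima", "Mike", "November",
    "Oscar", "Papa", "Quebec", "Romeo", "Sierra", "Tango", "Uniform",
    "Victor", "Whiskey", "X-ray", "Yankee", "Zulu",
]


def archive_name(index: int) -> str:
    """Name for the Nth drive pair, counting from 0.

    After Zulu the list wraps around with a numeric suffix
    (Alpha-2, Bravo-2, ...), which is an assumed convention.
    """
    cycle, position = divmod(index, len(PHONETIC))
    name = PHONETIC[position]
    return name if cycle == 0 else f"{name}-{cycle + 1}"


def pair_labels(index: int) -> tuple[str, str]:
    """Labels for the two identical drives in a pair (hypothetical format)."""
    name = archive_name(index)
    return f"{name} Archive", f"{name} Backup"
```

So the first pair would be labelled "Alpha Archive" and "Alpha Backup", the second "Bravo Archive" and "Bravo Backup", and so on.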

    That's pretty much it. Schedule and assign someone to do this as often as you need to. It could be once a week, or in our case it was about every month or so.

    Like anything, this system does have its pros & cons. Here are the biggest ones for me:

    Pros:

    • Cheap, very low startup cost. Only gets cheaper over time as drives get cheaper.
    • Off-site. Gets your archived data moved offsite to a safe second location (purely for Holy-Shit scenarios, fire, theft, natural disasters, etc at main location).
    • Simple to add a 3rd level of redundancy if you're super paranoid (just clone a 3rd drive and keep it at a 3rd location).
    • Easy to search, locate and pull up files/projects within the archive.
    • Easy to scale up or down on the fly month-to-month.

    Cons:

    • More of a manual solution. Adding automation is very possible but we found it tricky due to the nature of the work we were doing. It's hard to set up rules for when things should automatically be archived when projects cover a wide spectrum and no two are ever exactly alike.
    • Only as good as whoever's responsible for keeping the archive up. Good news is it's simple and anyone can do it, but I found it works best when one person is assigned the responsibility for doing it each month.
    • Leaving drives on a shelf in a case for extended periods of time is not recommended by many. You shouldn't let drives sit without spinning them up every now & again. I suggest a 6-month 'Backup Integrity' check, where some unlucky person goes through both the on-site & off-site drive pairings and makes sure they are still in sync, in case any changes occurred to the archive.
    • All your archive data is not immediately accessible if you are not physically in the same location as your archive drives. This is not a solution for people who want or need that. It's a much more expensive server solution that we never quite needed in my experience at our studio. Once a project was done and archived, 80% were probably never needed again so this was fine for us.
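For the 6-month integrity check mentioned in the cons above, something like the following Python sketch could do the comparing for you. It is an illustration only (the function names are hypothetical): with both drives of a pair docked, it reports files that exist on only one drive and files whose contents differ. Reading every file also has the side effect of giving both drives a proper spin.

```python
import filecmp
from pathlib import Path


def relative_files(root: Path) -> set[str]:
    """Relative paths of all regular files under root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def compare_pair(archive: Path, backup: Path) -> dict[str, list[str]]:
    """Compare an Archive drive against its Backup twin.

    Uses a byte-for-byte comparison (shallow=False), so it is slow on
    large drives but does not rely on file sizes or timestamps.
    """
    in_archive = relative_files(archive)
    in_backup = relative_files(backup)
    differing = [
        rel
        for rel in sorted(in_archive & in_backup)
        if not filecmp.cmp(archive / rel, backup / rel, shallow=False)
    ]
    return {
        "only_in_archive": sorted(in_archive - in_backup),
        "only_in_backup": sorted(in_backup - in_archive),
        "differing": differing,
    }
```

If all three lists come back empty, the pair is still in sync. Expect some noise from files the operating system drops onto a drive when it mounts (on a Mac, `.DS_Store` for example), which you may want to filter out.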

    Closing:

    Archival & backup strategies in general should be tailored to suit your particular needs. There unfortunately isn't one universal solution to be had. With that said, this one worked, and worked well for the situation we were in, and may or may not be the best fit for everyone.

    I hope to do a few more posts on a couple of the other backup strategies I put in place here for in-studio and in-the-field backup, because I think it's important experience to share. There wasn't a whole lot of other information to be found about this mid-level, not too high-end, yet affordable style of archival/backup back when I was trying to figure out what we should do. Plus in our case it had to be fairly hands-on: we didn't have an IT department, or even an IT guy, I just doubled as the IT guy in between trying to get my work done. It's far from foolproof, but we were definitely much better off for having implemented it.

    If you have any questions, or are looking for some consulting to tailor a system to your individual or studio's needs, please feel free to get in touch and I'd be happy to help.

    Jordan