Over time, I have accumulated a large number of digital images, and the more images I accumulate, the more obvious it becomes that losing them is not an option. And you can lose an image in multiple ways:
- Digital images are stored on some type of media – a hard disk, a CD-ROM/DVD/Blu-ray disc, or a memory stick. Any of these media can become corrupted, either logically or physically – so you are looking at an issue of Media Safety.
- Digital images are processed like other files – there are plenty of tools out there that you can use to view and/or manipulate an image, and saving it usually overwrites the original. This is what I call Logical Safety.
The Issue with Media Safety
Media Safety, to me, means the reliability of the media I use to store digital data. Everyone who accumulates digital data should be aware that this data (if it is of any worth to you) needs protection against any kind of loss caused by physical failure of the storage medium.
That is why, in the old days, we backed up our data onto tons of floppy disks. Everyone understands that this is a cumbersome process – and everyone tried to make it as easy as possible (as far as our budgets allowed).
For me, that means that vital data (and that does include my digital images/photos) is stored on a Windows Home Server, where the server's internal duplication, together with regular backups to off-site hard disks, (hopefully) provides sufficient protection against any sort of physical loss – be it because the disks in the server stop working or because the server itself is destroyed (e.g. in a fire).
That sort of protection does not, however, help with what I call Logical Safety.
The Issue with Logical Safety
Logical Safety, to me, covers anything related to data destruction through mishandling of the data, either by humans or by software.
Even if data is protected against physical loss, who protects it against me accidentally deleting it? Or against some piece of software corrupting it, e.g. when saving an update to the file? Of course, you could argue that I have a backup to restore the original file from. But what if I do not even notice that I have destroyed my data?
My backup cycle re-uses the off-site hard disks every so often, and once I back up the corrupted data onto such a disk (thus overwriting my good copy), I have nothing to revert to when I finally discover the problem with my original file.
The solution to this issue is to store every change to a file as a new version instead of overwriting the original file. In theory, that sounds like a perfect solution, but there are two things to keep in mind:
- Windows does not (or at least not easily) let you deal with multiple versions of a file.
- If you store each change in a separate version, your need for storage capacity increases dramatically!
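To put a rough number on that second point: if every saved edit of a RAW file is stored as a full new version (binary deltas on already-compressed image formats tend to save little), the cost adds up quickly. A back-of-the-envelope sketch with made-up counts – the figures below are purely illustrative, not measurements from my library:

```shell
# Hypothetical numbers: 10 MB per RAW file, 5 saved versions each,
# 1000 edited images. Each version is assumed to cost roughly the
# full file size, since compressed images rarely delta well.
file_mb=10
versions=5
images=1000
total_gb=$(( file_mb * versions * images / 1024 ))
echo "Approx. ${total_gb} GB just for versioned edits"
```

At these (invented) numbers, the versions alone would consume on the order of 48 GB – a third of my current library size.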
Adding Version Control to Digital Images
I am running a Subversion (SVN) server on my Windows Home Server anyway – so the logical choice was to create an SVN Repository (think of a Repository as a kind of cabinet dedicated to the management of related data) for my digital photos.
I don’t really want to go into the details of installing and configuring Subversion – there are enough posts out there that will tell you how to do that. So let me focus on digital images and Subversion instead:
Migration of Existing Images
Before I can benefit from having my digital images under version control, I need to migrate the existing data to Subversion. One of the positive aspects of using SVN is that I can keep a copy of my images on my workstation PC, so all the applications dealing with them work locally (rather than over the network), which makes them significantly faster than before.
So I link a directory on my local hard disk to my (currently empty) Subversion Repository and start building my structure on the local disk. Once I am happy with the structure, I add the files (and directories) to the repository and commit them. This is the moment the data is actually written into the Subversion Repository – which means it will take some time when working with large numbers of files.
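The migration steps above can be sketched with the standard Subversion command-line client. All paths here are placeholders to keep the example self-contained – on my setup the repository lives on the Home Server, not in a throwaway directory:

```shell
# Placeholder locations; in reality the repository sits on the server
# and the working copy on the workstation's local hard disk.
work=$(mktemp -d)
svnadmin create "$work/photos-repo"
svn checkout "file://$work/photos-repo" "$work/Photos"

# Build the directory structure inside the working copy...
mkdir -p "$work/Photos/2009/vacation"
echo placeholder > "$work/Photos/2009/vacation/IMG_0001.jpg"  # stand-in for a real image

# ...then schedule everything for addition and commit in one step.
cd "$work/Photos"
svn add 2009
svn commit -m "Initial import of existing photos"
```

The `svn commit` at the end is the long-running step mentioned above – it is the point where every file is transferred into the repository.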
This process duplicates my data (one copy is in the Subversion Repository on my server and one working copy is stored locally on my desktop). However, once I have validated(!!!) that the Subversion Repository works and delete the original server-side storage location (the one not under version control!), I use no more disk space on the server than I did before.
I do use much more space on my local PC (the working copy now lives there), but I gain a significant amount of performance (by eliminating network traffic).
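The payoff on the Logical Safety side is that a damaged or accidentally overwritten image can be pulled back from an earlier revision. A self-contained sketch, again with placeholder paths and dummy file contents standing in for real images:

```shell
# Throwaway repository and working copy, for illustration only.
work=$(mktemp -d)
svnadmin create "$work/repo"
svn checkout "file://$work/repo" "$work/wc"
cd "$work/wc"

# Commit a "good" version of an image...
echo good-image-data > photo.jpg     # stand-in for a real image
svn add photo.jpg
svn commit -m "Good version"

# ...then simulate a tool saving corrupted data over it.
echo corrupted-data > photo.jpg
svn commit -m "Accidentally saved over the original"

# Even after committing the bad copy, revision 1 still holds the original.
svn cat -r 1 photo.jpg > photo.jpg.restored
```

This is exactly the scenario the rotating off-site backups cannot cover: the repository keeps the good copy no matter how often the bad one is backed up afterwards.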
File Size Considerations
Over the years, my digital images have accumulated, created with different technologies and devices. Here is a snapshot of what I see in my repository today:
- My first digital camera seems to have produced 6-megapixel images in JPG format, with file sizes of roughly 2 MB, give or take 500 KB.
- My second digital camera was also a 6-megapixel device, but the images are larger – the JPGs measure around 3 MB each (give or take 500 KB), while the RAW images are around 5 MB to 6 MB each.
- My current digital camera is a 10-megapixel device, and I only shoot RAW these days – each image is somewhere between 9 MB and 11 MB in size.
- I have scanned a lot of color slides; each is around 2 MB (give or take 1 MB).
- I have scanned a lot of photos my grandfather took back in the 1940s, and because they are black & white, those images are only a few hundred KB each.
My entire digital library (without version control) adds up to approx. 150 GB of data… and that is storing every image only once!
Why Use Version Control? Shouldn’t Using the Right Tools Make It Unnecessary?
One of the things I really thought about was whether version control is actually needed for digital images – especially digital photos.
Tools like Adobe Lightroom claim to use “non-destructive” processing – in other words, everything you do to an image is not applied to the file physically but is stored in the tool’s database and simply re-applied every time you use the image. In theory, this means the image file never changes, and therefore you would never have a second version of it.
My personal conclusion is this: if the above is true, I lose nothing by putting the images under version control. There will be exactly one copy stored on the server, using exactly as much space as it would without version control.
Yes, I spend some of my local storage space (because the images are duplicated to my workstation PC), but I gain a good amount of processing performance.
On the other hand, Adobe Lightroom is not entirely non-destructive – there is, for example, an option to write metadata to the physical files, and that does change the file. Also, Lightroom does not give me all the features I need to modify my images – so there are other tools involved, and they may not be as non-destructive as Lightroom. And finally: I share my data with other people in the house, meaning I have no control over their activities, and they may – on purpose or not – also modify data on the server.
Adding version control to digital images does not sound like a bad idea, especially if you are running a Subversion server already. However, first tests have shown that you might want to commit yourself to an intelligent workflow, to avoid unnecessary duplication of data through new versions of large files when there is no need to do so. But that is another post…