Saturday, May 11, 2013

Photo Laser Confinement Grid

Ever since Ghostbusters came out, I have been fascinated by the containment unit. It wasn't anything to do with how it worked (storing ghosts is clearly a complicated issue), but the user interface to it:

  • Capture a ghost with a single foot press
  • Take it back to HQ
  • Upload to the ghost storage unit by plugging in the trap

This was great: no selecting which directory to store a ghost in, a 10-second operation, and some bonus levers to manipulate for satisfaction. Overly dramatic green and red lights and a zoning issue with the city really sealed the deal here. A quick training video from Ray:


The two button presses, however? Meh.

Anyway, after I regained the use of my NAS box for more traditional activities, the problem of managing all my photos was back on the cards. Also, the wife had started deleting photos 'on the camera' to make room for more cute baby pictures, which is the kind of thing that keeps me awake at night, so action was needed.


PHOTO CONTAINMENT UNIT

My photo storage policy is pretty easy:

  • Store locally at home
  • Store in the cloud
  • Store randomly on a bunch of SDCards hanging around the house 

Clearly the card issue needs to be solved, although I am pretty sure I have them all... :) The home and cloud storage are just reflections of a good backup solution. Also, the NAS ate a drive and is not to be trusted anymore.

However, all of this is maintained manually. What I really want is a photo containment unit that operates like this:

  • Capture photos on my camera
  • Take camera back home
  • Put camera card into photo containment unit and wait for the light to turn green

The storage here is automatic replication to the local RAID disks and Google Drive.

So I put one together this evening.


READY NAS PHOTO BOX

My NAS has a USB port on the front.


Plugging an SDCard from my SLR into this port triggers a Python script that copies the new files off to disk, then triggers a sync to Google Drive.
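As a rough sketch (and only a sketch: the mount hook, paths, and script name here are stand-ins for my actual setup, not the exact ReadyNAS mechanism), the trigger amounts to this:

    #!/usr/bin/env python
    # Sketch of the on-mount trigger. The paths and photosync.py are
    # hypothetical stand-ins; the real hook is whatever the ReadyNAS
    # runs when a USB disk mounts.
    import subprocess

    CARD = "/media/usb0"              # where the NAS mounts the card (assumed)
    PHOTOS = "/data/photos"           # local RAID copy, also the grive sync dir
    STAGING = "/data/photos/staging"  # where new files land for manual sorting

    # Step 1: pull the new files off the card (the script described below).
    subprocess.check_call(["python", "/root/photosync.py", PHOTOS, CARD, STAGING])

    # Step 2: sync to Google Drive. grive syncs the directory it runs in;
    # a one-off 'grive -a' in that directory handles authentication first.
    subprocess.check_call(["grive"], cwd=PHOTOS)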

Photo organization after upload is still done manually, by viewing the files in a thumbnail mode and dragging them into new directories based on topic. But I do this on a laptop which auto-syncs with Google Drive and then updates both the cloud and NAS storage locations for the files. I thought about storing the files by date or even coarse GPS area, but I like to manually sort and prune out bad images, so I left it fairly basic.


GOOGLE DRIVE INTEGRATION

Grive, a great open source project, is one of the few ways to sync files from a local Linux box to Google Drive without a UI.

https://github.com/Grive/grive

Compiling this for the ReadyNAS (actually on the NAS itself, so not really cross compiling) was a bit of a mission, however. The box is based on a SPARC LEON derivative, a 280 MHz processor from a now-defunct company called Infrant. Netgear bought this 34-person company back when they were building these SPARC-based processors, moving the line over to more standard and supportable ARM cores after the acquisition.

Anyway, getting the ReadyNAS to compile Grive was an adventure; submitting a pull request to its GitHub would probably be more difficult than just reimplementing the calls to Boost was. A shame really, as I always liked Boost (well, in comparison to the GCC implementation).



PYTHON TIME AND THE GITS

Of all the hacking I did on the ReadyNAS to make this work, the only real tangible piece of code is a five-minute script, knocked up to run when an SDCard is inserted into the NAS box (triggered by the disk mounting).

The Python script does the following (a sketch follows the list):

  • Take in 3 paths:
    • Photo storage directory
    • Memory card directory
    • Staging directory to copy the files to from the card (optional; if not specified, nothing is copied)
  • Create a list of all the files in the storage and memory card paths
    • Use the filename and size as a key
      • Adding in the date doesn't help, as this gets reset on Windows when copying files around... I was guilty of this at one time. Joy.
  • Create a list of files that are on the memory card but not on the photo storage disk
    • Optionally dump to stdout
    • Copy these files to the staging directory (if specified)
      • When copying a file, keep its date the same as on the memory card
  • Handle conflicts (where multiple files of the same name and size exist on either the photo disk or the memory card)
    • Checksum the files and automatically match away any identical pairs
      • Optionally dump to stdout
      • A side benefit is that this finds duplicates on the photo storage disk as well
    • Print any remaining conflicts to stdout for manual fix-ups
      • Copy these duplicates to a conflicts directory in the staging area
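And the promised sketch. This is a trimmed-down version rather than the exact code on the NAS (the name photosync.py and the argument order are just my shorthand here), but it walks the same steps: index by (name, size), stage new files with their dates intact via copy2, and checksum any collisions:

    #!/usr/bin/env python
    # Trimmed-down sketch of the card ingest script described above.
    import hashlib
    import os
    import shutil
    import sys

    def index_files(root):
        # Map (filename, size) -> list of full paths under root.
        index = {}
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                key = (name, os.path.getsize(path))
                index.setdefault(key, []).append(path)
        return index

    def md5(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def main(photo_dir, card_dir, staging_dir=None):
        photos = index_files(photo_dir)
        card = index_files(card_dir)

        # New files: on the card, not on the photo disk.
        for key in card:
            if key in photos:
                continue
            for src in card[key]:
                print("NEW: %s" % src)
                if staging_dir:
                    # copy2 keeps the modification date from the card.
                    shutil.copy2(src, os.path.join(staging_dir, key[0]))

        # Conflicts: same (name, size) on both sides. Checksums match away
        # identical pairs, and also surface duplicates already on the disk.
        for key in card:
            if key not in photos:
                continue
            disk_sums = set(md5(p) for p in photos[key])
            for src in card[key]:
                if md5(src) in disk_sums:
                    continue  # byte-identical copy already stored
                print("CONFLICT: %s" % src)
                if staging_dir:
                    conflicts = os.path.join(staging_dir, "conflicts")
                    if not os.path.isdir(conflicts):
                        os.makedirs(conflicts)
                    shutil.copy2(src, conflicts)

    if __name__ == "__main__":
        main(*sys.argv[1:])

Invocation is then along the lines of python photosync.py /data/photos /media/usb0 /data/photos/staging.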


KODAK DC240i

I have one remaining issue: duplicate files on my photo storage disk that have different resolutions/sizes. This was caused by an overzealous camera (I believe it was a Kodak DC240i, bought in fetching iMac blue when I got my first pay cheque after clearing my student loans).




This camera used to store a smaller thumbnail picture alongside the main image (which was only 1.3MP). At the time, JPEG decoding was very slow on a PC and navigating 'hundreds' of pictures was a real pain, so this was actually a pretty useful feature. 

I also pondered whether this was done to let the camera itself navigate the captured images as well. This camera used a TI SoC as the main processor, based on an ARM7 at 80 MHz. There is also a DSP running at the same clock speed, which looks to be dedicated to audio input/output, and then the main event: a 90 MHz SIMD processor for the ISP plus JPEG encode/decode (it is not clear where the Huffman encode/decode is done, however). Back-of-a-napkin math (scalar CPU for Huffman, 4-way vector DCT in the image accelerator) suggests it should have been easily capable of 'good enough' full-size JPEG decode for the camera preview, so maybe the thumbnail really was there to speed up desktop viewing.
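The napkin, for what it's worth. Every cycles-per-pixel figure below is a guess of mine, not anything from a TI datasheet:

    # Napkin math for the decode claim; the cycles-per-pixel numbers are
    # guesses, not measured figures for this TI part.
    pixels = 1280 * 960      # one 1.3 MP frame
    simd_cpp = 8.0           # guess: 4-way vector IDCT + colour convert
    huff_cpp = 40.0          # guess: scalar Huffman decode on the ARM7

    simd_secs = pixels * simd_cpp / 90e6   # ~0.11 s on the 90 MHz engine
    huff_secs = pixels * huff_cpp / 80e6   # ~0.61 s on the 80 MHz ARM7
    print("decode ~%.1f s/frame" % (simd_secs + huff_secs))

Well under a second a frame on those guesses, which would have been fine for flicking through previews.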

Either way, whenever I see this, OCD kicks in and I look to clean it up. The issue is that the filenames are not linked for some reason, the dates are mangled, and I have 2000 of these images. What I need is a method to detect duplicate pictures (within a certain probability) for manual clean-up.


DUPLICATE PICTURE DETECTION

I started messing with scipy to knock up a quick and dirty script for this, which works wonderfully for the test image I picked but failed on 9 of the other pictures I tried it with. Trying a low-pass 2D filter before comparison helped, but then false positives started to pop up. Subsampling is next (sample rate based on resolution), but it might be quicker to simply do this by hand... Or I might spend the next 3 months tinkering with it, as usual.
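For the curious, the current quick-and-dirty approach is along these lines: blur, shrink to a common tiny size, and compare the mean absolute difference against a threshold. The blur width and threshold are exactly the knobs I am still fiddling with, so treat those numbers as placeholders:

    #!/usr/bin/env python
    # Rough duplicate-picture check: low-pass filter, shrink to a common
    # size, compare. Blur width and threshold are placeholder values.
    import sys
    import numpy as np
    from scipy import ndimage
    from scipy.misc import imread, imresize  # the scipy helpers of the era

    def fingerprint(path, size=(32, 32), blur=2.0):
        img = imread(path, flatten=True)          # load as greyscale
        img = ndimage.gaussian_filter(img, blur)  # low-pass before shrinking
        img = imresize(img, size).astype(float)   # subsample to a tiny grid
        return img - img.mean()                   # ignore exposure offset

    def looks_duplicate(a, b, threshold=10.0):
        return np.abs(fingerprint(a) - fingerprint(b)).mean() < threshold

    if __name__ == "__main__":
        print(looks_duplicate(sys.argv[1], sys.argv[2]))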
