Friday, March 29, 2013

Scraping Craigslist for mein Deutsches Auto

Recently, for 30+ years of good service, I was awarded the gift of a son, born 2 weeks early and bringing with him a set of lungs only a new father could love. As part of the GREAT PREPARATION for the boy, I was to source a laundry list of items, the most important being a car.


TWO CARS BETTER THAN ONE

The problem with buying a car is that I already had one. But, despite it having 2 very solid doors and a rag top, it did not transport infants according to rule one of THE WIFE's guidelines on acceptable child bearing machines. To replace it with a 4 door motor would require a vehicle that both satisfied my rule "never buy anything I don't want" and also the wife's "never buy a Mustang again" edict.

After two months of researching, I finally selected a steed I would be happy with.


BUYERS AND SUPPLIERS

However, it turned out to be incredibly hard to find an example of THE CHOSEN CAR that I wanted: a specific revision of a used Audi A4 Wagon with a glass roof and lots of buttons. So difficult was this search that I found myself spending over an hour a day searching a bunch of websites for this specific model. On the 2 occasions when I actually found the ideal car in an evening's search, following up revealed that it had already been sold!

To add to the woes, as BABY DAY approached my criteria for "what car" quickly grew to accept almost any car that wasn't "that damn Mustang". Now, instead of spending an hour a day looking for one model, I was spending three times the effort speed-reading around 100 new hits, researching the specs of these cars and checking whether the pricing was good or not.


THE GREAT AUTOMATION

So, automating the search was the next step. Craigslist was where I bought the 'Stang and where the other 2 hits for the new motor came from, so I started there.

There are many methods of monitoring Craigslist: RSS feeds are the officially supported method, but scraping the HTML directly also works. There are also third-party apps that can do this on your behalf, including browser plugins and mashup web services dedicated to searching. One particular app I had high hopes for was If This Then That (IFTTT).
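As a rough illustration of the RSS route, here is a minimal Python sketch that pulls the listings out of a feed and keeps only the ones whose title matches some keywords. It is not taken from any real tool: the sample feed, the example.org links and the function names (`parse_items`, `matches`) are all made up for illustration, and a real script would download the feed from the search's RSS URL instead of using a hard-coded string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, hard-coded so the sketch runs without a network.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>cars and trucks</title>
    <item>
      <title>2006 Audi A4 Avant wagon - $9500</title>
      <link>https://example.org/post/1</link>
    </item>
    <item>
      <title>2005 Ford Mustang convertible - $8000</title>
      <link>https://example.org/post/2</link>
    </item>
  </channel>
</rss>"""


def local_name(tag):
    """Strip any '{namespace}' prefix so namespaced and plain feeds look alike."""
    return tag.rsplit("}", 1)[-1]


def parse_items(feed_xml):
    """Return a list of {'title': ..., 'link': ...} dicts, one per <item>."""
    items = []
    for elem in ET.fromstring(feed_xml).iter():
        if local_name(elem.tag) != "item":
            continue
        fields = {local_name(child.tag): (child.text or "").strip() for child in elem}
        items.append({"title": fields.get("title", ""), "link": fields.get("link", "")})
    return items


def matches(title, wanted, banned):
    """True if the title contains every wanted word and no banned word."""
    text = title.lower()
    return all(w in text for w in wanted) and not any(b in text for b in banned)


hits = [i for i in parse_items(SAMPLE_FEED)
        if matches(i["title"], ["audi", "a4"], ["mustang"])]
for hit in hits:
    print(hit["title"], hit["link"])
```

Plain substring matching like this is crude (it knows nothing about price, year or trim), but it shows why a hand-rolled filter can be far more specific than a generic alert service.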



Without reviewing each and every service in detail here, IFTTT included: about half of them worked well enough, yet all were either laggy or not specific enough to save me significant time or to give me confidence that they would be reliable enough to bag the motor.


THE WHOLE HOG

So I ended up setting up a hosted Linux server running scripts that scraped sites like Craigslist every 5 minutes and emailed / texted me as soon as a car turned up that matched my search criteria. It was, as expected, a lot of fun.

You can find the code here if you find yourself with the same issue!

https://github.com/mrlambchop/clcarhunt
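
For a flavor of what such a script involves, here is a minimal Python sketch of the poll-and-notify loop. To be clear, this is not the code from the repo above: the function names (`build_alert`, `poll_once`, `run_forever`) and the shape of a listing (a dict with `title` and `link`) are assumptions made for illustration. Fetching and delivery are passed in as plain functions, so the sketch makes no claims about any particular site or mail setup.

```python
import time
from email.message import EmailMessage


def build_alert(listing, sender, recipient):
    """Build an email describing one new listing (a dict with 'title' and 'link')."""
    msg = EmailMessage()
    msg["Subject"] = "New listing: " + listing["title"]
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(listing["link"])
    return msg


def poll_once(fetch, is_wanted, notify, seen):
    """Fetch current listings and notify about wanted ones not reported before.

    fetch() returns a list of listing dicts, is_wanted(listing) returns a bool,
    notify(listing) delivers the alert, and seen is a set of links already reported.
    Returns the listings that triggered a notification on this pass.
    """
    fresh = []
    for listing in fetch():
        if listing["link"] in seen or not is_wanted(listing):
            continue
        seen.add(listing["link"])
        notify(listing)
        fresh.append(listing)
    return fresh


def run_forever(fetch, is_wanted, notify, interval_seconds=300):
    """Poll every interval_seconds (300 s matches the 5 minutes mentioned above)."""
    seen = set()
    while True:
        poll_once(fetch, is_wanted, notify, seen)
        time.sleep(interval_seconds)
```

In practice `notify` could hand the message from `build_alert` to `smtplib.SMTP(...).send_message(msg)`, and a text message could be approximated by mailing a carrier's email-to-SMS address, assuming the carrier offers one. The `seen` set is what stops the same car from buzzing your phone every 5 minutes.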

I have also got a completely redone version that supports scraping all sorts of content for personal pleasure, which was much more interesting technically to put in place. I will save that for another post.

And yes, I still have the Mustang. The boy is 4 months now ;0)
