
Monday, June 10, 2013

GCC profiling to VCD waveforms

During a few after-work drinks with a friend, James McCombe, he mentioned that he had been working on a method of augmenting C code with macros that generated signals that could be read by a waveform visualizer and plotted on the screen. This scratched a few itches over the next week or so and I couldn't help but hack something together to look at the visualizations.

I put this together using gcc's instrument-functions feature, function attributes and the usual bit of rushed-together Python.


INSTRUMENTATION FUNCTIONS

gcc has a cool feature that inserts a hook call at the entry and exit of every function. It's triggered by building the final executable with -finstrument-functions, i.e.:

gcc test.c -finstrument-functions -g

{-g is needed to build for debug / keep all the symbols in the executable}


You then need to define a couple of functions and link them into the executable; these get called just after entering a function and just before returning from it:

#include <stdio.h>

void __attribute__((no_instrument_function))
__cyg_profile_func_enter(void *this_fn, void *call_site)
{
    printf("Func ptr called: %p from address %p\n", this_fn, call_site);
}


void __attribute__((no_instrument_function))
__cyg_profile_func_exit(void *this_fn, void *call_site)
{
    printf("Func %p returning to call site %p\n", this_fn, call_site);
}


GCC ATTRIBUTES

To stop the instrumentation functions from being called recursively, you have to tell gcc to exclude these functions from instrumentation, via the no_instrument_function attribute used above.

The basic premise of the attribute syntax is as follows:

void __attribute__((NAME)) FISH(void)

A full list is available here and is well worth a perusal, even if a bunch of the options here are not portable between architectures or even instruction sets. A few of my favourites are:

: constructor / destructor

These run functions before main() is called and after it exits, allowing you to do all sorts of devious initialization when dynamically linking code against an executable, i.e.:


void __attribute__((no_instrument_function))
     __attribute__((constructor))
constructor_begin (void)
{
    printf("Main is about to be called\n");
}

void __attribute__((no_instrument_function))
     __attribute__((destructor))
destructor_end (void)
{
    printf("Main has exited (or exit was called)\n");
}


: weak

The weak attribute indicates that, when linking an executable, if multiple definitions exist with the same name, any marked as 'weak' give way to a strong (non-weak) definition rather than causing a duplicate symbol error. This enables executables to override functions in libraries, for example, if they so want.


: section

The last trick I like to use is to put a bunch of functions into a specific section in the ELF file. This attribute allows you to arbitrarily specify the name of the section a function (or list of functions) lives in. In the embedded space, this is very handy for functions that want to execute from on-chip SRAM or even ROM.



PUTTING IT TOGETHER

Using the instrumentation trick, the constructor attributes and a little bit of code to log the transactions to a file, we can quickly augment the executable to output a trace history of its function calls at run time.

Code over here:


A few considerations were made here to minimize the overhead of the tracing code:

  • The gcc profiling side periodically flushes the trace buffer out to disk rather than writing on every call. Although the file system is almost always buffered, fwrite still has considerable overhead when called for 8 bytes at a time, so local buffering of the data is needed
  • We need to minimize the amount of data written at the expense of a little computation. The file system is an order of magnitude slower than the CPU, so spending a few cycles per function to greatly reduce the amount of data written to disk is worthwhile for better run times

For the python parsing side, the code is here:


The main trick for performance is building up a dictionary (hash map) of all the function addresses in the executable before parsing the trace file, providing a nice O(1) lookup from address to the name of the function being invoked.
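
As a rough illustration of that lookup (not the actual script), something along these lines works - here using nm to pull the symbol addresses, which is an assumption on my part; the real script may build the map differently:

import subprocess

def build_symbol_map(executable):
    """Map code addresses to function names for O(1) lookups while parsing the trace."""
    symbols = {}
    for line in subprocess.check_output(["nm", executable]).decode().splitlines():
        parts = line.split()
        # nm lines for defined symbols look like: "0000000000401136 T main"
        if len(parts) == 3 and parts[1] in ("T", "t"):
            address, _, name = parts
            symbols[int(address, 16)] = name
    return symbols

# Usage: resolve the raw pointer logged by __cyg_profile_func_enter
# names = build_symbol_map("./a.out")
# print(names.get(0x401136, "unknown"))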

However, this wasn't the real issue. After the hour-long hack to get this working, I moved on to running larger traces (specifically, tracing an open-source C based JPEG decoder as a practical exercise). This produced trace files over 2GB in size, implying that even with the most compact of data structures you'd be hard pressed to process them in memory, so the whole thing had to be rewritten to stream the trace file in and the generated waveform out. The history is in Git if you prefer the original version, but you will need a lot of memory to get it to run :)
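
For reference, the VCD output itself is simple enough to stream as you go. A minimal sketch of emitting one wire per function is below; the timescale, the single-character identifier scheme and the event format here are illustrative assumptions, not exactly what the script does:

def write_vcd(events, out_path, function_names):
    """Stream (timestamp_us, address, is_enter) events out as a VCD waveform."""
    # one printable identifier char per function (enough for up to ~94 functions)
    ids = {addr: chr(33 + i) for i, addr in enumerate(sorted(function_names))}
    with open(out_path, "w") as vcd:
        vcd.write("$timescale 1 us $end\n")
        vcd.write("$scope module trace $end\n")
        for addr, name in function_names.items():
            vcd.write("$var wire 1 %s %s $end\n" % (ids[addr], name))
        vcd.write("$upscope $end\n$enddefinitions $end\n")
        for timestamp, addr, is_enter in events:
            vcd.write("#%d\n" % timestamp)                           # time marker
            vcd.write("%d%s\n" % (1 if is_enter else 0, ids[addr]))  # 1 = entered, 0 = returned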


WAVES

So the most interesting bit of this is the visualizations. We'll start off with a trivial piece of test code found here:


Basically this code has a few function calls; the last one (FISH3) recurses 4 times, each call delaying for the number of milliseconds passed in as its parameter. The waveform for this is below:


The waveform shows the runtime is pretty much 10ms in total (the leading 1ms is a printf). FISH3 glitches each time it recurses, the delays actually happening as the function unwinds back up the stack of recursive calls, giving us a nice 1 / 2 / 3 / 4ms cadence.

Another excerpt below is from the JPEG decoder:



You can see in red that we are in a function here decoding a single JPEG block (njDecodeBlock). The zero-cycle transitions (they look like glitches) on the red trace are where a sub-function returns and njDecodeBlock continues to run. The other traces are the functions called by njDecodeBlock and their respective run times.

Looking at a full JPEG decode (this guy) running over a 100x75 pixel image gives us a much more interesting chart, showing roughly the breakdown of function run times. Note this trace is actually slightly quantized to make it more understandable (using the -q option in the python script) - very useful for eyeballing an application at a high level.



There are a few issues we want to resolve to make this more useful:

  • To make the callstack easy to understand, the waves really need to be ordered so that main() is the top row and the calls stack down into the functions below it.
  • Remove the cost of the trace buffer write itself. It seems to be only 1ms in 61ms of run time, but we should be able to account for this and hide it from the user viewing the waveform

Once again, thanks to James for his idea - next round of cocktails is on me!



Saturday, May 11, 2013

Photo Laser Confinement Grid

When Ghostbusters came out, I was always fascinated by the containment unit. It wasn't anything to do with how it worked (storing ghosts is clearly a complicated issue), but the user interface to it:

  • Capture a ghost in a single foot press
  • Take it back to HQ
  • Upload to the ghost storage unit by plugging in the trap

This was great - no selecting which directory to store a ghost in, a 10 second operation and some bonus levers to manipulate for satisfaction. Overly dramatic green and red lights and a zoning issue with the city really sealed the deal here. A quick training video from Ray:


The 2 button presses however? Meh.

Anyway, after I regained the use of my NAS box for more traditional activities, the problem of managing all my photos was back on the cards. Also, the wife had started deleting photos 'on the camera' to make room for more cute baby pictures, which is the kind of thing that keeps me awake at night, so action was needed.


PHOTO CONTAINMENT UNIT

My photo storage policy is pretty easy:

  • Store locally at home
  • Store in the cloud
  • Store randomly on a bunch of SDCards hanging around the house 

Clearly the card issue needs to be solved, although I am pretty sure I have them all... :) The home and cloud storage are just reflections of a good backup solution. Also the NAS ate a drive and is not to be trusted anymore.

However, all of this is maintained manually. What I really want is a photo containment unit that operates like this:

  • Capture photos on my camera
  • Take camera back home
  • Put camera card into photo containment unit and wait for the light to turn green

The storage here is automatic replication to the local RAID disks and Google Drive.

So I put one together this evening.


READY NAS PHOTO BOX

My NAS has a USB input on the front.


Plugging in an SDCard from my SLR into the front of this triggers a python script that copies off the new files to the disk. Then it triggers a sync to Google Drive.

Photo organization after upload is still done manually by viewing the files in a thumbnail mode and dragging them into new directories based on the topic. But I do this on a laptop which auto-syncs with Google Drive, which then updates both the cloud and NAS storage locations for the files. I thought about storing the files by date or even coarse GPS area, but I like to manually sort and prune out bad images etc... so I left it fairly basic.


GOOGLE DRIVE INTEGRATION

A great open-source project, Grive is one of the few ways to sync files from a local Linux box to Google Drive without a UI.

https://github.com/Grive/grive

Cross compiling this onto the ReadyNAS (actually building on the NAS itself) was a little bit of a mission however. The box is based on a SPARC LEON derivative, a 280 MHz processor from a company that used to exist called Infrant. Netgear bought this 34-person company back when they were building these SPARC-based processors, moving them over to more standard and supportable ARM cores after the acquisition.

Anyway, getting the ReadyNAS to compile Grive was an adventure - it might be more difficult to submit a pull request to its GitHub than it was to reimplement the calls to Boost. Shame really, as I always liked Boost (well, in comparison to the GCC implementation).



PYTHON TIME AND THE GITS

Of all the hacking I did on the ReadyNAS to make this work, the only real tangible piece of code is a 5-minute script, knocked up to run when an SDCard is inserted into the NAS box (triggered by the disk mounting).

The python script does the following (a rough sketch follows the list):

  • Take in 3 paths
    • Photo storage directory
    • Memory card directory
    • Staging directory to copy the files to from the card (optional, if not specified it does not copy)
  • Create a list of all the files in the storage and memory card paths
    • Use the filename and size as a key
      • Adding in the date doesn't help as this gets reset on Windows when copying files around... I was guilty of this at one time. Joy.
  • Create a list of files that are on the memory card but not on the photo storage disk
    • Optionally dump to stdout
    • Copy these files to the staging directory (if specified)
      • When copying the file, keep the date the same from the memory card
  • Conflicts (where multiple files of the same name and size exist either on the photo disk or memory card)
    • Checksum the files and match away any pairs that exist automatically
      • Optionally dump to stdout
      • A side benefit of this is that it finds duplicates on the photo storage disk as well
    • Print out any remaining conflicts to stdout for manual fix ups
      • Copies these duplicates to a conflicts directory in the staging area
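
As a rough sketch of the core of that flow (paths and helper names here are made up; the real script on the NAS also handles the conflict cases above):

import os
import shutil

def index_files(root):
    """Key every file under root by (filename, size) so copies can be matched cheaply."""
    index = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index.setdefault((name, os.path.getsize(path)), []).append(path)
    return index

def copy_new_photos(card_dir, storage_dir, staging_dir):
    """Copy files that are on the memory card but not yet on the photo storage disk."""
    stored = index_files(storage_dir)
    for key, paths in index_files(card_dir).items():
        if key not in stored:
            # copy2 preserves the original modification time from the card
            shutil.copy2(paths[0], staging_dir)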


KODAK DC240i

I have a remaining issue, that of duplicate files on my photo storage disk that have different resolution / sizes. This was caused by an overzealous camera (I believe this was a Kodak DC240i, bought in fetching iMac blue when I got my first pay cheque after clearing my student loans).




This camera used to store a smaller thumbnail picture alongside the main image (which was only 1.3MP). At the time, JPEG decoding was very slow on a PC and navigating 'hundreds' of pictures was a real pain, so this was actually a pretty useful feature. 

I also pondered whether this was done to let the camera itself navigate the captured images quickly. This camera used a TI SoC as the main processor, based on an ARM7 at 80 MHz. There is also a DSP running at the same clock speed, which looks to be dedicated to audio input/output, and the main event, a 90MHz SIMD processor for the ISP + JPEG encode/decode (not clear where the Huffman decode/encode is done however). Back-of-a-napkin maths suggests the JPEG decode speed (scalar CPU for Huffman, 4-way vector DCT in the image accelerator) should be easily capable of 'good enough' full-size JPEG decode for the camera preview, so maybe the thumbnail really was an option to speed up desktop viewing.

Either way, whenever I see this, OCD kicks in and I look to clean it up. The issue is that the filenames are not linked for some reason, the dates are mangled and I have 2000 of these images. What I need is a method to detect duplicate pictures (within a certain probability) for manual clean up.


DUPLICATE PICTURE DETECTION

I started messing with scipy to knock up a quick and dirty script for this, which works wonderfully for the test image I picked, but failed on 9 of the other pictures I tried it with. Trying a low-pass 2D filter before comparison helped, but then false positives started to pop up. Sub-sampling is next (sample rate based on resolution), but it might be quicker to simply do this by hand... Or I might spend the next 3 months tinkering with it as usual.
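
For the curious, the comparison I'm playing with looks roughly like the sketch below - the use of Pillow for loading, the Gaussian low-pass, the fixed comparison size and the threshold are all knobs / assumptions rather than a proven recipe:

import numpy as np
from PIL import Image                      # assumption: Pillow used for image loading
from scipy.ndimage import gaussian_filter

def load_grey(path, size=64):
    """Load an image as greyscale, scaled to size x size so different resolutions can be compared."""
    img = Image.open(path).convert("L").resize((size, size))
    return np.asarray(img, dtype=float)

def probably_same_picture(path_a, path_b, sigma=2.0, threshold=10.0):
    """Low-pass filter both images, then compare mean absolute difference against a threshold."""
    a = gaussian_filter(load_grey(path_a), sigma)
    b = gaussian_filter(load_grey(path_b), sigma)
    return float(np.mean(np.abs(a - b))) < threshold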

Friday, May 10, 2013

Accessing OpenVPN from Android

A colleague from work recently pointed out that the $12 server he picked up after I had sent around the "LOOK AT THIS BARGAIN" link was a perfect tool for circumventing the port block on the corporate guest wifi (said port block effectively rendering it useless for anything but basic web browsing). I don't know anything about this kind of behavior, yet was equally interested in this VPN for another, yet-to-be-named use case.


THE INSTALL

The link below has the best explanation of how to do this:

http://tipupdate.com/how-to-install-openvpn-on-ubuntu-vps/

I archived it as a PDF in case it disappears any time soon.

My personal experience of the installation was as follows (all done as "root"):

Success.. Success... Success... [Step 9 in the instructions] FAIL.




Note: in step 5, the following command is run:

. /etc/openvpn/easy-rsa/2.0/build-key client1

This creates your client user name (i.e. the name you will log into the system as). Also critical, for Android at least, is to supply a password (and not just press Enter on this field).

. /etc/openvpn/easy-rsa/2.0/build-key MY_INITIALS


THE TUN FAILURE

Looking at the log for the init.d startup script:

cat /var/log/syslog



Note: Cannot open TUN/TAP dev /dev/net/tun: No such file or directory (errno=2)
Note: Attempting fallback to kernel 2.2 TUN/TAP interface
Cannot allocate TUN/TAP dev dynamically

A quick google suggested that OpenVZ (the virtualization stack running on my VPS) often has this error, the root cause being that the kernel does not have the tun network module available. Security issues were cited (although opinion seems divided) as the leading reason for it not being enabled by default.

Quick confirmation of this:

# modprobe tun
FATAL: Module tun not found.

A quick email to +URPad DC support and it's resolved 10 mins later. Great support guys!

Still not working however. More Googlage and this fixed it:

mkdir /dev/net
mknod /dev/net/tun c 10 200

Carrying on from Step 9. Success... Success!


THE P12

What we have done is configure a VPN, secured with "L2TP/IPsec CRT". This is in effect digital-certificate-based authentication that you can install on a client (Android phone, laptop etc...) and use to authenticate automatically with the VPS server.

Android prefers the certificate and key in a single package (pkcs12 to be specific), so we need to combine the client certs + keys into a single file.

In the directory where we created the client keys (/etc/openvpn/easy-rsa/2.0/keys), the following files exist:


-rw-r--r-- 1 root root 3913 Jan 22 09:39 client1.crt
-rw------- 1 root root  887 Jan 22 09:39 client1.key


# cd /etc/openvpn/easy-rsa/2.0/keys
# openssl pkcs12 -export -in client1.crt -inkey client1.key -certfile dh1024.pem -out certs.p12

This outputs the file "certs.p12" which is a combo of the .crt and .key file.


ANDROID INSTALL

To download the .p12 file from the server (created in step 3), some obvious ways exist:

  • Download the certificates via an app like WinSCP, or a file manager such as Servant Salamander with an SCP plugin, and copy them to your Android phone via the SDCard or USB (mass storage, or ADB if you're adventurous)
  • Grab them directly from the server via your Android phone
  • Email them to yourself on the phone

I went for the last option - nice and clean. Android supports receiving uuencoded data, which is very easy to send from a shell. On the server, I ran the following:

# cd /etc/openvpn/easy-rsa/2.0/keys
# uuencode certs.p12 certs.p12 | mail -s "VPN Files" MYEMAILADDRESS@gmail.com -- -f MYEMAILADDRESS@gmail.com

Note: The uuencode first param is the input file, the second is the name of the attachment you want the file to appear as in the email.

In the Gmail app on Android, I simply selected the file and "saved" it to the phone. This doesn't give you an option of where to save it, but thankfully that is not important.

You can then import the certificates by going to:

  • Settings Menu
  • Security Menu
  • Install from SDCard
  • Then select the "Download" directory and the file that you emailed yourself.

Friday, March 29, 2013

Scraping Craigslist for mein Deutsches Auto

Recently, for 30+ years of good service, I was awarded the gift of a son, born 2 weeks early and bringing with him a set of lungs only a new father could love. As part of the GREAT PREPARATION for the boy, I was to source a laundry list of items, the most important being a car.


TWO CARS BETTER THAN ONE

The problem with buying a car is that I already had one. But, despite it having 2 very solid doors and a rag top, it did not transport infants according to rule one of THE WIFE's guidelines on acceptable child bearing machines. To replace it with a 4 door motor would require a vehicle that both satisfied my rule "never buy anything I don't want" and also the wife's "never buy a Mustang again" edict.

After two months of researching, I finally selected a steed I would be happy with.


BUYERS AND SUPPLIERS

It turned out incredibly hard to find an example of THE CHOSEN CAR that I wanted however: a specific revision of a used Audi A4 Wagon with a glass roof and lots of buttons. So difficult was this search that I found myself spending over an hour a day trawling a bunch of websites for this specific model. On the 2 occasions I actually found the ideal car in an evening's search, following up found that it had already been sold!

To add to the woes, as BABY DAY approached my criteria for "what car" quickly grew into accepting almost any car that wasn't "that damn Mustang", and now instead of spending an hour a day looking for one model, I was spending three times the effort speed-reading around 100 new hits, researching the specs of these cars and checking whether the pricing was good or not.


THE GREAT AUTOMATION

So, automating the search was the next step. Craigslist was where I bought the 'Stang and where the other 2 hits for the new motor came from, so I started there.

There are many methods of monitoring Craigslist - RSS feeds being the officially supported method, but scraping the HTML directly also works. There are also third-party apps that can do this on your behalf, including browser plugins and mashup web services dedicated to searching. One particular app I had a lot of hope for was If This Then That.



Without reviewing each and every service in detail here, including IFTTT: about half of them worked well enough, yet all were either laggy or not specific enough to save me significant time, or didn't give me confidence that they would be reliable enough to bag the motor.


THE WHOLE HOG

So I ended up putting in place a hosted Linux server running scripts that scraped the likes of Craigslist every 5 mins and emailed / texted me as soon as a model turned up that matched my search criteria. It was, as expected, a lot of fun.

You can find the code here if you find yourself with the same issue!

https://github.com/mrlambchop/clcarhunt
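
The repo has the real thing; for flavour, the polling idea boils down to something like the sketch below, where the third-party feedparser library and the Craigslist search URL are illustrative assumptions:

import time
import feedparser  # third-party: pip install feedparser

SEARCH_RSS = "https://sfbay.craigslist.org/search/cta?query=audi+a4+avant&format=rss"  # hypothetical URL
KEYWORDS = ["a4", "avant"]

def poll(seen):
    """Return new listings whose titles match all the keywords."""
    hits = []
    for entry in feedparser.parse(SEARCH_RSS).entries:
        if entry.link in seen:
            continue
        seen.add(entry.link)
        if all(k in entry.title.lower() for k in KEYWORDS):
            hits.append((entry.title, entry.link))
    return hits

if __name__ == "__main__":
    seen = set()
    while True:
        for title, link in poll(seen):
            print("New match: %s %s" % (title, link))  # email/SMS hook would go here
        time.sleep(300)  # every 5 minutes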

I have got a completely redone version that supports scraping all sorts of content for personal pleasure, which was much more interesting technically to put in place - I will save this for another post.

And yes, I still have the Mustang. The boy is 4 months now ;0)

Wednesday, March 20, 2013

Everybody needs a cloud for a pillow

Home networks with NAS units are great for local storage of files, but I have always been wary of exposing these servers directly to the great unwashed of the WWW as, quite frankly people, you sometimes get up to mischief.

YOU CAN SECURE THESE YOU KNOW

True. And I used to run a NAS ( +NETGEAR ReadyNAS ) from my home router, exposed to the internet, protected by SSH keys and obscure port mappings (YAY). Having not built the NAS Linux kernel and user space from scratch however (or fully characterized all the processes / apps running on it), there was always a niggling doubt that it was simply obscure, not secure. Also, it consumed much of my time to maintain, and one day the device killed one of the RAID disks – the writing was on the wall for this little box that could, but did not. So I reinstalled the original firmware and, for the sake of my kid's baby photos, turned it back into a basic NAS box.

WHAT DO YOU WANT FROM THIS CLOUD ANYWAY? YOU BRITS ALWAYS COMPLAIN ABOUT THE RAIN

Well, I had a bunch of things I was messing around with, all of which needed something "always on" to talk to. A box in the cloud was perfect for this, so I started looking around to see what was available. Turns out there are a couple of ways to get one of these there cloud boxes that fitted my "pay nothing to anyone" budget.
  • A dedicated PC or MAC, kept in a data center of your choosing
    • You can supply the HW and pay the hosting centre fees or rent one
    • The hosting service provides power, a network connection and a fixed IP address
    • Runs your OS of choice! AmigaOS please.
    • Physical access is generally required if you provided the box yourself, or some funky BIOS SW is available on pro servers to enable remote administration – some even support KVM and the ability to supply an ISO image for the CD or USB drives over the net.
  • A Virtual Private Server (VPS) 
    • You pay a hosting service for a timeshare on a super fast server
      • Share CPU, memory, storage and network with multiple users
    • The hosting service can run a large selection of virtual machines / OS's for you to use, but the list is limited to what sells in volume (no AmigaOS…)

The latter sounded good, but VPS seems to come in many shapes and sizes! The characteristics I was looking for were limited to:
  • "Hack it if you want, its quick to reinstall and there is nothing secret on there anyway” administration panel access
  • Low(ish) processing and local storage
  • Good enough bandwidth limits (no plans to host torrents on it)
  • Super cheap!

THE LOW END BOX

The site LowEndBox was a great read on the different solutions available in the VPS Linux box hosting arena. Much discussion basically boils down to the following however:
  • Price
  • Reliability (up time)
  • Where is the server located
  • Are there any deals?
  • Does the owner post on the forums and does he reply to crazies posting about his company?
The site quickly led me to +URPad DC, who had a $12-a-year offer for an Ubuntu 12.04 installation. THIS IS CHEAPER THAN (some) BEER!

Any caveats? Well, the VPS is listed as "unmanaged", meaning that outside of the initial install, it's all down to you to configure.

Update: Further investigation shows lowendstock is also a great resource for budget VPS solutions. At the time of writing: "FOUR DOLLAR VPS".

URPAD

12 dollar dollar bill yo’s later, and we’re in. What I liked:
  • Super quick to setup - payment went through and I received my login almost immediately
  • Easy to access hosting controls to wipe / provision the server
  • Small but good selection of Ubuntu packages pre-installed and APT running quickly to install any missing items
  • Reliable hosting (never found it broken or down so far)
The only negative would be that at one time, I found the VPS going super slow for a few minutes when I was simply at the shell – nothing else running. It wasn't anything critical, but it led to a wander into the tech behind their virtual server stack.

OpenVZ vs VMware

URPad runs OpenVZ. According to its wiki, it's containerization of an OS instead of an entire virtual machine emulator (such as VMware or VirtualBox). The interwebs do a better job of the "what" here, so I knocked up a quick table of some of the differences, in relation to the "why did my server go slow!" witch hunt:

                        Virtualization    Containerization
Single kernel?          No                Yes
Full system isolation   Yes               No (common kernel)
Performance             OK                Better
Scheduler contention    Yes               Yes
Resource control        Per VM            Per user within the VM
Isolation               Complete          Partial

How could this explain the slowdown? Whilst a virtualized OS is still at the whim of the host's scheduler (memory allocation, disk access etc…), within the virtualized OS everything runs 'evenly'(ish) within that virtual guest, and in theory you can lock down the scheduling time of the virtual machines at a fairly granular level. With the containerized VM, as the kernel is not emulated, it's possible for a guest to issue enough syscalls to load up the entire system and monopolize the host's CPU time.

The pricing of VPS solutions based on containerization is fairly obvious – cheaper maintenance and resource requirements per virtual host – and I am happy with the tradeoffs. Would I want a production server running on this infrastructure however? Probably not. In fact, I'd be straight over to Amazon.