Day 65: Profiling CUDA performance

Today’s new thing was spending some time profiling how CUDA cores perform when manipulating vectors.

Getting started with CUDA

As with any new platform on a machine you don’t use very often, most of the day was spent updating the machine, reading up on CUDA, and getting the libraries installed. And, because I can, finding a managed wrapper so I can use CUDA from C# and F#. Luckily such a library exists, in the form of the well-maintained managedCuda project (get it from GitHub).

The results were nothing short of staggering. For the first test, once I had worked out how to write a CUDA kernel, I just did some simple vector addition.
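The kernel itself is the textbook example; mine looked roughly like the standard element-wise vector add below (names are illustrative, not my actual code):

```cuda
// Standard CUDA vector addition: each thread handles one element.
// The host launches enough blocks so that blocks * threadsPerBlock >= n.
__global__ void addVectors(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // the last block may overshoot the array
        c[i] = a[i] + b[i];
}
```

With managedCuda, as I understand it, a kernel like this is compiled to PTX and then loaded and launched from the C# side, with the vectors copied into device buffers first.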

CUDA: Run #1

For vectors with 50,000 random elements, CUDA was roughly five times faster than plain old C#. It gets more interesting, though. Try it with 500,000 elements, and the CUDA code performs more than sixteen times faster than C#.

CUDA: Run #2

You can guess where this is going: how about 5,000,000?

The data in the table below is for 10,000 iterations of adding two random float vectors of length N, averaged over 20 runs. You can spot the pattern, right? While the time taken for addition in C# increases linearly with the number of elements, the CUDA timings don’t.

Vector length N    C# (ms)    CUDA (ms)
50,000               2,585         559
500,000             21,584       1,298
5,000,000          214,321       8,738

What’s next

Now, testing even larger vectors on my machine with the test code I wrote is not really possible, since I only have 32GB of system RAM available, and if I start paging (even with the silly-fast M.2 drive) performance will be bound much more by IO than by the GPU/CPU comparison.

I realize none of the above is particularly complex, but it is interesting and very new. And it is tempting me to take a project I was working on a few years ago, swap out the core logic (it had lots of cosine similarity calculations), and see how it performs with CUDA. It is fun, and I’m pleasantly surprised at how well documented this whole space is. It took a lot less time than I would have expected.

Hey, where are the missing days?

I’ve been busy at work, travelling, and doing the new things. Not to forget avoiding getting mauled by my mini-lions. This means I haven’t been able to find the time to write the posts for a week or so. I’ll post this one today, and back-fill as quickly as I can during the week.

Day 34: Use F# to analyze the post data

Wouldn’t it be fun to analyze some of the data that this project is generating? Today is something completely different from the preceding 33 days of new things, namely implementing some fun text-analysis code in F#.


Some interesting post data

So far, the total number of unique words used across the 33 posts preceding this one is 2,492. The average number of unique words per post is 168.2, and the average post length is 281 words. If I can settle on a definition of stop words that I like, it turns out you usually get to read every word only twice per post on average, which is pretty exciting.
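My counting code is in F#, but the stats above boil down to a few lines; here’s an illustrative Python sketch of the same idea (the sample posts are made-up stand-ins, not the real blog text):

```python
# Sketch of the post stats: total vocabulary, average unique words
# per post, and average post length. Sample posts are stand-ins.
import re

posts = [
    "today's new thing was profiling cuda performance on vectors",
    "a parser that passed all its tests first time is a new thing",
]

def words(text):
    # Lower-case and split on anything that isn't a letter or apostrophe.
    return re.findall(r"[a-z']+", text.lower())

vocab = set()
for p in posts:
    vocab.update(words(p))

total_unique = len(vocab)
avg_unique_per_post = sum(len(set(words(p))) for p in posts) / len(posts)
avg_length = sum(len(words(p)) for p in posts) / len(posts)

print(total_unique, avg_unique_per_post, avg_length)  # → 19 10.5 11.0
```

The stop-word question is exactly where a sketch like this gets fuzzy: whether "a" or "was" count as words you "get to read" changes the per-post averages quite a bit.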

Something more interesting about the post data than that

The really new part today was diving back into F# to implement some text-similarity analysis. There’s nothing particularly interesting yet, partly because I don’t have enough data points to extract much meaning. And, as you can judge from the simple stats in the first paragraph, I needed to sharpen my F# skills again, as well as burn through some time turning the WordPress RSS feed into something usable for analysis purposes.

Today’s main stat is comparing the posts to each other for similarity, as determined by the Jaccard distance between them. It’s fairly naive, but for the first time, I’ve implemented calculations in an F# 4.4 assembly that’s called from a C# app that pulls the data down from WordPress.
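For reference, the Jaccard distance treats each post as a set of words: distance = 1 − |A ∩ B| / |A ∪ B|, so 0 means identical vocabularies and 1 means nothing in common. My version lives in the F# assembly, but the calculation itself is tiny; here’s a hypothetical Python sketch:

```python
# Jaccard distance between two posts, each treated as a set of words:
# distance = 1 - |A ∩ B| / |A ∪ B|.
def jaccard_distance(a, b):
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty posts are identical
    return 1.0 - len(a & b) / len(a | b)

post1 = "profiling cuda performance with vectors".split()
post2 = "profiling parser performance with tests".split()
print(jaccard_distance(post1, post2))  # 3 shared of 7 total → 4/7 ≈ 0.571
```

Running every pair of posts through this gives a similarity matrix, which is where the "odd one out" below falls out.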

The stand-out post

I was surprised to find that one of the posts pops out as being ‘odd’. Reading back through it, it’s not that surprising. Interestingly, it’s also one of the most-read posts on the blog so far. The post it is most similar to is an unlikely candidate, so it’s probably all just noise.

I’m not sharing any code just yet – I need to clean it up, as I only spent slightly more than an hour on the whole thing, and it’s a mess. It’ll be on GitHub when I get around to it.

Day 11: Start a modelling project with CNTK

I finally got around to installing CNTK. Having run the demos, I started thinking about what I could play around with as a project. A couple of years ago I gathered some cricket data for a different project. That data is on a backup drive somewhere. I didn’t really feel like looking for that today.

A quick search on Bing unearthed this gem: 11 years’ worth of ball-by-ball game data. That seemed like a good starting point. You might think that’s not enough data, but at a ball-by-ball level it adds up to a couple of million data points, enough to get started with. The data itself is in retrosheet format, originally used for baseball.

Not many results yet: most of the day was spent writing the code to funnel data from the YAML format into something more useful, setting up a VSO site for the code, and watching the South Africa vs Sri Lanka Test match.

As the code gets cleaned up, and the results hopefully start rolling in, I’ll share them, probably on GitHub or somewhere on this site.

Day 6: A double header: health tonic and a parser that parses first time


Today’s first new thing is a bit lazy. It’s winter and everyone is a bit under the weather after the festive days. What better cure than a health tonic? Or, as the bottle says: Elixir vitae.
Yes, it’s Purdey’s Edge multivitamin health drink. I was fully expecting it to be disgusting. It turns out to be quite tasty. I might try it again.

Perhaps suitably reinvigorated by the tonic, I also remembered a second new thing for the day. As you may or may not know, I work as a software engineer at Microsoft. This week being fairly quiet, I decided to pick up one of those side projects that always seem to remain unfinished, in the process of which I had to write a parser that needs to deal with a fairly funky input format. Usually debugging takes a while, but today the tests all passed first time, and I don’t think that has ever happened to me before. And yes, there’s more than one test. So either I’ve been writing too many parsers, or, more likely: quiet, uninterrupted coding time leads to fewer errors!