Selective Parallelization using C++ Template Metaprogramming

This is the continuation of my previous post and I recommend reading it first.

Photo by Tine Ivanič on Unsplash

In the last post, we stopped after parallelizing the for-loop which was the performance bottleneck. We also managed to get a decent boost in performance thanks to OpenMP.

With all that good stuff, we might have overlooked a few potential issues. Firstly, the fact that we’re using a vector for our benchmarking. If we swap it out for say a std::list, the performance starts to drop.

Let’s take a closer look at the parallel implementation.

In all the three implementations, we’re relying on random-access of the elements inside the container.

It is clearly stated in the documentation that a std::list is not meant for random-access. That’s the reason why we aren’t getting the expected performance from our parallel loops.

We have two options here. Either make our code more generic or let the compiler decide on whether or not to use the parallel implementation depending on the container.

For now, let’s focus on the latter.

Selectively Picking a Parallel Implementation

Before we jump in, let’s define the behavior that we want.

  • We want the compiler to pick our parallel implementation whenever a supported container is used and it should fall back to normal implementation with any other container.
  • There should not be any run-time overhead.

The required behavior seems pretty familiar, isn’t it?

It’s just conditional branching. We could simply use the typeid operator from the standard typeinfo header to construct an if-else block.

The above example demonstrates how we can use typeid operator to perform the type checking.

We could come up with something like this.

Any optimizing compiler should probably eliminate the condition as one of them will be dead code.

But can we do better?

Yes, welcome to the world of template metaprogramming!

Checking Type Equality at Compile Time

Going by the rules of template metaprogramming, we should let the compiler generate the required source code during compile time. In our case, we want the compiler to make that decision based on the container used. So we need some kind of a type checking mechanism.

It’ll be convenient if we could just give in two types and get a boolean as the result. Let’s see how we do just that in plain C++.

In simple terms, we need to run the type checking expression but this time during compilation. I know that you’re thinking about constexpr.

And that should work fine.

We could also do the same without a constexpr, using partial template specialization.

With that our type checking part is sorted.

Partial Template Specialization

As we have seen earlier, partial template specialization lets us customize class templates for a new set of template arguments. We will be using this technique together with the type checking to get the defined behavior.

For it to work we will be needing both the parallel and serial implementations to coexist as two partial specializations.

Since C++ does not support partial specialization for function templates, we can wrap them in a struct.

Let’s take a look at the implementation of the map function.

We now have both the implementations available as partial specializations.

Before we update the main API we need one more utility function.

It’s time to update the main API of the library.

As we can see the main API just became a wrapper function. Don’t worry about the additional function call, the optimizer should inline this one.

Wrapping up

Now we have both parallel and serial code that the compiler is able to select appropriately depending on our choice of container. All of this without any additional run-time overhead.

The compiler used: GCC 7.3

The code is available here.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store