Friday, January 24, 2020

The Unbearable Lightness of a Lambda

I believe that one of the game-changing features in C++ was the introduction of lambdas. I also believe that, performance-wise, lambdas are underestimated: their cost is rarely considered because they are thought of as just functions.
Given the following code:
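A minimal sketch of such code, assuming a lambda that captures two local ints by value (the function name `add_captured` and the variable names are hypothetical):

```cpp
#include <cassert>

// Hypothetical example: a lambda capturing two locals by value.
int add_captured(int x) {
    int a = 1;
    int b = 2;
    // The closure object keeps its own copies of a and b.
    auto lambda = [a, b](int v) { return a + b + v; };
    assert(lambda(0) == a + b);  // calling it only reads the stored copies
    return lambda(x);
}
```

The lambda looks like a plain function, but the copies of `a` and `b` have to live somewhere.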

And playing the compiler game this is what happens:
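Roughly, a lambda with captures is turned into a class with a call operator and one data member per captured variable. The sketch below writes such a class by hand for a lambda capturing two ints; the name `LambdaClosure` is hypothetical, since the real closure type is unnamed and its exact layout is up to the compiler:

```cpp
#include <cassert>

// Hand-written stand-in for the closure of [a, b](int v) { return a + b + v; }.
class LambdaClosure {
public:
    LambdaClosure(int a, int b) : a_(a), b_(b) {}
    // A non-mutable lambda gets a const call operator.
    int operator()(int v) const { return a_ + b_ + v; }

private:
    int a_;  // copy of the captured a
    int b_;  // copy of the captured b
};

int add_captured_by_hand(int x) {
    int a = 1;
    int b = 2;
    LambdaClosure closure(a, b);
    auto lambda = [a, b](int v) { return a + b + v; };
    assert(closure(x) == lambda(x));  // behaves like the real lambda
    return closure(x);
}
```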

Now it's clear that a lambda has a state. With the help of godbolt, it looks like when the captured state exceeds 16 bytes and the lambda is stored in a std::function, both standard libraries I have tried (the ones shipped with gcc and clang) perform a heap allocation. Let's see how we can avoid this madness.
Let's imagine we have some pieces of code that we need to execute after a preamble and before a postamble, something like the following:
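A sketch of what this could look like, assuming PrePost takes the code to run as a `std::function<void()>`. `Preamble`, `Postamble` and `trace` are hypothetical stand-ins for the real work:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins that just record the order of execution.
std::string trace;
void Preamble()  { trace = "pre "; }
void Postamble() { trace += "post"; }

// Assumed signature: the callable is type-erased into a std::function.
void PrePost(const std::function<void()>& f) {
    Preamble();
    f();
    Postamble();
}

long long sum_with_prepost() {
    long long a = 1, b = 2, c = 3;
    long long result = 0;
    // The capture holds three long longs plus a reference: more than 16 bytes,
    // so converting the lambda to std::function is likely to allocate.
    PrePost([a, b, c, &result] { trace += "body "; result = a + b + c; });
    assert(trace == "pre body post");
    return result;
}
```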

This is the generated code, where you can observe the heap allocation made to store the lambda's "internal state":

In this case we can save ourselves some headaches and avoid the heap allocation that happens when the lambda is converted to a std::function: we make the function a template on the lambda's type, so no std::function conversion takes place. This is what we get:
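A sketch of the templated version, assuming the callable is taken by const reference (an assumption; `Preamble`, `Postamble` and `trace` are hypothetical helpers):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins that just record the order of execution.
std::string trace;
void Preamble()  { trace = "pre "; }
void Postamble() { trace += "post"; }

// The closure type is a template parameter: no type erasure, no std::function.
template <typename F>
void PrePost(const F& f) {
    Preamble();
    f();
    Postamble();
}

long long sum_with_prepost() {
    long long a = 1, b = 2, c = 3;
    long long result = 0;
    // Same capture as before, but the closure stays on the caller's stack.
    PrePost([a, b, c, &result] { trace += "body "; result = a + b + c; });
    assert(result == 6);
    return result;
}
```

Since the compiler sees the concrete closure type, it is also free to inline the call.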

We have another issue now: the PrePost function is not generic (even if it seems to be).
Let's make it generic. As it is, it doesn't work with mutable lambdas, for example; the following code, for instance, does not compile:
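A sketch of the failure, assuming a PrePost that takes `const F&`. The offending call is left commented out so the rest compiles, and `std::is_invocable_v` (C++17) is used to show why it is rejected; the function name `count_calls` is hypothetical:

```cpp
#include <cassert>
#include <type_traits>

// Assumed shape of the non-generic version (preamble/postamble omitted).
template <typename F>
void PrePost(const F& f) { f(); }

int count_calls() {
    int calls = 0;
    // mutable: the call operator is non-const because it modifies n.
    auto counter = [n = 0, &calls]() mutable { ++n; calls = n; };
    using Lambda = decltype(counter);

    // A const reference to the closure cannot be called...
    static_assert(!std::is_invocable_v<const Lambda&>, "not callable through const&");
    // ...while a non-const one can.
    static_assert(std::is_invocable_v<Lambda&>, "callable through non-const&");

    // PrePost(counter);  // does not compile: f() needs a non-const closure

    counter();
    counter();
    assert(calls == 2);
    return calls;
}
```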

We need to take the lambda as a universal (forwarding) reference, that is:
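A sketch of that signature, with empty hypothetical `Preamble`/`Postamble` helpers. With `F&&`, an lvalue lambda is deduced as a non-const reference, so mutable lambdas are accepted, and temporaries bind as well:

```cpp
#include <cassert>

// Hypothetical no-op helpers, just to keep the sketch self-contained.
void Preamble() {}
void Postamble() {}

template <typename F>
void PrePost(F&& f) {
    Preamble();
    f();  // f is a non-const lvalue here, so a mutable call operator is fine
    Postamble();
}

int count_with_prepost() {
    int calls = 0;
    auto counter = [n = 0, &calls]() mutable { ++n; calls = n; };
    PrePost(counter);  // F deduced as Lambda&: counter's own state is updated
    PrePost(counter);
    PrePost([&calls]() mutable { calls += 10; });  // temporaries bind too
    assert(calls == 12);
    return calls;
}
```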

We are not done with PrePost yet. As it is, it can also take callable objects, but it doesn't behave well with callable objects that have ref-qualified call operators, such as:
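A sketch of such a callable. The struct `Callable` and its `log` member are hypothetical; the log just records which overload ran:

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Callable {
    std::string* log;
    void operator()() &  { *log += "lvalue "; }  // chosen when called on an lvalue
    void operator()() && { *log += "rvalue "; }  // chosen when called on an rvalue
};

std::string call_both() {
    std::string log;
    Callable c{&log};
    c();               // lvalue overload
    Callable{&log}();  // rvalue overload (temporary)
    std::move(c)();    // rvalue overload (explicit cast)
    assert(log == "lvalue rvalue rvalue ");
    return log;
}
```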

As it is, our PrePost function is buggy: it compiles but doesn't do what we expect. In the following code the r-value operator is expected to be used, but it is not:
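A sketch reproducing the problem, assuming a PrePost that takes `F&&` and simply calls `f()`. The `Callable` struct is hypothetical. Inside PrePost the parameter `f` has a name, so it is an lvalue even when a temporary was passed:

```cpp
#include <cassert>
#include <string>

struct Callable {
    std::string* log;
    void operator()() &  { *log += "lvalue"; }
    void operator()() && { *log += "rvalue"; }
};

// Assumed shape of the buggy version (preamble/postamble omitted).
template <typename F>
void PrePost(F&& f) {
    f();  // named parameter: always an lvalue expression
}

std::string which_overload() {
    std::string log;
    PrePost(Callable{&log});  // a temporary is passed...
    assert(log == "lvalue");  // ...but the lvalue overload runs
    return log;
}
```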

To fix the issue we need to "perfect forward" the function/lambda. This is the final version of PrePost, which covers all cases:
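A sketch of that final version, using `std::forward` to restore the value category of the argument before calling it. `Callable`, `Preamble`, `Postamble` and `forwarded_calls` are hypothetical names used only for the demonstration:

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Callable {
    std::string* log;
    void operator()() &  { *log += "lvalue "; }
    void operator()() && { *log += "rvalue "; }
};

// Hypothetical no-op helpers.
void Preamble() {}
void Postamble() {}

template <typename F>
void PrePost(F&& f) {
    Preamble();
    std::forward<F>(f)();  // rvalue arguments are called as rvalues
    Postamble();
}

std::string forwarded_calls() {
    std::string log;
    Callable c{&log};
    PrePost(c);               // lvalue overload
    PrePost(Callable{&log});  // rvalue overload
    int n = 0;
    PrePost([&n, k = 0]() mutable { n = ++k; });  // mutable lambdas still work
    assert(n == 1);
    return log;
}
```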

Similar links

Avoid the performance hazard of std::function
Efficient use of lambda expressions


Anonymous said...

This article implies that the compiler is allocating memory for the lambda when you capture too much data. I don't believe that is true. It is the conversion of the lambda to a std::function that is causing the allocation. Just passing the lambda as a template param (eg, like STL algorithms accept), should avoid the allocation in all cases.

Gaetano said...

Thanks for pointing that out. The solution is even simpler than that.