I believe that one of the game-changing features in C++ was the introduction of lambdas. I also believe that, performance-wise, lambdas are underestimated: their cost is often ignored because they are thought of as plain functions.
Given the following code:
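A minimal sketch of the kind of code in question, assuming a lambda that captures two locals by value. The function name `add_offsets` and the variable names are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>

// Hypothetical example: a lambda that captures two locals by value.
inline std::int64_t add_offsets(std::int64_t x) {
    std::int64_t a = 10;
    std::int64_t b = 32;
    auto add = [a, b](std::int64_t v) { return v + a + b; };
    // The closure object holds copies of a and b, so it is at least as large as both.
    static_assert(sizeof(add) >= 2 * sizeof(std::int64_t),
                  "captured values are stored inside the closure object");
    return add(x);
}
```

The `static_assert` is the interesting part: the lambda is not just code, it is an object that carries its captures around.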
And playing the compiler's role, this is what happens:
Now it is clear that a lambda has state.
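Conceptually, the compiler rewrites a lambda into an unnamed class with a call operator and one data member per by-value capture. A hand-written sketch of that rewrite, assuming a lambda that captures two 64-bit integers by value; the class name `AddClosure` is hypothetical, since real closure types have no name:

```cpp
#include <cstdint>

// Hand-written equivalent of: [a, b](std::int64_t v) { return v + a + b; }
class AddClosure {
public:
    AddClosure(std::int64_t a, std::int64_t b) : a_(a), b_(b) {}

    // const, because the lambda is not declared mutable.
    std::int64_t operator()(std::int64_t v) const { return v + a_ + b_; }

private:
    std::int64_t a_;  // copy of the captured a
    std::int64_t b_;  // copy of the captured b
};

static_assert(sizeof(AddClosure) == 2 * sizeof(std::int64_t),
              "the state is the two captured values");
```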
With the help of Godbolt, it looks like when the captured state exceeds 16 bytes, both compilers I tried (GCC and Clang) perform a heap allocation when the lambda is stored in a std::function.
Let's see how we can avoid this madness.
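One way to observe this without reading assembly is to count calls to the global `operator new` while a closure is wrapped in a `std::function`. The sketch below is an assumption-laden experiment, not a guarantee: the helper names (`g_allocations`, `allocations_for`, and the two `*_capture_allocations` functions) are hypothetical, and the size at which an allocation kicks in is an implementation detail of the standard library in use.

```cpp
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <new>

static std::size_t g_allocations = 0;  // number of calls to global operator new

// Replace the global allocation functions so every heap allocation is counted.
void* operator new(std::size_t size) {
    ++g_allocations;
    if (void* p = std::malloc(size ? size : 1)) return p;
    throw std::bad_alloc{};
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Returns how many allocations happen while `callable` is wrapped in std::function.
template <typename F>
std::size_t allocations_for(F callable) {
    const std::size_t before = g_allocations;
    std::function<long long()> fn = callable;
    fn();
    return g_allocations - before;
}

// Hypothetical probe: one captured long long (8 bytes on typical targets).
inline std::size_t small_capture_allocations() {
    long long a = 1;
    return allocations_for([a] { return a; });
}

// Hypothetical probe: four captured long longs (32 bytes on typical targets).
inline std::size_t large_capture_allocations() {
    long long a = 1, b = 2, c = 3, d = 4;
    return allocations_for([a, b, c, d] { return a + b + c + d; });
}
```

Printing the two results on your own toolchain shows whether the larger capture triggers an allocation there; the numbers can differ between standard library implementations.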
Let's imagine we have some pieces of code that we need to execute after a preamble and before a postamble, something like the following:
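A minimal sketch of such a function, assuming it takes its argument as a `std::function<void()>`. The `trace()` helper and the `example()` caller are hypothetical; they stand in for real preamble and postamble work and only exist to make the execution order visible.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical trace buffer so the order of execution can be observed.
inline std::vector<std::string>& trace() {
    static std::vector<std::string> entries;
    return entries;
}

// Runs f between a fixed preamble and a fixed postamble.
inline void PrePost(const std::function<void()>& f) {
    trace().push_back("pre");   // preamble
    f();
    trace().push_back("post");  // postamble
}

inline void example() {
    int a = 1, b = 2, c = 3, d = 4, e = 5;
    // Five ints are captured by value, so converting this closure to
    // std::function may need a heap allocation to hold the captured state.
    PrePost([a, b, c, d, e] {
        trace().push_back("body " + std::to_string(a + b + c + d + e));
    });
}
```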
This is the generated code, and you can observe the heap allocation made to store the lambda's internal state.
In this case we can dodge the problem completely and save ourselves some headaches. The heap allocation happens when the lambda is converted to std::function, so we can make the function a template on the callable's type and avoid the std::function conversion altogether.
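A sketch of the templated variant, assuming the callable is taken by `const` reference. As before, `trace()` and `example()` are hypothetical helpers used only to make the behaviour observable.

```cpp
#include <string>
#include <vector>

// Hypothetical trace buffer so the order of execution can be observed.
inline std::vector<std::string>& trace() {
    static std::vector<std::string> entries;
    return entries;
}

// F is deduced as the closure type itself: no std::function, no type erasure.
template <typename F>
void PrePost(const F& f) {
    trace().push_back("pre");   // preamble
    f();
    trace().push_back("post");  // postamble
}

inline void example() {
    int a = 1, b = 2, c = 3, d = 4, e = 5;
    // The closure stays on the caller's stack and is passed by reference.
    PrePost([a, b, c, d, e] {
        trace().push_back("body " + std::to_string(a + b + c + d + e));
    });
}
```

Because the closure is never copied into a type-erased wrapper, there is nothing here that requires storage beyond the caller's stack frame.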
And this is what we get
We have another issue now: the PrePost function is not generic, even if it seems to be.
Let's make it generic. As it stands, it does not work with mutable lambdas, for example; the following code does not compile:
We need to take the lambda as a universal (forwarding) reference, that is:
We are not done with PrePost yet. As it stands it also accepts callable objects, but it does not behave well with callable objects that have ref-qualified call operators, such as:
As it stands, our PrePost function is buggy: it compiles but does not do what we expect. In the following code the rvalue overload of the call operator is expected to be used, but it is not:
To fix the issue we need to perfect-forward the function/lambda. This is the final version of PrePost, which covers all cases:
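A sketch of the failing case, assuming PrePost takes its argument as `const F&` and a C++17 compiler for `std::is_invocable_v`. The offending call is left commented out so the snippet still builds, and the type traits state the reason; `call_counter_twice` is a hypothetical name.

```cpp
#include <type_traits>

template <typename F>
void PrePost(const F& f) {
    // preamble ...
    f();  // requires a call operator usable on a const object
    // postamble ...
}

inline int call_counter_twice() {
    int calls = 0;
    auto counter = [calls]() mutable { return ++calls; };

    // The call operator of a mutable lambda is non-const,
    // so it cannot be invoked through a const reference.
    static_assert(!std::is_invocable_v<const decltype(counter)&>,
                  "a mutable lambda is not callable when const");
    static_assert(std::is_invocable_v<decltype(counter)&>,
                  "it is callable through a non-const reference");

    // PrePost(counter);  // does not compile: f is const inside PrePost

    counter();
    return counter();  // the closure's own copy of calls is now 2
}
```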
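A sketch of that signature, assuming the preamble and postamble are placeholders. With `F&&` on a deduced template parameter, an lvalue argument binds as a non-const reference, so a mutable lambda can be called and its state changes are visible to the caller; `run_mutable_counter` is a hypothetical name.

```cpp
template <typename F>
void PrePost(F&& f) {
    // preamble ...
    f();
    // postamble ...
}

inline int run_mutable_counter() {
    int calls = 0;
    auto counter = [calls]() mutable { return ++calls; };

    PrePost(counter);  // F deduces to the closure type&, counter itself is called
    PrePost(counter);

    // A temporary mutable lambda is accepted too.
    PrePost([n = 0]() mutable { ++n; });

    return counter();  // third call on the same closure object
}
```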
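A minimal sketch of such a callable, assuming two overloads of the call operator that differ only in their ref-qualifier. The type name `Callable` and the returned strings are hypothetical; they only report which overload ran.

```cpp
#include <string>

// Hypothetical callable whose behaviour depends on its value category.
struct Callable {
    // Selected when the object is an lvalue.
    std::string operator()() & { return "lvalue"; }
    // Selected when the object is an rvalue (for example a temporary).
    std::string operator()() && { return "rvalue"; }
};
```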
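A sketch of the bug, assuming the hypothetical `Callable` type from the description above. For illustration only, this PrePost returns the string produced by the call so the selected overload can be inspected.

```cpp
#include <string>

// Hypothetical callable that reports which overload was selected.
struct Callable {
    std::string operator()() & { return "lvalue"; }
    std::string operator()() && { return "rvalue"; }
};

template <typename F>
std::string PrePost(F&& f) {
    // preamble ...
    // f has a name, so the expression f is an lvalue even when the
    // argument was a temporary: the & overload is always selected.
    std::string used = f();
    // postamble ...
    return used;
}

inline std::string call_with_temporary() {
    return PrePost(Callable{});  // a temporary is passed in
}
```

Here `call_with_temporary()` yields `"lvalue"`, although the argument was an rvalue.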
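A sketch of the forwarding version, again using the hypothetical `Callable` type and returning the result of the call purely so the selected overload is visible. `std::forward<F>(f)` restores the value category of the original argument before the call.

```cpp
#include <string>
#include <utility>

// Hypothetical callable that reports which overload was selected.
struct Callable {
    std::string operator()() & { return "lvalue"; }
    std::string operator()() && { return "rvalue"; }
};

template <typename F>
std::string PrePost(F&& f) {
    // preamble ...
    // Forwarding turns f back into an rvalue when an rvalue was passed,
    // and leaves it an lvalue otherwise.
    std::string used = std::forward<F>(f)();
    // postamble ...
    return used;
}
```

With this version a temporary `Callable` selects the `&&` overload, while a named `Callable` object still selects the `&` overload. Mutable lambdas and plain lambdas keep working as before.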