Friday, January 24, 2020

The Unbearable Lightness of a Lambda

I believe that one of the game change features in c++ were the introduction of lambdas I also believe that performance-wise lambda are underestimated, the reason for not considering the cost of a lambda is the fact that are considered just as functions.
Given the following code

And playing the compiler game this is what happens:

Now it's clear that a lambda has a state. With the help of godbolt looks like that if the captured amount of memory exceeds 16 bytes both compilers I have tried (gcc and clang) perform an allocation on the heap. Let see how we can avoid this madness.
Let's imagine we have some pieces of code that we need to execute after a preamble and before a postamble, something like the following:

This is the generated code and you can observe the heap allocation made to store the lambda "internal state"

In this case, we can completely dodge the capture and save us some headaches avoiding the heap allocation when the lambda is converted to that std::function, we can make the function template on the lambda and avoid the std::function conversion. And this is what we get

We have another issue now, PrePost function is not generic (even if it seems).
Let's make it generic, as it is indeed doesn't work for example with mutable lambdas, the following code, for instance, does not compile:

We need to get the lambda as a Universal Reference, that is:

We are not done yet with PrePost indeed as it is can also get callable objects but it doesn't behave well with callable objects with "ref-qualified methods" such as:

As it is our PrePost function is bugged, it compiles but doesn't do what we expect, in the following code the r-value operator is expected to be used but is not

In order to fix the issue we need to "perfect forward" the function/lambda, this is the last version of PrePost that covers all cases

Similar links

Avoid the performance hazzard of std::function
Efficient use of lambda expressions

Monday, May 8, 2017

Thread Collisions detector - Fake Mutex

Some years ago I found myself wondering if a not supposed thread safe class was being used by multiple threads without being synchronized. In that occasion I wrote about it here Threading mess! and here Threading mess (2)!.
At that time (9 years ago) we had no threads neither atomic in the c++ standard and the solution proposed was based on pthreads and on gcc atomic builtins. I think it's time to refresh the implementation using some C++11 features.
The idea is very simple, upon entering a critical section (part of code that should not be executed concurrently) we should save the current thread id resetting the stored value as soon the thread leaves the critical section. If a thread tries to enter a critical section but we already have a thread id saved then we have detected the collision.
The technique is very effective and at that time I wrote for the Chromium project the class ThreadCollisionWarner thread_collision_warner.h and using the described technique.
Basically what you need to do is to add to your classes a "FakeMutex" and then "locking" it where it's needed as you would do with a real mutex. It's called Fake Mutex because it will not suspend a thread if another one is active but it will assert(false) instead. If you want to use this technique in your project I suggest to use the implementation done in Chromium.

Examples of uses:

the macros DFAKE_MUTEX, DFAKE_SCOPED_LOCK, DFAKE_SCOPED_RECURSIVE_LOCK and DFAKE_SCOPED_LOCK_THREAD_LOCKED are defined only if compiled in DEBUG mode removing from your production code the atomic overhead.

The modern simplified version of Chromium ThreadCollisionWarner proposed in Threading mess (2)! is reported here.

Friday, March 31, 2017

Structured Binding (C++17 inside)

Let's see how structured binding introduced in C++17 will change the way to interact with std::pair, std::tuple, std::array and such:
     std::pair<int,float> foo();  
     auto [a,b] = foo();  
it will replaces:
     std::pair<int,float> foo();  
     int a;  
     float b;  
     std::tie(a,b) = foo();  
in case you are running an obfuscated code contest this new swap can scrub up well (please don't try this at home):
     std::tie(a,b) = std::make_pair(b,a);  
The decomposition works with c array and std::array as well:
    int a[4] = { 1, 2, 3, 4};  
    auto [b,c,d,e] = a;  
    std::array<int, 4> a;  
    auto [b,c,d,e] = a;  
and this is what you can do using a ranged for loop:
    std::map myMap;  
    for (const auto & [k,v] : myMap) {  
I bet someone in c++ committee has become recently a python enthusiast. Now if you wonder what your structures like this:
   stuct X {  
    int theInt = 3;  
    float thePi = 3.14;  
   auto [a,b] = x;  
shall provide to make the decomposition working the response is: a plain nothing. That will work indeed off the shelf.

Unfortunately if you need to do something more fancy with your class it has to support the get<>() functions, and you need to reopen the std namespace to specialize std::tuple_size and std::tuple_element.

Given the following user defined type (note a and b here are private members):
   class Y {  
    int foo() const {  
     return a;  
    float bar() const {  
     return b;  
    int a = 3;  
    float b = 3.14;  
you need to provide the gets<>() functions:
   template <int N> auto get(Y const &);  
   template <> auto get<0>(Y const & aY) {  
   template <> auto get<1>(Y const & aY) {  
and then you need to reopen the std namespace (one of those few allowed cases):
   namespace std {  
    struct std::tuple_size<Y> {  
      static const size_t value = 2;  
    template<size_t I>  
    struct std::tuple_element<I, Y> {  
     using type = decltype(get<I>(declval<Y>()));  
Note the partial specialization for std::tuple_element, you don't need to hard code the type of each index, it's enough to "deduce" it using the get function. You did a lot of work in order to have your class supporting the decomposition, in this case c++17 can save you some work taking advantage of a new c++17 feature, the "constexpr if", just writing a single version of get<>():

   template<int N>   
   auto get(Y const & aY) {  
     static_assert(N==0 || N==1);  
     if constexpr (N == 0) {  
     } else if constexpr (N == 1) {  
If you want use/experiment with those new language features go for clang++ (I tried only version 5.0 but it should work with the 4.0 as well) and you need to specify -std=gnu++1z

Saturday, March 18, 2017


Let's start from something very easy, you have the simple function:

soon you realize you need another one but for float and before you even need the third one you write it like this:

looks like you are done, everything goes well until, while compiling your huge project, the compiler gives the following error:

right, you think, the type Point needs to have defined the operator+, after defining it and after some precious minutes you get now another error:

and finally this is the last error and fixing it fixes your whole compilation.

Sometime that Point type is defined in an header that makes your entire project to recompile every time you add a missing feature.

Let see how can concepts can save our time and some headache.

Basically that sumScale function has a strong requirement on the type T. It should be a summable type (it has to support the operator+) and it should be scalable (it has to support operator* with an int), we can express these two concepts in the following way:

and then use this defined concept rewriting the sumScale function:

doing so the error would have been a more useful one:

wow, within a single iteration the compiler was able to gives us all the information we needed in order to fix the issue. In case you missed it I'll report for convenience the old and the new version of sumScale function.

and this is, in my humble opinion, one of the main advantages of concepts: simplify the generic programming taking rid of the cumbersome template syntax.

Let's go back now to our concept:

this concept is the refining of a Scalable concept indeed we can define the SummableScalable concept writing first a Summable concept then refining it in the following way:

even better we can combine two concepts Summale + Scalable obtaining a third one:

I believe when we will get the concepts available on our preferred compiler, for instance the concepts didn't make C++17 and today (at my knowledge) the concepts are implemented only in GCC 6 (using the flag -fconcepts), it will change the face of c++ code even more the c++11 specification did.

Friday, June 20, 2014

Deal with OOM conditions

Imagine the following sci-fi scenario: your code is in the middle of aborting a nuclear missiles launch fired by mistake, it needs to allocate some memory and unfortunately the BOFH is using all the physical and virtual memory because, he just can.

What shall we do?

The life of thousand people depends on that function you need to call, passing to it some fresh allocated memory. The operator new (unless the placement one is called) deals with OOM condition throwing a bad_alloc or returning a null-pointer in case the nothrow version of it is used.

But as programmer what can you do when a bad_alloc is thrown or a null-pointer is returned?

There are several options, but the most "nifty" one is the following.

When the operator new is not able to allocate the required memory it calls a function, at this point the function can try to free some memory, throwing an exception or exit the program. Exiting the program is not a good option I have to say, indeed the caller of the operator new (or operator new [] for the matter) expects a bad_alloc (or a derivation of it) or a nullptr (in case the nothrow was used).

A programmer is able to specify the function to be call in case of OOM with the following function:

the operator new will keep calling the specified function every time it tries to allocate memory and it doesn't succeeded. A programmer can exploit this mechanism in the following way:
  1. Allocate a the programming startup a bunch of memory reserving it for future uses.
  2. Install the new handler that will free the reserved memory, in case the reserved memory was already release then throw bad_alloc. 
The following code does exactly what described:

issuing a ulimit -v 100000 before to run it (in order to decrease the memory that can be used), the output is the following:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)

As you can see at least once we were able to free some memory and the first allocation after the OOM condition was able to allocate memory due the fact some was freed by us, unfortunately there were no more space on the second call. You have no excuse anymore to have a crash due to OOM condition, what you can do at least is to free the memory, launch a warning, writing in the logs, send a message to a pager or whatever action that soon the memory will be over for real!

Wednesday, May 21, 2014

Prevent exceptions from leaving destructors. Now!

Any seasoned C++ programmer should now that permitting an exception to leave the destructor is bad practice, googling for "throw exception destructor" it leads to enough results convincing yourself that is a bad practice (see for example Meyers's "More effective C++" Item 11). Most of the arguments are: "if an object is destroyed during a stack unwinding then throwing an exception it triggers the terminate function" or "if an STL container is being destroyed it start to destroy all his contained elements and given the fact the STL containers do not expect an exception being thrown then it will not complete the destruction of the remaining objects".

If you still are not convinced by those arguments then I hope you will buy at least the following. Let's look at a possible implementation of an unique ptr (apart the -> and * operators):

and a possible use:

As you can see the AutoPtr::reset() deletes the stored pointer and then is not able to nullify it due the throw, as soon as the "a" instance goes out of scope due the stack unwinding then ~AutoPtr deletes again thePointer. A possible implementation of reset can be the following:

but unfortunately it not saves you! Indeed in c++11 specification you can "find" the following:
12.4.3: A declaration of a destructor that does not have an exception-specification is implicitly considered to have the same exception-specification as an implicit declaration (15.4).
and again:
Whenever an exception is thrown and the search for a handler (15.3) encounters the outermost block of a function with an exception-specification that does not allow the exception, then, — if the exception-specification is a dynamic-exception-specification, the function std::unexpected() is called (15.5.2), — otherwise, the function std::terminate() is called (15.5.1).
that means that throwing an exception from a DTOR terminates your program and it doesn't matter if a stack unwinding is going on or not.

This simple example

does generate a crash if compiled in c++11 mode with gcc (4.8 and 4.9) and clang (3.5) while with intel icc 14.01 doens't call the std::unexpected either the std::terminate (time to fill an icc bug?)

Sunday, March 9, 2014


C++11 introduced the ability to "ref-qualifier" methods. The most known qualifier is the const one:

however now is also possible to ref-qualify *this

let see how this can be of any use. Immagine to have a factory building heavy objects and returning them by copy this way:

in the following scenario we can avoid an useless copy:

we can avoid the copy if Jumbo is movable overloading the method getJumboByCopy in case the object on which I'm calling it is a temporary:

To be honest the example shows a scenario with other problems than the one mentioned (for instance if the object Jumbo is so big why permitting the copy then?) but I hope you got the idea.