Friday, January 24, 2020

The Unbearable Lightness of a Lambda

I believe that one of the game-changing features of C++ was the introduction of lambdas. I also believe that, performance-wise, lambdas are underestimated; the reason their cost is not considered is that they are regarded as plain functions.
Given the following code

double a = 1.1;
double b = 2.2;
[&a, &b]() {
    std::cout << "a: " << a << " - b: " << b << "\n";
}();
And playing the compiler game, this is roughly what happens behind the scenes:

double a = 1.1;
double b = 2.2;
class __lambda__ {
public:
    __lambda__(double& a, double& b) : a_(a), b_(b) {}
    void operator()() const {
        std::cout << "a: " << a_ << " - b: " << b_ << "\n";
    }
private:
    // by-reference captures are stored as plain references: the const on
    // operator() does not propagate through them
    double& a_;
    double& b_;
} lambda{a, b};
lambda();
Now it's clear that a lambda has state. A lambda by itself never allocates: its captures live inside the closure object. The heap enters the picture when the closure is stored in a std::function: with the help of godbolt, it looks like both compilers I tried (gcc and clang) perform a heap allocation once the captured state exceeds 16 bytes, the size of std::function's small internal buffer. Let's see how we can avoid this madness.
Let's imagine we have some pieces of code that we need to execute after a preamble and before a postamble, something like the following:

void PrePost(const std::function<void(void)>& f) {
    std::cout << "pre\n";
    f();
    std::cout << "post\n";
}

int main() {
    double a = 1.1;
    double b = 2.2;
    double c = 3.3;
    PrePost(
        [&a, &b, &c]() {
            std::cout << a << " " << b << " " << c << "\n";
        }
    );
}
This is the generated code, where you can observe the heap allocation made to store the lambda's "internal state".


In this case we can completely dodge the heap allocation, and save ourselves some headaches, by avoiding the conversion to std::function altogether: we can make the function a template on the lambda's type. This is what we get:

template <typename F>
void PrePost(const F& f) {
    std::cout << "pre\n";
    f();
    std::cout << "post\n";
}

int main() {
    double a = 1.1;
    double b = 2.2;
    double c = 3.3;
    PrePost(
        [&a, &b, &c]() {
            std::cout << a << " " << b << " " << c << "\n";
        });
}

We have another issue now: PrePost is not as generic as it seems.
As it stands it doesn't work, for example, with mutable lambdas, whose operator() is non-const and cannot be invoked through a const reference. The following code, for instance, does not compile:

template <typename F>
void PrePost(const F& f) {
    std::cout << "pre\n";
    f();
    std::cout << "post\n";
}

int main() {
    double a = 1.1;
    double b = 2.2;
    double c = 3.3;
    PrePost([&a, &b, c]() mutable {
        c = 4;
        std::cout << a << " " << b << " " << c << "\n";
    });
    return 0;
}
We need to take the callable as a universal (forwarding) reference, that is:

template <typename F>
void PrePost(F&& f) {
    std::cout << "pre\n";
    f();
    std::cout << "post\n";
}

int main() {
    double a = 1.1;
    double b = 2.2;
    double c = 3.3;
    PrePost([&a, &b, c]() mutable {
        c = 4;
        std::cout << a << " " << b << " " << c << "\n";
    });
    return 0;
}
We are not done with PrePost yet: as it stands it can also accept callable objects, but it doesn't behave well with callables that have ref-qualified methods, such as:

struct Foo {
    // used when the Foo instance is an l-value
    void operator()() const & {
        std::cout << "l-value" << "\n";
    }
    // used when the Foo instance is an r-value
    void operator()() const && {
        std::cout << "r-value" << "\n";
    }
};

Foo f;
f();     // first operator is used
Foo()(); // second operator is used
As it stands our PrePost function is bugged: it compiles but doesn't do what we expect. In the following code the r-value operator is expected to be used, but it is not, because inside PrePost the named parameter f is an l-value:

struct Foo {
    void operator()() const & {
        std::cout << "l-value" << "\n";
    }
    void operator()() const && {
        std::cout << "r-value" << "\n";
    }
};

PrePost(Foo());
In order to fix the issue we need to perfect-forward the function/lambda; this is the final version of PrePost, covering all cases:

template <typename F>
void PrePost(F&& f) {
    std::cout << "pre\n";
    std::forward<F>(f)();
    std::cout << "post\n";
}
Related links

Avoid the performance hazard of std::function
Efficient use of lambda expressions

Monday, May 8, 2017

Thread Collisions detector - Fake Mutex

Some years ago I found myself wondering whether a class that was not supposed to be thread-safe was being used by multiple threads without synchronization. On that occasion I wrote about it here, Threading mess!, and here, Threading mess (2)!.
At that time (9 years ago) the C++ standard had neither threads nor atomics, and the solution proposed was based on pthreads and on gcc atomic builtins. I think it's time to refresh the implementation using some C++11 features.
The idea is very simple: upon entering a critical section (a part of the code that should not be executed concurrently) we save the current thread id, resetting the stored value as soon as the thread leaves the critical section. If a thread tries to enter a critical section while we already have a thread id saved, we have detected a collision.
The technique is very effective, and at that time I wrote the class ThreadCollisionWarner for the Chromium project (thread_collision_warner.h and thread_collision_warner.cc) using the described technique.
Basically what you need to do is add a "fake mutex" to your classes and then "lock" it where needed, as you would do with a real mutex. It's called a fake mutex because it will not suspend a thread when another one is active; it will assert(false) instead. If you want to use this technique in your project I suggest using the implementation done in Chromium.

Examples of uses:

// Example: Queue implementation that is not thread-safe but still usable if
// clients are synchronized somehow.
//
// In this case the macro DFAKE_SCOPED_LOCK has to be used: it checks that if
// a thread is inside push/pop then no one else is inside pop/push.
class NonThreadSafeQueue {
public:
    ...
    void push(int) { DFAKE_SCOPED_LOCK(push_pop_); ... }
    int pop() { DFAKE_SCOPED_LOCK(push_pop_); ... }
    ...
private:
    DFAKE_MUTEX(push_pop_);
};

// Example: Queue implementation that is not thread-safe but still usable if
// clients are synchronized somehow; a "protected" method calls another
// "protected" method.
//
// In this case the macro DFAKE_SCOPED_RECURSIVE_LOCK has to be used: it
// checks that if a thread is inside push/pop then no one else is inside
// pop/push.
class NonThreadSafeQueue {
public:
    void push(int) {
        DFAKE_SCOPED_LOCK(push_pop_);
        ...
    }
    int pop() {
        DFAKE_SCOPED_RECURSIVE_LOCK(push_pop_);
        bar();
        ...
    }
    void bar() { DFAKE_SCOPED_RECURSIVE_LOCK(push_pop_); ... }
    ...
private:
    DFAKE_MUTEX(push_pop_);
};

// Example: Queue implementation that is not usable even if clients are
// synchronized: only one thread in the class life cycle can use the two
// members push/pop.
//
// In this case the macro DFAKE_SCOPED_LOCK_THREAD_LOCKED pins the specified
// critical section to the first thread that enters push or pop; from that
// time on only that thread is allowed to execute push or pop.
class NonThreadSafeQueue {
public:
    ...
    void push(int) { DFAKE_SCOPED_LOCK_THREAD_LOCKED(push_pop_); ... }
    int pop() { DFAKE_SCOPED_LOCK_THREAD_LOCKED(push_pop_); ... }
    ...
private:
    DFAKE_MUTEX(push_pop_);
};

// Example: Class that has to be constructed/destroyed on the same thread; it
// has a "shareable" method (with external synchronization) and a
// non-shareable method (even with external synchronization).
//
// In this case 3 critical sections have to be defined.
class ExoticClass {
public:
    ExoticClass() { DFAKE_SCOPED_LOCK_THREAD_LOCKED(ctor_dtor_); ... }
    ~ExoticClass() { DFAKE_SCOPED_LOCK_THREAD_LOCKED(ctor_dtor_); ... }
    void Shareable() { DFAKE_SCOPED_LOCK(shareable_section_); ... }
    void NotShareable() { DFAKE_SCOPED_LOCK_THREAD_LOCKED(ctor_dtor_); ... }
    ...
private:
    DFAKE_MUTEX(ctor_dtor_);
    DFAKE_MUTEX(shareable_section_);
};
The macros DFAKE_MUTEX, DFAKE_SCOPED_LOCK, DFAKE_SCOPED_RECURSIVE_LOCK and DFAKE_SCOPED_LOCK_THREAD_LOCKED are defined only when compiling in DEBUG mode, removing the atomic overhead from your production code.

The modern simplified version of Chromium ThreadCollisionWarner proposed in Threading mess (2)! is reported here.

#pragma once

#include <atomic>
#include <cassert>
#include <stdexcept>
#include <thread>

#ifdef NDEBUG
#define THREAD_WATCH(obj)
#define SCOPED_WATCH(obj)
#define WATCH(obj)
#else
#define THREAD_WATCH(obj) ThreadCollisionWarning _##obj;
#define SCOPED_WATCH(obj) ThreadCollisionWarning::ScopedWatch sw_##obj(_##obj);
#define WATCH(obj) ThreadCollisionWarning::Watch w_##obj(_##obj);
#endif

class ThreadCollisionWarning
{
public:
    ThreadCollisionWarning()
        : theActiveThread() {
        assert(theActiveThread.is_lock_free());
    }

    ~ThreadCollisionWarning() {}

    class Watch
    {
    public:
        Watch(ThreadCollisionWarning& aTCW)
            : theWarner(aTCW) {
            theWarner.enter_self();
        }
        ~Watch() {}
    private:
        ThreadCollisionWarning& theWarner;
    };

    class ScopedWatch
    {
    public:
        ScopedWatch(ThreadCollisionWarning& aTCW)
            : theWarner(aTCW) {
            theWarner.enter();
        }
        ~ScopedWatch() {
            theWarner.leave();
        }
    private:
        ThreadCollisionWarning& theWarner;
    };

private:
    void enter_self() {
        auto myExpectedId = std::thread::id();
        if (!theActiveThread.compare_exchange_strong(myExpectedId,
                                                     std::this_thread::get_id())) {
            // Last chance! Maybe it is the thread itself entering, from within
            // a critical section, another critical section.
            if (theActiveThread.load() != std::this_thread::get_id()) {
                throw std::runtime_error("Thread Collision");
            }
        }
    }

    void enter() {
        auto myExpectedId = std::thread::id();
        if (!theActiveThread.compare_exchange_strong(myExpectedId,
                                                     std::this_thread::get_id())) {
            // Gotcha! Another thread is trying to use the same class.
            throw std::runtime_error("Thread Collision");
        }
    }

    void leave() {
        theActiveThread.store(std::thread::id());
    }

    std::atomic<std::thread::id> theActiveThread;
};

Friday, March 31, 2017

Structured Binding (C++17 inside)

Let's see how structured bindings, introduced in C++17, will change the way we interact with std::pair, std::tuple, std::array and the like:
     std::pair<int,float> foo();  
     auto [a,b] = foo();  
it will replace:
     std::pair<int,float> foo();  
     int a;  
     float b;  
     std::tie(a,b) = foo();  
in case you are running an obfuscated-code contest, this new swap can scrub up well (please don't try this at home):
     std::tie(a,b) = std::make_pair(b,a);  
The decomposition works with C arrays and std::array as well:
    int a[4] = { 1, 2, 3, 4 };  
    auto [b,c,d,e] = a;  

    std::array<int, 4> f = { 1, 2, 3, 4 };  
    auto [g,h,i,j] = f;  
and this is what you can do using a ranged for loop:
    std::map<std::string, int> myMap;  
    ...  
    for (const auto& [k, v] : myMap) {  
    }  
I bet someone on the C++ committee has recently become a Python enthusiast. Now, if you wonder what a structure like this:
   struct X {  
    int theInt = 3;  
    float thePi = 3.14;  
   };  
   X x;  
   auto [a,b] = x;  
shall provide to make the decomposition work, the answer is: plain nothing. It will indeed work off the shelf.

Unfortunately, if you need to do something fancier with your class, it has to support the get<>() functions, and you need to reopen the std namespace to specialize std::tuple_size and std::tuple_element.

Given the following user defined type (note a and b here are private members):
   class Y {  
   public:  
    int foo() const {  
     return a;  
    }  
    float bar() const {  
     return b;  
    }  
   private:  
    int a = 3;  
    float b = 3.14;  
   };  
you need to provide the get<>() functions:
   template <int N> auto get(Y const &);  
   template <> auto get<0>(Y const & aY) {  
    return aY.foo();  
   }  
   template <> auto get<1>(Y const & aY) {  
    return aY.bar();  
   }  
and then you need to reopen the std namespace (one of those few allowed cases):
   namespace std {  
    template<>  
    struct tuple_size<Y> {  
      static const size_t value = 2;  
    };  
    template<size_t I>  
    struct tuple_element<I, Y> {  
     using type = decltype(get<I>(declval<Y>()));  
    };  
   }  
Note the partial specialization of std::tuple_element: you don't need to hard-code the type for each index, it's enough to "deduce" it using the get function. That was a lot of work to have your class support decomposition. Here C++17 can save you some of it thanks to another new feature, "constexpr if", which lets you write a single version of get<>():

   template<int N>   
   auto get(Y const & aY) {  
     static_assert(N==0 || N==1);  
     if constexpr (N == 0) {  
       return aY.foo();  
     } else if constexpr (N == 1) {  
       return aY.bar();  
     }  
   }  
If you want to experiment with these new language features, go for clang++ (I tried only version 5.0, but it should work with 4.0 as well) and specify -std=gnu++1z.

Saturday, March 18, 2017

Concepts!

Let's start from something very easy, you have the simple function:

int sumScale(int a, int b, int c) {
    return c * (a + b);
}
soon you realize you need another one for float, and before you even need a third one you rewrite it like this:

template <class T>
T sumScale(T a, T b, int c) {
    return c * (a + b);
}
It looks like you are done; everything goes well until, while compiling your huge project, the compiler gives the following error:

sumScale.cpp: In instantiation of ‘T sumScale(T, T, int) [with T = Point]’:
sumScale.cpp:24:26: required from here
sumScale.cpp:13:15: error: no match for ‘operator+’ (operand types are ‘Point’ and ‘Point’)
return c*(a + b);
~~~^~~~
Right, you think: the type Point needs operator+ defined. After defining it, and after some precious minutes, you now get another error:

sumScale.cpp: In instantiation of ‘T sumScale(T, T, int) [with T = Point]’:
sumScale.cpp:28:26: required from here
sumScale.cpp:17:11: error: no match for ‘operator*’ (operand types are ‘int’ and ‘Point’)
return c*(a + b);
~^~~~~~~~
and finally this is the last error: fixing it makes the whole project compile.

Sometimes that Point type is defined in a header that makes your entire project recompile every time you add a missing feature.

Let's see how concepts can save us time and some headaches.

Basically, that sumScale function has strong requirements on the type T: it should be a summable type (it has to support operator+) and it should be scalable (it has to support operator* with an int). We can express these two requirements as a concept in the following way (using the Concepts TS syntax):

template <class T>
concept bool SummableScalable() {
    return requires(T a, T b, int c) {
        {a + b} -> T;
        {c * a} -> T;
    };
}
and then use this defined concept rewriting the sumScale function:

SummableScalable sumScale(SummableScalable a, SummableScalable b, int c) {
    return c * (a + b);
}
doing so the error would have been a more useful one:

sumScale_concepts.cpp: In function ‘int main()’:
sumScale_concepts.cpp:40:26: error: cannot call function ‘auto sumScale(auto:1, auto:1, int) [with auto:1 = Point]’
auto c = sumScale(a,b,3);
^
sumScale_concepts.cpp:28:18: note: constraints not satisfied
SummableScalable sumScale(SummableScalable a, SummableScalable b, int c) {
^~~~~~~~
sumScale_concepts.cpp:21:14: note: within ‘template<class T> concept bool SummableScalable() [with T = Point]’
concept bool SummableScalable() {
^~~~~~~~~~~~~~~~
sumScale_concepts.cpp:21:14: note: with ‘Point a’
sumScale_concepts.cpp:21:14: note: with ‘Point b’
sumScale_concepts.cpp:21:14: note: with ‘int c’
sumScale_concepts.cpp:21:14: note: the required expression ‘(a + b)’ would be ill-formed
sumScale_concepts.cpp:21:14: note: the required expression ‘(c * a)’ would be ill-formed
Wow: within a single iteration the compiler was able to give us all the information we needed to fix the issue. In case you missed it, I'll report the old and the new version of the sumScale function for convenience.

template <class T>
T sumScale(T a, T b, int c) {
    return c * (a + b);
}

SummableScalable sumScale(SummableScalable a, SummableScalable b, int c) {
    return c * (a + b);
}
and this is, in my humble opinion, one of the main advantages of concepts: simplifying generic programming by getting rid of the cumbersome template syntax.

Let's go back now to our concept:

template <class T>
concept bool SummableScalable() {
    return requires(T a, T b, int c) {
        {a + b} -> T;
        {c * a} -> T;
    };
}
this concept is a refinement of a Scalable concept; indeed we can define the SummableScalable concept by writing a Summable concept first and then refining it in the following way:

template <class T>
concept bool Summable() {
    return requires(T a, T b) {
        {a + b} -> T;
    };
}

template <class T>
concept bool SummableScalable() {
    return Summable<T>() &&
        requires(T a, int c) {
            {c * a} -> T;
        };
}
Even better, we can combine the two concepts Summable + Scalable, obtaining a third one:

template <class T>
concept bool Summable() {
    return requires(T a, T b) {
        {a + b} -> T;
    };
}

template <class T>
concept bool Scalable() {
    return requires(T a, int c) {
        {c * a} -> T;
    };
}

template <class T>
concept bool SummableScalable() {
    return Summable<T>() &&
           Scalable<T>();
}
I believe that when we get concepts available in our preferred compilers (concepts didn't make C++17, and to my knowledge they are implemented today only in GCC 6, behind the -fconcepts flag), they will change the face of C++ code even more than the C++11 specification did.


Friday, June 20, 2014

Deal with OOM conditions

Imagine the following sci-fi scenario: your code is in the middle of aborting a nuclear missile launch fired by mistake; it needs to allocate some memory, and unfortunately the BOFH is using all the physical and virtual memory, because he just can.

What shall we do?

The lives of thousands of people depend on that function you need to call, passing it some freshly allocated memory. The operator new (unless the placement form is called) deals with an OOM condition by throwing a bad_alloc, or by returning a null pointer in case the nothrow version is used.

But as a programmer, what can you do when a bad_alloc is thrown or a null pointer is returned?

There are several options, but the most "nifty" one is the following.

When operator new is not able to allocate the required memory, it calls a function; at this point that function can try to free some memory, throw an exception, or exit the program. Exiting the program is not a good option, I have to say: the caller of operator new (or operator new[], for that matter) expects a bad_alloc (or a derivation of it) or a nullptr (in case the nothrow version was used).

A programmer can specify the function to be called in case of OOM using the following function:

new_handler set_new_handler (new_handler new_p) noexcept;
operator new will keep calling the specified handler every time it tries to allocate memory and does not succeed. A programmer can exploit this mechanism in the following way:
  1. At program startup, allocate a chunk of memory, reserving it for future use.
  2. Install a new handler that frees the reserved memory; in case the reserved memory was already released, throw bad_alloc. 
The following code does exactly what is described:

#include <iostream>
#include <new>

class ReservedMemory {
public:
    ReservedMemory()
        : theMemoryReserved(new char[80000 * 1024])
    {}

    ~ReservedMemory() { delete [] theMemoryReserved; }

    void release() {
        if (theMemoryReserved) {
            std::cout << "FREEING SOME MEMORY" << std::endl;
            delete [] theMemoryReserved;
            theMemoryReserved = nullptr;
        } else {
            std::cout << "NO MORE MEMORY TO FREE" << std::endl;
            throw std::bad_alloc();
        }
    }

private:
    const char* theMemoryReserved;
};

ReservedMemory rm;

int main(int argc, char** argv) {
    std::set_new_handler([]() { rm.release(); });
    char* ptr = new char[50000 * 1024];
    std::cout << "SUCCEEDED" << std::endl;
    char* ptra = new char[50000 * 1024];
}

Issuing a ulimit -v 100000 before running it (in order to decrease the memory that can be used), the output is the following:

FREEING SOME MEMORY
SUCCEEDED
NO MORE MEMORY TO FREE
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
Aborted (core dumped)


As you can see, at least once we were able to free some memory, and the first allocation after the OOM condition succeeded thanks to the memory we had just freed; unfortunately there was no more space for the second call. You have no excuse anymore for crashing on an OOM condition: at the very least you can free the reserved memory, raise a warning, write to the logs, send a message to a pager, or take whatever action announces that soon the memory will be over for real!

Wednesday, May 21, 2014

Prevent exceptions from leaving destructors. Now!

Any seasoned C++ programmer should know that letting an exception leave a destructor is bad practice; googling for "throw exception destructor" leads to enough results to convince yourself of that (see for example Meyers's "More Effective C++", Item 11). Most of the arguments are: "if an object is destroyed during stack unwinding, then throwing an exception triggers the terminate function", or "if an STL container is being destroyed it starts destroying all its contained elements, and since STL containers do not expect an exception to be thrown, it will not complete the destruction of the remaining objects".

If you are still not convinced by those arguments, then I hope you will buy at least the following. Let's look at a possible implementation of a unique pointer (leaving aside the -> and * operators):

template <class T>
class AutoPtr {
public:
    AutoPtr(T* aPointer)
        : thePointer(aPointer)
    {}

    ~AutoPtr() {
        delete thePointer;
    }

    void reset() {
        delete thePointer;
        thePointer = nullptr;
    }

private:
    T* thePointer;
};
and a possible use:

int main() {
    AutoPtr<Bomb> a(new Bomb());
    a.reset();
}
As you can see, AutoPtr::reset() deletes the stored pointer and then, because ~Bomb throws, is never able to nullify it; as soon as the instance "a" goes out of scope during the stack unwinding, ~AutoPtr deletes thePointer again. A possible implementation of reset could be the following:

void reset() {
    T* tmp = thePointer;
    thePointer = nullptr;
    delete tmp;
}
but unfortunately it does not save you! Indeed, in the C++11 specification you can find the following:
12.4.3: A declaration of a destructor that does not have an exception-specification is implicitly considered to have the same exception-specification as an implicit declaration (15.4).
and again:
Whenever an exception is thrown and the search for a handler (15.3) encounters the outermost block of a function with an exception-specification that does not allow the exception, then, — if the exception-specification is a dynamic-exception-specification, the function std::unexpected() is called (15.5.2), — otherwise, the function std::terminate() is called (15.5.1).
That means that throwing an exception from a destructor terminates your program, and it doesn't matter whether a stack unwinding is going on or not.

This simple example

#include <stdexcept>

class Bomb {
public:
    ~Bomb() { throw std::runtime_error("BOOM"); }
};

int main()
try {
    Bomb b;
}
catch (...) {}
generates a crash if compiled in C++11 mode with gcc (4.8 and 4.9) and clang (3.5), while Intel icc 14.01 calls neither std::unexpected nor std::terminate (time to file an icc bug?).

Sunday, March 9, 2014

ref-qualifiers

C++11 introduced the ability to "ref-qualify" methods. The best-known qualifier is the const one:

class T {
    ...
    void foo() const; // Here *this is const
    ...
};
however it is now also possible to ref-qualify *this:

class T {
    ...
    void foo() const; // *this is const
    void bar() &;     // *this is an l-value
    void goo() &&;    // *this is an r-value
    ...
};
Let's see how this can be of use. Imagine a factory building heavy objects and returning them by copy this way:

class JumboFactory {
    ...
    Jumbo getJumboByCopy() {
        return theJumboObject;
    }
    ...
private:
    Jumbo theJumboObject;
};

JumboFactory myJF;
Jumbo myJumbo = myJF.getJumboByCopy();
in the following scenario we could avoid a useless copy:

Jumbo myJumbo = JumboFactory().getJumboByCopy();
We can avoid the copy, if Jumbo is movable, by overloading the method getJumboByCopy for the case where the object it is called on is a temporary:

class JumboFactory {
    ...
    Jumbo getJumboByCopy() const & {
        // Deep copy
        return theJumboObject;
    }
    Jumbo getJumboByCopy() && { // *this is an r-value
        // Move
        return std::move(theJumboObject);
    }
    ...
private:
    Jumbo theJumboObject;
};

JumboFactory myJF;
Jumbo myJumboA = myJF.getJumboByCopy();           // Deep copy
Jumbo myJumboB = JumboFactory().getJumboByCopy(); // Move
To be honest, the example shows a scenario with other problems besides the one mentioned (for instance, if the Jumbo object is so big, why permit the copy at all?), but I hope you got the idea.