c++ today

The Unbearable Lightness of a Lambda

2020-01-24T15:16:00.000-08:00

I believe that one of the game-changing features in C++ was the introduction of lambdas. I also believe that, performance-wise, lambdas are underestimated: their cost is rarely considered because they are treated as if they were plain functions.
Given the following code

And playing the compiler game this is what happens:

Now it's clear that a lambda has state. With the help of godbolt, it looks like when the captured state exceeds 16 bytes both compilers I tried (gcc and clang) perform a heap allocation once the lambda is converted to a std::function. Let's see how we can avoid this madness.
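Before moving on, here is a minimal sketch of that state. The helper names are illustrative and not taken from the original listing; the point is only that a closure is at least as large as what it captures by value, and that std::function has to store a copy of it somewhere.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>

// Illustrative helpers, not taken from the original listing.

// A closure that captures nothing carries no captured state.
inline std::size_t emptyClosureSize() {
    auto empty = [] { return 0; };
    return sizeof(empty);
}

// A closure that captures 64 bytes by value stores its own copy of them.
inline std::size_t bigClosureSize() {
    std::array<char, 64> payload{};
    auto big = [payload] { return payload[0]; };
    return sizeof(big);
}

// Converting the closure to std::function copies that state into the wrapper.
// Whether the copy lands on the heap depends on the implementation's
// small-buffer size, so no specific threshold is assumed here.
inline char callThroughStdFunction() {
    std::array<char, 64> payload{};
    payload[0] = 'x';
    std::function<char()> f = [payload] { return payload[0]; };
    return f();
}

inline void closureSizeExample() {
    assert(bigClosureSize() >= 64);
    assert(emptyClosureSize() < bigClosureSize());
}
```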
Let's imagine we have some pieces of code that we need to execute after a preamble and before a postamble, something like the following:

This is the generated code, where you can observe the heap allocation made to store the lambda's "internal state":

In this case we can dodge the problem completely and save ourselves some headaches: the heap allocation happens when the lambda is converted to std::function, so we can make the function a template on the lambda type and avoid the std::function conversion altogether. This is what we get:

We have another issue now: the PrePost function is not as generic as it seems.
Let's make it generic. As it is, it doesn't work with mutable lambdas, for example; the following code does not compile:

We need to take the lambda as a universal reference (also known as a forwarding reference), that is:

We are not done with PrePost yet. As it is, it also accepts callable objects, but it doesn't behave well with callable objects that have "ref-qualified methods", such as:

As it is, our PrePost function is buggy: it compiles but doesn't do what we expect. In the following code the rvalue operator is expected to be used, but it is not:

To fix the issue we need to "perfect forward" the function/lambda. This is the final version of PrePost, which covers all the cases:
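The original listing is not reproduced here, so what follows is only a sketch of what such a version can look like. The Callable type and its log member are hypothetical, added just to make the selected overload observable.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical callable with ref-qualified call operators; it records
// which overload was selected.
struct Callable {
    std::string* log;
    void operator()() &  { *log = "lvalue"; }
    void operator()() && { *log = "rvalue"; }
};

// Sketch of PrePost: F&& is a forwarding reference, and std::forward
// preserves the value category of the callable when it is invoked.
template <typename F>
void PrePost(F&& f) {
    // ... preamble would go here ...
    std::forward<F>(f)();
    // ... postamble would go here ...
}

inline void prePostExample() {
    std::string log;
    Callable c{&log};
    PrePost(c);                // lvalue: calls operator()() &
    assert(log == "lvalue");
    PrePost(Callable{&log});   // temporary: calls operator()() &&
    assert(log == "rvalue");
}
```

Because the parameter is a forwarding reference, a mutable lambda is accepted as well, for example `PrePost([n = 0]() mutable { ++n; });` in C++14 or later.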

Similar links

Avoid the performance hazzard of std::function
Efficient use of lambda expressions

Thread Collisions detector - Fake Mutex

2017-05-08T14:39:00.000-07:00

Some years ago I found myself wondering whether a class not meant to be thread safe was being used by multiple threads without synchronization. On that occasion I wrote about it here Threading mess! and here Threading mess (2)!.
At that time (9 years ago) we had neither threads nor atomics in the C++ standard, and the proposed solution was based on pthreads and on gcc atomic builtins. I think it's time to refresh the implementation using some C++11 features.
The idea is very simple: upon entering a critical section (a part of the code that should not be executed concurrently) we save the current thread id, and we reset the stored value as soon as the thread leaves the critical section. If a thread tries to enter a critical section while a thread id is already saved, then we have detected a collision.
The technique is very effective, and at that time I wrote the ThreadCollisionWarner class for the Chromium project (thread_collision_warner.h and thread_collision_warner.cc) using the described technique.
Basically, what you need to do is add a "FakeMutex" to your classes and then "lock" it where needed, as you would do with a real mutex. It's called a fake mutex because it will not suspend a thread if another one is active; it will assert(false) instead. If you want to use this technique in your project, I suggest using the Chromium implementation.

Examples of uses:

The macros DFAKE_MUTEX, DFAKE_SCOPED_LOCK, DFAKE_SCOPED_RECURSIVE_LOCK and DFAKE_SCOPED_LOCK_THREAD_LOCKED are defined only when compiling in DEBUG mode, which removes the atomic overhead from your production code.

The modern simplified version of Chromium ThreadCollisionWarner proposed in Threading mess (2)! is reported here.
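The listing itself is not reproduced here. As a rough C++11 sketch of the idea (this is not the Chromium code, and the class names FakeMutex and ScopedFakeLock are made up for illustration), the stored thread id can live in a std::atomic and be claimed with a compare-and-exchange:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Sketch only: a non-recursive "fake mutex" that detects collisions
// instead of suspending threads.
class FakeMutex {
public:
    FakeMutex() : theOwner(std::thread::id()) {}

    // Returns false if some thread is already inside the critical section.
    bool tryEnter() {
        std::thread::id nobody;  // a default-constructed id means "no thread"
        return theOwner.compare_exchange_strong(nobody,
                                                std::this_thread::get_id());
    }

    void leave() { theOwner.store(std::thread::id()); }

private:
    std::atomic<std::thread::id> theOwner;
};

// RAII helper: asserts on entry if a collision is detected.
class ScopedFakeLock {
public:
    explicit ScopedFakeLock(FakeMutex& aMutex) : theMutex(aMutex) {
        const bool entered = theMutex.tryEnter();
        assert(entered && "thread collision detected");
        (void)entered;  // silence the unused warning when NDEBUG is defined
    }
    ~ScopedFakeLock() { theMutex.leave(); }

    ScopedFakeLock(const ScopedFakeLock&) = delete;
    ScopedFakeLock& operator=(const ScopedFakeLock&) = delete;

private:
    FakeMutex& theMutex;
};
```

A class would hold a FakeMutex member and create a ScopedFakeLock at the top of every method that must not run concurrently. Unlike the Chromium macros, this sketch has no recursive variant and is not compiled out in release builds.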

Structured Binding (C++17 inside)

2017-03-31T09:00:00.000-07:00

Let's see how structured bindings, introduced in C++17, change the way we interact with std::pair, std::tuple, std::array and the like:

     std::pair<int,float> foo();  
     auto [a,b] = foo();

it replaces:

     std::pair<int,float> foo();  
     int a;  
     float b;  
     std::tie(a,b) = foo();

in case you are running an obfuscated code contest this new swap can scrub up well (please don't try this at home):

     std::tie(a,b) = std::make_pair(b,a);

The decomposition works with C arrays and std::array as well:

    int a[4] = { 1, 2, 3, 4};  
    auto [b,c,d,e] = a;

    std::array<int, 4> a;  
    auto [b,c,d,e] = a;

and this is what you can do using a ranged for loop:

    std::map<Key, Value> myMap;  
    ...  
    for (const auto & [k,v] : myMap) {  
    }

I bet someone on the C++ committee has recently become a Python enthusiast. Now, if you wonder what a structure like this:

   struct X {  
    int theInt = 3;  
    float thePi = 3.14;  
   };  
   X x;  
   auto [a,b] = x;

has to provide to make the decomposition work, the answer is: nothing at all. It works off the shelf.

Unfortunately, if you need something fancier, your class has to support the get<>() functions, and you need to reopen the std namespace to specialize std::tuple_size and std::tuple_element.

Given the following user defined type (note a and b here are private members):

   class Y {  
   public:  
    int foo() const {  
     return a;  
    }  
    float bar() const {  
     return b;  
    }  
   private:  
    int a = 3;  
    float b = 3.14;  
   };

you need to provide the get<>() functions:

   template <int N> auto get(Y const &);  
   template <> auto get<0>(Y const & aY) {  
    return aY.foo();  
   }  
   template <> auto get<1>(Y const & aY) {  
    return aY.bar();  
   }

and then you need to reopen the std namespace (one of those few allowed cases):

   namespace std {  
    template<>  
    struct tuple_size<Y> {  
      static const size_t value = 2;  
    };  
    template<size_t I>  
    struct tuple_element<I, Y> {  
     using type = decltype(get<I>(declval<Y>()));  
    };  
   }

Note the partial specialization of std::tuple_element: you don't need to hard code the type for each index, it's enough to "deduce" it using the get function. That was a lot of work just to have your class support decomposition. Here C++17 can save you some of it through another new feature, "constexpr if", which lets you write a single version of get<>():

   template<int N>   
   auto get(Y const & aY) {  
     static_assert(N==0 || N==1);  
     if constexpr (N == 0) {  
       return aY.foo();  
     } else if constexpr (N == 1) {  
       return aY.bar();  
     }  
   }

If you want to use or experiment with these new language features, go for clang++ (I only tried version 5.0, but it should work with 4.0 as well) and specify -std=gnu++1z.

Concepts!

2017-03-18T01:00:00.000-07:00

Let's start from something very easy, you have the simple function:

Soon you realize you need another one for float, and before you even need a third one you write it like this:
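The original listing is not reproduced here. As a sketch, assuming sumScale simply adds its two arguments and scales the result by an int factor (the body is a guess based on the requirements discussed later in the post), the template could be:

```cpp
#include <cassert>

// Assumption: sumScale adds its two arguments and multiplies the sum by an
// int factor, so T needs operator+ and operator* with an int.
template <typename T>
T sumScale(T a, T b, int factor) {
    return (a + b) * factor;
}

inline void sumScaleExample() {
    assert(sumScale(1, 2, 3) == 9);        // T deduced as int
    assert(sumScale(1.5, 0.5, 2) == 4.0);  // T deduced as double
}
```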

It looks like you are done, and everything goes well until, while compiling your huge project, the compiler gives the following error:

Right, you think, the type Point needs to define operator+. After defining it, and after some precious minutes, you get another error:

Finally, this is the last error, and fixing it fixes your whole compilation.

Sometimes that Point type is defined in a header that forces your entire project to recompile every time you add a missing feature.

Let's see how concepts can save us time and some headaches.

Basically, the sumScale function has strong requirements on the type T: it should be a summable type (it has to support operator+) and it should be scalable (it has to support operator* with an int). We can express these two concepts in the following way:

and then use this defined concept rewriting the sumScale function:

doing so the error would have been a more useful one:

Wow, within a single iteration the compiler was able to give us all the information we needed to fix the issue. In case you missed it, I'll report for convenience the old and the new versions of the sumScale function.

And this is, in my humble opinion, one of the main advantages of concepts: they simplify generic programming by getting rid of the cumbersome template syntax.

Let's go back now to our concept:

This concept is a refinement of a simpler one: we can define SummableScalable by first writing a Summable concept and then refining it in the following way:

Even better, we can combine two concepts, Summable and Scalable, obtaining a third one:

Concepts didn't make it into C++17, and today (to my knowledge) they are implemented only in GCC 6 (using the flag -fconcepts). I believe that once concepts are available in our preferred compilers, they will change the face of C++ code even more than the C++11 specification did.

Deal with OOM conditions

2014-06-20T13:31:00.003-07:00

Imagine the following sci-fi scenario: your code is in the middle of aborting a nuclear missile launch fired by mistake, it needs to allocate some memory, and unfortunately the BOFH is using all the physical and virtual memory, just because he can.

What shall we do?

The lives of thousands of people depend on that function you need to call, passing it some freshly allocated memory. Operator new (unless the placement version is called) deals with an OOM condition by throwing a bad_alloc, or by returning a null pointer if the nothrow version is used.

But as a programmer, what can you do when a bad_alloc is thrown or a null pointer is returned?

There are several options, but the most "nifty" one is the following.

When operator new is not able to allocate the required memory, it calls a function. At this point the function can try to free some memory, throw an exception, or exit the program. Exiting the program is not a good option, I have to say: the caller of operator new (or operator new[] for that matter) expects a bad_alloc (or something derived from it) or a nullptr (if the nothrow version was used).

A programmer can specify the function to be called in case of OOM with std::set_new_handler, declared in the <new> header.

Operator new keeps calling the specified function every time it tries to allocate memory and does not succeed. A programmer can exploit this mechanism in the following way:

Allocate a bunch of memory at program startup, reserving it for future use.
Install a new handler that frees the reserved memory; if the reserved memory was already released, it throws bad_alloc.

The following code does exactly what is described:
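The original program is not reproduced here. As a minimal sketch of the two steps, the handler below frees a reserve on its first call and throws on the second. Only std::set_new_handler is a real library function; the names reserve, outOfMemoryHandler and installOutOfMemoryHandler are made up for illustration, and the driver loop that actually exhausts memory is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <new>

// Hypothetical emergency reserve, allocated at startup.
char* reserve = nullptr;

// Called by operator new each time an allocation fails.
void outOfMemoryHandler() {
    if (reserve != nullptr) {
        std::puts("FREEING SOME MEMORY");
        delete[] reserve;
        reserve = nullptr;
        return;  // operator new retries the allocation after we return
    }
    std::puts("NO MORE MEMORY TO FREE");
    throw std::bad_alloc();
}

// Step 1: reserve memory; step 2: install the handler.
void installOutOfMemoryHandler(std::size_t reserveBytes) {
    reserve = new char[reserveBytes];
    std::set_new_handler(outOfMemoryHandler);
}
```

The size of the reserve is a design choice: it has to be large enough for the program to finish whatever it must do (or at least to log and shut down cleanly) after the first failure.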

Issuing ulimit -v 100000 before running it (in order to decrease the memory that can be used), the output is the following:

FREEING SOME MEMORY

SUCCEEDED

NO MORE MEMORY TO FREE

terminate called after throwing an instance of 'std::bad_alloc'

what(): std::bad_alloc

Aborted (core dumped)

As you can see, at least once we were able to free some memory, and the first allocation after the OOM condition succeeded thanks to the memory we freed; unfortunately there was no more space on the second call. You no longer have an excuse for crashing due to an OOM condition: the least you can do is free the reserved memory and raise a warning, write to the logs, send a message to a pager, or take whatever action signals that soon the memory will be over for real!

Prevent exceptions from leaving destructors. Now!

2014-05-21T15:30:00.002-07:00

Any seasoned C++ programmer should know that letting an exception leave a destructor is bad practice; googling for "throw exception destructor" returns enough results to convince yourself of it (see for example Meyers's "More Effective C++", Item 11). Most of the arguments are: "if an object is destroyed during stack unwinding, then throwing an exception triggers the terminate function" or "if an STL container is being destroyed, it starts destroying all its contained elements, and since STL containers do not expect an exception to be thrown, it will not complete the destruction of the remaining objects".

If you are still not convinced by those arguments, then I hope you will buy at least the following. Let's look at a possible implementation of a unique pointer (leaving out the -> and * operators):

and a possible use:

As you can see, AutoPtr::reset() deletes the stored pointer and is then unable to nullify it because of the throw; as soon as the "a" instance goes out of scope during stack unwinding, ~AutoPtr deletes thePointer again. A possible implementation of reset is the following:
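The listing is not reproduced here. A sketch of such a reset, assuming an AutoPtr with a thePointer member as described above, updates the member first and deletes the old pointer afterwards. The ThrowingOnDestroy type is hypothetical and only there to exercise the throwing path.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical type whose destructor is explicitly allowed to throw.
struct ThrowingOnDestroy {
    ~ThrowingOnDestroy() noexcept(false) { throw std::runtime_error("boom"); }
};

template <typename T>
class AutoPtr {
public:
    explicit AutoPtr(T* aPointer = nullptr) : thePointer(aPointer) {}
    ~AutoPtr() { delete thePointer; }

    AutoPtr(const AutoPtr&) = delete;
    AutoPtr& operator=(const AutoPtr&) = delete;

    void reset(T* aPointer = nullptr) {
        T* old = thePointer;
        thePointer = aPointer;  // the state is consistent before anything can throw
        delete old;             // if ~T throws, thePointer no longer refers to old
    }

    T* get() const { return thePointer; }

private:
    T* thePointer;
};
```

If the destructor of T throws during reset, the exception propagates, but the AutoPtr no longer holds the deleted pointer, so ~AutoPtr does not delete it a second time.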

but unfortunately it does not save you! Indeed, in the C++11 specification you can find the following:

12.4.3: A declaration of a destructor that does not have an exception-specification is implicitly considered to have the same exception-specification as an implicit declaration (15.4).

and again:

Whenever an exception is thrown and the search for a handler (15.3) encounters the outermost block of a function with an exception-specification that does not allow the exception, then, — if the exception-specification is a dynamic-exception-specification, the function std::unexpected() is called (15.5.2), — otherwise, the function std::terminate() is called (15.5.1).

That means that throwing an exception from a destructor terminates your program, whether or not stack unwinding is in progress, unless the destructor is explicitly declared noexcept(false).
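The implicit exception specification can be observed at compile time with std::is_nothrow_destructible. The two structs below are illustrative only and neither of them actually throws.

```cpp
#include <cassert>
#include <type_traits>

// No exception specification: since C++11 the destructor is implicitly noexcept.
struct Implicit {
    ~Implicit() {}
};

// Explicit opt-out: exceptions are allowed to leave this destructor.
struct OptOut {
    ~OptOut() noexcept(false) {}
};

static_assert(std::is_nothrow_destructible<Implicit>::value,
              "user-provided destructors are noexcept by default");
static_assert(!std::is_nothrow_destructible<OptOut>::value,
              "noexcept(false) must be requested explicitly");
```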

This simple example

does generate a crash if compiled in C++11 mode with gcc (4.8 and 4.9) and clang (3.5), while with Intel icc 14.01 it calls neither std::unexpected nor std::terminate (time to file an icc bug?).

ref-qualifiers

2014-03-09T14:43:00.000-07:00

C++11 introduced the ability to ref-qualify methods. The best-known qualifier is the const one:

however, it is now also possible to ref-qualify *this

Let's see how this can be of any use. Imagine having a factory that builds heavy objects and returns them by copy this way:

in the following scenario we can avoid a useless copy:

If Jumbo is movable, we can avoid the copy by overloading the method getJumboByCopy for the case where the object it is called on is a temporary:
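The original listings are not reproduced here. In the sketch below, JumboFactory and the copy/move counters on Jumbo are assumptions added to make the effect visible; only the name getJumboByCopy comes from the text.

```cpp
#include <cassert>
#include <utility>

// Hypothetical heavy type that counts how it gets constructed.
struct Jumbo {
    Jumbo() = default;
    Jumbo(const Jumbo&) { ++copies; }
    Jumbo(Jumbo&&) noexcept { ++moves; }
    static int copies;
    static int moves;
};
int Jumbo::copies = 0;
int Jumbo::moves = 0;

class JumboFactory {
public:
    // Called on lvalues: the factory lives on, so the Jumbo must be copied.
    Jumbo getJumboByCopy() const & { return theJumbo; }

    // Called on temporaries: the factory is about to die, so the Jumbo
    // can be moved out instead of copied.
    Jumbo getJumboByCopy() && { return std::move(theJumbo); }

private:
    Jumbo theJumbo;
};

inline void refQualifierExample() {
    JumboFactory factory;
    Jumbo a = factory.getJumboByCopy();         // const & overload: one copy
    Jumbo b = JumboFactory().getJumboByCopy();  // && overload: a move, no copy
    (void)a;
    (void)b;
}
```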

To be honest, the example shows a scenario with problems other than the one mentioned (for instance, if the Jumbo object is so big, why permit the copy at all?), but I hope you got the idea.

The under-evaluated delete specifier

2014-02-16T02:02:00.001-08:00

As you should know by now, in C++11 we are able to disable certain signatures in our classes. Most of the time this is used to disable the copy constructor and the assignment operator
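A minimal sketch of that use (the class name Connection is made up for illustration):

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical non-copyable class.
class Connection {
public:
    Connection() = default;
    Connection(const Connection&) = delete;             // no copy construction
    Connection& operator=(const Connection&) = delete;  // no copy assignment
};

static_assert(!std::is_copy_constructible<Connection>::value,
              "copy construction is disabled");
static_assert(!std::is_copy_assignable<Connection>::value,
              "copy assignment is disabled");
```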

This is much better than the old way, where the programmer had to declare both members private and then not implement them, getting an error either at compile time or at link time.

The delete specifier can also disable some automatic overloads. Consider the following code:

It works perfectly, but if we do not want that automatic conversion (note that explicit cannot be used here), then the delete specifier comes in handy:
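The original function is not shown, so the sketch below uses a hypothetical free function half, which accepts a double and rejects an int argument that would otherwise be converted silently:

```cpp
#include <cassert>

// Hypothetical function: only double arguments are wanted.
inline double half(double value) { return value / 2.0; }

// Deleting the int overload turns the implicit int -> double conversion
// into a compile-time error: half(3) no longer compiles.
double half(int) = delete;

inline void halfExample() {
    assert(half(3.0) == 1.5);
    // half(3);  // error: use of deleted function 'double half(int)'
}
```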

A typical mistake in C++ is to store a reference (or a pointer, for that matter) to a temporary object, leading to disaster. This mistake is one of the arguments Java guys put on the table when they argue against C++.

The following class works perfectly, but used wrongly it can store a reference to a temporary object:

The delete specifier can help us again: we can disable the constructor taking an rvalue reference and prevent such use:

Now the code above leads to a compilation error if we try to build the class with a temporary string. Note that a "const &&" is needed: without the const specifier, passing the result of a "const std::string foo()" would not lead to a compilation error.
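The class from the post is not reproduced here. As a sketch with a hypothetical class NameView, the deleted "const &&" constructor rejects both plain and const temporaries, which can be verified with std::is_constructible:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical class that stores a reference to a string owned elsewhere.
class NameView {
public:
    explicit NameView(const std::string& aName) : theName(aName) {}

    // Temporaries (const or not) bind here first, and the call is rejected.
    explicit NameView(const std::string&&) = delete;

    const std::string& name() const { return theName; }

private:
    const std::string& theName;
};

static_assert(std::is_constructible<NameView, const std::string&>::value,
              "lvalues are accepted");
static_assert(!std::is_constructible<NameView, std::string>::value,
              "temporaries are rejected");
static_assert(!std::is_constructible<NameView, const std::string>::value,
              "const temporaries are rejected too");
```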

Time to add to my coding rules a new rule!

A bad workman always blames his tools (Miguel de Icaza inside)

2014-01-05T09:18:00.001-08:00

Recently Miguel de Icaza revealed on his blog that they regret the decision to develop Moonlight in C++; you can read about it here:

http://tirania.org/blog/archive/2014/Jan-04.html

My first thought was: wow, he found a way to communicate with a parallel universe where a "Miguel de Icaza" took the decision to go for C, and he is now comparing the two projects.

Anyway, he is Miguel de Icaza after all, the man behind the Gnome and Mono projects. Chapeau!
I was curious to look at their code base, and after opening some sources here and there I was horrified, and I mean it.

Issues I found (just opening a few files, without a full code inspection or any static analysis check):

Constructors do not use initialization lists.
Constructors and functions do not take their parameters by const reference, so basically every argument passed is copied.
Methods that should be marked const are not (for example KeyTime::HasPercent), which means that const correctness is not applied throughout the code.
Classes not meant to be modified after construction do not have their members marked const.
Not all classes initialize all of their members.
Classes have their destructor marked virtual even when it is not needed; and if the destructor is virtual, why is the copy constructor neither implemented nor "disabled"?
List is reimplemented from scratch, and instead of making List a template, it is a standard bidirectional implementation with a Node hosting only the next/prev pointers and a virtual destructor (nice vptr overhead when not needed), plus a template GenericNode derived from Node. The user of this class has to create their own class inheriting from GenericNode (see EventObjectNode).
brush.cpp: a possible division by zero is first accepted and then the result is fixed:

double sx = sw / width;
double sy = sh / height;
if (width == 0)
    sx = 1.0;
if (height == 0)
    sy = 1.0;

collection.cpp: the following statement looks suspicious, since n % 1 is always 0:

if (n == 0 || n % 1 == 1) {...}

General remarks on the code:

I haven't seen a single non-copyable class.
In around 250K lines of code there are a mere 128 asserts.
Variables are assigned twice without the first assigned value being used (see for example the CornerRadius::FromStr implementation).
The scope of variables can be reduced.
C-style pointer casts.
Unused variables.

So, dear Miguel de Icaza, please fix your code base, and then we can talk about performance and memory efficiency. Unless your regret has to be read as follows: "We regret having chosen C++ without a deep knowledge of it and without any best practices to follow".

STL is not thread safe, get over it

2013-08-28T13:45:00.001-07:00

STL is not thread safe, get over it. Why should it be, after all? The STL is plain C++ code; the compiler doesn't even know the STL exists, it's just C++ code shipped with your compiler. What the STL has in common with the C++ language is that it's standardized. Given that both C++ and the STL are standardized, you can expect the STL to be implemented on every platform with the same guarantees enforced by the standard. You are not even obliged to use the STL shipped with your compiler: there are various implementations out there, see the Rogue Wave ones for example (http://www.roguewave.com).
Let's take std::list as an example and suppose for a moment that the STL implementation is thread safe. Some problems arise:

If it doesn't need to be thread safe, you get extra overhead you don't need.
What should a thread calling std::list::pop_back on an empty list do?

Wait for a std::list::push_back?
Return with an "error"?
Throw an exception?
Wait a certain number of seconds for an entry to become available?

Should it be used in an environment with a single producer and a single consumer? In that case it's possible to implement it without locks.
Should multiple readers be permitted?

Yes, sure, you can solve all the points above with a policy template, but just imagine your users' complaints.

Well, again, get over it: the standardized STL is not thread safe, and you have to create your own thread-safe list embedding a real std::list. After all, writing a wrapper around an STL container is such an easy exercise that if you find it difficult to do yourself, you have to ask: "Am I ready to start with multithreaded programming, even if someone provides me with an off-the-shelf thread-safe std::list?"
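As a rough sketch of such a wrapper (the class and method names are made up, and the policy chosen for an empty list is just one of the options listed above: report failure instead of blocking):

```cpp
#include <cassert>
#include <list>
#include <mutex>
#include <utility>

// Sketch of a thread-safe wrapper embedding a real std::list.
template <typename T>
class ThreadSafeList {
public:
    void pushBack(T aValue) {
        std::lock_guard<std::mutex> lock(theMutex);
        theList.push_back(std::move(aValue));
    }

    // Policy choice: return false on an empty list rather than wait or throw.
    bool tryPopBack(T& aOut) {
        std::lock_guard<std::mutex> lock(theMutex);
        if (theList.empty()) {
            return false;
        }
        aOut = std::move(theList.back());
        theList.pop_back();
        return true;
    }

private:
    std::mutex theMutex;
    std::list<T> theList;
};

inline void threadSafeListExample() {
    ThreadSafeList<int> list;
    int value = 0;
    assert(!list.tryPopBack(value));  // empty: reports failure
    list.pushBack(7);
    assert(list.tryPopBack(value) && value == 7);
}
```

The check for emptiness and the pop happen under the same lock, which is exactly what a caller cannot achieve by calling empty() and pop_back() separately on a shared std::list.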

Consider that two different instances of STL containers can be safely manipulated by different threads.

PS: I have written about std::list instead of the more widely used std::vector because std::vector, by its very nature, has to be used in a more "static" way than std::list, and using a std::vector as a container shared by multiple threads (producer/consumer) is plainly the wrong choice.

Code inspecting Stroustrup (TC++PL4) (2nd issue)

2013-06-13T15:07:00.001-07:00

I learned C++ reading one of the first editions of TC++PL, and even though I have been programming in C++ since 2000 or so, I'm reading the new TC++PL4 carefully, as if the language were totally new to me. It seems my last post about an error found in this book will not be the only one, and here we are again.
In 5.3.4.1 he introduces conditions and how two threads can interact with each other by communicating through events. The classical producer/consumer interaction, exchanging Messages through a queue, is presented, and this is the poor implementation proposed:

class Message {
    // ...
};

queue<Message> mqueue;
condition_variable mcond;
mutex mmutex;

void consumer()
{
    while(true) {
        unique_lock<mutex> lck{mmutex};
        while (mcond.wait(lck)) /* do nothing */;

        auto m = mqueue.front();
        mqueue.pop();
        lck.unlock();
        // ... process m...
    }
}

void producer()
{
    while(true) {
        Message m;
        // ... fill the message ...
        unique_lock<mutex> lck{mmutex};
        mqueue.push(m);
        mcond.notify_one();
    }
}

This implementation is affected by at least three issues:

Unless you are very lucky, the queue will grow indefinitely: the consumer waits at each cycle even if the queue already contains something, and it has a chance (I repeat, a "chance") to return from condition_variable::wait() only each time the producer puts something in the queue.
The consumer can miss the condition_variable::notify_one event: if the producer calls notify_one() before the other thread has executed wait(), the consumer will block for no reason.
The producer holds the unique_lock for longer than needed: the mutex only has to protect the queue, not the condition as well.

Let's see how the producer and the consumer should have been implemented:

void consumer()
{
    while(true) {
        unique_lock<mutex> lck{mmutex};
        while (mqueue.empty()) { // the empty condition has to be rechecked: the thread
                                 // can get spurious wakeups without any thread doing a notify
            mcond.wait(lck);
        }
        auto m = mqueue.front();
        mqueue.pop();
        lck.unlock();
        // ... process m...
    }
}

void producer()
{
    while(true) {
        Message m;
        // ... fill the message ...
        {
            unique_lock<mutex> lck{mmutex};
            mqueue.push(m);
        } // This extra scope is here to release the mmutex asap
        mcond.notify_one();
    }
}

In my opinion this should have been the version of the producer/consumer in TC++PL4. As you can see, with a simple extra scope and the right while(...) the issues reported in the bullets are solved.

There is another problem, although I have to admit that most of the time it is a minor one:

The producer can issue notify_one() even when it is not needed, and this can be a performance issue.

To address it, the producer has to "forecast" whether the consumer can be blocked, and this can happen only if the queue is empty after the mmutex has been acquired. This is the final version of the producer:

void producer()
{
    while(true) {
        Message m;
        bool notifyIsNeeded = false;
        // ... fill the message ...
        {
            unique_lock<mutex> lck{mmutex};
            if (mqueue.empty()) {
                notifyIsNeeded = true;
            }
            mqueue.push(m);
        } // This extra scope is here to release the mmutex asap
        if ( notifyIsNeeded ) {
            mcond.notify_one();
        }
    }
}

Writing correct code is not easy, and writing correct multi-threaded code is damn hard.

Code inspecting Stroustrup (TC++PL4)

2013-06-09T09:27:00.002-07:00

TC++PL4 is now on my desk, and reading it carefully I have to say that even Stroustrup makes silly mistakes in his class implementations.

He illustrates a typical Vector implementation (page 73):

class Vector {
private:
    double* elem;
    int sz;
public:
    Vector(int s);
    ~Vector() { delete [] elem; }

    Vector(const Vector& a);
    Vector& operator=(const Vector& a);

    double& operator[](int i);
    const double& operator[](int i) const;

    int size() const;
};

and given that this class needs a copy constructor, he "implements" it (page 74):

Vector::Vector(const Vector& a)
    :elem{new double[sz]},
     sz{a.sz}
{
    for (int i = 0; i != sz; ++i)
        elem[i] = a.elem[i];
}

As you can see, the elem array is allocated with a size read from a not yet initialized member (members are initialized in declaration order, not in the order of the initializer list). Since the constructor is defined on page 74, my last hope was to flip back to page 73 and check whether "int sz" was declared before "double* elem", but that was not the case.
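One possible fix, sketched below, is to size the allocation from a.sz so that the copy constructor no longer depends on the declaration order of the members. This is not the book's code: the member function bodies are filled in as assumptions, and the assignment operator is simply disabled to keep the sketch short.

```cpp
#include <cassert>

class Vector {
private:
    double* elem;
    int sz;
public:
    // Assumed body: allocate s zero-initialized elements.
    explicit Vector(int s) : elem{new double[s]()}, sz{s} {}
    ~Vector() { delete[] elem; }

    // The size is read from the source object, never from this->sz,
    // so the order in which elem and sz are initialized does not matter.
    Vector(const Vector& a) : elem{new double[a.sz]}, sz{a.sz} {
        for (int i = 0; i != sz; ++i)
            elem[i] = a.elem[i];
    }
    Vector& operator=(const Vector& a) = delete;  // left out of this sketch

    double& operator[](int i) { return elem[i]; }
    const double& operator[](int i) const { return elem[i]; }

    int size() const { return sz; }
};
```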

I'm sad.

Temporary objects

2010-11-01T12:36:00.000-07:00

Most people believe that temporary objects are const; well, they are "almost const". It is indeed possible to call a non-const member function on a temporary object. It's a nifty exception stated in the standard: "Section 3.10.10 in C++ ISO/IEC 14882:1998".
This exception makes it possible to implement a move constructor (through a proxy) while waiting for rvalue references in the C++0x language standard.
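A tiny illustration of the rule (the Counter type is made up for the example): a non-const member function is called directly on a temporary.

```cpp
#include <cassert>

// Hypothetical type with a non-const member function.
struct Counter {
    int value = 0;
    Counter& increment() { ++value; return *this; }
};

// Counter() is a temporary, yet increment() can be called on it.
inline int incrementTemporary() {
    return Counter().increment().value;
}
```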

Asserts vs Exceptions

2010-02-12T02:40:00.001-08:00

I still see people with the doubt: "shall I use an assert or throw an exception?".
Let me whine a bit about the usage of asserts first. Overall people code badly, and I mean it, hence assertions are not used enough. Having said that, "assert" and "exception" have different meanings and behaviors.

A failed assert just terminates normal program execution (reporting the source file name and the line where the error occurred), while a thrown exception has a chance to be caught. Asserts vanish when NDEBUG is defined (usually in production code). Newbies overlook this fact, and what they hold against a massive use of asserts is the overhead they introduce. If I wasn't clear: "Asserts disappear in production code". What didn't you get from: "Asserts disappear in production code"?

Those two facts already give a hint on when to use an assert and when to use an exception. If the error can be handled (by the user or by the code itself) then an exception must be used; if the error has no chance of being handled by anyone, then assert is your friend. This is not the only rule to follow. As said, "Asserts disappear in production code", so in production code that error will not be "detected": errors that you expect to happen in production code cannot be caught by an assert.
Asserts are meant to detect coding errors, which have no reason to happen in production code. When writing a method there are some assumptions about the internal state of the class (pre-conditions): better to check those first in order not to do something bad. At the end of the same method, better to check that the internal state of the class is consistent (post-conditions).
Unfortunately, a great many people use exceptions for cases that really ought to be assertions. Throwing exceptions instead of wrapping simple pre/post-conditions in a simple assertion macro is a hint that you're coding badly. Asserts protect against coding errors, using a null pointer for example; exceptions protect against events of normal program life: server not available, file not writable, etc.
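To make the distinction concrete, here is a small sketch with two hypothetical functions: elementAt treats a bad index as a coding error and asserts, while parsePort treats malformed input as part of normal program life and throws.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Coding error: callers are required to pass a valid index (pre-condition).
inline int elementAt(const std::vector<int>& aValues, std::size_t aIndex) {
    assert(aIndex < aValues.size() && "caller passed an out-of-range index");
    return aValues[aIndex];
}

// Normal program life: user input can be malformed, so report it with an
// exception that the caller has a chance to handle.
inline int parsePort(const std::string& aText) {
    std::size_t consumed = 0;
    int port = 0;
    try {
        port = std::stoi(aText, &consumed);
    } catch (const std::exception&) {
        throw std::invalid_argument("port is not a number: " + aText);
    }
    if (consumed != aText.size() || port < 1 || port > 65535) {
        throw std::invalid_argument("port is out of range: " + aText);
    }
    return port;
}
```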

It should always be possible to write a unit test that causes an exception to be thrown; if you are not able to, it means that piece of code is dead code. On the other hand, it should be impossible to do the same with an assert: if you are able to write a unit test that makes an assert fail, then most likely that assert should be an exception, or of course the code is wrong.

A nice rule that drives lazy programmers to use more asserts is: whoever writes code that provides a set of data that triggers an assert is responsible for fixing it. This rule drives library writers to protect themselves against incorrect usage of their library, rather than pay the cost of fixing even the code that uses the library.

If you have missed it: "Asserts disappear in production code".

Assignment operator

2008-09-18T04:06:00.000-07:00

Today I went through a piece of code similar to this (what matters here is how the assignment operator was written):


class Test {
public:
    Test()
    :theID(0), theName()
    {}

    Test& operator=(const Test& aRhs) {
        theID   = aRhs.theID;
        theName = aRhs.theName;

        return *this;
    }

private:
    int         theID;
    std::string theName;

};

As it is, the code works and I have nothing to say at first glance. However, who knows how the class will be extended in the future?

Imagine that in a future version some members are added:


class Test {
public:
    Test()
    :theID(0), theName(), theHammer(), theHeap(0)
    {}

    Test& operator=(const Test& aRhs) {
        theID     = aRhs.theID;
        theName   = aRhs.theName;
        theHammer = aRhs.theHammer;  //This is going to be a bottleneck
        theHeap   = aRhs.theHeap;    //This is wrong

        return *this;
    }

private:
    int          theID;
    std::string  theName;
    HugeClass    theHammer;  //Extra member with huge footprint
    HeapClass*   theHeap;    //Extra member on heap
};

As you can see, with these two extra members, following the same pattern as the previous version makes the code invalid. The objection I get is: "if one day someone adds those two members then they will take care to write it correctly"; unfortunately in the real world it doesn't work like this. The average programmer will just add those two extra members following the pattern already in place, reasoning: "after all, if the code works for the members already present, why shouldn't it be the same for the two extra members I have been told to add?"

The trick to avoid problems like this is to write the right code from the start, thinking of what will happen in the future, when possible of course. Usually the change to make is just a matter of a few lines of code and a two-line comment to warn future coders.

The original well written class would have been:


class Test {
public:
    Test()
    :theID(0), theName()
    {}

    Test& operator=(const Test& aRhs) {

        //Check for a self assignment
        if (this != &aRhs) {
            theID     = aRhs.theID;
            theName   = aRhs.theName;
        }

        return *this;
    }
private:
    int          theID;
    std::string  theName;
};

At this point the objection is: why check for a self assignment when the event is never going to happen? Who will ever write:


Test t;

t=t;

well, the code is valid so be sure someone will, also it is not always easy to spot a self assignment, consider this:


t[j] = t[i];

or even:


Test t;
Test &a = t;
...
t = a;

The real question here is: "Why was the assignment operator in that class ever implemented?" The question makes a point: the operator was not needed at all, and in that case the right thing to do is to remove it.

Let's then suppose the class is the one with the two extra members: the class manages dynamically allocated memory, so the operator must be implemented. The almost correct version is:


class Test {
public:
    Test()
    :theID(0), theName(), theHammer(), theHeap(new HeapClass)
    {}

    Test& operator=(const Test& aRhs) {
        if (this != &aRhs) {
            theID     = aRhs.theID;
            theName   = aRhs.theName;
            theHammer = aRhs.theHammer;  

            delete theHeap;
            theHeap   = new HeapClass(*aRhs.theHeap);
        }

        return *this;
    }

private:
    int          theID;
    std::string  theName;
    HugeClass    theHammer;  //Extra member with huge footprint
    HeapClass*   theHeap;    //Extra member on heap
};

I wrote "almost correct" because the assignment operator is correct but not exception safe. Imagine what happens if an exception is thrown halfway, for example by new HeapClass after theHeap has already been deleted: the object is left in an inconsistent state. Fixing this is not easy. To make an assignment operator exception safe, before modifying the internal state it is better to create a temporary object and then swap it with the internal state using operations that do not throw exceptions. This can be achieved with the Pimpl idiom: move all the internal state of Test inside another class and leave inside the class Test only a pointer to this new class. It is better to use a boost::shared_ptr in this case, to avoid having to catch exceptions:


class TestImpl {

    friend class Test;

private:
    TestImpl()
    :theID(0), theName(), theHammer(), theHeap(new HeapClass)
    {}

    TestImpl(const TestImpl& aRhs)
    :theID(aRhs.theID), theName(aRhs.theName),
     theHammer(aRhs.theHammer), theHeap(new HeapClass(*aRhs.theHeap))
    {}

    int          theID;
    std::string  theName;
    HugeClass    theHammer;  //Extra member with huge footprint
    HeapClass*   theHeap;    //Extra member on heap
};

class Test {
public:
    Test()
    :theImplementation(new TestImpl)
    {}

    Test& operator=(const Test& aRhs) {
        if (this != &aRhs) {
            boost::shared_ptr<TestImpl> tmp(new TestImpl(*aRhs.theImplementation));

            std::swap(theImplementation, tmp);
        }

        return *this;
    }

private:
    boost::shared_ptr<TestImpl> theImplementation;
};
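
The "copy first, then swap" idea does not strictly need a separate implementation class. What follows is only a minimal sketch, assuming a C++11 compiler and a stand-in HeapClass (the real one is not shown in this post, and the accessors are added just for the example): the right-hand side is taken by value, so the copy is made before the object is touched, and the member swap does not throw.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Stand-in for the HeapClass of the post (assumption: it is copyable).
struct HeapClass {
    int thePayload = 0;
};

class Test {
public:
    Test() : theID(0), theName(), theHeap(new HeapClass) {}

    // Deep copy: if anything throws here, no existing object was modified.
    Test(const Test& aRhs)
    : theID(aRhs.theID), theName(aRhs.theName),
      theHeap(new HeapClass(*aRhs.theHeap)) {}

    // aRhs is already a copy of the source; swapping the members does not
    // throw, so the assignment either fully succeeds or changes nothing.
    Test& operator=(Test aRhs) {
        swap(aRhs);
        return *this;
    }

    void swap(Test& aOther) noexcept {
        std::swap(theID, aOther.theID);
        theName.swap(aOther.theName);
        theHeap.swap(aOther.theHeap);
    }

    // Accessors added only to make the sketch observable.
    void set(int aID, const std::string& aName, int aPayload) {
        theID = aID;
        theName = aName;
        theHeap->thePayload = aPayload;
    }
    int id() const { return theID; }
    const std::string& name() const { return theName; }
    int payload() const { return theHeap->thePayload; }

private:
    int                        theID;
    std::string                theName;
    std::unique_ptr<HeapClass> theHeap;
};
```

Note that with this form the self assignment check is not needed: assigning an object to itself just swaps it with a copy of itself.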

As you can see, writing correct code is not easy, and it becomes more difficult if you want to write exception safe code. What I suggest is to disable the assignment operator and the copy constructor, and to write them only if really needed:


class Test {
public:
    Test()
    :theID(0), theName(), theHammer(), theHeap(new HeapClass)
    {}

private:
    Test& operator=(const Test& aRhs);  //disabled (do not even implement it)
    Test(const Test& aRhs);             //disabled (do not even implement it)


    int          theID;
    std::string  theName;
    HugeClass    theHammer;  //Extra member with huge footprint
    HeapClass*   theHeap;    //Extra member on heap
};
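
If a C++11 compiler is available, the same intent can be stated directly with "= delete", which turns any attempted copy into a compile error instead of a link error. This is just a sketch: HeapClass is a stand-in, HugeClass is left out for brevity, and the destructor and the id() accessor are my additions.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Stand-in for the HeapClass of the post.
struct HeapClass {};

class Test {
public:
    Test() : theID(0), theName(), theHeap(new HeapClass) {}
    ~Test() { delete theHeap; }

    // Copying is disabled explicitly; the compiler rejects any attempt.
    Test(const Test&) = delete;
    Test& operator=(const Test&) = delete;

    int id() const { return theID; }

private:
    int          theID;
    std::string  theName;
    HeapClass*   theHeap;    // Member on heap
};

// Compile time checks that the class really cannot be copied.
static_assert(!std::is_copy_constructible<Test>::value, "Test must not be copyable");
static_assert(!std::is_copy_assignable<Test>::value, "Test must not be assignable");
```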

Threading mess (2)!

2008-06-19T07:31:00.000-07:00

I got some comments about my last post (threading-mess) pointing out that the solution shown only detects the scenario of two or more threads entering a critical section at the same time (in the last post's example, the method Shared::foo). What it doesn't answer is the real question: "is this class, during its life, being used by more than a single thread, or more specifically, is a certain section of code used by more than one thread?" Indeed, if you remember, after a SCOPED_WATCH leaves its scope it "releases" the stored thread ID, allowing another thread to enter.

I have added a nested Watch class to the ThreadCollisionWarning class that also detects whether a critical section is ever used by two different threads (for example, you can check that a given class is constructed and destroyed within the same thread).

The code is the following:

#ifndef THREAD_COLLISION_WARNING
#define THREAD_COLLISION_WARNING

#include <stdexcept>
#include <pthread.h>

#ifdef NDEBUG

#define THREAD_WATCH(obj)
#define SCOPED_WATCH(obj)
#define WATCH(obj)

#else

#define THREAD_WATCH(obj) ThreadCollisionWarning _##obj;
#define SCOPED_WATCH(obj) ThreadCollisionWarning::ScopedWatch scoped_watch_##obj(_##obj);
#define WATCH(obj)        ThreadCollisionWarning::Watch watch_##obj(_##obj);

#endif

class ThreadCollisionWarning {
public:
    ThreadCollisionWarning()
    :theActiveThread(0)
    { }

    ~ThreadCollisionWarning() { }

    class Watch {
        public:
            Watch(ThreadCollisionWarning& aTCW)
            :theWarner(aTCW)
            { theWarner.enter_self(); }

            ~Watch() { }

        private:
            ThreadCollisionWarning& theWarner;
    };

    class ScopedWatch {
        public:
            ScopedWatch(ThreadCollisionWarning& aTCW)
            :theWarner(aTCW)
            { theWarner.enter(); }

            ~ScopedWatch() { theWarner.leave(); }

        private:
            ThreadCollisionWarning& theWarner;
    };

private:

    void enter_self() { 
        //If the active thread is 0 then I'll write the current thread ID;
        //if two or more threads arrive here only one will succeed in writing
        //the current thread ID to theActiveThread
        if (! __sync_bool_compare_and_swap(&theActiveThread, 0, pthread_self())) { 

            //Last chance! Maybe it is the thread itself calling, from a critical section,
            //another critical section
            if (!__sync_bool_compare_and_swap(&theActiveThread, pthread_self(), theActiveThread)) {
                throw std::runtime_error("Thread Collision");
            }
        }
    }

    void enter() { 
        if (!__sync_bool_compare_and_swap(&theActiveThread, 0, pthread_self())) {
            //gotcha! another thread is trying to use the same class
            throw std::runtime_error("Thread Collision");
        }
    }

    void leave() { 
        __sync_fetch_and_xor(&theActiveThread, theActiveThread);
    }

    pthread_t theActiveThread;
};

#endif

The nested Watch class (used by the WATCH macro) initializes, in its constructor, the theActiveThread member with the current thread ID if it has not been initialized yet; otherwise it takes a second chance and checks whether the active thread is the calling thread itself.
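
The "first thread wins, and the same thread may come back" rule can be tried in isolation. This is a simplified sketch and not the class above: it uses std::atomic (C++11) instead of the GCC builtins, and the thread IDs are plain integers passed in by the caller (an assumption made so the example does not depend on pthread); OwnerWatch and enterSelf are names made up for the example.

```cpp
#include <atomic>
#include <cassert>

// Simplified stand-in for ThreadCollisionWarning; 0 means "no owner yet".
class OwnerWatch {
public:
    OwnerWatch() : theOwner(0) {}

    // Mirrors enter_self(): the first caller becomes the owner and is never
    // released; later calls succeed only when made by that same owner.
    bool enterSelf(unsigned long aSelf) {
        unsigned long myExpected = 0;
        if (theOwner.compare_exchange_strong(myExpected, aSelf)) {
            return true;              // first thread to arrive
        }
        // On failure myExpected holds the current owner.
        return myExpected == aSelf;
    }

private:
    std::atomic<unsigned long> theOwner;
};
```

Here a collision is reported with a false return value, where the real class throws std::runtime_error.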

So let's see some examples of use:

Case #1: Check that only one thread ever uses some critical section (recursion allowed)

struct Shared {
   void foo() { 
       WATCH(CriticaSectionA);
       bar();
   }

   void bar() {
       WATCH(CriticaSectionA);
   }

   THREAD_WATCH(CriticaSectionA);
};

Case #2: Check that a class is constructed and destroyed inside the same thread

struct Shared {

    Shared() {
        WATCH(CTOR_DTOR_SECTION);
        ...
    }

   ~Shared() { 
       WATCH(CTOR_DTOR_SECTION);
       ...
   }

   THREAD_WATCH(CTOR_DTOR_SECTION);
};

Note that in this way the Shared destructor can throw an exception, so do not use this in production code (wrap the WATCH in a try-catch and just report the collision in some way).

Case #3: Two or more different threads can enter a critical section, but in an exclusive way (useful to check whether the external sync mechanisms are working).

struct Shared {

    void foo() {
        SCOPED_WATCH(CriticalSectionA);
    }


   THREAD_WATCH(CriticalSectionA);
};

Threading mess!

2008-06-18T08:19:00.000-07:00

Software development requires discipline, you know what I mean: brainstorming, coding rules, code inspections, pair programming. Unfortunately management sees all these activities as a waste of time, so in the end you just act as a "code monkey"; to rub salt into the wound, "multithread programming" requires ten times the discipline you need in a single thread environment. I've recently stepped into a medium-sized project, and to the easy question "are those class instances shared between two or more threads?" the response was: "no... wait... yes, well I'm not sure... I don't know...". Riiiight.
Let's see a quick technique that should make it possible to detect (at runtime, sigh!) whether two or more threads are using a class concurrently.

Suppose we have the following class:

struct Shared {
   void foo() { ...  }
};

and we are unsure whether two threads are calling Shared::foo() at the same time. One way is to add a mutex to the class Shared, attempt a "try lock" as the first thing inside foo, and raise an error in case the try lock fails.

Something like:

class Shared {
   void foo() {
      TryScopedLock aLock(theMutex);

      if (!aLock) { throw std::runtime_error("BOOM"); }

      ...
   }

private:
   volatile mutex theMutex;

};

This approach works, but it will slow down your software, hide other problems around and, most of all, introduce useless synchronization; a mutex lock is not exactly a cheap operation.

The idea is to use the technique above but without a lock. GCC gives us some builtin functions for atomic memory access, and we can use for example:

bool __sync_bool_compare_and_swap (type *ptr, type oldval, type newval, ...)

for our very purpose. That function assigns newval to *ptr only if the current value of *ptr is oldval; it returns true if the comparison is successful and newval was written. We can use it to store the thread ID when we enter the critical section, "zeroing" the value when we exit.
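
To get a feeling for the compare-and-swap semantics, here is a tiny sketch of the "store on enter, zero on exit" idea. It uses std::atomic from C++11 rather than the GCC builtin, and plain integers instead of real thread IDs (both are assumptions of the example; tryEnter and leave are hypothetical helper names).

```cpp
#include <atomic>
#include <cassert>

// Writes aThreadId into aSlot only if the slot is currently free (0).
// Returns true when the write happened, like __sync_bool_compare_and_swap.
inline bool tryEnter(std::atomic<unsigned long>& aSlot, unsigned long aThreadId) {
    unsigned long myExpected = 0;
    return aSlot.compare_exchange_strong(myExpected, aThreadId);
}

// "Zeroes" the slot so that another thread can enter.
inline void leave(std::atomic<unsigned long>& aSlot) {
    aSlot.store(0);
}
```

A second tryEnter on a busy slot fails and leaves the stored value untouched, which is exactly the collision we want to detect.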

Basically I wrote a class that stores (with an atomic operation) the ID of the thread entering the critical section and forgets about it when the thread leaves. This was the result:

#ifndef THREAD_COLLISION_WARNING
#define THREAD_COLLISION_WARNING

#include <stdexcept>
#include <pthread.h>

#ifdef NDEBUG

#define THREAD_WATCH(obj)
#define SCOPED_WATCH(obj)

#else

#define THREAD_WATCH(obj) ThreadCollisionWarning _##obj;
#define SCOPED_WATCH(obj) ThreadCollisionWarning::ScopedWatch scoped_watch_##obj(_##obj);

#endif

class ThreadCollisionWarning {
public:
    ThreadCollisionWarning()
    :theActiveThread(0)
    { }

    ~ThreadCollisionWarning() { }

    class ScopedWatch {
        public:
            ScopedWatch(ThreadCollisionWarning& aTCW)
            :theWarner(aTCW)
            { theWarner.enter(); }

            ~ScopedWatch() { theWarner.leave(); }

        private:
            ThreadCollisionWarning& theWarner;
    };

private:

    void enter() { 
        if (!__sync_bool_compare_and_swap(&theActiveThread, 0, pthread_self())) {
            //gotcha! another thread is trying to use the same class
            throw std::runtime_error("Thread Collision");
        }
    }
    void leave() { 
        __sync_fetch_and_xor(&theActiveThread, theActiveThread);
    }

    pthread_t theActiveThread;
};

#endif

The class ThreadCollisionWarning has the responsibility of storing the thread using the class (or, more generally, entering a critical section), while the nested class ScopedWatch is used to signal entering and leaving the critical section. Look at the implementation of ThreadCollisionWarning::enter and ThreadCollisionWarning::leave: the former stores the thread ID only if the old value was 0, the latter just zeroes it. The macros simplify the usage.

So there we go, the class Shared becomes then:

struct Shared {
   void foo(char aC) { 
       SCOPED_WATCH(Shared)
       ...
   }

   THREAD_WATCH(Shared)
};

Using SCOPED_WATCH we just check that two threads are not using the method foo concurrently.

Of course the implementation above is by no means a complete solution to the problem I described at the beginning, but it helps and it can be a good starting point for a better tool to detect whether someone messed around.

False Sharing hits again!

2008-05-31T00:56:00.001-07:00

You may ask: what is "false sharing"? False sharing is an annoying effect that occurs when two processors apparently do not share any resource but, due to the underlying hardware architecture, they actually do. For example, consider two threads writing to non-overlapping memory locations: they do not need any synchronization, so you are happily led to think that you can split your data in order to implement a lockless algorithm. Unfortunately you own a sparkling "centrino duo" processor in which both cores share the L2 cache, and the data you partitioned in memory can be mapped onto the same cache line. The same scenario is triggered with L1 caches: due to coherency protocols, if a thread writes to a cache line then the cache line referring to the same memory is invalidated on the other processor (cache thrashing).

Consider for example the following case:

char a[10];
char b[10];

start_thread_that_works_on_a;
start_thread_that_works_on_b;

Very likely those 20 bytes will lie in contiguous memory locations and will then be mapped inside the same cache line, so each time a thread works on its own vector it invalidates the cache line for the other thread, and the hardware has to write the cache line to memory and fetch it again even if it is not strictly needed.

I had to work on an algorithm that had that very issue, and the following code, even if useless, exhibits the same problem:

#include <iostream>
#include <boost/thread.hpp>

class threadS {
public:
   threadS(unsigned char *aVector, unsigned int aVSize) 
   :theVector(aVector),
    theVSize(aVSize)
   { }

   void operator()() {
      unsigned long long myCounter = 100000000;
      while(--myCounter) {
          for (int i=0; i<10; ++i) {
              ++theVector[i];
          }
      }
   }
private:
   unsigned char* theVector; 
   unsigned int   theVSize; 
};

int main() 
{
   unsigned char vectorA[10]; 
   unsigned char vectorB[10];

   std::cout << std::hex;
   std::cout << "A:[" << static_cast<void*>(&vectorA[0]) << "-" << static_cast<void*>(&vectorA[9]) << "]" << std::endl;
   std::cout << "B:[" << static_cast<void*>(&vectorB[0]) << "-" << static_cast<void*>(&vectorB[9]) << "]" << std::endl;

   threadS threadA(vectorA, 10);
   threadS threadB(vectorB, 10);

   boost::thread_group tg;
   tg.create_thread(threadA);
   tg.create_thread(threadB);

   tg.join_all();
}

You should be able to compile and link it with:

g++ main.cpp -o false_sharing -lboost_thread -O3

Let's see what that code does.
The class threadS stores the vector on which it will operate. The thread body (the operator()) just increments all the vector elements. As you can see, I used the boost thread library to start two threads: threadA and threadB.

On my system I obtain an execution time that goes from 6 to 8 seconds and the following boundaries for the two vectors:

B:[bfa7d010-bfa7d019] - A:[bfa7d01a-bfa7d023]

as you can see vectorB and vectorA are at contiguous memory locations.

How can we eliminate the false sharing in this case? The goal is to have the two threads working on different underlying cache lines, and we can achieve it by separating the two data with some extra bytes. Declaring the two vectors bigger than needed is a quick and dirty way to do it; executing the same program with the following vector declaration:

   unsigned char vectorA[100]; 
   unsigned char vectorB[100];

I'm able to obtain an execution time that goes from 1 to 1.5 seconds.
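
A less "dirty" way to get the same separation is to ask the compiler for it. The sketch below assumes a C++11 compiler and a 64 byte cache line (common on x86 but not universal, so treat the constant as an assumption to verify on your hardware); PaddedVector and sameCacheLine are names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64 byte cache lines.
constexpr std::size_t kAssumedCacheLine = 64;

// Each instance starts on its own cache line and, because the size of a
// type is a multiple of its alignment, it also fills the whole line.
struct alignas(kAssumedCacheLine) PaddedVector {
    unsigned char theData[10];
};

// True when both addresses fall inside the same (assumed) cache line.
inline bool sameCacheLine(const void* aLhs, const void* aRhs) {
    const std::uintptr_t myLhs = reinterpret_cast<std::uintptr_t>(aLhs) / kAssumedCacheLine;
    const std::uintptr_t myRhs = reinterpret_cast<std::uintptr_t>(aRhs) / kAssumedCacheLine;
    return myLhs == myRhs;
}
```

Declaring vectorA and vectorB as two PaddedVector objects (and passing their theData members to the threads) keeps the two threads on different lines without guessing how many extra bytes are needed.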

The same problem would happen with a single vector "unsigned char vector[1000]" with one thread working on elements [0,10] and another thread working on elements [11,20]. I wrote a simple program that creates two threads: one performs writes at locations [0,10] and the other performs writes at [z,z+10], with z inside the interval [11,100]. The following graph shows the execution time as the bytes of separation between the two data increase; z=11 means that the data have a 0 byte separation.

As you can see, as soon as the two "false shared" data have 51 bytes of separation the execution time collapses from ~8 secs to ~1.8 secs, so by just wasting 51 bytes I'm able to obtain a speed up of more than x4, nice isn't it?

A local variable is "local" after all

2007-11-26T07:30:00.000-08:00

I have found a code that "sounds" like this:


struct StructureC {
   char * theString;
};

class ClassCPP : public StructureC {
public:
   ClassCPP();
   ClassCPP(const ClassCPP& aClass);

   ~ClassCPP();
};

ClassCPP::ClassCPP() {
   std::string myString = foo();

   theString = myString.c_str();
}

ClassCPP::~ClassCPP() {
}

ClassCPP::ClassCPP(const ClassCPP& aClass) {
   theString = aClass.theString;
}

someone failed here, badly!

Let's see.

Of course the code above is not useful, it was simplified and extracted from a real case; StructureC can not be changed at all.

std::string has the method c_str() that returns a null-terminated sequence of characters (same content as the std::string) pointing to an internal location of the std::string. When the scope of myString ends (note how I use the prefix "my" for local variables), any reference to its internal state is no longer valid; unfortunately theString is a member of the object being constructed and outlives it. The code can "work", but you are very lucky if it does.

The correct way is to allocate memory in the constructor and then copy the character sequence:


ClassCPP::ClassCPP() {
  std::string myString = foo();

  theString = new char[myString.size()+1]; //+1 to store the null termination
  strcpy(theString, myString.c_str());
}

so, you think that's all, don't you?

Still some errors left.

The copy constructor has to be rewritten; if we leave it unchanged like this


ClassCPP::ClassCPP(const ClassCPP& aClass) {
  theString = aClass.theString;
}

then as soon as we copy an object of type ClassCPP we will have two instances pointing to the same memory area (the one that contains the string).

So it'd be better to write it in the correct way:


ClassCPP::ClassCPP(const ClassCPP& aClass) {
  theString = new char[strlen(aClass.theString)+1];
  strcpy(theString, aClass.theString);  
}

I could have used strdup, but strdup uses malloc to allocate memory, so the string would have to be released with free instead of delete [].

Having allocated memory dynamically, the destructor cannot be empty; we need to release the allocated memory:


ClassCPP::~ClassCPP() {
  delete []theString;
}

Still some problems left; incredible how many errors can be made in a few lines of code!

The assignment operator has to be written as well, but in this case we can declare it private and not implement it:


class ClassCPP : public StructureC {
  ...
private:
  const ClassCPP & operator=(const ClassCPP&);
};

in this way we can check if someone needs it (the original implementation didn't
have it implemented so I guess the intention was: "I don't need it") and if necessary
implement it.

The problems are not over yet, look at the following usage of that code:



StructureC *a = new ClassCPP;
delete a;

As you can see, deleting an instance of ClassCPP through a pointer to its base class will not call the ClassCPP destructor (formally the behaviour is undefined, since StructureC has no virtual destructor). We would need to declare the destructor of StructureC virtual, but given that we cannot change StructureC we need to prevent anyone from building a ClassCPP on the heap. This can be done by declaring and not implementing the operator new in the private part of the class.


class ClassCPP : public StructureC {
  ...
private:
  ...
  void * operator new(std::size_t);
  void * operator new(std::size_t, void *);
};

As you can see, in order to avoid any mistake it is better to disable the placement operator new as well.
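
Putting all the fixes together, the class could look like the sketch below. It assumes a C++11 compiler, so the disabled members are written with "= delete" instead of the private and not implemented trick, and foo() is a stand-in of mine because the real one is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

struct StructureC {
    char* theString;
};

// Stand-in for the foo() of the post (its real body is unknown).
inline std::string foo() { return "hello"; }

class ClassCPP : public StructureC {
public:
    ClassCPP() : StructureC() {
        const std::string myString = foo();
        theString = new char[myString.size() + 1];  // +1 for the null termination
        std::strcpy(theString, myString.c_str());
    }

    // Deep copy: every instance owns its own character buffer.
    ClassCPP(const ClassCPP& aClass) : StructureC() {
        theString = new char[std::strlen(aClass.theString) + 1];
        std::strcpy(theString, aClass.theString);
    }

    ~ClassCPP() { delete[] theString; }

    // Not needed so far: disabled until someone really asks for it.
    ClassCPP& operator=(const ClassCPP&) = delete;

    // No heap instances, so nobody can delete one through a StructureC*.
    static void* operator new(std::size_t) = delete;
    static void* operator new(std::size_t, void*) = delete;
};
```

With this version "new ClassCPP" does not compile, while automatic (stack) instances and copies of them work as expected.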

Exceptions (part 2)

2006-12-15T09:19:00.000-08:00

So we have seen in my last post that throwing exceptions and function calls have some similarities. We have also seen that throwing an object always means copying it, even if we catch by reference; as all copies in C++ are based on the static type, even in the exception environment the objects thrown are based on the static type.
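
The static type rule is easy to verify with a small sketch (Base, Derived and the function name are made up for the example): a Derived object thrown through a reference to Base arrives at the handlers as a plain Base.

```cpp
#include <cassert>
#include <string>

struct Base {
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

inline std::string throwThroughBaseReference() {
    Derived myDerived;
    Base& myRef = myDerived;
    try {
        throw myRef;  // the static type is Base: only the Base part is copied
    } catch (const Derived&) {
        return "caught Derived";
    } catch (const Base& anException) {
        return "caught " + anException.name();
    }
}
```

The function returns "caught Base": the Derived handler is skipped and even the virtual call resolves to Base, because the exception object really is a Base.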

Throwing an exception does not always guarantee that you avoid memory leaks, even when using "automatic objects"; consider this example:



class Foo {
  public:
    Foo();
    ~Foo();
  private:
    AType* aPointer;
    BType* bPointer;
};

Foo::Foo()
:aPointer(new AType),
 bPointer(new BType)
{ }

We know that the initialization order depends on the declaration order of the members in the class definition, so in this case aPointer is initialized first, then bPointer. What happens if "new BType" throws an exception? Well, given that aPointer is a plain pointer, the Foo destructor is never called and we will have a memory leak. So a first thought is to replace plain pointers with something like a "smart pointer".
A first approach can be the following:



class Foo {
  public:
    Foo();
    ~Foo();
  private:
    std::auto_ptr<AType> aPointer;
    std::auto_ptr<BType> bPointer;
};

Foo::Foo()
:aPointer(new AType),
 bPointer(new BType)
{ }

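Is this version safe? Let's see the constructor execution sequence:

1) Constructor AType is executed (new AType)
2) Constructor std::auto_ptr<AType> is executed ( aPointer( ... ) )
3) Constructor BType is executed (new BType)
4) Constructor std::auto_ptr<BType> is executed ( bPointer( ... ) )

Each member initializer is completed before the next one starts, so if "new BType" throws an exception aPointer has already been constructed and its destructor releases the memory allocated by new AType. The dangerous sequence, where both new expressions run before any std::auto_ptr constructor and the address returned by new AType is not yet saved anywhere, can happen when the two allocations are part of the same expression, for example as two arguments of a single function call compiled with rules older than C++17. If you prefer to make the order of the operations explicit, you can also write the constructor in the following way: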



class Foo {
  public:
    Foo();
    ~Foo();
  private:
    std::auto_ptr<AType> aPointer;
    std::auto_ptr<BType> bPointer;
};

Foo::Foo()
:aPointer(),
 bPointer()
{
  aPointer = std::auto_ptr<AType>(new AType);
  bPointer = std::auto_ptr<BType>(new BType);
}
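
std::auto_ptr has since been deprecated and then removed from the language, so here is the same constructor sketched with std::unique_ptr (C++11). AType and BType are hypothetical types written for the example: AType counts its live instances and BType always fails, so we can check that nothing is leaked.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical AType: counts how many instances are alive.
struct AType {
    static int theLiveCount;
    AType()  { ++theLiveCount; }
    ~AType() { --theLiveCount; }
};
int AType::theLiveCount = 0;

// Hypothetical BType: its constructor always throws.
struct BType {
    BType() { throw std::runtime_error("BType failed"); }
};

class Foo {
public:
    Foo() : aPointer(), bPointer() {
        aPointer.reset(new AType);
        bPointer.reset(new BType);  // throws: aPointer is destroyed and frees the AType
    }

private:
    std::unique_ptr<AType> aPointer;
    std::unique_ptr<BType> bPointer;
};

// Returns true if an AType survived the failed construction of a Foo.
inline bool constructionLeaks() {
    try {
        Foo myFoo;
    } catch (const std::runtime_error&) {
        // construction failed as expected
    }
    return AType::theLiveCount != 0;
}
```

When the constructor body throws, the members already constructed are destroyed, so constructionLeaks() returns false.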

Throwing an exception can also leave the object in an inconsistent state; consider the following class (ignore the fact that the class is useless):



class Foo {
  public:
    Foo()
    :theStorage()
    { }

    void addInt(int anInteger) {
      theStorage.push_back(anInteger);
    }

    void sumOne() {
      for (std::size_t i=0; i < theStorage.size(); ++i) {
        theStorage[i] += 1;
        if (i==2) {
          throw std::runtime_error("OPS!");
        }
      }
    }

  private:
    std::vector<int> theStorage;
};

and its usage:



Foo aFoo;

aFoo.addInt(0);
aFoo.addInt(1);
aFoo.addInt(2);
aFoo.addInt(3);

at this point calling:



aFoo.sumOne();

will throw an exception leaving aFoo with partially updated elements; from the user's point of view aFoo is in an inconsistent state, so the sumOne() function shows another problem that can break the exception safety of a class. The solution to this kind of problem is to work on a copy of the internal state of the class and then swap the internal state with the modified copy.
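
A minimal sketch of that "work on a copy, then swap" solution applied to sumOne() follows. The storage() accessor is my addition, there only to make the state observable.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

class Foo {
public:
    void addInt(int anInteger) { theStorage.push_back(anInteger); }

    void sumOne() {
        // All the changes are made on a local copy...
        std::vector<int> myCopy(theStorage);
        for (std::size_t i = 0; i < myCopy.size(); ++i) {
            myCopy[i] += 1;
            if (i == 2) {
                throw std::runtime_error("OPS!");  // theStorage is still untouched
            }
        }
        // ...and committed only at the end, with a swap that does not throw.
        theStorage.swap(myCopy);
    }

    const std::vector<int>& storage() const { return theStorage; }

private:
    std::vector<int> theStorage;
};
```

With the four integers of the usage above sumOne() still throws, but aFoo keeps its original values; the price to pay is the extra copy of the vector.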

Exceptions (part 1)

2006-11-28T03:19:00.000-08:00

As you already know, the modern way to deal with "errors" in C++ is exception handling; however you need to be careful when using this mechanism. Let's see how it works and some tips as well.
The exception mechanism is based on try - catch blocks:



try {
  //some code in here that we attempt to execute
}
catch (...) {
  // this block is the error handling, the code in this 
  // block is executed if the code in the
  // try block above have thrown an exception
}

Clean and simple.

Let's see how to throw an exception, so we can see in more depth what this mechanism offers, how to use it and what to avoid.
An exception is thrown with a throw:



throw A();

where A is the type of the object thrown; in this case we are throwing an object of type A initialized with its default constructor. We could have done:



throw A(3, "foo");

or even:



A a(3, "foo");
throw a;

The catch(...) { } handler is supposed to handle every kind of exception that the code inside the try block throws. In this way we lose the kind of exception thrown, which is a bit limiting because the error handler doesn't have a clue about what is going on; fortunately it is possible to specify which kinds of exceptions we want to manage (and ignore the others).
Specifying which kind of exception we want to manage is done in this way:



catch(Foo f) {
}

we can have multiple catch blocks after a try:



try {
  //code we are attempting to execute
}
catch(Foo f) {
  // error handler in case a Foo type was thrown
}
catch(Bar b) {
   // error handler in case a Bar type was thrown
}

If the code in the try block throws an exception that is neither Foo nor Bar, then it is as if we were not using the try-catch blocks at all (apart from the introduction of the try's extra scope).

We can obtain the same behaviour while still doing something useful (for example logging that we were not able to catch any expected exception) in this way:



try {
  //code we are attempting to execute
}
catch(Foo f) {
  // error handler in case a Foo type was thrown
}
catch(Bar b) {
  // error handler in case a Bar type was thrown
}
catch(...) {
  // log the event in here
  throw;  // this throws again the same exception.
}

There is still something behind all this. As you have seen, the catch blocks are very similar to a function declaration where the arguments are passed by value.
The catch blocks can indeed have all kinds of parameters:

catch( T )
catch( T & )
catch( T * )
catch( const T )
catch( const T & )
catch( const T * )

so you may think that throwing an exception has the same effect as calling a function, but it does not.
Consider this code:



void foo()
{
  ...
  A aLocalObject;
  ...

  throw aLocalObject;
}

and the call of foo is inside a try block:



try {
   foo();
}
catch( A anException) {

}

In this case, as the catch "signature" suggests, anException is a copy of aLocalObject, so there is no problem with it. But what if we catch by reference?



catch( A & anException ) {
}

In this case we might think we hold a reference to a destroyed object (throwing an exception is not like a function call: all the local variables in that scope are destroyed). This is not the case: even if you catch an exception by reference, the C++ runtime support performs a copy of the object thrown. This always happens, and you cannot avoid this copy even if the object would not be destroyed going out of scope (a static variable for example).

So in case of:

catch( A ) { }

you have 2 copies performed, in case of:

catch ( A & )

you have just one copy. For this very reason it is not possible to modify the original object thrown: what you have in the catch block is a copy of it.
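
The static variable case can be checked with a few lines. This is only a sketch (the struct A and the function name are made up for the example): the object thrown is a static one, it is caught by reference and modified inside the handler.

```cpp
#include <cassert>

struct A {
    int theValue;
};

inline int throwStaticAndModify() {
    static A theStatic = { 1 };
    try {
        throw theStatic;             // the exception object is a copy of theStatic
    } catch (A& anException) {
        anException.theValue = 42;   // changes the copy only
    }
    return theStatic.theValue;
}
```

The function returns 1: the handler has modified the exception object, not the static variable it was copied from.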

Next week the second and last part about exceptions.

operator new - new operator

2006-11-22T05:51:00.000-08:00

At first glance these two entities may appear to be the same; however they are not.
Let's see what they are, what you can change and the safe rules to handle them.

First of all let's see what happens when you write something like:

C * pC = new C;

1) Enough memory is allocated to contain the object requested
2) The constructor of C is called to initialize the object that lies in the allocated memory

What is described here is the behaviour of the new operator, and you cannot change the way it acts.

The first point in the sequence above is the only thing you can change: the way the memory is allocated. For this task the new operator uses what is called operator new.

So the pseudo code for C * pC = new C; could be:

1) Call operator new
2) Construct an object of the requested type at the location returned by the previous step
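
The two steps can also be written by hand in real code, which makes the difference between the two entities visible. It is only an illustration (the struct C and the function name are made up for the example): operator new allocates the raw memory and placement new runs the constructor in it.

```cpp
#include <cassert>
#include <new>

struct C {
    int theValue;
    C() : theValue(42) {}
};

inline int buildByHand() {
    void* myMemory = operator new(sizeof(C));  // step 1: raw memory only, no constructor
    C* pC = new (myMemory) C;                  // step 2: constructor runs at that location

    const int myResult = pC->theValue;

    pC->~C();                   // the reverse path: destroy the object first...
    operator delete(myMemory);  // ...then release the raw memory
    return myResult;
}
```

The real new operator also releases the memory if the constructor throws, a detail this sketch leaves out.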

The operator new signature is something like this:

void * operator new(size_t);

So if you want to change the way the new operator allocates the memory for your type, you need to write your own operator new.
As you already know, when you declare a name in a scope (for example a method name in a class) it hides the same name in the enclosing scopes (the base class scope for example). So by writing your own operator new you hide the other forms of operator new. For instance these forms are:

void * operator new(std::size_t, const std::nothrow_t &) throw();
void * operator new(std::size_t, void *);

The former is the nothrow new, the latter is the placement new. Actually you can break more than this if you define your own operator new with extra arguments, something like:

void * operator new(std::size_t, T);

Remember that the first argument of operator new must always be std::size_t. In this case you will hide not only the "less common" operator new versions but also the plain one. I quoted "less common" because in reality the STL makes heavy use of placement new.
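
One way to live with the hiding is to declare, next to your custom form, all the forms you still want to use, forwarding them to the global ones. The following is a sketch under my own assumptions (C++11, a hypothetical Tag type playing the role of T, a Widget class invented for the example), not a complete recipe:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Tag {};  // hypothetical extra argument, plays the role of T

class Widget {
public:
    // Custom form: declared alone it would hide every other operator new.
    static void* operator new(std::size_t aSize, Tag) {
        ++theTaggedCalls;
        return ::operator new(aSize);
    }
    // Matching form, called only if the constructor throws.
    static void operator delete(void* aPtr, Tag) noexcept { ::operator delete(aPtr); }

    // The standard forms are declared again so they stay usable.
    static void* operator new(std::size_t aSize) { return ::operator new(aSize); }
    static void* operator new(std::size_t aSize, const std::nothrow_t& aNoThrow) noexcept {
        return ::operator new(aSize, aNoThrow);
    }
    static void* operator new(std::size_t, void* aWhere) noexcept { return aWhere; }
    static void operator delete(void* aPtr) noexcept { ::operator delete(aPtr); }

    static int theTaggedCalls;  // counts how many times the custom form was used
    int theValue = 0;
};

int Widget::theTaggedCalls = 0;
```

With these declarations "new Widget", "new (std::nothrow) Widget", the placement form and "new (Tag{}) Widget" all compile; remove the three forwarding versions and only the last one does.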