PIMPL idiom in C++
Developers working on a “software library” in C++ (or any other native language) should follow a set of critically important rules, or their customers will soon be in trouble using their product. Some of these rules are Semantic Versioning, Good API Design and Keeping Backward Compatibility. The latter has many aspects and deserves a detailed post of its own. In this post, I will discuss one of the widely used techniques in C++ that helps developers keep Backward Compatibility at the binary (ABI) level.
What is Backward Compatibility?
Backward Compatibility is a promise made by the creator of an interface. Briefly speaking, it means that a newer version of a product (which exposes an interface) still works for all users of the older versions. The term is widely used to describe properties of various systems, especially in telecommunications and computing. Consider, for example, the Universal Serial Bus: a host that implements USB 3.0 works properly with older gadgets that implement USB 2.0 or 1.1.
Speaking of native libraries written in C++, backward compatibility has several aspects. The most commonly considered are source compatibility (at the API level) and binary compatibility (ABI).
On the source level, backward compatibility (BC) usually means following proper versioning rules. For example, software that is built against version 1.2.3 of a library should also build against version 1.2.4 or 1.3.0 of it. According to the semantic versioning rules, backward compatibility holds as long as the MAJOR version number (the first number) does not change. API compatibility guarantees that the client source does not need any modification to use the compatible new version. However, this does not mean that the binary artefacts are interchangeable: one may or may not be able to use the new library as a drop-in replacement for the older version. That brings us to ABI compatibility. If ABI compatibility is broken, the client needs to re-compile and re-link against the new version, even though they have not changed a single line of their code.
A binary compatibility promise guarantees that a newer version of a library is a drop-in replacement for the older one, and the client code does not need to be re-compiled. This kind of BC is of great importance, because in many scenarios the lack of ABI BC causes serious trouble. Consider security updates, for example. Let’s assume a vulnerability is found in your library. Naturally, you fix the bug as soon as possible, while keeping the API intact, and then deploy your changes. If you fail to keep the ABI compatible with the previous version, you’ll be in big trouble: the entire code base that links against your library has to be re-compiled. Most probably, you don’t own all of the client code, so you will have to ask its owners to re-compile, just to pick up your tiny fix! But if you keep the ABI intact, the update is merely a matter of replacing a binary at the customer’s site (for example, replacing a DLL file on Windows).
Methods of Keeping ABI Backwards Compatible
There are several ways to make it easier to keep BC promises at the ABI level. The most important one is to produce Position Independent Code (PIC). Instead of hard-coding addresses, such code goes through lookup tables: the Global Offset Table (GOT) and, for function calls, the Procedure Linkage Table (PLT). The dynamic linker fills these in by symbol name at load or run time. This way, if one re-orders the member functions of a class or adds new ones, backward compatibility is kept even though the actual addresses have changed. (This holds for non-virtual functions. Re-ordering virtual functions, or adding new ones, changes the vtable layout and generally breaks the ABI.) On Linux-like systems, passing the -fPIC flag to the compiler produces position-independent code. That is especially useful when building shared libraries; in fact, almost all shared libraries in common repositories are built this way.
Using -fPIC is not the whole story, though. There are situations in which you have to modify the memory layout of an existing class. One example is adding a new member variable. This increases the size of the class and, if you add it at the beginning of the member list, it also shifts the offsets of all the other members.
PIMPL Idiom
Pointer to Implementation (pimpl) is a programming technique that helps developers preserve the ABI across versions, and it keeps ABI BC intact in a wide variety of scenarios. With this method, you may add as many new data members as you need and the ABI will remain intact.
To use the pimpl idiom, move all member variables of the class that could change in future versions into a non-API class or struct. Then point to an instance of it on the heap, using either a raw or a smart pointer.
A Simple Example
Let’s have a look at a simple example to see how pimpl can help preserve BC. This example shows the problem with breaking BC and how to fix it using the pimpl idiom. To do so, I will introduce a very simple class named person, which keeps the name and last name of an individual and does nothing else. For the sake of simplicity, I have removed many details, like the symbol-exporter #define, modifiers, etc. So our little class looks like this:
#include <string>

// LIBFOO_API: the library's symbol-export macro (its #define is omitted)
class LIBFOO_API person {
public:
person(const std::string& name, const std::string& last);
~person() = default;
std::string name() const;
std::string last() const;
private:
std::string m_name;
std::string m_last;
};
Like any common shared library on Linux, I am going to compile it with gcc, adding -fPIC:
g++ -DLIBFOO_EXPORT -shared -fPIC -fvisibility=hidden -o libfoo.so ./libfoo.cpp
To demonstrate how BC works, I am also going to need a simple program that uses this library. Let me write it like this:
#include <iostream>
#include "libfoo.hpp"
int main(int argc, char* argv[]) {
person people[3] {{"Dexter", "Fortescue"},
{"Armando", "Dippet"},
{"Albus", "Dumbledore"}};
for(int i=0; i<3; ++i)
std::cout << "Hello " << people[i].name() << "!\n";
return 0;
}
This program can be considered the client’s code, which uses our library. To compile and link it, the customer may invoke:
g++ -o program ./main.cpp -L. -lfoo
So the output the user expects from the program is:
$ ./program
Hello Dexter!
Hello Armando!
Hello Albus!
$ echo $?
0
Everything looks fine. Let’s assume this is version 2.1.4 of libfoo, and in the next version we are required to keep the age of individuals. So we modify the person class and add member variables accordingly. This change is not a breaking change at the API level, so the new version number will be 2.2.0:
#include <cstdint>
#include <string>

class LIBFOO_API person {
public:
person(const std::string& name, const std::string& last);
person(const std::string& name,
const std::string& last,
const uint16_t age);
~person() = default;
uint16_t age() const;
std::string name() const;
std::string last() const;
private:
uint16_t m_age;
std::string m_name;
std::string m_last;
};
So I compile the library as before, but I do not recompile the client. The client code has no information about the change and no way to know about the age variable. The part of the API the client is aware of has not changed at all: from an API point of view, the existing constructor and member functions are the same as before. So we would expect the program to run properly with no change. Sadly, that’s not the case. If I now run the client’s software against the new libfoo.so, it crashes:
free(): invalid pointer
Aborted (core dumped)
$ echo $?
134
To see why this happens, we must look at the memory layout of the objects in use. First, let’s see the initial version of the library (2.1.4):
0 | class person
0 | class std::__cxx11::basic_string<char> m_name
32 | class std::__cxx11::basic_string<char> m_last
| [sizeof=64, dsize=64, align=8,
| nvsize=64, nvalign=8]
After adding age, we can observe how the memory layout has changed. The offsets of m_name and m_last now differ:
0 | class person
0 | uint16_t m_age
8 | class std::__cxx11::basic_string<char> m_name
40 | class std::__cxx11::basic_string<char> m_last
| [sizeof=72, dsize=72, align=8,
| nvsize=72, nvalign=8]
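But why does the crash happen? No member variable is used directly by the client code, and the library is position-independent, so one might expect the client to work with the new version. The reason is the ABI break caused by the changed size and layout of the class.

The client was compiled against the old header. It therefore reserves 64 bytes per person on its stack and steps through the people array in 64-byte strides. The new library’s constructor, however, builds 72-byte objects with m_name and m_last at offsets 8 and 40, so the stack on the client side is now corrupted. On top of that, ~person() is declared = default in the old header, so the destructor is compiled into the client. It destroys what it believes are strings at offsets 0 and 32, but those bytes now hold m_age, padding and parts of the real strings. The destructor most likely hands a bogus pointer to free(), hence the abort.

We could also construct other examples in which the client code segfaults directly instead. There are even more subtle situations in which the API does not change at all, yet the ABI breaks.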
Fix ABI breaks using PIMPL
To implement the PIMPL idiom, we need to change the person class like this:
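#include <string>

class LIBFOO_API person {
public:
person(const std::string& name, const std::string& last);
~person(); // defined in libfoo.cpp as: delete m_impl;
// Copy and move operations must also be user-defined (Rule of Three/Five),
// otherwise copies would share, and double-delete, the same details.
std::string name() const;
std::string last() const;
private:
struct details {
details(const std::string& name, const std::string& last);
std::string m_name;
std::string m_last;
};
details* m_impl;
};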
You can use a smart pointer such as std::unique_ptr instead of a raw pointer. You may also move the definition of details elsewhere, for example into a non-API header or to the top of the source file. That provides a stronger level of encapsulation and separation of implementation. Note that details is already out of the public API, since it has private access. If you apply both of the changes above, the final class looks like this:
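#include <memory>
#include <string>

class LIBFOO_API person {
public:
person(const std::string& name, const std::string& last);
~person(); // must be defined in libfoo.cpp, where details is complete
std::string name() const;
std::string last() const;
private:
struct details; // defined only in libfoo.cpp
std::unique_ptr<details> m_impl;
};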
Let’s go ahead and compile our new PIMPL-ready library and also the client’s code. Introducing PIMPL is itself an ABI break, so the client must be recompiled this one time. We now have a version of the library whose ABI is resilient to changes. The memory layout below contains only the pointer to the implementation; there is no trace of any data whatsoever.
0 | class person
0 | struct person::details * m_impl
| [sizeof=8, dsize=8, align=8,
| nvsize=8, nvalign=8]
Of course, we must also modify the implementation. All functions now go through an extra level of indirection to access the underlying data. For example, name() looks like this:
std::string person::name() const {
return m_impl->m_name;
}
The client code then compiles and links against libfoo just like before.
Now, let’s say we need to deploy a new version of libfoo that stores age. We add a variable to the details struct, then update our API accordingly. The new version of the person class looks like this (the new public members follow the // New members comment):
#include <cstdint>
#include <string>

class LIBFOO_API person {
public:
person(const std::string& name, const std::string& last);
~person(); // defined in libfoo.cpp as: delete m_impl;
std::string name() const;
std::string last() const;
// New members
person(const std::string& name,
const std::string& last,
const uint16_t age);
uint16_t age() const;
private:
struct details {
details(const std::string& name,
const std::string& last,
const uint16_t age);
uint16_t m_age;
std::string m_name;
std::string m_last;
};
details* m_impl;
};
Note that there is no need to keep the API of details compatible: we can replace its two-argument constructor instead of keeping it alongside the new one. That is because details is not part of libfoo’s public API, so we can change it however we like. If we now look at the memory layout of the new version, it is exactly the same as the previous one. So the client’s program can use this new library without a re-compile; the old libfoo binary simply has to be replaced with the new one. We already did exactly that by re-compiling the library, which replaced libfoo.so.
Pros and Cons
Besides easing ABI backward compatibility, the PIMPL idiom is beneficial in other ways. Among the pros of PIMPL-enabled classes are:
- Encapsulation: PIMPL provides better encapsulation of data, since it hides all implementation details from the public API. With PIMPL one can even hide some dependencies from the client, for example headers that only the implementation needs.
- Compile-time improvement: Since the main class hides its implementation details, its header does not need to #include the headers those details depend on. Client code is therefore spared unnecessary declarations. Thanks to forward declarations, the size of a PIMPL-enabled header after preprocessing can be significantly reduced, resulting in shorter compile times.
There is no such thing as a cost-free abstraction, right? PIMPL adds another layer of indirection, so let’s see what it costs:
- Development effort: In most cases PIMPL requires a destructor defined in the source file, plus copy and move constructors and their corresponding assignment operators. In terms of development effort, PIMPL adds a cost.
- Performance: As mentioned before, PIMPL adds a level of indirection. Since the pointer to details lives exactly as long as the instance itself, the extra dereference is usually cheap, but each object now also needs a heap allocation. PIMPL also moves data away from its creation point (onto the heap), which is likely to make the code less cache-friendly.
Final Thoughts
Note that adding PIMPL does not solve your library’s problems automagically! You cannot apply PIMPL to a non-PIMPL class without breaking the ABI. Also, there are situations in which you have no option other than breaking the ABI to provide a feature or fix a bug.
If your library has a huge user base, PIMPL can be a saviour. Otherwise, it may not even be worth the effort, for example to keep the ABI intact for an in-house tool or for a very specific library used by your teammates. Some developers add an unused raw pointer member (a void*, or a pointer to a forward-declared type that is not defined yet) just in case. A pimpl can then be introduced later without changing the class layout. That’s considered good practice.
A final point: there is no need to hide all members in the implementation details. You can always keep members that are not subject to change alongside the pointer to the implementation details.