Reference Counting


Diligent Engine implements a thread-safe reference counting system that supports weak pointers. The main reason for implementing this system instead of using the smart pointers provided by the C++ standard library is that the engine is packaged into dynamic libraries, which can be used from any language or system, and thus needs to provide a simple and efficient method for managing object lifetimes. C++ smart pointers are efficient. They maintain internal reference counters, but do not allow accessing them directly: the counters can only be changed by creating new smart pointers, which limits their use to C++. The functions exposed by the engine, in contrast, can be made available to any other language and work seamlessly for both internal and external users of engine interfaces. Besides, implementing its own reference counting system gives the engine more flexibility and better opportunities for optimization.

For performance reasons, Diligent Engine also implements its own atomic operations. On the Win32 platform, both Diligent Engine and standard C++ atomics use the InterlockedIncrement() Windows function, but because DE introduces less overhead, its implementation is more than two times faster than the one provided by the C++ standard library.

The reference counting system in Diligent Engine includes the following classes and interfaces:

  • IObject is the base interface of every reference counted object
  • IReferenceCounters is the interface to the helper object that implements reference counting and controls the lifetime of an object
  • RefCountedObject is the template base class for all reference counted objects
  • RefCountersImpl is the class that implements the actual reference counting functionality defined by the IReferenceCounters interface
  • MakeNewRCObj is the template class that is responsible for creating an object + reference counters pair
  • RefCntAutoPtr is the template class that implements a strong pointer
  • RefCntWeakPtr is the template class that implements a weak pointer

The following diagram illustrates different components of the system:


IReferenceCounters Interface

The declaration of the IReferenceCounters interface is given in the listing below:

The interface defines the following functions:

  • AddStrongRef() – Increments the number of strong references by 1. The method returns the number of strong references after incrementing the counter. It is thread-safe and does not require explicit synchronization.
  • ReleaseStrongRef() – Decrements the number of strong references by 1 and destroys the referenced object when the counter reaches zero. If there are no more weak references, destroys the reference counters object itself. The method is thread-safe and does not require explicit synchronization.
  • AddWeakRef() – Increments the number of weak references by 1 and returns the number of weak references after incrementing the counter. The method is thread-safe and does not require explicit synchronization.
  • ReleaseWeakRef() – Decrements the number of weak references by 1. If there are no more strong and weak references, destroys the reference counters object itself.
  • GetObject() – Gets the pointer to the IUnknown interface of the referenced object. If the object was destroyed, nullptr will be returned. If the object is alive, the pointer to the object’s IUnknown interface will be returned. In this case, the number of strong references to the object will be incremented by 1.
  • GetNumStrongRefs() – Returns the number of outstanding strong references. This method is intended for debug purposes only. In a multithreaded environment, the returned number may not be reliable as other threads may simultaneously change the actual value of the counter. The only reliable value is 0 as the object is destroyed when the last strong reference is released.
  • GetNumWeakRefs() – Returns the number of outstanding weak references. This method is intended for debug purposes only. In a multithreaded environment, the returned number may not be reliable as other threads may simultaneously change the actual value of the counter.
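A minimal C++ sketch of how an interface offering the functions above might be declared follows. The `long` counter type, the out-parameter form of GetObject(), and the protected destructor are assumptions made for illustration, not Diligent Engine's confirmed signatures.

```cpp
#include <type_traits>

class IObject; // base object interface; only pointers to it are needed here

// Sketch of IReferenceCounters (signatures are assumptions).
class IReferenceCounters
{
public:
    // Increments the strong counter and returns the incremented value. Thread-safe.
    virtual long AddStrongRef() = 0;

    // Decrements the strong counter; destroys the object when it reaches 0, and the
    // counters object itself if no weak references remain. Thread-safe.
    virtual long ReleaseStrongRef() = 0;

    // Increments the weak counter and returns the incremented value. Thread-safe.
    virtual long AddWeakRef() = 0;

    // Decrements the weak counter; destroys the counters object when neither
    // strong nor weak references remain.
    virtual long ReleaseWeakRef() = 0;

    // Writes nullptr to *ppObject if the object has been destroyed; otherwise writes
    // the object pointer and adds one strong reference that the caller must release.
    virtual void GetObject(IObject** ppObject) = 0;

    // Debug-only snapshots: other threads may change the values at any moment.
    virtual long GetNumStrongRefs() const = 0;
    virtual long GetNumWeakRefs() const   = 0;

protected:
    // Assumed: lifetime is managed through the counters, never via delete on this interface.
    ~IReferenceCounters() = default;
};

static_assert(std::is_abstract<IReferenceCounters>::value,
              "IReferenceCounters is a pure interface");
```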

IObject Interface

The declaration of the IObject interface is given in the listing below:

The interface defines the following functions:

  • QueryInterface() – Queries the specified interface. The method increments the number of strong references by 1, so the returned interface must be released by a call to the Release() method when it is no longer needed.
  • AddRef() – Increments the number of strong references by 1. This method is equivalent to GetReferenceCounters()->AddStrongRef(). It is thread-safe and does not require explicit synchronization.
  • Release() – Decrements the number of strong references by 1 and destroys the object when the counter reaches zero. This method is equivalent to GetReferenceCounters()->ReleaseStrongRef(). It is thread-safe and does not require explicit synchronization.
  • GetReferenceCounters() – Returns the pointer to the IReferenceCounters interface of the associated reference counters object. The method does not increment the number of strong references to the returned object.
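To illustrate the contract, here is a hedged sketch of an IObject-like interface together with a small helper that balances QueryInterface() with Release(). The InterfaceId layout, the method signatures, and the SupportsInterface() helper are hypothetical, and the sketch assumes QueryInterface() writes nullptr when the interface is not supported.

```cpp
#include <cassert>
#include <cstdint>

class IReferenceCounters; // counters interface; only pointers are needed here

// Hypothetical interface identifier; the engine defines its own ID type.
struct InterfaceId
{
    std::uint32_t Data1;
    std::uint16_t Data2;
    std::uint16_t Data3;
    std::uint8_t  Data4[8];
};

// Sketch of an IObject-like interface (signatures are assumptions).
class IObject
{
public:
    // Assumed behavior: on success writes the interface to *ppInterface and adds one
    // strong reference the caller must drop with Release(); otherwise writes nullptr.
    virtual void QueryInterface(const InterfaceId& IID, IObject** ppInterface) = 0;

    // Equivalent to GetReferenceCounters()->AddStrongRef(). Thread-safe.
    virtual long AddRef() = 0;

    // Equivalent to GetReferenceCounters()->ReleaseStrongRef(). Thread-safe.
    virtual long Release() = 0;

    // Does NOT add a strong reference to the returned counters object.
    virtual IReferenceCounters* GetReferenceCounters() const = 0;

protected:
    ~IObject() = default;
};

// Hypothetical helper: every successful QueryInterface() is balanced by one Release().
inline bool SupportsInterface(IObject* pObject, const InterfaceId& IID)
{
    assert(pObject != nullptr);
    IObject* pInterface = nullptr;
    pObject->QueryInterface(IID, &pInterface);
    if (pInterface == nullptr)
        return false;
    pInterface->Release(); // drop the reference added by QueryInterface()
    return true;
}
```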

RefCountedObject Class

RefCountedObject is the base template class that implements the common functionality of all reference counted objects. Its source code is given in the listing below:

A few comments regarding the implementation:

  • AddRef(), Release(), and GetReferenceCounters() methods are declared as final and inline. The first keyword tells the compiler that the methods will never be overridden in derived classes, so the compiler can avoid virtual calls when the methods are called through a pointer to RefCountedObject or a derived class. inline is an additional hint for the compiler to inline the methods.
  • Calls to AddRef() and Release() are delegated to the reference counters object, a pointer to which is kept in the m_pRefCounters member.
  • A valid pointer to the reference counters object must be provided to the constructor. Note that m_pRefCounters does not necessarily have to track the object itself. It may track another object that owns this one, directly or indirectly.
  • The class defines two versions of operator new and two matching versions of operator delete. These operators are private and are only accessible by the MakeNewRCObj class, which is responsible for constructing new objects. The first pair uses the standard C++ operator new to allocate the required amount of memory. The second pair uses a custom allocator: these versions take a reference to the allocator as an argument, as well as debug information such as the allocation description, and the file name and line number where the allocation was performed. Also note that either operator delete is only called if an exception is thrown during object construction.
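The sketch below illustrates the points in this list under several simplifying assumptions: IObjectLite is a hypothetical two-method base interface, the stand-in RefCountersImpl only counts strong references (it never destroys anything), only the allocator-free operator new/delete pair is shown, and MakeNewRCObj is reduced to hypothetical Create()/Destroy() helpers. One deliberate deviation: operator delete is made protected rather than private, because the implicitly generated virtual destructor of a derived class must be able to reach it.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical minimal base interface (stands in for IObject).
struct IObjectLite
{
    virtual long AddRef()  = 0;
    virtual long Release() = 0;

protected:
    ~IObjectLite() = default;
};

// Simplified, non-template stand-in for RefCountersImpl: it only counts.
class RefCountersImpl final
{
public:
    long AddStrongRef() { return ++m_NumStrongRefs; }
    long ReleaseStrongRef() { return --m_NumStrongRefs; }

    long m_NumStrongRefs = 1; // the creator holds the first reference
};

template <typename ObjectType> class MakeNewRCObj;

template <typename BaseInterface>
class RefCountedObject : public BaseInterface
{
public:
    explicit RefCountedObject(RefCountersImpl* pRefCounters) noexcept :
        m_pRefCounters(pRefCounters)
    {
        // The counters may belong to this object or to an object that owns it.
        assert(m_pRefCounters != nullptr);
    }
    virtual ~RefCountedObject() = default;

    // 'final': no further overrides, so calls through a RefCountedObject (or derived)
    // pointer can be devirtualized; 'inline' additionally invites inlining.
    inline long AddRef() override final { return m_pRefCounters->AddStrongRef(); }
    inline long Release() override final { return m_pRefCounters->ReleaseStrongRef(); }
    inline RefCountersImpl* GetReferenceCounters() const { return m_pRefCounters; }

protected:
    // Protected rather than private so derived classes' virtual destructors compile.
    static void operator delete(void* Ptr) { ::operator delete(Ptr); }

private:
    template <typename ObjectType> friend class MakeNewRCObj;

    // Only MakeNewRCObj can create instances on the heap.
    static void* operator new(std::size_t Size) { return ::operator new(Size); }

    RefCountersImpl* const m_pRefCounters;
};

// Greatly simplified factory; Destroy() stands in for counters-driven destruction.
template <typename ObjectType>
class MakeNewRCObj
{
public:
    static ObjectType* Create(RefCountersImpl* pRefCounters) { return new ObjectType(pRefCounters); }
    static void        Destroy(ObjectType* pObject) { delete pObject; }
};
```

With this arrangement, writing `new Texture(&Counters)` anywhere outside MakeNewRCObj fails to compile, which is exactly the restriction the list describes.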

RefCountersImpl Class

The RefCountersImpl class is responsible for the bulk of the actual reference counting functionality and implements the IReferenceCounters interface.

Destroying the controlled object

The most important operation that the RefCountersImpl class must support is destroying the object it controls. This operation needs to be flexible enough to support different types of objects as well as different allocators. One way to achieve this is to make the class a template parameterized by the type of the controlled object as well as by the allocator type:

Such a template class can hold pointers to the object and to the allocator:

so that the object can be destroyed as follows:
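For illustration, the original template-based approach could look roughly like the following sketch. MallocAllocator and its Allocate()/Free() methods are hypothetical, and the null-allocator branch (falling back to plain delete) is an assumption about how objects created without a custom allocator would be handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical allocator interface used only for this sketch.
struct MallocAllocator
{
    void* Allocate(std::size_t Size) { return std::malloc(Size); }
    void  Free(void* Ptr) { std::free(Ptr); }
};

// The original, fully templated design: the counters know both the concrete object
// type and the allocator type. Only the destruction logic is shown.
template <typename ControlledObjectType, typename AllocatorType>
class RefCountersImpl
{
public:
    RefCountersImpl(ControlledObjectType* pObject, AllocatorType* pAllocator) :
        m_pObject(pObject), m_pAllocator(pAllocator) {}

    void DestroyObject()
    {
        assert(m_pObject != nullptr && "object destroyed twice");
        if (m_pAllocator != nullptr)
        {
            // Run the destructor explicitly, then give the memory back to the allocator.
            m_pObject->~ControlledObjectType();
            m_pAllocator->Free(m_pObject);
        }
        else
        {
            delete m_pObject; // assumed: created with the default operator new
        }
        m_pObject = nullptr;
    }

private:
    ControlledObjectType* m_pObject;
    AllocatorType*        m_pAllocator;
};
```

The drawback discussed next is visible here: every (object type, allocator type) pair yields a distinct counters type, so unrelated objects can only share counters through a common virtual interface.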

As long as ControlledObjectType is derived from RefCountedObject, it will have a virtual destructor and will be correctly destroyed by pObj->~ControlledObjectType(). This approach also handles any allocator and is thus a legitimate way to support different types of objects and different types of allocators. In fact, this is how it was originally implemented in Diligent Engine.

However, it has one drawback: the type of the RefCountedObject::m_pRefCounters member needs to be IReferenceCounters*. The reason is that when an object is part of another reference counted object (for example, the default view of a texture is part of the texture object), it keeps a pointer to the reference counters of the parent object. Since the types of the two objects are unrelated, the only option left is the IReferenceCounters interface, which is the common ancestor of all reference counters. The biggest disadvantage of this method is that the AddRef() and Release() methods of the RefCountedObject class perform virtual calls to AddStrongRef() and ReleaseStrongRef() of a specific RefCountersImpl instantiation, which is suboptimal. By contrast, in the implementation of the RefCountedObject class shown above, the type of the RefCountedObject::m_pRefCounters member is RefCountersImpl* (RefCountersImpl is not a template). All methods of the IReferenceCounters interface implemented by the RefCountersImpl class are labeled as final and inline, so a good compiler will be able not only to eliminate the virtual call, but also to eliminate the function call altogether by inlining the methods.

The question now is: if the RefCountersImpl class does not depend on the type of the object and allocator, how can we provide the required flexibility? The answer is to use an object wrapper template class that knows how to destroy the object. First, we define a non-template abstract base class as shown in the listing below:

This class only provides two pure virtual methods: one to destroy the object, and the other to query the specified interface (which we will use to implement the GetObject() method later). So as long as the RefCountersImpl class keeps a pointer to the object wrapper of type ObjectWrapperBase, it can destroy the controlled object as easily as calling pWrapper->DestroyObject(). The specific details of destroying a particular object with a particular allocator are defined by the derived template class given in the listing below:

You can see that all the object destruction logic has moved to the DestroyObject() virtual method. Since the class is parameterized by the type of the object and the allocator (the same parameters that the RefCountersImpl class originally had), it can handle all object types and all allocator types. The only question left to answer is how to initialize a specific instance of the ObjectWrapper class. For that, the RefCountersImpl class provides the template method Attach() that has the same template parameters and initializes a specific instance of the ObjectWrapper class. However, instead of creating the wrapper on the heap, the method initializes it in place, in the raw memory buffer provided by the RefCountersImpl class:

The typical size of the m_ObjectWrapperBuffer buffer is the size of three pointers (the vtbl pointer plus the m_pObject and m_pAllocator members). The in-place new operation initializes raw memory with an instance of the ObjectWrapper<ObjectType, AllocatorType> class, whose m_pObject and m_pAllocator members store the pointers passed to the constructor by the calling function. Also, the compiler generates a specific DestroyObject() instantiation for the ObjectType and AllocatorType types, which is accessible through the virtual table of the ObjectWrapper<ObjectType, AllocatorType> class; on common compilers, the pointer to this table is stored in the first element of the m_ObjectWrapperBuffer buffer. So, putting everything together, RefCountersImpl::ReleaseStrongRef() executes the following code to destroy the controlled object:
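The wrapper technique can be sketched as follows. The class names follow the text, but MallocAllocator, DestroyControlledObject(), and the m_pWrapper member are hypothetical, and the QueryInterface-like second method of ObjectWrapperBase is omitted. Unlike the description above, this sketch keeps the base-class pointer returned by placement new instead of reinterpreting the buffer address, which avoids relying on the compiler's object layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical allocator used only for this sketch.
struct MallocAllocator
{
    void* Allocate(std::size_t Size) { return std::malloc(Size); }
    void  Free(void* Ptr) { std::free(Ptr); }
};

// Non-template base: RefCountersImpl only ever talks to this interface.
class ObjectWrapperBase
{
public:
    virtual void DestroyObject() = 0;

protected:
    ~ObjectWrapperBase() = default;
};

// Knows the concrete object and allocator types, and therefore how to destroy the object.
template <typename ObjectType, typename AllocatorType>
class ObjectWrapper final : public ObjectWrapperBase
{
public:
    ObjectWrapper(ObjectType* pObject, AllocatorType* pAllocator) :
        m_pObject(pObject), m_pAllocator(pAllocator) {}

    void DestroyObject() override
    {
        if (m_pAllocator != nullptr)
        {
            m_pObject->~ObjectType();      // run the destructor explicitly...
            m_pAllocator->Free(m_pObject); // ...then return memory to the allocator
        }
        else
        {
            delete m_pObject; // assumed: created with the default operator new
        }
    }

private:
    ObjectType* const    m_pObject;
    AllocatorType* const m_pAllocator;
};

// Only the type-erasure part of the (non-template) RefCountersImpl is shown.
class RefCountersImpl
{
public:
    template <typename ObjectType, typename AllocatorType>
    void Attach(ObjectType* pObject, AllocatorType* pAllocator)
    {
        using WrapperType = ObjectWrapper<ObjectType, AllocatorType>;
        // vtable pointer + two data pointers fit into the fixed buffer for any types.
        static_assert(sizeof(WrapperType) <= sizeof(m_ObjectWrapperBuffer), "buffer too small");
        static_assert(alignof(WrapperType) <= alignof(void*), "buffer misaligned");
        assert(m_pWrapper == nullptr && "object already attached");
        // In-place construction: no heap allocation for the wrapper.
        m_pWrapper = new (m_ObjectWrapperBuffer) WrapperType(pObject, pAllocator);
    }

    // What ReleaseStrongRef() does once the last strong reference is gone.
    void DestroyControlledObject()
    {
        assert(m_pWrapper != nullptr);
        m_pWrapper->DestroyObject(); // dispatches to the right ObjectWrapper<...>
        m_pWrapper = nullptr;
    }

private:
    alignas(void*) unsigned char m_ObjectWrapperBuffer[3 * sizeof(void*)];
    ObjectWrapperBase*           m_pWrapper = nullptr;
};
```

The two static_assert checks confirm at compile time that every wrapper instantiation fits into the three-pointer buffer, so the type-erased destroyer never needs its own heap allocation.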

Class Members and Methods

Now that we have described how the RefCountersImpl class manages various types of objects allocated with different allocators, we can describe the implementation of its methods. But first, let’s take a look at the other members that the class defines:

  • m_lNumStrongReferences is the strong reference counter
  • m_lNumWeakReferences is the weak reference counter
  • m_LockFlag is the lock flag used to gain exclusive access to the object members
  • m_ObjectState is the state of the object (not initialized, alive or destroyed)

Let’s now look at the implementation of the class methods. AddStrongRef(), AddWeakRef(), GetNumStrongRefs(), and GetNumWeakRefs() are straightforward:

Note that the inline and final keywords are very strong hints to a good compiler that these methods should be inlined when they are called through a pointer of the RefCountersImpl* type.
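A minimal sketch of these four methods follows, using std::atomic<long> as a stand-in for the engine's own Atomics functions (an assumption of the sketch). ICountersSubset is a hypothetical cut-down version of IReferenceCounters that exists only so the overrides can be marked final.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical subset of IReferenceCounters, just enough to show 'final' overrides.
class ICountersSubset
{
public:
    virtual long AddStrongRef()           = 0;
    virtual long AddWeakRef()             = 0;
    virtual long GetNumStrongRefs() const = 0;
    virtual long GetNumWeakRefs() const   = 0;

protected:
    ~ICountersSubset() = default;
};

// std::atomic<long> stands in for the engine's Atomics helpers.
class RefCountersImpl final : public ICountersSubset
{
public:
    // Lock-free: one atomic read-modify-write; the incremented value is returned.
    inline long AddStrongRef() override final { return m_lNumStrongReferences.fetch_add(1) + 1; }
    inline long AddWeakRef() override final { return m_lNumWeakReferences.fetch_add(1) + 1; }

    // Snapshots only: another thread may change a counter right after the load.
    inline long GetNumStrongRefs() const override final { return m_lNumStrongReferences.load(); }
    inline long GetNumWeakRefs() const override final { return m_lNumWeakReferences.load(); }

private:
    std::atomic<long> m_lNumStrongReferences{0};
    std::atomic<long> m_lNumWeakReferences{0};
};
```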

The real work is done by the ReleaseStrongRef(), ReleaseWeakRef(), and GetObject() methods. At first glance, the implementation may seem straightforward, but in a multithreaded environment there are many tricky situations. Consider a few examples:

  • There is one strong reference and one weak reference remaining. One thread releases the last strong reference, while another one releases the last weak reference. The first thread destroys the controlled object, but which one should destroy the reference counters object itself? If each thread simply checks that both counters have reached 0, both threads may see zeros, and the reference counters object will be destroyed twice, which causes undefined behavior.
  • There is one strong reference and one weak reference remaining. One thread releases the last strong reference, while another one attempts to get the object by calling GetObject(). Which thread wins in this situation? How can we avoid returning a reference to a destroyed object?

These are just a few examples; there are many more complex cases. We will now give the implementations of all three functions for reference, and then explain each function in detail:

Let’s now take a look at how these functions work, starting with ReleaseStrongRef(). The function first atomically decrements the strong reference counter without acquiring the lock. This is important, as locking is an expensive operation and you want to pay this cost only when it is absolutely necessary. If the decremented value is 0, the function released the last reference, and the object must be destroyed. Note that Atomics::AtomicDecrement() decrements the counter atomically. This means that if several threads reach the instruction at the same time, access to the counter is serialized. The function returns the decremented value, and only one thread will read 0. It is crucially important to use the value returned by the function: if we compared m_lNumStrongReferences itself instead, the object could be destroyed multiple times, as more than one thread could read 0. The thread that reads 0 starts destroying the object:
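The following sketch demonstrates why the return value matters: several threads release one reference each, and only the thread whose own decrement produced 0 counts as the destroyer. AtomicDecrement() here is a hypothetical wrapper over std::atomic that, like the engine function described above, returns the decremented value.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical wrapper: like Atomics::AtomicDecrement() (and Win32
// InterlockedDecrement()), it returns the decremented value.
inline long AtomicDecrement(std::atomic<long>& Value)
{
    return Value.fetch_sub(1) - 1; // fetch_sub() returns the previous value
}

// Starts with NumThreads strong references; every thread releases one of them.
// Returns how many threads observed the counter reaching zero, i.e. how many
// would go on to destroy the object.
inline int CountThreadsThatSeeZero(int NumThreads)
{
    assert(NumThreads > 0);
    std::atomic<long> NumStrongRefs{NumThreads};
    std::atomic<int>  NumDestroyers{0};

    std::vector<std::thread> Threads;
    for (int i = 0; i < NumThreads; ++i)
    {
        Threads.emplace_back([&NumStrongRefs, &NumDestroyers] {
            // Test the value returned by the atomic operation itself. Re-reading
            // NumStrongRefs afterwards could let several threads observe 0.
            if (AtomicDecrement(NumStrongRefs) == 0)
                NumDestroyers.fetch_add(1);
        });
    }
    for (std::thread& T : Threads)
        T.join();
    return NumDestroyers.load();
}
```

Because the counter starts at NumThreads, exactly one decrement can take it from 1 to 0, so CountThreadsThatSeeZero() returns 1 regardless of how the threads are scheduled.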

Interference between ReleaseStrongRef() and GetObject() methods

This is the first situation where we need to carefully handle interference with other functions. What happens if another thread has a weak reference and starts getting a strong reference to the object through GetObject()? Without special care, the method could return a reference to an object that will very soon be destroyed by ReleaseStrongRef() running in another thread. After ReleaseStrongRef() starts destroying the object, it acquires exclusive access to the reference counters using the lock. GetObject() also acquires the lock, so only one of the two methods can be running at a time. Since ReleaseStrongRef() cannot be stopped and the object will be destroyed anyway, it is the responsibility of GetObject() to detect this situation and avoid returning a reference to the soon-to-be-destroyed object. After obtaining the lock, GetObject() first increments the number of strong references (we will discuss later why it is important to increment the counter while having exclusive access to the counters) and examines the returned value. There are two cases:

  • The returned value is 1. This means that there are no other live strong references to the object, and the object either has already been destroyed or will be destroyed very soon. In this case, GetObject() must not return a reference to the object, and it decrements the counter back.
  • The returned value is greater than 1. This means there exists at least one other strong reference to the object. Since we have already incremented the counter, no ReleaseStrongRef() call in another thread can decrement it to 0 (assuming the calls to AddRef() and Release() are correctly balanced), so it is safe to return the reference to the object.

Note that QueryInterface() also increments the strong reference counter, but before calling this method we need to be sure that the object is alive.
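The GetObject() logic can be sketched as follows, with weak-reference bookkeeping left out. The SpinLock class, the void* object with a destroy callback (standing in for the object wrapper), and the StrongRefCounters name are hypothetical simplifications; the early state check is extra caution on top of the increment-and-inspect logic described above.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical spin lock standing in for m_LockFlag; works with std::lock_guard.
class SpinLock
{
public:
    void lock()
    {
        while (m_Locked.exchange(true, std::memory_order_acquire)) {}
    }
    void unlock() { m_Locked.store(false, std::memory_order_release); }

private:
    std::atomic<bool> m_Locked{false};
};

enum class ObjectState { NotInitialized, Alive, Destroyed };

// Simplified counters: a void* plus a destroy callback replace the object wrapper.
class StrongRefCounters
{
public:
    StrongRefCounters(void* pObject, void (*pfnDestroy)(void*)) :
        m_pObject(pObject), m_pfnDestroy(pfnDestroy) {}

    long ReleaseStrongRef()
    {
        // Fast path: one atomic decrement, no lock.
        const long RefCount = m_lNumStrongReferences.fetch_sub(1) - 1;
        if (RefCount == 0)
        {
            // While we hold the lock, GetObject() cannot run.
            // Simplification: the destructor must not call back into these counters.
            std::lock_guard<SpinLock> Guard(m_LockFlag);
            m_pfnDestroy(m_pObject);
            m_pObject     = nullptr;
            m_ObjectState = ObjectState::Destroyed;
        }
        return RefCount;
    }

    // Returns the object with one extra strong reference, or nullptr if it is gone.
    void* GetObject()
    {
        std::lock_guard<SpinLock> Guard(m_LockFlag);
        if (m_ObjectState != ObjectState::Alive)
            return nullptr; // extra caution: already destroyed
        // Increment while holding the lock, then inspect the value we produced.
        const long RefCount = m_lNumStrongReferences.fetch_add(1) + 1;
        if (RefCount == 1)
        {
            // No other strong reference exists: ReleaseStrongRef() is destroying
            // (or is about to destroy) the object. Undo the increment.
            m_lNumStrongReferences.fetch_sub(1);
            return nullptr;
        }
        return m_pObject; // our reference keeps the counter above zero
    }

    long GetNumStrongRefs() const { return m_lNumStrongReferences.load(); }

private:
    std::atomic<long> m_lNumStrongReferences{1}; // the creator's reference
    ObjectState       m_ObjectState = ObjectState::Alive;
    void*             m_pObject;
    void (*m_pfnDestroy)(void*);
    SpinLock          m_LockFlag;
};
```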

The following two figures illustrate two possible scenarios of what happens if one thread is releasing the last strong reference to the object, while another one is attempting to get a strong reference to the object through a weak reference using GetObject(). In the first scenario, ReleaseStrongRef() decrements the strong reference counter first, and the object is destroyed:


In the second scenario, GetObject() increments the strong reference counter first, and the object is kept alive:


Note that it is crucial that GetObject() increments the strong reference counter only after acquiring the lock. Consider what could happen if GetObject() did not acquire the lock, and several threads were running GetObject() while another thread was releasing the last reference to the object:


As you can see, in the scenario above two threads running GetObject() increment the strong reference counter. As a result, the thread that increments the counter second sees two strong references and starts returning the object. The object, however, will have been destroyed by that time, or will be destroyed soon after, by the thread running ReleaseStrongRef(). This scenario is not possible if GetObject() acquires the lock:


The crucial difference is that, with the lock, the second thread running GetObject() (thread 3) can only increment the reference counter after the first thread running GetObject() (thread 2) has decremented it. As a result, thread 3 also sees only one strong reference and does not return the object. In the alternative scenario, where one of the threads running GetObject() increments the counter before ReleaseStrongRef() decrements it, the object is kept alive, and both threads 2 and 3 obtain a valid object reference.

A careful reader may ask two questions at this point. First, if GetObject() acquires the lock, why do we use atomic operations to increment and decrement the counter? This is because other methods (AddStrongRef() and ReleaseStrongRef()) access the counter without acquiring the lock. Second, if AddStrongRef() increments the reference counter without acquiring the lock, can the same faulty scenario happen when one thread runs ReleaseStrongRef(), a second one runs GetObject(), and a third one runs AddStrongRef()? The answer is no: since the third thread runs AddStrongRef(), there exists at least one other strong reference to the object, so the first thread running ReleaseStrongRef() is not releasing the last reference.

Interference between ReleaseStrongRef() and ReleaseWeakRef() methods

Without special care, there may be a problem if two threads are simultaneously releasing the last strong and the last weak reference through the ReleaseStrongRef() and ReleaseWeakRef() methods: both methods may see that there are no more strong and weak references. Two steps are taken to ensure that only one method destroys the reference counters object itself. First, ReleaseStrongRef() checks the number of weak references while still holding the lock, and sets the flag indicating whether the reference counters object needs to be destroyed:

Second, ReleaseWeakRef() decrements the weak reference counter only after acquiring the lock. If two threads are running ReleaseStrongRef() and ReleaseWeakRef(), there are two possible scenarios, depending on which method acquires the lock first. In the first scenario, ReleaseStrongRef() acquires the lock first. Since the number of weak references it sees is not zero, the method does not destroy the reference counters object:


In the second scenario, ReleaseWeakRef() acquires the lock first. However, ReleaseStrongRef() still has to destroy the controlled object, so ReleaseWeakRef() must not destroy the reference counters object. To detect this situation, the methods use the m_ObjectState flag. The state is set to ObjectState::Destroyed by the ReleaseStrongRef() method while it holds the lock. If ReleaseWeakRef() sees that the state is not ObjectState::Destroyed, it means that either there are live strong references or ReleaseStrongRef() has not completed yet, as in the scenario below:


Note that it is crucial that m_ObjectState is only accessed while the lock is held, and that the weak reference counter is only decremented while the lock is held as well. For example, consider the following scenario, where the weak reference counter is decremented without acquiring the lock, which leads to the reference counters object destroying itself twice:


Acquiring the lock has two effects:

  • If ReleaseStrongRef() sets the bDestroyThis flag to true, this means that no other thread may be running code related to weak references, because ReleaseWeakRef() decrements the weak reference counter only after acquiring the lock. So the reference counters object can be safely destroyed.
  • If ReleaseWeakRef() sees that m_ObjectState is set to ObjectState::Destroyed, then all strong reference-related code must have completed by this time, because the object state is updated while holding the lock. In this case, the reference counters object can be safely destroyed as well.
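The two-step protocol can be sketched as below. This is a simplified, hedged illustration: the object is a void* with a destroy callback, the counters free themselves with delete this (so they must be heap-allocated), and the pDbgNumSelfDestroyed hook exists only so the example can observe self-destruction. The names WeakAwareCounters and SpinLock are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical spin lock standing in for m_LockFlag; works with std::lock_guard.
class SpinLock
{
public:
    void lock()
    {
        while (m_Locked.exchange(true, std::memory_order_acquire)) {}
    }
    void unlock() { m_Locked.store(false, std::memory_order_release); }

private:
    std::atomic<bool> m_Locked{false};
};

enum class ObjectState { NotInitialized, Alive, Destroyed };

class WeakAwareCounters
{
public:
    WeakAwareCounters(void* pObject, void (*pfnDestroy)(void*), int* pDbgNumSelfDestroyed = nullptr) :
        m_pObject(pObject), m_pfnDestroy(pfnDestroy), m_pDbgNumSelfDestroyed(pDbgNumSelfDestroyed) {}

    long AddWeakRef() { return m_lNumWeakReferences.fetch_add(1) + 1; }

    long ReleaseStrongRef()
    {
        const long RefCount = m_lNumStrongReferences.fetch_sub(1) - 1;
        assert(RefCount >= 0 && "unbalanced ReleaseStrongRef()");
        if (RefCount == 0)
        {
            bool bDestroyThis = false;
            {
                std::lock_guard<SpinLock> Guard(m_LockFlag);
                // Simplification: the object's destructor must not touch these counters.
                m_pfnDestroy(m_pObject);
                m_ObjectState = ObjectState::Destroyed; // updated only under the lock
                // Checked under the lock: ReleaseWeakRef() cannot decrement meanwhile.
                bDestroyThis = (m_lNumWeakReferences.load() == 0);
            } // unlock first: the lock itself lives inside *this
            if (bDestroyThis)
                delete this;
        }
        return RefCount;
    }

    long ReleaseWeakRef()
    {
        long RefCount     = 0;
        bool bDestroyThis = false;
        {
            std::lock_guard<SpinLock> Guard(m_LockFlag);
            RefCount = m_lNumWeakReferences.fetch_sub(1) - 1; // decremented only under the lock
            assert(RefCount >= 0 && "unbalanced ReleaseWeakRef()");
            // Any state other than Destroyed means strong references remain, ReleaseStrongRef()
            // has not finished, or (NotInitialized) the object was never constructed.
            bDestroyThis = (RefCount == 0 && m_ObjectState == ObjectState::Destroyed);
        }
        if (bDestroyThis)
            delete this;
        return RefCount;
    }

private:
    // Only the release methods may destroy the counters.
    ~WeakAwareCounters()
    {
        if (m_pDbgNumSelfDestroyed != nullptr)
            ++*m_pDbgNumSelfDestroyed;
    }

    std::atomic<long> m_lNumStrongReferences{1};
    std::atomic<long> m_lNumWeakReferences{0};
    ObjectState       m_ObjectState = ObjectState::Alive;
    void*             m_pObject;
    void (*m_pfnDestroy)(void*);
    int*              m_pDbgNumSelfDestroyed;
    SpinLock          m_LockFlag;
};
```

Either release order ends with exactly one destruction of the object and exactly one destruction of the counters; under real concurrency, this guarantee comes from performing the weak-counter decrement, the state update, and the state check under the same lock.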

Interference between ReleaseWeakRef() and GetObject() methods

There is in fact no interference possible between these two methods if they are used correctly. If GetObject() is called in one thread while another thread is running ReleaseWeakRef(), there must be at least two weak references (one used by each thread). Therefore, ReleaseWeakRef() will only decrement the counter, but will not destroy the reference counters object, as at least one other weak reference will still be outstanding.

MakeNewRCObj Class

MakeNewRCObj is the class that is responsible for creating an object + reference counters pair. It is a template class parameterized by the type of the object and the type of the memory allocator:

The class defines the following private members:


  • m_pAllocator is the pointer to the allocator that will be used to allocate memory for the object. If nullptr, the default system allocator is used.
  • m_pOwner is the pointer to the object that owns the one to be created. If nullptr, the object has no owner.
  • m_dbgDescription, m_dbgFileName, and m_dbgLineNumber are debug members provided to the allocator to describe the allocation for debug purposes.

The class provides two constructors to initialize its members:

The class defines a template operator() that allocates the object:

There are a few interesting things about this operator. First, it is a variadic template function: it can take any number of arguments of any type. The function passes all its arguments to the object constructor using perfect forwarding:

Perfect forwarding ensures that every argument is passed on as either an lvalue or an rvalue, depending on its original value category. Second, the method uses in-place new to create the object with the custom allocator, if one is provided:

If an allocator is not provided, the default allocator is used:

Recall that RefCountedObject defines two private versions of operator new, and that MakeNewRCObj is a friend class of RefCountedObject. MakeNewRCObj is thus the only class that has access to these operators, and the only place where instances of classes derived from RefCountedObject can be created on the heap.

After the object is created, the method attaches it to the reference counters using the template Attach() method that we discussed earlier.

If an exception is thrown by the object constructor, the method catches it, destroys the reference counters object, and re-throws the exception. Note that in our original implementation, the reference counters object was created by the object constructor. This resulted in a memory leak if the constructor threw an exception, because the destructor would never be called and the memory would never be freed.
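Here is a hedged sketch of the creation sequence just described: the counters are created first, the object is constructed with perfectly forwarded arguments, and the counters are freed if the constructor throws. SimpleCounters, CreationStats, and MakeNewRCObjSketch are hypothetical stand-ins; allocator support, debug information, and the real Attach() are omitted.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for RefCountersImpl: it only records the attached object.
struct SimpleCounters
{
    void* pAttached = nullptr;

    template <typename ObjectType>
    void Attach(ObjectType* pObject) { pAttached = pObject; }
};

// Hypothetical debug hook so the example can observe the clean-up path.
struct CreationStats
{
    int NumCountersFreed = 0;
};

template <typename ObjectType>
class MakeNewRCObjSketch
{
public:
    explicit MakeNewRCObjSketch(CreationStats* pStats = nullptr) : m_pStats(pStats) {}

    // Variadic: accepts any constructor arguments and forwards them with their original
    // value category (lvalues stay lvalues, rvalues stay rvalues).
    template <typename... CtorArgTypes>
    ObjectType* operator()(CtorArgTypes&&... CtorArgs)
    {
        // The counters are created by the factory, not by the object's constructor.
        SimpleCounters* pCounters = new SimpleCounters;
        ObjectType*     pObject   = nullptr;
        try
        {
            pObject = new ObjectType(pCounters, std::forward<CtorArgTypes>(CtorArgs)...);
        }
        catch (...)
        {
            // The new-expression has already freed the object's memory, but nobody else
            // knows about the counters: free them here and re-throw.
            delete pCounters;
            if (m_pStats != nullptr)
                ++m_pStats->NumCountersFreed;
            throw;
        }
        pCounters->Attach(pObject);
        return pObject;
    }

private:
    CreationStats* const m_pStats;
};
```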

There is one tricky situation related to correct exception handling. Consider a scenario where object A owns object B, which keeps a weak reference WP to object A. If the constructor of A throws an exception, the destructor of the weak reference may try to destroy the reference counters object. As we know, this operation is performed by the MakeNewRCObj class, and we must avoid destroying the same object twice. What helps here is that ReleaseWeakRef() checks whether the state of the object is ObjectState::Destroyed. Since the object has never been constructed, its state is ObjectState::NotInitialized, and ReleaseWeakRef() will not destroy the reference counters object.

The following macro is defined to facilitate the use of the MakeNewRCObj class:

With the help of this macro, a typical allocation looks like this:
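A macro of this kind typically exists to capture __FILE__ and __LINE__ at the call site. The sketch below shows the idea with a hypothetical NEW_OBJ_DBG macro and DebugAwareFactory class; it is not the engine's actual macro, whose name and parameters may differ.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical factory that records where and why an allocation was requested.
template <typename ObjectType>
class DebugAwareFactory
{
public:
    DebugAwareFactory(std::string Description, const char* FileName, int LineNumber) :
        m_Description(std::move(Description)), m_FileName(FileName), m_LineNumber(LineNumber) {}

    template <typename... ArgTypes>
    ObjectType* operator()(ArgTypes&&... Args) const
    {
        // A real factory would hand the debug info to the allocator; here it is only stored.
        return new ObjectType(std::forward<ArgTypes>(Args)...);
    }

    std::string m_Description;
    const char* m_FileName;
    int         m_LineNumber;
};

// Hypothetical macro: __FILE__ and __LINE__ expand at the call site, so every
// allocation is tagged with its own source location without extra typing.
#define NEW_OBJ_DBG(Description, Type) DebugAwareFactory<Type>(Description, __FILE__, __LINE__)
```

The macro expands to a factory object, and the constructor arguments are supplied in a second call, for example `NEW_OBJ_DBG("Counter value", int)(5)`.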

Smart Pointer Classes

The reference counting system provides two smart pointer classes, RefCntAutoPtr and RefCntWeakPtr, that implement strong and weak reference functionality. Their implementation is relatively straightforward, as they are thin wrappers over the RefCountersImpl and RefCountedObject classes.
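As an illustration of how thin such a wrapper can be, here is a hedged sketch of a strong pointer in the spirit of RefCntAutoPtr. StrongPtrSketch and its members are hypothetical; it only assumes that T exposes AddRef() and Release(), as IObject does. The raw-pointer constructor adds a reference, while Attach() adopts one without calling AddRef().

```cpp
#include <cassert>
#include <utility>

template <typename T>
class StrongPtrSketch
{
public:
    StrongPtrSketch() noexcept = default;

    // Wrapping a raw pointer adds a strong reference.
    explicit StrongPtrSketch(T* pObject) noexcept : m_pObject(pObject)
    {
        if (m_pObject != nullptr)
            m_pObject->AddRef();
    }

    // A copy is one more strong reference.
    StrongPtrSketch(const StrongPtrSketch& Other) noexcept : StrongPtrSketch(Other.m_pObject) {}

    // A move transfers the reference; the counters are not touched.
    StrongPtrSketch(StrongPtrSketch&& Other) noexcept : m_pObject(Other.m_pObject)
    {
        Other.m_pObject = nullptr;
    }

    // Copy-and-swap handles copy, move, and self-assignment uniformly.
    StrongPtrSketch& operator=(StrongPtrSketch Other) noexcept
    {
        std::swap(m_pObject, Other.m_pObject);
        return *this;
    }

    ~StrongPtrSketch() { Reset(); }

    // Takes over a reference the caller already owns (no AddRef()).
    void Attach(T* pObject) noexcept
    {
        Reset();
        m_pObject = pObject;
    }

    // Drops our reference, if any.
    void Reset() noexcept
    {
        if (m_pObject != nullptr)
        {
            m_pObject->Release();
            m_pObject = nullptr;
        }
    }

    T*       operator->() const noexcept { return m_pObject; }
    T*       RawPtr() const noexcept { return m_pObject; }
    explicit operator bool() const noexcept { return m_pObject != nullptr; }

private:
    T* m_pObject = nullptr;
};
```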