Managing Descriptor Heaps

Introduction

Resource descriptors and descriptor heaps are key concepts of the new resource binding model introduced in Direct3D 12. A descriptor is a small block of data that fully describes an object to the GPU in a GPU-specific opaque format. A descriptor heap is essentially an array of descriptors. Every pipeline state incorporates a root signature that defines how shader registers are mapped to the descriptors in the bound descriptor heaps. Resource binding is a two-stage process: a shader register is first mapped to a descriptor in a descriptor heap, as defined by the root signature. The descriptor (which may be an SRV, UAV, CBV or sampler) then references the resource in GPU memory. The picture below illustrates a simplified view of the D3D12 resource binding model.

[Figure: D3D12ResourceBinding]

There are four types of descriptor heaps in D3D12:

  • Constant Buffer/Shader Resource/Unordered Access View (D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV)
  • Sampler (D3D12_DESCRIPTOR_HEAP_TYPE_SAMPLER)
  • Render Target View (D3D12_DESCRIPTOR_HEAP_TYPE_RTV)
  • Depth Stencil View (D3D12_DESCRIPTOR_HEAP_TYPE_DSV)

For the GPU to be able to access descriptors in a heap, the heap must be shader-visible. Only the first two heap types (CBV_SRV_UAV and SAMPLER) can be shader-visible. RTV and DSV heaps are CPU-visible only. The size of a CPU-only descriptor heap is limited only by the available CPU memory. The size of a shader-visible descriptor heap is subject to stricter limits. While a CBV_SRV_UAV heap can hold as many as 1,000,000 descriptors or more, the maximum number of samplers in a shader-visible descriptor heap is only 2048 (see D3D12 Hardware Tiers on MSDN). As a result, not all descriptor handles can be stored in a shader-visible descriptor heap, and it is the responsibility of the D3D12 application to make sure that all descriptor handles required for rendering are in GPU-visible heaps. This article describes the descriptor heap management system implemented in Diligent Engine 2.0.
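The visibility rules above can be captured in a few lines. The following is only a sketch: HeapType is a stand-in for D3D12_DESCRIPTOR_HEAP_TYPE so that the code builds without d3d12.h, and IsValidHeapRequest() is a hypothetical helper, not a D3D12 or Diligent Engine function. The CBV_SRV_UAV limit depends on the hardware tier and is not modeled.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for D3D12_DESCRIPTOR_HEAP_TYPE (assumption, keeps the sketch portable).
enum class HeapType { CBV_SRV_UAV, Sampler, RTV, DSV };

// Sampler limit for shader-visible heaps quoted in the text.
constexpr std::uint32_t kMaxShaderVisibleSamplers = 2048;

// Only CBV_SRV_UAV and SAMPLER heaps can be shader-visible.
constexpr bool CanBeShaderVisible(HeapType type)
{
    return type == HeapType::CBV_SRV_UAV || type == HeapType::Sampler;
}

// Hypothetical helper: checks a heap request against the rules above.
inline bool IsValidHeapRequest(HeapType type, std::uint32_t numDescriptors, bool shaderVisible)
{
    if (numDescriptors == 0)
        return false;
    if (!shaderVisible)
        return true; // CPU-only heaps are limited only by available memory
    if (!CanBeShaderVisible(type))
        return false; // RTV and DSV heaps are CPU-visible only
    if (type == HeapType::Sampler)
        return numDescriptors <= kMaxShaderVisibleSamplers;
    return true; // CBV_SRV_UAV limit is tier-dependent and not modeled here
}
```

For example, a shader-visible sampler heap of 2049 descriptors is rejected, while a CPU-only DSV heap of any non-zero size is accepted.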

Overview

The descriptor heap management system in Diligent Engine consists of five main classes:

  • DescriptorHeapAllocation is a helper class that represents a descriptor heap allocation, which is simply a range of descriptors
  • DescriptorHeapAllocationManager is the main workhorse class that manages allocations in a D3D12 descriptor heap using the variable-size GPU allocations manager
  • CPUDescriptorHeap implements a CPU-only descriptor heap that is used as storage for resource view descriptor handles
  • GPUDescriptorHeap implements a shader-visible descriptor heap that holds the descriptor handles used by GPU commands
  • DynamicSuballocationsManager is responsible for allocating short-lived dynamic descriptor handles used in the current frame only

Each class, as well as their interactions, is described in detail below.

Descriptor Heap Allocation

DescriptorHeapAllocation, the first class used by the Diligent Engine descriptor heap management system, represents a descriptor heap allocation. It can be initialized as a single descriptor or as a contiguous range of descriptors in the specified heap.

[Figure: DescriptorHeapAllocation]

Note that the descriptor heap allocation only references a range in the heap. It contains the first CPU handle in CPU virtual address space, and, if the heap is shader-visible, the first GPU handle in GPU virtual address space. The class prohibits copies and only allows transfer of ownership through move semantics. The class is defined as shown below:
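The engine's actual declaration relies on D3D12 types. As a rough, self-contained approximation, the sketch below shows a move-only allocation object that only references a range in a heap. CpuHandle, GpuHandle, IDescriptorAllocator (including the signature of its Free() method) and RecordingAllocator are assumptions made for illustration and are not the engine's types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>

// Stand-ins for D3D12_CPU_DESCRIPTOR_HANDLE / D3D12_GPU_DESCRIPTOR_HANDLE (assumption).
struct CpuHandle { std::size_t ptr; };
struct GpuHandle { std::uint64_t ptr; };

constexpr std::uint16_t kInvalidManagerId = 0xFFFF;

// Hypothetical interface of the object that created the allocation.
struct IDescriptorAllocator
{
    virtual ~IDescriptorAllocator() = default;
    virtual void Free(CpuHandle first, std::uint32_t numHandles, std::uint16_t managerId) = 0;
    virtual std::uint32_t GetDescriptorSize() const = 0;
};

class DescriptorHeapAllocation
{
public:
    DescriptorHeapAllocation() = default; // null allocation

    DescriptorHeapAllocation(IDescriptorAllocator& allocator, CpuHandle cpu, GpuHandle gpu,
                             std::uint32_t numHandles, std::uint16_t managerId)
        : m_pAllocator(&allocator), m_FirstCpuHandle(cpu), m_FirstGpuHandle(gpu),
          m_NumHandles(numHandles), m_AllocationManagerId(managerId) {}

    // Copies are prohibited; ownership is transferred through moves only.
    DescriptorHeapAllocation(const DescriptorHeapAllocation&) = delete;
    DescriptorHeapAllocation& operator=(const DescriptorHeapAllocation&) = delete;

    DescriptorHeapAllocation(DescriptorHeapAllocation&& rhs) noexcept { Swap(rhs); }
    DescriptorHeapAllocation& operator=(DescriptorHeapAllocation&& rhs) noexcept
    {
        if (this != &rhs) { Release(); Swap(rhs); }
        return *this;
    }
    ~DescriptorHeapAllocation() { Release(); }

    bool IsNull() const { return m_FirstCpuHandle.ptr == 0; }
    bool IsShaderVisible() const { return m_FirstGpuHandle.ptr != 0; }
    std::uint32_t GetNumHandles() const { return m_NumHandles; }
    std::uint16_t GetAllocationManagerId() const { return m_AllocationManagerId; }

    // CPU handle of the descriptor at the given offset within the range.
    CpuHandle GetCpuHandle(std::uint32_t offset = 0) const
    {
        assert(!IsNull() && offset < m_NumHandles);
        const std::size_t step = m_pAllocator->GetDescriptorSize();
        return CpuHandle{m_FirstCpuHandle.ptr + offset * step};
    }

private:
    // Returns the range to its allocator (if any) and resets to the null state.
    void Release()
    {
        if (!IsNull() && m_pAllocator != nullptr)
            m_pAllocator->Free(m_FirstCpuHandle, m_NumHandles, m_AllocationManagerId);
        m_pAllocator          = nullptr;
        m_FirstCpuHandle      = CpuHandle{};
        m_FirstGpuHandle      = GpuHandle{};
        m_NumHandles          = 0;
        m_AllocationManagerId = kInvalidManagerId;
    }

    void Swap(DescriptorHeapAllocation& rhs) noexcept
    {
        std::swap(m_pAllocator, rhs.m_pAllocator);
        std::swap(m_FirstCpuHandle, rhs.m_FirstCpuHandle);
        std::swap(m_FirstGpuHandle, rhs.m_FirstGpuHandle);
        std::swap(m_NumHandles, rhs.m_NumHandles);
        std::swap(m_AllocationManagerId, rhs.m_AllocationManagerId);
    }

    IDescriptorAllocator* m_pAllocator = nullptr;
    CpuHandle     m_FirstCpuHandle{};
    GpuHandle     m_FirstGpuHandle{};
    std::uint32_t m_NumHandles          = 0;
    std::uint16_t m_AllocationManagerId = kInvalidManagerId;
};

// Hypothetical allocator used only to observe when Free() is called.
struct RecordingAllocator final : IDescriptorAllocator
{
    int           freeCalls     = 0;
    std::uint16_t lastManagerId = kInvalidManagerId;
    void Free(CpuHandle, std::uint32_t, std::uint16_t managerId) override
    {
        ++freeCalls;
        lastManagerId = managerId;
    }
    std::uint32_t GetDescriptorSize() const override { return 32; }
};
```

Moving an allocation leaves the source in the null state, so the range is returned to its allocator exactly once, by whichever object owns it last.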

One field that requires some clarification is m_AllocationManagerId. As we will discuss later, a descriptor heap object may contain several allocation managers. This field is used to identify the manager within the descriptor heap that was used to create this allocation.

Descriptor Heap Allocation Manager

The second class in the descriptor heap management system is DescriptorHeapAllocationManager. This class uses the variable-size GPU allocations manager to handle allocations within the descriptor heap.

[Figure: DescriptorHeapAllocationsManager]

Every allocation that the class creates is represented by an instance of the DescriptorHeapAllocation class. The list of free descriptors is managed by the m_FreeBlocksManager member. The class declaration is given in the listing below:

The class provides two constructors. The first constructor creates a new D3D12 descriptor heap and addresses the entire available space. The second constructor uses a subrange of descriptors in an existing D3D12 heap. This allows a number of allocation managers to share the same D3D12 descriptor heap, which is essential for GPU-visible heaps.

The allocation routine uses DescriptorHeapAllocationManager::Allocate() to allocate the requested number of descriptors in the heap and returns a DescriptorHeapAllocation object representing the allocation.
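To make the idea of variable-size allocation within a heap more concrete, here is a minimal sketch of a free-block manager that tracks free ranges of descriptors. The class name VariableSizeAllocator and the first-fit search over an offset-ordered map are assumptions chosen for brevity; the engine's variable-size allocations manager may well use a different strategy.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <map>

constexpr std::size_t kInvalidOffset = static_cast<std::size_t>(-1);

// Hypothetical free-block manager: tracks free descriptor ranges by offset.
class VariableSizeAllocator
{
public:
    explicit VariableSizeAllocator(std::size_t capacity) : m_FreeSize(capacity)
    {
        if (capacity > 0)
            m_FreeBlocks.emplace(std::size_t{0}, capacity);
    }

    // First-fit: takes the lowest free block that is large enough.
    std::size_t Allocate(std::size_t count)
    {
        if (count == 0)
            return kInvalidOffset;
        for (auto it = m_FreeBlocks.begin(); it != m_FreeBlocks.end(); ++it)
        {
            if (it->second < count)
                continue;
            const std::size_t offset    = it->first;
            const std::size_t remaining = it->second - count;
            m_FreeBlocks.erase(it);
            if (remaining > 0)
                m_FreeBlocks.emplace(offset + count, remaining);
            m_FreeSize -= count;
            return offset;
        }
        return kInvalidOffset; // no contiguous block is large enough
    }

    // Returns a range and merges it with adjacent free blocks.
    void Free(std::size_t offset, std::size_t count)
    {
        m_FreeSize += count;
        auto next = m_FreeBlocks.upper_bound(offset);
        if (next != m_FreeBlocks.begin())
        {
            auto prev = std::prev(next);
            assert(prev->first + prev->second <= offset); // overlap means a double free
            if (prev->first + prev->second == offset)
            {
                offset = prev->first;
                count += prev->second;
                m_FreeBlocks.erase(prev);
            }
        }
        if (next != m_FreeBlocks.end() && offset + count == next->first)
        {
            count += next->second;
            m_FreeBlocks.erase(next);
        }
        m_FreeBlocks.emplace(offset, count);
    }

    std::size_t GetFreeSize() const { return m_FreeSize; }
    std::size_t GetNumFreeBlocks() const { return m_FreeBlocks.size(); }

private:
    std::map<std::size_t, std::size_t> m_FreeBlocks; // offset -> block size
    std::size_t m_FreeSize;
};
```

Note that the total free size alone does not guarantee success: a request fails when the free descriptors are fragmented into blocks that are each too small.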

Similarly, the deallocation routine takes a DescriptorHeapAllocation object and uses DescriptorHeapAllocationManager::Free() to release the allocation. Note that since GPU commands are executed asynchronously, the allocation cannot be released immediately. Instead, the manager adds it to a queue along with the current frame number and releases all stale allocations later, when the frame has been completed by the GPU (which is detected by a signaled fence).

The ReleaseStaleAllocations() method must be called at the end of every frame to actually release all stale allocations from previous frames:
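The deferred release described above can be sketched as a simple queue. DeferredReleaseQueue, StaleAllocation and the numCompletedFrames parameter are hypothetical names; the sketch also assumes that frame numbers are enqueued in non-decreasing order and that a frame counts as completed when its number is less than numCompletedFrames.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// A freed descriptor range together with the frame in which it was freed.
struct StaleAllocation
{
    std::uint32_t offset;
    std::uint32_t count;
    std::uint64_t frameNumber;
};

// Hypothetical release queue: ranges wait here until the GPU is done with them.
class DeferredReleaseQueue
{
public:
    // Called instead of releasing immediately: the GPU may still use the range.
    void Free(std::uint32_t offset, std::uint32_t count, std::uint64_t currentFrame)
    {
        assert(m_Queue.empty() || m_Queue.back().frameNumber <= currentFrame);
        m_Queue.push_back(StaleAllocation{offset, count, currentFrame});
    }

    // Pops every range freed in a frame the GPU has finished and returns the
    // ranges so that the caller can hand them back to the free-block manager.
    std::vector<StaleAllocation> ReleaseStaleAllocations(std::uint64_t numCompletedFrames)
    {
        std::vector<StaleAllocation> released;
        while (!m_Queue.empty() && m_Queue.front().frameNumber < numCompletedFrames)
        {
            released.push_back(m_Queue.front());
            m_Queue.pop_front();
        }
        return released;
    }

    std::size_t GetPendingCount() const { return m_Queue.size(); }

private:
    std::deque<StaleAllocation> m_Queue; // ordered by frame number
};
```

Because the queue is ordered by frame number, the loop can stop at the first entry whose frame is still in flight.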

CPU Descriptor Heap

The next part of the descriptor heap management system is the CPU descriptor heap. CPU descriptor heaps are used by the engine to store resource views when a new resource is created. Since there are four descriptor heap types in total, the system maintains four CPUDescriptorHeap instances (the heaps are part of the render device). Every CPU descriptor heap keeps a pool of Descriptor Heap Allocation Managers and a list of managers that have unused descriptors:

The following figure gives an example of the contents of the CPU descriptor heap object:

[Figure: CPUDescriptorHeap]

When allocating a new descriptor, the CPUDescriptorHeap class goes through the list of managers that have available descriptors and tries to process the request with each manager in turn. If there are no available managers, or no manager was able to handle the request, the function creates a new descriptor heap manager and lets it handle the request. The source code of the allocation function is given in the listing below:

For instance, if we request a new allocation of five descriptors, the function will first ask manager [1] to handle the request, but it will fail because that manager has at most two consecutive free descriptors. The function will then ask manager [2], which will be able to handle the request:

[Figure: CPUDescriptorHeap-Allocation1]

If after that we ask to allocate three descriptors, no manager will be able to handle the request, so the function will add a new manager to the pool and use it to handle the request:

[Figure: CPUDescriptorHeap-Allocation2a]
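The allocation walk-through above can be reproduced with a small model. This is only a sketch under simplifying assumptions: HeapManager uses a slot map with a first-fit search instead of the engine's allocation manager, descriptors are freed immediately rather than through a release queue, and locking is omitted. The names HeapManager, Allocation and CpuDescriptorHeapSketch are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

constexpr std::size_t kNoOffset = static_cast<std::size_t>(-1);

// Simplified stand-in for one allocation manager (one heap of descriptors).
class HeapManager
{
public:
    explicit HeapManager(std::size_t capacity) : m_Used(capacity, false), m_NumFree(capacity) {}

    // Finds the first run of `count` consecutive free descriptors.
    std::size_t Allocate(std::size_t count)
    {
        if (count == 0 || count > m_NumFree)
            return kNoOffset;
        std::size_t run = 0;
        for (std::size_t i = 0; i < m_Used.size(); ++i)
        {
            run = m_Used[i] ? 0 : run + 1;
            if (run == count)
            {
                const std::size_t first = i + 1 - count;
                for (std::size_t j = first; j <= i; ++j)
                    m_Used[j] = true;
                m_NumFree -= count;
                return first;
            }
        }
        return kNoOffset; // enough free descriptors, but not consecutive
    }

    void Free(std::size_t first, std::size_t count)
    {
        for (std::size_t j = first; j < first + count; ++j)
        {
            assert(m_Used[j]);
            m_Used[j] = false;
        }
        m_NumFree += count;
    }

    std::size_t GetNumFree() const { return m_NumFree; }

private:
    std::vector<bool> m_Used;
    std::size_t       m_NumFree;
};

struct Allocation
{
    std::size_t managerId;
    std::size_t offset;
    std::size_t count;
};

class CpuDescriptorHeapSketch
{
public:
    explicit CpuDescriptorHeapSketch(std::size_t descriptorsPerHeap) : m_PerHeap(descriptorsPerHeap) {}

    Allocation Allocate(std::size_t count)
    {
        assert(count > 0);
        // Try every manager that still has unused descriptors.
        for (std::size_t id : m_Available)
        {
            const std::size_t offset = m_Managers[id].Allocate(count);
            if (offset != kNoOffset)
            {
                if (m_Managers[id].GetNumFree() == 0)
                    m_Available.erase(id); // safe: we return right away
                return Allocation{id, offset, count};
            }
        }
        // Nobody could handle the request: add a new manager to the pool.
        m_Managers.emplace_back(std::max(m_PerHeap, count));
        const std::size_t id     = m_Managers.size() - 1;
        const std::size_t offset = m_Managers[id].Allocate(count);
        if (m_Managers[id].GetNumFree() > 0)
            m_Available.insert(id);
        return Allocation{id, offset, count};
    }

    // The manager id stored in the allocation selects the manager to free from.
    void Free(const Allocation& a)
    {
        m_Managers[a.managerId].Free(a.offset, a.count);
        m_Available.insert(a.managerId);
    }

    std::size_t GetNumManagers() const { return m_Managers.size(); }

private:
    std::size_t              m_PerHeap;
    std::vector<HeapManager> m_Managers;  // pool of managers
    std::set<std::size_t>    m_Available; // ids of managers with unused descriptors
};
```

With eight descriptors per heap, a fragmented manager that has only two consecutive free descriptors is skipped for a five-descriptor request, and a three-descriptor request that no existing manager can satisfy causes a new manager to be created, as in the example above.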

The deallocation routine calls the Free() method of the appropriate allocation manager. Recall that the method is called from the destructor of DescriptorHeapAllocation. Note that the function uses GetAllocationManagerId() to retrieve the index of the manager that created this allocation:

Finally, there is the usual method that must be called at the end of the frame to release all stale allocations when it is safe to do so. Note that it is this method that returns the manager to the list of available managers, because that is safe only after the descriptors have actually been released.

GPU Descriptor Heap

The main goal of the CPU descriptor heap is to provide storage for resource view descriptors. For the GPU to be able to access the descriptors, they must reside in a shader-visible descriptor heap. Only one CBV_SRV_UAV heap and one SAMPLER heap can be bound to the GPU at the same time. Source descriptors may be scattered across several CPU-only descriptor heaps, but they must be consolidated in the same CBV_SRV_UAV or SAMPLER heap before a draw command can be executed. As a result, a GPUDescriptorHeap object contains only a single D3D12 descriptor heap. The space is broken into two parts. The first part is intended to keep rarely changing descriptor handles (corresponding to static and mutable variables). The second part is used to hold dynamic descriptor handles, i.e. temporary handles that live during the current frame only.

While the first part is shared between all threads, it would be very inefficient to organize the second part the same way. Dynamic descriptor handle allocation can potentially be a very frequent operation, and if several threads record commands simultaneously, allocating dynamic descriptor handles from the same pool would become a bottleneck. To avoid this problem, dynamic descriptor handle allocation is a two-stage process. In the first stage, every command context that records commands allocates a chunk of descriptors from the shared dynamic part of the GPU descriptor heap. This operation requires exclusive access to the GPU heap, but happens infrequently. The second stage is suballocation from that chunk. This part is lock-free and can be done in parallel by every thread. The structure of the GPU heap can then be depicted as shown below:

[Figure: GPUDescriptorHeap]

There are two classes that implement the strategy described above. GPUDescriptorHeap manages the two parts of the heap, and DynamicSuballocationsManager handles suballocations within the dynamic part. As discussed above, the GPUDescriptorHeap class contains two descriptor heap allocation managers, one for static allocations and one for dynamic allocations:

Note that both these allocation managers are initialized to perform suballocations from the same D3D12 descriptor heap. Also, the first manager is assigned id 0, the second one is assigned id 1. The class provides two methods to allocate from static and dynamic parts of the heap:

There is only one Free() method, because the manager id identifies whether an allocation belongs to the static or the dynamic part:
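A compact model of this layout is sketched below: two managers share one range of descriptor indices, the first with id 0 (static part) and the second with id 1 (dynamic part), and a single Free() dispatches on the id stored in the allocation. RangeManager, GpuAllocation and GpuDescriptorHeapSketch are hypothetical names. As simplifications, freed ranges are not merged with their neighbors and are released immediately rather than through a release queue.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>

constexpr std::uint16_t kStaticManagerId  = 0;
constexpr std::uint16_t kDynamicManagerId = 1;

// Minimal first-fit allocator over descriptors [base, base + size) of a shared heap.
class RangeManager
{
public:
    RangeManager(std::uint32_t base, std::uint32_t size)
    {
        if (size > 0)
            m_Free[base] = size;
    }

    bool Allocate(std::uint32_t count, std::uint32_t& first)
    {
        if (count == 0)
            return false;
        for (auto it = m_Free.begin(); it != m_Free.end(); ++it)
        {
            if (it->second < count)
                continue;
            first = it->first;
            const std::uint32_t remaining = it->second - count;
            m_Free.erase(it);
            if (remaining > 0)
                m_Free[first + count] = remaining;
            return true;
        }
        return false;
    }

    // Simplification: the freed range is not merged with adjacent free ranges.
    void Free(std::uint32_t first, std::uint32_t count) { m_Free[first] = count; }

private:
    std::map<std::uint32_t, std::uint32_t> m_Free; // first descriptor -> count
};

struct GpuAllocation
{
    std::uint32_t firstDescriptor;
    std::uint32_t count;
    std::uint16_t managerId;
};

class GpuDescriptorHeapSketch
{
public:
    // Both managers suballocate from the same heap: [0, numStatic) and
    // [numStatic, numStatic + numDynamic).
    GpuDescriptorHeapSketch(std::uint32_t numStatic, std::uint32_t numDynamic)
        : m_StaticManager(0, numStatic), m_DynamicManager(numStatic, numDynamic) {}

    bool Allocate(std::uint32_t count, GpuAllocation& out)
    {
        std::lock_guard<std::mutex> lock(m_StaticMutex);
        return AllocateFrom(m_StaticManager, kStaticManagerId, count, out);
    }

    bool AllocateDynamic(std::uint32_t count, GpuAllocation& out)
    {
        std::lock_guard<std::mutex> lock(m_DynamicMutex);
        return AllocateFrom(m_DynamicManager, kDynamicManagerId, count, out);
    }

    // One Free() for both parts: the manager id tells them apart.
    void Free(const GpuAllocation& a)
    {
        assert(a.managerId == kStaticManagerId || a.managerId == kDynamicManagerId);
        if (a.managerId == kStaticManagerId)
        {
            std::lock_guard<std::mutex> lock(m_StaticMutex);
            m_StaticManager.Free(a.firstDescriptor, a.count);
        }
        else
        {
            std::lock_guard<std::mutex> lock(m_DynamicMutex);
            m_DynamicManager.Free(a.firstDescriptor, a.count);
        }
    }

private:
    static bool AllocateFrom(RangeManager& manager, std::uint16_t id,
                             std::uint32_t count, GpuAllocation& out)
    {
        std::uint32_t first = 0;
        if (!manager.Allocate(count, first))
            return false;
        out = GpuAllocation{first, count, id};
        return true;
    }

    std::mutex   m_StaticMutex;
    std::mutex   m_DynamicMutex;
    RangeManager m_StaticManager;
    RangeManager m_DynamicManager;
};
```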

Note that all methods lock mutexes to acquire exclusive access to the allocation managers. The AllocateDynamic() method is used solely by the DynamicSuballocationsManager class to allocate a chunk of the heap from which it then performs suballocations. The class maintains a list of chunks allocated from the main GPU descriptor heap as well as the offset within the current chunk:

During every frame, allocations are performed in a linear fashion. The allocation method first checks whether there is enough space for the requested number of descriptors in the current chunk. If there is not, the method requests a new chunk from the main GPU descriptor heap. The suballocation then happens from the new chunk:

Note that this method is lock-free because every context has its own suballocations manager. The thread may only be blocked when a new chunk is requested from the main GPU descriptor heap, but this is an infrequent situation.
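The two-stage scheme can be sketched as follows. DynamicHeapPart stands in for the dynamic part of the GPU descriptor heap and hands out chunks under a mutex; DynamicSuballocator models a per-context suballocations manager that advances an offset within its current chunk without taking any lock. Both class names and the default chunk size are assumptions, and returning discarded chunks to the heap through a release queue is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

constexpr std::uint32_t kInvalidDescriptor = 0xFFFFFFFFu;

// Stand-in for the shared dynamic part of the GPU descriptor heap.
class DynamicHeapPart
{
public:
    explicit DynamicHeapPart(std::uint32_t size) : m_Size(size) {}

    // Stage one: infrequent, requires exclusive access to the shared heap.
    std::uint32_t AllocateDynamic(std::uint32_t count)
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        if (count > m_Size - m_Next)
            return kInvalidDescriptor;
        const std::uint32_t first = m_Next;
        m_Next += count;
        ++m_NumChunkRequests;
        return first;
    }

    std::uint32_t GetNumChunkRequests() const { return m_NumChunkRequests; }

private:
    std::mutex    m_Mutex;
    std::uint32_t m_Size;
    std::uint32_t m_Next             = 0;
    std::uint32_t m_NumChunkRequests = 0;
};

// Per-context manager: used by one thread at a time, so it needs no lock.
class DynamicSuballocator
{
public:
    DynamicSuballocator(DynamicHeapPart& parent, std::uint32_t chunkSize)
        : m_Parent(parent), m_ChunkSize(chunkSize) {}

    // Stage two: linear suballocation from the current chunk.
    std::uint32_t Allocate(std::uint32_t count)
    {
        if (m_Chunks.empty() || m_OffsetInChunk + count > m_Chunks.back().size)
        {
            // Not enough space: request a new chunk (larger for oversized requests).
            const std::uint32_t size  = std::max(m_ChunkSize, count);
            const std::uint32_t first = m_Parent.AllocateDynamic(size);
            if (first == kInvalidDescriptor)
                return kInvalidDescriptor;
            m_Chunks.push_back(Chunk{first, size});
            m_OffsetInChunk = 0;
        }
        const std::uint32_t result = m_Chunks.back().first + m_OffsetInChunk;
        m_OffsetInChunk += count;
        return result;
    }

    // Suballocations are never released individually.
    void Free(std::uint32_t /*descriptor*/) {}

    // Drops all chunks at once, after the context's commands were submitted.
    void DiscardAllocations()
    {
        m_Chunks.clear();
        m_OffsetInChunk = 0;
    }

    std::size_t GetNumChunks() const { return m_Chunks.size(); }

private:
    struct Chunk
    {
        std::uint32_t first;
        std::uint32_t size;
    };

    DynamicHeapPart&   m_Parent;
    std::uint32_t      m_ChunkSize;
    std::uint32_t      m_OffsetInChunk = 0;
    std::vector<Chunk> m_Chunks;
};
```

With a chunk size of 8, two requests of 3 and 4 descriptors share the first chunk, and only the third request touches the shared heap again.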

Suballocations are not released individually, so the DynamicSuballocationsManager::Free() method does nothing. Instead, all allocations are discarded when the command list from this context is recorded and executed by the render device:

Clearing the vector destroys all Descriptor Heap Allocation objects, which invokes their destructors. The destructors call the GPUDescriptorHeap::Free() method of the parent GPU descriptor heap, which adds the allocation to the release queue. The allocations are actually released a few frames later, when GPUDescriptorHeap::ReleaseStaleAllocations() is called by the CloseAndExecuteCommandContext() method of the render device.

The Big Picture

Now that we have presented every individual component, we can describe how they interact with each other and with the rest of the system. There are four shared CPU-only descriptor heaps (CBV_SRV_UAV, SAMPLER, RTV and DSV) implemented by the CPUDescriptorHeap class, and two shader-visible (GPU) descriptor heaps (CBV_SRV_UAV and SAMPLER) implemented by the GPUDescriptorHeap class. Every device context that is used for recording commands contains two dynamic suballocation managers (corresponding to the two shader-visible descriptor heap types) represented by the DynamicSuballocationsManager class. CPU descriptor heaps are used when a new resource view is created. GPU descriptor heaps are used by the shader resource binding system to allocate storage for shader-visible descriptors. They are also used for the allocation of dynamic descriptors.

[Figure: DescriptorHeaps-BigPicture1]

Usage scenarios

Let’s now talk about a few scenarios where descriptor heaps are involved.

Creating Resource View

Let’s first consider how resource views are created using the example of creating a shader resource view (SRV) of a texture. The process proceeds as follows:

  1. An allocation containing a single descriptor handle is requested from the CBV_SRV_UAV CPU-only descriptor heap. As discussed above, the allocation goes through the following steps:
    1. The CPUDescriptorHeap::Allocate() method acquires exclusive access to the CPU descriptor heap object
    2. The method iterates over the descriptor heap managers that have available descriptor handles and requests a one-descriptor allocation
      1. Since only one descriptor handle is requested, the very first manager will be able to handle the request
    3. If there are no available managers, a new manager (and a new D3D12 descriptor heap) is created to handle the request
  2. The D3D12 render device is used to initialize the shader resource view in the allocated descriptor (see ID3D12Device::CreateShaderResourceView on MSDN)
  3. The Descriptor Heap Allocation object is kept as part of the resource view object and is destroyed when the resource view object is released. At this point:
    1. The destructor of the Descriptor Heap Allocation object calls CPUDescriptorHeap::Free(), which locks the heap and calls the DescriptorHeapAllocationManager::Free() method of the allocation manager that created the allocation
    2. The manager inserts the allocation attributes (offset and size) along with the frame number into the deletion queue
    3. A few frames later, when the frame completion fence is signaled, the allocation is actually released by the CPUDescriptorHeap::ReleaseStaleAllocations() method

Creating all types of texture views (SRV, RTV, DSV and UAV) as well as all types of buffer views is done in the same way.

Allocating Dynamic Descriptor

Let’s now recap how dynamic descriptors are allocated:

  1. The context that needs a dynamic descriptor uses one of its two dynamic suballocation managers (CBV_SRV_UAV or SAMPLER) to request the desired type of descriptor handle
    1. The suballocation manager checks whether the last chunk contains enough space to satisfy the allocation request. In most situations this will be the case, and the descriptor handles will be suballocated from this chunk
    2. If there is not enough space, the suballocation manager requests a new chunk of descriptor handles from the main GPU descriptor heap. The handles are then suballocated from the new chunk
  2. At the end of the frame, the suballocation manager disposes of all chunks, which go back to the GPU descriptor heap
    1. The GPU descriptor heap inserts all chunks along with the frame number into the release queue
    2. A few frames later, when the frame completion fence is signaled, the chunks are actually released and the space becomes available for new allocations

Shader Resource Binding

Diligent Engine uses a shader resource binding model that includes three types of shader resources based on the frequency of change (static, mutable and dynamic), as well as a shader resource binding object. When a new shader resource binding object is created, it allocates space in the GPU descriptor heap for its mutable and static resources. The allocation is kept by the shader resource binding object and is released when the owning object is destroyed. This topic will be discussed in detail in a separate post.

Multithreading and GPU-safety concerns

The descriptor heap management system is designed to be safe and efficient in a multithreaded environment. All three types of allocations (CPU descriptor, static/mutable GPU descriptor and dynamic GPU descriptor) proceed through thread-safe paths. The CPU and static/mutable descriptor allocation functions (CPUDescriptorHeap::Allocate() and GPUDescriptorHeap::Allocate()) acquire exclusive access to the descriptor heap objects and may potentially block other threads. However, descriptor allocation is fast and constitutes only a tiny portion of the work associated with resource creation, so this is not a problem. Dynamic descriptor allocation (DynamicSuballocationsManager::Allocate()) is lock-free, so it can be called in parallel by many threads with no performance cost, provided that the same context is not used by different threads simultaneously. The only blocking function on this path is GPUDescriptorHeap::AllocateDynamic(), but it is called only occasionally.

Deallocation is more complicated because, besides CPU-side safety, the system must also make sure that descriptors are no longer used by the GPU. CPU-side safety is achieved by protecting the deallocation methods (CPUDescriptorHeap::Free() and GPUDescriptorHeap::Free()) with mutexes. GPU-side safety is assured by recording the command list number when the allocation is destroyed. For CPU and static/mutable GPU descriptors, it does not matter which thread releases the allocation. Once there are no more references, the allocation can never be used again in any new GPU command, but it may still be referenced by commands pending execution on the GPU. So at the moment when the allocation is released, the deleting thread adds it to the deletion queue along with the current command list number. Deletion queues are purged once at the end of each frame by the render device. The device knows how many command lists have actually been completed by the GPU and can release all allocations that are referenced only by completed commands.

For dynamic descriptors, deallocation happens when the command list from the context is closed and executed. It does not matter which thread recorded the list. Once it has been sent to the command queue for execution (from any thread), all dynamic descriptors are stale and can be discarded. So the context returns all chunks to the GPU descriptor heap object, which adds them to the release queue. For a deferred context, this means that until it is executed, all its dynamic descriptors are unavailable for use by other contexts.

This page gives a comprehensive description of resource lifetime management in Diligent Engine.

Discussion

In the current implementation, the same CPU descriptor heap objects are used to allocate resource view descriptor handles on all threads. We did not find this to be a problem, as descriptor heap allocation and deallocation are very fast unless a new CPU descriptor heap needs to be created. This, however, should not be a problem either, as the descriptor heap manager size can be specified at initialization time to meet the application's demands. The system provides methods to query the maximum size that every heap reached during the application run time.

A careful reader may have noticed that the GPUDescriptorHeap class uses the generic DescriptorHeapAllocationManager to allocate dynamic chunks of equal size. The only situation in which the chunk size may differ is when the number of requested descriptors is larger than the default chunk size. This, however, is a very atypical situation, so a more efficient fixed-size block allocator could be used instead of the variable-size allocations manager.
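For illustration, a fixed-size block allocator of the kind suggested above could look like the sketch below. It keeps a stack of free chunk indices, so both operations take constant time. FixedSizeChunkAllocator is a hypothetical class that is not part of the engine, and oversized requests would still need a separate path.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kInvalidChunk = 0xFFFFFFFFu;

// Hypothetical allocator for equally sized chunks of descriptors.
class FixedSizeChunkAllocator
{
public:
    FixedSizeChunkAllocator(std::uint32_t numChunks, std::uint32_t chunkSize)
        : m_ChunkSize(chunkSize)
    {
        assert(chunkSize > 0);
        m_FreeChunks.reserve(numChunks);
        // Push in reverse order so that the lowest chunk is handed out first.
        for (std::uint32_t i = numChunks; i > 0; --i)
            m_FreeChunks.push_back(i - 1);
    }

    // O(1): returns the first descriptor of a free chunk, or kInvalidChunk.
    std::uint32_t AllocateChunk()
    {
        if (m_FreeChunks.empty())
            return kInvalidChunk;
        const std::uint32_t chunk = m_FreeChunks.back();
        m_FreeChunks.pop_back();
        return chunk * m_ChunkSize;
    }

    // O(1): no searching or merging is needed because all chunks are equal.
    void FreeChunk(std::uint32_t firstDescriptor)
    {
        assert(firstDescriptor % m_ChunkSize == 0);
        m_FreeChunks.push_back(firstDescriptor / m_ChunkSize);
    }

private:
    std::uint32_t              m_ChunkSize;
    std::vector<std::uint32_t> m_FreeChunks; // stack of free chunk indices
};
```

Compared with a variable-size manager, this design never fragments, at the cost of not supporting chunks larger than the fixed size.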

Diligent Engine currently supports only a single GPU descriptor heap of each type (CBV_SRV_UAV and SAMPLER). While the first heap can contain a large number of descriptor handles (1,000,000+), the sampler heap size is limited to 2048 descriptors, which can potentially lead to heap exhaustion. However, in most cases the type of the sampler in the shader is known in advance and never changes. D3D12 introduced the concept of static samplers to handle such cases, which is also exposed by Diligent Engine. Static samplers should be used whenever possible, as they do not take up space in the shader-visible sampler heap. The sampler descriptor heap will then be used only to keep descriptor handles of samplers that change at run time, which is a less typical situation.