RJ_Containers

December 15, 2008

The RJ_Containers project is a reimplementation of the Standard Template Library (STL) container functionality. It has been architected for the needs of game development, but should be applicable to any performance critical or allocation sensitive programming.  

 

Links to stable versions of the library and read access to the live SVN repository can be found at the end of the article.

 

Enhanced Containers

Games need to run fast and they need to run in a limited memory environment. This is especially true for game console development and is the driving influence for most differences between the STL and RJ_Containers library. I will walk through the significant features I have made to create a container library that I use personally and which multiple game studios have built container implementations from.

 

It is also worth noting that Electronic Arts has created a custom container library named EASTL. When I first read about EASTL, it was comforting to see that they had independently developed similar solutions to a similar set of issues. From what I have gathered (having never worked at EA), they have fleshed out a large portion of STL while I have just implemented a subset focusing on containers and their dependencies. EASTL has also placed effort in conforming to the STL naming conventions, while I have adapted the STL functionality, but not the naming. Unfortunately there is no public access to EASTL, but you are free to use, modify, or just reference my library in the development of your own container solutions.

 

Portable Implementation

A core problem with the STL is its lack of a solid implementation definition. The STL functionality is based on features and guidelines. While the standard may explicitly define the return type and parameters of functions, it does not specify in detail how they should be implemented. It can be argued that a beneficial result of this decision is the ability for different vendors to optimize each container to the specifics of their respective platform, but in a field such as game development, code needs to run in a predicatble manner on multiple platforms.

 

Being such complex pieces of software, games require a thorough testing period and by reducing the differences between code on each supported platform, you are also reducing the chances of having a platform specific issues. If one platform decides to allocate memory in a different order, you may end up with fragmentation. If one platform decides to sort elements with a different algorithm, you may end up with different bottlenecks. Fortunately, the internal details of most of the STL remain consistent between STL vendors, but differences pop up here and there.

 

The most common performance variations are the result of a given STL implementation not following the standard’s guidelines. While this is not the fault of the STL itself, it does happen and creates issues when porting code between implementations. List container implementations have been a common issue in this area. The standard specifies that splicing lists should be a fast operation, and getting the size of a list does not have to be a fast operation. In order to create the fastest splice functionality possible, the container cannot track the number of stored elements. If the size were tracked, the elements moved during a splice would need to be counted. Conversely, by not storing the size, asking for the size requires counting all of the elements in the container. While most STL implementations build their lists without tracking the number of elements, some do the reverse, resulting in opposite performance between the size and splice functions. Once a program relies on either function being efficient, porting to another STL implementation may require modifications to maintain an acceptable performance.

 

In addition to the standard not enforcing all algorithmic decisions, the behavior of functions can also be undefined in certain states. For example, popping the front off an empty list has an undefined behavior. It may be safe in one implementation and crash in another. This can once again create porting issues if the code base depends on any undefined functionality.

 

With regards to the data stored within the containers, there is also little guidance. If the STL implementation for one platform uses more memory than on a second platform, it will result in parent structure sizes changing. This can lead to varied memory layouts between platforms.

 

Even memory allocation is not always standardized in the STL. Certain container functions, such as pushing elements onto a full vector, are allowed to perform internal allocations. It is often the case that instead of just resizing itself to handle the incoming element, a vector will reserve extra elements in the hope of limiting future allocations. An issue arises due to no explicit rule for how large of a reserve should be made. In practice, vectors often resize to between 150% and 200% of their current size when overflowing. This difference alone can cause drastic changes to the memory footprint and fragmentation of an application.

 

The obvious solution to preventing different code from creating a different data layout and behavior is to stop using different code. A portable STL implementation can be used for all supported platforms, and I believe this is the best option. Existing portable versions of the library function in multiple compilers and on multiple platforms. Unfortunately (like any heavy C++ code), they may fail to work when moving to a brand new platform. At this point, a decision must be made determining if it is best to abandon the portable library and use the one provided with the new platform (at least until the portable library is updated to support it), or to try and update the portable implementation to handle the new platform.

 

Being a third party library, RJ_Containers, solves the platform independence issue by creating a portable container implementation. While the library hasn’t been as tested to work on as many compilers or platforms as implementations like STLport, the code is built to be easier to read and understand. This makes the task of fixing up the library to work on an odd platform far simpler when necessary.

 

Improved Debugging

While it is very rare to find bugs in the actual STL code, it is not so rare to find that your own bugs have caused crashes in the STL code or are somehow related to poor use of the STL. Debugging these issues will sometimes require searching through an STL callstack or even stepping through STL code. STL is code is notoriously hard to read making the debugging process a bit of a nightmare. Not only is the internal logic of STL very complicated, there are usually no comments and naming conventions often look like they were designed to scare you away (if you can even find a naming convention).

 

The RJ_Containers code is written with descriptive variable names and comments to assist the user with understanding the code being stepped through. 

 

No Dependence on Exceptions

When the standard specifies for error checking, it is performed with exceptions. Due to the performance hit of exceptions, most performance critical software (e.g. games) with have the feature disabled.  This then removes the programmer’s ability to process and error results from the STL. There has also often been little to no debug error checking inside the STL (although this has improved greatly in recent years).

 

The  RJ_Containers library makes use of assertions help catch fatal errors as soon as possible. Non-fatal errors such as allocation failure communicate a failure through the return value rather than an exception. For example, a failure to push onto the end of a vector will return false and a failure to insert into a list will return an iterator to the end of the list. This allows the user to respond to allocation failure without the overhead of enabling exceptions.

 

Reference Based Predicates

When a function requires a predicate, the containers will be passed the predicate by reference rather than by value. The function should also not create any internal copies of the predicate. This will prevent unnecessary copies of complex predicate classes.

 

Direct Access to Vector Element Array

The new vector container provides direct access to pointers for the beginning and end of the internal contiguous element buffer. For an empty vector, the beginning and end pointers will be equal. In the STL vector you need to take the address of the front element to get a pointer to the buffer. This creates an illegal dereference when the vector is empty which can often be an inconvenience to the user. Using the added pointer access to the data for iteration can also assist in compiler optimizations as well as creating faster code in unoptimized debug builds.

 

Containers Are Never Specialized For a Type

The STL is missing a true vector of bool values. When requesting a bool vector, the user is supplied with a specialized vector implementation that uses an array of single bit values rather than the actual bool type. The bool vector is specialized in this manner to conserve memory. This results in the unexpected requirement for generic algorithms to never use the address of a vector’s first element as a pointer to an array of elements.

 

The RJ_Contianer vector implementation is an actual vector of the bool type. There is currently no equivalent container for a vector of bits, although if one is added it will be through a separate class due to there being no actual bit type.

 

Container Construction Must Be Efficient

In STL, unwanted allocations may also occur during the constructor of a container. List containers will often do this to create their end node. This results in every construction and destruction of a list class being a potentially expensive operation even if it the list is never actually used. The user must now treat lists with an added precaution from most other containers, making sure they are not instantiated within loops or recursive functions.

 

The RJ_Containers implementations of containers are not allowed to perform allocations in their constructors. The process of construction and destruction needs to be low if the container is never used.

 

Expensive Memory Allocations Are Not Automatic

The standard does not explicitly specify when a container can allocate memory or how much should be allocated. This results in some implementations allocating memory in unexpected ways which can cause problems when working with a strict memory budget or with a slow allocation process.

 

Vectors can potentially double their capacity when a new element insertion causes an overflow. This incurs a cost of allocating a new section of memory, copying over the existing elements, and freeing the old section of memory. To prevent this, a programmer can pre-reserve the necessary element limit. Unfortunately, if the reserved size is accidentally one element too short, the vector might end up doubling the reserve upon the final insertion. The programmer will have no obvious indication of the potentially large memory waste and performance hit they have created. Note that this is not a memory leak, but just a waste of unused memory and therefore may not be as easy mistake to catch.

 

The RJ_Containers library forces the user to make explicit decisions regarding to when and how much memory should be allocated. Containers using contiguous memory, such as vectors, are no longer allowed to perform allocations behind the scenes as an attempt to manage the reserve. Instead, the user is tasked with resizing the reserved memory as necessary.  If the user tries to push more elements onto the container than were reserved, an assertion is triggered.

 

Detailed Allocation Control

STL allocators are intended to be a stateless class (i.e. they cannot contain data). In practice, most STL implementations allow for some state internal to an allocator, but they certainly don’t encourage it.

 

An allocator can be supplied to a container’s constructor for initialization of the internal allocators created by the container (note that these internal allocators may be rebound to another type). True access to the allocator in use is never given. Containers provide an allocator accessor function, but it returns an allocator by value and for most containers it won’t even return a copy of the allocator actually used.

 

For example, a list can rebind your supplied allocator to for list node allocation. You can never get access to this list node allocator. When you call the allocator accessor, it is actually returning you a copy of an element bound allocator which is never used. An STL list will usually contain two allocator members and gives you access to the unused one.  This means there is no runtime modification or inspection of allocator values after the container constructs. Allocators also need to be very light weight given that they get passed around by value, and are duplicated inside containers.

 

The RJ_Containers implementations give reference based access to the actual allocators used by a container. The container will also never create allocators that are not intended for use. For a list container, the user can access the actual list node allocator and only the list node allocator is the only allocator instantiated by the list.

 

public:
  // Only 'return by value' allocator access is provided and it is only
  // provided for the unused allocator.
  AllocatorType<ElementType> get_allocator() const { return m_elementAllocator; }
 
private:
  // Unused allocator bound to element type
  AllocatorType<ElementType> m_elementAllocator;
 
  // Used allocator bound to list node type
  AllocatorType<NodeType> m_nodeAllocator;

 

public:
  // Full rerence based access is provided to the actual node allocator
  // used by the list.
  AllocatorType<NodeType> &       NodeAllocator()       { return m_nodeAllocator; }
  const AllocatorType<NodeType> & NodeAllocator() const { return m_nodeAllocator; }
 
private:
  // Only the used node allocator is instantiated in the class.
  AllocatorType<NodeType> m_nodeAllocator;

 

Allocators Can Safely Support Byte Alignment

In STL, it is non-trivial to control container allocation alignment. An initial attempt at creating a list that allocates elements with a specific alignment, you might have an allocator only return aligned memory. Unfortunately, the allocator will just end up aligning a list node class rather than aligning the memory of the element itself. To properly align the memory of the element, you need to know the element’s byte offset within the list node class.

 

If a container in the RJ_Containers library allocates an object containing the element (e.g. a list node), the element is always the first member of the parent structure. This guarantees that the element and the parent structure share alignment. Thus, if an allocator always returns data aligned to some interval, the element will be properly aligned regardless of the fact that that is contained within another structure.

 

Additional Functions to Reduce Copy Construction

Many functions in the STL, such as vector::push_back(element), require an element as a parameter. If the user wants the new vector element to just use it’s default constructor, this method can incur the extra cost of constructing a temporary value to pass into push_back as well as running the copy constructor of the element appended to the list. The RJ_Containers library implementations have additional functions for appending or inserting data which do not take an element parameter. The newly created element will use its default constructor in these cases.

 

The RJ_Containers library pair class also supports additional constructors that can allow one of the members to use a default constructor while supplying a value for the other.

 

Containers Are Named Based On Implementation

In STL, containers are named for their use. For example, a map is used to efficiently map a key to a value. The underlying map implementation can be based on any structure deemed useful by the STL implementation. In practice, implementations will common methods (e.g. a map is normally implemented as a red black tree). In the RJ_Containers library, containers are named according to their implementation rather than hiding it. For example, instead of having a map class there is a red black tree map class along with a hash table map class.

 

Reduced Member Function Overloads

Some member functions of STL containers contain numerous overloads. A prime example is the insert function of the STL list. In the RJ_Containers list implementation, the insert function has been split up into Insert, InsertSequence, and InsertRange functions. This is done for a few reasons. First, it reduces the need for complicated template dispatching in order to implement the overloads safely for generic types. Second, it allows for overloaded versions of the insert functions that use default element construction (the new version would otherwise clash with other overloads). Third, I find it makes the resulting user code easier to write and read.

 

Clear Will Actually Clear the Reserve

If an STL container uses a reserved buffer of memory (e.g. the vector), the clear function will not release any reserved memory. It will only destruct the elements. It can viewed the same as a call to vector.erase(vector.begin(),vector.end()). In order to actually clear out a vector’s reserve, the user needs to either destruct it or swap it with a second vector using a matching allocator.

 

To reduce this confusion, calling the clear function on an RJ_Containers vector will actually clear the reserved memory. If the user just wishes to erase all of the elements without clearing the reserve, an EraseAll function has been added.

 

Added Hash Map Container

While hash maps are not part of the standard, a custom implementation is often added due to their wide spread use. While the hash map extension is fairly common, it is still absent from some implementations and may have a slightly modified interface between those that do contain it. This once again promotes using a portable STL implementation.

 

Useful Set of Allocator Implementations

By implementing containers with the new allocator rules, allocators can control allocation alignment and manage local data.  One of the largest benefits is that allocators can be used to create dynamic or static reserves for any container. For example, a list can have a contiguous block of memory preallocated for its nodes. A list that never allocates can also be built by having the node pool instantiated as a member of the class.

 

Standard allocators can be used to allocate from the global heap. This is the default allocator used by all containers. The library also supports custom callbacks for performing global allocations. These can be used to override the standard allocator instead of writing a new allocator that interfaces with your custom memory management system.

 

Pool allocators can be used to allocate elements from a pool of fixed size memory blocks.  This allocator type is useful for containers that make multiple fixed size allocations such as the list. A sized pool allocator can be supplied with a maximum element count to reserve at runtime. The entire element pool is then reserved in a single allocation. A static pool allocator is supplied with a maximum element count at compile time and reserves the memory within the allocator itself. This results in zero allocations ever being made.

 

Solitary allocators are used to allocate only single block of memory at any given time. This allocator type is useful for containers based on contiguous blocks of memory. Because only a single block can ever exist, the allocator does not need to maintain any extra data and is very lightweight. By enforcing only a single allocation at one time, the allocator will also guarantee no additional allocations are unintentionally made through the container (e.g. updating a vector’s reserving multiple times) which could cause memory fragmentation. In a similar fashion to the pool allocators, static and sized versions of the solitary allocator are available.

 

Shared allocators are used to assist in creating multiple containers that reference the same allocator. A shared allocator stores a pointer to an external allocator that all allocations are forwarded to. This allows for a single allocation pool to be used for multiple containers. For example, if a set of elements will be sorted into two lists, the user needs to make sure any reserved pools for each list is able to support up to the total element count. With a shared allocator, memory can be conserved by letting both lists share from a single pool capable of tracking all the elements.

 

Library Information 

Below is you will find a link to the downloads page for the project where you can get a zipped version of the library. There is also a link to the publicly viewable SVN repository for the project. Some simple functionality tests are provided with the library, but make note that the tests are not robust enough to guarantee there are no bugs, and there are no guarantees of the software being flawless. It is for use at your own risk. Please see the "main.cpp" source file for information and examples.

 

If you make use of the source code, feel free to send by any feedback, questions, or requests.

 

Downloads

 

SVN Repository

http://svn.ryanjuckett.com/rj_containers/trunk/

 

Changelog

ver 1.3.1
 - Added key only inserts to the red black tree
 - Fixed a compile error in the red black tree FindUpperBound
 - Changed list, vector and deque element assignments to use constructor directly when
   possible to support explicit construction
 - Hash tables now allow empty element insertion
 
ver 1.3.0
 - Removed STL-like interface from the containers and created new STL
   wrapper classes in the test application that expose the STL-like
   interface. For example, tStlList has a push_back() function in
   comparison to the PushBack() function found in tList. 
 - List splicing functions correctly support a list being spliced into itself
   and assert when invalid range is provided.
 - Renamed GetLowerBoun, GetUpperBound and GetEqualRange to FindLowerBound,
   FindUpperBound and FineEqualRange to indicate that a search is performed.
 - Renamed the Grow function to the more explicit GrowCapactiy.
 - Fixed a list bug where an element's copy constructor was being called
   unnecessarily
 
ver 1.2.1
 - Fixed return type of tVector::GetDataEnd() const
 - Improved hash table performance in debug builds
 - Switched to premake generated solutions
 - Code compiles for mac and linux
 - HashTable and RedBlackTree InsertUnique functions now output false as
   the pair.second in all failure cases making it easier to test for
   insertion succeess
 
ver 1.2.0
 - Container allocation alignments default to 4
 - Updated to VisualStudio 2010
 
ver 1.1.6
 - Fixed bad assert in red black tree
 
ver 1.1.5
 - Fixes for strict compilers
 
ver 1.1.4
 - Compiles warning free on PS3.
 - Added a "Grow" function to tVector for increasing the reserve in a similar
   manner to std::vector::push_back.
 - Added accessors for all functors used by tHashTable and tRedBlackTree.
 
ver 1.1.3
 - Fixed bugs with overloading the global allocation interface.
 
ver 1.1.2
 - Improved element inspection from a debugger for the list and red-black
   tree containers.
 
ver 1.1.1
 - Fixed missing semicolon in the STL naming compatibility function,
   'reserve', in the tVector and tDeque classes. This was not creating
   an error previously due to it never being called and thus not compiled.
 - Fixed a bug in optimized builds of the Find function. There was a missing
   valid case statement where the compiler was told to assume that all valid
   cases were specified.
 - Added functions for testing if pool based heaps and allocators are in a
   full or empty state.
 - Added functions for testing if a given memory address could be a legal
   allocation from pool based heaps and allocators.
 
ver 1.1.0
 - Updated example code and example documentation
 - Used derivation of potential stateless member variables to compress
   container sizes.
 - Vectors allow direct pointer access to data.
 - Containers support allocator controlled alignment.
 - Containers can return failure from allocations rather than just asserting.
 - Split core into separate library reusuable by other projects.
 - Made function and variable naming conventions consistent.
 - Heaps now support element counts up to 32 bits (was 16 bits)
 - Added support for user-defined global allocate, reallocate, and deallocate
   functions.
 - Added support for user-defined assertion function.
 - Simplified allocator definitions by removing the Construct and Destruct
   member functions.
 - Library now build without warning at the highest warning level in Visual
   Studio 2008.
 
ver 1.0.0
 - Initial version using the RJ namespace

 

;