Entity Management

I've punted on writing any sort of entity system for a while now. Like I mentioned very early in the book, there's no strict order in which the engine systems need to be implemented. While having a functional rendering system will make it much easier to visualize the correctness of whatever entity management strategy I decide to use, I could have achieved the same level of validation using logs instead.

I don't necessarily need an "entity system" in my game engine. There are many ways to implement such a system, but it's also completely possible to just use regular old classes to represent my game objects. In fact, many successful games were built in languages that didn't have classes at all! In this chapter, I'd like to evaluate my options, choose one, implement it, and refactor all of the hard-coded features out of the Engine class.

5.1 Choosing an Architecture

I don't want to spend too much time going through all the possible options, because I've already kind of made up my mind, but I feel like it's important that I at least acknowledge the existence of the primary options. Hopefully I can also give sufficiently detailed reasons to justify my final choice.

Object-Orientated Architecture

This style of entity management involves building classes to represent the various types of game objects within the game. You might have a Tree, an Enemy, or a Car which all have very different behavior. Often times, less experienced programmers fall into the trap of over-utilizing inheritance to share commonalities between different types of game objects. Tree, Enemy and Car are all visual elements, so naturally they should all inherit from a base Drawable class, right? Only Enemy and Car are movable objects, so perhaps we introduce a Movable base class for them, which itself inherits from Drawable.

The problem with using inheritance for this problem is easily illustrated with this example: what happens if I want an object that is Movable but not Drawable? Do I create a separate InvisibleMovable base class that contains some duplicate code? Do I split Movable and Drawable into separate base classes?

Splitting them up leads to multiple inheritance, which notoriously suffers from the "diamond problem". Since a lot of game engines utilize a common update() function across all game object types, you might run into this problem quicker than you think!

Of course, experienced programmers will tell you to favor composition over inheritance. This is actually not a bad solution at all, but tends to converge into a common pattern where every game object class has an update() method which just delegates to the update() method of all of its "components", which segways perfectly into the next solution.

Entity-Component Architecture

Perhaps one of the most common architectures in modern games, thanks in no small part to Unity's popularity, is the Entity-Component (EC) architecture. In this pattern, game objects (entities) are containers for an arbitrary number of components. The behavior (and the data associated with that behavior) are defined in each component. Unity's common base class for components is aptly named MonoBehaviour for fairly obvious reasons:

Components are written in C#, which uses the Mono runtime
Behavio(u)r is defined in derived classes of this base class
It uses British English for some reason, even though that's not the case for other types within the engine, such as Color

Since game objects simply contain arbitrary components, it becomes easy to mix and match which components any given object has. If we need to define a new game object which is Movable but not Drawable, then no problem - just add the Movable component but don't add the Drawable component!

Specific types of game objects (like a Tree, Enemy, or Car) aren't defined formally in this architecture, but are instead conceptualized as a sum of their game object's components. All game objects which contain only a Drawable component might be considered a "tree", though that's not strictly the case. Both Enemy and Car only contain Drawable and Movable components, but a "car" is not necessarily an "enemy" (though it could be, if that's the design of your game).

The unique combination of a set of component types is sometimes referred to as an "archetype". Even though a "car" is not an "enemy", they are both members of the { Drawable, Movable } archetype.

CPU Caching

Since all of the members of a given archetype have the same behavior, the game engine might optimize their performance by locating the game objects (and their owned components) contiguously in-memory, in order to take advantage of the CPU's caching capabilities.

CPUs contain multiple caches, but they all serve the same overall purpose: to reduce the number of required memory operations. Whenever the CPU accesses data from memory, it doesn't just grab the requested data, it actually grabs a chunk of data around the requested data and puts it into the CPU cache. If the CPU ends up only working on the piece of data that was specifically requested, then the rest of the cache will just go unused, and the entire cache will be repopulated whenever the next piece of data is requested from memory.

Game engines can take advantage of this behavior by specifically locating components next to each other ("contiguously") in memory, such that iterating over those components (in order to perform a common function on them) reduces the total number of memory accesses required to populate the CPU cache.

Some engineers go so far as to declare their data in terms of a "struct of arrays" rather than an "array of structs". With an "array of structs" (such as an array of game objects, each containing their own components), the total number of memory accesses can be substantial, since there's no clear order to the data which needs to be accessed. On the other hand, a "struct of arrays" (a structure containing an array for each component type) can reduce memory accesses by grouping similar data, and, depending on the type of data in those groups, provide the ability to utilize much faster SIMD (single-instruction-multiple-data) CPU instructions.

Entity-Component-System Architecture

Much like Entity-Component architectures, an Entity-Component-System (ECS) architecture conceptualizes game objects (entities) as the sum of their components. The big difference between EC and ECS is that, in ECS, components only contain data, and "systems" are written to iterate over the data of a common archetype and perform common operations on that data. Because the game objects and components don't actually contain any behavior, many implementations of ECS consider an "entity" to be nothing more than a unique ID that a set of components are associated with.

Because the game objects don't contain any behavior, the systems containing the behavior need a way to query for entities that are relevant to that specific system. This mechanism is incredibly important for the performance of an ECS-based engine.

This architecture, when implemented well, can display some incredible performance improvements over EC architectures (in fact, Unity seems to be developing a new ECS-based architecture), because the systems iterate over common sets of components, which are located contiguously in-memory. When implemented poorly, its performance is by far the worst of the bunch.

Nuance in Software Architecture

While there are plenty more entity management patterns that I could write about, these are definitely the most popular and most flexible that I'm aware of. I try not to be too extreme when it comes to supporting particular programming paradigms these days, and I don't consider this decision to be any different.

All of these architectures have been used across many successful commercial video games. While it can be easy to get sucked into the trends (ECS architectures are definitely trendy these days), you can't ignore the fact that the financial success of the game isn't actually based on which architecture you used to manage your game object's behavior.

In my early programming career (before you could consider it a career, really), I wrote very chaotic code - whatever the least amount of effort to get to a desired result might have been. Freedom sat square in the midst of my terrible codebase full of events and loops that I could barely wrap my head around. I rarely concerned myself with the "right" way to solve a problem.

Once I started my professional career, I quickly became obsessed with doing things the most correct way possible. I fully embraced whatever the latest architectural crazes were, and advocated for their adoption among my peers. I found myself changing my mind a lot, largely due to changes in the software ecosystem.

About the time I was introduced to test-driven development (TDD), I decided to slow down on making sweeping architectural changes, but TDD quickly put my obsession with correctness into overdrive. Having a codebase with 100% test coverage is a weird dopamine hit for a programmer that rivals what social media platforms have managed to engineer. Chasing that sweet sweet test coverage encouraged me to build my classes a certain way, heavily utilizing dependency injection to make sure classes were defined as individually testable entities, which could then be mocked in order to test the higher order classes which utilized those classes as dependencies.

Eventually, Java 8 came out, and everyone was following the "streams" craze. To me, streams were antithetical to the heavily object-oriented code that TDD encouraged you to write - at least if you wanted your "coverage" metrics to be high. Still, I couldn't argue against a functional programming paradigm being a good fit for our backend application. That's when I started to experiment.

Rather than following the trends, I began to combine functional and object-oriented programming paradigms within our application, where each was best suited. Java, being an object-oriented language first and foremost, implemented its streams to utilize interfaces such as Function<T, U> (which receives an input of type T, and returns an output of type U), Predicate<T> (which receives an input of type T, and returns a boolean), and many others. My design was effectively a main method that contained the stream definition, which utilized differing implementations of the stream interfaces. Those implementations were very object-oriented, following the SOLID principles, and utilizing TDD. The engineers were pleased because they were finally able to utilize streams in our backend in a way that made sense, and management was onboard because the test coverage remained above a high threshold.

I'm quite sure I'm not the only person to have combined programming paradigms in that way. My point is that these types of decisions are more nuanced than "this one is always right" or "it has to be one or the other" - I think most people just prefer to pick a side because it feels good to be part of a team.

My Flavor of ECS

Like I said, I had already made up my mind. There is beauty to the simplicity of ECS architectures. Components are trivial to add without screwing anything up, and systems can be easily added to utilize preexisting components. However, there is a substantial amount of up-front work involved in allowing the systems to query for entities that fit a given archetype.

My personal flavor of an ECS architecture is characterized by:

Systems which are object-oriented and utilize dependency injection for access to core subsystems, such as the Renderer class.
The "archetype graph" - a means of enabling systems to quickly query for entities that fit a particular archetype, while ensuring that components for that archetype are stored contiguously in-memory.

My interpretation would probably be repulsive to some people who are more interested in a "pure" ECS implementation, but that's okay. I think it's important to continue using object-oriented programming in places where it's at its strongest, and use a more data-oriented approach elsewhere. With this architecture, I'll have the flexibility to implement just about any game system I can imagine.

As a quick aside, I avoid using the word "system" when talking about ECS unless I specifically mean the "system" abstraction that ECS defines (the "S" in "ECS"). For this reason, I tend to say "ECS architecture" rather than "ECS system", because "Entity-Component-System system" sounds weird. The implementation of the architecture will be a subsystem of the engine, but overloaded terms make everything more difficult to talk about. When researching this type of architecture, you might run into people referring to "an Entity-Component system", which would technically mean the implementation of an EC architecture, but they actually mean ECS. Choose your words carefully.

5.2 The Creative Process and Iteration

This won't be my first time implementing an ECS architecture into a game engine. Unlike the renderer, I don't think I will be able to avoid using templates here, but that remains to be seen. In my previous implementations, I made heavy use of std::type_info when dealing with client code that needs access to specific component types, so I will likely do the same in this implementation.

I'm still not entirely sure how to write about code that I haven't written yet. In section 4.1, in the subsection entitled "There Was an Attempt", I tried to document my movement through the code as I was implementing it. This obviously didn't work out - I can barely follow my own thought process through it. Throughout the remainder of that chapter, I tried to let the code speak for itself, but I was apprehensive to document any code that didn't compile - or worse, code that did compile but didn't function correctly. I don't want someone skimming through the book to find a malfunctioning code snippet, but lacks the context to know that it's not supposed to work.

Truth be told, I'm still unsure that a written book is even a suitable format for this type of development log. While it affords me the flexibility to go off on these tangents filled with doubt and frustration (for which a YouTube audience would likely have little patience), the written format does a poor job at conveying just how iterative the coding process can be. I have an idea of what code I want to write, and maybe an abstract layout of how the disparate pieces should fit together, but the actual process of typing that code into the IDE is filled with rapid iteration, mistakes, refactoring, and shuffling things around until the dust finally settles into something that I'm somewhat happy with. The process beckons memories of cartoons in which a character builds something quickly within a cloud of dust and clamoring noises, only for the final product to be present as the dust disappears.

Coding is a very creative endeavor, much like drawing or writing music. There is only a glimmer of the desired final product in your head when you start, and you just go through the motions until you've gotten reasonably close. Sometimes, the end result is nothing like you thought it would be. In the best cases, the result is even better than you had imagined! In the less desirable cases, you decide to throw away your work and start over entirely. In the worst cases, you throw away your work and give up on the idea altogether. All of these are valid results of the creative process.

How then can I accurately depict this process through text? The goal of the book, after all, was to follow my thought process in detail to explain why I implemented things the way I did. I think the best way I can go about it is to go through the iterative creative process in chunks, without writing any content for the book, and then subsequently describe the code I've written along with some of the other options I toyed around with in the process. I'm learning as I go here, so give me some slack.

5.3 Defining the Abstraction

I don't use the word "architecture" lightly in software design. Object-oriented abstractions can be awesome because it allows you to swap out the implementation details without modifying the interface that the rest of the code utilizes. Declaring what type of architecture your application uses has a much more wide-spread effect on the shape of the code.

An ECS architecture is no different. The game object entities will be managed by some "entity manager" (brilliant name, I know). The game's behavior will be made up of a series of systems, which each query the entity manager for relevant entities. Depending on the implementation, the entity might be further queryable for other arbitrary components.

It actually somewhat pains me to admit that the structure of the application will become so rigid. My first instinct is to build an abstraction that is still flexible so that I can continue to swap things out as I see fit. There might be some middle ground, so I'll see what I can whip up.

In my ECS implementation, an entity's components will be stored alongside the components of other entities which are part of the same archetype. I'll obviously need to be able to add and remove components from an entity, which will involve moving it between different archetypes. I'll also need to be able to access the components for that entity. I'll start things off by defining an Entity interface.

linguine/include/entity/Entity.h

#pragma once

#include <typeinfo>

namespace linguine {

class Entity {
  public:
    virtual ~Entity() = default;

    virtual uint64_t getId() = 0;

    template<typename T>
    [[nodiscard]] inline bool has() const {
      return has(typeid(T));
    }

    template<typename T, typename... Args>
    inline T* add(Args&&... args) {
      return new(static_cast<T*>(add(typeid(T), sizeof(T)))) T(std::forward<Args>(args)...);
    }

    template<typename T>
    inline void remove() {
      return remove(typeid(T));
    }

    template<typename T>
    inline T* get() const {
      return static_cast<T*>(get(typeid(T)));
    }

  private:
    [[nodiscard]] virtual bool has(const std::type_info& typeInfo) const = 0;

    virtual void* add(const std::type_info& typeInfo, size_t size) = 0;

    virtual void remove(const std::type_info& typeInfo) = 0;

    [[nodiscard]] virtual void* get(const std::type_info& typeInfo) const = 0;
};

}  // namespace linguine

Most of the Entity is fairly self-explanatory. I'm making heavy use of the typeid() operator, which allows me to quickly translate the template functions into suitable runtime data. The Entity is also responsible for casting arbitrary void pointers into a pointer to the actual component type. The most complicated thing here is the use of the "placement new" operator, which basically allows you to invoke a constructor on already-allocated memory. Since it's the job of the archetype to allocate the memory, this is one of the few situations where I'd say it's appropriate. This complicated method allows us to properly initialize memory of components using constructors and default values, and enables the passing of constructor arguments:

entity.add<CameraComponent>(position, rotation);

I've also created an EntityManager that just lets us create new entities. Eventually, it will also be how we query entities containing specific components. I'm intentionally punting on how querying will work until I have a better idea of how my data will be organized.

linguine/include/entity/EntityManager.h

#pragma once

#include <memory>

#include "Entity.h"

namespace linguine {

class EntityManager {
  public:
    virtual ~EntityManager() = default;

    virtual std::shared_ptr<Entity> create() = 0;
};

}  // namespace linguine

I'm utilizing shared pointers for entities because C++ doesn't support polymorphic stack variables. Ideally I'd be able to lean into the speed of stack variables, but in order to define the interfaces separately from the implementation, I have to use pointers. For now, it's a design choice that affords me the flexibility to switch out implementations as needed. That doesn't mean I can't refactor things later into non-polymorphic types. Who knows, I might end up using a separate allocator altogether, so this wouldn't be a problem at all.

5.4 Implementing the Interfaces

The ArchetypeEntity class is as dumb as it can possibly be. It is designed to be nothing more than a thin proxy into the ArchetypeEntityManager. To construct an ArchetypeEntity, all you need is a reference to the ArchetypeEntityManager and the entity's ID.

ArchetypeEntity(ArchetypeEntityManager& entityManager, uint64_t id)
    : _entityManager(entityManager), _id(id) {}

All of the other methods just forward to an equivalent method within the ArchetypeEntityManager itself, which allows the manager to be much more opinionated over how the entities are represented.

The Archetype Graph

I've spent the last several years believing that I was the first person to come up with this solution, and I was actually pretty excited to write about it. I decided to Google it, and come to find out Sander Mertens - the author of the popular ECS library Flecs - wrote about it back in March 2020, which would have been about two months after I wrote my initial implementation. Flecs is way more fully featured than anything I would have ever been able to put together, apparently pre-dates my work by about a year, and is still under active development. It's an amazing piece of work, and unless you're a masochist like myself who likes to do everything the hard way, I'd highly recommend it.

When I set out to build my first ECS implementation, I didn't really know what I was getting myself into. Storing similar components contiguously in-memory seemed like it was easy enough to accomplish, but I had failed to consider the other side of the problem: how do you query every entity for access to those components?

As I began my quest for optimizing query times, the first solution seemed obvious: don't check every single entity for the existence of the desired components. Instead, sort the entities into buckets of common components, and then query the buckets instead. In this way, ECS implementations are inspired by relational database theory, in which you "select rows from tables". The trouble is knowing which tables to select from!

At the time, I was fascinated by the early development of Unity's ECS implementation and decided to try to replicate it in a way that didn't feel like a bolt-on solution. I borrowed the term "archetype" from Unity to describe the concept which I was previously referring to as a "bucket".

My next optimization was to organize the archetypes into "tiers", where tier 0 contained the "root" archetype (which stored 0 components), and tier N would store N components. Each tier would contain many archetypes representing every possible combination of N components. If my query requested entities which contained 4 components, I could skip checking tiers 0-3 entirely.

I drew out of few possible algorithms on a small whiteboard I had at home. I landed on the concept of storing the archetypes in directed graph, where each node represents an archetype supporting a set of components. Each node would have children (each representing a superset of the current node's components), as well as parents (each representing a subset of the current node's components). The use of the terms "parent" and "child" make this sound like a tree, but by definition, a node in a tree can only have one parent.

In order to find all archetypes supporting a given set of components, you would:

Start at the root archetype node
For each component:
Traverse to the child node whose superset contains that component

At the end of the traversal, assuming the graph is comprehensive, the resulting node and all of its children (recursively) would contain the requested set of components.

Take a Big Byte

The graph provided insanely good query times. Traversing the graph jumped around memory locations by the same number as the number of components you requested. Most of the time, a system only requests entities containing one or two components, so it was never a big deal.

The big downside was maintaining the comprehensive graph. Every time a new component was introduced into the EntityManager, it proactively added new archetype nodes for every possible combination of components which would include the new component, and thus had to the traverse the entire graph to create edges to and from existing nodes.

I justified this at the time by telling myself I could "prime" the graph behind a loading screen by adding a bunch of components to a single temporary entity, and just destroy that entity before removing the loading screen. This isn't "wrong" by any means, it just fully disregarded the amount of memory usage that would result from such a vast data structure.

I hate combinatorics as much as the next guy, but let's just do the math really quick. For a game that has 50 possible components, the set of archetypes which each contain 25 of those components would be... 126,410,606,437,752 - that's 126.4 trillion! If each of those only took up 1 byte of memory (unlikely!), it would require ~114.97 petabytes. That's just the 25th "tier" of archetypes - the first and last tiers would contain one archetype; the second and forty-ninth tiers would contain 50; the third and forty-eighth tiers would contain 1,225.

I think you get the point. Representing a graph of all possible component combinations is entirely un-scalable. The iPhone certainly doesn't have 115 petabytes of memory - at least not this year's model.

A Different Approach

This time, I'll keep creating edges for archetypes that actually need to traverse between one another, but I will not be attempting to construct a comprehensive graph of all possible component combinations.

I can keep a std::unordered_map<std::set<std::type_index>, Archetype> so that I have an alternative way of finding archetypes that aren't yet connected, and create those edges for future use. If an archetype doesn't exist in the map at all, then it simply doesn't exist yet.

Because the graph will only contain nodes for archetypes that have actually been represented by real entities (and not all possible combinations of all components), it should be reasonably cheap to traverse the graph entirely to query for relevant archetypes. For a game that has 50 components, the minimum number of archetypes in the graph would be 51 (we'll still have a root archetype). The maximum number purely depends on the different combinations of components that entities wind up using. Theoretically, we could still end up using 115 petabytes, but in practice we will never come close.

This solution will perform queries more slowly, but the memory savings will be significant, to say the least.

Component Storage

Even though cache line efficiency is one of the big selling points for adopting an ECS architecture, I've completely glossed over the complexities of managing components in-memory. I wouldn't say it's anything special, but it's certainly not a very "type-safe" solution.

Each Archetype in the graph contains its own instance of a Store. The Store is responsible for allocating chunks of memory for that particular Archetype, using the knowledge of which components are expected to live within that memory. The memory is allocated using good old malloc(), which reminds me - I forgot to free() the memory in the destructor! This is exactly why I don't use malloc() very often.

Memory doesn't actually get allocated for empty Archetypes (which should never happen with our more efficient graph representation), or if the Archetype doesn't actually contain any components (which is the case for the root). Just like a dynamic array, the store will grow in size when it fills up. In my implementation, the first time memory is needed, 16KiB is allocated. Whenever there is not enough memory to add another entity, a new chunk is allocated with double the current size, and all contents are copied to the new chunk prior to freeing the old one.

I've chosen to store the components using an "array of structs" (AoS) style. In this strategy, if an archetype supports components A, B, and C, then the layout of the memory looks something like A1 B1 C1 A2 B2 C2 A3 B3 C3, where the numbers indicate association with a unique entity. This storage strategy is most effective if you intend to access multiple components from the same entity before moving onto the next entity.

In previous projects, I used an SoA component to signal to the system that I wanted the components to be stored using a "struct of arrays" strategy instead. With this strategy, components are stored together by type rather than the entity they belong to. The counter of the previous example would be A1 A2 A3 B1 B2 B3 C1 C2 C3. This strategy makes the most of cache line efficiency when a system needs to iterate over one component at a time across a lot of entities. I'm not going to bother implementing this into Linguine - I can always add it later if I find a specific case that would benefit from it.

Whenever an entity is removed from an archetype, there are two options:

Move the components of the last entity within the archetype to fill the gap left in the chunk.
Let the gap remain vacant, but store a queue of vacant offsets so that incoming entities can fill it later.

The first option is technically better for cache efficiency, since gaps will be immediately remedied. However, I've implemented the second option because it's cheaper to store an offset than it is to move component memory around. I might change my mind later if I start noticing significant memory fragmentation. It's also possible that I could occasionally de-fragment memory between frames, while utilizing the queue-based approach within the context of a single frame.

The biggest thing I dislike about my current implementation is that I'm storing the offsets and size of each type within a TypeMetadata struct, managed inside of a std::unordered_map<std::type_index, TypeMetadata>, which needs to be queried every time a component is requested. I entertained using a std::map instead, but profiling showed std::unordered_map to be a clear winner in this case. Ideally I can get rid of the lookup altogether, but I would end up with some crazy usage of compile-time templates to accomplish that - it would be templates all the way down!

Querying for Components

Implementing such a complex component storage strategy doesn't really buy us anything if we have no way to discover entities that contain specific components. The whole point of an archetype-based ECS implementation is to iterate over a subset of the total number of entities, selected based on relevance to the system that requires them.

Somewhat counterintuitively, the time complexity for updating entities using our archetype-based approach (O(A * N), where A is the number of archetypes and N is the number of entities) is actually worse than the time complexity of just iterating over all of the entities blindly (simply O(N)). The reason is simple: time complexity is a measure of the worst case. The worst case for the archetype-based approach is when the subset of entities being selected is actually the set of all entities. In this case, we would first have to traverse the archetype graph (whose breadth and depth are based on the number of possible components and the order in which they were added to the entities) to find which entities are relevant to the query (in this case, all of them). In both algorithms, we would still end up iterating over all entities, but the archetype-based solution would have the added overhead of traversing the archetype graph.

Time complexities are a fantastic tool to quickly identify and compare performance characteristics of algorithms, but they rarely convey the nuances of CPU cache optimizations. As an example, accessing an array at a specific index is an O(1) operation. It is easy to determine where in-memory that particular index lives as long as you know where the beginning of the array is (and how big each element is). Similarly, accessing an element from a hash table is also an O(1) operation. You simply figure out where in-memory the element is located based on the result of a hashing function of a known key. However, if we were to access an element in the array at index 5, and then immediately access another element at index 6, then we would expect the new element to already be in the current CPU cache line, so the sheer amount of time to access that element from the perspective of our code would be very fast. In a hash table, we could do the same exercise using keys "5" and "6", but there is no guarantee that the elements are anywhere near each other in-memory, reducing the likelihood of the elements existing withing the same cache line, thus requiring additional memory load operations. Furthermore, we also have to perform the actual hashing algorithm of the keys - even cheap hashing algorithms represent an overhead that does not exist in direct array accesses. The number of steps in both of these examples does not change based on the number of elements within the data structure, which makes them both O(1) (that's the best possible time complexity!). However, having a better understanding of the algorithms, as well as the hardware they are running on, makes it clear that one is preferable over the other.

I'm not saying that time complexities are irrelevant - I'm not one to make absolute claims like that. Entry-level job interviews tend to make a big deal out of them because they are important for quickly judging relative performance, especially among high-level constructs. I use them often at work during technical discussions. The reality is that cache line efficiency just isn't a big concern when you can just throw money at the problem (just run bigger servers). However, when your code is running on a machine that you don't have control over, such as a user's web browser or phone, these types of optimizations become more important. Even still, web sites usually don't have to squeeze out every last drop of performance to function properly. Video games, on the other hand, must maintain high frame rates, and it is imperative to optimize the performance of these low level systems so that the actual gameplay code has more room to breathe.

In order to get a view into the entities any given system cares about, I'll need to create a new method on the EntityManager which returns some Result. I'll also need to create an interface for the Result, which can be implemented differently for each entity management system. I'll start with something like this:

linguine/include/entity/EntityManager.h

#pragma once

#include <memory>
#include <set>
#include <typeindex>

#include "Entity.h"
#include "Result.h"

namespace linguine {

class EntityManager {
  public:
    virtual ~EntityManager() = default;

    virtual std::shared_ptr<Entity> create() = 0;

    template<typename... T>
    inline std::shared_ptr<Result> find() {
      return find({ typeid(T)... });
    }

  private:
    virtual std::shared_ptr<Result> find(std::set<std::type_index> types) = 0;
};

}  // namespace linguine

linguine/include/entity/Result.h

#pragma once

#include <functional>
#include <vector>

#include "Entity.h"

namespace linguine {

class Result {
  public:
    virtual ~Result() = default;

    virtual void each(const std::function<void(Entity&)>& function) const = 0;

    [[nodiscard]] virtual std::vector<std::shared_ptr<Entity>> get() const = 0;
};

}  // namespace linguine

Constructing a std::set<std::type_index> for every query is not ideal, but allows me to ignore forwarding template parameters all the way down the call stack. If this runtime type mechanism proves to be too slow for my target devices, I can optimize it later to use compile-time template parameters. The trade offs of doing compile-time type parameters compared to runtime type checking can be a bit tricky to evaluate:

Compile-time type parameters will increase the size of your compiled binary, depending on how many different parameter types are used throughout your code. Essentially, the compiler will generate a separate function for every set of types that you pass into that function - hence the name "template".
Runtime type checking can be expensive and memory-intensive. Without the hard-wired function generation, it's up to the programmer to store the runtime types in some data structure, which will also have to be queried.
In my opinion, the code to achieve runtime type checking is much easier to reason about, and therefore easier to maintain. Utilizing compile-time templates requires that you think more about how your code gets compiled in addition to how it runs, and that gets exhausting.

The implementation of the find(std::set<std::type_index>) method within the ArchetypeEntityManager executes an internal Query on the archetype graph to find suitable archetypes containing the set of requested types. The Query itself is just a depth-first search on the archetype graph. Because it's a graph rather than a tree, it's imperative to pass along the set of visited nodes so that children that have multiple parents don't get evaluated multiple times. There's something really cool about how the archetype graph is structured: if any given node matches the set of requested types, then all of its children are implicitly also a match, and all of those children's children, recursively! Once I find a match, I can add all of the children to the resulting archetypes, as well as put them in the set of visited nodes so that I don't have to check their types against the input types. A breadth-first search might actually result in better performance, but requires an internal queue data structure to implement. Again, if the search proves to be a bottleneck, I'll revisit its implementation later.

After the archetype query executes, the find method returns an ArchetypeResult using the resulting set of archetypes.

linguine/src/entity/archetype/ArchetypeResult.cpp

#include "ArchetypeResult.h"

#include "ArchetypeEntity.h"

namespace linguine::archetype {

void ArchetypeResult::each(const std::function<void(Entity&)>& function) const {
  for (const auto* archetype : _archetypes) {
    archetype->each([this, function](uint64_t id) {
      auto entity = ArchetypeEntity(_entityManager, id);
      function(entity);
    });
  }
}

std::vector<std::shared_ptr<Entity>> ArchetypeResult::get() const {
  auto results = std::vector<std::shared_ptr<Entity>>();

  for (const auto* archetype : _archetypes) {
    archetype->each([this, &results](uint64_t id) {
      results.push_back(std::make_shared<ArchetypeEntity>(_entityManager, id));
    });
  }

  return results;
}

}  // namespace linguine::archetype

Performance Profiling

Actually, this is the second implementation of the ArchetypeResult. In the first iteration, the Archetype returned a std::vector<uint64_t> containing all of its entity IDs, in the same order as the Store's indices. Due to the way the Store releases chunks of memory, this can leave gaps in the vector of IDs that would have to be skipped by the iterator in the ArchetypeResult. The number of gaps is significant for archetypes which are used as a compositional "stepping stone" - that is, archetypes that exist because they represent a "partial" state of an entity that hasn't received all of its components yet. Rather than maintaining this vector and returning a reference to it, I instead implemented an each method directly on the Archetype class, which can iterate over its map of entity-IDs-to-Store-indices. In my profiling test case, this resulted in a ~55% performance boost - from ~180 FPS to ~280.

My profiling test case is not "average" by any means - it is written to stress the performance of archetype queries and entity iteration. I'm currently composing 10,000 entities, each with a QuadRenderFeature containing a randomized color, and a Transform component containing a randomized position. A random 50% of these entities contain a Rising component, containing a randomized speed, and the other 50% contain a Rotating component, also containing a randomized speed.

My Engine's update() method has been updated to call a handful of "systems" (the "S" in "ECS"), which are currently just private methods in the Engine itself. The riserSystem() and rotatorSystem() utilize the Transform component to adjust the position or rotation of the entity. The quadTransformationSystem() is responsible for converting the position and rotation of the Transform component into the model matrix appropriate for the QuadFeature. Similarly, the cameraSystem() is responsible for constructing the camera's model matrix, converting it into the view matrix, additionally constructing the projection matrix, and finally combining them into the viewProjectionMatrix used by the renderer.

linguine/src/Engine.cpp constructor

// Camera
auto cameraEntity = _entityManager->create();

auto cameraTransform = cameraEntity->add<Transform>();
cameraTransform->position = glm::vec3(0.0f, 0.0f, 0.0f);

auto hasCamera = cameraEntity->add<HasCamera>();
hasCamera->camera = _renderer->getCamera();

auto random = std::random_device();
auto xDist = std::uniform_real_distribution(-2.55f, 2.5f);
auto yDist = std::uniform_real_distribution(-5.0f, 5.0f);
auto zDist = std::uniform_real_distribution(0.0f, 10.0f);
auto normalDist = std::uniform_real_distribution(0.0f, 1.0f);
auto rotationDist = std::uniform_real_distribution(-1.0f, 1.0f);

auto componentDist = std::uniform_int_distribution(0, 1);

for (int i = 0; i < 10'000; ++i) {
  auto entity = _entityManager->create();

  auto transform = entity->add<Transform>();
  transform->position = glm::vec3(xDist(random), yDist(random), zDist(random));

  auto quad = entity->add<Quad>();
  quad->feature = std::make_shared<QuadFeature>();
  quad->feature->color = glm::vec3(normalDist(random), normalDist(random), normalDist(random));

  auto drawable = entity->add<Drawable>();
  drawable->renderable = _renderer->create(quad->feature);

  if (componentDist(random) > 0) {
    auto rising = entity->add<Rising>();
    rising->speed = normalDist(random);
  } else {
    auto rotating = entity->add<Rotating>();
    rotating->speed = rotationDist(random);
  }
}

linguine/src/Engine.cpp "system" methods

void Engine::riserSystem(float deltaTime) {
  _entityManager->find<Rising, Transform>()->each([deltaTime](const Entity& entity) {
    const auto rising = entity.get<Rising>();
    const auto transform = entity.get<Transform>();

    transform->position += glm::vec3(0.0f, 1.0f, 0.0f) * rising->speed * deltaTime;
  });
}

void Engine::rotatorSystem(float deltaTime) {
  _entityManager->find<Rotating, Transform>()->each([deltaTime](const Entity& entity) {
    const auto rotating = entity.get<Rotating>();
    const auto transform = entity.get<Transform>();

    transform->rotation *= glm::angleAxis(glm::two_pi<float>() * rotating->speed * deltaTime, glm::vec3(0.0f, 0.0f, 1.0f));
  });
}

void Engine::quadTransformationSystem() {
  _entityManager->find<Transform, Quad>()->each([](const Entity& entity) {
    const auto transform = entity.get<Transform>();
    const auto quad = entity.get<Quad>();

    quad->feature->modelMatrix = glm::translate(glm::mat4(1.0f), transform->position)
                                 * glm::mat4_cast(transform->rotation);
  });
}

void Engine::cameraSystem() {
  const auto height = 10.0f;
  const auto viewport = _renderer->getViewport();

  _entityManager->find<Transform, HasCamera>()->each([height, viewport](const Entity& entity) {
    const auto transform = entity.get<Transform>();
    const auto hasCamera = entity.get<HasCamera>();
    const auto camera = hasCamera->camera;

    auto cameraModelMatrix = glm::translate(glm::mat4(1.0f), transform->position)
                             * glm::mat4_cast(transform->rotation);

    camera->viewMatrix = glm::inverse(cameraModelMatrix);
    camera->projectionMatrix = glm::ortho(
        -height / 2.0f * viewport->getAspectRatio(),
        height / 2.0f * viewport->getAspectRatio(),
        -height / 2.0f,
        height / 2.0f,
        0.0f,
        10.0f
    );

    camera->viewProjectionMatrix = camera->projectionMatrix * camera->viewMatrix;
  });
}

Luckily, the queries against the archetype graph make up a measly 0.07% of the total runtime samples. The quads are not utilizing any sort of "instanced" rendering, so issuing their draw calls is substantial, making up for a chunky 34.53% of the runtime cost. However, the renderer is not the primary offender: calls to get a specific component from an entity make up for 51.72% of the runtime cost!

The reason for this is clear to me. The ArchetypeEntity's get() method has to go through a relatively deep call chain before returning a pointer to the correct component:

ArchetypeEntity.get(std::type_info) calls ArchetypeEntityManager.get(uint64_t, std::type_info) using its own ID
ArchetypeEntityManager.get(uint64_t, std::type_info) first retrieves the entity's current archetype from a vector containing the archetypes for all entities before forwarding the call to Archetype.getComponent(uint64_t, std::type_info)
Archetype.getComponent(uint64_t, std::type_info) must look up the entity's ID in a map to determine which index to look up in the Store before calling Store.get(size_t, std::type_info).
Store.get(size_t, std::type_info) must look up the offset for the requested type within yet another map in order to return the correct component's pointer

All of these steps happen for every single entity, often multiple times per entity (for example, the riserSystem() has to get the Rising and Transform components from each entity). There is a lot of redundant work happening in this flow: the ArchetypeResult already knows exactly which archetype a given entity ID belongs to. So why do we have to perform the redundant work?

Well, what happens if we're iterating over the entities, and we decide to remove a component from a specific entity? Its archetype changes, which means all of its components are moved to the new archetype, which means all of its existing component pointers are invalidated. Making the ArchetypeEntity go through the redundant steps means it doesn't matter if an entity's archetype changes, its get(std::type_info) method will always return a pointer to the correct location!

Unfortunately there is still one situation that I haven't accounted for: using a pointer to a component after the entity has changed archetypes.

auto entity = _entityManager->create();
auto a = entity->add<A>();
entity->add<B>();
a->value = 5; // Doesn't work!

I can easily verify this by setting my camera's position either before or after I've added another component:

auto cameraTransform = cameraEntity->add<Transform>();
// cameraTransform->position = glm::vec3(5.0f, 0.0f, 0.0f); // Works here

auto hasCamera = cameraEntity->add<HasCamera>();
hasCamera->camera = _renderer->getCamera();

cameraTransform->position = glm::vec3(5.0f, 0.0f, 0.0f); // Does not work here

This can actually be more dangerous than my intended updates not taking effect - it's completely possible for another entity to be assigned the chunk of memory that used to belong to the entity that I'm updating. I'd be updating data for the wrong entity entirely!

To resolve this, rather than returning the raw pointer to the underlying component, I would need to return a wrapper object that contains the intended entity ID and component type, so that every time it is accessed, it points to the correct location. Doing so would slow things down even further, since the above flow wouldn't just happen when getting a component, it would also happen when accessing any field of that component!

Unfortunately I don't think I'll be fixing any performance issues without going deep into the compile-time templating rabbit hole, which would require some major changes to an already "working" system. Let's go ahead and implement the fix for the incorrect memory access and evaluate where we are.

Currently I have an Entity interface which needs to be implemented separately by every EntityManager implementation. Therefore, I have an ArchetypeEntity that goes along with my ArchetypeEntityManager. The Entity class itself contains some template methods, which convert the template parameters to std::type_info objects, and forward those to the pure-virtual versions of the methods, left to be implemented however works best for the underlying EntityManager.

There are two Entity methods that I'm concerned about: add<T>() and get<T>(), which currently return pointers to the underlying component storage. If the components move around in-memory, then the pointers are no longer valid. I'll change these methods to return a Component<T>, which will be constructed with a reference to the EntityManager, the entity's ID, and the std::type_info relevant for that component.

The trick here is to overload the -> operator so that you can use this wrapper object exactly as you would a real pointer. The big difference is that using the operator will always route through the EntityManager, so you know that it will always point to the correct memory location.

There are, of course, a couple of edge cases. If you were to remove a component that you already have a Component<T> for, then the -> operator will return a nullptr. Interestingly, if you were to add the same type of component back to the entity, then the previous Component<T> object would start working again.

The ArchetypeEntity class was already doing this type of indirection to make sure any method results were always up-to-date. Since the new Component<T> class is performing this same type of indirection, while not being specific to the ArchetypeEntityManager's implementation, I've decided to just get rid of the ArchetypeEntity, and just add this indirection directly to the Entity class.

linguine/include/entity/Entity.h `get<T>()`

template<typename T>
inline Component<T> get() const {
  return Component<T>(_entityManager, _id, typeid(T));
}

linguine/include/entity/Component.h `operator->()`

inline T* operator->() const {
  return static_cast<T*>(_entityManager.get(_entityId, _typeInfo));
}

Re-running my camera position test case now appears to work correctly! The downside is that my framerate is down from ~280 FPS to ~220, over 20% worse than before. The price of safety can be substantial.

Looking back at my profiling results, a very significant portion of time being spent within the ArchetypeEntityManager's get() method is on some call to strcmp(). While I'm not directly calling strcmp() myself, I do have a chunk of code that is seemingly harmless that is responsible:

if (!archetype.has(typeInfo)) {
  return nullptr;
}

The Archetype class contains a std::set<std::type_index>, and the has() method checks for a type within that set. The documentation for std::set says that they are commonly implemented as red-black trees - if you've ever been in an interview where the topic at hand was implementing a red-black tree from scratch, I feel for you. In any case, whichever implementation I happen to be utilizing appears to be using strcmp() for its node comparisons, in order to search for the existence of a type within the set.

Interestingly, I can just remove the call to has() altogether, and instead just have the Store return a nullptr if it doesn't support the requested type. The result is exactly the same, and the Store is already doing a lookup in its std::unordered_map anyway. Removing that conditional increased the framerate from ~220 FPS TO ~340, another ~55% gain!

This hot path also contains a query against a std::map of entity IDs, mapped to the indices within the Store. I wonder what will happen if I replace it with a std::unordered_map instead. Doing so bought us another ~17%, with frame rates exceeding 400 FPS!

At this point, the clear bottleneck is the Store class querying its internal std::unordered_map in order to fetch the offset for the requested component. Beyond re-writing the entire component storage mechanism to utilize compile-time templates (perhaps utilizing std::tuple), any further optimizations here are unknown to me.

The number of draw calls has remained constant throughout this tuning process. Even though frame rates will fluctuate depending on the device, the relative runtime cost of the systems should be very similar. Recall that we started with the renderer making up 34.53% of the runtime cost. After we reduced the runtime cost of the entity queries and component fetching, the renderer now makes up for 52.46%. The two percentages are relative to the performance of the rest of the engine, but represent nearly identical amounts of time per frame.

Something I want to emphasize is that the actual game will almost certainly not require 10,000 entities to be created, queried, iterated over, and drawn. The test is intentionally stressful in order to discover its current limitations. My laptop is actually capable of performing this test using 100,000 entities (another order of magnitude) at a respectable 30 FPS, though I admit that would likely not be feasible on a phone.

5.5 Putting the "S" in "ECS"

So far, my "systems" are just private methods in my Engine class that get invoked by the update() method. This totally works for a game that only has one scene. A scene can be thought of as a particular screen in the game, which does not necessarily share entities or systems with the other screens. Some engines call this abstraction a "world", but the concept is the same. The level-selection screen doesn't need to share entities with the actual level, nor should it be able to query any entities from the level.

In the context of this engine, a scene can be a class which contains its own entity manager along with a vector of systems. This new class will contain its own update() and fixedUpdate() methods, which simply iterates over all of its systems and calls the corresponding method for each of them. The Engine will invoke the update() and fixedUpdate() methods for the "current" scene.

Ideally I can add support for dynamically switching between scenes within a system dedicated to doing so. Since I haven't yet fleshed out the InputManager to handle inputs such as key presses or clicks, creating such a system wouldn't be very functional at the moment, so I'll just add that functionality later.

A thought occurs: if I did want to go ahead and detect inputs from within a system, then the Scene would have to pass an instance of the Engine's InputManager to the system. The easy way to do that would be for the Engine to pass the InputManager to the Scene, which can then be forwarded to any systems that require it - but what if I were to implement some sort of on-demand scene loading system into the engine? How would I construct the scene objects if I don't know which parameters they will require?

Dependency Injection and Service Locators

I previously stated that my preferred flavor of ECS uses dependency injection to pass engine-level services into the systems. The systems themselves are not responsible for finding the services that they require, they just declare what they need as part of their constructors.

This is a beautiful solution, in theory, because it both reduces the responsibilities of each system, and makes the system highly testable (which is often completely neglected in game programming). How then do the systems receive instances of their declared dependencies?

There are a ton of dependency injection frameworks available for just about every programming language imaginable. Some frameworks are more explicit than others, requiring that you return an instance of a given class using a "provider" of some sort, so that any class that depends on that class will eventually receive the result of your provider. Others are more magical, requiring very little interaction from the developer beyond adding constructors which declare dependencies.

However, it's not impossible to implement dependency injection completely manually. It becomes up to you, as the developer, to construct an instance of each possible dependency, and pass them into the constructors of other classes, which each become a possible dependency for other classes. This is called a "dependency graph", and it's exactly what these frameworks are doing for you under the covers.

Dependency injection frameworks are really just a way of implementing inversion of control, allowing developers to focus on the behavior of their code rather than "how it works". I would argue that it's a great fit for video game systems, which are intended precisely to focus on a specific behavior.

Dependency injection is not the only means for a class to gain access to dependencies. Unity heavily utilizes the service locator pattern, which allows components to simply find an instance of whichever class they want. These locator functions can be expensive, and it's not uncommon to see people calling them every single frame rather than caching the results of the first call.

Our Entity's get<T>() method is a form of the service locator patter, analogous to (and heavily inspired by) Unity's GetComponent<T>(). The big difference is that Unity's components contain behavior, which makes each component an actual "service", whereas our components are pure data. In this way, you can think of our get<T>() method as being more of a "data locator", which is a fancy way of saying "a database query".

My proposal is simple: each Scene will utilize the service locator pattern to find the services that its systems require. Each System, on the other hand, will just contain a constructor declaring which services it needs to function correctly.

Extracting the Systems

Something I'd like to do is finally clean up the mess I've made in the Engine class by getting rid of all of its system methods. A System is a very simple abstraction, as I described before.

linguine/include/System.h

#pragma once

#include "entity/Entity.h"
#include "entity/EntityManager.h"
#include "entity/Result.h"

namespace linguine {

class System {
  public:
    explicit System(EntityManager& entityManager)
        : _entityManager(entityManager) {}

    virtual ~System() = default;

    virtual void update(float deltaTime) = 0;

    virtual void fixedUpdate(float fixedDeltaTime) = 0;

  protected:
    inline std::shared_ptr<Entity> createEntity() {
      return _entityManager.create();
    }

    template<typename... Types>
    inline std::shared_ptr<Result> findEntities() {
      return _entityManager.find<Types...>();
    }

  private:
    EntityManager& _entityManager;
};

}  // namespace linguine

With this simple interface, we can now extract the riserSystem() method from the Engine into its own RiserSystem class!

linguine/src/systems/RiserSystem.h

#pragma once

#include "System.h"

namespace linguine {

class RiserSystem : public System {
public:
  explicit RiserSystem(EntityManager& entityManager)
      : System(entityManager) {}

  void update(float deltaTime) override;

  void fixedUpdate(float fixedDeltaTime) override {}
};

}  // namespace linguine

linguine/src/systems/RiserSystem.cpp

#include "RiserSystem.h"

#include "components/Rising.h"
#include "components/Transform.h"

namespace linguine {

void RiserSystem::update(float deltaTime) {
  findEntities<Rising, Transform>()->each([deltaTime](const Entity& entity) {
    const auto rising = entity.get<Rising>();
    const auto transform = entity.get<Transform>();

    transform->position += glm::vec3(0.0f, 1.0f, 0.0f) * rising->speed * deltaTime;
  });
}

}  // namespace linguine

The same can easily be done for the RotatorSystem and QuadTransformationSystem. The CameraSystem is the only system that currently has an external dependency - it requires a Renderer so that it can get the current Viewport:

linguine/src/systems/CameraSystem.h

#pragma once

#include "System.h"

#include "renderer/Renderer.h"

namespace linguine {

class CameraSystem : public System {
  public:
    CameraSystem(EntityManager& entityManager, const Renderer& renderer)
        : System(entityManager), _renderer(renderer) {}

    void update(float deltaTime) override;

    void fixedUpdate(float fixedDeltaTime) override {}

  private:
    constexpr static float _height = 10.0f;

    const Renderer& _renderer;
};

}  // namespace linguine

linguine/src/systems/CameraSystem.cpp

#include "CameraSystem.h"

#include "components/Transform.h"
#include "components/HasCamera.h"

namespace linguine {

void CameraSystem::update(float deltaTime) {
  const auto viewport = _renderer.getViewport();

  findEntities<Transform, HasCamera>()->each([viewport](const Entity& entity) {
    const auto transform = entity.get<Transform>();
    const auto hasCamera = entity.get<HasCamera>();
    const auto camera = hasCamera->camera;

    auto cameraModelMatrix = glm::translate(glm::mat4(1.0f), transform->position)
                             * glm::mat4_cast(transform->rotation);

    camera->viewMatrix = glm::inverse(cameraModelMatrix);
    camera->projectionMatrix = glm::ortho(
        -_height / 2.0f * viewport->getAspectRatio(),
        _height / 2.0f * viewport->getAspectRatio(),
        -_height / 2.0f,
        _height / 2.0f,
        0.0f,
        10.0f
    );

    camera->viewProjectionMatrix = camera->projectionMatrix * camera->viewMatrix;
  });
}

}  // namespace linguine

This whole time, the Engine has also contained a bunch of member variables and logic in order to log the current update() and fixedUpdate() frame rates. This can now easily be extracted into its own FpsSystem class.

linguine/src/systems/FpsSystem.h

#pragma once

#include "System.h"

#include "Logger.h"

namespace linguine {

class FpsSystem : public System {
  public:
    FpsSystem(EntityManager& entityManager, Logger& logger)
        : System(entityManager), _logger(logger) {}

    void update(float deltaTime) override;

    void fixedUpdate(float fixedDeltaTime) override;

  private:
    Logger& _logger;

    float _dtAccumulator = 0.0f;
    int _updateCounter = 0;

    float _fdtAccumulator = 0.0f;
    int _fixedUpdateCounter = 0;
};

}  // namespace linguine

linguine/src/systems/FpsSystem.cpp

#include "FpsSystem.h"

namespace linguine {

void FpsSystem::update(float deltaTime) {
  _dtAccumulator += deltaTime;
  _updateCounter++;

  while (_dtAccumulator >= 1.0f) {
    _logger.log("update(): " + std::to_string(_updateCounter) + " fps");

    _dtAccumulator -= 1.0f;
    _updateCounter = 0;
  }
}

void FpsSystem::fixedUpdate(float fixedDeltaTime) {
  _fdtAccumulator += fixedDeltaTime;
  _fixedUpdateCounter++;

  while (_fdtAccumulator >= 1.0f) {
    _logger.log("fixedUpdate(): " + std::to_string(_fixedUpdateCounter) + " fps");

    _fdtAccumulator -= 1.0f;
    _fixedUpdateCounter = 0;
  }
}

}  // namespace linguine

Constructing the Scene

Rather than the Engine calling all of its system methods each frame, I can just construct instances of these system classes, and call update() and fixedUpdate() on all of them as needed, and everything continues to function correctly. While we have successfully reduced a ton of clutter from the Engine class, it is still responsible for constructing all of the entities and components in its constructor. This can all be extracted into a Scene, along with the updates to the systems.

linguine/include/Scene.h

#pragma once

#include <vector>

#include "System.h"

namespace linguine {

class Scene {
  public:
    virtual ~Scene() = default;

    void update(float deltaTime) {
      for (const auto& system : _systems) {
        system->update(deltaTime);
      }
    }

    void fixedUpdate(float fixedDeltaTime) {
      for (const auto& system : _systems) {
        system->fixedUpdate(fixedDeltaTime);
      }
    }

  protected:
    inline void registerSystem(std::unique_ptr<System> system) {
      _systems.push_back(std::move(system));
    }

  private:
    std::vector<std::unique_ptr<System>> _systems;
};

}  // namespace linguine

From here, I can create a TestScene class, and extract all of the composition logic from the Engine class. Finally, I can just keep a std::unique_ptr<Scene> member variable around and call update() and fixedUpdate() on it appropriately.

linguine/src/scenes/TestScene.h

class TestScene : public Scene {
  public:
    TestScene(EntityManager& entityManager, Renderer& renderer, Logger& logger)
        : Scene() {
      registerSystem(std::make_unique<FpsSystem>(entityManager, logger));
      registerSystem(std::make_unique<RiserSystem>(entityManager));
      registerSystem(std::make_unique<RotatorSystem>(entityManager));
      registerSystem(std::make_unique<QuadTransformationSystem>(entityManager));
      registerSystem(std::make_unique<CameraSystem>(entityManager, renderer));

      // And all the entity composition logic...
    }
};

At this point, there is still only one EntityManager, owned by the Engine, which gets passed into the scene, which then passes it to the various systems. I want the scenes to create and manage their own EntityManager, but I don't want them to care about the fact that they are actually constructing an ArchetypeEntityManager. For this, I'll use a "factory" object, and pass that into the scenes instead. If I want to switch out which type of EntityManager the scenes will use, I can just change the implementation of the EntityManagerFactory. If I only want a specific scene to use a different type of EntityManager, then it is free to construct whichever type it wants.

linguine/include/entity/EntityManagerFactory.h

#pragma once

#include <memory>

#include "EntityManager.h"

namespace linguine {

class EntityManagerFactory {
  public:
    virtual ~EntityManagerFactory() = default;

    virtual std::unique_ptr<EntityManager> create() = 0;
};

}  // namespace linguine

linguine/src/entity/archetype/ArchetypeEntityManagerFactory.h

#pragma once

#include "entity/EntityManagerFactory.h"

#include "ArchetypeEntityManager.h"

namespace linguine::archetype {

class ArchetypeEntityManagerFactory : public EntityManagerFactory {
  public:
    std::unique_ptr<EntityManager> create() override {
      return std::make_unique<ArchetypeEntityManager>();
    }
};

}  // namespace linguine::archetype

Now the base Scene's constructor takes in a std::unique_ptr<EntityManager>, and I've added some convenience methods to make it easier for subclasses to interact with it.

linguine/include/Scene.h convenience methods

protected:
  inline EntityManager& getEntityManager() {
    return *_entityManager;
  }

  inline std::shared_ptr<Entity> createEntity() {
    return _entityManager->create();
  }

linguine/src/scenes/TestScene.h using the convenience methods

class TestScene : public Scene {
  public:
    TestScene(EntityManagerFactory& entityManagerFactory, Renderer& renderer, Logger& logger)
        : Scene(entityManagerFactory.create()) {
      registerSystem(std::make_unique<FpsSystem>(getEntityManager(), logger));
      registerSystem(std::make_unique<RiserSystem>(getEntityManager()));
      registerSystem(std::make_unique<RotatorSystem>(getEntityManager()));
      registerSystem(std::make_unique<QuadTransformationSystem>(getEntityManager()));
      registerSystem(std::make_unique<CameraSystem>(getEntityManager(), renderer));

      // Camera
      auto cameraEntity = createEntity();

      // And the rest of the entity composition logic...
    }
};

Implementing the Service Locator

My version of the service locator pattern will be an interface that contains methods to retrieve all of the possible underlying services that a system might want access to. My Engine class will implement this interface, and pass itself as a ServiceLocator to the scene.

I'll need to create virtual methods for all of these services that I've created along my journey:

EntityManagerFactory
InputManager
LifecycleManager
Logger
Renderer
TimeManager

It's tedious, but I've created the methods, such as getEntityManagerFactory(), and implemented them within the Engine class. I kind of dislike having separate methods for each of them, so within the ServiceLocator, I'll create a templated get<T>() method, with specializations for each supported type. I'll make the verbosely named methods private so that no one can access them directly. The template causes one interesting problem though: what happens if someone calls it with a type that is unsupported?

I found this trick on Stack Overflow, which utilizes static assertions to fail compilation if the template hasn't been specialized for the requested type. That will give me instant feedback if I need to add a new service to the service locator.

Finally, I can access the services as needed within the scene, and pass references to relevant services to the systems which depend upon them.

linguine/src/scenes/TestScene.h with the `ServiceLocator`

class TestScene : public Scene {
  public:
    explicit TestScene(ServiceLocator& serviceLocator)
        : Scene(serviceLocator.get<EntityManagerFactory>().create()) {
      registerSystem(std::make_unique<FpsSystem>(getEntityManager(), serviceLocator.get<Logger>()));
      registerSystem(std::make_unique<RiserSystem>(getEntityManager()));
      registerSystem(std::make_unique<RotatorSystem>(getEntityManager()));
      registerSystem(std::make_unique<QuadTransformationSystem>(getEntityManager()));
      registerSystem(std::make_unique<CameraSystem>(getEntityManager(), serviceLocator.get<Renderer>()));

      // Entity composition blah blah blah...
    }
};

The cool thing about this solution is that I can already imagine a use case for expanding it: accessing the engine's SceneManager, which systems can use to transition between scenes - but that will have to wait for another day.

This chapter has had a distinct lack of images, so without further ado, here is a heavily compressed GIF of 10,000 quads of random colors and positions, with half of them rotating and the other half rising up toward infinity.

ECS Demo

I can assure you that this demo runs at 400 frames per second on my laptop. Maybe someday I'll figure out how to make WebP or APNG files and retroactively convert the videos I've made so far. Even a lossless format wouldn't actually contain 400 frames per second, since this screen is only refreshing 120 times per second - but I digress.

Story Time

Five or six years ago, at one of my previous jobs, I was leading a team of a couple of developers, along with a quality analyst. This particular company was fairly small, so it came to me by surprise when I heard that my team was getting an intern.

My team was a goofy bunch. We would crack jokes often during meetings and play video games together over lunch or after work hours (sometimes during work hours, like when Super Smash Bros. Ultimate first released). I didn't consider us to be the image of professionalism that I expected leadership to choose for an intern to observe, but the truth of the matter is that we took our work very seriously.

We took pride in our code - how it looked, how it performed, how testable it was, and how it was architected. As the technical leader and most senior member of that team, I was responsible for guiding the others in their pursuit of "C.L.E.A.N." code. The phrase was coined with the help of our intern during an afternoon planning session on a Friday: "Code Looks Even Awesomer Now". The grammar is terrible, but the message is clear.

I still keep in touch with one of the members of that team, who has since taken on a leadership position at that same company. I'm proud of his growth as a software engineer over the years - bravo, Mark.

The code up to this point is available on GitHub at this commit. I hope you can appreciate how C.L.E.A.N. it is.

Entity Management

5.1 Choosing an Architecture

Object-Orientated Architecture

Entity-Component Architecture

CPU Caching

Entity-Component-System Architecture

Nuance in Software Architecture

My Flavor of ECS

5.2 The Creative Process and Iteration

5.3 Defining the Abstraction

linguine/include/entity/Entity.h

linguine/include/entity/EntityManager.h

5.4 Implementing the Interfaces

The Archetype Graph

Take a Big Byte

A Different Approach

Component Storage

Querying for Components

linguine/include/entity/EntityManager.h

linguine/include/entity/Result.h

linguine/src/entity/archetype/ArchetypeResult.cpp

Performance Profiling

linguine/src/Engine.cpp constructor

linguine/src/Engine.cpp "system" methods

linguine/include/entity/Entity.h get<T>()

linguine/include/entity/Component.h operator->()

5.5 Putting the "S" in "ECS"

Dependency Injection and Service Locators

Extracting the Systems

linguine/include/System.h

linguine/src/systems/RiserSystem.h

linguine/src/systems/RiserSystem.cpp

linguine/src/systems/CameraSystem.h

linguine/src/systems/CameraSystem.cpp

linguine/src/systems/FpsSystem.h

linguine/src/systems/FpsSystem.cpp

Constructing the Scene

linguine/include/Scene.h

linguine/src/scenes/TestScene.h

linguine/include/entity/EntityManagerFactory.h

linguine/src/entity/archetype/ArchetypeEntityManagerFactory.h

linguine/include/Scene.h convenience methods

linguine/src/scenes/TestScene.h using the convenience methods

Implementing the Service Locator

linguine/src/scenes/TestScene.h with the ServiceLocator

Story Time

linguine/include/entity/Entity.h `get<T>()`

linguine/include/entity/Component.h `operator->()`

linguine/src/scenes/TestScene.h with the `ServiceLocator`