
Dynamic Renderables

One of the first times I began to dabble with game programming was in 2015, using a Java-based library called libGDX. This library advertises itself more as a "framework" for game development, rather than an "engine". It provides a lot of features which expose you to common interfaces in game dev, and helps guide you in using them. Multi-dimensional position coordinates, physics libraries, OpenGL bindings - all available within the context of a library that promises cross-platform support. It sounds perfect!

When they say it's not an "engine", what they mean is that there's no proper entity system, renderer, or physical simulations out-of-the-box (at least, not that I was aware of). They provide interfaces for constructing a basic game loop, but it's up to you to take the APIs available and intertwine them into something functional. With the breadth of libraries available to a new game dev, it can be intimidating.

Always a sucker for RPGs, I set out to build a 2D top-down tile-based world. I did enough Googling to learn how to draw a square on the screen, in much the same way that I drew the triangle in the last chapter. I drew the square directly in the center of the screen, and that was my "character".

Expanding upon that idea, I maintained a 2-dimensional array of "tiles" in the world, which each had a color and a position. In order to move my character throughout the world, I iterated over the data structure, and moved every single tile according to the player's inputs. I then iterated over all the tiles again, and drew them to the screen. You can imagine in a vast world how inefficient it would be to update the positions of every tile every single frame!

The positions of all of these tiles contained X and Y values, which I passed into the vertex shader using 0.0 as the Z value and 1.0 as the W value (just like I did for the triangle in the last chapter). I had no idea what "clip space" or "normalized device coordinates" were at the time. It made sense to me that I could draw to the screen on the X and Y axes, where the center was {0.0, 0.0}, and if either of the numbers went below -1.0 or above 1.0, then it would be "off the screen". Yes, that means that I was maintaining a separate vertex buffer for every single tile, and yes, I was updating every single vertex buffer every single frame and re-uploading it to the GPU! I knew it was inefficient, but I didn't get caught up on it because it worked, and I could focus on optimizing performance later. For all the insanely bad things I was doing, I did have one valuable realization: my character wasn't moving, it was the world which was moving around my character!

I've lost count of how many games I've built since then - most of which fizzle out into oblivion. With each new game idea, I start from scratch in a slightly different way. Sometimes I try to build the same game idea that I've tried before using a different engine or framework. Much like some sort of machine learning model, I've converged on much better solutions to some of those problems.

One of these solutions I learned in a later project, when I decided I'd set about building and releasing my own general-purpose game engine for public use (another terrible idea on my part). I had previously built a few games in Unity, and noticed that you didn't have to move the world around the character, so I set about replicating that functionality. I came across this tutorial which explains how to use matrix multiplication to adjust the position of your vertices based on the parameters of your world and a "camera" within it. Funny enough, this is listed as the third entry in their "Basic tutorials" section - guess I didn't even try to learn how to do things the right way before! The tutorial beautifully illustrates what these matrix operations are doing, so I'd recommend you read it. It also quotes one of my favorite shows, Futurama:

The engines don’t move the ship at all. The ship stays where it is and the engines move the universe around it.

Wow, that resonated with me deeply. Rather than my game logic moving every single tile within the world, each tile should have a static position, which could be adjusted using matrix multiplication within my ~~ship's~~ game's engine!

Obviously that project didn't work out, but I now had a very powerful new tool. Here's a quick rundown:

  • Each vertex within a single object is defined relative to the origin of the model itself, which is called "model space".
  • Each object has a position, rotation, and scale within the "world", which can be represented by a "model" matrix.
  • Multiplying a vertex's position in model space by the model matrix will convert it into "world space".
  • Players have a view into that world via a camera object, which itself has a position and rotation. The inverse of the camera's model matrix is called the "view" matrix.
  • Multiplying a vertex's position in world space by the view matrix will convert it into "view space".
  • View space is also referred to as "camera space", since the new vertex position is relative to the camera object, and has nothing to do with the operating system's "view" widget (such as MTK::View).
  • The camera also defines a type of projection, which can be represented as a "projection" matrix.
  • Multiplying a vertex's position in view space by the projection matrix will convert it into clip space!
  • "Perspective" projections are typically used for 3-dimensional scenes, and contain a field of view (FoV), and aspect ratio, a near plane, and a far plane.
  • "Orthographic" projections are typically used for 2-dimensional scenes and simply contain a width and a height.

Here's the crazy part: I wasn't wrong! We still end up moving every single object in the world on every single frame. In fact, we're moving every single vertex of every single model, which means more total instructions, which should be slower, right?

If we were doing this math on the CPU, it would absolutely be slower than what I was originally doing all those years ago. The GPU, however, is very good at doing these types of floating-point operations - much faster than the CPU. What's more is that we are no longer maintaining a separate vertex buffer for every single object in the world, only a different buffer for each different model that is used, which greatly reduces overhead on both the CPU and the GPU. Furthermore, we are no longer updating the vertex buffer every frame at all.

The use of matrices certainly has a bit of overhead, however. We'll need to create and update uniform buffers, which contain the values of all of these matrices, so that they can be used in our vertex shader. Additionally, these uniform buffers must be activated prior to issuing a draw call which requires them.

One of the most beautiful parts about this solution is that it makes it very easy to reason about where in the world each object is located. Only the GPU cares about calculating where on the screen the object should be located.

There are a few goals that I have for this chapter:

  • Draw multiple objects on the screen, without hard-coding them into the renderer.
  • Objects should have positions and rotations defined in terms of world space.
  • Those positions and rotations should be updatable on a frame-by-frame basis.

Creating an entity system is a non-goal for now. Rather than hard-coding things in the Renderer, I'll just hard-code things in the Engine.

4.1 The "Render Feature" Abstraction

Renderers come in all shapes and sizes. Some are massive, highly configurable behemoths capable of rendering just about anything you can imagine; others are tiny and rigid, only capable of rendering simple shapes. Some renderers expose hyper-realistic material systems, perhaps utilizing cutting edge techniques like ray-tracing to perform lighting calculations that rival real life; others just allow small textures on a flat surface.

It's tempting for a programmer to look at the problem with big eyes and imagine all of the possibilities, but when writing an engine for a specific game, it's important to only build the features that the game will need.

This presentation by Natalya Tatarchuk of Bungie goes into extraordinary detail about the rendering architecture used in the game Destiny. It's pleasantly technical, and explains how all of the parts work together to achieve the final result of a highly complex visual environment. My main takeaway from this architecture was their concept of "render features", which allows the renderer itself to remain focused, yet modular enough to enable the addition of new rendering use cases over time. The game code can configure an object to use a feature without having to know anything about how it was implemented - the render feature to utilize is part of the renderer's abstraction.

I took a stab at implementing this type of architecture in (yet another) side project of mine which supported both OpenGL and Vulkan rendering backends. It was cool to see it all come together in such an extensible way, but I spent a very long time painstakingly implementing every piece of the renderer described in the presentation, and didn't even end up building a game with it.

For a game like Destiny, render features, job systems, and cache coherence can be absolutely critical to the success of the game. These complex engine features are exactly what the game needs.

For my game, however, it's unlikely that I'll need to do much more than render a texture with an arbitrary position, though it's not inconceivable that I'll want to use specific shaders for things like progress bars or other UI elements. Therein lies the conundrum - how can I reduce the amount of unnecessary work, while still developing a renderer that is extensible?

The Proposal

I think it's important that we are able to add new renderer features for our game code to utilize over time without being forced to refactor everything that happens to use the renderer. For this reason, I'd still like to make a render feature abstraction. Each renderable object should be able to set a feature object, which contains all of the data that the specific feature will require to be rendered.

This is what I'm imagining:

std::shared_ptr<RenderFeature> characterFeature = std::make_shared<CharacterFeature>();
characterFeature->setPosition(characterObject->position);
characterFeature->setDirection(CharacterFeature::Direction::Left);

auto characterRenderable = _renderer->createRenderable(characterFeature);

// Disable the character while we're loading
characterRenderable->setEnabled(false);

// Create a progress bar
std::shared_ptr<RenderFeature> progressBarFeature = std::make_shared<ProgressBarFeature>();
progressBarFeature->setProgress(0.0f);

auto progressBarRenderable = _renderer->createRenderable(progressBarFeature);

// Over time, keep increasing its rendered progress
progressBarFeature->setProgress(0.85f);

// Later, I'd like to replace the progress bar with a "success" message
std::shared_ptr<RenderFeature> successFeature = std::make_shared<SuccessFeature>();
progressBarRenderable->setFeature(successFeature);

// After some more time, when I'm done with the progress bar entirely...
progressBarRenderable->dispose();
characterRenderable->setEnabled(true);

This way, I'd be able to keep references to my renderables between frames in order to switch out features as needed, or destroy them when I'm done with them.

I can already foresee some issues with how this solution will eventually end up integrating with other systems:

  • I'll eventually need to build some sort of entity system, which means I'll need to be mindful about how my entities manage their own renderables and render features between frames.
  • I will probably want to skip draw calls for objects that aren't in front of the camera, but that will require a physics system capable of calculating whether any given object is within the frame (the camera's "frustum") - a technique known as "frustum culling".
  • I'll need to take special care of rendering objects which utilize transparency, which can cause things to look very weird if done incorrectly.

Those issues aren't unsolvable using the abstraction that I've proposed. The first concern just comes down to memory management, which I can worry about when I actually design my engine's entity system. The second is a matter of calling the setEnabled() method of my renderables based on the camera's position. The last one can hopefully be handled within whichever render feature ends up rendering transparent geometry - and there might not be any! I'll simply have to implement that later, if I find myself needing it.

I feel like I'm overthinking it, but these are all problems that I've stumbled upon in the past trying to implement the rendering logic naively. In an attempt to get out of this analysis paralysis, I'm just going to start building out this abstraction.

There Was an Attempt

Okay, honestly, I've been working on this here and there for the last couple of days without really being able to focus on it. Combined with the fact that I still struggle with C++ templates and cyclical class dependencies, I haven't come up with a solution that I'm happy with yet.

Whipping up the feature structs is fairly trivial. They are pure data and I really like that about them. It's also really easy to make a createRenderable() method on the Renderer that just constructs a Renderable object.

In order for the Renderable to hold any type of feature struct, I made a RenderFeature marker interface, which all feature structs are required to inherit. From there, I can just hold a pointer to a RenderFeature object within the Renderable. You can then access the getFeature<T>() template method in order to access the data, which just casts the stored feature pointer to the desired feature type - assuming you know which type of feature that particular Renderable is supposed to contain. This is technically a form of type erasure, which is closer to how Java generics work - and I'm much more comfortable with advanced Java than I am with advanced C++.

I was struggling to come up with a performant way to store the Renderable objects. I finally decided that the feature renderers should just store the Renderable objects themselves, which seemed to work alright. So I created a FeatureRenderer base class, which stores Renderable pointers in a std::vector. I extracted the drawing logic from the MetalRenderer into a new TriangleFeatureRenderer, which now owns the hard-coded vertex buffers, shader code, and pipeline descriptor creation, as well as the drawing logic. I also went ahead and created a QuadFeatureRenderer which just draws a plain white quad.

The feature renderers live in the renderers/metal CMake project, since they deal with the Metal API directly. The idea is that each rendering API that I want to support in the future just needs to implement the same feature renderers as any existing API. I quickly realized that the feature renderers needed more context about the current state of the renderer. For example, in order to activate the vertex buffers, bind the pipeline, and issue the draw call, I actually need access to the current MTL::RenderCommandEncoder. I played around with creating a Context abstraction that would get passed into the draw() method of each feature renderer. The idea was that a separate implementation of the Context would be passed into each rendering API's implementation of the feature renderers, but this quickly became unwieldy, since doing so would require templating the feature renderers, which bubbled up to the top-level renderer. Instead, I just pass in a MetalContext& reference to each feature renderer's constructor, and they can keep it around and use it later.

With all of that in place, I had to actually register the Renderable objects with their appropriate feature renderers, so I added a getFeatureRenderer<T>() template method in the Renderer front-end class. Each feature renderer is currently a separate pointer, which is set by the rendering API implementation (for example, the MetalRenderer class is responsible for setting the _quadFeatureRenderer to its own QuadFeatureRenderer implementation). Then I have to create a separate template specialization for each feature type (in the same example, there is a getFeatureRenderer<QuadFeature>() specialization that returns _quadFeatureRenderer).

The MetalRenderer's own draw() method currently just calls each of these specializations and invokes draw() on each resulting feature renderer, after updating the MetalContext with the new MTL::RenderCommandEncoder.

Finally, I updated the createRenderable() method so that it adds the Renderable to the relevant feature renderer's list of managed Renderable objects.

This totally works for Scampi, since it uses the MetalRenderer directly, but remember that Alfredo uses two renderers: the MTKRenderer which forwards the draw() call to the MTK::View, which then forwards the call to the actual MetalRenderer. Just a couple of paragraphs above, I stated:

...the MetalRenderer class is responsible for setting the _quadFeatureRenderer to its own QuadFeatureRenderer implementation...

However, our MTKRenderer doesn't have any feature renderers to do this fragile initialization logic. Instead of fixing the fragility problem, I just deleted the MTKRenderer altogether. Instead, you can create a MetalRenderer with an autoDraw flag, which basically just does the exact same flow we used to do with our draw()/drawInternal() calls from the last chapter, except I've now named the internal call doDraw(), which is just as unhelpful.

In any case, after making that change, Alfredo also works. I can create a Renderable using either the QuadFeature or the TriangleFeature and get the correct outcome on the screen - but there are a lot of reasons to hate this design.

First of all, there are a lot of specific steps required to add a new "feature":

  1. Create the feature struct, but make sure it inherits from RenderFeature
  2. Add a member variable for the new feature renderer in the Renderer class
  3. Add the template specialization for getFeatureRenderer<T>() in the Renderer class
  4. For every supported rendering API:
    1. Create the feature renderer, inheriting from FeatureRenderer
    2. From the renderer which inherits from Renderer, set the new member variable to an instance of the new feature renderer
    3. In that same renderer's draw() method, call draw() on the new feature renderer

Secondly (and probably the real reason I'm dissatisfied), when I tried to implement a setFeature<T>() template method on the Renderable, I found that I needed to reference the Renderer. Because the Renderer already includes the Renderable, it turns into a cyclical dependency. Normally I would just forward-declare the Renderer class within Renderable.h, and then actually include the Renderer within Renderable.cpp. The problem is that the function which requires the Renderer is a template, and so must be defined within the header file.

I considered flipping the problem around, and forward-declaring the Renderable within Renderer.h, but the Renderer also uses the Renderable from within a template method!

Lastly, I have completely neglected to even create the proposed dispose() method of the Renderable. It just made sense that if I could get setFeature<T>() working, then dispose() would be trivial, since they both have to remove resources from some feature renderer anyway.

The Do-Over

This is a chaotic mess (I believe the technical term is "dumpster fire"), so I'm going to start over. I was thrashing a lot, and the excessive use of templates was unnecessarily complicating the implementation. This is a testament to what sleep deprivation and a lack of focus can do to a developer.

C++ templates are incredibly powerful, effectively allowing you to make compile-time optimizations for "generic" use cases. I put "generic" in quotes, because there are significant differences between generics and templates. This documentation from Microsoft does a pretty good job of conveying those differences.

Regardless of their power, using templates at this phase might actually be a premature optimization that I could just solve using object-oriented programming. I know this is a bit sacrilegious among a large portion of the C++ community, but the reality is that I'm nowhere close to any sort of performance bottleneck, and at this point I should be prioritizing readability and maintainability in my code. I'm not saying my final solution won't end up using templates, just that I need to hold off on utilizing them for now.

So I'll start by just deleting everything out of the Renderer and Renderable classes. I don't actually want to have to re-write any of the code that interacts with the Metal API, so I'll leave all of that alone for the time being.

I've added a create() method to the Renderer, which just constructs a Renderable using a provided RenderFeature pointer. This method was a template before, which utilized the templated setFeature<T>() method, which caused so many issues. Because RenderFeature is the base of all possible render features, I can just use a pointer to the base class. The setFeature() method is therefore no longer a template, and just accepts a pointer to a RenderFeature. In addition to a pointer to the feature, the Renderable class stores a unique identifier and a flag to indicate whether it's enabled.
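
Sketched out, that Renderable is now little more than a data holder. This is a rough approximation of the header - the include path and the getId() accessor are my own guesses, and I'm glossing over how feature changes get propagated back to the Renderer:

#pragma once

#include <cstdint>
#include <memory>

#include "features/RenderFeature.h"

namespace linguine {

class Renderable {
  public:
    Renderable(uint64_t id, std::shared_ptr<RenderFeature> feature)
        : _id(id), _feature(std::move(feature)) {}

    uint64_t getId() const { return _id; }

    bool isEnabled() const { return _enabled; }
    void setEnabled(bool enabled) { _enabled = enabled; }

    // No longer a template - any pointer to a RenderFeature-derived struct will do
    void setFeature(std::shared_ptr<RenderFeature> feature) { _feature = std::move(feature); }

  private:
    uint64_t _id;
    bool _enabled = true;
    std::shared_ptr<RenderFeature> _feature;
};

}  // namespace linguine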

One of the things that annoyed me about the previous implementation was the number of touch points required to support a new feature. To solve this, I'm adding a virtual getFeatureRenderers() method to the Renderer class, which will be implemented by each derived class. The MetalRendererImpl class just stores a std::vector<std::unique_ptr<FeatureRenderer>> member variable, and returns a const reference to it as the method implementation. Then, in the draw() method of the MetalRendererImpl, I just iterate over the feature renderers in the vector and invoke their draw() methods.
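
Roughly, that looks like this - a sketch showing only the parts relevant to the feature renderer lookup, and assuming the FeatureRenderer class described a bit further down:

#include <memory>
#include <vector>

class Renderer {
  public:
    virtual ~Renderer() = default;

  protected:
    // Each rendering API implementation owns and supplies its own feature renderers
    virtual const std::vector<std::unique_ptr<FeatureRenderer>>& getFeatureRenderers() const = 0;
};

class MetalRendererImpl : public Renderer {
  public:
    void draw() {
      // ...update the MetalContext with the current MTL::RenderCommandEncoder...

      for (const auto& featureRenderer : getFeatureRenderers()) {
        featureRenderer->draw();
      }
    }

  protected:
    const std::vector<std::unique_ptr<FeatureRenderer>>& getFeatureRenderers() const override {
      return _features;
    }

  private:
    std::vector<std::unique_ptr<FeatureRenderer>> _features;
};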

Here's the original list of steps required in order to support a new feature, modified to reflect the changes I've made:

  1. Create the feature struct, but make sure it inherits from RenderFeature
  2. ~~Add a member variable for the new feature renderer in the Renderer class~~
  3. ~~Add the template specialization for getFeatureRenderer<T>() in the Renderer class~~
  4. For every supported rendering API:
    1. Create the feature renderer, inheriting from FeatureRenderer
    2. From the renderer which inherits from Renderer, ~~set the new member variable to an instance of the new feature renderer~~ add it to the vector of FeatureRenderer pointers
    3. ~~In that same renderer's draw() method, call draw() on the new feature renderer~~

That is much better. One of the downsides is that there's no error state if a feature is not implemented by a specific renderer. The old code required you to specify the FeatureRenderer to use for every possible RenderFeature, using template specializations. I think I can live with this tradeoff because it should be blatantly obvious if something in the game isn't being rendered - the game itself displays the error state.

The big thing that is still missing is the ability for the FeatureRenderer classes to access the Renderable objects that they care about, and furthermore access the RenderFeature objects which they contain as the derived feature. What I mean is that the QuadFeatureRenderer doesn't care about Renderable or RenderFeature objects - it cares specifically about QuadFeature objects, which are currently stored as RenderFeature pointers.

For now, I'm going to do something that I absolutely hate. I'm going to add a getFeature<T>() template method to the Renderable class which attempts to dynamically cast the RenderFeature pointer to whichever type we request. If the result is a nullptr then the cast failed, but that should never be the case, because of the next step.

I'm also going to add an onFeatureChanged(Renderable&) method to the Renderer, which will iterate over all the FeatureRenderer objects returned by getFeatureRenderers(), and call an onFeatureChanged(Renderable&) method, which I'll also add to the FeatureRenderer class.

In the FeatureRenderer, I'll keep a vector of Renderable pointers, and only keep track of the Renderable objects that the specific FeatureRenderer implementation cares about. When a feature is changed for any given Renderable, I'll add it or remove it from the vector as needed, using a pure virtual isRelevant(Renderable&) method, which will need to be implemented by each FeatureRenderer implementation.

For the QuadFeatureRenderer, the isRelevant() method will return true if renderable.getFeature<QuadFeature>() does not return a nullptr. Likewise, the TriangleFeatureRenderer will check for the TriangleFeature. I don't think it makes sense for the FeatureRenderer to be responsible for the null-check, so I'll add a hasFeature<T>() template function to the Renderable class that does the check for me.
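
Both helpers end up being near one-liners on the Renderable - a sketch, assuming the feature is stored as a std::shared_ptr<RenderFeature> member named _feature:

template <typename T>
std::shared_ptr<T> getFeature() const {
  // Returns nullptr if the stored feature isn't actually a T
  return std::dynamic_pointer_cast<T>(_feature);
}

template <typename T>
bool hasFeature() const {
  return getFeature<T>() != nullptr;
}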

Finally, I'll add a protected getRenderables() method to the FeatureRenderer base class that returns a reference to the vector of Renderable objects. With that, I can iterate over the Renderable objects that are relevant, and call the rendering API functions required.
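
Putting the last few paragraphs together, the FeatureRenderer base class comes out looking roughly like this - a sketch in which I'm storing raw Renderable pointers and glossing over ownership:

#include <algorithm>
#include <vector>

class FeatureRenderer {
  public:
    virtual ~FeatureRenderer() = default;

    // Called whenever a Renderable's feature changes (including when it's first created)
    void onFeatureChanged(Renderable& renderable) {
      auto it = std::find(_renderables.begin(), _renderables.end(), &renderable);

      if (isRelevant(renderable)) {
        // Start tracking it if we aren't already
        if (it == _renderables.end()) {
          _renderables.push_back(&renderable);
        }
      } else if (it != _renderables.end()) {
        // No longer relevant to this feature renderer
        _renderables.erase(it);
      }
    }

    virtual void draw() = 0;

  protected:
    // e.g. the QuadFeatureRenderer returns renderable.hasFeature<QuadFeature>()
    virtual bool isRelevant(Renderable& renderable) = 0;

    const std::vector<Renderable*>& getRenderables() const {
      return _renderables;
    }

  private:
    std::vector<Renderable*> _renderables;
};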

I'll modify the Engine class to test out what I have so far. I'll create a single _renderable member variable, and add this to my constructor:

auto feature = std::make_shared<QuadFeature>();
_renderable = _renderer->create(feature);

It works! I can see a white quad as expected from the code I wrote in the QuadFeatureRenderer. I'll just change the feature to a TriangleFeature instead, and... it also works! There is a white triangle in the middle of the screen, according to how I wrote the TriangleFeatureRenderer.

Now for something a bit more complex. I'll change my constructor back to using a quad feature, but in my update() method, I'll switch the feature to a triangle feature after the first second has passed:

void linguine::Engine::update(float deltaTime) {
  _dtAccumulator += deltaTime;
  _updateCounter++;

  while (_dtAccumulator >= 1.0f) {
    _logger->log("update(): " + std::to_string(_updateCounter) + " fps");

    _dtAccumulator -= 1.0f;
    _updateCounter = 0;

    if (_renderable->hasFeature<QuadFeature>()) {
      auto feature = std::make_shared<TriangleFeature>();
      _renderable->setFeature(feature);
    }
  }
}

Remember that the while loop "ticks" for every second that has passed, and reports our current framerate. I just added a quick conditional which replaces the quad with a triangle the first time that this loop is hit. After that, _renderable->hasFeature<QuadFeature>() should just always return false.

As anticipated, the application opens and displays a quad on the screen, and after one second, the quad disappears and a triangle appears in its place. Excellent!

Feature Data

Now for the real test. I need to be able to set data on my RenderFeature which gets utilized in the rendering logic of the associated FeatureRenderer. To demonstrate this, I'll add a single float variable to the QuadFeature struct, which I'll simply name value. This value will range from 0.0 to 1.0, representing how red I want the quad to be - at 0.0, the quad should be completely black, and at 1.0, the quad should be a pure red color.
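
At this stage the struct is about as simple as a feature can get - a sketch of what I just described:

struct QuadFeature : public RenderFeature {
  // 0.0 = completely black, 1.0 = pure red
  float value = 0.0f;
};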

Before I can dynamically change the color of the quad, I'll first need to write the rendering logic to enable dynamic coloring. I'll accomplish this using a separate uniform buffer for each Renderable that the feature renderer has registered, which means I'll need to store a vector of MTL::Buffer pointers as well (and release() them later).

To test out the changes to the shader code, I'll just hard-code a single uniform buffer for now. The vertex shader function looks like this:

VertexOutput vertex vertexColored(uint index [[vertex_id]],
    const device float2* positions [[buffer(0)]],
    const constant float& value [[buffer(1)]]) {
  VertexOutput o;
  o.position = float4(positions[index], 0.0, 1.0);
  o.color = half3(value, 0.0, 0.0);
  return o;
}

Using constant instead of device indicates that the value is constant for all vertices. Furthermore, defining our float as a reference (&) instead of a pointer (*) forces it to be a single value rather than enabling array access syntax (who knows what might happen if we were to read into memory beyond that of our actual float value?).

Construction of the uniform buffer is as straightforward as you might think:

float value = 1.0f;

_testUniform = _context.device->newBuffer(sizeof(float), MTL::ResourceStorageModeShared);
memcpy(_testUniform->contents(), &value, sizeof(value));
_testUniform->didModifyRange(NS::Range::Make(0, sizeof(value)));

Activating the buffer before the draw call is also simple:

_context.renderCommandEncoder->setVertexBuffer(_testUniform, 0, 1);
_context.renderCommandEncoder->drawPrimitives(MTL::PrimitiveType::PrimitiveTypeTriangle, NS::UInteger(0), NS::UInteger(6));

Indeed the quad is now red. If I adjust the value being copied into the buffer, then I get different shades of red as intended. Now I can adjust this to work for each Renderable. I'll create a vector of MTL::Buffer pointers named _valueBuffers, and I'll lazily create enough buffers to match the number of Renderable objects that I'm managing. As I'm iterating over the Renderable objects, I'll copy the feature value into a buffer of the matching index, and activate that buffer prior to making the draw call.

Here is the QuadFeatureRenderer's draw() method:

void linguine::render::QuadFeatureRenderer::draw() {
  _context.renderCommandEncoder->setRenderPipelineState(_pipelineState);
  _context.renderCommandEncoder->setVertexBuffer(_vertexPositionsBuffer, 0, 0);

  const auto renderables = getRenderables();

  while (_valueBuffers.size() < renderables.size()) {
    _valueBuffers.push_back(_context.device->newBuffer(sizeof(float), MTL::ResourceStorageModeShared));
  }

  for (int i = 0; i < renderables.size(); ++i) {
    auto renderable = renderables[i];

    if (renderable && renderable->isEnabled()) {
      auto feature = renderable->getFeature<QuadFeature>();

      auto valueBuffer = _valueBuffers[i];
      memcpy(valueBuffer->contents(), &feature->value, sizeof(feature->value));
      valueBuffer->didModifyRange(NS::Range::Make(0, sizeof(feature->value)));

      _context.renderCommandEncoder->setVertexBuffer(valueBuffer, 0, 1);
      _context.renderCommandEncoder->drawPrimitives(MTL::PrimitiveType::PrimitiveTypeTriangle, NS::UInteger(0), NS::UInteger(6));
    }
  }
}

If I run my application now, I see a black screen for one second, prior to the white triangle being shown - but this is actually expected, because the default value of my QuadFeature's value is 0.0, so I'm actually seeing a black quad!

Just to be sure, I'll add feature->value = 0.5f to my Engine constructor, which results in a dark-red quad being shown prior to the triangle.

As a final test, I'll remove that line, and instead I'll add the following to my update() method:

if (_renderable->hasFeature<QuadFeature>()) {
  auto feature = _renderable->getFeature<QuadFeature>();
  feature->value += deltaTime;

  if (feature->value > 1.0f) {
    feature->value = 1.0f;
  }
}

This little chunk of code will make my quad "fade in" from black to red, prior to it being completely replaced by the triangle.

Fade In

4.2 Operating System Updates

I really hope this section is short. I've been putting off updating my laptop for a couple of months now, for fear of screwing up my development environment. I've been getting a notification once a day to update to macOS 13.2 "Ventura", and like most people, I've been repeatedly clicking the "Remind me later" option - my younger self would be ashamed of what I've become!

I'm typing this while I wait for the rather large download. Once the update finishes, I'll open up CLion again and do a clean build for both Alfredo and Scampi to make sure everything continues to work as intended.


As far as I can tell, everything is working as expected, so I'll keep moving forward. Interestingly, Alfredo's framerate with v-sync disabled is way higher than it was before. Previously, it was hovering around 180 frames per second. After the update, it's running somewhere between 1,500 and 2,000 frames per second. I'm not sure why that changed, but I'm not complaining.

4.3 Camera Projections

Earlier I described how to utilize matrix multiplication to render objects relative to one another in the world, from the perspective of a virtual camera. I intend to implement this technique in the QuadFeatureRenderer.

In the last chapter, I mentioned that I'd be temporarily using Apple's <simd/simd.h> header for vectorized types within the Metal-specific rendering implementation, which makes sense, considering Metal can only be used on Apple devices. The whole point of the RenderFeature abstraction, however, is to pass arbitrary data from the platform-agnostic engine code into the renderer, such that it gets utilized in a platform-specific way. Since the features are intended to be platform-agnostic, I need to choose a different mechanism for storing vectorized types.

Historically I've used GLM for all of my projects. GLM was originally designed to be used with OpenGL, but has been updated to also make it easily compatible with Vulkan. GLM is mostly compatible with Metal, except for a few very particular cases, which I will have to be mindful of when implementing the FeatureRenderer classes. This Stack Overflow answer does a fantastic job of explaining the gotchas of using GLM with Metal. In particular, I'm concerned about the alignment requirements of 4x4 matrices.

Worst case, I can copy the data from a GLM type into a simd.h type, which will ensure compatibility with Metal shaders, albeit at the cost of an additional copy.

There are many alternatives to GLM, but it's definitely one of the most widely used, and it has first-class support for ARM64 CPUs, such as the ones that are in my laptop and phone. Furthermore, by using it, I won't have to learn yet another new API - so I'm rolling with it.

I've downloaded the latest published release of GLM, 0.9.9.8, and extracted the zip file into my third_party/ directory. Since GLM is a header-only library, I'll simply add "${THIRD_PARTY_DIR}/glm" to the include directories for the linguine target.

As a quick test, I've removed the inclusion of <simd/simd.h> from my QuadFeatureRenderer and TriangleFeatureRenderer classes. This obviously caused compiler errors in places where I was using simd::float2. I simply included <glm/vec2.hpp> instead, and replaced the references to simd::float2 with glm::vec2.

That worked just fine. Easy! Now I can get to work with matrices.

The Model Matrix

Up to this point, the QuadFeature struct has just contained a single float, representative of how "red" the resulting quad should be. When we use the shader defined in QuadFeatureRenderer, the vertices are drawn relative to the center of the screen. The center of the screen is bound to the center of the model. In order to move our model around the screen, we could update our model's vertices, or, as I described earlier, we could multiply each vertex by a model matrix inside of the shader. Doing so would convert each vertex into "world space", which means the center of the screen is instead bound to the center of the world.

Using the render feature abstraction, I should be able to add a model matrix to the QuadFeature, increase the size of each uniform buffer in the QuadFeatureRenderer, copy the model matrix into the uniform buffer, and utilize it within the shader. It's important to note that in matrix multiplication, unlike regular multiplication, the order in which you multiply the values actually matters. I won't go into the mathematical reasons why, but if you are interested, you should definitely Google it to learn more. So let's give this a shot.

linguine/include/renderer/features/QuadFeature.h

#pragma once

#include "RenderFeature.h"

#include <glm/mat4x4.hpp>

namespace linguine {

struct QuadFeature : public RenderFeature {
  glm::mat4 modelMatrix = glm::mat4(1.0f);
  float value = 0.0f;
};

}  // namespace linguine

Metal shader code within renderers/metal/src/features/QuadFeatureRenderer.cpp

struct QuadFeature {
  metal::float4x4 modelMatrix;
  float value;
};

struct VertexOutput {
  float4 position [[position]];
  half3 color;
};

VertexOutput vertex vertexColored(uint index [[vertex_id]],
    const device float2* positions [[buffer(0)]],
    const constant QuadFeature& feature [[buffer(1)]]) {
  VertexOutput o;
  o.position = feature.modelMatrix * float4(positions[index], 0.0, 1.0);
  o.color = half3(feature.value, 0.0, 0.0);
  return o;
}

half4 fragment fragmentColored(VertexOutput in [[stage_in]]) {
  return half4(in.color, 1.0);
}

renderers/metal/src/features/QuadFeatureRenderer.cpp draw() definition

void QuadFeatureRenderer::draw() {
  _context.renderCommandEncoder->setRenderPipelineState(_pipelineState);
  _context.renderCommandEncoder->setVertexBuffer(_vertexPositionsBuffer, 0, 0);

  const auto renderables = getRenderables();

  while (_valueBuffers.size() < renderables.size()) {
    _valueBuffers.push_back(_context.device->newBuffer(sizeof(glm::mat4) + sizeof(float), MTL::ResourceStorageModeShared));
  }

  for (int i = 0; i < renderables.size(); ++i) {
    auto renderable = renderables[i];

    if (renderable && renderable->isEnabled()) {
      auto feature = renderable->getFeature<QuadFeature>();

      auto valueBuffer = _valueBuffers[i];
      memcpy(valueBuffer->contents(), &feature->modelMatrix, sizeof(glm::mat4));
      memcpy(static_cast<std::byte*>(valueBuffer->contents()) + sizeof(glm::mat4), &feature->value, sizeof(float));
      valueBuffer->didModifyRange(NS::Range::Make(0, sizeof(glm::mat4) + sizeof(float)));

      _context.renderCommandEncoder->setVertexBuffer(valueBuffer, 0, 1);
      _context.renderCommandEncoder->drawPrimitives(MTL::PrimitiveType::PrimitiveTypeTriangle, NS::UInteger(0), NS::UInteger(6));
    }
  }
}

There's a little bit of grossness involved, since I'm doing some pointer arithmetic. I'm explicitly copying individual components of the feature into the buffer for a couple of reasons:

  • To ensure that I don't screw up the alignment requirements for the Metal shader.
  • QuadFeature inherits from RenderFeature, so its memory layout isn't as simple as the sum of its components.
  • I could first construct a struct that matches what Metal expects, and copy that into the buffer instead. However, that would actually involve more copying. I'd have to profile the application to see which solution works best.

I can't simply convert QuadFeature into a plain old struct without inheritance, since we're currently relying on storing the RenderFeature pointer polymorphically in the Renderable class.

I do have an idea though: I can declare the Metal version of the struct, and statically cast the void pointer returned by valueBuffer->contents() to a pointer to the new struct instead. Then I can copy each component from the RenderFeature to the Metal-specific feature, rather than doing any pointer arithmetic, while also ensuring the alignment is correct.

MetalQuadFeature declaration in renderers/metal/src/features/QuadFeatureRenderer.h

struct MetalQuadFeature {
  simd::float4x4 modelMatrix{};
  float value{};
};

Updated draw() logic in renderers/metal/src/features/QuadFeatureRenderer.cpp

auto valueBuffer = _valueBuffers[i];
auto metalQuadFeature = static_cast<MetalQuadFeature*>(valueBuffer->contents());

memcpy(&metalQuadFeature->modelMatrix, &feature->modelMatrix, sizeof(simd::float4x4));
memcpy(&metalQuadFeature->value, &feature->value, sizeof(float));
valueBuffer->didModifyRange(NS::Range::Make(0, sizeof(MetalQuadFeature)));

I definitely like that better, though I may just revisit the whole idea of features having to inherit from RenderFeature later.

Currently, everything runs exactly as it did before, because the model matrix is set to the "identity matrix" - a matrix which, when multiplied against any vector or matrix, leaves it unchanged. It's the matrix equivalent of multiplying by 1, which is essentially a "no-op".

I should be able to perform a translation of the matrix over time to make the quad appear to move. I'll remove our 1-second "convert to triangle" logic, and continuously translate the model matrix up the Y-axis of the world.

Excerpt from linguine/src/Engine.cpp update(float deltaTime) definition

if (_renderable->hasFeature<QuadFeature>()) {
  auto feature = _renderable->getFeature<QuadFeature>();
  feature->value += deltaTime;

  if (feature->value > 1.0f) {
    feature->value = 1.0f;
  }

  feature->modelMatrix = glm::translate(feature->modelMatrix, glm::vec3(0.0f, 0.25f, 0.0f) * deltaTime);
}

Model Matrix

The View Matrix

Using model matrices is all about defining where each object is relative to the center of the world. The next step is to use a view matrix so that we can define where each object is relative to the camera, which itself is positioned relative to the center of the world.

As I mentioned before, we can accomplish this by constructing a model matrix for the camera object, translating it based on the desired position of the camera, and then inverting it. When we multiply this inverted matrix by our existing object's model matrix, the result is a matrix that can adjust the vertices of our model according to the camera's relative position to our object. It really is a little mind-blowing if you've never learned or thought about this stuff before, but it really makes perfect sense mathematically.

There is an alternative method of constructing a view matrix: GLM's lookAt function takes in parameters for the camera's position, the point on which you wish the camera to focus, as well as a vector that defines what "up" means to your camera. This method is so commonly mentioned in OpenGL tutorials and forums that I didn't even realize there was another method. It was only after building a couple of toy game engines that I stumbled across the understanding of inverting the camera's model matrix on my own. You can either perceive that as an enlightened developer making sense of the world around him, or an ignorant kid blindly doing what a tutorial told him without understanding what the code was doing. I'll leave that up to you.
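
For comparison, here's roughly what the two approaches look like side by side (illustrative values, assuming <glm/gtc/matrix_transform.hpp> is included) - both produce a usable view matrix:

// Approach 1: build the camera's model matrix, then invert it
auto cameraModelMatrix = glm::translate(glm::mat4(1.0f), glm::vec3(0.0f, 0.0f, -5.0f));
auto viewMatrix = glm::inverse(cameraModelMatrix);

// Approach 2: glm::lookAt - camera position, focal point, and which way is "up"
auto lookAtViewMatrix = glm::lookAt(glm::vec3(0.0f, 0.0f, -5.0f),
                                    glm::vec3(0.0f, 0.0f, 0.0f),
                                    glm::vec3(0.0f, 1.0f, 0.0f));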

I'll be using the inverse model matrix for my renderer, but there's a very important piece missing from this equation: the camera object. Implementing the camera, like many of the problems I've already encountered, can be as simple or as difficult as you want it to be. Some engines allow the use of multiple cameras to render scenes from several different angles at the same time. A subset of those engines allow each of the cameras to render objects differently. A clever example of this is the minimap of an RTS game, which is actually a texture rendered by an overhead camera. The camera looks down upon the entire battlefield, but rather than rendering the models and textures of each unit, it simply renders colored pixels representing the team with which the units are associated.

In the spirit of getting this thing up and running as quickly as possible, I'll just support a single camera. I'll start by adding a Camera struct, containing the view matrix. I'll add a member variable to the Renderer class to keep the camera around, and a getCamera() method to access it.

linguine/include/renderer/Camera.h

#pragma once

#include <glm/mat4x4.hpp>

namespace linguine {

struct Camera {
  glm::mat4 viewMatrix = glm::mat4(1.0f);
};

}  // namespace linguine

Within the QuadFeatureRenderer, I've updated the shader to utilize a new uniform buffer containing the view matrix. This matrix is inside of a MetalCamera struct, because between you and me, I know that I'll be passing in more camera-related data later.

Metal shader code within renderers/metal/src/features/QuadFeatureRenderer.cpp

struct MetalCamera {
  metal::float4x4 viewMatrix;
};

struct MetalQuadFeature {
  metal::float4x4 modelMatrix;
  float value;
};

struct VertexOutput {
  float4 position [[position]];
  half3 color;
};

VertexOutput vertex vertexColored(uint index [[vertex_id]],
    const device float2* positions [[buffer(0)]],
    const constant MetalCamera& camera [[buffer(1)]],
    const constant MetalQuadFeature& feature [[buffer(2)]]) {
  VertexOutput o;
  o.position = camera.viewMatrix * feature.modelMatrix * float4(positions[index], 0.0, 1.0);
  o.color = half3(feature.value, 0.0, 0.0);
  return o;
}

half4 fragment fragmentColored(VertexOutput in [[stage_in]]) {
  return half4(in.color, 1.0);
}

The QuadFeatureRenderer constructor now requires a reference to the camera, which the MetalRenderer can pass via the getCamera() method. The constructor also allocates the new uniform buffer.

Finally, in the draw() method, after activating the vertex buffer, but before iterating over the Renderable objects, I'm copying the camera's view matrix into the new buffer and activating it.

Addition to draw() in renderers/metal/src/features/QuadFeatureRenderer.cpp

auto metalCamera = static_cast<MetalCamera*>(_cameraBuffer->contents());
memcpy(&metalCamera->viewMatrix, &_camera.viewMatrix, sizeof(simd::float4x4));
_cameraBuffer->didModifyRange(NS::Range::Make(0, sizeof(MetalCamera)));
_context.renderCommandEncoder->setVertexBuffer(_cameraBuffer, 0, 1);

Side note: I bound this new uniform to buffer 1, and switched the QuadFeature uniform to buffer 2. I adjusted the call to activate the QuadFeature uniform accordingly.

If I run this as-is, everything is exactly the same as it was before. We'll need to adjust the view matrix to see any real results. So in the Engine constructor, we'll add this bit of code:

auto camera = _renderer->getCamera();
auto cameraModelMatrix = glm::mat4(1.0f);
cameraModelMatrix = glm::translate(cameraModelMatrix, glm::vec3(-0.75f, 0.0f, 0.0f));
camera->viewMatrix = glm::inverse(cameraModelMatrix);

This code expresses the intent to move the camera 0.75 units to the left, which should make our quad appear on the right side of the screen. Let's give it a shot.

View Matrix

The Projection Matrix

The projection matrix is really the piece that brings everything together. You might remember that we defined the vertices for our quad as two triangles, in which every vertex is 0.5 units from the center of the model along each axis. Then why is our quad currently a rectangle instead of a square? Because that's the shape of the window! Up to this point, "1 unit" was half of the surface, and the fact that our window is taller than it is wide means "1 unit" is different depending on which dimension you're referencing.

A camera's projection will adjust the positions of the rendered vertices so that the screen is representative of the objects within the camera's frustum. That, by itself, does not necessarily make our quads perfectly square as we might expect. It depends on how we arrange our projection matrix. There are two commonly used projections: orthographic and perspective.

Orthographic projections are well-suited for 2-dimensional games because the camera's near and far planes are the same size, so depth has no effect on apparent size - objects closer to the camera appear to be the same size as those far away, as long as they are both within the planes of the frustum. Orthographic projection matrices are constructed by defining the size of the camera's "lens". The resulting image can be thought of as the screen being bound to the edges of that lens. If the lens happens to be the same size as the window, then "1 unit" will appear to be the same vertically and horizontally.

Perspective projections are used in 3-dimensional games because they illustrate depth by using a field of view, given the aspect ratio of the window. Many of us learned in elementary school art class that objects which are close appear to be bigger than objects that are far away - this is precisely what perspective projections convey to the player. Because the projection uses the aspect ratio of the window, "1 unit" will appear to be the same vertically and horizontally, as long as the depth of the object is constant.

For my game (especially during the prototyping phase), I'll just use an orthographic projection. Updating the shader code is somewhat trivial, as is copying the camera's projection matrix into the uniform buffer. By setting the default value of the camera's projection matrix to the identity matrix, I can verify everything is wired up correctly by running the application and witnessing that nothing has changed. If I set the default value of the projection matrix to an all-zero matrix, then nothing shows up on the screen. This validates that the contents of the matrix are being copied into the uniform buffer and utilized by the vertex shader.
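
For reference, those changes amount to something like this - a sketch only, since the exact layout of the Metal-side struct is my own guess here:

// linguine/include/renderer/Camera.h
struct Camera {
  glm::mat4 viewMatrix = glm::mat4(1.0f);
  glm::mat4 projectionMatrix = glm::mat4(1.0f);
};

// renderers/metal/src/features/QuadFeatureRenderer.h
struct MetalCamera {
  simd::float4x4 viewMatrix{};
  simd::float4x4 projectionMatrix{};
};

// In draw(), both matrices get copied before the camera buffer is activated;
// the vertex shader then computes projectionMatrix * viewMatrix * modelMatrix * position
memcpy(&metalCamera->viewMatrix, &_camera.viewMatrix, sizeof(simd::float4x4));
memcpy(&metalCamera->projectionMatrix, &_camera.projectionMatrix, sizeof(simd::float4x4));
_cameraBuffer->didModifyRange(NS::Range::Make(0, sizeof(MetalCamera)));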

There is at least one mechanism that I need before I can construct the projection matrix: determining the window's size and aspect ratio. To resolve that, I'm just going to make a Viewport class, which will be accessible through a new getViewport() method in the Renderer. In the AlfredoViewDelegate and ScampiViewDelegate classes, we can update the size of the viewport within the mtkView:drawableSizeWillChange: method.
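
Here's a minimal sketch of the Viewport class I have in mind (the setSize() method and the default dimensions are my own invention):

#pragma once

namespace linguine {

class Viewport {
  public:
    float getWidth() const { return _width; }
    float getHeight() const { return _height; }

    float getAspectRatio() const { return _width / _height; }

    // Called by the platform-specific view delegates when the drawable size changes
    void setSize(float width, float height) {
      _width = width;
      _height = height;
    }

  private:
    float _width = 1.0f;
    float _height = 1.0f;
};

}  // namespace linguine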

To ensure that the projection matrix is updated whenever the viewport changes, we'll need to update the projection matrix every single frame within the update() method of the Engine:

const auto height = 5.0f;
const auto viewport = _renderer->getViewport();

auto camera = _renderer->getCamera();
camera->projectionMatrix = glm::ortho(
    -height / 2.0f * viewport->getAspectRatio(),
    height / 2.0f * viewport->getAspectRatio(),
    -height / 2.0f,
    height / 2.0f,
    0.0f,
    10.0f
);

The "height" is the vertical height of the camera's frustum - because it's orthographic, this height is consistent regardless of depth. In a perspective camera, the height of the frustum gets larger the further away from the camera you get. The parameters of glm::orth are, in order: left, right, bottom, top, zNear, zFar. We divide the height by 2.0f everywhere so that we can define these edges in terms of distance from the camera's location. Defining zNear as 0.0f allows us to see everything directly in front of the camera's location - something that is actually not possible with perspective cameras, which I don't want to get into right now. Defining zFar as 10.0f means that the camera's frustum will contain all objects up to 10 units away from the camera.

The width is derived from the viewport's aspect ratio so that 1 unit of height appears consistent with 1 unit of width on the screen. That means our quad should now appear to be a square!

Because the height is defined to be 5.0f, that means we should expect to see 5 units worth of objects within the vertical space of our screen. Because our quad is defined to be 1 unit tall, then we should be able to perfectly fit 5 stacked quads within our window.

We could also define our projection matrix in terms of a desired width by dividing the bottom and top parameters by the aspect ratio, instead of multiplying the left and right parameters by the aspect ratio.

Logically, we would expect this code to work - but it doesn't. Luckily, I know exactly why. Remember that Stack Overflow answer detailing the items to look out for when using GLM with Metal? It specifically mentions that GLM projection matrices are calculated assuming that the normalized device coordinates (NDC) will have a depth ranging from -1 to 1, which is how OpenGL works. However, rendering APIs such as Metal and Vulkan use an NDC with a depth ranging from 0 to 1. In order to support Vulkan, GLM added the ability to #define GLM_FORCE_DEPTH_ZERO_TO_ONE to override this behavior - and luckily this would work for Metal as well!

Unfortunately, our current project structure does not lend itself well to platform-specific defines like this. The code that cares about specific projection matrix formats lives in a library named metal, but the code that actually constructs the projection matrix lives in a library named linguine. If I add the compile definition to metal, then it won't actually change the projection matrix that is constructed by linguine, because these libraries are compiled separately, and linked into the final executable (alfredo or scampi). I could add the definition to linguine, but I don't actually want to do that, because that would introduce a platform-specific detail into its compiled output, which would cause problems if I later wanted to implement an OpenGL renderer, for instance.

Most C and C++ projects that I've worked on are littered with giant #ifdef blocks which behave differently for different platforms, and I absolutely hate that style of programming. It makes more sense in the case of C, which is not object-oriented, and so platform-specific implementations often involve a lot of duplicate code, which somewhat justifies the use of these compile-time conditionals. Since C++ is object-oriented, it's much easier to abstract out platform-specific details like this into specific per-platform implementations, which is what I've attempted to do here.

I should be able to resolve this by adding a compile definition to my root CMakeLists.txt file, since it's responsible for composing the specific sub-project inclusion. I wish I could put it directly into the metal sub-project's CMakeLists.txt using target_compile_definitions(), but that would only propagate the definition up to alfredo and scampi, not back down to linguine. Therefore, I'll have to do it at the global level. It's not such a terrible thing, since the top-level CMakeLists.txt already knows that Alfredo and Scampi use Metal, due to the compositional nature of the file. While I'm at it, I'm going to set GLM_FORCE_LEFT_HANDED as well, so that I can use my preference of a left-handed coordinate system, rather than GLM's right-handed default.

I'll also need to rearrange the order of adding the sub-directories so that the global definitions get propagated properly.

cmake_minimum_required(VERSION 3.21)
project(linguine)

set(CMAKE_CXX_STANDARD 17)

set(THIRD_PARTY_DIR "${CMAKE_SOURCE_DIR}/third_party")
set(RENDERERS_DIR "${CMAKE_SOURCE_DIR}/renderers")

add_compile_definitions(GLM_FORCE_LEFT_HANDED)

if (APPLE)
    add_compile_definitions(GLM_FORCE_DEPTH_ZERO_TO_ONE)

    if (IOS)
        add_subdirectory(scampi)
    else()
        add_subdirectory(alfredo)
    endif()

    add_subdirectory("${RENDERERS_DIR}/metal")
endif()

add_subdirectory(linguine)

Just like that, our quad is now a square moving its way up in the world! As one last optimization, I'll add a viewProjectionMatrix member to the Camera, and pre-multiply the view and projection matrices together. This way, I can reduce the amount of data copied into the uniform buffer, and reduce the number of instructions that the shader has to perform for every vertex.
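
That last optimization boils down to a single line wherever the camera matrices are updated - a sketch, assuming the new member is simply named viewProjectionMatrix:

// Combine once per frame on the CPU so the shader performs a single matrix multiply per vertex
camera->viewProjectionMatrix = camera->projectionMatrix * camera->viewMatrix;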

Projection Matrix

4.4 Conveying Depth

Thus far, I've refrained from referring to the game engine as "2D" or "3D". The fact of the matter is that, mathematically, it can support both types of games. I've only drawn 2-dimensional shapes because 3-dimensional objects don't look very good without lighting, unless you specifically make every surface of the shape a different color, which is somewhat tedious and has nothing to do with getting to the prototyping phase.

It can be easy to think that a 2D engine wouldn't require any 3D logic, but in reality, most modern 2D games have some concept of depth to them. Whether that's making sure your player character is always "in front" of your background, or implementing a parallax scrolling effect with layered background images, 2D game engines often utilize the third dimension in some way.

Even though my engine is capable of drawing geometry with varying depth values, it doesn't actually know when to discard fragments which are supposedly "behind" existing fragments. As it is implemented, whichever object is drawn last will always be rendered in front, regardless of their Z values. This "depth checking" does not come for free. To illustrate this, I'll add a blue quad that should be rendered behind the red one.

Obviously my QuadFeature is currently only capable of rendering shades of red. I'll replace its float value with a glm::vec3 color, and adjust my feature renderer to utilize the color in the shader.
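
The updated struct looks something like this (defaulting the color to white is my own choice here):

struct QuadFeature : public RenderFeature {
  glm::mat4 modelMatrix = glm::mat4(1.0f);
  glm::vec3 color = glm::vec3(1.0f);
};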

In creating a second renderable, I've discovered a bug! In the process of re-writing the Renderer, I neglected to give each Renderable a unique ID in the create() method. The ID was simply hard-coded to 0, so when I created a second renderable with the same feature, the feature renderer thought that it already knew about the renderable with that ID. If I had used a separate feature, then the QuadFeatureRenderer would have incorrectly unregistered the existing renderable. I'll just add a _nextId member to the Renderer class and increment it every time a new Renderable is created.
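
The fix itself is tiny - roughly this, with the shared_ptr return type and the registration call being assumptions on my part:

std::shared_ptr<Renderable> Renderer::create(const std::shared_ptr<RenderFeature>& feature) {
  // Hand out a unique, monotonically-increasing ID instead of the hard-coded 0
  auto renderable = std::make_shared<Renderable>(_nextId++, feature);

  // Let each feature renderer decide whether it cares about the new renderable
  onFeatureChanged(*renderable);

  return renderable;
}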

Now that I can have multiple renderable objects, I'll set the camera's location back to the center of the world, and place the renderables in its vision.

Snippet from the Engine constructor

// Set the camera's position to the center of the world
auto camera = _renderer->getCamera();
auto cameraModelMatrix = glm::mat4(1.0f);

// Update the view matrix with the new camera position
camera->viewMatrix = glm::inverse(cameraModelMatrix);

// Create the first renderable's feature
auto feature = std::make_shared<QuadFeature>();

// Set the position to { 0.0f, 0.0f, 1.0f }
feature->modelMatrix = glm::translate(feature->modelMatrix, glm::vec3(0.0f, 0.0f, 1.0f));

// Set the color to red
feature->color = glm::vec3(1.0f, 0.0f, 0.0f);

// Create the first renderable
_renderable = _renderer->create(feature);

// Create the second renderable's feature
auto feature2 = std::make_shared<QuadFeature>();

// Set the position to { 0.0f, 0.0f, 0.5f }
feature2->modelMatrix = glm::translate(feature2->modelMatrix, glm::vec3(0.0f, 0.0f, 0.5f));

// Set the color to blue
feature2->color = glm::vec3(0.0f, 0.0f, 1.0f);

// Create the second renderable
_renderable2 = _renderer->create(feature2);

In the code, I'm setting the position of the blue quad to be closer to the camera than the red quad, and it shows up correctly. However, if I instead translate the blue quad by glm::vec3(0.0f, 0.0f, 1.5f), then you would expect the red quad to be "in front". This, however, is not the case. In fact, even though the position of the blue quad has changed, the result doesn't look any different than before.

Incorrect Depth Rendering

If I swap the order in which the red and blue quads are created, then the red quad always appears in front of the blue one. This is because my implementation currently stores the renderables in a vector in the order in which they were created, so the order in which they are drawn is very easy to predict.

Solving the Problem

So far, I've only drawn to a single image, which gets presented directly to the user. The solution to these depth issues actually involves "drawing" to an offscreen image, and then configuring the pipeline to read from that image in order to detect if a "closer" fragment was already drawn to it. I say "draw" in quotes because we're not actually rendering any color data to this offscreen image; we'll be storing a single value representative of how far away the object is from the camera. If you wanted to view this image, you would have to choose what color that single-channel data should be rendered as. Depth buffers are commonly viewed in grayscale for debugging purposes, which means the value in the depth buffer is utilized as the red, green, and blue components.

Apple's documentation perfectly lays out the steps necessary to utilize a depth buffer using the Metal API, so this should be fairly simple. Let's see if I can itemize these changes:

  • For the MTK::View:
    • setDepthStencilPixelFormat() (I'll use MTL::PixelFormatDepth32Float for now)
    • setClearDepth() to 1.0f
  • For the MTL::RenderPipelineDescriptor:
    • setDepthAttachmentPixelFormat() to the same MTL::PixelFormatDepth32Float
  • Construct a MTL::DepthStencilState:
    • Define a MTL::DepthStencilDescriptor with:
      • setDepthCompareFunction() to MTL::CompareFunction::CompareFunctionLessEqual
      • setDepthWriteEnabled() to true
  • In the draw() method for QuadFeatureRenderer:
    • Set the MTL::DepthStencilState after setting the MTL::RenderPipelineState
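
Translated into metal-cpp calls, those changes look roughly like this (a sketch following Apple's documentation; member names like _depthState are placeholders of mine):

// On the MTK::View (done once, during setup)
view->setDepthStencilPixelFormat(MTL::PixelFormatDepth32Float);
view->setClearDepth(1.0);

// On the MTL::RenderPipelineDescriptor, so the pipeline knows about the depth attachment
pipelineDescriptor->setDepthAttachmentPixelFormat(MTL::PixelFormatDepth32Float);

// Build the MTL::DepthStencilState (also once, during setup)
auto depthStencilDescriptor = MTL::DepthStencilDescriptor::alloc()->init();
depthStencilDescriptor->setDepthCompareFunction(MTL::CompareFunction::CompareFunctionLessEqual);
depthStencilDescriptor->setDepthWriteEnabled(true);

_depthState = _context.device->newDepthStencilState(depthStencilDescriptor);
depthStencilDescriptor->release();

// In QuadFeatureRenderer::draw(), right after binding the pipeline state
_context.renderCommandEncoder->setRenderPipelineState(_pipelineState);
_context.renderCommandEncoder->setDepthStencilState(_depthState);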

After doing that, I can now render the red quad in front of the blue quad, even though the blue quad is the last to be drawn.

Correct Depth Rendering

4.5 Rotating Objects

The ability to rotate objects has actually already been implemented, since rotations can be encoded into the model matrix. But since I explicitly stated dynamic rotations as a requirement for this chapter, I'll prove that it's possible.

This process of demonstrating my application fulfilling a specific requirement reminds me (begrudgingly) of a "sprint" within the Agile framework, which concludes with a demo of all the new developments achieved over the course of the sprint. I both love and hate the Agile methodology. I might go into more detail on that later, but right now I'm hyper-fixated on concluding this chapter.

This is all I'm going to add to the update() method of the Engine class:

if (_renderable2->hasFeature<QuadFeature>()) {
  auto feature = _renderable2->getFeature<QuadFeature>();
  feature->modelMatrix = glm::rotate(feature->modelMatrix, glm::radians(90.0f) * deltaTime, glm::vec3(0.0f, 0.0f, 1.0f));
}

That code expresses "rotate 90 degrees about the Z-axis every second". The blue quad will rotate counter-clockwise accordingly. If I wanted it to rotate clockwise instead, then I could set the Z component of the vector to -1.0f.

Rotating Quad

Powering Through

I appreciate you following me during my descent into madness and back. Sometimes developers just have "off" days. Sometimes they have "off" weeks. Unfortunately, I've been having an "off" couple of months, but writing this book has generally been a source of relief. As tedious as it can be at times to document every little thing that I'm doing, it actually helps me organize my otherwise chaotic thought process.

The part of this chapter entitled "There Was an Attempt" is hard to read, and I'm very aware of that. Programming languages are designed for humans to understand them, but that section lacks any actual code. Instead, it just contains my own ramblings about how different classes and methods are wired together, and how the design didn't end up working. I like that I was forced to express the reasoning for my discontent at the time, for the sake of the reader, which ended up serving as proof that the new design was ultimately better.

Frankly, I was questioning the entire premise of writing this book - but I press onward.

The source code for the current state of the application can be found at this commit. I don't like relying so heavily on the dynamic casting of features, but otherwise I'm happy with everything so far. It will be a huge relief when I can stop hard-coding everything into the Engine class, so that's what I intend to tackle next.