
Input Management

The thing that distinguishes a video from a video game is the player's ability to interact with the world being shown on the screen. This interaction is most often achieved via aptly-named input devices. Input devices can take on many forms, and I'm sure you are familiar with the most common ones: keyboards, mice, controllers, and touchscreens.

However, inputs are not limited to these common devices. Many other devices exist with the sole purpose of detecting some human action using sensors and feeding that data into the computer: microphones, cameras, motion sensors, eye-tracking devices, and much more. Some sensors aren't obvious, such as your phone's accelerometer or its Global Positioning System (GPS) receiver, but they can also be used as inputs to a video game.

Furthermore, even the word "controller" is a relatively vague name, since it's just a means for a person to "control" the game. Controllers can take many shapes beyond a typical game pad. Within my lifetime, I've seen controllers shaped like guitars, guns, sticks, nunchucks, steering wheels - I've even seen controllers you can dance on! The possibilities really are endless; it's just a matter of making the input device feel intuitive to the player. Since these custom controllers are expensive to produce, they were much more common in the era of arcades, where the player didn't have to directly pay for a custom controller just to play a game they might never play again.

Nintendo has heavily utilized specialized input devices for their consoles and handhelds for a very long time. The Nintendo DS and Nintendo Wii are obvious examples, but even the original Nintendo Entertainment System could utilize a gun-shaped device to shoot the ducks in Duck Hunt. The Nintendo 64's controller even had a slot that could augment its abilities, such as the Nintendo Game Boy cartridge reader used by Pokémon Stadium, or the microphone used by Hey You, Pikachu!. The Nintendo GameCube allowed connecting a Game Boy Advance directly to the console, and enabled a console game to interact directly with a corresponding handheld game.

These types of input gimmicks are a bit like putting the cart before the horse - they can provide some neat ways to have fun with a game, but they don't inherently make a game fun just by existing, so it's a bit risky to put resources into developing a custom input device if it's possible to make fun games without it.

In our case, we know our target device contains a touchscreen, but typically does not have a keyboard, mouse, or controller (though they can be connected to an iPad). It seems fairly obvious that I'll need to support one basic primitive: where the user is currently touching the screen. From that information, I can likely derive all sorts of touch-based inputs, such as taps, long-presses, swipes, and drags. Since I need to be able to test the game from my computer, I'll also need to implement a mouse-based version of the interface in Alfredo.

By the end of this chapter, I would like to be able to interact with the objects within my game in the following ways:

  • Tapping a rotating quad should reverse its rotational direction (negate its speed)
  • Tapping a rising quad should make it fall (remove the Rising component and add a Falling component)
  • Long-pressing any quad should destroy it
  • Dragging should pan the camera around the world

If I can detect these events, then I can punt on more complicated gesture recognition (such as swipes or geometric patterns) until I actually need them (which I might not). It's nice to have specific requirements in place beforehand. I have a feeling that I'll struggle with that in the prototyping phase.

6.1 Detecting Touch Events on iOS

At this point in the project, we already have an InputManager interface, a service locator so that scenes can access that InputManager, and systems that are capable of receiving that InputManager as a dependency so that they can detect input events on a frame-by-frame basis. All of the wiring is already in place; all I need to do is add methods to the interface and implement them in Alfredo's and Scampi's input manager classes. Since our target platform is iOS, I will start with Scampi, and emulate the behavior from Alfredo later.

Since this is the first iOS project I've ever worked on, I legitimately have no idea how to access information about touch events. Obviously I will resort to Google searches and favor results from Apple's own documentation, starting with this page on "Touches, presses, and gestures". It appears that there are two ways of handling touch inputs: manually sorting through UITouch events, or utilizing "gesture recognizers".

There are benefits to both of these options. Gesture recognizers do a lot of the heavy lifting for you, and can easily detect taps, long-presses, drags ("pans", as the documentation refers to them), and swipes. At first glance, it seems obvious that you'd want to take advantage of such an API so that you don't have to do all the complex gesture recognition yourself. It's important to question every option, however. Can these same gesture recognizers be used on macOS? What about Android or other possible future platforms? Will I end up manually implementing my own gesture recognizers within Alfredo just for development purposes? If I do, how different will the game "feel" between the different implementations?

Utilizing the raw touch (or click) events enables Linguine to query the current state of touches and perform its own centralized gesture recognition logic. Even though I would end up implementing it myself, it would be consistent across all platforms.

Wiring it Up

I was able to very quickly identify and log the location of touch events in Scampi.

scampi/src/uikit/ScampiViewController.mm touch events

- (void)touchesBegan:(NSSet<UITouch *> *)touches
           withEvent:(nullable UIEvent *)event {
  for (UITouch *touch in touches) {
    auto location = [touch locationInView:touch.view];
    NSLog(@"began{%f, %f}", location.x, location.y);
  }
}

- (void)touchesMoved:(NSSet<UITouch *> *)touches
           withEvent:(nullable UIEvent *)event {
  for (UITouch *touch in touches) {
    auto location = [touch locationInView:touch.view];
    NSLog(@"moved{%f, %f}", location.x, location.y);
  }
}

- (void)touchesEnded:(NSSet<UITouch *> *)touches
           withEvent:(nullable UIEvent *)event {
  for (UITouch *touch in touches) {
    auto location = [touch locationInView:touch.view];
    NSLog(@"ended{%f, %f}", location.x, location.y);
  }
}

- (void)touchesCancelled:(NSSet<UITouch *> *)touches
               withEvent:(nullable UIEvent *)event {
  for (UITouch *touch in touches) {
    auto location = [touch locationInView:touch.view];
    NSLog(@"cancelled{%f, %f}", location.x, location.y);
  }
}

Because I'm driving the iOS simulator with my cursor instead of a real device with my fingers, I'm unable to test whether this works for multiple concurrent touch events. The documentation suggests that views don't receive multi-touch events by default, but says nothing about view controllers. As you can see, the methods used to receive the touch events use a set of UITouch pointers. In theory, any pointer received by the touchesMoved, touchesEnded, and touchesCancelled methods should match up to a pointer that was previously received by touchesBegan.

My InputManager interface currently has a single pollEvents() method, which is unimplemented in the IosInputManager class, containing a completely unhelpful // TODO comment. The reason this interface is oriented around a polling mechanism is to provide stability over the course of a single frame. It's up to the Engine to tell the InputManager that it's ready to receive a new batch of input events so that nothing changes while it's executing the systems for a single frame. Again, in theory, I can aggregate these touch events into a pending state until the Engine calls pollEvents(), which then copies the relevant details into a state that can be exposed by the InputManager. If a frame takes too long to process, then it could completely miss the beginning and end of a touch event. However, since Scampi's game loop is already tied to the platform's event-driven MTKView, it's unlikely to miss anything that the platform itself was aware of.

scampi/src/platform/IosInputManager.h

#pragma once

#include <InputManager.h>

#include <unordered_map>

namespace linguine::scampi {

class IosInputManager : public InputManager {
  public:
    struct TouchEvent {
      float x;
      float y;
      bool isActive;
    };

    void pollEvents() override;

    [[nodiscard]] const std::unordered_map<uint64_t, Touch>& getTouches() const override;

    void enqueue(uint64_t id, const TouchEvent& touchEvent);

  private:
    std::unordered_map<uint64_t, Touch> _active;
    std::unordered_map<uint64_t, TouchEvent> _pending;
};

}

scampi/src/platform/IosInputManager.cpp

#include "IosInputManager.h"

namespace linguine::scampi {

void IosInputManager::pollEvents() {
  for (const auto &entry : _pending) {
    if (!entry.second.isActive) {
      _active.erase(entry.first);
    } else {
      _active[entry.first] = Touch { entry.second.x, entry.second.y };
    }
  }

  _pending.clear();
}

const std::unordered_map<uint64_t, InputManager::Touch>& IosInputManager::getTouches() const {
  return _active;
}

void IosInputManager::enqueue(uint64_t id, const TouchEvent& touchEvent) {
  _pending[id] = touchEvent;
}

}  // namespace linguine::scampi

Simple enough. In the ScampiViewController, I've pulled the shared pointer to the input manager out into a member variable so that the various touches* methods can enqueue() the relevant touch events. In touchesBegan and touchesMoved, the isActive flag is set to true, and in touchesEnded and touchesCancelled, it's set to false. Whenever pollEvents() iterates over the pending events, any inactive event removes the corresponding entry from the active state. In Scampi's implementation, I'm setting the ID of each touch event to the raw memory address of the originating UITouch. These addresses can technically be reused over time as memory is recycled, but they will never overlap for concurrent touch events, and that's all we actually care about.
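
For reference, the updated touchesBegan handler looks something like this - a sketch that assumes the member is a std::shared_ptr<IosInputManager> named _inputManager (touchesMoved is identical, while touchesEnded and touchesCancelled enqueue with isActive set to false):

- (void)touchesBegan:(NSSet<UITouch *> *)touches
           withEvent:(nullable UIEvent *)event {
  for (UITouch *touch in touches) {
    auto location = [touch locationInView:touch.view];

    // The UITouch pointer is stable for the lifetime of the touch, so its
    // address doubles as the unique ID for this gesture.
    _inputManager->enqueue(reinterpret_cast<uint64_t>(touch),
                           linguine::scampi::IosInputManager::TouchEvent {
                               .x = static_cast<float>(location.x),
                               .y = static_cast<float>(location.y),
                               .isActive = true
                           });
  }
}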

I've created an InputTestSystem and added it to the TestScene. This new class depends upon the InputManager and Logger platform abstractions, which are passed into it using the service locator.

linguine/src/systems/InputTestSystem.cpp update()

void InputTestSystem::update(float deltaTime) {
  for (const auto& entry : _inputManager.getTouches()) {
    _logger.log(std::to_string(entry.first) + " -> [ " + std::to_string(entry.second.x) + " : " + std::to_string(entry.second.y) + " ]");
  }
}

It appears to work as expected:

default 11:31:20.556816-0500    scampi  140222049328048 -> [ 99.333328 : 376.000000 ]

6.2 Detecting Mouse Events on macOS

Alfredo's MacInputManager class already has access to the application events as they happen, including mouse clicks and movements. All we have to do is store the stuff we care about into a std::unordered_map and return a reference to that map from getTouches() so that our systems can access it.

I'm not really a fan of calling these objects "touches" from macOS, since obviously they are mouse clicks. It's kind of hard to come up with good abstractions for input devices. The best possible abstraction would be game-specific, in which the interface exposes the user's intentions, which are then implemented differently for each specific input device.

In this case, we don't currently have a game for which the user can have any intentions, so we're left exposing an interface that works "good enough" on all platforms. Since our primary target is touchscreens, I'll concede to the idea of calling these objects "touches", and justify Alfredo's mouse-based implementation by saying it's an emulation of the real platform. If we get to the point where this abstraction is no longer suitable for the game, we can always introduce a new game-specific platform abstraction.

From within the pollEvents() method of our MacInputManager, I've added a switch statement to handle the different event types (such as NSEventTypeLeftMouseDown). Whenever we detect a left-click "down" event, I can simply construct a Touch object using the locationInWindow property of the event. I'm always using 0 as the ID of the Touch, because I'll never have more than one mouse clicking at a time (using multiple mice on a computer just controls the same cursor). Likewise, whenever a left-click "up" event is received, I'll just remove the Touch from the map.

Handling the left-click "dragged" event is a little weird, because macOS forwards these events even while the cursor is outside of the window. For example, if the cursor is to the left of the window, then the x value of the locationInWindow property will be negative. To make sure the game systems never receive a touch location that doesn't make sense, I'll just break out of the switch statement if the X or Y components are out of the window's bounds.

alfredo/src/platform/MacInputManager.cpp pollEvents()

void MacInputManager::pollEvents() {
  @autoreleasepool {
    NSApplication *app = NSApplication.sharedApplication;

    NSEvent *event;
    while ((event = [app nextEventMatchingMask:NSEventMaskAny
                                     untilDate:nil
                                        inMode:NSDefaultRunLoopMode
                                       dequeue:TRUE]) != nil) {
      switch (event.type) {
      case NSEventTypeLeftMouseDown: {
        auto mouseLocation = event.locationInWindow;
        auto touch = Touch {
            .x = static_cast<float>(mouseLocation.x),
            .y = static_cast<float>(mouseLocation.y)
        };
        _active.insert({0, touch});
        break;
      }
      case NSEventTypeLeftMouseDragged: {
        auto window = [app mainWindow];
        auto frameSize = window.frame.size;
        auto mouseLocation = event.locationInWindow;

        if (mouseLocation.x < 0.0f || mouseLocation.x > frameSize.width
            || mouseLocation.y < 0.0f || mouseLocation.y > frameSize.height) {
          break;
        }

        auto& touch = _active[0];
        touch.x = static_cast<float>(mouseLocation.x);
        touch.y = static_cast<float>(mouseLocation.y);
        break;
      }
      case NSEventTypeLeftMouseUp:
        _active.erase(0);
        break;
      default:
        break;
      }

      [app sendEvent:event];
    }
  }
}

Again, everything appears to work just fine:

0 -> [ 140.582031 : 194.199219 ]

6.3 Object Selection

Other than the fact that the touch identifier is always 0 for Alfredo, there is one major difference between the behavior of the two apps: Alfredo's { 0.0, 0.0 } is in the bottom-left corner, but Scampi's { 0.0, 0.0 } is in the top-left.

In order for our game to behave correctly, we need to be consistent with our coordinate system between the different implementations so that the game systems can make sense of the values. Which coordinate system is the "right" one? If anything, I'd make the argument for the iOS coordinate system, since it's our target device, but what happens if we choose to support a different target later?

Let's think about this for a moment. What do these values actually mean to the game? The game and the Engine don't actually care about the size of the viewport, much less which pixel within it was touched. The only somewhat-related thing they care about at all is the aspect ratio, and that's only used to calculate the projection matrix. It's actually the Renderer that cares about the viewport size, so that it knows how big the on-screen textures need to be. In the case of both Alfredo and Scampi, it's the MTKView which actually allocates these textures - Alfredo tells it how big those textures need to be using the initWithFrame:frame initialization method, while Scampi defines the frame size in its Main.storyboard.

So when the player touches the screen, how do you determine what object they actually selected? There are two common mechanisms for doing so.

The first is a physics-based solution, in which you perform a "raycast" from the selected point into the world to determine the closest object that exists on that ray. In order to perform the raycast, you first need to know the ray's starting point in world space, which involves multiplying the screen coordinates through inverted matrices in the reverse order of the rendering pipeline. The rendering pipeline, after all, is just converting positions in world space into screen space!
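
As an illustration (nothing in the engine depends on this yet, since it requires the physics system I haven't written), unprojecting a point with glm might look roughly like the following, assuming Metal-style clip space where depth runs from 0 at the near plane to 1 at the far plane:

#include <glm/glm.hpp>

struct Ray {
  glm::vec3 origin;
  glm::vec3 direction;
};

// Runs the rendering pipeline's math in reverse: a point in normalized device
// coordinates (-1..1 in X and Y) becomes a ray in world space.
Ray screenPointToRay(const glm::vec2& ndc, const glm::mat4& viewProjection) {
  auto inverse = glm::inverse(viewProjection);

  // Un-project the point at the near (z = 0) and far (z = 1) planes.
  auto nearPoint = inverse * glm::vec4(ndc, 0.0f, 1.0f);
  auto farPoint = inverse * glm::vec4(ndc, 1.0f, 1.0f);

  // Undo the perspective divide.
  nearPoint /= nearPoint.w;
  farPoint /= farPoint.w;

  return Ray {
      glm::vec3(nearPoint),
      glm::normalize(glm::vec3(farPoint - nearPoint))
  };
}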

The second solution utilizes the renderer, which renders all selectable objects using a color based on the entity's ID. Whenever the user touches the screen, you can use the image rendered during the previous frame and select the specific pixel that the user touched in order to determine the ID of the object at that location.

The physics-based solution is generally more performant. The CPU is really good at doing all of these mathematical operations very fast. It does, however, rely on one critical piece that I haven't bothered to implement yet: a physics system!

Meanwhile, we do have a rendering system, but submitting all of our geometry to yet another rendering pipeline will effectively double the cost of the renderer, which is already the most expensive part of our engine (after optimizing the entity querying system). One of the biggest benefits of the rendering-based solution is that it can be incredibly accurate even for insanely complex geometries.

We can always implement the physics-based solution in the future, if the rendering-based solution limits us for some unknown reason, but I really don't anticipate that happening.

Updating the Renderer

Currently, our renderer supports rendering two types of "features" - quads and triangles. These are actually not features at all, and were only used to illustrate the extensibility of the renderer's design. The problem with separating features based on the geometry being rendered is that a single geometry might need to be rendered by multiple different features. I can't currently make a new SelectableFeature - I'd instead have to make separate SelectableQuadFeature and SelectableTriangleFeature renderers.

To reconcile this, I'm going to merge the quad and triangle feature renderers into a single ColoredFeatureRenderer. The ColoredFeature will contain a model matrix (as the QuadFeature does currently), but will also contain some sort of mesh type to indicate which geometry should be drawn.

The QuadFeatureRenderer and TriangleFeatureRenderer classes are currently responsible for creating the vertex buffers for their geometries, but that will need to be extracted out into a mesh "registry" of some sort, so that the ColoredFeatureRenderer and future SelectableFeatureRenderer can both draw the same geometries.

I created a Mesh interface within the Metal-specific renderer module which contains a single bind() method, and created QuadMesh and TriangleMesh implementations of it. A new MeshRegistry can retrieve mesh pointers using a MeshType enum, which currently just has Quad and Triangle values. It's unlikely that my tiny game will need much more than that, but even if I had a super complex mesh stored in some file format, I could create a derived Mesh type that loaded the mesh from a file, and add it to the registry.
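
The shape of that abstraction is roughly the following - a simplified sketch, since the real implementations also create and own the MTL::Buffer objects containing their vertex data:

#include <Metal/Metal.hpp>

#include <memory>
#include <unordered_map>

enum MeshType {
  Quad,
  Triangle
};

class Mesh {
  public:
    virtual ~Mesh() = default;

    // Binds the mesh's vertex buffer(s) to the current render command encoder.
    virtual void bind(MTL::RenderCommandEncoder& encoder) = 0;
};

class MeshRegistry {
  public:
    // Looks up the Mesh implementation registered for the requested type.
    const std::unique_ptr<Mesh>& get(MeshType meshType) {
      return _meshes[meshType];
    }

  private:
    std::unordered_map<MeshType, std::unique_ptr<Mesh>> _meshes;
};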

I renamed the QuadFeatureRenderer to ColoredFeatureRenderer, added the MeshRegistry as a dependency, and removed the internal logic to create the quad mesh. I deleted the TriangleFeatureRenderer because it's no longer used. Now our objects can dynamically choose which mesh to use.

renderers/metal/src/features/ColoredFeatureRenderer.cpp draw() snippet

auto feature = renderable->getFeature<ColoredFeature>();

auto& mesh = _meshRegistry.get(feature->meshType);
mesh->bind(*_context.renderCommandEncoder);

Finally, I removed all the quad-specific logic from the engine systems, and updated the TestScene to create a 50/50 split of quads and triangles.

ColoredFeatureRenderer

I actually just realized that this abstraction is drawing 6 vertices even for triangles (because I modified the QuadFeatureRenderer, and quads have 6 vertices). So I'll need to add a draw() method to the Mesh interface so that I can properly extract that detail from the ColoredFeatureRenderer. I'm a little surprised nothing broke because of that, but I'm glad I happened to see the error.

Now for the tedious task of adding the new SelectableFeatureRenderer. This feature renderer will contain its own render pass, as well as its own "attachment" (a fancy word for a texture to which it will draw colors). As you might recall, we didn't actually create any attachments for the initial renderer. The MTKView did that for us, and I've been pleasantly surprised with how far it's gotten us.

The new attachment will utilize a different pixel format, suitable for storing entity IDs within each pixel. Since our entity IDs are 64-bit unsigned integers, I'll need a format that is capable of storing 64 bits within a single pixel. These formats are oriented around a certain number of bits per color channel, as well as what type of number is represented by those bits. Since my entity IDs are integers, I don't want to represent them as floating-point numbers, or else I might run into precision problems while parsing the selected ID out of the image. Metal doesn't have a format suitable for storing a 64-bit integer in a single color channel, but it does have PixelFormatRG32Uint, which stores 32-bit unsigned integers in the red and green channels. I can easily convert our uint64_t entity ID into a simd::uint2 vector type and back again, so that's what I'll use.
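
The packing itself is just bit manipulation - something along these lines (a sketch; the real conversion lives in the Metal-specific code):

#include <simd/simd.h>

#include <cstdint>

// Splits a 64-bit entity ID across the red and green channels of an RG32Uint
// pixel, and reassembles it on the way back out.
simd::uint2 toPixelValue(uint64_t entityId) {
  return simd_make_uint2(static_cast<uint32_t>(entityId),          // low bits -> red
                         static_cast<uint32_t>(entityId >> 32u));  // high bits -> green
}

uint64_t toEntityId(simd::uint2 pixelValue) {
  return static_cast<uint64_t>(pixelValue.x)
         | (static_cast<uint64_t>(pixelValue.y) << 32u);
}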

After spending some time writing shaders that are suitable for rendering the entity IDs, I now need to implement a new SelectableFeature class that this feature renderer can utilize. Unfortunately, with the way things are currently implemented, I'll have to duplicate some data and create separate Renderable objects to represent a single entity's visual representation and its off-screen selectable representation. The SelectableFeature contains a model matrix and MeshType, just like the ColoredFeature, but rather than a color, it contains the entity's ID. The way this looks within our TestScene is pretty ugly, but I'll tolerate it for now.

auto drawable = entity->add<Drawable>();
drawable->feature = std::make_shared<ColoredFeature>();
drawable->feature->color = glm::vec3(normalDist(random), normalDist(random), normalDist(random));

auto selectable = entity->add<Selectable>();
selectable->feature = std::make_shared<SelectableFeature>();
selectable->feature->entityId = entity->getId();

if (componentDist(random) > 0) {
  drawable->feature->meshType = Quad;
  selectable->feature->meshType = Quad;
} else {
  drawable->feature->meshType = Triangle;
  selectable->feature->meshType = Triangle;
}

drawable->renderable = renderer.create(drawable->feature);
selectable->renderable = renderer.create(selectable->feature);

This does allow the selectable shape to differ from the visual representation, which might come in handy, but it would be much cleaner if they could share some of that information. Oh well, maybe I'll come back to that later.

I've exposed a new getEntityIdAt(float x, float y) method from the Renderer interface. I don't love it, but at least it makes sense. I've decided that I want the values for that method to be normalized floating-point values within the range of 0 to 1, where [0, 0] is the bottom-left corner. For Alfredo, I just need to divide the X and Y coordinates in its MacInputManager by the view's width and height, respectively. For Scampi, however, since its origin is in the top-left corner, I'll need to subtract its normalized Y value from 1.0 in order to invert it.
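
In other words, Scampi's conversion boils down to something like this (a sketch with made-up names; Alfredo performs only the division, since AppKit's origin is already in the bottom-left):

struct NormalizedPoint {
  float x;
  float y;
};

// UIKit reports touch locations with a top-left origin, so the Y axis is
// flipped to match the bottom-left convention chosen for getEntityIdAt().
NormalizedPoint normalize(float x, float y, float viewWidth, float viewHeight) {
  return NormalizedPoint {
      x / viewWidth,
      1.0f - y / viewHeight
  };
}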

Within the MetalRenderer implementation of the new method, I actually need to invert the Y value again, because Metal textures have their origin in the top-left as well. Whoops. At least I have defined a consistent standard for my engine!

At this point I can multiply the normalized coordinates by the size of the texture in order to retrieve a pixel at a specific location. Metal's way of doing so is the getBytes method of the MTLTexture, which relies on the texture being accessible by the CPU, which shouldn't be a problem as long as I create the texture using StorageModeShared.
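
A rough sketch of that lookup using metal-cpp (clamping and the "nothing was drawn here" case, which the real method handles by returning an empty optional, are omitted):

#include <Metal/Metal.hpp>

#include <cstdint>

// Reads the single RG32Uint pixel under the normalized (bottom-left origin)
// coordinates and reassembles the 64-bit entity ID from its two channels.
// The texture must be CPU-accessible (MTL::StorageModeShared).
uint64_t readEntityId(MTL::Texture* texture, float x, float y) {
  // Metal textures have a top-left origin, so flip Y one more time.
  auto pixelX = static_cast<NS::UInteger>(x * static_cast<float>(texture->width()));
  auto pixelY = static_cast<NS::UInteger>((1.0f - y) * static_cast<float>(texture->height()));

  uint32_t pixel[2] = {};
  texture->getBytes(pixel,
                    sizeof(uint32_t) * 2,               // bytes per row of a 1x1 region
                    MTL::Region(pixelX, pixelY, 1, 1),
                    0);                                 // mipmap level

  return static_cast<uint64_t>(pixel[0]) | (static_cast<uint64_t>(pixel[1]) << 32u);
}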

Since I don't actually have a good way to know what size to make the texture or when to resize it, I've updated the drawableSizeWillChange method of my ViewDelegate implementations for both Alfredo and Scampi to invoke a new resize() method on the base Renderer. This method is now responsible for resizing the Viewport object, but it also invokes equivalent resize() methods on any feature renderers. Now I can move my texture creation logic into the SelectableFeatureRenderer's resize() method.

Finally, I can update the InputTestSystem to get the entity ID at the selected location and log it out. After a little bit of additional plumbing in the MetalRenderer, I expected this to work, but it doesn't. For whatever reason, only the center of the screen yields any results, and that result is always 10,000 - the number of entities that I've been creating in my TestScene. It took me an embarrassingly long amount of time to realize that my SelectableFeature contains its own model matrix, which I was not updating with my positions or rotations! All of my entities were being drawn directly in the middle of the screen.

But why is it always 10,000? Well, because I haven't implemented any depth testing into this new render pass, the entities are drawn in the order that they were created, with 10,000 coming out on top.

I spent a frustratingly long amount of time trying to reuse the depth buffer from the colored render pass, but because the vertex positions are being re-calculated by the selectable feature's shader, slight differences in the floating-point precision of the calculations between the two renderers were yielding very inconsistent results. Technically, I could modify the colored feature's shader to render to both the visual texture, as well as the off-screen entity ID texture, but for now I've just decided to create a separate depth buffer altogether, so that the selectable feature can remain entirely separate.

Okay, now it works. The InputTestSystem can access the ID of the entity that the user selected, but currently has no way of retrieving a reference to the actual Entity. I added a getById() method to the EntityManager, as well as an equivalent getEntityById() method in the base System class, which just forwards its call to the EntityManager.

As a final test before I move on, I'll set the color of any selected entity to red.

linguine/src/systems/InputTestSystem.cpp

#include "InputTestSystem.h"

#include "components/Drawable.h"

namespace linguine {

void InputTestSystem::update(float deltaTime) {
  for (const auto& entry : _inputManager.getTouches()) {
    auto id = _renderer.getEntityIdAt(entry.second.x, entry.second.y);

    if (id.has_value()) {
      auto entity = getEntityById(id.value());
      auto drawable = entity->get<Drawable>();
      drawable->feature->color = glm::vec3(1.0f, 0.0f, 0.0f);
    } else {
      _logger.log("No entity found");
    }
  }
}

}  // namespace linguine

These GIF file sizes are getting unwieldy, so let's give WEBP a shot.

Entity Selection

Frame rates are definitely much lower using this technique, down to around 180 FPS on my laptop. This is still a stress test though, so I'm not particularly worried about it.

6.4 Gesture Recognition

Touch (and mouse) inputs are typically broken down into sub-phases, which is pretty obvious from how Apple's APIs are defined:

  • The initial "down" event, when the user first touched or pressed the mouse button
  • Intermediary "drag" or "move" events, indicating that the user is still holding their finger or mouse button down, but has moved their finger or cursor
  • A final "up" event, indicating that the user has released their finger or mouse button
  • Some APIs, such as Apple's, further provide a "canceled" event, indicating that the gesture was interrupted. This allows your code to abandon any input-based logic rather than treating it the same as an "up" event. In this case, the application will never actually receive an "up" event.

A gesture is a repeatable sequence of these events to which an application can assign meaning. Gestures can vary greatly in complexity, ranging from simple "taps" to intricate shapes drawn across the screen. As I defined at the beginning of the chapter, I just want to support some relatively simple gestures to get us into the prototyping phase: taps, long-presses, and drags. To achieve this, I'll be creating a new GestureRecognitionSystem, which is responsible for making sense of the raw inputs exposed by the InputManager.

The GestureRecognitionSystem needs a little bit more information from the InputManager before it can do its job effectively - information that we previously had, but got rid of because we didn't need it yet! I'm going to refactor Alfredo and Scampi's implementations to provide a "state" of the touch, in order to distinguish between Down, Hold, and Up events on a frame-by-frame basis. The previous implementation only provided a binary state: is the user touching the screen or not? With the new implementation in place, we can log the touch position on the down and up events rather than every single frame.
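
For reference, the refactored interface now looks roughly like this (a sketch; the enum name and exact member layout are my guesses, and the platform implementations are responsible for the state transitions):

#pragma once

#include <cstdint>
#include <unordered_map>

namespace linguine {

class InputManager {
  public:
    enum TouchState {
      Down,  // the touch began this frame
      Hold,  // the touch was already active and is still being held
      Up     // the touch ended this frame
    };

    struct Touch {
      float x;
      float y;
      TouchState state;
    };

    virtual ~InputManager() = default;

    virtual void pollEvents() = 0;

    [[nodiscard]] virtual const std::unordered_map<uint64_t, Touch>& getTouches() const = 0;
};

}  // namespace linguine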

linguine/src/systems/GestureRecognitionSystem.cpp

#include "GestureRecognitionSystem.h"

namespace linguine {

void GestureRecognitionSystem::update(float deltaTime) {
  for (auto& touch : _inputManager.getTouches()) {
    if (touch.second.state == InputManager::Down) {
      _logger.log("Down: [" + std::to_string(touch.second.x) + ", " + std::to_string(touch.second.y) + "]");
    } else if (touch.second.state == InputManager::Up) {
      _logger.log("Up: [" + std::to_string(touch.second.x) + ", " + std::to_string(touch.second.y) + "]");
    }
  }
}

}  // namespace linguine

In order to implement the simple gestures that we want, we'll need to define what they actually mean.

  • A "tap" is when the an "up" event occurs within some duration of its corresponding "down" event, and the positions of the two events are reasonably close.
  • A "long press" is when there is no corresponding "up" event within the maximum duration for tap, and the current position of the touch is reasonably close to the originating "down" event.
  • A "drag" is when the position of a touch changes significantly within some duration of the originating "down" event.

These definitions are intentionally vague so that the values can be tweaked to achieve the best "feel" - a term that is annoyingly common in game development. Still, the terms lack some clarity with regard to their intended purpose. The meaning (and therefore the implementation) of a gesture can change depending on its scope. For example, we have declared that we would like to be able to pan the camera around using drag gestures, but there is no specific object on the screen from which the gesture should originate - the target of the gesture is the entire screen. Conversely, we could have said that we would like to be able to move our objects around the world using the drag gesture - the target of the gesture would then be a specific object on the screen.

I'm being long-winded, but I promise I have a point. The InputManager is completely agnostic to the design of the game. Its job is to tell you when the user is touching the screen. The GestureRecognitionSystem, on the other hand, should be implemented using game-specific details. I don't need to build a fully generic gesture recognition library capable of forwarding events to different target objects - I just need to build one that does what I want it to do.

The biggest problem I can think of is that our objects are actually moving, which makes long presses difficult to track. If we were already building out the actual game, it would be fairly obvious not to require the user to long press on moving objects. In order to prove out the flexibility of our input handling, I'll go ahead and implement long pressing on any object as I originally stated.

Taps

I'll start by detecting when an object has been tapped. On the "down" event, I'll store the current time, which entity was selected, and the screen position of the event. Then in the "up" event, I'll determine how long it has been since the "down" event occurred, as well as the distance between the screen positions. If the duration and distance are under some constant thresholds (I'll use 0.5 seconds and 0.05 normalized screen units), then I'll add a new Tapped marker component to the entity that was originally selected. I'll also remove the metadata from the internal map.
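
The decision itself boils down to two threshold checks - roughly this, with the constants pulled out so they're easy to tweak later (a sketch; the real system also remembers which entity was under the original "down" event so it knows where to attach the Tapped component):

#include <glm/glm.hpp>

// Decides whether a completed touch counts as a tap, based on how long the
// finger was down and how far it moved. These two constants are the "feel" knobs.
bool isTap(float duration, const glm::vec2& downPosition, const glm::vec2& upPosition) {
  constexpr auto maxTapDuration = 0.5f;   // seconds
  constexpr auto maxTapDistance = 0.05f;  // normalized screen units

  return duration <= maxTapDuration
      && glm::distance(downPosition, upPosition) <= maxTapDistance;
}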

Other systems can simply query for relevant entities that contain the Tapped component in order to react to tap gestures. At the beginning of the GestureRecognitionSystem, I'll query for all Tapped entities and remove their Tapped components to clean things up. With this in place, I can implement the first two requirements!

Tapping a rotating quad should reverse its rotational direction (negate its speed)

linguine/src/systems/RotatorSystem.cpp

findEntities<Rotating, Tapped>()->each([](const Entity& entity) {
  const auto rotating = entity.get<Rotating>();
  rotating->speed = -rotating->speed;
});

Tapping a rising quad should make it fall (remove the Rising component and add a Falling component)

linguine/src/systems/RiserSystem.cpp

findEntities<Rising, Tapped>()->each([](Entity& entity) {
  const auto speed = entity.get<Rising>()->speed;
  entity.remove<Rising>();

  auto falling = entity.add<Falling>();
  falling->speed = speed;
});

As you might have guessed, I also added a new FallerSystem, which just makes entities containing the Falling component move downward.
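
Its update() is about as small as a system can get - roughly this (a sketch, assuming the Falling component carries the same speed field as Rising):

void FallerSystem::update(float deltaTime) {
  findEntities<Falling, Transform>()->each([deltaTime](const Entity& entity) {
    const auto falling = entity.get<Falling>();
    const auto transform = entity.get<Transform>();

    // Move the entity downward at its configured speed.
    transform->position.y -= falling->speed * deltaTime;
  });
}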

I actually discovered quite a few bugs within the entity management system as part of this exercise.

  • Adding or removing a component from an entity while iterating over the entities of an archetype would invalidate the internal entity ID iterator, since the entity would have switched archetypes. To fix it, I had to copy the entity IDs into a stack-allocated variable before iterating over them.
  • The archetype graph query was not considering parents of nodes. Depending on the order of an entity's composition, it's entirely possible for an archetype to not be a child of any other archetype, and would therefore never be queried!
  • Even after fixing that, my "optimization" of recursively adding children once I found a matching child actually created another path for parent nodes to be skipped entirely. I removed the optimization, and frame rates haven't been affected by it.
  • The ArchetypeEntityManager had an actual copy/paste bug in which, upon removing a component, the new archetype for an entity would contain itself as a child instead of the old archetype for that entity.

Long Presses

While I'm in the area, I actually need to add support for destroying an entity. Just like the underlying Store class, I'll keep around a std::queue<uint64_t> containing indices that are no longer in use. I'll modify the create() method to check that queue for a reusable index first, and only append a new entry to the vector of entity archetypes if none are available. The new destroy() method simply moves the entity to the root archetype and adds the entity's ID (which also happens to be its index) to the queue.
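
Stripped of the archetype bookkeeping, the index-recycling idea looks like this standalone sketch (not the actual ArchetypeEntityManager code):

#include <cstdint>
#include <queue>
#include <vector>

// Destroyed IDs go into a queue and are handed back out before the vector is
// ever grown again, so IDs stay dense and double as indices.
class EntityIdAllocator {
  public:
    uint64_t create() {
      if (!_freeIndices.empty()) {
        auto id = _freeIndices.front();
        _freeIndices.pop();
        _alive[id] = true;
        return id;
      }

      _alive.push_back(true);
      return _alive.size() - 1;
    }

    void destroy(uint64_t id) {
      _alive[id] = false;
      _freeIndices.push(id);
    }

  private:
    std::vector<bool> _alive;
    std::queue<uint64_t> _freeIndices;
};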

This technically works, but there's one major issue: there's no way to clean up resources which were created for that entity, such as its Renderable. Simply destroying the entity just abandons the reference to those resources, but doesn't do anything to clean them up. The Renderable still exists within the renderer and will continue to get drawn. The way that this currently manifests itself in the game is that objects stop moving (since they are no longer being processed by the systems that make them move), but they just freeze in place indefinitely with no way to interact with them (because the renderer doesn't know that it should stop rendering it).

I'm going to add a handy little setRemovalListener() method to the Component<T> class. Accessing any field of a component uses the pointer syntax (component->field), but adding a removal listener will use the regular accessor syntax (component.setRemovalListener(...)). The new method, like everything else we do with the Entity/Component architecture, will forward the request to the underlying EntityManager. I've refactored the ArchetypeEntityManager to store an EntityMetadata struct for each entity, rather than just the current archetype. The EntityMetadata struct contains the current archetype, as well as a std::unordered_map of removal listeners for various component types. Finally, I added a simple lookup into the map whenever a component is removed, and invoke any function that happens to be stored there. In the new destroy() method, I'm just iterating over the current archetype's types and looking up the removal listener for each type. The biggest downside to this is that you can only have a single removal listener per component type per entity (otherwise I would have named the method addRemovalListener). I only need a single listener for now, so I'll roll with it and refactor later if I need more. Using this new method is pleasantly simple:

drawable->renderable = renderer.create(drawable->feature);
drawable.setRemovalListener([drawable](const Entity e) {
  drawable->renderable->destroy();
});
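
Internally, the per-entity bookkeeping ends up being roughly this shape (a sketch; the map's key type and the listener signature are approximations):

#include <functional>
#include <typeindex>
#include <unordered_map>

class Archetype;
class Entity;

// Per-entity bookkeeping: the entity's current archetype, plus at most one
// removal listener per component type. Removing a component (or destroying
// the entity) looks up the matching listener and invokes it.
struct EntityMetadata {
  Archetype* currentArchetype = nullptr;
  std::unordered_map<std::type_index, std::function<void(const Entity&)>> removalListeners;
};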

Come to find out, I never actually added a destroy() method to the Renderable class either! I quickly whipped up the method, which calls an onDestroy() method on the Renderer, which, in turn, triggers an equivalent method on each of the feature renderers.

The FeatureRenderer base class used to store a vector of Renderable pointers, and a separate map containing the indices for each ID. I realized that I was never updating the map of indices whenever I removed a pointer from the vector, so this has been broken for a while and I just haven't noticed, simply because I haven't been removing and adding features from renderables very often. I've refactored the class to just store a map of pointers by their IDs - this will provide for faster lookups by ID, and we weren't gaining much by using a vector anyway, since it was storing pointers, which would end up ruining cache lines.

I've been heavily utilizing "smart" pointers. The Renderer returns a std::shared_ptr<Renderable>, and each renderable contains a std::shared_ptr<RenderFeature>. These pointers are stored internally in the renderer, but I've been storing them within components as well, which increases their usage count. I'm actually not sure how this works with my low-level component storage system constantly using memcpy to move things around. In theory, there can be multiple instances of a shared pointer all managing the same underlying pointer. Rather than figuring out exactly how this is working, I'm just going to make a new rule for my engine: components may not contain smart pointers.

To achieve this, I need to make it clear that the Renderer will own any memory by passing in std::unique_ptr<RenderFeature> objects. Furthermore, the Renderer now manages its own internal map of std::unique_ptr<Renderable> objects, but only ever returns the raw pointers to the application. I'll also return the Camera and Viewport by reference instead of a std::shared_ptr<T>. The HasCamera component was storing a shared pointer to the internal camera, but that's not really necessary, since any system can just access the camera by injecting the Renderer. I've renamed the HasCamera component to CameraFixture, which contains no data. Its only purpose is to be queried by the CameraSystem so that the camera can be updated using a Transform on the same entity.
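
In sketch form, the resulting ownership story looks something like this (signatures are illustrative; the point is unique_ptr in, raw pointer out, and references for the camera and viewport):

#include <cstdint>
#include <memory>
#include <unordered_map>

class Camera;
class Viewport;

struct RenderFeature {
  virtual ~RenderFeature() = default;
};

struct Renderable {
  virtual ~Renderable() = default;
};

class Renderer {
  public:
    virtual ~Renderer() = default;

    // The renderer takes ownership of the feature, keeps the Renderable alive
    // internally, and only ever hands a raw pointer back to the application.
    Renderable* create(std::unique_ptr<RenderFeature> feature);

    // The camera and viewport are owned by the renderer and returned by
    // reference rather than by std::shared_ptr.
    Camera& getCamera();
    const Viewport& getViewport() const;

  private:
    std::unordered_map<uint64_t, std::unique_ptr<Renderable>> _renderables;
};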

Drags

The final stretch - panning the camera around using drag gestures. As I mentioned before, handling screen drags is somewhat of a global process, rather than local to a specific object on the screen. You could make the argument that the target of the gesture is the camera itself, but the user is not actively selecting a camera object.

I had previously inserted log lines into my GestureRecognitionSystem as placeholders for what I believed to be drag events, and I wasn't too far off, other than some edge cases. The actual camera movement logic is contained within the gesture system itself, rather than the gesture system adding some arbitrary component, which is queried by an external system. It makes sense to add components to selectable entities from the gesture system, but doesn't make as much sense when the gesture represents a global action.

The actual camera movement code is rather simple:

linguine/src/systems/GestureRecognitionSystem.cpp updateCamera()

void GestureRecognitionSystem::updateCamera(glm::vec2 direction, float deltaTime) {
  findEntities<CameraFixture, Transform>()->each([this, direction, deltaTime](Entity& entity) {
    auto cameraFixture = entity.get<CameraFixture>();
    auto transform = entity.get<Transform>();

    transform->position += glm::vec3(direction, 0.0f) * _inputManager.getSensitivity() * cameraFixture->speed * deltaTime;
  });
}

Take note of the new getSensitivity() method on the InputManager. I found that Alfredo and Scampi behaved rather differently, so I decided to make a platform-level abstraction that just returns a magic number to adjust the sensitivity of movement gestures - so far this is only drag gestures. Alfredo returns 1350.0f while Scampi only returns 300.0f. It's a drastic difference, but then again, I've only tested Scampi through the iOS simulator so far.

One of the edge cases I ran into was when a drag gesture had already been initiated, but the cursor didn't move for some time, making the dragged distance for that frame zero. I was attempting to normalize that distance, which resulted in a division by zero. I got rid of the normalization, since I want the camera to move faster for quicker gestures anyway, and I also filter out events where the cursor didn't move so that the entity query is skipped entirely.

Another edge case that I think I've fixed (though I can't be sure) is that iOS devices can support multi-touch, so it was possible for multiple fingers to be dragging the camera in different directions at the same time. For now, I've added a std::optional<uint64_t> containing the current drag gesture's touch ID, and if it's already set, I ignore the initialization of a new drag event. Only one drag at a time! At least for now, I can rest easy knowing that it doesn't break anything on my locally-running applications.

Gestures

A Quick Breather

Implementing these inputs has been a wild ride. I don't think a single part of the engine has remained untouched throughout this chapter. At the beginning of the chapter, I specifically defined the requirements such that tapping on a rotating object would be a simple mathematical negation, while tapping on a rising object would flex the capabilities of the entity system. I found some pretty critical bugs in the entity management architecture as a result of that distinction, and I'm very happy to have fixed them. The render feature abstraction has already proved to be useful when implementing new GPU-based functionality. We've also added some new engine features, like the ability to destroy entities and to listen for the removal of components in order to properly clean up resources.

However, there's still one more thing I'd like to do before concluding this chapter.

6.5 Deploying to an iPhone

The number of times that I've pondered how the app will react to multiple concurrent touches without the ability to actually test it has been driving me nuts. Let's see how hard it actually is to get this thing running on my phone.

Apple's documentation is somewhat useless here, since I'm not using Xcode directly. It was actually this Stack Overflow answer that helped me figure out what I needed to add to my CMake configuration. Unfortunately it is entirely unclear how you find your "team ID" if you haven't officially signed up and paid for Apple's developer program.

I created a new CMake profile within my IDE named "Release-iOS", and set the PLATFORM flag to OS64, as per the documentation for ios-cmake. At first, I attempted to build the project without setting a team ID at all, to no avail. The Stack Overflow post mentioned a command used to find the "identity", whatever that means:

$ /usr/bin/env xcrun security find-identity -v -p codesigning

This returned a couple of IDs, along with the name "Apple Development" and my email address. Perhaps one of these is my team ID? One of them resembles the "code sign identity" mentioned in the post, while the other one certainly looks like a team ID. Trying both resulted in no luck.

The build gets far enough to generate an Xcode project, so let's try to open the project in Xcode and see what we can figure out. In the "Signing & Capabilities" section of my project settings, it claims that the team ID is invalid, but allows me to select my name and email from a dropdown. Doing so appears to resolve any errors. Back in CLion, if I reload the CMake configuration, the Xcode project breaks again. That makes sense, considering both IDEs are dealing with the same files.

After about an hour of fiddling around in both IDEs, I decided to resolve the issue within Xcode, but then build the project from CLion without reloading the CMake configuration. This actually worked! Scrolling through the build output, I can see another cryptic ID that resembles the format of a team ID, aptly labeled com.apple.developer.team-identifier, within a section named "Entitlements". Plugging that value into my CMake configuration and reloading seemed to do the trick. I can now successfully build and sign my iOS application from CLion, but how do I actually deploy it to my phone without using Xcode?

I stumbled upon yet another Stack Overflow answer which mentions a project named ios-deploy, which I've installed via Homebrew. Installing the app from the command line seemed simple enough, but I had to enable "developer mode" on my phone and restart it before the command would succeed. Even then, my phone complained that the developer of the app was untrusted, and I had to dig through the iOS settings to find the deeply hidden option to trust myself. Finally, the app was successfully installed, but it crashed immediately.

I used the Console application to dig through logs on my phone. Apparently the didModifyRange() method for MTL::Buffer objects is not supported on a real device, even though it worked fine from the simulator. The documentation states that this method is only actually required for MTLStorageModeManaged buffers. Since I'm currently using MTLStorageModeShared for everything, I think I can safely remove the call altogether.

Finally, the application is successfully running on my phone! This is the first time I've run an engine that I built from scratch on anything other than a desktop OS, so this is a pretty big achievement for me. Still, there are some things I'd like to fix before relishing my success.

First of all, multi-touch is clearly not working, even though I was careful to support it in the code. The documentation claims that you need to set the multipleTouchEnabled property of your view to true. My touch events are handled within a UIViewController rather than the view itself, so I assumed that wasn't actually a requirement. In any case, it's a simple change to test. After rebuilding and reinstalling, the change seemed to work! I can perform multiple gestures at the same time - for example, I can begin long-pressing a quad, then tap a triangle to change its direction before the quad disappears under my first finger. I can also long-press multiple objects at the same time and watch them all disappear in tandem. Pretty cool! Until the app crashes.

Lucky for me, ios-deploy gives some pretty useful debug information, and even attaches an LLDB debugging client to the app so that I can dig into it further - too bad I don't know LLDB well enough to use it effectively like that. The crash information it provided was sufficient for me to find the issue, though: if a long press gesture landed on no object, then I was trying to parse an ID out of the empty optional returned by the renderer. A simple fix, and I can't seem to reproduce the crash anymore.

Rather than having to click the "build" button in CLion, and switch over to my terminal window to deploy the app every time I want to test a change, I'm going to modify Scampi's run configuration inside of CLion to invoke ios-deploy. I'll leave the target set to scampi, but set the executable to /opt/homebrew/bin/ios-deploy. After some fiddling around, these are the program arguments I came up with:

--justlaunch --bundle $CMakeCurrentLocalBuildDir$/$CMakeCurrentBuildTypeName$-iphoneos/scampi.app

--justlaunch makes it so that the LLDB session is ended as soon as the app starts up, and the complicated variables just make it so that this command works for any CMake profile that I make for Scampi. If I need to get deeper with LLDB, I'll start fresh from the command line and integrate it with my IDE when I'm more comfortable with it. If I use --debug instead of --justlaunch, my app's logs (at least the ones that I'm printing directly) appear in the IDE's console, which is pretty neat. I'll think about the pros and cons of using that instead of the Console app and probably just tweak it as I need it.

Now that I can run the app on my phone directly from CLion, I'll tackle the next "issue": the logs say that my app is only running at 60 FPS, but I know this phone is capable of refreshing its screen at 120Hz. I remember the MTKView allowing you to set its preferredFramesPerSecond property, but setting it to 120 seems to have no effect. Evidently I need to set the CADisableMinimumFrameDurationOnPhone property to true in my Info.plist. Doing so does the trick - Scampi is rendering 10,000 objects at 120 frames per second on a mobile device. Not too shabby!

Honestly, the dragging logic is very glitchy, and the camera tends to get "stuck" if you move it around too fast. I don't think I actually care right now though. This chapter was never about polish, it's about creating the building blocks that will enable the prototyping phase later. I don't even know if I'll need a gesture-based camera panning system, so why should I spend time working out the kinks?

I will say that it's oddly satisfying to just destroy the objects. I let my 13-year-old poke around at it, and she managed to destroy at least 50 objects without getting bored. I wish I could say there's some spark there that could turn into something more, but really it's just Jerry's Game from Rick and Morty.

Jerry's Game

This commit is where I'm at now. It's been exactly one month since I first rendered a triangle to the screen. I can't say my progress is particularly fast, but considering how limited my schedule is, I'd say I'm doing pretty well.