Zelda, Mario and Donkey Kong fighting in Super Mario Smash Brothers

Synchronous RTS Engines and a Tale of Desyncs

July 9, 2011

Have you ever played a game like Starcraft or Supreme Commander and gotten an error message that says “Desync Detected” followed by the game closing? Do you wonder what that means? It stems from certain engine architectures commonly used by RTS games1.

My experience in this area comes from working with the Supreme Commander engine at Gas Powered Games. Starcraft and Warcraft 3 have had desync bugs during beta periods so it’s safe to say they work in a similar manner. For simplicity’s sake I’m going to discuss the SupCom engine specifically from this point forward. Finding similarities between the SupCom engine and other games shall be left as an exercise for the reader.

Requirements

First things first, what are the requirements for our game? To help give an idea, here’s the announcement trailer for Supreme Commander 1 (2006).

It must support 8 player multiplayer, on the internet, with hundreds of units per army. That’s thousands of units in a single game. Holy crap. The typical FPS client-server architecture will clearly not work. With so many units it would require multiple orders of magnitude more bandwidth than is acceptable.

How can we accomplish this feat?

Synchronous Engine Architecture

With a fully synchronous lockstep architecture! In a synchronous engine every client executes the exact same code at the exact same frame rate. Let that sink in for a moment. In an 8 player game of SupCom every player has an identical copy of the game state and follows an identical code path. Instead of transferring over per unit state information (position, health, etc) over the network only player input needs to be sent across the networks. If all players have an identical game state and process the same input then their output state should also be identical.

It’s the same principle as instant replays in many games, including shooters. Have you ever wondered why file sizes of instant replays are so tiny? It’s because the replay file only needs to store player inputs. Simply re-run the game feeding the inputs from the replay file and you’ll get the exact same result. This is also why replays stop working when the game is patched and why you frequently can’t rewind them. It’s also why some RTS games don’t allow join in progress. For a player to join mid-game the entire game state would have to be sync’d. If the game has 3000 units that’s just too much.

Layers

Take a look at the video if you haven’t already. What frame rate do you think the game is running at? The correct answer is 10 frames per second. Wait, what? It looks far smoother than 10 fps you say! It is and it isn’t. The game is actually running at two separate frame rates.

The SupCom engine uses two layers — simulation and user. The simulation layer runs at a fixed 10 fps all the time. This is what you could consider the “real game”. All units, all AI, and all physics are updating within a SimTick function running at a mere 10 hz. Each SimTick needs to run within 100ms or the game will play in slow motion. In a multiplayer game if one player is unable to fully process the SimTick in 100ms then all other players can become held up and have to wait wait.

The user layer runs at full frame rate. This layer can be thought of as strictly visual. User interface, rendering, animation, and even unit position can run at a silky smooth 60fps. Each UserTick updates at a variable time delta which is used to interpolate the game state (such as unit positions). This is why the game can look and feel smooth when it’s secretly slow in the background.

Determinism

Hold on a moment the clever readers cried! If each player is independently updating the game state does that mean the game simulation must be fully deterministic? It sure does. Isn’t that hard? Yep. It’s even more difficult in the modern world of multi-threading.

A lot of pain in creating a deterministic game stems from floating point numbers. Rather than cover this topic in-depth I refer readers to Glenn Fiedler’s fantastic post on the subject — Floating Point Determinism.

In the comments Elijah specifically discusses Supreme Commander. Setting the cpu to strictly follow the IEEE754 standard gets the job done. It comes with a performance cost and the game can never perform an operation with an undefined result, but you shouldn’t be doing that anyways now should you?

Inherent Latency

There are some distinct downsides to a synchronous multiplayer game. Aside from the complexity of creating a massive deterministic simulation there is also some required latency on input. I went over how each user in a multiplayer game is updating an identical game state and they only need to process input. This means for any new piece of input it can’t be processed until all clients agree on which tick to process it!

For example three players — player A, B, and C — are all running SimTick 1. During this time Player A issues an attack command. The UI instantly flashes in response as UserTick updates at 60hz. In a single player game this attack command would be processed in SimTick 2 (0–100ms latency). However all three players must execute the command during the same SimTick to get the same results. Instead of attempting to process the command on SimTick 2 Player A sends a network packet to Player B and Player C with data to execute on SimTick4 (200–300ms latency). This gives enough time for all players to receive the command. The game may be forced to stall if input information is not received and/or acknowledged in some form. I don’t know what that mechanism was exactly for SupCom, but I’ll update this post if I can find out. The exact number of SimTicks into the future to execute a command can be dynamically determined based on the peer-to-peer topology.

The latency from player click to unit response is always going to be at least 0–100ms (the next SimTick). This can be masked in a few ways. The UI response, usually something flashing, is immediate. There is frequently an audio response as well. “My life for Aiur!” “Zug Zug”

In a single-player game this is fine, but in multiplayer it can become noticeable as the delay is likely several hundred milliseconds. I always wanted to experiment with immediate UserTick animation responses. For example if you issue a move command the unit could start slowly moving in the user layer immediately and then blend into it’s “true” sim layer location when the command actually executes. This would be extra helpful to more twitchy games such as Demigod or DOTA. There are some pretty ugly edge cases to handle though so I’ve never had the chance. I’d love to hear what other folks have done in the comments.

Desyncs — The Bugs from Hell

One of the most vile bugs in the universe is the desync bug. They’re mean sons of bitches. The grand assumption to the entire engine architecture is all players being fully synchronous. What happens if they aren’t? What if the simulations diverge? Chaos. Anger. Suffering.

In SupCom the entire game state is hashed once per second. If any clients disagree on the hash that’s it. Game over. The end. An error box pops up that says “Desync Detected” and you have to quit. Something in their SimTicks varied and now the games are different. They have diverged and they will only get further apart from this point on. There is no recovery mechanism.

Desyncs are usually programmer error. A desync may repro 5% in games lasting more than 60 minutes. Fixing a desync generally involves a binary search of printf-ing the current memory hash as the state is walked. On low repro-rate desyncs this usually leads to a massive spamming of the hash while a half dozen machines loop the game as fast as they can waiting for it to break. Adding insult to injury, one of the most common cases is an uninitialized variable.

A Demigod Tale

A lot of my work with the SupCom engine was actually on Demigod, which used a modified version of the engine.

Near the end of development there was a long-standing, but highly infrequent desync, that was handed to me. In Demigod there are dozens of tiny fodder that run across the map. On extremely rare occasion the location of a fodder would vary by a few centimeters on different machines. Sounds innocuous but like the wings of a butterfly a hurricane of woe can ensue.

I distinctly recall not being sure I could fix it and my lead saying “I’m trusting you to get this fixed. I know you will.” Geez, no pressure right? Every morning we had a 10am stand up and every day my response was the same — “desync hunting.” After almost a week of spiraling into madness I found the issue. If you watched the trailer you’ll notice some hero abilities that knock fodder into the air. When the giant walking castle smashes his hammer down all the fodder go flying. The bug was a pointer to a steering based pathfinding component that dangled until the fodder crashed into the ground and disappeared.

For a desync to occur it wasn’t just that simple. First the fodder had to be killed by one of only a few special abilities. This deleted the pathing component and dangled a pointer. With our memory allocator the deleted component’s memory block was simply moved to a free list, but otherwise unchanged. Then, before the fodder landed, a new memory allocation needed to occur. That allocation needed to be handed the same memory block used by our just deleted memory component. Then and only then would a desync occur. Appropriately setting the pointer to NULL fixed the issue.

Final Thoughts

This has been a very brief overview of a synchronous engine as used by Supreme Commander. I know many older games have worked in similar manner. The latest generation may be much fancier, particularly in terms of handling input latency. I know that Starcraft 2 can desync so it’s likely similar. Other games to look at would be Heroes of Newarth or League of Legends. They aren’t nearly as massive as SupCom and feel highly responsive but I haven’t investigated in depth to see what clever tricks they pull.

If anyone has a good desync war story please share in the comments.