Overwatch Gameplay Architecture and Netcode

Hello everybody, this is Overwatch Gameplay Architecture and Netcode. Standard rules apply: silence your phones, fill out the session feedback form, switch off Hanzo, and get on the fucking payload.

My name is Tim Ford. I'm the lead gameplay programmer on Overwatch. I've worked on Overwatch in that capacity since its inception in the summer of 2013. Before that I worked on Titan. This talk is not about Titan.

The goal of this talk is to share some techniques for reducing complexity in an ever-growing codebase. We achieve this goal by adhering to a strict architecture. Finally, we'll demonstrate an example of managing complexity by talking about an intrinsically complex problem: netcode.

Overwatch, for those of you who aren't familiar with the game, is a team-based online hero shooter set in the near future. It features a diverse cast of heroes, each with their own unique over-the-top abilities.

Overwatch uses what is called an entity component system architecture, which I will say and mumble as ECS from here on out. ECS is different from the component model popular in several off-the-shelf engines, and much different from the classic actor model that dominated the late 90s and early 2000s. Our team had several years of experience with these other architectures, so choosing ECS instead was a bit of a grass-is-greener move. We did audit a prototype first, so the decision wasn't entirely emotional. That said, the idea that ECS architectures can manage complexity on a quickly growing codebase was discovered over three years of development. I'm happy to espouse ECS virtues, but know that I do so today with the clarity of hindsight.

The canonical ECS architecture looks like this. You have a world, and it is simply a collection of systems and entities. An entity is really just an ID that corresponds to a collection of components. Components store game state and have no behaviors. Systems have behaviors and store no game state. Here's the shocking thing: when I say they have behaviors, that means, you
know, components have no functions and systems have no fields.

Here are the systems and components of a simple ECS engine that we architected. This is what it looks like. On the left-hand side here you can see the systems in tick order. These are the different components that different entities have. The components lighting up on the right, like chords of a piano, refer to what we call tuples of components. The system tick iterates through the tuples and performs operations (that's the behavior) on their state. Remember, the components have no functions; their state is laid bare. The overwhelming majority of systems care about more than one component. You can see here the transform component is pretty popular.

Here's an example of what a system tick looks like from our prototype engine. This is the physics system tick. It's pretty straightforward. You basically have an internal underlying physics update; it could be Havok, could be Box2D, or Domino, which is our proprietary physics engine. After you run the physics world sim, you iterate over a set of tuples. You use whatever proxy was stored in the dynamic physics component to pull out the underlying physics representation, and you copy it across to the transform and contact components.

A system has no idea what each entity is. It only cares about a tiny slice of the components and executes a set of behaviors common to that slice of components. Some entities might have 30 components, some might have 2 or 3. The systems don't really care; they just care about the subset of components on which their behavior operates. So here in our prototype engine, this is an entity that's the player character that can do a bunch of cool behaviors. This is like a bullet that the player can shoot. Each of the systems, as they run, doesn't know or care what those entities are; they just operate on the subset of components that are relevant to them.

Overwatch's implementation looks like this, mostly. The world is something we call the entity admin. It
stores an array of systems and a hash map of entities that are keyed by entity ID. The entity ID is just an unsigned 32 that uniquely identifies this entity in the entity admin's array. The entity stores that entity ID and this optional resource handle that points back to the asset, what we call the entity definition, that defines that entity. Component is simply a base class with hundreds of subclasses. Each subclass component has the member variables required for the behaviors that will be run against it from systems. Polymorphism is used almost exclusively for lifetime: we override a create function and a destructor, but that's pretty much it. The only other functions that might make their way into an actual instantiation of a component would be little helper functions that make accessing its internal state easier, but they aren't really behaviors; these are just simple accessors.

So the entity admin is gonna call update on each system, and then each system is gonna do some stuff. The way we work, instead of operating over these fixed tuples of components, we choose a primary component we're gonna iterate over, and then our behavior invariably is gonna involve other components; we grab them through a sibling. So here, some system operates on tuples of entities that have the Derp component and the Herp component.

The Overwatch client system and component breakdown looks like this. Here we show about 46 different systems and 103 component types. This is just designed to impress you. This is the server. You can see some systems operate on a lot of components; some systems operate on very few. Ideally, we try to make sure that systems that work on lots of components do so by reading them as pure functions, as opposed to mutating all those things. There are a handful of systems that do need to mutate a lot of those components, and by virtue of that they have to manage that complexity themselves.

Here's an example of what a system actually looks
like. This is the player connection system. It's responsible for enforcing AFK behavior on all of our game servers. The system iterates over connection components. Connection is the component that corresponds to the player's network connection on the server. It exists on the entity that represents the player. The entity itself could be an active game participant, could be a spectator, or some other player-controlled role. This system doesn't know or care; its job is just to enforce AFK.

For each connection component (here's our tuple: a connection component that has an input stream and stats), we're going to read your input stream and make sure you did something, that you pressed a button, and read your stats component to make sure you contributed to the game in some sort of way. As long as you do that, we'll reset the AFK timer. Otherwise, we will use the connection handle that's stored on the connection component to send you a message to move.

So in order for this behavior to run, an entity that's going to be cast against the system must have the entire tuple. For example, an AI bot has a stats component, but it doesn't have a connection component or an input stream, so it's not going to be subject to this behavior. Again, the system behaviors look at those slices, and you must have the whole set. And if we AFK'd out AI, that'd be kind of wasteful, let's be honest.

OK, so the system update function raises this question: why not just do a traditional object-oriented programming component model update? Have the connection component override a virtual update function that does all the AFK tracking. Well, connection fulfills multiple behaviors. It corresponds to the subject of an AFK. It corresponds to the list of connected players who are subject to a broadcasted network message. It stores the state by which to determine a player's name. It stores the state by which you can get a player's persistence records, so the unlocks they have. So which behaviors should be in that component's update? Where would
you put the rest? In object-oriented programming, a class is both behavior and state, but the connection component is not a behavior; it's only state. Connection is not an object in the object-oriented programming sense. It means different things to different systems at different times.

So what are the conceptual advantages of this separation between behavior and state? Bear with me for a second. Imagine these are the cherry blossoms in your front yard. These trees in your front yard mean something subjectively different to you, the president of your HOA, a gardener, a bird, a property tax assessor, and a termite. Each observer sees different behavior in the state that describes that tree. That is, the tree is a subject that is dealt with differently by various observers.

To complete the analogy, the player entity, and more specifically the connection component therein, is a subject that is dealt with differently by various systems. The player connection system that we saw before views connection as the subject of an AFK kick. The connection utility uses connection as the subject of a broadcast-to-players network message. On the client, the UX game system uses connection as the subject that populates the UI on the scoreboard with the player's name.

Why author behaviors this way? It turns out it's much easier to describe all the behaviors of a tree when you compartmentalize individual behaviors by their subjective perceptions, and this is also true of game objects.

Anyway, as we built out our industrial-strength ECS architecture, we ran into a couple of problems. First, we struggled with this rule that components have no functions and systems have no state. Surely systems can have just a little bit of state, right? A few legacy systems were ported over to Overwatch from other non-ECS architectures. They had member variables, so what's wrong with that? For example, the input system: you can store the input state in the input system, and for any system that needs to know if a button is pressed, we can
just grab a pointer to the input system and ask. It seems silly to store a global input in a single component. Surely there should be more than one instance of a component if you're gonna make a new component type; there's no need to substantiate writing that code. Components are usually accessed through these iterators, just like we saw before. It's kind of bizarre to iterate over a component whose domain is exactly one.

Anyway, this worked for a while. We stored this one-off state in systems and then made global accessors. See here, this global variable accessing the system, which one system could call from another. It was kind of crummy for compile times, because the systems were being included by yet other systems. Let's say I was refactoring the input system, moving some functions around and modifying that header. Well, now every single system that needed to get that state was gonna get recompiled. That's just annoying.

It also made a bunch of coupling. You have a system's behavior leaking into other systems. So here we have this post-build-player-command function that the input system is responsible for. Say I need to add new stuff to this function call. The command system's job is to fill out this struct with a bunch of bits, based on player input, that'll be sent up to the server. If I want to extend that and add new things, do I add it to the command system, or do I add it to this funky little function over here? Are we leaking behaviors from the command system into other systems? As the systems grow, choosing where to author the behavior becomes ambiguous. Here, the command system's behavior fills out those structs, so why mix it? Why put it in one system or the other?

Anyway, we did it this way for a while and it worked decently, until killcam. For killcam, we're gonna have two different simulation environments: one for the live game, one for the killcam. I'll show you how that works. It's pretty straightforward: you add a second pristine ECS world, one for the live game and one for
the replay. The way the replay works is the server is going to send down a big fat network stream of 8 to 12 seconds. Then we're gonna spin up the replay admin, start rendering it, and give it that network stream as if it came off the wire. All those systems, all those components, all those behaviors don't know that they're not being predicted on that client; they're just running a network stream as if it was normal gameplay. It's kind of cool. If you guys wanna learn more about this, I suggest going to Phil Orwig's talk tomorrow, I believe in this room at 11 o'clock.

Anyway, what we learned after doing that was that all these call sites where we had these globally accessed systems were suddenly wrong. There wasn't a single global entity admin anymore; there were two. System A couldn't grab the global system B; it now had to grab system B through their shared entity admin somehow, and that's just icky.

Well, after killcam we took a long look in the mirror, and between the bizarre access patterns, the compile overhead, and most dangerously this inter-system coupling, we had a problem. The solution was to come to terms with the fact that it's OK to define a component type that will only ever have one instance per entity admin. We created this notion of singleton components. These are components that live on an anonymous single entity and are usually accessed directly through the entity admin. We moved most of the state that was in systems into these singletons.

I should mention that it's very rare for a singleton's state to be accessed by exactly one system. Moving forward, we got in this habit where we would write a new system and realize that system was begging for some state. We would go ahead and make a singleton for that system to store that state in, and almost every single time some other system was gonna want that state. So it really got ahead of this kind of intrinsic coupling that the previous architecture was demonstrating.

Here's an example: singleton input. All the
button press information is stored in that singleton. We just moved it out of the input system. Any system that wants to know whether a button's up or down just grabs the component. This immediately removed some nasty coupling and aligned us more with the ECS philosophy: systems have no state and components have no behavior. The button state is not behavior. The local player movement system has a behavior that uses this singleton to predict local player movement. The movement state system has a behavior that packages this input up to the server to be consumed.

The pattern of singletons turned out to be so common that about 40% of our components are actually singleton components. Once we moved some system state into singletons, we addressed a bit more coupling by breaking out shared system functions into utility functions that operate on those singletons. We'll talk about that next.

The input system still exists. It's responsible for reading input from the OS and filling out singleton input, and then other systems downstream are just reading singleton input to do what they need to do. It's responsible for other stuff, like applying the button bindings and the per-hero button settings, but it's no longer coupled at all with the command system. We also moved this little post-build-player-command function into the command system; that's where it belonged anyway. Now you can guarantee that all of the mutations to player command, this important structure that'll be networked and used for simulation, happen in this one spot.

At the time we adopted singleton components, we didn't know we were establishing patterns like this to reduce coupling and therefore reduce complexity. In this example, the command system becomes the only place that generates side effects on this player command struct. Any programmer can easily understand mutations to player commands because it all happens in this one file, imperatively, at one time, in one system update call. It's also clear to any programmer
that any new mutations you need to add to the player command happen in this one file, in this one update function. All that ambiguity goes away.

Let's talk about another problem we had: the idea of shared behavior. This comes up when you have some behavior that's invoked from multiple system updates. Sometimes two observers of a subject are interested in the same behavior. Going back to the tree analogy, the president of your HOA and your gardener may both want to know how many leaves are gonna fall out of this tree during the spring. They'll each do something different with that output (the president will yell at you and the gardener will just get back to work), but the behavior is the same.

For example, a lot of code is curious about relative hostility: is entity A hostile to entity B? Hostility is determined by three optional components: filter bits, pet master, and pet. Filter bits stores an entity's team index. The pet master stores a unique key that matches all of its corresponding pets. You would use pet on something like Torbjörn's turret.

If either entity has no filter bits, they aren't hostile. So two doors are not hostile to each other; they don't have teams set up on their filter bits components. If they're on the same team, they also aren't hostile. That's pretty easy. If they are on the always-hostile team, we check their pet and pet master pair to make sure that they are related to one another. This solves the problem where you're on the hostile-to-everyone team and your own turret starts attacking you. I mean, it did, but we fixed that bug. When you want to check hostility for a projectile in flight, you simply fall back to the instigator, the guy who shot that projectile. It's pretty straightforward.

Anyway, the example I've described above is this function called CombatUtilIsHostile. It takes two entities and returns true or false depending on whether there's hostility, and a ton of systems call this function. So here's a
bunch of systems that call it. But as you can see, it only reads these three components that I enumerated, so its surface area is fairly low, and more importantly, it's pure: it's not gonna mutate these guys at all, it's just gonna read them.

Using that as an example, we have a couple of different rules when it comes to these utility functions, this shared behavior. If you want to invoke the utility from several call sites, the function should read very few components and have few, hopefully no, side effects. If you have a utility function that reads several components and has several side effects, try to limit the number of call sites. One example of that is what we call the character move util. This is a host of functions that moves the player one tick in the simulation, and it's called in two spots: once on the server to simulate your input, and once on your client to predict your input.

So we continued to replace these inter-system calls with utility functions and to move state out of systems into these singletons. If you replace an inter-system function call with a shared utility function, you don't magically avoid complexity; it's mostly syntactic and organizational. Just as you can hide a lot of side effects behind a publicly accessible system function, you can hide a lot of side effects behind a utility function as well. So if you're calling that utility function from several sites, you're invoking several major side effects all over your game update loop. It may not be obvious, because it's behind a function call, but it's still pretty horrible coupling.

If you take away one lesson from this talk, let it be this: behaviors are much less complex if they are expressed in a single call site in which all major behavioral side effects are localized to that call site.

Let's explore some techniques we discovered to help reduce this type of coupling. When you discover that a big side effect has to be executed in response to some
behavior, ask yourself if that big chunk of work has to happen right now. The best singleton components solve inter-system coupling with deferment. Deferment is the act of storing the state required to invoke a major side effect, but putting off the side effect invocation until later, at a better single moment in the frame.

Several call sites in the code want to spawn surface impact effects. You have hitscan projectiles and shot projectiles with explosions. You have your Zarya beam that is a channeled effect along the wall, and it has to maintain that contact as she fires. You have sprays. Creating an impact effect qualifies as a very large side effect: you're creating a new entity in the scene, which has repercussions with lifetime, threading, scene management, and resource management. The lifetime requirement for impact effects is that they show up before the scene renders. That doesn't mean they have to show up in the middle of the game simulation at a dozen different call sites.

This is just under a quarter of the code that resolves impact effect creation. It does contact resolution based on transform, impact type, and the material struck. It also does culling, LOD, scene management, and prioritization, and it ultimately creates the effect. It takes care of making sure persistent effects like bullet holes and scorch marks don't stack in weird ways. For example, if you shoot a wall as Tracer and make a bunch of pock marks, and then Pharah fires a rocket and puts a huge scorch mark over it, you want to delete those pock marks; otherwise you get ugly Z-fighting and the effects artists yell at you.

All right, I don't want to do that math all over the place; I want to do it in one spot. If I had to change this code (there's a lot of it) and I had a dozen different call sites invoking it, I'd have to test all those call sites. Invariably, more call sites would be added as folks pattern-match: oh, I have some cool ability that needs to create a new effect, I'll just
copy-paste this one function call. It's OK, it's just a function call. No, it's not; it's this nightmare. When a lot of large side effects can be invoked from multiple different call sites, programmers tend to spend a lot more mental energy maintaining a cognitive model of how the code works. That's what code complexity is. You want to avoid that.

So: singleton contact. It contains an array of pending contact records. Each record has enough information to create the effect later in the frame. When you want to spawn an impact effect, you just add a new entry and fill it out. Later in the frame, before the scene update and the render prep (the things that are going to draw that frame's work), the resolve contact system churns through the array of pending contacts and spawns the effects with all those LOD rules and overrides and stuff. The big side effects are invoked entirely from one call site every single frame.

Aside from the reduced complexity of the solution, deferment has a couple of other advantages. You get a perf benefit because of data and instruction cache locality. You can place a perf budget on impact effect creation. Imagine you have twelve D.Vas all shooting a wall at the same time, and they want to spawn hundreds of impact effects. You don't have to spawn them all now. You want to spawn your own D.Va's impact effects, but you could defer the rest and smear them out over multiple frames to smooth out spikes. There's a bunch of really cool advantages to this. You can do all the really complex stuff; even our resolve contact system does fork-and-join multithreading in order to figure out how to orient all the little particle systems. It's really cool. You can defer all this stuff.

Utility functions, singletons, deferment: these are just a few of the patterns we established over three years of working on an ECS architecture. In addition to the constraints of omitting state from systems and behaviors from components, these techniques further
constrain how we solve problems on Overwatch. Adhering to these constraints means you have to solve problems in a very specific way. However, these techniques result in consistent, maintainable, decoupled, and simple code. We constrain you, we throw you in this pit, but it is a pit of success.

With that in mind, let's talk about one of the real hard problems and how ECS makes it simpler. This is the most important problem that we as gameplay engineers had to solve. Our objective is to make a responsive networked action game. In order to make the game responsive, we have to predict player actions. Nothing's going to feel responsive if you have to wait for the server to tell you what happened. This has been true of the genre for twenty years. Despite that requirement, we really can't trust the client with any simulation authority other than their input, because some clients are jerks.

Things that make the game feel responsive: movement, abilities, weapons (which are abilities as far as we're concerned), and hit registration. In all cases that comes down to this: the player hits a button, and the player should see an immediate response. This should work as well as possible even at high latency. Here we're running at a quarter-second ping, and all my button presses are immediately responsive. Everything is working just fine, no delay.

Mispredictions are a side effect of server authority and lag. If you're gonna have a predictive client, mispredictions are easy to describe: you didn't do what you thought you did. That's pretty much what they are. The server needs to correct you, but not at the expense of further responsiveness. We try to reduce the chance of misprediction with determinism. So here, same context, a quarter-second ping: we thought we leapt; the server said no. We got yanked back down to where we were before and frozen. You can even see how the prediction worked: the prediction tried to get us up in the air, and even Winston's cooldown goes off. I think it's
properly reset. But we don't want that; 99.9% of the time, that prediction is gonna work just fine. So we want to make it as responsive as possible, and if you happen to be playing from Sri Lanka and you get frozen by Mei, you're gonna get a little misprediction.

All right, so let me put some ground rules in place. First, we're gonna discuss the novel techniques and how we leverage ECS to reduce complexity here. We're not going to cover general replication of entities, remote entity interpolation, or the details of backwards reconciliation. We very much stand on the shoulders of giants and use well-established techniques covered by other literature. The subsequent slides do, however, assume some familiarity with those techniques.

All right. Our deterministic simulation relies on a synchronized clock, a fixed update, and quantization. Both the server and the client operate over the synchronized clock and quantized values. Time is quantized into what we call command frames. Each command frame lasts a fixed 16 milliseconds, and in our tournament configuration a fixed 7 milliseconds. Simulation is fixed time, so it has to translate the loop clock time (whatever the computer clock says every single render frame) into fixed frames. We use an accumulator with a rollover remainder to accumulate command frames.

Within our ECS framework, any system that predicts on the client, or authoritatively simulates the player based on player input, uses a slightly different API. It doesn't use Update; it uses UpdateFixed. UpdateFixed is just run for every single fixed command frame.

Assuming a steady output stream, the client's clock is always ahead of the server by half RTT plus one buffered command frame. RTT here really is just ping plus processing time. So in this example, our RTT is about 160 milliseconds, so half of that's 80, plus one buffered command frame; the command frame is 16 milliseconds. That's how far ahead of time
the client is from the server. So in this little diagram, the vertical bars here are the command frames being executed. What's gonna happen is, for example, the client's gonna simulate and send the input for frame 19, and then sometime later, based on that half RTT and buffer time, the server's gonna simulate against that same input. This is what I mean by the client is always ahead of the server.

The client is always gobbling up player input as fast as it possibly can, as close to now as it possibly can based on your latency. If it had to wait to consume input, it would result in slower server response time to your input, and that makes the game less responsive. You want to keep this buffer here as small as you possibly can. For context, we run at 60 hertz, so this is about 1/100 speed.

Predicting systems on the client consume this input and simulate movement. So here I'm controlling Tracer. The joysticks are the input that I'm using and sending down. The Tracer here is my current movement state that I've predicted, and the Tracer that's gonna come back to us, the full RTT plus the buffer size later, is the server-authoritative snapshot of our movement for that Tracer. Side effects from the server simulation are authoritative, and take that other half RTT to arrive.

The reason players maintain this ring buffer of movement (this guy up here, all of the moves we did in the past) is so they can compare their results with the server results from the past. If the client computed the same result as the server, the client will continue on its merry way to simulate the next input. If the client and server disagree on the results, we've mispredicted and have to reconcile. Naively, we could just overwrite the client's results with the server's results, but these results are old; this is a server result from several hundred milliseconds ago. In addition to this ring buffer of movement state, we also store a ring buffer of inputs, and
because the character movement code is very deterministic, if you have a starting movement state that you want to run an input against, it's very reliably going to reproduce the same result every single time. So what we do is, when you get a misprediction from the server, we're going to replay all of your inputs to catch back up to what you believe now is.

So on frame 17, we thought we were running; the server figured out we were stunned (somebody threw a flashbang at us or something). What's gonna happen is, when our client receives the packet that describes this movement state, we are going to restore our movement state back to the authoritative server moment in time, and we're going to replay all of our inputs to catch back up to what we believe now is. So now for the client is about frame 27, and we're getting results for frame 17. Once we synchronize, we are pretty much back in lockstep again. We will know exactly how long we're stunned. So by frame 33 here, we know we're no longer stunned; the server simulates the same thing and it already agrees. There's no weird synchronization catch-up. Once you get that movement state, you can replay your inputs and you'll be caught back up to now.

The client's outgoing network stream is inconsistent and lossy, however. All of our game data is sent over UDP with an optional custom reliability layer. As a result, client input packets fail to reach the server from time to time, which is loss. The server tries to keep this tiny buffer of unsimulated input, but it tries to keep it as small as possible to make the game as responsive as possible. If the server happens to starve out this little buffer, it's just gonna take a guess: it's just gonna duplicate your last input. By the time the real input arrives, it'll try to reconcile that and make sure you don't lose any buttons, but you're gonna mispredict.

All right, here's the tricky part. So here we're losing some packets. The server realizes that it had no input for that frame, and what it will have
done is use the previous input, duplicate it, and hope for the best. It's going to send a message back to the client saying, hey, by the way, I lost some input, something's wrong. This is where things get really weird. What's going to happen is the client is going to start to dilate time, I guess contract time in this case, and start to simulate slightly faster. We said before that everything's on a fixed time step, so if a fixed time step was 16 milliseconds, the client's just going to pretend that a fixed time step is now 15.2 milliseconds. It's going to advance faster. As a result of that, you're going to see these inputs show up quicker for the server. By virtue of that, the server's going to have a much bigger buffer of inputs to play with while it waits for you to weather the storm of this loss. This technique actually works really well on the internet, where you have tiny fluctuations in loss and tiny fluctuations in ping. If you were playing on the International Space Station, this probably works because of general relativity, so I think it's a pretty cool solution. All right, so take note here: we are receiving a message, and now we start dilating time. Notice that we're actually ticking faster. Look at the slope of inputs here: it is literally pooping out inputs much faster than before. The buffer gets bigger and it'll weather that loss; if there was loss in here, you'd probably get the input anyway. Once the server realizes that you're healthy, it'll send you a message saying, hey, you know, it's fine. The client will do the opposite: it'll dilate time back the other direction and spit out inputs at a slower rate to reduce the size of that buffer. This feedback loop is happening constantly, and the goal of it is to keep you on that razor's edge, trying to minimize the buffer while at the same time trying to minimize mispredictions caused by input duplication. I mentioned earlier that this is when the server is starved for player inputs and has to
duplicate that input. Once the client catches up, the input that was skipped is in danger of being lost. To solve this problem, the client always sends up a sliding window of inputs. This is a technique that's been around since, like, QuakeWorld, I think. So we don't just send the one input for the frame we just simulated, frame 19; we send all of the inputs that we have simulated since the last acknowledged movement state from the server. So if the last acknowledgment we got was for command frame 4 and we just simulated frame 19, we're going to bundle every single input for every single frame into one packet. Of course, players don't hit buttons as aggressively as 60 Hertz, so it compresses really well. It's a pretty tiny structure, because you probably had the same W held down before; you just set a bit saying you still have W held down. The result of this is, if you have loss, the next packet up still has all those inputs, and once it arrives it's going to fill all those holes before simulation and you're good to go. So this feedback loop to grow the buffer size, as well as the sliding window of inputs, is basically there to make sure that you don't have mispredictions when you do have loss. So again, just to show it off, here's double speed from before, so this is 1/50th normal speed. Here's your ping fluctuating, you're having loss, it dilates time on the client, the window of inputs is still going to fill any holes before you miss simulation, and you have server corrections. I'm just combining all the animations together into one thing to show off. So I won't go over this in too much detail here, since it's the subject of Dan Reed's talk, which I very highly recommend, because this is the opening act and his is like the best thing ever. He's right after this in this room, so make yourself comfy. All the abilities are authored in this proprietary declarative scripting language called Statescript. One novel feature of the scripting system is that it can scrub back and forth
through time. This allows scripts to be predicted on the client and then validated, just like movement, where we rewind you back and replay all your inputs. Suffice to say that abilities work under the same rollback and roll-forth principle as movement: rewind back to the authoritative snapshot, replay inputs back to now. If you remember the movement stun example we had before, Tracer getting stunned and being corrected, this works the same way. The client and server both simulate input against abilities deterministically. The client's ahead in time from the server, so the client will do it and then the server will get it later. The client deals with mispredictions by rolling back, applying the server snapshot, and rolling forward. I'll show it here. So this is a video of us coming out of Wraith Form as Reaper. These states here are a representation of the state of Wraith Form: this one says make me invincible, that one says make me play this cool effect, maybe play this cool animation. When we're done with Wraith Form, we're going to turn all these guys off. So in one frame, this little animation is going to show each of these states turning off. Right after this, this is us predicting coming out of Wraith Form. Soon after this, we'll get an update from the server saying, OK, here's how I predicted you came out of Wraith Form. It's actually going to rewind, turn all the states back on again, and then we simulate all your input to turn those states back off. So this is the constant rollback and roll-forth that we're doing whenever you get these server updates. The cool thing is, just like we can predict movement, we can predict every single ability that you do; we have to opt out of predicting abilities, and that includes weapons and melee. So let's talk about predicting and acknowledging hit registration. ECS comes in handy here. Remember, an entity will be a subject of a behavior if it has the tuple of components required by the behavior. If you're an entity that is hostile, remember that is-hostile check
we talked about, and you have a modify health queue, you can be shot by a player and be subject to hit registration. Those are the two sets of components you have to have: the ones required for hostility, and the modify health queue component. Modify health queue is a component that, on the server, accumulates the set of records to damage or heal you. Similar to the singleton contacts, we defer accumulated damage done or healed in multiple call sites, because it's a big side effect to kill you, and we defer it to run later. Just like we don't want to spawn a bunch of particle effects right now in the middle of the projectile simulation, we just defer it; same thing here. Damage, by the way, is not at all simulated on the client, because there are cheaters. However, hit registration is predicted on the client. So on the client, if you have a movement state component and you're a remote object, not the locally controlled player, you will be positioned by the movement state system at some interpolated transform between the last two received movement states. This is the standard interpolation technique that's been around since Quake. The system doesn't care if you're a platform, a turret, a door, or Pharah; you just have to have the movement state component, that's all you've got. The movement state component is also responsible for storing that ring buffer we showed before of those little Tracer positions. If you have movement state, and this is now describing the tuple for hit validation or hit registration, the server will have to rewind you to the player's frame of reference, that's backwards reconciliation, before it computes hit registration. This is totally orthogonal to whether or not you have a modify health queue, whether you can take damage. We have to rewind doors, platforms, payloads, it doesn't matter; we have to see if the bullets were blocked. Naturally, if you're hostile and you have a modify health queue and a movement state component, you'll be
rewound and you'll be potentially damaged. Being rewound is one behavior handled by one set of utility functions, and being damaged is a different behavior handled by processing the modify health queue component, deferred later in the frame, and we still isolate those. The rewind behavior is its own thing that operates on its own slice, and doing damage is its own thing that operates on its own slice. Shots are a bit abstract, so I'll break it down. What we're seeing here are the bounding volumes for each of these entities. The bounding volumes here basically represent a bunch of snapshots in time for this Genji, so this is the bounding volume that corresponds to the last, like, half second of movement that person had. If I were shooting a ray down my reticle, I would intersect with his bounds first before I rewound the guy, because he could have been anywhere in here based on my ping. In this case, if I'm shooting in this direction, I'm only going to rewind Ana, because my ray, my bullet, is going to intersect her bounds. We're not going to rewind Reinhardt or his shield or the payload or the door back over here. Shots can mispredict just as movement can mispredict. So here you'll see the green ragdoll that I'm drawing is the client's view of this Reaper, whereas the yellow one is the server's view. This tiny little green dot back here is what the client thought my bullet hit, and you can see this little green line is the path of my bullet. But when the server actually validated it, this little purple-blue sphere corresponds to where it actually hit. This is a super contrived example. The deterministic simulation is so reliable that in order to reproduce this misprediction on hit, I had to set my packet loss to 60% and shoot at this asshole for 20 minutes before I was able to reproduce it. I should mention that one of the reasons this is so precise is that we have a bunch of very talented QA people that will not take no for an answer, and while there are other
games that don't try to have this level of precise prediction for hit registration, our QA guys didn't believe me or care, and they just kept coming back with bugs and more bugs and more bugs. Every single time we dug back into it to try to find out if there was a defect there, there always was, and I thank them deeply for not letting us get away with this stuff. OK, if you have a really high ping, hit prediction is not reliable anymore. Once you get above about 220 milliseconds on your RTT, we're going to start to defer some of the hit impacts as well. We're not going to predict them; we're going to wait until the server acknowledges it. The reason we do that is, when you start rewinding targets that far back in time, what we do instead is extrapolate them on your client. We don't want the victim to feel like they're being rewound way back behind a wall that they ran behind for cover, so we put clamps on it. We're only going to rewind you a certain amount; after that, we're going to start to extrapolate. I'll show you a video here that demonstrates that. So this is at zero ping. As you can see, the hit impacts are predicted; the hit pip and health bar are not predicted, you wait for the server for those, but since the ping is zero they show up almost immediately. At 300 milliseconds ping, you don't predict the impact, because we're extrapolating this target. He's not exactly right there; we're on dead reckoning. It's pretty close, but he's not exactly right there. There are situations where, when that Reaper doubles back, you might have totally mispredicted that extrapolation, and we're not going to honor you; your ping is crap. This is really obvious when your ping is one second. [Applause] Reaper's doing the exact same movement as in that first video, so this is us extrapolating. Note, by the way, even though the ping is one second, everything I'm doing on my client is totally predicted and totally responsive, and mostly wrong. I should have Deadeyed here. Really easy kill. Yeah.
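The clamp-then-extrapolate idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Overwatch's code: the 220 ms threshold is the figure quoted in the talk, while `MAX_REWIND_MS`, the function names, and the simple linear dead-reckoning model are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Tuple

# Threshold quoted in the talk ("about 220 milliseconds" of RTT).
HIT_PREDICTION_MAX_RTT_MS = 220.0
# Hypothetical clamp: the talk says a clamp exists but gives no number.
MAX_REWIND_MS = 250.0


@dataclass(frozen=True)
class Vec2:
    x: float
    y: float


def should_predict_hit_impact(rtt_ms: float) -> bool:
    """Predict impact effects locally only while RTT is at or below the threshold."""
    return rtt_ms <= HIT_PREDICTION_MAX_RTT_MS


def split_latency(latency_ms: float,
                  max_rewind_ms: float = MAX_REWIND_MS) -> Tuple[float, float]:
    """Split the shooter's latency into (server rewind, client extrapolation).

    The server rewinds targets by at most max_rewind_ms; whatever latency is
    left over has to be covered by extrapolating targets on the client.
    """
    latency_ms = max(latency_ms, 0.0)
    rewind_ms = min(latency_ms, max_rewind_ms)
    return rewind_ms, latency_ms - rewind_ms


def dead_reckon(position: Vec2, velocity: Vec2, ahead_ms: float) -> Vec2:
    """Linear dead reckoning: project the target along its last known velocity."""
    t = ahead_ms / 1000.0
    return Vec2(position.x + velocity.x * t, position.y + velocity.y * t)
```

With these assumed numbers, a player at 100 ms is handled entirely by rewind, while a player at one second gets 250 ms of rewind and 750 ms of extrapolation, which is why a target that doubles back can end up far from its extrapolated position.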
All right, so other examples of mispredictions. This is now back to a decent ping, 150 milliseconds. You'll get hit mispredictions whenever you have movement mispredictions. OK, so in slow-mo: all right, we saw blood, we did not see a health bar and did not see a hit pip, so we mispredicted the impact effect. The server denied it; it wasn't actually a legit hit. The reason we mispredicted the impact effect is because we just got Mei's wall raised up. So we thought we were down here on the ground when we fired, but when the server went to go simulate us, we were actually elevated slightly above that position, so that's what caused the mispredict. When we were trying to fix all these little hit misprediction problems, a majority of them actually came down to making sure your position was agreed upon, that it was exactly right with the server's. So we spent a lot of time making sure those things lined up. So that's a movement-related mispredict. Here's a gameplay-related mispredict. We're going to shoot this Reaper again; we have about 150 milliseconds ping. We're going to shoot this Reaper, but he goes into Wraith Form right as the arrow hits him. So on our client we'll predict it, we'll do blood, but there'll be no hit pip and no health bar. We didn't actually hit him, because the Reaper was invulnerable first. This is an example of how we favor the shooter most of the time, unless the victim does something to mitigate that shot, in this case the Reaper's Wraith Form, which makes him invincible for three seconds. All right, so we did not actually damage that Reaper. From a philosophical standpoint, imagine you're that Reaper and you got that Wraith Form off, in fact the server told you and all the effects started playing, and then you died. You would be on the forums so fast. ECS simplifies the netcode problem. The systems involved in netcode understand when they're executing on behalf of the player. It's really straightforward: basically, if the entity is controlled by
something with a connection component, it's a player. Systems also know what targets need to be rewound back to the frame of reference of the shooters or movers: anything that has a movement state component is going to be rewound. The behavior inherent in the relationship between entities with these components is that movement state can be scrubbed along a timeline to match the frame of reference of the player. As you can see here, within this large universe of systems and components, only a handful are responsible for the behaviors of netcode, and it's one of our most complex problems. Now, two of these systems, the net event system and the net message system, are involved in netcode in the sense that they actually receive inputs and send outputs; they don't actually do game simulation netcode stuff. So really you're talking about just a handful of these: interpolate movement, weapon, Statescript, and movement state, and I want to delete this system because I don't like it. So really we're talking about, like, three systems that are actually involved in what we call netcode from a gameplay standpoint, and only these components. The majority of these components are read-only for the sake of netcode; the only ones that are truly modified are things like the modify health queue, for example, because you're actually going to do damage to somebody. Here are some of our lessons learned and insights after using ECS for a couple of years. I kind of wish we required systems and utilities to go back to that canonical example of ECS and operate on tuples. The ad hoc technique we use, where we iterate over one component and then grab siblings, really obscures component access. In the tuple model, you have to be really explicit about what you can possibly access, instead of grabbing components willy-nilly. If you surface those tuples, like, if you want to do this behavior you need this tuple of 40 components, that tells you something: that system is too complex, so
I kind of like the friction that the tuples add. The other cool side effect of tuples is that you have a priori knowledge of what systems can touch what state. So back in our prototype engine, which used tuples, we knew that, you know, two or three systems could touch a different set of components, because we knew from their tuple definitions what they could possibly do, and that made it really easy to multithread that guy. So, same animation from before, but you see multiple systems light up in parallel because they're touching a different set of components. When your systems tick, you're just naturally multithreading gameplay code, because you know a priori what components they're going to read and write to. I should mention that, as you can see, the transform component is still really popular, but only a few systems actually mutate the transform component; most systems just read it. When you define these tuples in an a priori sense, you can tag components with "this one's read-only", which means if you have five systems that are only reading that guy, they can still operate in parallel. All right, entity lifetime is tricky, particularly when you create entities in the middle of the frame. Early on we deferred creation and destruction, so you'd say, hey, I want to create this entity, and it wouldn't actually be created until the end of the frame. While deferring destruction turned out to be totally fine, deferred creation had a bunch of annoying side effects. Specifically, if you requested the creation of a new entity in system A and you really wanted to read it in system B, with deferred creation you're going to have these off-by-one-frame errors. It's just really irritating, and it added a bunch of internal complexity. We wound up changing the code so that when you create an entity, we actually create it in the middle of that frame, so it can be used immediately afterwards. And we did that after ship, which is kind of terrifying; it was patch, like, 1.2 or 1.3. I did not sleep that night when we
pushed it live. Yeah, it added a bunch of complexity to the component iterators, which is just kind of icky. So this is still, I think, kind of an open problem that I'm still trying to, or rather we're still trying to, wrap our heads around. It took us a good year and a half to come up with our ECS rules. We knew the canonical ones, but we were taking some existing code and trying to mutate it into this new architecture. These rules are like: components have no functions, systems have no state, put your shared code in utils, defer complex side effects by queuing them in components, particularly singleton components, and systems shouldn't call functions in other systems. Even our naming convention: those are things we evolved over the course of a couple of years. There's still plenty of old code that doesn't follow these rules, and unsurprisingly, it's the source of a lot of complexity and maintenance issues, if you look at it in terms of how many changes it has in Perforce or how many bugs show up in that code. So if you have some legacy code that doesn't actually fit well into ECS, you shouldn't shoehorn it in at all. Keep that subsystem intact, then create a proxy component that wraps back to it. Different systems want to solve problems in different ways. ECS is a tool for integrating a bunch of systems together; it shouldn't force its design principles where they aren't welcome. Since ECS is trying to solve the problem of integrating and decoupling a bunch of different large modules, many systems and the components they operate on tend to be iceberg-shaped. Iceberg components have very little surface area to the rest of the ECS systems, but they have a whole bunch of state that's internal, under their proxies or in some other data structure that the ECS layer can't really touch. The body of these icebergs is pretty obvious in our threading model. Most ECS work, like updating systems, happens up here on the main thread, and we use a bunch of different multithreading
techniques. We have, like, forks and joins. So here's an example: in this frame, someone was shooting a lot of projectiles, so here the scripting system said it would need to spawn some projectiles, and we spun up a couple of worker threads to start chewing through those. Here the resolve contact system wants to create a bunch of impact effects, and we spin up a couple of worker threads to do that work. All the underlying work for projectile simulation is isolated and pretty much not visible up at the highest level of the ECS, and that's good. Another cool example of this is our path data system. It's a good example of a fork-and-join style model, where at the ECS level it just has a couple of hooks to say, hey, this breakable broke, or this door opened, you might want to rebuild path data in these regions. But under the hood, it's doing a whole bunch of stuff like take all these triangles, voxelize them, and compress the crap out of them, which has nothing to do with the ECS. You shouldn't shoehorn ECS onto that problem space; it's supposed to solve it on its own. So here's a cool video of our path invalidation system. The path data here is these blue chunks; these represent surfaces AI can walk on. I should mention we use path data not just for AI; we also use it for a bunch of hero abilities, so we actually need to keep this fairly in sync between the server and the client. So Zenyatta here is going to destroy these crates, and you'll see the surface that was on the crates drop down below. And then this door over here is going to open up; when the door opens, we're going to need to knit that back into place. The path invalidation system just has hooks saying, hey, these triangles changed, and then the bottom half of the iceberg goes and churns through all that data to redo all the path data. So in conclusion, ECS is the glue of Overwatch. ECS is cool because it helps you to integrate many disparate systems with minimal coupling. If you're going to use ECS, define your rules of
engagement. In fact, if you're going to use any architecture, define your rules of engagement quickly. Only a handful of engineers are going to touch your physics code or your scripting engine or your audio library, but everyone's going to touch the glue code that integrates every system together. Enforce constraints on this glue code; dig a pit of success. Netcode turns out to be really tricky, so decouple it as much as you can from the rest of your engine. ECS is a handy solution to that problem. Before we take some questions, I want to thank all the engineers on Team 4, especially the gameplay engineers, for having to deal with this crap for three years. We worked together to come up with these rules and evolve where this architecture was going to go, and I'm happy with how it turned out. All right, we have about 10 minutes for questions. [Audience] Right, so in your components, in the instances, did you use any kind of: here's my component state for frame N, and for frame N plus one I have a second copy over here, and therefore when I do modifications on them for the next frame I don't have to modify in place? [Tim Ford] We don't do a double buffer. It would be cool, because you can do that to do multithreading and deferment; it's really easy to do. We had another ECS prototype that did do exactly that, with straight double buffering. It wouldn't be hard to add. The one gotcha there is that you're going to read last frame's state invariably, and for some systems that works fine, but for a lot of highly interactive systems it's going to introduce one-frame delays, and that's going to hurt responsiveness in general. So it's a your-mileage-may-vary sort of area, but it's very easy to do in ECS in general. For us, we have two components: the input stream component stores a ring buffer of all of your inputs for the last two seconds, and the movement state component stores all of your movement state,
the movement state for any mover, for the last, like, second or two or something like that. And those ones you can go back in time and read immutable versions of without trouble. But it's not a general solution for us. Go ahead. [Audience] I had a question regarding looking up parent state instead of storing it locally. So you mentioned that, you know, if you shoot a projectile you could just look at who the owner was. I guess I was curious about the philosophical decision to do that versus storing that state in the projectile. You get kind of weird things; I remember there was a bug where if you changed teams mid projectile flight, it could damage your ex-team, or in the case of Mercy, where you damage-buff, it cares whether it's happening now for the damage. [Tim Ford] Those examples are exactly right. The whole notion of, like, you could store it, but then again, Genji happens. So you fire a rocket at me and I deflect it back: it was your rocket, now it's my rocket. So now I would have to go into that projectile and copy off a bunch of hostility information. We just said, well, let's not try to maintain that; let's just save the instigator, or, oh God, there's a specific title for those things. But it was for the sake of simplicity. I mean, the technique you're describing would probably be for a perf benefit. [Audience] And also some weird interesting things, where Genji could reflect a buffed rocket. [Tim Ford] Yes. I shouldn't say it's just that; you're right, it's not just perf, there are design sensibilities behind that decision. But complexity and simplicity are usually, like, the highest priority. Great question. [Audience] Thank you. [Audience] So I guess my question is: as you did over three years, you were building the rules to help with constraints and dig the pit of success. How does that factor into refactoring systems that you've already developed, systems from before those? [Tim Ford] Yes, before
2013, or as we went? [Audience] Yeah, as you went. Like, I'm assuming you had, like, a block of legacy code. [Tim Ford] Sure. A bunch of the folks that were there during that window on Overwatch will very fondly remember us trying to hold on to a bunch of legacy systems. And then we actually had this process we called mothballing. We said, well, I don't want to, like, delete this code; I'm just going to pound-define it out under a mothball define, and then we'll come back to it in a couple of weeks. And we just deleted all that code, honestly. [Laughter] We were super lucky. The kind of protection we had, as a game team that had every right to be canceled and kicked to the curb, was really fortunate. But I think I can answer your question more broadly with respect to refactoring. This is just a benefit of modularity in general: if all of your behavior is isolated to systems and you want to completely rewrite that system, that's not hard to do at all. So I think less of the value was about refactoring old legacy systems and more about refactoring the new stuff as we wrote it. We rewrote whole systems multiple times, for perf, or for complexity, or for organizational purposes, or for features. [Audience] That's right, yeah, that's software development. Thank you. [Tim Ford] Actually, I'd say it's mature software development, which we rarely ever get a chance to do. [Audience] Hi. You mentioned that you switched from deferring creating entities to doing them, do it live. [Tim Ford] Yeah, fucking, the whole thing sucks. Yeah, it was. [Audience] Did you have any clever solutions or tips or tricks for dealing with, like, OK, so you create this entity halfway through the frame, and the systems that were simulating earlier didn't get to touch it, and then it renders? [Tim Ford] So that's actually my fear. The reason we made the choice early on to do deferment was: well, what's going to happen if the entity gets created halfway through and it doesn't,
you know, get systems A through F run against it? And we just said, well, you know, you should be a fully formed, functional thing by the time your create is done; you shouldn't need some other system to run against you. And we very slowly took pieces of entity creation and made them happen synchronously. The last one we changed, after ship, was whether or not we added components to these component iterators, which is what most systems run over. That was the scary one that we just waited on. It turns out it just worked, although I think a lot of that was crossing my fingers and hoping for the best. I will say, though, what sucked about that is: let's say you're in the component iterator, iterating over component Foo, and you add a new entity that has component Foo, but you're iterating over that array. Shoot. So we have these custom iterators where you have this sorted array of components that currently exist, and then another array that sits on the back for any new components that got added while the first array was processing. Which kind of sucks, because this first array is sorted by memory address, it's super cache-local, it's really fast, and these dudes over here aren't. So you can get a bit of a perf hit there for the one or two new entities that showed up during that thing, but against the other 40, who cares. And then when the frame's done, you kind of merge those guys, sorting them back into position, and you're good to go. But yeah, that was scary and terrifying and not thread-safe. [Audience] Right, thank you. [Audience] Great talk, thank you. My question is: you mentioned the server runs at 60 frames per second, and when the client is running at a lower frame rate, the server still needs 60 commands in a second, so the client also needs to run at 60? [Tim Ford] Is your question, like, must the client always run at 60, or what happens if it runs at 30? [Audience] Yeah. [Tim Ford] It's a fixed simulation; you must run at 60. You don't have to render at 60, but the simulation has to
run at 60. So, you know, here's the render part of the frame, here's the simulation part of the frame. If you're running at 30 Hertz, you're going to run two simulation time steps, and the game simulation is much cheaper than the render part, so it turns out it works out just fine. If your CPU isn't powerful enough to keep up, though, you're going to enter a death spiral, like: oh shoot, I had to run two frames this frame, and that caused the next frame to be 48 milliseconds; crap, I have to run three frames this frame, and that caused the next frame to be 72 milliseconds; crap, now I'm running eight frames this frame. That's a desperate situation; you're definitely below min spec on our hardware if you're doing that. So we definitely wrestled with that, and, you know, we needed to make sure we were optimized. Fortunately for the client, the fixed simulation is kind of dominated by locally predicting your own client, not remote guys. And we do some cool tricks where, hey, for remote guys, a lot of the script stuff, the high-level gameplay stuff, has a budget, like: we're only going to run one and a half milliseconds for other people, and we'll just, you know, smooth spikes out that way. So it works OK, but yeah, you're right, you have to be careful; you have to get that work done. [Audience] Thank you. [Audience] My question is: you mentioned hit prediction. Do you use it also for slow projectiles, rockets? [Tim Ford] Yeah, rockets, yeah. That was a fun day. Holy crap. We couldn't think of another FPS that did that. In fact, we went to GDC talks by other FPSes, who we greatly respect and who are better than us, telling us that you shouldn't predict rockets. And one day we realized we could, and we did, like, oh dude, this is the best thing ever. It causes some weird side effects. A rocket is a big honking thing in the world, not like a long tracer, so the rocket can disappear because of a misprediction. So you fire the rocket, but you got stunned by
McCree, and the rocket just vanishes, and then there's a YouTube video and the forums yelling about it. But it was so worth it. Like, predicting rockets is rad; it feels really good. So if you're making a new shooter out there, just do it, and you'll get one forum post saying a rocket disappeared. It doesn't matter; it's totally worth it. It's really, really good. It's like the one thing we did for the genre: we predicted rockets. Overwatch, Game of the Year. [Audience] Yeah, thanks for the Pharah-on-Pharah duels for our Tribes fans. [Tim Ford] Right, exactly. A love note to the genre, Tribes especially. [Audience] Thank you. [Audience] My question's about, I guess, spatial quantization and how it functions with the other systems. The first part is about, like, how fine is your spatial quantization? [Tim Ford] One millimeter. Well, one meter divided by 1024, but, you know, we're engineers, so a thousand. [Audience] How well does the physics engine handle that? Because that can introduce errors, right? And how much of the important gameplay elements are actually affected by the physics engine? [Tim Ford] Well, the physics engine, what we call the physics engine, is more for cosmetic stuff. Then there's the game simulation physics, which calls the same functions as the physics engine, because it's written by the same guy, that guy right there. It's stable; it's really good. In fact, it's stable across platforms, between AMD, Intel, Linux. Like, when you're quantizing a float down to millimeter resolution, we worried about, well, crap, if Clang compiles it differently and you have out-of-order execution on this set of floating-point operations, would you get different results? And because of the nature of the quantization, we don't. If you have specific questions on that, go to Phil Orwig's talk tomorrow. He doesn't talk about that problem specifically, but he wrote the code, so you can harass him afterwards and he'll tell you all about IEEE and fun stuff. [Audience] Cool, thanks. [Audience] OK, so I think it might
be the last question, though. So your client-side prediction seems to rely on determinism...

Real quick, sorry: this is the last question here. I'm going to go to the wrap-up room around the corner, so if you have more questions, just follow me. But sorry, go ahead.

Yeah, so your client-side prediction relies on determinism, and you have to support multiple platforms, and it looks like, for example, your dynamic navmesh system could be asynchronous. What techniques have you developed to preserve determinism in your simulation?

Well, the techniques really come down to the fixed time step. Everything is on that fixed time step no matter what, and then quantization. Those are the main things in terms of keeping us honest. We are not a strictly deterministic game like StarCraft or most RTSs, or even like Halo. We're deterministic enough to not have mispredictions, which is to say there are certain situations where you will mispredict under what would normally be a reasonable series of inputs, and because of misprediction smoothing we don't notice in general. Mr.
Mark Wall Igor over there actually spends a lot of time on this. He wrote a debugger to isolate when those mispredictions happen because of a failure of determinism in the math, and it has this cool little timeline where you'll see: yep, server and client disagreed on this frame. You can scrub to that frame, and it has all the movement state that precipitated that. You can open it and realize why the ray that tries to find where your feet are screwed up, because it's almost always the ray that has to find where your feet are.

So, like, when the door opens and the navmesh changes, does it happen at the same time step as on the server?

Oh yeah, so the navmesh thing. Yeah, the navmesh thing doesn't. Certainly the result lines up, because it's given the same input, but the amount of time it takes to recompute the navmesh is different on the server and client, and that's okay. Very few player movement abilities rely on the navmesh, and if you mispredict, it's not horrible. And again, the server's the authority, so you'll get caught back up to the right state no matter what.

But the client rolls back from the previous state, so the prediction does?

Yeah, but we won't roll back the pathing, because that's like 40 megs of memory that I'd have to copy.

Right. Okay, thanks.

All right, thank you very much, guys. [Applause]
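
The two determinism techniques named in the answers above, the fixed time step and quantization to 1/1024 of a meter, can be sketched roughly as follows. This is a minimal illustration and not the Overwatch implementation: the names `quantize` and `simulate`, the 16 ms tick length, and the one-dimensional movement are all assumptions made for the example.

```python
QUANTA_PER_METER = 1024  # from the talk: one meter divided by 1024
TICK_MS = 16             # assumption: fixed tick length picked for illustration


def quantize(meters: float) -> float:
    """Snap a coordinate to the nearest multiple of 1/1024 m.

    1024 is a power of two, so every grid point is exactly representable
    as a float and the division introduces no further rounding error.
    """
    return round(meters * QUANTA_PER_METER) / QUANTA_PER_METER


def simulate(frame_times_ms, velocity_mps):
    """Advance a 1D position using fixed ticks, whatever the frame pacing.

    Rendered frames may take any integer number of milliseconds; the
    simulation only ever advances in whole TICK_MS steps, and the position
    is quantized after every step.
    """
    position = 0.0
    accumulator_ms = 0
    ticks = 0
    for frame_ms in frame_times_ms:
        accumulator_ms += frame_ms
        while accumulator_ms >= TICK_MS:
            position = quantize(position + velocity_mps * TICK_MS / 1000.0)
            accumulator_ms -= TICK_MS
            ticks += 1
    return ticks, position


# The same sum evaluated in a different order gives different floats...
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
# ...but both land on the same grid point once quantized.
same_after_quantize = quantize(a) == quantize(b)

# Two machines with very different frame pacing run the same ten ticks.
steady = simulate([16] * 10, 5.5)
jittery = simulate([7, 33, 40, 80], 5.5)
```

Quantization absorbs small floating-point differences only when both values round to the same grid point. Two results that straddle a rounding boundary would still diverge, which fits the speaker's description of the game as "deterministic enough" rather than strictly deterministic.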

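The misprediction debugger described above, a timeline that shows the frame where server and client disagreed along with the movement state at that frame, might be approximated like this. The talk does not show the real tool, so everything here is hypothetical: `MoveState`, its fields, `first_disagreement`, and `differing_fields` are names invented for this sketch.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class MoveState:
    """Hypothetical per-frame movement snapshot, positions in 1/1024 m units."""
    x: int
    y: int
    z: int
    grounded: bool


def first_disagreement(client: Dict[int, MoveState],
                       server: Dict[int, MoveState]) -> Optional[int]:
    """Return the earliest frame recorded on both sides whose states differ."""
    for frame in sorted(client.keys() & server.keys()):
        if client[frame] != server[frame]:
            return frame
    return None


def differing_fields(a: MoveState, b: MoveState) -> Dict[str, Tuple]:
    """List the fields that differ, as (client value, server value) pairs."""
    da, db = asdict(a), asdict(b)
    return {name: (da[name], db[name]) for name in da if da[name] != db[name]}


# Example timeline: the client believes the player stayed on the ground,
# while the server's ground check failed from frame 101 onward.
client_log = {
    100: MoveState(0, 0, 0, True),
    101: MoveState(10, 0, 0, True),
    102: MoveState(20, 0, 0, True),
}
server_log = {
    100: MoveState(0, 0, 0, True),
    101: MoveState(10, 0, 0, False),
    102: MoveState(20, 0, -3, False),
}

bad_frame = first_disagreement(client_log, server_log)
diff = differing_fields(client_log[bad_frame], server_log[bad_frame])
```

Reporting only the first divergent frame matters because every later frame inherits the error; the fields that differ at that frame point at the calculation to inspect, such as the ground ray mentioned in the talk.
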
Author Since: Mar 11, 2019

  1. And yet the hit detection has gotten so much worse. I have major doubts about these "TEDS" talks only to see even worse results.

  2. Finally some insight on why that fkn Hanzo arrow hit me behind cover or why that shatter knocked me down while I had my shield up…
    51:55 And then Baptiste happened. Now that state IS stored in the projectile.

  3. I watched the whole video and still don't understand how they could make the worst AFK check system in the whole gaming industry.

  4. I did not really get how you manage predictions when some entity (call it A, for example a slow-mo area effect) was deleted, say, 5 simulation frames ago and the client receives a message from the server that its prediction 25 frames ago was incorrect. In this situation the client tries to resimulate all 25 frames, but entity A is now gone, so (assuming A was interacting with other entities) the simulation will be incorrect again, and for the subsequent 20 frames the client will keep receiving "invalid state" messages from the server.

  5. At 24:32 he mentions well-established networking techniques covered by other literature. Can someone provide references to those sources?

  6. All this talk about how good your netcode is, and yet there is still the issue where you, as the person with low latency, get punished for the actions of a player with high latency.
    For example: I have 22 ms RTT and my target is the enemy Widowmaker. I am Hanzo and I hit an arrow in her face; I see blood and all the effects. 0 damage is done to the Widow. The Widow then continues to headshot me while I am behind a corner. This is NOT correct netcode even though you think it is. Of course the better solution is to just not allow people with high latency to be in a game together with people with low latency. I then asked the Widowmaker what his latency was, and the answer I got back was 97 ms.
    TL;DR: your netcode punishes low latency.
    There is proof out there that this is the case, and it is the only reason I stopped playing on a regular basis.

  7. I honestly think the games industry's spark is slowly dying due to shady, greedy and illegal practices from major game publishers. More studios are being shut down and developers laid off than games are being made. The only thing I see saving the industry is virtual reality; it's at least something new, different and exciting.

  8. Anyone using an ECS similar to this? I'm interested to know the major downsides, and when it becomes overkill.

  9. OverMEH. Bad class-based shooter… not gonna bother with this, especially about its "gameplay architecture". 😛

  10. Eh, zis iz 2 hard 4 mi smol brayn. Nut understand you. Gamerz unite! Lettuce cumbine OUR brayn pahwer!
