Hi, my name is Leigh Beattie and I currently work on a video game project called 'Depth' for a company called Digital Confectioners. I'm planning on using this blog as a means to keep track of what I learn and how I plan to go about fixing the problems encountered during my time working on Depth. In my personal time I tend to work on my own game engine and accompanying game, a hobby that I find very satisfying, particularly as a learning experience. Those personal struggles have given me insight into other engines where different design decisions have been made, usually for very valid reasons. I also enjoy reading research papers and exploring new methods for real-time rendering. High performance 3D graphics is a fascinating world where there are lots of exciting new ideas surfacing every year.
Sharks are only really scary when they're played by humans
Genesis
Depth uses the latest incarnation of Epic's Unreal Engine 3, and while the engine is very mature and well thought out, it doesn't translate well to the content we're using in our game. Depth is already available and playable on Steam today, and while it's an incredibly fun and original game, its visuals aren't quite at parity with the performance you'd expect to get on certain hardware.
The poster-child of UE3 is a little bit on the ugly side
The idea behind any black-box engine is that the renderer is meant to be data driven. What I mean by that is that game developers need minimal knowledge of how their art assets go from the model pipeline to being displayed on the monitor; the renderer is literally driven by the data in the meshes and materials. This has the added benefits of reducing complexity, encouraging consistent results, providing a highly optimised experience and significantly reducing development time. But, as with anything in computer science, this black-box approach comes with certain caveats, and Unreal Engine 3 is no exception.
To really achieve high performance output you need to work within the limitations of what the renderer was designed to do, and those limitations hark back to the period in which the engine was designed. Having an initial release date of 2006 means it had to run well on the hardware of the time, which means DirectX 9, and it also had to fit the level of hardware performance of the time. This boils down to the renderer being mostly a forward renderer (with the exception of some deferred elements), which at the time of the engine's initial release was great: many graphics cards simply didn't have the memory bandwidth to appropriately handle a deferred rendering pipeline, and avoiding that overhead meant the engine ran super slick.
2006 gave us the ATi X1950XTX and, not too much later, the amazing GeForce 8800GTX
The Struggle
Herein lies our problem. With Depth being an underwater game, dynamic lighting takes on an important role, and while Unreal Engine 3 certainly has a very mature forward renderer, it simply carries too much overhead when running a good number of dynamic lights. It's not really a problem with Unreal Engine 3 so much as with the method chosen to calculate multiple lights.
When profiling Depth we find that we spend a good number of CPU cycles iterating over primitives and lights. A lot of this is down to the fact that we have so many dynamic lights in our scene: to properly optimise the graphics pipeline the engine first needs to determine which lights affect which geometry. This doesn't scale well with the number of lights and has forced us to make concessions in our art quality in terms of light size, with some dirty hacks thrown in to decrease size at distance. From a GPU perspective, a forward renderer means that each primitive needs to be completely re-rendered (including redoing the vertex processing stage) once for every light affecting it, with the results blended to produce the final output. This has forced us to reduce the poly count of the geometry in our scenes so as not to be vertex-throttled.
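To make that cost pattern concrete, here's a minimal sketch of a classic multi-pass forward renderer. This is standalone C++, not UE3 code; all the types and draw functions are hypothetical stand-ins for what are much richer engine classes:

```cpp
#include <vector>

// Hypothetical minimal types standing in for engine data.
struct Bounds { float Center[3]; float Radius; };
struct Primitive { Bounds B; };
struct Light { Bounds B; };

// Sphere-sphere overlap test: the CPU-side light/primitive culling
// that shows up when profiling with many dynamic lights.
bool Intersects(const Bounds& A, const Bounds& B)
{
    float D2 = 0.0f;
    for (int i = 0; i < 3; ++i)
    {
        const float D = A.Center[i] - B.Center[i];
        D2 += D * D;
    }
    const float R = A.Radius + B.Radius;
    return D2 <= R * R;
}

// Stubs for the actual draw calls.
void RenderBasePass(const Primitive&) {}
void RenderAdditiveLightPass(const Primitive&, const Light&) {}

// The multi-pass forward cost pattern: the inner draw re-runs the full
// vertex stage of the mesh once per affecting light, so GPU cost grows
// with primitives x lights.
void RenderScene(const std::vector<Primitive>& Prims,
                 const std::vector<Light>& Lights)
{
    for (const Primitive& Prim : Prims)
    {
        RenderBasePass(Prim); // base / ambient contribution

        for (const Light& L : Lights)
        {
            if (Intersects(Prim.B, L.B))
                RenderAdditiveLightPass(Prim, L); // additive blend pass, full re-render
        }
    }
}
```

Both loops hurt us: the intersection testing is the CPU cost we see when profiling, and the repeated per-light vertex work is what forces the poly count reductions.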
Unreal Engine 3 was recently updated (2011) with a DirectX 11 deferred rendering pipeline, which was used in the Samaritan demo https://www.youtube.com/watch?v=RSXyztq_0uM . Ideally, if this rendering pipeline actually functioned with our game, it'd be close to the perfect solution. But alas it's non-functional, and it doesn't receive any support updates from Epic, who are instead, and understandably, focused on developing Unreal Engine 4.
The Graphics Demo character everyone actually wants to see in a game.
The Solution
Luckily we have a license for the source code of Unreal Engine 3, and using this we can mould the engine to better fit the art assets we present it. Our plan is to take a Forward+ approach to rendering the art assets on screen, mostly because it allows us to remain DirectX 9 compatible but also to reuse a lot of code already present. And here is where my journey really begins, as this can potentially be a pretty big, but exciting, undertaking.
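For the unfamiliar, the core of Forward+ is tiled light culling: the screen is divided into small tiles, each tile gathers a list of the lights that can affect it, and the forward pass then shades each pixel against only its tile's list. Here's a minimal CPU-side sketch of the binning step; it's standalone C++, and the names, the fixed 16-pixel tile size and the pre-projected light rectangles are all my own illustrative assumptions, not anything from UE3:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// A light's projected screen-space extent, in pixels.
struct ScreenRect { int MinX, MinY, MaxX, MaxY; };

constexpr int TileSize = 16;

// Returns, for every screen tile, the indices of the lights overlapping it.
// The forward shading pass then only evaluates a pixel's own tile list.
std::vector<std::vector<uint32_t>> BuildTileLightLists(
    const std::vector<ScreenRect>& LightRects, int Width, int Height)
{
    const int TilesX = (Width + TileSize - 1) / TileSize;
    const int TilesY = (Height + TileSize - 1) / TileSize;
    std::vector<std::vector<uint32_t>> Tiles(TilesX * TilesY);

    for (uint32_t LightIdx = 0; LightIdx < LightRects.size(); ++LightIdx)
    {
        const ScreenRect& R = LightRects[LightIdx];
        if (R.MaxX < 0 || R.MaxY < 0 || R.MinX >= Width || R.MinY >= Height)
            continue; // light is entirely off-screen

        // Clamp the pixel rect to the screen, then convert to tile coordinates.
        const int T0x = std::max(R.MinX, 0) / TileSize;
        const int T0y = std::max(R.MinY, 0) / TileSize;
        const int T1x = std::min(R.MaxX, Width - 1) / TileSize;
        const int T1y = std::min(R.MaxY, Height - 1) / TileSize;

        for (int Ty = T0y; Ty <= T1y; ++Ty)
            for (int Tx = T0x; Tx <= T1x; ++Tx)
                Tiles[Ty * TilesX + Tx].push_back(LightIdx);
    }
    return Tiles;
}
```

One wrinkle worth noting: DirectX 9 class hardware has no compute shaders or structured buffers, so in practice the per-tile lists would have to be packed into textures or shader constants for the pixel shader to read. Figuring out the cleanest way to do that inside UE3 is part of the undertaking.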
That's not to say we actually have the solution yet; we're just starting on this journey to find out what best suits us. First things first, we need to discover precisely how Unreal Engine 3 runs its forward shading pipeline so we know where to squeeze our new lighting approach in and what we need to cut out.
Hardware Abstraction
Being a low level graphics programmer, I decided the first thing to learn about Unreal Engine 3 was the way it abstracts away its graphics commands. Unreal Engine calls this its "Render Hardware Interface", or RHI for short. Most of the code relevant to these interfaces exists in RHI.h and RHIMethods.h. The idea here is to define high level drawing methods that the base engine will use for controlling the state of the hardware it's running on top of, e.g. generating textures, drawing primitives, creating a viewport, setting shader parameters. RHI.h is just an interface though, and implementations of it can be found for DirectX 9, DirectX 11 and OpenGL in the UE3 source code. Generally when modifying Unreal Engine 3, it looks like you'll want to use RHI methods for controlling graphics card state, so that your code remains consistent and general purpose. It also means that any state changes you make are more likely to be tracked if you're running profiling tools, and any state caching that might happen doesn't become corrupt because you bypassed the UE3 state tracker.
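To illustrate the shape of this pattern, and only the shape: the class and method names below are toy stand-ins I made up, not UE3's macro-generated API from RHIMethods.h, here's what an RHI-style abstraction boils down to:

```cpp
#include <cstdint>

// A toy handle to a GPU resource; backends map it to their native object.
struct TextureHandle { uint32_t Id = 0; };

// The abstract interface: high level hardware commands the engine codes
// against, with no knowledge of which API is underneath.
class RenderHardwareInterface
{
public:
    virtual ~RenderHardwareInterface() = default;
    virtual TextureHandle CreateTexture2D(int Width, int Height) = 0;
    virtual void SetViewport(int X, int Y, int Width, int Height) = 0;
    virtual void DrawIndexedPrimitive(int NumPrimitives) = 0;
};

// One backend; UE3 ships equivalents for DirectX 9, DirectX 11 and OpenGL.
class D3D9RHI final : public RenderHardwareInterface
{
public:
    TextureHandle CreateTexture2D(int, int) override { return {NextId++}; }
    void SetViewport(int, int, int, int) override { /* d3d->SetViewport(...) */ }
    void DrawIndexedPrimitive(int) override { /* d3d->DrawIndexedPrimitive(...) */ }
private:
    uint32_t NextId = 1;
};

// Engine-side code only ever sees the interface, so every state change
// flows through one choke point that can be tracked, cached and profiled.
void DrawCustomPass(RenderHardwareInterface& RHI)
{
    RHI.SetViewport(0, 0, 1280, 720);
    RHI.DrawIndexedPrimitive(/*NumPrimitives=*/1024);
}
```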
The Rendering Thread
UE3 has been designed to be highly adaptive, at least in terms of its renderer, and this is reflected by its main rendering thread. The rendering thread is as lightweight as possible: it's essentially a circular buffer that iterates over abstract commands. It's also completely optional; you can choose to have all the commands executed on the current game thread instead of being queued on a separate thread.
By design it appears to be only loosely coupled to the scene objects that use it to render. The render thread is not aware of any scene tree or stack; it just runs over a circular buffer of commands and parameters.
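As a rough mental model, the structure looks something like the toy single-producer/single-consumer queue below. This is my own sketch, not UE3 code; as I understand it the real engine also constructs command objects in-place inside the ring's memory rather than queueing heap pointers:

```cpp
#include <atomic>
#include <cstddef>

// Commands are opaque to the consumer: the render thread knows nothing
// about scenes, only how to Execute() whatever it pops off the ring.
struct RenderCommandBase
{
    virtual ~RenderCommandBase() = default;
    virtual void Execute() = 0;
};

template <size_t Capacity>
class CommandRing
{
public:
    // Game thread: push a command, spinning while the ring is full.
    void Enqueue(RenderCommandBase* Cmd)
    {
        const size_t W = Write.load(std::memory_order_relaxed);
        while (W - Read.load(std::memory_order_acquire) == Capacity) { /* spin */ }
        Slots[W % Capacity] = Cmd;
        Write.store(W + 1, std::memory_order_release);
    }

    // Render thread: pop and run the next command, if any.
    bool ExecuteOne()
    {
        const size_t R = Read.load(std::memory_order_relaxed);
        if (R == Write.load(std::memory_order_acquire))
            return false; // ring is empty
        RenderCommandBase* Cmd = Slots[R % Capacity];
        Cmd->Execute();
        delete Cmd;
        Read.store(R + 1, std::memory_order_release);
        return true;
    }

private:
    RenderCommandBase* Slots[Capacity] = {};
    std::atomic<size_t> Write{0};
    std::atomic<size_t> Read{0};
};
```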
Queueing a command on the render thread involves calling one of four C macros that will build the actual render command object (FRenderCommand) for you. All you need to provide is the name of the class you want to create, its parameters and the code you want it to run, in plain text. The macros will put in the appropriate code to get your command into the command queue and to check that you do indeed have threaded rendering enabled (otherwise the command is just executed right away instead of being queued). This looks like where I come in when it comes to modifying the rendering state tracker: it's where my custom graphics pass should run its graphics commands.
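As a sketch of the usage pattern, queueing a one-parameter command looks roughly like this. The macro name and argument order match what I've seen in the UE3 headers, but FSetFogDensityCommand and GSceneFogDensity are hypothetical names of my own; double-check the exact signature in your engine version:

```cpp
// ENQUEUE_UNIQUE_RENDER_COMMAND_ONEPARAMETER generates an FRenderCommand
// subclass named FSetFogDensityCommand, captures NewDensity by value into
// a member called Density, and queues the command; if threaded rendering
// is disabled it executes the code block immediately on the calling thread.
ENQUEUE_UNIQUE_RENDER_COMMAND_ONEPARAMETER(
    FSetFogDensityCommand,        // name of the generated command class
    FLOAT, Density, NewDensity,   // member type, member name, value captured now
    {
        // This block runs later, on the rendering thread.
        GSceneFogDensity = Density;
    });
```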
In the context of our game, using this multi-threaded renderer has huge performance benefits, allowing the game thread to complete its work quicker without being blocked by render calls. At least, that's what I guess is happening. In most situations you can see a doubling of frame rate in what are considered problematic areas.
Going Forward
Even though I've read through quite a bit of source code, there are still a bunch of assumptions I've had to make, so I could be wrong about some of the elements discussed here; these are just the conclusions I've come to after my brief session of exploration. Hopefully over time I'll become more familiar with the renderer, such that I'll be able to manipulate it at my whim.
Going forward I'll be investigating from the other end, where rendering commands are batched and organised for rendering efficiently. I've had a quick look at the low-level rendering and I THINK I understand it; now I'll look at it from a high level and see if I can understand that too.