Archive for September, 2011

Building a Better Build System

My role on Space Marine was on the Tools team, administering the build farm. I really didn’t know much about pipelines going into it, though during my 2007 co-op at EA I did get some exposure to Orca, the central build group for sports games in Burnaby (that was its name at the time; I know it has changed names a few times since). So I dove into being build guy at Relic, and learned quickly. Now that I’ve been through a full product cycle, I have a better understanding of how things work, and of the role of a build system. For the sake of this topic, I define:

A build system is an automaton that converts a set of source files of varying formats into a set of output files representing the product in development or to be released.  These source files are usually the output of members of a project’s production staff (programmers, artists, animators, audio designers, level designers, etc.) but can also be previously generated output from the build system itself. The output files can consist of loose files, archives, install packages, disc images (conceivably, anything which could run the product), tools produced from the same set of input files (not to be shipped), and log files for the build operations.

That definition is all well and good as a starting point, but it’s obviously insufficient.  How do you determine what needs to be built?  How do you provide the input and retrieve the output? What if there are special cases (e.g. a tool has changed, requiring a rebuild of all animations) or dependencies between assets?  Complicating all this is the wetware producing the input data – humans don’t always follow the pipeline properly, for a variety of reasons.  Sometimes the pipeline itself is at fault, sometimes they forget a step, sometimes there’s a new step they haven’t incorporated into their routine yet, and so on.  As well, third-party tools are not always reliable.  They have bugs, and so will your build system.  We also don’t want to release nonfunctioning output to the team.  So, let’s come up with a more concrete set of requirements.

A build system should:

  • Detect changes to the set of input files and automatically rebuild output data as necessary
  • Test the output at the completion of an operation for some basic level of functionality
  • Have the throughput to build each submitted change individually for finely-grained detection of failures (continuous integration)
  • Have redundancy in place to tolerate bugs in the system which may erroneously report failures
  • Have a facility in place to request rebuilds of certain data sets as required
  • Distribute output data to users promptly, and at their convenience.
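The requirements above can be sketched as a minimal driver loop. This is only an illustration under assumed names (`run_ci`, `build_fn`, `smoke_test_fn` are all hypothetical): a real system would be triggered by source control rather than handed a list of changes, but the shape — build each change individually, smoke-test, publish only what passes — is the same.

```python
def run_ci(changelists, build_fn, smoke_test_fn):
    """Build each submitted change individually so a failure can be
    pinned to a single change (continuous integration), smoke-test the
    result, and publish only builds that pass.

    build_fn(change) -> output name, or None on a build error.
    smoke_test_fn(output) -> True if the build starts and basic
    operations work without a crash.
    """
    published = []  # builds safe to distribute to the team
    failures = []   # (change, reason) pairs for the report/log
    for change in changelists:
        output = build_fn(change)
        if output is None:
            failures.append((change, "build error"))
            continue
        if not smoke_test_fn(output):
            # A failed smoke test means there's no point sending the
            # build out to the team for consumption.
            failures.append((change, "smoke test failed"))
            continue
        published.append(output)
    return published, failures
```

With toy stand-ins for the build and smoke-test steps, `run_ci([101, 102, 103], ...)` would yield a list of publishable builds plus a per-change failure report — exactly the finely-grained failure detection the third requirement asks for.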

Each of these requirements looks simple at first, but unpacking any one of them quickly expands the scope of the system.  It is no longer just a process running on a server, but an integrated part of the pipeline and tool chain, with deeper integration for different parts of the system.  To expand on the complexity of some requirements:

  • Detecting changes to the input files requires some kind of notification that files have changed.  A straightforward way of doing this would be to set up a trigger in your source control system to notify the build system that files a, b, and c have changed.  It is naive to assume that just because a, b, and c have changed, you only need to rebuild three files.  What if a is a texture shared between five other assets?  Then you also need to rebuild d, e, f, g, and h.  How do you know this programmatically? Some sort of asset tracking system is now needed.
  • Automated testing of basic functionality can fall into two categories: smoke testing and unit testing, the former seen in use more commonly than the latter in my experience. A smoke test is a very basic test to determine if the software can be started and some basic operations performed without a crash.  If that fails, there’s not really any point sending the build out to the team for consumption. On the other hand, a unit test is used to validate specific classes and functions, in much finer detail, and would be useful to find bugs before QA does.  A failed unit test should throw warning flags, but not prevent the entire build from being released to the team.
  • Output data from a build process is more than the program and its dependencies; it also includes the logs and reports from each stage of building and auto-testing.  Networks are fast today, but on a large team with a large build, the speed of the drives containing output data will become a bottleneck.  Thus, an intelligent distribution system is needed to perform load-balancing and deliver files in a manner such that users can pull their data from multiple sources. The logging system should parse the return code, stdout, and stderr from each process run, extract the relevant data, and present a human-readable log to the users for troubleshooting.  The full logs should always be available in case the summary is insufficient.
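The asset-tracking problem in the first bullet boils down to a dependency graph plus a transitive "dirty set" computation: a changed source dirties every asset built from it, and those assets may in turn feed other assets. A minimal sketch, with hypothetical class and method names, using the texture example from above:

```python
from collections import defaultdict, deque

class AssetGraph:
    """Tracks which assets are built from which sources, so a change to
    one input expands to the full set of outputs needing a rebuild."""

    def __init__(self):
        # source file -> set of assets built (directly) from it
        self.dependents = defaultdict(set)

    def add_dependency(self, asset, source):
        self.dependents[source].add(asset)

    def dirty_set(self, changed_sources):
        """Breadth-first walk: an asset's output may itself be the
        input to other assets, so take the transitive closure."""
        queue = deque(changed_sources)
        dirty = set()
        while queue:
            node = queue.popleft()
            for asset in self.dependents[node]:
                if asset not in dirty:
                    dirty.add(asset)
                    queue.append(asset)
        return dirty
```

If texture `a` is shared by assets `d` through `h`, then `dirty_set(["a"])` returns all five — and if `d` is itself packed into an archive, the archive lands in the dirty set too, with no extra bookkeeping by the user.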

This concludes a very high-level overview of what a build system can do to improve the productivity of the team using it.  In future posts, I will discuss the varying subsystems in greater detail and propose algorithms to address their requirements.


Reflecting on Space Marine

My time on Space Marine has ended. My first game as a professional game developer launches tomorrow night in North America (though street dates seem to have been broken in Europe already), and I must admit I am glad to have a break from being build guy. The last few months have been very intense: running release candidates, preparing submissions to MS/Sony, dealing with several 11th-hour emergencies, and all the extra hours of finaling. I knew finaling would be tough, but I definitely underestimated just how tough it would be.

I hope Space Marine does well – general reaction to the demo from those who have played it, as far as I can see, is quite positive. The people claiming we’re a Gears clone are finally quieting down, and a following seems to be starting up. I hope it translates into more sales.

What did I learn? Well, I certainly put my Distributed Systems course to work, as I took the build farm from a fault-intolerant eight-node system to a significantly more tolerant 33-node system. The biggest nagging issue I was unable to solve was that for some asset types, on a blade server with no physical GPU, a tool would occasionally fail on a given asset (and then succeed in subsequent executions). The logging I had in place was not sufficient to track down the cause, and then we went into pipeline lockdown and let the issue lie.

Besides that, I learned a bit about the file systems on the Xbox 360 and Playstation 3, and the different methods of distributing content on those platforms. I learned how to publish on Steam, and while I think Microsoft has the best publishing tools of the three, I have to give Valve credit on how approachable they were for any publishing/distribution issues I had. The PS3 and X360 discs, as well as the Steam digital download, are my largest user-facing contributions.

I learned a bit about Flash too, as I fixed some UI audio issues in the endgame.

Overall, Space Marine was great to work on. I proved myself as a programmer, rose pretty quickly from junior to intermediate, and will be moving into gameplay on my next project, on what looks to be a pretty cool feature. I won’t say which one just yet; THQ has announced and shown it, so it’s probably fine to talk about, but I want to play it safe.

