My role on Space Marine was on the Tools team, administering the build farm.  I really didn’t know much about pipelines going into it, though on my 2007 co-op at EA I did have some exposure to the central build group for sports games in Burnaby, which was called Orca at the time (I know they’ve changed names a few times since then).  So I dove into being the build guy at Relic, and learned quickly.  Now that I’ve been through a full product cycle, I have a better understanding of how things work, and of the role of a build system.  For the sake of this topic, I define:

A build system is an automaton that converts a set of source files of varying formats into a set of output files representing the product in development or to be released.  These source files are usually the output of members of a project’s production staff (programmers, artists, animators, audio designers, level designers, etc.) but can also be previously generated output from the build system.  The output files can consist of loose files, archives, install packages, disc images (anything which could conceivably run the product), tools produced from the same set of input files (not to be shipped), and log files for the build operations.

That definition is all well and good as a starting point, but it’s obviously insufficient.  How do you determine what needs to be built?  How do you provide the input and retrieve the output?  What if there are special cases (e.g. a tool has changed, requiring a rebuild of all animations) or dependencies between assets?  Complicating all this is the wetware producing the input data: humans don’t always follow the pipeline properly, for a variety of reasons.  Sometimes the pipeline itself has a problem, sometimes they forget, sometimes there’s a new step they haven’t absorbed yet, and so on.  As well, third-party tools are not always reliable.  They have bugs, and so will your build system.  We also don’t want to release nonfunctioning output to the team.  So, let’s come up with a more concrete set of requirements.

A build system should:

  • Detect changes to the set of input files and automatically rebuild output data as necessary
  • Test the output at the completion of an operation for some basic level of functionality
  • Have the throughput to build each submitted change individually for finely-grained detection of failures (continuous integration)
  • Have redundancy in place to tolerate bugs in the system which may erroneously report failures
  • Have a facility in place to request rebuilds of certain data sets as required
  • Distribute output data to users, at their convenience, promptly.

Each of these requirements looks simple at first, but digging into any one of them quickly expands the scope of the system.  It is no longer just a process running on a server, but an integrated part of the pipeline and tool chain, with deeper integration for different parts of the system.  To expand on the complexity of some requirements:

  • Detecting changes to the input files requires some kind of notification that files have changed.  A straightforward way of doing this would be to set up a trigger in your source control system to notify the build system that files a, b, and c have changed.  It is naive to assume that just because a, b, and c have changed, you only need to rebuild three files.  What if a is a texture shared between five other assets?  Then you also need to rebuild d, e, f, g, and h.  How do you know this programmatically? Some sort of asset tracking system is now needed.
  • Automated testing of basic functionality can fall into two categories: smoke testing and unit testing, the former seen in use more commonly than the latter in my experience. A smoke test is a very basic test to determine if the software can be started and some basic operations performed without a crash.  If that fails, there’s not really any point sending the build out to the team for consumption. On the other hand, a unit test is used to validate specific classes and functions, in much finer detail, and would be useful to find bugs before QA does.  A failed unit test should throw warning flags, but not prevent the entire build from being released to the team.
  • Output data from a build process is more than the program and its dependencies; it is also the logs and reports from each stage of building and auto-testing.  Networks are fast today, but on a large team with a large build, the speed of the drives containing output data will become a bottleneck.  Thus, an intelligent distribution system is needed to perform load-balancing and deliver files to users in such a way that they can pull their data from multiple sources.  The logging system should parse the return code, stdout, and stderr from each process run, extract the relevant data, and present a human-readable log to the users for troubleshooting.  The full logs should always be available in case the summary is insufficient.
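
To make the shared-texture case from the first point a little more concrete, here is a minimal sketch in Python of how an asset tracking system might answer “what else needs rebuilding?”  It is only an illustration under some assumptions: the `uses` table, the `level1` asset, and the function names are all hypothetical, and a real system would pull this data from the pipeline rather than a hand-written dictionary.  The idea is to invert the “asset uses file” relationship and then walk it outward from the changed files.

```python
from collections import defaultdict, deque


def build_reverse_index(uses):
    """Invert 'asset -> files it uses' into 'file -> assets that use it'."""
    used_by = defaultdict(set)
    for asset, inputs in uses.items():
        for item in inputs:
            used_by[item].add(asset)
    return used_by


def affected(changed, used_by):
    """Return the changed files plus everything that depends on them,
    directly or indirectly (breadth-first walk of the reverse index)."""
    dirty = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, ()):
            if dependent not in dirty:
                dirty.add(dependent)
                queue.append(dependent)
    return dirty


# Hypothetical data: texture "a" is shared by assets d, e, f, g, and h,
# and a made-up "level1" asset uses both d and b.
uses = {name: ["a"] for name in "defgh"}
uses["level1"] = ["d", "b"]

used_by = build_reverse_index(uses)
rebuild = affected(["a", "b", "c"], used_by)
print(sorted(rebuild))
```

Note that the source control trigger only reported a, b, and c; everything else in the rebuild set comes from the tracked dependencies, which is why the tracking data has to be kept accurate for this to work.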

This concludes a very high-level overview of what a build system can do to improve the productivity of the team using it.  In future posts, I will discuss the various subsystems in greater detail and propose algorithms to address their requirements.