Monday, January 26, 2015

The Game Outcomes Project, Part 4: Crunch Makes Games Worse

This article is the fourth in a 5-part series.
The Game Outcomes Project team includes Paul Tozour, David Wegbreit, Lucien Parsons, Zhenghua “Z” Yang, NDark Teng, Eric Byron, Julianna Pillemer, Ben Weber, and Karen Buro.


Extended overtime (“crunch”) is a deeply controversial topic in our industry.  Countless studios have undertaken crunch, sometimes extending to mandatory 80-100 hour work weeks for years at a time.  If you ask anyone in the industry about crunch, you’re likely to hear opinions stated very strongly and matter-of-factly based on that person’s individual experience.

And yet such opinions are almost invariably put forth with zero reference to any actual data.

If we truly want to analyze the impact of extended overtime in any scientific and objective way, we should start by recognizing that any individual game project must be considered meaningless by itself – it is a single data point, or anecdotal evidence.  We can learn absolutely nothing from whether a single successful or unsuccessful game involved crunch or not, because we cannot know how the project might have turned out if the opposite path had been chosen – that is, if a project that crunched had not done so, or if a project that did not employ crunch had decided to use it.

As the saying goes, you can’t prove (or disprove) a counterfactual – you’d need a time machine to actually know how things would have turned out if you’d chosen differently.

Furthermore, there have undeniably been many successful and unsuccessful games created both with and without crunch.  So we can’t give crunch the exclusive credit or blame for a particular outcome on a single project when much of the credit or blame is clearly owed to other aspects of the game’s development.  To truly measure the effect of crunch, we would need to look at a large sample, ideally involving hundreds of game projects.

Thankfully, the Game Outcomes Project survey has given us exactly that.  In previous articles, we discussed the origin of the Game Outcomes Project and our preliminary findings, our findings on team effectiveness, and the many additional factors specific to game development that we investigated.  We also wrote up a separate blog post describing the technical details of our methodology.

In this article, we present our findings on extended overtime based directly on our survey data.

Attitudes Toward Crunch

Developers have surprisingly divergent attitudes toward the practice of crunch.  An interview on gamesindustry.biz quoted well-known industry figures Warren Spector and Jason Rubin:

“Crunch sucks, but if it is seen by the team members as a fair cost of participating in an otherwise fantastic employment experience, if they value ownership of the resulting creative success more than the hardship, if the team feels like long hours of collaboration with close friends is ultimately rewarding, and if they feel fairly compensated, then who are we to tell them otherwise?" asked Rubin.

[…] "Look, I'm sure there have been games made without crunch. I've never worked on one or led one, but I'm sure examples exist. That tells me something about myself and a lot about the business I'm in," said Spector.

[…] "What I'm saying is that games - I'm talking about non-sequels, non-imitative games - are inherently unknowable, unpredictable, unmanageable things. A game development process with no crunch? I'm not sure that's possible unless you're working on a rip-off of another game or a low-ambition sequel.

“[…] Crunch is the result of working with a host of unknown factors in creative mediums. Since game development is always full of unknowns, crunch will always exist in studios that strive for quality […] After 30 years of making games I'm still waiting to find the wizard who can avoid crunch entirely without compromising at a level I'm unwilling to accept.”

On the other side of the fence is Derek Paxton of Stardock, who said in an interview with Gameranx:

“Crunch makes zero sense because it makes games worse. Companies crunch to push through on a specific game, but the long-term effect is that talented developers, artists, producers and designers burn out and leave the industry.

“Companies and individuals should stop wearing their time spent crunching as a badge of honor. Crunch is a symptom of broken management and process. Crunch is the sacrifice of your employees. I would ask them why crunch isn’t an issue with other industries. Why isn’t crunch an issue at all game studios?

“Employees should see it as a failure. Gamers should be concerned about it, because in the long term the hobby they love is losing talent because of it. Companies should do everything in their power to improve their processes to avoid these consequences.”

So who is right – Spector and Rubin, or Paxton?

[Full disclosure: team member Paul Tozour leads Mothership Entertainment, whose flagship game is being published by Stardock.]

In the Game Outcomes Project survey, we provided three text boxes at the end that respondents could use to tell us about their industry experiences.  Where respondents mentioned crunch, they invariably described it as a net negative.  One respondent wrote:

“The biggest issue we had was that the lead said ‘Overtime is part of game development’ and never TRIED to improve. As sleep was lost, motivation dropped and the staff lost hope ... everything fell apart.  Hundred-hour weeks for nine months, and I'm not exaggerating.  Humans can't function under these conditions  ...  If you want to mention my answer feel free. I'm sure it'd be familiar to many devs.”

Another developer put it more bluntly:

“Schedule 40 hours a week and you get 38.  Schedule 50 and you get 39 and everyone hates work, life, and you.  Schedule 60 and you get 32 and wives start demanding you send out resumes.  Schedule 80 and you’re [redacted] and get sued, jackass.”

In this article, we will be getting a final word on the subject from the one source that has yet to be interviewed: the data.

The “Extraordinary Effort” Argument

We’ll begin by formulating the “pro-crunch” side of the discourse into testable hypotheses.  Although no one directly claims that crunch is good per se, and no one denies that it can have harmful effects, Spector and Rubin clearly make the case in the article above that crunch is often (if not usually, or even always) a necessary evil.

According to this line of thinking, ordinary development with ordinary schedules cannot produce extraordinary results.  We believe an accurate characterization of this viewpoint from the gamesindustry.biz article quoted above would be: “Extraordinary results require extraordinary effort, and extraordinary effort demands long hours.”

This position (we’ll call it the “extraordinary effort argument”) leads directly to two falsifiable hypotheses:

1. If the “extraordinary effort argument” is correct, there should be a positive correlation between crunch and game outcomes, and higher levels of crunch should show a measurable improvement in the outcomes of game projects.
2. If the “extraordinary effort argument” is correct, there should be relatively few, if any, highly successful projects without crunch.

Luckily for us, we have data from hundreds of developers who took our survey with no preconceptions as to what the study was designed to test, and we can use that data to test both of these hypotheses.  We’ll agree to declare victory for the pro-crunch side if EITHER of these hypotheses remains standing after we put it in the ring with our data set.

Crunching the Numbers

We’ll approach our analysis in several phases, carefully determining what the data does and does not tell us.
Our 2014 survey asked the following five questions related to crunch, which were randomly scattered throughout the survey:
  • “I worked a lot of overtime or ‘crunched’ on this project.”
  • “I often worked overtime because I was required or felt pressured to.”
  • “Our team sometimes seemed to be stuck in a cycle of never-ending crunch / overtime work.”
  • “If we worked overtime, I believe it was because studio leaders or producers failed to scope the project properly (e.g. insufficient manpower, deadlines that were too tight, over-promised features).”
  • “If I worked overtime, it was only when I volunteered to do so.”
Here’s how the answers to those questions correlate with our aggregate project outcome score (described on our Methodology page).  On the horizontal axis, a score of -1.0 is “disagree completely” and a score of +1.0 is “agree completely."

Figure 1: Correlation of each crunch-related question with that project’s actual outcome (aggregate score).  Each of the 5 questions is shown, as an animated GIF with a 4-second delay.  Only the horizontal axis changes.

The correlations are as follows: -0.24, -0.30, -0.47, -0.36, +0.36 (in the same order as the bullet-pointed list above).  All five of these correlations have p-values well below 0.001, indicating that they are highly statistically significant.  Note how all the correlations are strongly negative except for the final question, which asked whether crunch was solely voluntary.
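For readers who plan to double-check our results against the downloadable data set, this kind of correlation test takes only a few lines of Python using pandas and scipy.  Here is a minimal sketch; the file name and column names are hypothetical stand-ins, not the actual labels in our data set:

    import pandas as pd
    from scipy.stats import pearsonr

    # Hypothetical layout: one row per respondent, one column per question,
    # answers rescaled to [-1, +1], plus the aggregate outcome score.
    df = pd.read_csv("survey_responses.csv")

    crunch_questions = [
        "crunched_a_lot",           # "I worked a lot of overtime..."
        "crunch_was_pressured",     # "...required or felt pressured to."
        "never_ending_crunch",      # "...never-ending crunch / overtime work."
        "crunch_from_bad_scoping",  # "...failed to scope the project properly"
        "crunch_was_voluntary",     # "...only when I volunteered to do so."
    ]

    # Pearson correlation and p-value of each question vs. the outcome score.
    for question in crunch_questions:
        r, p = pearsonr(df[question], df["aggregate_outcome_score"])
        print(f"{question}: r = {r:+.2f}, p = {p:.4f}")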

“But wait,” a proponent of crunch might say.  “Surely that’s only because you’re using a combined score.  That score combines the values of questions like ‘this project met its internal goals,’ which are going to give you lower values, because they're subjective fluff.  Of course people who are unhappy about crunch are going to give that factor low scores – and that’s going to lower the combined score a lot.  It’s a fudge factor, and it’s skewing your results.  Throw it out!  You should throw away the critical success, delays, and internal goals outcomes and JUST look at return on investment and I bet you’ll see a totally different picture.”

OK, let’s do that:

Figure 2: Correlation of each of the 5 crunch-related questions with that project’s return on investment (ROI).  As with Figure 1, each of the 5 questions is shown, as an animated GIF with a 4-second delay.  Only the horizontal axis changes.  Note that many of the points shown represent multiple coincident points.  See our Methodology page for an explanation of the vertical axis scale.

Notice how the lines have essentially the same slopes as in the previous figure.  The correlations with ROI are as follows (in the same order): -0.18, -0.26, -0.34, -0.23, and +0.28.  All of these correlations have p-values below 0.012.

Still not convinced?  Here are the same graphs again, correlated against aggregate reviews / MetaCritic scores.

Figure 3: Correlation of each of the 5 crunch-related questions with the project’s aggregate reviews / MetaCritic score (note that the vertical axis does not represent actual MetaCritic scores but is a normalized representation of the answers to this question; see our Methodology page for more info).  As with Figures 1 and 2, each of the 5 questions is shown, as an animated GIF with a 4-second delay.  Note that many of the points shown represent multiple coincident points.  Only the horizontal axis changes.

The results are essentially identical, and all have p-values under 0.05.

So if our combined score has a negative correlation with ALL our crunch questions except the one about crunch being purely voluntary (which itself does not imply any particular level of crunch), then we have disproven the first hypothesis of the “extraordinary effort argument” – the correlation is clearly negative, not positive.

Now let’s look at the second testable hypothesis of the “extraordinary effort argument.”

In Figure 4 (below), we’re looking at the two most relevant questions related to overall crunch for a project.  The vertical axis is the aggregate outcome score, while the horizontal axis represents the scale from “disagree completely” (-1) to “agree completely” (+1).  The black lines are trend lines.  As you can see, in both cases, higher agreement with each statement corresponds to inferior project outcomes.

Figure 4: The two most relevant questions related to crunch, compared to the aggregate project outcome score.

We’ve added horizontal blue and orange lines to both images.  The blue line represents a score of 80, which will be our subjective threshold for “very successful” projects.  The orange line represents a score of 40, which will be our threshold for “very unsuccessful” projects.

The dots above the blue line tell a clear story: in each case, there were more successful games made without crunch than with crunch.

However, these charts don’t tell the full story by themselves; many of the data points are clustered at the exact same spot, meaning that each dot can actually represent several data points.  So a statistical deep-dive is necessary.  We’re particularly interested in the four corners of each chart – the data points above the blue line on the extreme left and right sides (below -0.6 and above +0.6 on the horizontal axis), and the data points below the orange line at those same extremes.

Looking solely at the chart on the top of Figure 4 (“I worked a lot of overtime or ‘crunched’ on this project”), we observed the following pattern.  Note that the percentages are given in terms of the total data points in each vertical grouping (under -0.6 or above 0.6 on the horizontal axis).

  • No/low crunch (below -0.6): 17% very successful, 10% very unsuccessful.
  • High crunch (above +0.6): 13% very successful, 32% very unsuccessful.

We can see clearly that a higher percentage of no-crunch projects succeed than fail (17% vs 10%), and a much larger percentage of high-crunch projects fail rather than succeed (32% vs 13%).  Additionally, a higher percentage of the successful projects are no-crunch than high-crunch (17% vs 13%), while a higher percentage of the unsuccessful projects are high-crunch vs no-crunch (32% vs 10%).

Here’s the same chart, but this time looking at the bottom question, “Our team sometimes seemed to be stuck in a cycle of never-ending crunch / overtime work.”

  • Disagree (below -0.6): 23% very successful, 9% very unsuccessful.
  • Agree (above +0.6): 4% very successful, 41% very unsuccessful.

These results are even more remarkable.  The respondents that answered “disagree strongly” or “disagree completely” were 2.5 times more likely to be working on very successful projects (23% vs 9%), while the respondents who answered “agree strongly” or “agree completely” were, incredibly, more than 10 times more likely to be on unsuccessful projects than successful ones (41% vs 4%).
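These tallies are easy to reproduce: split respondents at the -0.6 and +0.6 extremes, then compute the fraction of each group above the “very successful” threshold and below the “very unsuccessful” one.  A sketch, reusing the hypothetical column names from the earlier snippet:

    import pandas as pd

    df = pd.read_csv("survey_responses.csv")  # hypothetical, as above

    # Two populations: low-crunch (answer < -0.6) and high-crunch (> +0.6).
    low = df[df["crunched_a_lot"] < -0.6]
    high = df[df["crunched_a_lot"] > 0.6]

    for name, group in [("low crunch", low), ("high crunch", high)]:
        very_successful = (group["aggregate_outcome_score"] > 80).mean()
        very_unsuccessful = (group["aggregate_outcome_score"] < 40).mean()
        print(f"{name}: {very_successful:.0%} very successful, "
              f"{very_unsuccessful:.0%} very unsuccessful")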

Some might object to this way of measuring the responses, as the aggregate outcome score takes internal achievement of the project’s goals into account – a somewhat subjective measure.  What if we looked at return on investment (ROI) alone?  Surely that would paint a different picture.

Here is ROI:

Figure 5: The two most relevant questions related to crunch, compared to return on investment (ROI).

The first question (top chart) gives us the following results:

[Table: percentages of very successful and very unsuccessful projects at the no-crunch and high-crunch extremes, measured by ROI.]

The second question (bottom chart) gives us:

[Table: the same breakdown for the “never-ending crunch” question, measured by ROI.]

These results are essentially equivalent to what we got with Figure 4 -- the probabilities have shifted a little bit but the conclusions haven't changed at all.  The same results hold if we look at MetaCritic scores or any of the other outcome factors we investigated.

For further verification, we did a deep-dive statistical analysis of the data in figures 4 and 5, treating the left and right sides of each graph on each figure (all data points < -0.6 and all those > +0.6) as two separate populations and performing a Wilcoxon rank sum test to compare them.

[Table: Wilcoxon rank-sum test results for each of the four charts in Figures 4 and 5.]

The p-values of all of these are highly statistically significant, with the top two rows having p-values under 0.006 and the bottom two rows having p-values of effectively zero.
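The rank-sum comparison itself is nearly a one-liner in scipy.  A sketch, under the same hypothetical column naming as the earlier snippets:

    import pandas as pd
    from scipy.stats import ranksums

    df = pd.read_csv("survey_responses.csv")  # hypothetical, as above
    low = df[df["crunched_a_lot"] < -0.6]["aggregate_outcome_score"]
    high = df[df["crunched_a_lot"] > 0.6]["aggregate_outcome_score"]

    # Wilcoxon rank-sum test: do the two populations' outcomes differ?
    stat, p = ranksums(low, high)
    print(f"rank-sum statistic = {stat:.2f}, p = {p:.4g}")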

It should be clear that our data set contradicts both of the testable hypotheses that we derived from the “extraordinary effort argument.”  But before declaring victory for Paxton and the anti-crunch side, let’s take a look at the counter-argument.

The “Crunch Salvage Hypothesis”

The counter-argument goes something like this:

“Your correlation is bogus, because crunch is more likely to happen on projects that are in trouble in the first place.  So there’s already an underlying correlation between crunch and struggling projects, and this is skewing your results.  You seem to be saying that crunch causes poorer outcomes, but the causality actually works differently – there’s a third, hidden causal factor (“project being in trouble”) that causes both crunch and lower outcomes.  And although crunch helps improve the situation, it’s never quite enough to compensate for the problems in the first place, which is why you get the negative correlation.”

This position warrants further investigation.  As the Spector/Rubin interview linked above makes clear, there are some developers who are willing to demand crunch even in cases where their projects are not in trouble (“crunch will always exist in studios that strive for quality,” according to Spector), so it’s clear that at least in some cases, crunch is used on projects that are not yet having problems.  But the notion that crunch is more likely on struggling projects is entirely plausible.

To test this counter-argument, let’s assume the causation is not A -> B but C -> (A and B), where “A” = crunch, “B” = poorer project outcomes, and “C” represents some vaguely-defined set of factors characterizing troubled projects.

We’ll call this the “crunch salvage hypothesis” – the idea that crunch is more likely to be used on projects in trouble, and that this “trouble” is itself the cause of the poorer project outcomes, and that when crunch is used in this way, it leads to outcomes that are less poor than would otherwise be the case.

We don’t actually need to test every part of this hypothesis: for the sake of argument, we’ll simply accept the first two parts (that trouble can arise on a project, and that crunch often happens as a reaction to that trouble), since whether they are correct or not isn’t really relevant to this article.

What we really care about, and what we can test, is the third part of this hypothesis – that when crunch is used in this case, it leads to outcomes that are less poor than would otherwise be the case.  In other words, if a project is in trouble, is crunch an effective response?

If the “crunch salvage hypothesis” is correct, then crunch should provide an improved project outcome score beyond what we would expect to see if crunch were not used, all else being equal.

In order to test this conjecture, we calculated a linear regression model that specifically excludes all 5 questions related to crunch/overtime.  We’ll call this model the “crunch-free model.”

Figure 6: Correlations for the “crunch-free model” (a linear regression that excludes crunch-related questions) with aggregate game outcome scores.

This “crunch-free model” correlates with our overall outcome score with a correlation value of 0.811 (and a p-value under 0.001).  This is, by any measure, an extremely strong correlation.

We then computed the crunch-free model’s error term – that is, for each response, we subtracted the predicted outcome score given by the crunch-free model from the actual aggregate outcome score.  A positive error value indicates that the project turned out better than the model predicted, while a negative error value indicates that the project turned out worse than predicted.
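In code, fitting a crunch-free model of this kind and extracting its error term might look like the sketch below – hypothetical column names again, and it assumes every non-crunch column is a numeric survey answer:

    import pandas as pd
    from scipy.stats import pearsonr
    from sklearn.linear_model import LinearRegression

    df = pd.read_csv("survey_responses.csv")  # hypothetical, as above
    crunch_questions = ["crunched_a_lot", "crunch_was_pressured",
                        "never_ending_crunch", "crunch_from_bad_scoping",
                        "crunch_was_voluntary"]

    # Regress the aggregate outcome score on every input factor EXCEPT the
    # five crunch questions.
    predictors = [c for c in df.columns
                  if c not in crunch_questions + ["aggregate_outcome_score"]]
    model = LinearRegression().fit(df[predictors],
                                   df["aggregate_outcome_score"])

    # Error term: actual minus predicted.  Positive = the project did better
    # than the crunch-free model predicts.
    df["error"] = (df["aggregate_outcome_score"]
                   - model.predict(df[predictors]))

    # The "crunch salvage hypothesis" predicts a positive correlation
    # between crunch and this error term.
    r, p = pearsonr(df["crunched_a_lot"], df["error"])
    print(f"crunch vs. model error: r = {r:+.2f}, p = {p:.2f}")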

If we accept that the crunch-free model is a good predictor of game outcomes (and the extremely high correlation and tiny p-value suggest that it is), then the “crunch salvage hypothesis” tells us to expect crunch to improve the outcomes of the game projects where it was used, at least to some small, observable extent … and the more it was used, the more it should have improved those outcomes.

In other words, if crunch works, it should provide a “lift,” and for projects that involved more crunch, we should see a positive error term (that is, game projects that crunched should have turned out better than the crunch-free model predicts), while for projects that involved little or no crunch, we should see a negative error term.
So according to this worldview, there should be a clear, positive correlation between more crunch and a greater positive error value for the crunch-free model.

Here is the correlation for the error term with the answers to each of the two primary crunch-related questions:

Figure 7: The two most relevant questions related to crunch, compared to the error value of the crunch-free model.  The vertical axis is the error of the crunch-free model (positive = better than the model predicts; negative = worse), and the horizontal axis indicates agreement with each question (-1.0 = disagree completely, +1.0 = agree completely).

As you can see, there is a slight negative correlation.  However, it is not statistically significant (p-value = 0.24 for the upper graph, and 0.1 for the lower one).  And even if it were statistically significant, the correlations – at -0.07 and -0.1, respectively – are negative.

So where the “crunch salvage hypothesis” tells us to expect correlations that are strong, positive, and statistically significant, we instead see correlations that are weak, negative, and statistically insignificant.
Testing all of the other crunch-related questions in this way gives us similar results.

If we accept the assumptions that went into calculating these correlations, then we must conclude that more crunch did not, to any extent that we can detect, help the projects in our study achieve better outcomes than they otherwise would have experienced  …  and in many ways appears to have actually made them worse.
We are left to conclude that crunch does not in any way improve game project outcomes and cannot help a troubled game project work its way out of trouble.

Voluntary Crunch

But what about when crunch is voluntary?  Our analysis has already indicated that when crunch is entirely voluntary, outcomes significantly improve.  Does the absence of mandatory crunch eliminate the negative effects of the quantity of crunch?  In other words, do higher levels of purely voluntary crunch turn crunch from a net negative into a net positive?

In short, no.  We compared the two extremes of our primary crunch question (we categorized the highest two answers to “I worked a lot of overtime …” as “High” crunch, and the lowest two as “Low” crunch) against our question about whether crunch was purely voluntary (where we condensed all 7 answers into three broad categories – the top two as “Voluntary,” the bottom two as “Mandatory,” and the middle three as “Mixed”).  We also compared these categories using a Kruskal-Wallis test to check statistical significance.
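A sketch of this binning-and-testing procedure, with illustrative bin edges and the same hypothetical column names as before:

    import pandas as pd
    from scipy.stats import kruskal

    df = pd.read_csv("survey_responses.csv")  # hypothetical, as above

    def crunch_type(answer):
        # On a 7-point scale rescaled to [-1, +1], the top two answers sit
        # at roughly +2/3 and +1, the bottom two at -2/3 and -1.
        if answer > 0.6:
            return "Voluntary"
        if answer < -0.6:
            return "Mandatory"
        return "Mixed"

    groups = df.groupby(df["crunch_was_voluntary"].map(crunch_type))

    # Kruskal-Wallis: do outcome scores differ across the three categories?
    stat, p = kruskal(*[g["aggregate_outcome_score"] for _, g in groups])
    print(f"Kruskal-Wallis H = {stat:.2f}, p = {p:.4g}")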


Our analysis shows that although crunch seems to be significantly less harmful when it’s voluntary, low levels of crunch in each case above (voluntary, mandatory, and mixed) are consistently associated with better outcomes than high levels of crunch.

What Causes Crunch?

The conclusions above led us to ask: what actually causes crunch?  The Spector/Rubin interview above clearly illustrates the attitudes that cause at least some developers to demand extended overtime, but we were curious what the data said.

If crunch doesn’t correlate with better outcomes, what does it correlate with?  Does it really derive from a desire for excellence, or is it a reaction to a project being in trouble, or do its roots lie elsewhere?

To find out, we analyzed the correlations of all the input factors in our survey against one another, looking specifically at how factors outside of our group of five crunch-related questions correlated with the five crunch questions.  The four strongest correlations with our crunch-related questions were:
  • +0.51: “There was a lot of turnover on this project.”
  • +0.50: “Team members would often work for weeks at a time without receiving feedback from project leads or managers.”
  • +0.49: “The team’s leads and managers did not have a respectful relationship with the team’s developers.”
  • -0.49: “The development plan for the game was clear and well-communicated to the team.”
(Positive correlations indicate factors associated with more crunch; the negative correlation indicates a factor associated with less crunch.)
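This kind of factor-vs-factor scan can be reproduced by correlating every non-crunch input column against the five crunch columns and ranking the pairs by absolute correlation – a sketch, with the usual hypothetical column names:

    import pandas as pd

    df = pd.read_csv("survey_responses.csv")  # hypothetical, as above
    crunch_questions = ["crunched_a_lot", "crunch_was_pressured",
                        "never_ending_crunch", "crunch_from_bad_scoping",
                        "crunch_was_voluntary"]
    input_factors = [c for c in df.columns
                     if c not in crunch_questions + ["aggregate_outcome_score"]]

    # Correlate every input factor with every crunch question, then rank
    # the factor/question pairs by absolute correlation.
    corr = df[input_factors + crunch_questions].corr()
    pairs = corr.loc[input_factors, crunch_questions].stack()
    print(pairs.sort_values(key=abs, ascending=False).head(4))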

This seems to indicate that crunch does not, in fact, derive from any sort of fundamental drive for excellence, which would have resulted in higher correlations with completely different input factors on our survey.  Rather, it appears to stem from inadequate planning, disorganization, high turnover, and a basic lack of respect for developers.

Conclusion: We Are Not a Unique And Special Snowflake

We should be clear that we are not attempting to write an academic paper, and our results have not been peer-reviewed.  Therefore, we walk a fine line between analyzing the data and interpreting it.

However, no matter how we analyze our data, we find that it loudly and unequivocally supports the anti-crunch side.  Our results are clear enough and strong enough that we believe it’s important to step over that fine line, and transition from objective analysis to open advocacy.

There is an extensive body of validated management research showing that extended overtime harms health, productivity, relationships, morale, employee engagement, and decision-making ability, and even increases the risk of alcohol abuse.

An enormous amount of validated management research demonstrates that net employee productivity turns negative after just a few weeks of overtime.  Total productivity actually declines by 16-20% as we increase our work days from 8 hours to 9 hours.  Even just a few weeks of working 50 hours per week reduces cumulative output below what it would have been working only 40 hours per week – those 10 extra hours of work actually have a negative impact on productivity.  All of that while also increasing employee stress, straining relationships, and increasing product defect rates.

However, the game industry is remarkably insular for such a cutting-edge and successful industry, and it seems generally unaware of this data.  We tend to ignore such evidence or blithely assume it doesn't apply to us.  As a broad generalization, our industry tends to value industry experience highly while undervaluing fundamental management skills.  As a result, we usually promote managers from within while rarely offering the kind of management training that would enable insiders to perform their jobs adequately.

Is it any wonder, then, that we find ourselves completely cut off from the plethora of validated management research clearly showing that crunch is harmful?

The hundreds of anonymous respondents who participated in our survey answered various questions about game development factors and outcomes separately and individually, without any real clue as to the broader objectives of our study.  Simply correlating their aggregate answers shows overwhelmingly that crunch is a net negative no matter how we analyze the data.  It’s not even a case of small amounts of crunch being helpful and then turning harmful; we see no convincing evidence of hormesis.

It’s common knowledge that crunch leads to higher industry turnover and loss of critical talent, higher stress levels, increased health problems, and higher defect rates – and quite often, broken or deeply impaired personal relationships.  Those who feel that crunch is justified freely admit to knowing this, but they don’t necessarily care about any of these harmful side-effects enough to avoid using it, as they continue to cling to the notion that “extraordinary results require extraordinary effort.”

However, this notion appears to be a fallacy, and our analysis suggests that if the industry is to mature, we must cast it aside.

Our results clearly demonstrate that crunch doesn't lead to extraordinary results.  In fact, on the whole, crunch makes games LESS successful wherever it is used, and when projects try to dig themselves out of a hole by crunching, it only digs the hole deeper.

Perhaps the notion that “extraordinary results require extraordinary effort” is misguided.

Perhaps “effort” – as defined by working extra hours in an attempt to accomplish more – is actually counterproductive.

Our study seems to reveal that what actually generates “extraordinary results” – the factors that actually make great games great – has nothing to do with mere “effort” and everything to do with focus, team cohesion, a compelling direction, psychological safety, risk management, and a large number of other cultural factors that enhance team effectiveness.

And we suggest that abuse of overtime makes that level of focus and team cohesion increasingly difficult to achieve, eliminating any possible positive effects of the overtime itself.

We welcome open discourse and debate on this subject.  Anyone who wishes to double-check our results is welcome to download our data set, perform their own analysis, and contact us at @GameOutcomes on Twitter with any questions.

The Game Outcomes Project team would like to thank the hundreds of current and former game developers who made this study possible through their participation in the survey.  We would also like to thank IGDA Production SIG members Clinton Keith and Chuck Hoover for their assistance with survey design; Kate Edwards, Tristin Hightower, and the IGDA for assistance with promotion; and Christian Nutt and the Gamasutra editorial team for their assistance in promoting the survey.

Saturday, January 17, 2015

The Game Outcomes Project, Part 3: Game Development Factors

This article is the third in a 5-part series.
The Game Outcomes Project team includes Paul Tozour, David Wegbreit, Lucien Parsons, Zhenghua “Z” Yang, NDark Teng, Eric Byron, Julianna Pillemer, Ben Weber, and Karen Buro.


The Game Outcomes Project was a large-scale study of teamwork, culture, production, and leadership in game development conducted in October 2014.  It was based on a 120-question survey of several hundred game developers, and it correlated all of those factors against a set of four quantifiable project outcomes (project delays, return on investment (ROI), aggregate reviews / MetaCritic ratings, and the team’s own sense of satisfaction with how the project achieved its internal goals).  Our team then combined these four outcome questions into an aggregate “score” representing the overall outcome of the game project, as described on our Methodology page.

Previous articles in this series (Part 1, Part 2) introduced the Game Outcomes Project and showed very surprising results.  While many factors that one would expect to contribute to differences in project outcomes – such as team size, project duration, production methodology, and most forms of financial incentives – had no meaningful correlations, we also saw remarkable and very clear results for three different team effectiveness models that we investigated.

Our analysis uncovered major differences in team effectiveness, which translate directly into large and unmistakable differences in project outcomes.

Every game is a reflection of the team that made it, and the best way to raise your game is to raise your team.
In this article, we look at additional factors in our survey which were not covered by the three team effectiveness models we analyzed in Part 2, including several in areas specific to game development.  We included these questions in our survey on the suspicion that they were likely to contribute in some way to differences in project outcomes.  We were not disappointed.

Design Risk Management

First, we looked at the management of design risk.  It’s well-known in the industry that design thrashing is a major cause of cost and schedule overruns.  We’ve personally witnessed many projects where a game’s design was unclear from the outset, where preproduction was absent or was inadequate to define the gameplay, or where a game in development underwent multiple disruptive re-designs that negated the team’s progress, causing enormous amounts of work to be discarded and progress to be lost.

It seemed clear to us that the best teams carefully manage the risks around game design, both by working to mitigate the repercussions of design changes in development, and by reducing the need for disruptive design changes in the first place (by having a better design to begin with).

We came up with a set of 5 related questions, shown in Figure 1 below.  These turned out to have some of the strongest correlations of any questions we asked in our survey.  With the exception of the two peach-colored correlations for the last question (related to the return-on-investment outcome and the critical reception outcome for design documents), all of these correlations are statistically significant (p-value under 0.05).

Figure 1: Questions around design risk management and their correlations with project outcomes.  The “category score” on the right is the highest absolute value of the aggregate outcome correlations, as an indication of the overall value of this category.  “Not S.S.” indicates correlations that are not statistically significant (p-value over 0.05).

Clearly, changes to the design during development, and the way those design changes were handled, made enormous differences in the outcomes of many of the development efforts we surveyed.
However, when design changes did occur, participation of all stakeholders in the decision to make them, along with clear communication and justification of the changes and the reasons for them, clearly mitigated the damage.

[We remind readers who may have missed Part 2 that negative correlations (in red/orange) should not be viewed as a bad thing; on the contrary, questions asked in a “negative frame,” i.e., asking about a bad thing that may have occurred on the project, should have a negative correlation with project outcomes, indicating that a lower answer (stronger disagreement with the statement) correlated with better project outcomes (better ROI, fewer delays, higher critical reviews, and so on).  What really matters is the absolute value of a correlation: the farther a correlation is from 0, the more strongly it relates to differences in project outcomes, and you can then look at the sign to see whether it contributes positively or negatively.]

Somewhat surprisingly, our question about a design document clearly specifying the game to be developed had a very low correlation – below 0.2.  It also had no statistically significant correlation (p-value > 0.05) with ROI or critical reception / MetaCritic scores.  This is quite surprising, as it suggests design documents are far less useful than generally realized.  The only area where they show a truly meaningful correlation is with project timeliness.  This seems to suggest that while design documents may make a positive contribution to the schedule, anyone who believes that they will contribute much to product success from a critical or ROI standpoint by themselves is quite mistaken.

We should be clear that our 2014 survey did not ask any questions related to the project’s level of design innovation.  Certainly, it’s much easier to limit design risk if you stick to less ambitious one-off games and direct sequels.  We don’t want to sound as if we are recommending that course of action.

For the record, we do believe that design innovation is enormously important, and quite often, a game’s design needs to evolve significantly in production in order to achieve that level of innovation.  Our own subjective experience is that a desire for innovation needs to be balanced against cautious management of the enormous risks that design changes can introduce.  We plan to ask more questions in the area of design innovation in the next version of the survey.

Team Focus

Managing the risks to the design itself is one thing, but to what extent does the team’s level of focus – being on the same page about the game in development, and sharing a single, common vision – impact outcomes?

Figure 2: Questions around team focus and their correlations.

The strong correlations here are not too surprising; these tie in closely with the design risk management topic above, as well as our questions about “Compelling Direction,” the second element of Hackman’s team effectiveness model from Part 2.  As a result, the correlations here are very similar.  It’s clear that successful teams have a strong shared vision, care deeply about the product vision, and are careful to resolve disagreements about the game’s design quickly and professionally.

It’s interesting to note that the question “most team members cared deeply about the vision of this game” showed a wide disparity of correlations.  It shows a strong positive correlation with critical reviews and internal goal achievement, but only a very weak correlation with project timeliness.  This seems to indicate that while passion for the project makes for a more satisfied team and a game that gets better review scores, it has little to do with hitting schedules.

Crunch (Extended Overtime)

Our industry is legendary (or perhaps “infamous” is a better word) for its frequent use of extended overtime, i.e. “crunch.”  But how does crunch actually correlate with project outcomes?

Figure 3: Questions around crunch, and related correlations.

As you can see, all five of our questions around crunch were significantly correlated with outcomes – some of them very strongly so.  The one and only question that showed a positive correlation was the question asking if overtime was purely voluntary, indicating the absence of mandatory crunch.

Even in the area where you might expect crunch would improve things – project delays – crunch still showed a significant negative correlation, indicating that it did not actually save projects from delays.

This suggests that not only does crunch not produce better outcomes, but it may actually make games worse where it is used.

Crunch is an important topic, and one that is far too often passionately debated without reference to any facts or data whatsoever.  In order to do the topic justice – and hopefully lay the entire “debate” to rest once and for all – we will dedicate the entirety of our next article to further exploring these results, and we’ll don our scuba gear and perform a “deep-dive” into the data to ferret out exactly what our data can tell us about crunch and its effects.

At the very least, we hope to provide enough data that future discussions of crunch will rest far less on opinion and far more on actual evidence.

Team Stability

A great deal of validated management research shows clearly that teams with stable membership are far more effective than teams whose membership changes frequently, or those whose members must be shared with other teams.  Studies of surgical teams and airline crews show that they are far more likely to make mistakes in their first few weeks of working together, but grow continuously more effective year after year as they work together.  We were curious how team stability affects outcomes in game development.

Figure 4: Questions around team stability and their correlations to project outcomes.

Surprisingly, our question on team members being exclusively dedicated to the project showed no statistically significant correlations with project outcomes.  As far as we can tell, this just doesn’t matter one way or the other.

However, our more general questions around project turnover and reorganization showed strong and unequivocal correlations with inferior project outcomes.

At the same time, it’s difficult to say for sure to what extent each of these is a cause or an effect of problems on a project.  In the case of turnover, industry stories illustrate both directions: there have been many staff departures and layoffs due to troubled projects, but also quite a few stories of key staff departures that left their studios scrambling to recover – in addition to stories of spiraling problems, where project troubles caused key staff departures, which caused more morale and productivity problems, which led to the departure of even more staff.

We hope to analyze this factor more deeply in future versions of the survey (and we’d like to break down voluntary vs involuntary staff departures in particular).  But for now, we’ll have to split the difference in our interpretation.  As far as we can tell from here, turnover and reorganizations are both generally harmful, and wise leaders should do everything in their power to minimize them.

Communication & Feedback

We included several questions about the extent to which communication and feedback play a role in team effectiveness:
Figure 5: Questions around communication and their correlations.

Clearly, regular feedback from project leads and managers (our third question in this category) is key – it ties in very closely with factor #11 in the Gallup team effectiveness model from Part 2, with virtually identical correlations with project outcomes.  Easy access to senior leadership (the second question) is also clearly quite important.

Regular communication between the entire team (the first question) is somewhat less important but still shows significant positive correlations across the board.  Meanwhile, our final question revealed no significant differences between cultures that preferred e-mail vs face-to-face communication.

Organizational Perceptions of Failure

A 2012 Gamasutra interview with Finnish game developer Supercell explained that company’s attitude toward failure:

"We think that the biggest advantage we have in this company is culture.  […]  We have this culture of celebrating failure.   When a game does well, of course we have a party. But when we really screw up, for example when we need to kill a product – and that happens often by the way, this year we've launched two products globally, and killed three – when we really screw up, we celebrate with champagne. We organize events that are sort of postmortems, and we can discuss it very openly with the team, asking what went wrong, what went right. What did we learn, most importantly, and what are we going to do differently next time?"

It seems safe to say that most game studios don’t share this attitude.  But is Supercell a unique outlier, or would this attitude work in game development in general if applied more broadly?

Our developer survey asked six questions about how the team perceived failure on a cultural level:

Figure 6: Questions around organizational perceptions of failure and their correlations.

These correlations are statistically significant, and nearly all of them are quite strong.  More successful game projects are much more likely to encourage creative risk-taking and open discussion of failure, and to ensure that team members feel comfortable and supported when taking creative risks.

These results tie in very closely with the concept of “psychological safety” explained in Part 2, under the “Supportive Context” section of Hackman’s team effectiveness model.

Respect

Extensive management research indicates that respect is a terrifically important driver of employee engagement, and therefore of productivity.  A recent HBR study of nearly 20,000 employees around the world found that no other leader behavior had a greater effect on outcomes than treating employees with respect.  Employees who receive more respect exhibit massive improvements in engagement, retention, satisfaction, focus, and many other factors.

We were curious whether this also applied to the game industry, and whether a respectful working environment contributed to differences between failed and successful game project outcomes as well.  We were not disappointed.

Figure 7: Questions around respect, and related correlations.

All three of our questions in this category showed significant correlations with outcomes, especially the question about respectful relationships between team leads/managers and developers.

Clearly, all team members -- and leads/managers in particular -- should think twice before treating team members with disrespect: they are not only hurting their team, but hurting their own game project and their own bottom line.

Project Planning

We asked a number of questions around the ways different aspects of project planning affected outcomes:

Figure 8: Questions around project planning and their correlations.

Clearly, deadlines and accountability are important, as the positive correlation of the last question shows.  Accountability is obviously a net positive.

However, teams that took deadlines too seriously and treated them as matters of life and death (question #4) showed a negative correlation with project outcomes, and no statistically significant correlation with project timeliness.  This clearly indicates that treating deadlines as matters of life and death not only fails to make a positive contribution to the schedule; it is actually counterproductive in the long run.

This seems to be telling us that taking deadlines too seriously can be harmful, and high-pressure management tactics are likely to backfire.  We speculate that successful teams balance their goals for each milestone against the realities of production, the need for team cohesion, and the pragmatism to sometimes sacrifice or adjust individual milestone goals for the good of the overall project.

Surprisingly, daily task re-estimation (question #5) also shows no significant correlation with timeliness, although it does have a weak positive correlation with all the other outcome factors.

Furthermore, detailed planning (question #1), while positively correlated with both ROI and timeliness, has no statistically significant correlation with critical reception or internal goal achievement.  Detailed planning is clearly useful for project timeliness, but we speculate that it can also lock the team into a fixed, brittle development plan – tempting teams to focus on the lesser good of schedule integrity over the greater good of product quality, and sometimes stifling opportunities to improve the game in development.

The most unambiguous findings here are that accurate estimation (question #2) and a reasonable level of accountability (question #6) both contribute positively to all outcome factors.

Technology Risk Management

We were also curious about risks around technology.  How did major technology changes and the management of those changes affect outcomes, and did the team participate in any sort of code reviews or pair programming?  The game industry has countless stories to tell of engine changes or major technology overhauls that either caused project delays or even contributed to outright project failure or cancellation.

Figure 9: Questions around technology risk management, and their correlations with project outcomes.

Here, too, we see some very strong correlations.  Question #1 shows that major technology revamps in development can introduce a great deal of project risk, while question #3 seems to indicate that the communication of those technology changes is even more important.

However, whether these decisions were driven by internal or external changes does not appear to be relevant.  And while code reviews and pair programming are clearly positively correlated and statistically significant, the correlation is a relatively weak one (under 0.2) and shows no statistically significant relationship with the project’s critical reception or achievement of its internal goals.  Although there is significant evidence that code reviews reduce defects and improve a team’s programming skills, we were surprised that these correlations were not higher.  We suspect the explanation lies in the way the reviews are carried out, as well as the team’s experience level – deeper analysis reveals this factor is much more significant with more experienced teams.  We plan to investigate these more thoroughly in the next version of the study.

Production Methodologies

In Part 1, we revealed the rather shocking discovery that the specific production methodology a team uses – waterfall, agile, scrum, or ad-hoc – seems to make no statistically significant difference in project outcomes.  However, we also asked a number of additional questions on the topic of production methodologies:

Figure 10: Questions around production methodologies and their correlations.

Here, we can see clearly that training in production methodologies, efforts to improve them, and involving the entire team in prioritizing the work for each milestone are all significantly correlated (>0.2) with positive project outcomes.  However, our questions about daily production meetings and re-prioritization for each milestone showed relatively low correlations.

We see no statistically significant correlation of the last question (regarding re-prioritization in each milestone) with project delays, but a small positive correlation with the other three outcomes.  This seems to indicate that while re-prioritization at each milestone increases product quality, it sometimes does so at the expense of the schedule.

We also further attempted to verify our controversial finding that production methodology used makes no difference by re-evaluating production methodologies only for respondents that replied that their teams were well-trained in their studio’s production methodology (i.e. they answered “Agree Strongly” or “Agree Completely” to the first question in this category).  Here, too, we found no statistically significant differences between waterfall, agile, and agile using Scrum.

This analysis appears to reinforce our earlier finding that the particular production methodology being used matters very little; what matters is having one, sticking to it, and properly training your team to use it.

Collaboration & Helpfulness

We’ve personally experienced many different types of team cultures where helpfulness was treated as a virtue or a vice.  Some encourage a “sink or swim” attitude, and deliberately force new hires to learn the ropes on their own; others go out of their way to encourage and reward collaboration and helpfulness among team members.  We were curious about the effect of these cultural differences on project outcomes.

Figure 11: Questions around collaboration and helpfulness and their correlations.

Although the correlations to the individual outcomes here are relatively weak, the correlations with the aggregate outcome are unambiguously positive.

Note that there is no statistically significant correlation between the second question and the outcome factor for project timeliness.  We speculate that some teams may spend too much time and energy obsessing over their issues and challenges, which can become a time sink or a source of negativity if carried out to unhealthy extremes.

Outsourcing and Cross-Functional Teams

We asked two additional categories of questions.  One category related to the use of contractors, temporary workers, and outsourcing:

Figure 12: Questions around outsourcing and their correlations.

We saw no statistically significant correlations regarding outsourcing, and as far as our data set can tell, this has no identifiable impact on project outcomes.  It seems much more likely that any effects of outsourcing have much more to do with the quality of the contractors or outsourced labor, the way outsourcing is integrated into the team, its cost, and the quality of the coordination of the outsourced labor, all of which were outside the scope of the 2014 survey.

The other category related to whether sub-teams were divided up by discipline (art, programming, design) or were organized into cross-functional sub-teams, each combining several disciplines.

Figure 13: Questions around cross-functional teams and their correlations.

We also observed no correlations for cross-functional or per-discipline teams, leading us to conclude that there is probably no “right” answer here.  If there is any utility in adopting one team structure or the other, the factors involved were outside the scope of the questions asked in our study.

Conclusions: Best Practices

The previous article illustrated that three very different team effectiveness models all correlate strongly with game project outcomes.  We found that team effectiveness is tied to having a compelling direction and a shared vision, an enabling structure and supportive context, a connection with the mission of the organization, regular feedback, a deep level of trust and commitment within the team, belief in the mission of the organization, and the essential element of “psychological safety” that allows team members to feel comfortable taking interpersonal risks.

In addition to those factors, all but the last two of the factors outlined in this article showed significant correlations as well.  Those that showed no correlations are just as noteworthy as those that did.
For convenience, we ordered the sections in this article from strongest to weakest correlation.  To summarize:
  • Design risk management showed the strongest correlations, with a correlation over 0.57.
  • Team focus came in a close second, at 0.50.
  • Avoidance of crunch was in third place, at 0.44.
  • After that, team stability, communication, organizational perceptions of failure, respect, project planning, and technology risk management were also very important, all with correlations between 0.36 and 0.39.
  • Production methodologies and collaboration/helpfulness came in last but were still significant, at 0.29 and 0.20, respectively.
  • Outsourcing and the use of cross-functional teams showed no statistical significance.  These do not seem to impact project outcomes in any general sense as far as our survey was able to detect.
Finally, to help teams make the best use of these results, we’ve created an interactive self-reflection tool for conducting systematic post-mortems and identifying the best opportunities for growth.

Self-Reflection Tool

The Self-Reflection Tool is an interactive Excel spreadsheet that includes the 38 most relevant questions from our survey, along with five linear regression models (one for each of the four individual outcome factors, plus one for the aggregate outcome score).  To use it, simply open it and answer the 38 questions highlighted in yellow on the primary worksheet.  It will then forecast your team’s likely ROI, critical success, chance of project delays, and chance of achieving the project’s internal goals.  It will also suggest your team’s most likely avenues for improving its odds of a positive outcome.
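Under the hood, each forecast is simply a linear model evaluated against your 38 answers.  A rough Python illustration with stand-in numbers (the real coefficients live in the spreadsheet):

    import numpy as np

    rng = np.random.default_rng(0)
    answers = rng.uniform(-1, 1, 38)  # stand-in for a team's 38 answers
    weights = rng.normal(0, 5, 38)    # stand-in for one model's fitted weights
    intercept = 50.0                  # stand-in for that model's intercept

    # The spreadsheet evaluates five such models (one per outcome) against
    # the same 38 answers.
    forecast = intercept + weights @ answers
    print(f"forecast outcome score: {forecast:.1f}")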



For an even better analysis, print out the questions, ask your fellow team members to take the survey anonymously, and then average the results.

You can download the self-reflection tool here.

Conclusion

By comparing hundreds of game projects side-by-side, the Game Outcomes Project has given us a unique perspective on game development.  In the process, it has uncovered quite a few surprises.  The study has shown clearly which factors contribute to success or failure in game projects and pointed the way toward many future avenues of research, which are listed on our Methodology page.

We believe that this kind of systematic, objective, data-driven approach to identifying the links between common practices on game development teams and discrete project outcomes points the way toward a new approach to defining industry standards and best practices, and hopefully helps lay some persistent fallacies and popular misapprehensions to rest.  In the future, we hope to extend the study with a larger number of participants and continue to refine and evolve it annually.

More than anything else, we hope that this project will help teams and their leadership see more clearly that the differences in teamwork and culture across teams are simply massive, and that these differences have an overwhelming impact on the games we make and on our success or failure as organizations.

Although there is always an element of risk involved, the lion’s share of your own destiny remains within your own control.



If you want to improve your team's odds of success, the factors we examined in this study are probably a very good starting point.

Future Work

Stay tuned for our fourth article, due in one week, in which we will tackle the tricky and pervasive topic of crunch.  We will analyze the data from a number of angles and we will see that it makes a clear and unambiguous case with regard to extended overtime.  Anyone considering subjecting their team to “crunch” in the hope of raising product quality or making up for lost time would be well-advised to read it carefully.

By popular demand, we will also be releasing an ordered summary of our findings as Part 5 one week after that.

The Game Outcomes Project team would like to thank the hundreds of current and former game developers who made this study possible through their participation in the survey.  We would also like to thank IGDA Production SIG members Clinton Keith and Chuck Hoover for their assistance with survey design; Kate Edwards, Tristin Hightower, and the IGDA for assistance with promotion; and Christian Nutt and the Gamasutra editorial team for their assistance in promoting the survey.

For further announcements regarding our project, follow us on Twitter at @GameOutcomes