act:ualise | technology

agile software development, software quality, scaling, testing and other tech

The curse of performance test automation?


In our last three projects we automated our performance testing extensively. This involved significant effort, which paid off for the two projects already delivered: both met their performance expectations, largely thanks to the performance improvements we made.

Interestingly, after the bulk of those improvements had been delivered and the product had gone live, the performance test automation slowly fell into disuse for lack of maintenance. Why? Maybe because, after the big performance wins had been made, it seemed that the remaining value to be gained was far less than the effort required in maintaining the automation.

However, there are inherent risks to discontinuing performance test automation:

  • Developers could introduce poorly performing code into the system at any time
  • New code, while itself not patently poor in performance, tends to gradually erode performance over time
  • Unpredictable user behaviour could generate unexpected load
  • External forces, e.g.:
    • Unforeseen marketing/ad campaigns
    • Unanticipated user volumes

Without the performance test automation we once had, aren’t we flying blind, implicitly turning production into our performance test environment?

What we’ve always done

It’s worth a quick digression to discuss how we’ve tested performance to date. We’ve generally tended towards playback-driven load generation against a performance test environment that is as close to live as possible (this is usually termed record/playback; our ‘recordings’ are typically simulated usage journeys described in a test, rather than recorded UI interactions for a tool to play back).

[Diagram: agile_perf_testing-11]

Even though this has delivered value in our previous performance improvement cycles, there are nevertheless issues worth mentioning:

  • High upfront costs
    • Load generation tools must be designed and built
    • The entire testing environment must be complete and in place
  • When load is generated against the monolith, the sum of all moving parts – when subject to concurrent load – creates the greatest possible variability in results
    • Gradually introduced problems are hard to pinpoint, although major performance issues still show up as spikes
    • This increased variability prevents our triggering performance threshold failures
  • Without failure triggers, developers must be responsible for routinely checking the generated results for changes to performance
  • Implementation of performance record/playback tests is complex and their execution heavyweight because they are usage simulations
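The point about variability and threshold failures can be made concrete with a rough sketch. It assumes a simple rule that is not from our projects: an alarm threshold is set a few standard deviations above the mean of recent runs, so that normal noise does not trip it. The sample timings below are invented purely for illustration, and the function names are hypothetical.

```python
from statistics import mean, stdev


def alarm_threshold(samples, sigmas=3.0):
    """Lowest threshold that sits `sigmas` standard deviations above the mean."""
    return mean(samples) + sigmas * stdev(samples)


def smallest_detectable_slowdown(samples, sigmas=3.0):
    """Fractional rise in the mean needed before a typical run crosses the threshold."""
    average = mean(samples)
    return (alarm_threshold(samples, sigmas) - average) / average


# Invented response times in milliseconds, for illustration only.
focused_test_ms = [100, 102, 98, 101, 99]     # low variability
whole_system_ms = [100, 140, 70, 120, 70]     # high variability under concurrent load

if __name__ == "__main__":
    for label, samples in [("focused", focused_test_ms), ("whole system", whole_system_ms)]:
        slowdown = smallest_detectable_slowdown(samples)
        print(f"{label}: threshold trips only after a {slowdown:.0%} slowdown")
```

Under these assumptions the noisy series needs a far larger slowdown before a threshold fires, which is why gradual erosion goes unnoticed while big spikes are still visible.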

Agile performance testing

I think we could improve on our performance testing by addressing some other issues inherent to the development process. Unit performance testing should be integral to the development process:

  • Ensure that performance acceptance criteria are defined as part of the story card where this need is identified
  • Develop unit performance tests to supplement the more heavyweight record/playback tests when performance criteria exist
  • Baseline both unit performance and record/playback tests as early as possible

To quote Alexander Podelko, “During unit testing different variables such as load, the amount of data, security, etc. can be reviewed to determine their impact on performance. In most cases, test cases are simpler and tests are shorter in unit performance testing. There are typically fewer tests with limited scope; e.g., fewer number of variable combinations than we have in a full stress and performance test.”

Since these are easier to write and more focused they can initially be developed in the absence of the full performance testing infrastructure or load generation tool. Furthermore, as the data generated will be considerably less variable, we could trigger performance threshold-based test failures prior to check-in.
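A threshold-based unit performance test of this kind might look roughly like the sketch below. It is only an illustration, not code from our projects: `assert_within_budget`, `measure_median_seconds` and the `build_index` function under test are hypothetical names, and the 0.5 second budget stands in for whatever performance criterion the story card defines. The median of several runs is used to dampen one-off noise.

```python
import time
from statistics import median


def measure_median_seconds(func, repeats=5, timer=time.perf_counter):
    """Run `func` `repeats` times and return the median duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = timer()
        func()
        durations.append(timer() - start)
    return median(durations)


def assert_within_budget(func, budget_seconds, repeats=5, timer=time.perf_counter):
    """Raise AssertionError if the median duration exceeds the agreed budget."""
    observed = measure_median_seconds(func, repeats, timer)
    if observed > budget_seconds:
        raise AssertionError(
            f"median {observed:.4f}s exceeds budget {budget_seconds:.4f}s"
        )
    return observed


def build_index(n=10_000):
    """Hypothetical unit under test: builds a small lookup table."""
    return {i: str(i) for i in range(n)}


if __name__ == "__main__":
    # The budget here is an assumed acceptance criterion, not a measured one.
    assert_within_budget(build_index, budget_seconds=0.5)
```

The `timer` parameter is there so the check itself can be tested with a fake clock; in a real suite the assertion would sit inside whatever unit test framework the project already uses.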

Any such failure would mandate that developers review their changes. If the change justifies more intensive resource utilisation, the existing performance criteria defined for that function are modified, in agreement with the product owner and QA where appropriate. If the change exposes poorly implemented code, the respective performance improvements would be made.

An added benefit of automating both unit performance tests and record/playback tests is traceability. A record/playback test can be correlated to numerous smaller performance unit tests and vice versa – their combined value in the rapid diagnosis of performance problems before production is greater than the sum of the parts.

Another aspect of traceability comes from having the earliest possible baselines set by writing unit performance tests in parallel with code. This extensive history could provide an accurate and detailed picture of the performance of functions through the course of their development.
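As a rough sketch of how such baselines could be used, the snippet below compares the latest timings with a stored baseline and reports anything that has slowed beyond a tolerance. The function name, the test names, the timings and the 10% tolerance are all assumptions made for illustration; how baselines are stored and agreed would depend on the project.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Return {test_name: (baseline_s, current_s)} for tests that are slower
    than their baseline by more than `tolerance` (a fraction, e.g. 0.10)."""
    regressions = {}
    for name, baseline_seconds in baseline.items():
        current_seconds = current.get(name)
        if current_seconds is None:
            continue  # test was not run this time; nothing to compare
        if current_seconds > baseline_seconds * (1 + tolerance):
            regressions[name] = (baseline_seconds, current_seconds)
    return regressions


if __name__ == "__main__":
    # Invented timings in seconds, keyed by hypothetical test names.
    baseline = {"parse_order": 0.10, "render_basket": 0.20}
    latest = {"parse_order": 0.105, "render_basket": 0.30}
    for name, (before, after) in find_regressions(baseline, latest).items():
        print(f"{name}: {before:.3f}s -> {after:.3f}s")
```

Keeping each run's timings alongside the commit that produced them would give the history described above; the comparison itself stays a small pure function.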

This article in two parts by Scott Barber (from which the above diagram is taken) provides a good overview of how to think about implementing an Agile performance testing strategy.

In my next article I explore ideas and techniques to assist developers in designing and writing meaningful unit performance tests by decomposing a seemingly monolithic system into smaller testable parts.

4 Comments

  1. I think your approach makes a lot of sense. The principle is, I believe, general: you need unit-level and system-level variations of tests in general, and performance is just one such aspect.

    I want all errors to be caught by the earliest possible tests. Additional tests provide a safety net, but the unit tests (pre-checkin tests) are the cheapest and best way to catch defects.

    An approach that worked for us in a batch-oriented setting, but that might be generalizable to web applications: Record traffic from production and replay in test system. We did this for *all* requests in our system. Have you considered something like this?

    I was wondering if you could provide good examples of unit-level performance tests? Do you use *Unit for these, or some other tool? Are there libraries that you could recommend?

  2. With regard to your suggestion to use recorded behaviour from production for replay, we have used that data before, as you suggest. That said, our problem has often been one of being a new system with no prior usage information! :)

    A specific unit performance testing library I am currently re-evaluating is JUnitPerf. As and when we find and develop more concrete examples of unit performance tests, I will update the blog to cite those examples.

  3. Jerome, Sanjiv Augustine forwarded to me your blog announcement. It seems to me that there is another common performance problem that doesn’t seem to be taken into account here. That’s the problem of non-linear response as a result of reaching some system limit, such as the number of database connections. Perhaps there is some way to also track the usage of resources to detect when they are being held more frequently or for longer periods, and thus presenting a potential problem.

  4. Hi, Jerome (and sorry about the long delay in responding):

    I think it’s wonderful to look at performance in the granular way that you propose; after all, many problems get introduced without being noticed at the lowest levels of the application.

    That said, I share George’s concern. The units themselves are certainly potential sources of performance problems, but ultimately performance as it matters to your customer is a system-level notion. That means that not only the units themselves, but also the interactions between them are important. Worse, at a certain point “our system” starts to interact with “other systems”, such that we see unanticipated, unpredictable, and decidedly non-linear behaviours. For that reason, one thing that I think people should emphasize while doing performance testing is to simulate the abnormal. Drive lots of exceptional conditions, overwhelm the product with data, deprive it of resources that it needs, overfeed it, starve it, rain on it, overheat it, sunburn it, and huff and puff to blow it down.

    The big theme for me these days is not repeatability; that’s pretty easy. What we need to demonstrate for performance and reliability is adaptability, and in a performance context, that requires intense and intentional variation.

    If you’ve never read The Fifth Discipline, I’d recommend it. Senge describes The Beer Game, an exercise that he does with his workshop participants (you should find the title alone to be appealing). The big lesson of the game for me is to remember, for any system, the lag time between input and output. It’s easy to forget that it’s not simultaneous, and that the controlling function may vary its own performance in response to circumstance, which in turn has implications for the system. The game shows very well that systems can be set up to correct themselves, but there’s always a seed of over-correction which gets amplified by the lag. I suspect that more performance problems than we recognize appear there.

    I’d also suggest that you check out An Introduction to General Systems Thinking, and read it with an expansive notion of how you might apply it to performance testing. I have General Principles of System Design (formerly titled On the Design of Stable Systems), but haven’t read it; so many books, so little time.

    All that said, I don’t think it’s necessarily a bad thing to let maintenance of the performance tests slip as long as you don’t have ongoing questions that you want to ask about performance. If, as you say, “the remaining value to be gained was far less than the effort required in maintaining the automation” then, as a tester, I would suggest that you divert your attention to things that add value–and to questions about things that threaten that value.

    —Michael B.
