The Future of Test and Evaluation: Looking Back from the Future

I didn’t notice the change. I couldn’t point to the time or place when it happened. But I know what our systems were like, and I can recognise them now. I just don’t know when they changed.

Yet it happened quickly. Systems even co-opted our language. Initially they were just integrated, but then integrated moved up a level to slip federated in under it. Complicated and complex were synonyms and now they are distinct. It’s not a bad thing, but now they are different. The change has been both insidious and too fast, challenging the language we use to express a common understanding.

Our systems have changed from being discrete tools to being integrated into our workplaces – extending into our lives. Academics have led the charge, helping us grapple with the concept of machines doing things for us that we cannot do ourselves. Rasmussen (1997) noted the ability of data technologies to create complexity, identifying that complex systems have emergent functions that do not arise from any single component part. Sterman (2000) described dynamic changes in a machine creating more complexity than merely more combinations, while Snowden & Boone (2007) differentiated clear from complicated, complex and chaotic. But it was Leveson (2012) who delivered the clanging truth that genuinely complex systems border on the intellectually unmanageable.

Now we have researchers using prosthesis (Boy, 2014) as an adjective to differentiate from tool when a system extends capability beyond what was possible for unaugmented humans. Our systems have changed, crossing a line that I cannot draw. They were tools, but now they can do things that we didn’t intend.

What hasn’t kept pace are our assurance practices.

Exactly because our systems can do things that we didn’t intend, we need assurance of what they can do. For Australia’s numerically small Air Force, assurance serves to confirm military capability against expectations since there are no squadrons of aircraft in reserve. For a small force at the end of supply chains, assurance ensures systems are supportable and available in our environment. Assurance is the unique differentiator for our small force, the certainty that allows us to forgo the reassurance of a larger force of just-in-case options; the check of quality that allows us to avoid pursuing quantity. That assurance of military capability has been provided through the test and evaluation process for decades, a product of the 1950s evolution of Systems Engineering.

But 1950s test and evaluation is not keeping pace. The use of waterfalls and spirals to accelerate the Systems Engineering V-process was a 1990s attempt to bring capability to bear faster while test and evaluation was bogged down in complicated systems. Testing all the combinations of system modes in each environment takes a while, but at least we had an end point once every possible combination had been identified. But the increasing level of complication didn’t stop, and Systems Engineering added Agile development methodologies to speed up the evolution even more.
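The growth in test points described above can be made concrete with a short sketch. It assumes a purely illustrative system: the subsystem names, mode lists, and environments below are hypothetical and are not drawn from any real platform. The point is only that a complicated system has a finite, countable test space, and that the count multiplies with every subsystem added.

```python
from itertools import product


def count_test_points(modes_per_subsystem, n_environments):
    """Count the distinct (mode combination, environment) test points."""
    total = n_environments
    for n_modes in modes_per_subsystem:
        total *= n_modes
    return total


def enumerate_test_points(subsystem_modes, environments):
    """Yield every (mode combination, environment) pair exactly once."""
    names = list(subsystem_modes)
    for combo in product(*(subsystem_modes[name] for name in names)):
        for env in environments:
            yield dict(zip(names, combo)), env


# Hypothetical subsystems and environments, for illustration only.
subsystem_modes = {
    "radar": ["off", "search", "track"],
    "datalink": ["off", "receive", "transmit"],
    "autopilot": ["off", "hold"],
}
environments = ["day", "night", "maritime"]

points = list(enumerate_test_points(subsystem_modes, environments))
print(len(points))  # 3 * 3 * 2 * 3 = 54

# The same count for larger (still hypothetical) systems with
# three modes per subsystem and three environments.
for n_subsystems in (3, 10, 20):
    print(n_subsystems, count_test_points([3] * n_subsystems, 3))
```

The enumeration does terminate, which is the "end point" a complicated system offers. The count, though, grows by a factor of three with each extra three-mode subsystem, so the end point recedes quickly.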

Agile development methodologies challenge the ability of test and evaluation (T&E) practices to deliver assurance of military capability in operationally relevant acquisition timeframes. The challenge is two-fold. First, test and evaluation takes time, time that isn’t available in the development cycle before the system is obsolete. Second, systems have achieved complexity, defined as exhibiting emergent properties.

They are no longer the sum of their parts – they are more. This conflicts with the decomposition premise of traditional T&E – that a complicated system can be broken down into constituent parts so that capabilities can be tested and evaluated individually to support an operational decision. Yet, breaking down a complex system occludes the emergent properties since they do not reside in any sub-component. Implementing traditional T&E upon a truly complex system becomes an exercise in brute force that is ultimately futile; enacted with compromises and an exponentially increasing work and time burden.
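A toy model can illustrate how a property of the whole can be invisible when the parts are tested alone. The sketch below is an assumption-laden illustration, not a model of any real system: two identical hypothetical components each damp their own state, but each also responds to the other. The `damping` and `coupling` values are chosen only to make the effect visible.

```python
def step_component(own_state, neighbour_state, damping=0.5, coupling=0.8):
    """One update of a hypothetical component.

    The component halves its own state each step (stable on its own)
    and adds a fraction of whatever its neighbour is doing.
    """
    return damping * own_state + coupling * neighbour_state


def run_isolated(initial=1.0, steps=20):
    """Component-level test: the neighbour input is held at zero."""
    x = initial
    for _ in range(steps):
        x = step_component(x, 0.0)
    return x


def run_coupled(initial=1.0, steps=20):
    """System-level test: both components run and feed each other."""
    x = y = initial
    for _ in range(steps):
        # Both updates use the previous step's values.
        x, y = step_component(x, y), step_component(y, x)
    return x, y


isolated = run_isolated()
coupled_x, coupled_y = run_coupled()
print(f"isolated component after 20 steps: {isolated:.6f}")
print(f"coupled components after 20 steps: {coupled_x:.1f}, {coupled_y:.1f}")
```

Tested alone, each component settles toward zero and would pass a stability check. Run together, the pair grows without bound, because the instability lives in the interaction rather than in either part. Decomposing the system removes the very thing that needed testing.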

Behind the concept of assurance lies the question of risk management, with assurance practices being used to reduce risk that would require acceptance. Prior practice in the Australian Defence space, particularly aviation, was to adopt a conservative approach enacted through airworthiness regulations and implemented with a good deal of assurance activity, of which test and evaluation was the graduation requirement. This balance of assurance and risk acceptance has been undertaken in an environment where military technologies were unique, but this is also changing.

Civil technologies have evolved and their development processes have diverged from the military practices that were the benchmark. Civil technologies now present desirable capability, and civil development practices operate on a faster cycle that is enviable.

But there is an acute difference: civil development paths are tolerant of risk transfer from the developer to the operator, though arguably they are also ignorant of that transfer (Tesla’s Autopilot function being the poster child for public beta testing).

Beta versions, Agile development, and other similar contemporary development practices all encapsulate a risk transfer that wasn’t part of the originally envisaged assurance provided by test and evaluation on a complicated system. Pursuit of new development processes on new technologies might be desirable, but inherent in those technologies and processes are emergent functions and different risk acceptance profiles. They are always there, inseparable from the package, even if unnoticed.

Military T&E has stood as the vanguard against risk transfer from developer to operator for decades. But its traditional practices are ineffective against the rise of complex systems that inherently obscure their emergent attributes when decomposed. T&E is morphing, although its final form is not yet known. The USAF Test Pilot School, the high church of T&E for half a century, has begun its journey to train Systems Thinking Professionals rather than Flight Testers to answer the challenge laid down by true complexity and rapid development cycles with variable levels of risk acceptance (Montes et al., 2018). While the shortcomings of traditional T&E are apparent and the path to advanced T&E is not yet certain, initial steps toward a capability remain within the systems engineering domain.

Seeking assurance of capability is an integral part of our small-force Air Force, a valuable cultural norm that is worth retaining. Our path to Air Force 2121 will need a solution to the assurance problem because complex systems are needed, but current military T&E practice is severely constrained by complexity. We will look to model-based systems engineering (MBSE) to enable digital twins, using validated simulation environments, statistical analyses, and big data approaches. But however much we want them to, equipment manufacturers are not going to sign up to contracts to deliver complex systems using civilian development cycles, especially those contracts with provision for capability assurance in the Australian environment. The overlay of civil development practice with conservative, military risk acceptance thresholds is irreconcilable.

Faced with the prospect of moving away from assurance, to take a chance with capability while it evolves through practical experience (following civil practice), Air Force 2121 will keep assurance by adopting a new approach to T&E.

Retaining assurance will necessitate a new approach to T&E that uses digital models, maths, and matrices rather than Excel and DOORS. We will learn to be comfortable with a level of uncertainty rather than making a subjective guess and labelling it probability when we really don’t know the denominator. We’ll become comfortable with bounded knowledge of what we don’t know, rather than an illusion of certainty generated by 1950s T&E unsuited to 2121 systems.
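One way to picture "bounded knowledge of what we don’t know" is a standard statistical result for failure-free testing. This is offered as an illustration, not as the method the future Air Force would adopt. It assumes independent, identical trials, an assumption that is itself questionable for complex systems. If a system completes n such trials with no failures, the one-sided upper confidence bound on its failure probability is 1 − (1 − confidence)^(1/n), which at 95% confidence is close to the well-known "rule of three" value 3/n. The function name below is hypothetical.

```python
def upper_bound_zero_failures(trials, confidence=0.95):
    """Upper confidence bound on failure probability after `trials`
    independent trials with zero observed failures.

    Solves (1 - p) ** trials = 1 - confidence for p.
    """
    if trials <= 0:
        raise ValueError("trials must be a positive integer")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / trials)


for n in (10, 30, 300):
    exact = upper_bound_zero_failures(n)
    rule_of_three = 3.0 / n
    print(f"{n:>4} clean trials: p <= {exact:.4f} (rule of three: {rule_of_three:.4f})")
```

Thirty clean sorties do not show that the failure probability is zero. They show only that it is probably below roughly one in ten. Stating the bound, rather than a point estimate, is the difference between bounded uncertainty and a guess labelled as probability.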

Ben Luther is a graduate of the School of Air Navigation. He flew with Crew 4 at 10SQN before converting to the AP-3C as the Crew 3 TACCO. Later selected for flight test training, Ben undertook military flight test programs with ARDU and a posting to Airbus with DMO for the KC-30 development. While serving as the Senior Flight Test Systems Specialist and Flight Test Safety Officer at ARDU, Ben was recruited to Gulfstream in the wake of the G650 flight test accident. After leading flight test teams to complete civil certification in the G500 and G600 programs, Ben recently returned to Australia and is employed in Advanced T&E concepts at Nova Systems. He is a PhD candidate undertaking research into complex socio-technical system risk at the University of Adelaide.

Bibliography

Boy, G.A. (2014). Dealing with the unexpected. Risk Management in Life-Critical Systems. Ed. Millot, P. Wiley & Sons Inc. London, England.

Leveson, N. (2012). Engineering a safer world: systems thinking applied to safety. The MIT Press, Cambridge, Massachusetts.

Montes, D.R., Hill, T.D., Cookson, J.L. & Cannon, G.E. (2018). The Evolution of the USAF Test Pilot School Education Paradigm toward a Systems-Engineering Foundation. AIAA Aviation, Flight Test Conference, 25-29 June 2018, Atlanta, Georgia. American Institute of Aeronautics and Astronautics, Reston, Virginia.

Rasmussen, J. (1997). Risk management in a dynamic society: a modelling problem. Safety science, vol. 27, no. 2, pp. 183-213.

Snowden, D.J. & Boone, M.E. (2007). A Leader’s Framework for Decision Making. Harvard Business Review, November 2007.

Sterman, J. (2000). Business dynamics: systems thinking and modelling for a complex world. Irwin/McGraw-Hill, Boston, Massachusetts.

This article was published by Central Blue on September 18, 2021.
