Controversies in Military Ethics & Security Policy
Reliability Standards for (Autonomous) Weapons: The Enduring Relevance of Humans
Some critics of autonomous weapon systems (AWS) and military uses of artificial intelligence (AI) maintain that such systems may not be sufficiently reliable for their use to be morally or legally permissible. In particular, critics often cite concerns surrounding the principle of distinction, arguing that AI-enabled and autonomous weapons are unlikely to be able to reliably distinguish between combatants and noncombatants, or between combatants who may be targeted and those hors de combat (and thus protected from attack). However, this objection faces numerous challenges, most critically in that 1) it treats reliability as a simple measure which can be derived in a straightforward manner, 2) it fails to account for contextual and user-dependent factors which deeply impact on our understanding and assessment of reliability, and 3) it fundamentally ignores the fact that what constitutes “reliability” may be defined differently depending on use cases or users. Thus, rather than asking whether a system is reliable simpliciter, we should instead ask whether a system is reliable for some given task X when used by some particular user Y. When reliability is examined in this more nuanced fashion, we see that reliability assessments are less likely to substantiate blanket conclusions concerning particular autonomous and AI-enabled weapons, but instead will serve as contextual guides to permissible, effective, and responsible use of such systems. This recalibration of discussions of reliability furthermore tracks broader assessments of permissible (uses of) weapons in war.
The objection from (un)reliability
Weapons used in war must satisfy basic requirements concerning their predictability and reliability,[1] with international humanitarian law (IHL) making clear that states also have a responsibility to adequately test any new weapons being developed to ensure that they comply with all the laws of war.[2] For concreteness, let us focus on reliability with regards to one of the most fundamental laws of war, namely the principle of distinction. Distinction requires that one distinguish between legitimate targets (military objects and combatants who are not hors de combat) and illegitimate targets (civilian objects, noncombatants, and combatants who are hors de combat) and direct one’s attacks only at the former. Distinction rules out any attacks overtly directed at protected persons or objects, and also any attacks which are “indiscriminate”, with “indiscriminate attacks” being defined as those “not directed at a specific military objective”, or “which employ a method or means of combat which cannot be directed at a specific military objective”, or “which employ a method or means of combat the effects of which cannot be limited as required”.[3]
This understanding of “indiscriminate attacks” presents a problem for autonomous and AI-enabled systems which may be unreliable or unpredictable. This is because, if such weapons suffer from a significant degree of unreliability or unpredictability, it becomes difficult to see how they even could be adequately “directed at a specific military objective” as required, and it likewise raises worries that such AWS would constitute means of combat which cannot be directed and whose effects could not be limited as required. The central worry, echoed by many critics of AWS, is that using unreliable or unpredictable AWS risks significant harm to protected persons, and does so in a manner that is apt to undermine distinction, a fundamental tenet of both the ethics and laws of war.[4]
The potential (un)reliability of AWS is also apt to impact on a variety of other principles guiding conduct in war, but for brevity we may restrict ourselves to the principle of distinction. However, as I will argue in the remainder of this article, while unreliability or unpredictability for autonomous and AI-enabled systems will indeed impact on how, when, where, and in what ways these weapons may be used, it will not underpin any broad or blanket conclusions regarding such weapons. More importantly, I will show that assessments of reliability cannot be adequately made without us paying heed to both who is using a system and the contextual aspects of that system’s use. The upshot of this is that we cannot look simply at a sterile assessment of a system’s reliability under some laboratory conditions, nor can we directly compare machine reliability for some task against human reliability. Instead, we must always bear in mind the socio-technicity of autonomous weapons’ deployment, examining the reliability of humans using AWS as compared to humans operating with traditional non-autonomous systems.
Reliability as a contextual evaluation
All weapons used in war are (required to be) put through a testing and evaluation phase (T&E), where engineers, designers, programmers, military professionals, etc. determine whether a particular asset is reliable enough to be effectively and responsibly deployed in the field.[5] Rigorous T&E will usually result in a number of nuanced assessments of how a system functions, when it may be liable to break down or malfunction, and what its expected (and expectable) reliability will be in the field.[6] Now, one might imagine that the reliability of a system could be presented in some sort of simple score, perhaps signifying the likelihood or probability of the system malfunctioning when used in a generic operation. In simplest terms, we might imagine that reliability could come in an easy-to-understand rating from 0 to 1, where 0 indicates that the system is never expected to function properly, and 1 represents it never being expected to fail. However, on closer examination, it is clear that such a straightforward outcome for T&E is not possible.
For simplicity, suppose we are conducting tests on a rifle to determine its reliability, where “reliability” in this case is understood only in terms of how often the rifle is expected to jam when being fired. This would seem to be the most sterile type of reliability assessment, and one where there are almost no interfering factors which might impact on a simple 0-to-1 generic assessment. However, it should be clear that even in this scenario, one cannot simply say that the rifle has some level of reliability, simpliciter. This is because a rifle may have some very high level of reliability when fired in laboratory conditions, but be somewhat less reliable when used in the field (due to environmental factors). Further still, not all field conditions are the same. Thus, a rifle used in temperate forests may be expected to jam less often than one used in tropical forests, which may in turn be expected to jam less often than one used in a swamp, and which may still be less likely to jam than a rifle used in a sandy desert. Quite simply, where one is using a particular asset has just as much to do with its reliability as what the asset is at a basic level.
On top of this, there may be some systems which are less reliable on average, yet more reliable than their general-purpose counterparts in particular environments. In fact, much research and design turns on exactly this point: general-purpose weapons and combat systems provide militaries with broad capabilities, but there is a strong need for specialized assets which are designed for specific combat environments, and which are able to overcome the challenges of those particular environments. Thus, reliability is not going to be determinable in a general way. Rather, reliability will by necessity be contextual, with each combat environment impacting on the reliability of a weapon or system. Moreover, highly localized or temporary conditions may also impact on reliability, and it will regularly be the case that a system which might have been highly reliable an hour earlier is suddenly highly unreliable because of, say, a sandstorm, snowstorm, or heavy winds. All of these possibilities for rapid changes to a system’s reliability, and changes which are necessarily contextual, show that some broad notion of “system reliability” is apt to lead us into a false sense of confidence in a system. Instead of looking to generic reliability assessments, we must recognize the importance of context, as well as the importance of the users of weapon systems and the role that humans play in not just responding to a system’s reliability, but indeed shaping and fostering its reliability through responsible use.[7] And this applies to all weapons and platforms used in war, be they rifles, tanks, aircraft, or the emerging classes of autonomous and AI-enabled systems we are increasingly seeing in modern militaries.
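To see why a single score can mislead, consider a minimal illustrative calculation (the jam-rate figures below are invented purely for the sake of the example, not drawn from any actual testing data): the chance that a rifle fires without jamming may differ sharply from environment to environment, yet all of that variation vanishes into one aggregate number.

```python
# Illustrative sketch only: hypothetical per-environment reliability figures,
# showing how a single generic score masks context-dependent differences.

# Assumed probability that the rifle fires WITHOUT jamming in each environment
# (numbers invented for illustration).
reliability_by_environment = {
    "laboratory":       0.999,
    "temperate forest": 0.995,
    "tropical forest":  0.980,
    "swamp":            0.960,
    "sandy desert":     0.900,
}

# A generic 0-to-1 score, here a simple average over the test environments...
generic_score = sum(reliability_by_environment.values()) / len(reliability_by_environment)
print(f"Generic reliability score: {generic_score:.3f}")

# ...tells a unit deploying to the desert very little about what to expect there.
for environment, reliability in reliability_by_environment.items():
    print(f"{environment:>16}: {reliability:.3f}")
```

A unit reading only the aggregate figure would be over-confident in the desert and needlessly cautious in the laboratory; it is the contextual breakdown, not the single score, that actually guides responsible use.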
Reliability, (un)predictability, and misbehavior
If we shift our focus away from broad understandings of “system reliability” to more specific and contextual views which look to both the contexts of use and the users deploying autonomous weapon systems, we quickly see that reliability is deeply intertwined with predictability. This has important implications because predictability is rather straightforwardly connected to the individual humans who are making predictions. To take an everyday example, the behavior of a particular dog might be easily predictable for its owner but utterly unpredictable to someone unfamiliar with that dog.[8] Now, one might find this example unfair, in that biological organisms are much more complex and harder to predict in a straightforward manner than artifacts, but one can also imagine any number of complex manmade artifacts which have many interworking parts and which are more or less predictable depending on who is making the predictions; asking a layperson what a complex system might do in this or that environment is apt to result in shrugs or vague guesses, whereas an engineer or designer of that system is likely to be able to give precise answers as to what may be predictably expected. Critically, the knowledge, familiarity, and general competence of a given user deeply impacts on how well that user can predict a system’s behavior.
Connecting this to our main point, in order for a system to be deemed reliable, it must at least be predictable.[9] After all, if one cannot predict what the system will do in some scenario, it can hardly be deemed reliable to use in that scenario (for whatever purpose). But since predictability is user-dependent, reliability is thereby user-dependent as well (at least to the extent that it rests on predictability), and it will be impossible to have general reliability assessments which can be treated as authoritative for all potential users. And this is a good thing.
Competent users of any system may be expected to reliably get better results out of using these systems. In this way of putting it, reliability is not attributed to the system itself, but to the use of the system. And in many ways, that is arguably a more appropriate measure. Returning to the simple example of a rifle, if we imagine giving a rifle to some random individual in laboratory conditions and also giving a rifle to a trained marine operating in the harshest environmental conditions (say, a sandy desert), the reliability of the rifle is going to be dependent on not just the weapon, nor even just the weapon and its environment of operation, but rather on the weapon, the environment, and the user using that weapon. To make it more concrete, we might expect that the marine regularly clears sand from the rifle’s breech, or takes extra precautions to prevent sand from entering sensitive portions of the weapon, etc. Beyond this, in using the rifle for some given aim, the marine may be strongly expected to more reliably achieve that aim, even given the harsher conditions which reduce the simpliciter reliability assessment of the weapon. The main point is that the simpliciter assessment tells us too little, and in fact will tell us unhelpful things much of the time, as how, where, when, and by whom a system is being used will be just as important as, if not more important than, the system itself.
Finally, building on these points, it is also important to be clear about the fact that some systems may not always do as we would wish them to without this necessarily undermining even their generic reliability assessment. This is because there is a meaningful difference between a system which is simply unreliable and one which behaves in a reliable way, following certain rules and acting predictably, but whose reliability may in certain cases result in predictable misbehavior. For example, consider anti-radiation missiles used for targeting radar stations and jammers. Anti-radiation missiles follow comparatively simple and predictable protocols; in general, they are designed to lock onto the strongest radio source within the missile’s target acquisition cone, and then autonomously/automatically engage and destroy that emission source. However, they generally have no further programming or means to differentiate civilian from military targets. Because of this, one might see these as being “unreliable” to the extent that in mixed combat environments with both military and civilian objects, the latter may be expectably, yet still mistakenly (from a moral and legal standpoint), targeted. But this is too quick. Anti-radiation missiles function in highly predictable ways, and the scenario just presented is not one where the missiles “go rogue” in some manner. Rather, it is one where the missiles predictably misbehave, and do so because they are being negligently used in an environment where they ought not be used due to limitations in the system. They are reliably and predictably seeking out sources of radio waves, and in the example it is the ineptitude of human users which creates the incorrect sense that the system itself is making some sort of mistake.
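To make this notion of predictable misbehavior concrete, here is a minimal, purely illustrative sketch of the kind of protocol just described: a seeker that simply selects the strongest emitter within its acquisition cone. It is not a model of any real missile’s guidance software; the emitters, field names, and figures are hypothetical.

```python
# Illustrative sketch only: a deliberately simplified "strongest emitter in the
# cone" selection rule, to show how predictable behavior can still misbehave.

from dataclasses import dataclass

@dataclass
class Emitter:
    name: str
    signal_strength: float   # received signal strength (arbitrary units)
    bearing_deg: float       # bearing relative to the seeker's boresight
    military: bool           # ground truth that the seeker never sees

ACQUISITION_CONE_DEG = 30.0  # assumed half-angle of the acquisition cone

def select_target(emitters: list[Emitter]) -> Emitter | None:
    """Lock onto the strongest emitter inside the acquisition cone.

    Note what is absent: the `military` flag is never consulted, because the
    seeker has no means of telling civilian and military radio sources apart.
    """
    in_cone = [e for e in emitters if abs(e.bearing_deg) <= ACQUISITION_CONE_DEG]
    return max(in_cone, key=lambda e: e.signal_strength, default=None)

# A mixed environment: a military radar and a stronger civilian broadcast tower.
environment = [
    Emitter("air-defense radar", signal_strength=0.7, bearing_deg=10.0, military=True),
    Emitter("TV broadcast tower", signal_strength=0.9, bearing_deg=-5.0, military=False),
]

target = select_target(environment)
print(f"Locked onto: {target.name}")  # -> Locked onto: TV broadcast tower
```

Given this environment, the outcome is entirely predictable: the civilian broadcast tower is the strongest emitter in the cone, so it is the one engaged. Nothing in the protocol has failed; the failure lies with whoever chose to employ such a seeker in a mixed environment.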
The human component
This brings us to the final point, namely that however advanced autonomous and AI-enabled systems may become, there is almost certain to remain a human component to warfare, and this human component will persist in altering how we can and should view reliability assessments for emerging weapon systems.
While some may imagine that autonomous weapons will make war a wholly robotic enterprise where combat requires only the push of a button in some faraway command post, the reality is that “the introduction of robotic and autonomous systems into the force is liable to increase both the number of people and the diversity of skills necessary within the force”.[10] This is because there will have to be ground teams maintaining the physical components of an AWS, troubleshooting its soft- and hardware in the field as needed, and ensuring that it is fueled and armed for each mission as required. Concretely, where you need one human with a rifle in order to have one functional rifleman, to have one functional robotic rifleman will entail having, at the least, one roboticist to ensure the system is in working order, one computer scientist in case of processing malfunctions, one armaments specialist keeping the system fully loaded, as well as any support crew needed to transport the system or necessary communications arrays to its area of operations/launch.
All of these various individuals will have discrete jobs to do in order for the AWS to function in the field. And critically, the jobs of each of these individuals will deeply impact on how “reliable” the system is in combat. Their training, their understanding of the limitations of the system, and their ability to quickly counteract enemy measures to break communication, jam the AWS, or otherwise interfere with its operations will all be critical in maintaining reliability for the weapon system. And all of these facts are ones which cannot be determined in a lab, corroborated through sterile testing and evaluation, or guaranteed by some generic “reliability assessment”. In order for us to be sure of how reliable a system is or can be, we must look at the system, its intended use (in discrete instances), and the human crews who are responsible for maintaining and effectively deploying the system.
The central lesson which we must internalize if we are to responsibly and effectively make use of AWS is that no artifact or system exists in a vacuum, and no artifact or system can be judged based only on its internal capacities. Rather, we must pay heed to the socio-technicity of autonomous weapon systems, that is, the fact that these are technological objects which are deeply and intimately embedded within broader social and institutional frameworks.[11] And these larger human systems significantly impact on the reliability of autonomous weapons, shaping what we can and should reasonably expect them to be capable of. Just as we cannot answer the question, “Will this rifle jam when fired?” without knowing where it’s being used and by whom, we cannot answer, “How reliable is this autonomous weapon?” without knowing what operation it is being tasked with, what environmental conditions it is expected to be confronted with, and what team of humans is maintaining it and preparing it for its operations. Everything is contextual and everything is user-dependent, and generic reliability assessments will not tell us how reliable a system will be when actually deployed to fields of battle.
Conclusion: assault rifles and atom bombs
A simple rifle may be highly reliable, in that it usually shoots straight and doesn’t jam. Similarly, a hand grenade may be reliable, in that it always creates a predictable blast, the dangers of which can be communicated to soldiers and incorporated into appropriate training regimens. However, a rifle may be generally reliable, yet expectably fail when used in certain conditions, for example, in a sandy desert. A rifle may also fail when used by a soldier unacquainted with that particular weapon and who happens to misuse the weapon in some way. The same points may hold equally well with regards to hand grenades. Or tanks. Or missiles. Or atom bombs. The simple fact is that virtually all technological artifacts will have some basic level of reliability which is determined by rigorous testing and evaluation. However, though T&E will generally involve broad assessments of a system’s reliability across myriad likely or foreseeable battlegrounds and theaters of operation, one usually cannot fit the entirety of a testing regimen’s data into a reliability assessment. More than this, any generalized statement about how reliable a system is, simpliciter, is inevitably going to gloss over how reliable a system is, say, when there is a sandstorm, or when snow on the ground creates glare hindering optical sensors, or when thick foliage masks heat signatures of potential targets. Quite simply, generalized reliability assessments will fail to communicate the particularities of a system’s reliability. Beyond this, generalized reliability statements will never speak to how reliable a given system is for some particular user.
When using technological artifacts, be they rifles, bombs, or autonomous platforms, it is not just important that the devices themselves are generally reliable (under a variety of conditions). It is equally critical that they be used by competent individuals who are able to reliably employ them. This central fact must be remembered when discussing autonomous weapons, and we must ensure that we do not mistake “reliability under laboratory conditions” for “reliability as such”. No weapon can be depended upon in every environment or when used by any user. And weapons which are “less reliable” with regards to some capability may not be “less reliable” or “unreliable” as such, so long as the users deploying these weapons are competent and capable enough to understand and appropriately respond to the system’s limitations.
This research was funded by the Czech Science Foundation (GAČR) under grant number 24-12638I.
[1] There are additional consequential and deontological constraints on weapons development and use as well, but the focus of this article will be exclusively on reliability/predictability concerns.
[2] See Geneva Protocol I Additional to the Geneva Conventions (hereafter AP I), Art. 36.
[3] AP I, Art. 51(4).
[4] Sharkey, N. (2010): Saying “no!” to lethal autonomous targeting. In: Journal of Military Ethics, 9(4), pp. 369–383; Guarini, M. and Bello, P. (2012): Robotic warfare: Some challenges in moving from noncivilian to civilian theaters. In: Lin, P., Abney, K., and Bekey, G. A. (editors): Robot Ethics: The Ethical and Social Implications of Robotics, Cambridge, MA, pp. 129–144; Human Rights Watch (2012): Losing humanity: The case against killer robots. Technical report, Human Rights Watch; Sparrow, R. (2015): Twenty seconds to comply: Autonomous weapon systems and the recognition of surrender. In: International Law Studies, 91(1), pp. 699–728; Sparrow, R. (2016): Robots and respect: Assessing the case against autonomous weapon systems. In: Ethics & International Affairs, 30(1), pp. 93–116; Human Rights Watch (2016): Making the case: The dangers of killer robots and the need for a preemptive ban. Technical report, Human Rights Watch; Winter, E. (2020): The compatibility of autonomous weapons with the principle of distinction in the law of armed conflict. In: International & Comparative Law Quarterly, 69(4), pp. 845–876; Stop Killer Robots (2022): Negotiating a treaty on autonomous weapons systems: The way forward. Technical report, Stop Killer Robots.
[5] For extensive discussion of the legal nuances surrounding weapons benchmarking and deployment, see Boothby, W. H. (2016): Weapons and the Law of Armed Conflict. 2nd edition, Oxford, UK.
[6] Eriskin, L. and Gunal, M. M. (2019): Test and evaluation for weapon systems: concepts and processes. In: Operations Research for Military Organizations, pp. 98–110.
[7] Importantly, “responsible use” does not necessarily imply that there be humans in- or on-the-loop for autonomous and AI-enabled weapons. Rather, it will imply that humans have ensured that a particular deployment is expected to be in line with the ethics and laws of war, but this does not necessitate contemporaneous control. As an example, deploying a fully autonomous system to clear an enemy trenchworks, where the system is geographically and temporally limited to a narrow engagement window restricted to that purely military installation, presents a rather clear case where lack of direct human oversight may be permissible (and perhaps beneficial, militarily speaking). The reason this would (arguably) present no problem is that the indiscriminate shelling of a trenchworks (or other clearly and exclusively military installation) does not require one to make additional judgements during the course of firing shells from an artillery position. The same arguably holds when deploying autonomous drones to clear such positions.
[8] See Wood, N. G. (2024): Explainable AI in the military domain. In: Ethics and Information Technology, 26(2), pp. 1–13, especially pp. 9–11.
[9] Some might argue that opaque AI systems are inherently unpredictable, given that one cannot know for sure what they may do given certain inputs. This is a misleading presentation though. An opaque system will always have a potential for unpredictable action, but such systems may still be highly reliable. For example, combat assault dogs are fundamentally opaque, just as human combatants are, but animal and human combatants may both be highly reliable, despite both having a clear potentiality for unpredictable action. Awareness of that potential is critical to us responsibly deploying opaque AI-enabled systems, but opacity does not inherently imply that unpredictable actions will or necessarily must occur.
Nathan Wood is a Postdoctoral Fellow of the Institute of Philosophy of the Czech Academy of Sciences and the Center for Environmental and Technology Ethics – Prague, as well as an External Fellow of the Ethics + Emerging Sciences Group at California Polytechnic State University San Luis Obispo. His research focuses on the ethics and laws of war, especially as these relate to emerging technologies, autonomous weapon systems, outer space warfare, and other aspects of future conflict. He has previously published in “Ethics and Information Technology”, “War on the Rocks”, “Philosophical Studies”, “Journal of Military Ethics”, and numerous other journals.