
“Meaningful Human Control” and Complex Human-Machine Assemblages – On the Limits of Ethical AI Principles in the Context of Autonomous Weapons Systems

What does the term “autonomous weapons systems” (AWS) evoke? Usually, it is associated with a high degree of independence in these weapons systems. The control of military vehicles (drones, tanks, submarines, etc.) as well as the identification, selection and attacking of targets are envisioned as processes that machines or computers can carry out “autonomously”, i.e. without human intervention. Part of these imaginings of AWS is the promise that autonomy will enable military operations to be carried out faster and with greater precision, without unnecessarily endangering the lives of one’s own soldiers. On the other hand, there are political, moral and (international) legal demands to enable and maintain sufficient human control (“meaningful human control”) over these systems in order to prevent errors and the dehumanization of enemies. The tacit assumption here is that operators of AWS are able to assert their autonomy when making life-or-death decisions. If meaningful human control cannot be ensured, AWS will need to be banned. Both in the technoscientific promise of AWS and in criticism of them, what is generally ignored is that the concept of autonomy is itself problematic – with regard to humans as well as to machines.[1]

Autonomy

Since the 1980s, critics in the field of feminist science and technology studies, such as Lucy Suchman[2] and Donna Haraway,[3] have pointed out that autonomy is not an attribute of entities, but rather of human-machine assemblages, and is an effect of discourses and material practices. At the same time, the question arises of human responsibility in human-machine assemblages with distributed sociomaterial agency. Technologies, in this view, are materialized figurations that are ambivalent and ambiguous, and of course they require interpretation. Technology researcher Charis Cussins, for example, sees human-machine interactions as constantly changing “choreographies”.[4] However, it is not only the interaction between humans and machines, but also the actors themselves that should be thought of as unstable and changing, as they influence and alter one another in the process of use. New developments such as machine learning and self-learning algorithms show very clearly that machines and programs react dynamically and “adaptively” to users. Self-optimizing algorithms, the usage processes in general, the adaptations of the software brought about by users in real time, and the underlying data change the meaning and material basis of an AI-based machine. Attributions of meaning, material bases and technical logics are thus set in motion.

At the design level, too, technologies and human-machine interfaces are to be understood as dynamic processes of iterative materialization, in which certain norms and values become established. Humans and machines constitute each other through their experiences and appropriation strategies, and via the instructions or scripts[5] written into the machines, or their “grammars of action”.[6] There is a constant engagement with each other’s processes and practices. In this way, sociomaterial agency is produced. With this understanding of humans and machines, and in view of rapid technological developments in the field of AWS, the concept of autonomous and strictly separable entities is increasingly undermined. Yet this is the very concept on which the terms AWS and meaningful human control are based. It is only through the mutual adoption of specific rules (for example, of language) that actors obtain their agency, as a jointly developed understanding arises collaboratively and repeatedly in the respective context-based interaction. Note that “understanding” here is not meant in an empathetic sense, but rather as an interaction.

We should not lose sight of the fact that humans communicate very differently than machines. A machine always has a framework, defined by its programming, within which it can communicate. At the same time, these interpretation spaces are ambivalent, and users can also improvise within them with workarounds. The machine can perceive the human’s behavior only to a very limited extent and can interact only within this framework: “The machines were tracking the user’s actions through a very small keyhole.”[7] This means that the relationship between humans and machines is reciprocal, asymmetrical, and dynamic.

The open question remains as to how, in light of the above, we can conceive of human responsibility and accountability without having to abandon our insight that humans are inseparable from the sociomaterial network that constitutes human-machine interaction.

Drone warfare already involves the targeted tracking and killing of people – either based on a pattern (“signature strikes”) or with the help of a kill list. Using surveillance and tracking technologies – data mining and machine learning, facial recognition software, real-time tracking via video systems on drones, satellite links, etc. – suspects are produced as central nodes in “terrorist” networks and as military targets, and in some cases are killed. The context and situatedness of the targets are largely disregarded. Categorization takes place via complex human-machine assemblages whose performative effects are complex and unclear, can hardly be attributed, and whose scopic or visibility regimes[8] generate specific (in)visibilities. This means that the “fog of war” – contrary to what is often claimed – is not clearing, but getting thicker. Complex human-machine assemblages make it difficult to attribute responsibilities, especially when decisions in war have to be made in fractions of a second. In the debate on AWS, this problem is mostly treated only at the surface level.

With current drone technology and highly automated decision-making, targeting and killing systems, the contextuality of technologies and the complexity of human-machine assemblages and interfaces become particularly significant. In particular, this takes on a geopolitical dimension, as their unregulated use does not lead to greater precision, responsibility or security, but above all to reciprocal violence.[9] In general, it is not easy to make the norms and values inscribed in technology visible – to this day, many consider technology to be neutral. The new, highly complex systems compound the problem. The very claim that technology acts “autonomously” obscures design and engineering decisions and processes, as well as, for example, the production and selection of the underlying data volumes (which are rarely evaluated). Maintenance and updating processes, and the associated infrastructures, are made invisible. It is completely unclear how and, above all, whether design practices and interfaces can be made transparent for designers, users and stakeholders in the age of opaque, self-learning algorithms. The question that remains unanswered is whether the processes and effects of “autonomous” human-machine assemblages can be made comprehensible, and how accountability can be attributed.[10]

Instead of reflecting on the problems of complex human-machine assemblages – and whether there is any real possibility of meaningful human control involving more than just pressing a green or red button at the last second – we are increasingly seeing evasive maneuvers on the part of the major powers, the military, the arms industry, and in some cases the technosciences. Faced with criticism of AWS, they resort to the rhetoric of “responsible AI in the military domain”, as is familiar to us from other AI discourses. Thus a number of military strategies now include a voluntary commitment to the guiding principle of responsible, explainable, reliable and governable use of AI.[11]

What the implementation of such ethical AI principles might look like in the context of an AWS, and why they do not do justice to the complex processes and effects of human-machine assemblages, can be shown with the example of the Future Combat Air System (FCAS). From around 2040, FCAS is supposed to be at the core of the European air forces.

The Future Combat Air System

The FCAS is currently being developed as part of a transnational project involving Germany, France, Spain and Belgium. Rather than a single weapons system, FCAS is the vision of a future network that combines existing weapons systems with new developments such as the Eurodrone and, in particular, the Next Generation Weapon System (NGWS). The NGWS in turn comprises the Next Generation Fighter (NGF), remote carrier vehicles that carry a range of payloads (both sensors and weapons), and the digital infrastructure intended to connect all elements of the FCAS – the so-called Combat Cloud or Multi-Domain Combat Cloud.[12] The Combat Cloud is to include an algorithmic decision support system that would enable the OODA loop (Observe, Orient, Decide, Act) to be completed more quickly.[13]

Another special feature of this project is that ethical and legal issues are addressed in an institutionalized form, through a forum set up specifically for this purpose: the “Technology Responsibility Working Group” (AG Technikverantwortung für ein FCAS). This expert panel was founded in 2019 by the German Fraunhofer Institute for Communication, Information Processing and Ergonomics (FKIE) and Airbus Defence and Space. It brings together authorities such as the German Federal Ministry of Defence and the Federal Foreign Office, as well as stakeholders from the scientific and academic communities, think tanks and church institutions. So far, the forum is the only institutionalized form of ethical reflection within the FCAS framework. However, despite its aim of discussing ethical issues on behalf of the entire project, the working group has to date remained a purely German affair.

The working group considers a certain degree of technological “autonomy” to be unavoidable: without the high degree of automation that such autonomy enables, FCAS would be ineffective, and therefore useless, in future military conflicts against faster-acting opponents and in more complex situations.[14] For the FCAS, however, it is possible “to go for a European way that keeps the overall system under control of an informed, aware, and accountable human operator, which is equipped with means of control that are meaningful to the required and specified level”.[15] What is expressed here is an understanding of human autonomy that makes meaningful human control over the highly automated FCAS appear possible in principle, requiring only appropriate means of control for its realization. The Ethical AI-Demonstrator (E-AID) illustrates what these means could look like.

The E-AID simulates the use of AI in the FCAS in the form of a decision support system. The simulations are conducted as scenarios, in close consultation with the German armed forces. The stated aim is to use concrete examples to obtain a realistic picture of the possibilities, limitations and ethical implications of AI in the defense sector, and thus take a first “hands-on” step toward an “ethics-by-design” methodology that can then be integrated into an overarching FCAS design process.[16] As a test environment, E-AID is meant to determine which system design is best suited to provide human operators with “reflective assistance”, i.e. to enable them to make “balanced and conscious decisions” regarding the use of AI-based weaponry.[17] In the E-AID scenario “Find Fix Track Application with AI for Automated Target Recognition”, the task is to eliminate enemy air defenses using remote-controlled drones whose sensors collect data on the positions of those air defenses. The results of the automated target recognition are displayed in a graphical user interface that highlights relevant objects and provides basic contextual information (for example, the type of vehicle detected). The key question in this scenario is how tasks can be delegated to AI in an accelerated decision loop without violating applicable military rules of engagement and ethical guidelines. Figure 1 is a screenshot of this human-machine interface in the E-AID.[18] The screenshot shows an aerial view in which several objects have been identified. Each object has been assigned a label and a corresponding probability value. Details of one identified object (ID1) are shown enlarged in a smaller browser window. With a probability of 83 percent, the object is a Russian SA22 Greyhound (Pantsir-S1), which in the given scenario represents an enemy weapons system. In the enlarged image, two details of the object are marked with red squares and classified as “cannon” and “radar”. Green ticks appear next to “RoE” (rules of engagement) and “SIGINT” (signals intelligence). In addition, several buttons can be clicked by the user, for example “Edit”, “Review” or “Investigate”. This design is said to allow human operators not only to confirm the selected target, but also to understand why it was selected.[19]
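To make the elements of this interface description more tangible, the following is a minimal, purely illustrative sketch of how such a classification output could be represented and summarized for an operator. The Detection class, its fields and the render function are hypothetical and are not based on the actual E-AID implementation; they merely mirror the elements described above (object ID, label, probability value, RoE and SIGINT ticks).

```python
from dataclasses import dataclass

@dataclass
class Detection:
    object_id: str        # identifier shown in the interface, e.g. "ID1"
    label: str            # vehicle type proposed by automated target recognition
    confidence: float     # probability value displayed next to the label
    roe_check: bool       # green tick next to "RoE" (rules of engagement)
    sigint_check: bool    # green tick next to "SIGINT" (signals intelligence)

def render(d: Detection) -> str:
    """Compose the one-line summary an operator would see for one object."""
    checks = ", ".join(
        name for name, ok in [("RoE", d.roe_check), ("SIGINT", d.sigint_check)] if ok
    )
    return f"{d.object_id}: {d.label} ({d.confidence:.0%}) | checks passed: {checks or 'none'}"

print(render(Detection("ID1", "SA22 Greyhound (Pantsir-S1)", 0.83, True, True)))
# -> ID1: SA22 Greyhound (Pantsir-S1) (83%) | checks passed: RoE, SIGINT
```

Even in this toy form, the output condenses a complex sensing and classification process into a label, a number and two ticks, which is precisely the reduction of complexity discussed below.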

Below, we will show that this promise cannot be kept, and that there is no meaningful human control. The possibility for users to obtain background information on the individual outputs of the system is an essential element of the ethics-by-design approach. It is intended to ensure both the trustworthiness of the output generated by the AI, and the accountability and comprehensibility of human decisions. However, if users of the system perform all (or at least many) of the possible background checks for all (or at least many) of the identified objects (by clicking on the corresponding buttons), the speed advantages of automating the OODA loop are largely lost. On the other hand, if users forgo these options, they just have to trust the system.

This would be all the more problematic because it would amount to a leap of faith that the system cannot justify. There is no reason to assume that FCAS is less prone to data analysis errors than other AWS. The only difference is that human operators are to be given the opportunity not to translate these erroneous recommendations into their own actions – which is only possible at the expense of the system’s overall performance. So, despite the ethics-by-design approach, there would remain a conflict between human responsibility and the efficiency of the AWS. Even more fundamental, however, is the question of the extent to which these background checks would actually enable human operators to understand the performance of the system. Or to put it another way: what does it mean to understand or explain the output of an algorithmic decision support system?

This question is by no means easy to answer, and certainly not in our FCAS scenario. Is it sufficient, as an explanation for the classification as an SA22, to highlight the radar and the cannon in the enlarged view? How is the human operator then supposed to understand the system’s conclusion that the details shown in the image are actually a radar and a cannon or – even more complicated – that this radar and cannon belong to an SA22 and not to another military vehicle? A human understanding of this output would ultimately require a detailed explanation of the data and data processing methods involved. Yet whether such an explanation is actually “understandable” depends not least on the complexity of the algorithms and the expertise of the human operators. In the case of machine learning algorithms (which will play a key role in FCAS), even providing open-source code and training data might not be enough to make an output fully understandable – especially for military end users, who are not usually experts in computer science. Furthermore, if so-called artificial neural networks are used, which is often the case in image processing applications such as target acquisition, even experts are unable to understand in detail how the system works. Therefore, even with full transparency, an AI-based system such as the one presented in E-AID will cause difficulties with regard to responsibility and accountability.[20]

Another critical problem is the manner in which the algorithmic decision support system reduces complexity. The situation depicted in figure 1 is set up in such a way that only two vehicles are visible, both of which are classified as targets with a relatively high probability (83 and 79 percent). However, if the targets were not military vehicles but human combatants, it is hard to imagine how a trustworthy output could be generated without a deeper understanding of the situation and, even more so, how that output could be made understandable for human operators. An urban scene with a busy street, where numerous vehicles can be seen that could be civilian or military, and in the latter case could belong to friendly or enemy forces, would likewise pose a completely different challenge than the given scenario. In such a case, the question arises as to whether all vehicles should be automatically classified and presented accordingly, or whether an automatic pre-selection should be made. Above a certain number of vehicles, and in view of the need to reduce complexity and the hoped-for speed advantage in the OODA loop, the situation could probably only be handled with pre-selection. But this creates another problem:

In order to generate a manageable number of possible targets for the human operator, an arbitrary decision threshold has to be defined that separates “probable targets” from “improbable targets”; only objects above the threshold (the “probable” ones) are then highlighted. Yet this choice is not only arbitrary, it also affects the accuracy of the system. If the threshold is set low, more objects that are not actually targets will be classified as “probable targets” (false positive results). If the threshold is set high, more real targets will not be highlighted by the system (false negative results).[21] Depending on the context or use case, the system developers or operators choose a threshold value that they consider appropriate.
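A brief sketch may make this trade-off concrete. The confidence scores below are invented for illustration only and have nothing to do with actual FCAS or E-AID data; the point is simply that moving the threshold redistributes errors between false positives and false negatives rather than eliminating them.

```python
# Illustrative sketch with hypothetical confidence scores (not FCAS/E-AID data):
# shifting the highlighting threshold moves errors between false positives
# (non-targets wrongly highlighted) and false negatives (real targets missed).

detections = [
    # (model confidence that the object is a target, is it actually a target?)
    (0.83, True), (0.79, True), (0.66, False), (0.58, True),
    (0.41, False), (0.35, True), (0.22, False), (0.12, False),
]

def count_errors(threshold: float) -> tuple[int, int]:
    """Return (false positives, false negatives) for a given threshold."""
    false_positives = sum(
        1 for score, is_target in detections if score >= threshold and not is_target
    )
    false_negatives = sum(
        1 for score, is_target in detections if score < threshold and is_target
    )
    return false_positives, false_negatives

for threshold in (0.3, 0.5, 0.7):
    fp, fn = count_errors(threshold)
    print(f"threshold {threshold:.1f}: {fp} false positives, {fn} false negatives")
# threshold 0.3: 2 false positives, 0 false negatives
# threshold 0.5: 1 false positives, 1 false negatives
# threshold 0.7: 0 false positives, 2 false negatives
```

Whichever value the developers settle on, a particular distribution of errors is built into the system before the human operator ever sees the interface.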

The next step is to evaluate the system’s output. At this point, the threshold definition also has an impact on human judgment. Empirical studies on the use of algorithmic decision-making systems[22] have shown that users rarely question the output of such systems and even tend to regard them as infallible. Users are therefore subject to “automation bias”. According to Cummings,[23] the effects of automation bias in interaction with automated decision support systems have contributed to several fatal decisions, including incidents involving the U.S. Army’s Patriot missile system, which shot down a British Tornado and an American F/A-18 during the Iraq War in 2003.

Parasuraman and Manzey[24] found that automation bias depends, among other things, on the degree of automation. It cannot be prevented simply through training or explicit instructions to review the recommendations given by the system. And it can influence the decision-making of teams as well as individuals. So multiple-person control is not a solution per se. In the case of FCAS, the calculation of numerical probabilities (83 percent!) could even exacerbate this effect, because it suggests objectivity.

Automation bias leads to two types of errors. In an “error of commission”, users follow an incorrect recommendation from an automated decision support system. Applied to the given scenario, this would mean that they regard false positive reports as true positive reports, act accordingly, and so make potentially fatal decisions. In an “error of omission”, users overlook critical situations if they are not recognized by the system. Applied to the given scenario, this would mean that they do not recognize a “real” target, and focus only on the “probable” targets highlighted in the user interface – which can also have fatal consequences (in this case for their own troops). Depending on how high the threshold is set, either errors of commission or errors of omission are more likely. It is therefore the interaction between humans and algorithms that produces “right” or “wrong” decisions in a complex way.

Given the problems outlined above – the quality of the underlying data, the verifiability of conclusions and recommendations, the way arbitrary threshold values shape the output, and the influence of the system (design) on perception – it seems inappropriate to hold human operators fully responsible for the consequences of these potentially fatal errors.

The FCAS example also makes it clear that meaningful human control of complex human-machine assemblages is not possible unless we are prepared to forgo the benefits of acceleration through increasing automation. But even if we were prepared to do this, real control over the decisions of AI-assisted decision-making, targeting and killing systems does not seem feasible.

 


[1] Suchman, L. and Weber, J. (2016): Human-Machine Autonomies. In: Bhuta, N. et al. (eds.): Autonomous Weapons Systems. Law, Ethics, Policy. Cambridge, pp. 75-102.

[2] Suchman, L. (1987): Plans and Situated Actions. The Problem of Human-Machine Communication. Cambridge/New York.

[3] Haraway, D. (1985 / 2016): A Cyborg Manifesto. Science, Technology and Socialist-Feminism in the late Twentieth Century. Minneapolis. warwick.ac.uk/fac/arts/english/currentstudents/undergraduate/modules/fictionnownarrativemediaandtheoryinthe21stcentury/manifestly_haraway_----_a_cyborg_manifesto_science_technology_and_socialist-feminism_in_the_....pdf.

[4] Cussins, C. M. (1998): Ontological choreography: Agency for women in an infertility clinic. In: Berg, M. and Mol, A. (eds.): Differences in Medicine: Unraveling Practices, Techniques, and Bodies. Durham, NC: Duke University Press, pp. 166-201.

[5] Akrich, M. (1992): The de-scription of technical objects. In: Bijker, W. E. and Law, J. (eds.): Shaping technology/building society. Studies in sociotechnical change. Cambridge, pp. 205-224.

[6] Agre, P. E. (1994): Surveillance and Capture: Two Models of Privacy. In: The Information Society 10 (2), pp. 101-27.

[7] Suchman, L. (2007): Human-Machine Reconfigurations. Plans and Situated Actions. Cambridge, p. 11.

[8] Gregory, D. (2011): From a view to a kill. Drones and late modern war. In: Theory, Culture & Society 28 (7-8), pp. 188-215.

[9] Khan, A. (2021): The Human Toll of America’s Air Wars. New York Times Magazine, December 19. https://www.nytimes.com/2021/12/19/magazine/victims-airstrikes-middle-east-civilians.html (all internet references accessed June 3, 2024).

[10] Cf. also Pentenrieder, A. and Weber, J. (2020): Lucy Suchman. In: Heßler, M. and Liggieri, K. (eds.): Technikanthropologie. Handbuch für Wissenschaft und Studium. Baden-Baden, pp. 215-225; Hälterlein, J. (2021): Epistemologies of predictive policing: Mathematical social science, social physics and machine learning. In: Big Data & Society 8 (1). journals.sagepub.com/doi/epdf/10.1177/20539517211003118.

[11] French Ministry of Armed Forces (2019): Artificial Intelligence in Support of Defence. Report of the AI Task Force. https://www.defense.gouv.fr/sites/default/files/aid/Report%20of%20the%20AI%20Task%20Force%20September%202019.pdf; U.S. Department of Defense (2020): DOD Adopts Ethical Principles for Artificial Intelligence. https://www.defense.gov/News/Releases/Release/Article/2091996/dod-adopts-ethical-principles-for-artificial-intelligence/; NATO (2021): An Artificial Intelligence Strategy for NATO. https://www.nato.int/docu/review/articles/2021/10/25/an-artificial-intelligence-strategy-for-nato/index.html.

[12] Bundesverband der Deutschen Luft- und Raumfahrtindustrie e.V. (2021): Das Future Combat Air System: Übersicht. www.bdli.de/publikationen/fcas-i-uebersicht.

[13] Klauke, S. (2021): Multi-Domain Combat Cloud – Infrastruktur und Innovationstreiber für europäische Wettbewerbsfähigkeit. In: Lichtenthaler, U. (ed.): Künstliche Intelligenz erfolgreich umsetzen. Wiesbaden, pp. 15-39.

[14] Koch, W. and Keisinger, F. (2020): Verteidigung und Verantwortung: Nutzung neuer Technologien in einem “Future Combat Air System”. In: Behördenspiegel 36 (4), p. 44.

[15] Azzano, M. et al. (2021): The Responsible Use of Artificial Intelligence in FCAS: An Initial Assessment. White Paper, p. 9.

[16] FCAS Forum (2021): Protocol FCAS Forum. www.fcas-forum.eu/protocols/protocol3/.

[17] Koch, W. (2022): Elements of an Ethical AI Demonstrator for Responsibly Designing Defence Systems. In: 25th International Conference on Information Fusion (FUSION): Linköping, Sweden, 4-7 July 2022, pp. 1-8, p. 5.

[18] FCAS Forum (2021), see endnote 16.

[19] Koch, W. (2022), see endnote 17.

[20] Hälterlein, J. (2021), see endnote 10.

[21] Weber, J. (2016): Keep adding. On kill lists, drone warfare and the politics of databases. In: Environment and Planning D. Society and Space, 34 (1), pp. 107-125; Hälterlein, J. (2023): Facial Recognition in Law Enforcement. In: Borch, C. and Pardo-Guerra, J. P. (eds.): The Oxford Handbook of the Sociology of Machine Learning. doi.org/10.1093/oxfordhb/9780197653609.013.25.

[22] Skitka, L. J., Mosier, K. L. and Burdick, M. (1999): Does automation bias decision-making? In: International Journal of Human-Computer Studies 51 (5), pp. 991-1006.

[23] Cummings, M. L. (2015): Automation Bias in Intelligent Time Critical Decision Support Systems. In: Harris, D. and Li, W.-C. (eds.): Decision Making in Aviation. London, pp. 289-294. https://doi.org/10.4324/9781315095080-17.

[24] Parasuraman, R. and Manzey, D. H. (2010): Complacency and Bias in Human Use of Automation: An Attentional Integration. In: Human Factors 52 (3), pp. 381-410.

Authors

Jens Hälterlein is a research associate in the Department of Media Studies at Paderborn University. He works on the project “Meaningful Human Control – Autonomous Weapon Systems between Regulation and Reflection” (MEHUCO) and coordinates the research network together with Jutta Weber. He has been the principal investigator of the project “AI and Civil Security” and has worked on several other projects on security technologies.

Jutta Weber is a technology researcher and Professor of Media Sociology at the Institute of Media Studies at Paderborn University. Her research analyzes the entanglement of human practices and machine processes, especially in the fields of artificial intelligence and robotics. She is currently leading two BMBF research networks: “Being Tagged: The Digital Reorganization of the World” (Ubitag) and “Meaningful Human Control – Autonomous Weapon Systems between Regulation and Reflection” (MEHUCO). She has been a visiting professor at, among others, the universities of Uppsala, Twente and Vienna.

