Who wants to be the human in the loop?
A neglected question that is about to explode
Last week, I was in San Francisco for the HumanX conference. Listening to people there pushed me to ask a question that’s been bouncing around in my head with increasing insistence:
What’s the psychological impact of being the human in the loop?
I feel like this issue is a time bomb that could destroy current plans for how AI will be governed. If you listen to any AI policy conversation for more than a few minutes, you’re likely to hear the phrase “human-in-the-loop” (HITL). It’s a catch-all term that provides reassurance and allows us to carry on with the technical discussion. Like in the workplace, if we just keep the right people “in the loop,” all will be well.
The idea evokes an image of a capable, watchful person who will intervene expertly if the system goes wrong. Whole governance frameworks are built on top of this comforting picture. For example, Article 14 of the EU AI Act tries to put a set of requirements on humans to “prevent or minimise the risks to health, safety or fundamental rights”.
But the Act says nothing about whether these humans will have the skills, attention, or motivation to perform this oversight. Or, even if they can, for how long. Or what the experience would be like.
In other words, we’re not thinking enough about what it actually feels like to be the human in the loop.
I find that gap increasingly hard to ignore because billions (?) of humans-in-the-loop may soon face two contrasting problems that we’ve been neglecting:
Verification burdens caused by too much cognitive stimulus;
Vigilance atrophy caused by too little stimulus.
The tricky thing is that these two risks can affect the same person on the same day. Moreover, they call for almost opposite responses. Even trickier! Here I suggest how we should start tackling this problem.
Humans supervising machines: What we already know
The foundational research on the psychology of supervisory control goes back to Tom Sheridan and William Verplank in the late 1970s. They were trying to understand the levels of control humans could and should have over undersea vehicles that could operate partially autonomously. They came up with ten ‘levels’ of control, which you can find here (the report is surprisingly fascinating and definitely feels like it’s from another era). The scale has held up pretty well in domains from aviation to nuclear power to manufacturing.
So what can go wrong?
Automation complacency. Our vigilance starts to crumble if we have to monitor a system over long periods. Reliability breeds trust, which breeds complacency. This is part of a bigger problem of risk habituation in workplaces.
Skill erosion. Operators who supervise automated systems gradually lose the manual ability required to take over when the system fails. This is one of the key “ironies of automation”. One of the most dramatic examples is the crash of Air France Flight 447 in 2009, where the autopilot suddenly disengaged, meaning the pilots had to fly manually at high altitude. The pilot mistakenly sent the plane into a stall and then failed to read the instruments correctly. Under cognitive load, the pilots were unable to remember skills they hadn’t used for years.
Weakened sense of accountability. This one is a bit subtler. If we feel we have some ability to control outcomes, then our motivation rises - even if the task itself is repetitive and boring. But if we feel that agency has been removed, our motivation and (crucially) our sense of responsibility weaken.
These insights are well-evidenced and may translate fairly well to generative AI. But I think generative AI raises a new risk that needs to be identified.
A homunculus on a knife’s edge
There is a tempting story about classical automation: the system announces its failures, alarms sound when something is wrong, and the supervisor’s job is to respond. In contrast, GenAI doesn’t announce its failures clearly. That’s the narrative I used in an earlier draft of this piece, before I realized how stupid I was being.
Anyone who has looked seriously at industrial disasters knows the picture is messier, as I’ve noted before. At the Texas City refinery in 2005, faulty level indicators misrepresented what was happening inside a distillation tower as it overfilled, causing a catastrophe. On Deepwater Horizon in 2010, ambiguous pressure readings were misinterpreted before the rig exploded. During the Three Mile Island meltdown, a critical valve indicator showed closed when the valve was in fact stuck open.
The opposite problem occurs as well. Stanislav Petrov, sitting in a Soviet early-warning command center on the night of 26 September 1983, looked at alarms telling him the United States had launched a nuclear attack and judged that they must be false. He was correct.
So, instruments can lie and what prevents disaster is that supervisors know this. That knowledge enables expert oversight: the intuition that notices when readings don’t hang together and senses the patterns of incipient failure before any alarm has sounded. Arguably, that ability is the main value that a HITL brings.
The new part with LLMs is that they can produce fluent, convincing explanations. They can also hedge and caveat. If you ask Claude or Gemini if it’s sure about something, it will respond in ways that look like careful self-analysis.
In other words, LLMs seem like they have some kind of homunculus supervisor, an internal monitor who is watching over the system’s outputs and will raise a hand if something is off.
Yet as I’ve pointed out before, metacognition is a weak point for LLMs. Often they are bad at knowing when they are wrong; their confidence statements are more like improvisations that simulate metacognition. It’s like staying in the control room and opining rather than going down into the lower decks of the rig to inspect the machinery. (Recognizing that depth metaphors may be misleading!)
I don’t want to overemphasize this point, since automated systems often have backup processes that signal when the first line of alarms may be faulty. But nothing compares to having the appearance of a convincing co-supervisor alongside you. And even if the internal monitoring of LLMs is not completely off, it’s still less accurate than it appears.
So I think we have a new danger: the appearance of self-regulation invites the HITL to relax the intuition they would otherwise develop. After all, their partnering homunculus supervisor sounds so thoughtful and reassuring. And wouldn’t they know?
Yet what happens if you find out that your co-supervisor is a bullshitter? Then the situation becomes more taxing: now you don’t trust what they’re doing, but you can’t see the flaws right away. There are echoes here of how people can suddenly flip from algorithmic appreciation to algorithmic aversion. Hence the knife’s edge.
Both HITL risks that follow come from the challenge of trying to supervise a system that can give a convincing yet deceptive impression of self-regulation.
Verification burdens, or “black box cognitive exhaustion”
Last month, researchers from Boston Consulting Group published a study in the Harvard Business Review of 1,488 workers using AI. They were looking at the cognitive effects of working with AI tools and claimed to identify a phenomenon of “AI brain fry”. They defined this as mental fatigue that comes from overseeing AI systems, leading to increased errors and intentions to quit.
Although the clickbait nature of this term makes me recoil slightly, the brain fry article provides interesting insights into the emerging experience of being a HITL for generative AI:
But what is AI brain fry? Many participants used the words “fog” or “buzzing.” They described intensive back-and-forth with the tools, followed by an inability to think clearly, like a mental hangover, comprised of difficulty focusing, slower decision-making, and headaches, requiring several to physically step away from their computer to “reset.”
What makes this particularly relevant for thinking about HITL is that “brain fry” was closely linked to how much oversight people had to perform (a high degree of oversight led to 12% more mental fatigue). On the other hand, people who could confidently outsource repetitive tasks to AI felt much better!
In one sense, this situation can be explained by the Jevons Paradox. Rather than reducing our workload, we use the efficiency of AI to ramp up our output, thereby testing our ability to stay on top of things. In this world, the outputs AI is producing may be valuable to us: they interest us; we have to inspect them, possibly with gratitude.
Yet the sheer volume of material to review becomes a burden. The HBR article suggests that productivity may start to decline once people start using four or more AI tools at once. One of the managers interviewed put it this way:
It was like I had a dozen browser tabs open in my head, all fighting for attention. I caught myself rereading the same stuff, second-guessing way more than usual, and getting weirdly impatient.
The homunculus makes things worse: you have convincing accounts about why everything is great, plus the sneaking suspicion that all might not be right. As we’ve seen, LLMs may lack transparency - they are grown not built - so it may be impossible to understand exactly why a certain output has been produced. The LLM itself may not be able to tell you, so you expend precious cognitive effort trying to get inside the black box and piece together what it’s been up to. All without producing much yourself.
So the verification burden is the cognitive work required to evaluate an output you did not create, using reasoning you cannot inspect, against a system that sounds self-regulating but probably isn’t. A good chunk of that is unique to being a HITL for generative AI, and a good chunk of it is coming our way.
Vigilance atrophy: Even less fun
Consider the likely future for a different set of workers. Radiologists may have scans pre-read by an AI and then sign off on most of them. Compliance officers may get flagged transactions pre-triaged by a model and wave through the great majority. Teachers may have their grading drafted by AI and then apply a light edit - perhaps.
This is the opposite problem: rather than brain fry, we have a potential brain freeze. (I didn’t use an LLM to come up with this slightly annoying sentence - more on LLMs being annoying in a second.) Of course, the existing literature on automation covers this pretty well - complacency, skill erosion, outsourced accountability, etc. The supervisor is nominally in control but becomes passive and inattentive; errors creep in.
As I outlined above, it’s possible that the homunculus illusion accelerates this tendency. The LLM produces fluent, qualified, and reasonable-sounding text, thereby reassuring the operator that there is nothing to check. But I wonder if there’s another possibility as well.
Suppose you are someone who used to gain purpose through your job. You are part of a high-status profession, like a doctor or lawyer. Now there is an interloper who has taken all the interesting parts of your job. But you haven’t been fired - you have to sit there, maybe fuming, as it chirpily and fluently talks you through the tasks you used to do and you. Want. To. Kill. It.
People may get irritated with cutesy greetings from LLMs even if they don’t resent them. But in this HITL situation, the supplanted employee could end up in a toxic space where they start calling out the LLM through boredom, mischief or ill will, not because they actually think it’s incorrect.
This point raises the issue that HITLs will often be supervising the exact work they used to do themselves. That’s a familiar process in human history; the Industrial Revolution was full of handloom weavers watching power looms. But I wonder - and I’m happy to be challenged - if the coming HITL transition will be different because it squarely targets the judgment of workers.

Even if the weaver no longer worked with cloth, they still had the eye for how well it was made. When compositors used Linotype, they retained their judgment about layouts. The humans still judged whether the machines were doing a good job.
If generative AI is different, it’s different because it replaces the practitioner’s judgment rather than their manual execution. Radiologists will be supervising a machine interpreting X-rays, a core part of their professional identity. Lawyers will be watching AI draft the arguments that were central to their role (although we are likely to still need a human to stand up in court, for now…).
I’m reminded of a line by Philip Larkin: “Something is pushing them / To the side of their own lives.” He was talking about becoming a parent, but I feel it captures the loss that many people will feel as they move from being judgers to the overseers of synthetic judgments. We should be looking at how people coped in the past when their skill, craft and dexterity were supplanted, while recognizing that some aspects of the coming shift are completely new.
Note that these psychological effects may be felt even if the LLM is extremely reliable. We’ve moved on from debating concerns about the business outcomes (will the power plant melt down) to concerns about the human outcomes (will the power plant’s employees melt down). But, even if you aren’t so bothered about the human implications, the two may not be separable: a new paper on the “Human-AI contracting paradox” suggests that as AI gets more and more accurate, it costs more and more to pay someone to supervise it. In that perverse situation, getting a less good AI may be the most rational option.
Coders as canaries
For all these reasons, it’s worth paying attention to what software engineers are experiencing, since they are likely to be seeing the future first. They are maybe the largest cohort of knowledge workers who have been quickly repositioned as AI supervisors. They are technically sophisticated, have a strong pre-existing professional identity, and form vocal communities capable of generating evidence about what the transition feels like.
When I was in the Bay Area, I met many people grappling with the transition. They still had jobs, but their jobs had been transformed. One positive frame they had was that it was like being promoted into management: they no longer had to do everything themselves, but were supervising dozens of agents instead. One negative frame was summed up by the question on Stack Overflow’s blog: “Are you a real coder, or are you using AI?”
The ambivalence is also found in a recent report from Anthropic. It showed plenty of positive stuff: a tremendous amount more was getting done. Alongside that, it showed evidence of the exact concerns I outlined in terms of skill erosion and vigilance atrophy. As one employee put it:
Honestly, I worry much more about the oversight and supervision problem than I do about my skill set specifically… having my skills atrophy or fail to develop is primarily gonna be problematic with respect to my ability to safely use AI for the tasks that I care about versus my ability to independently do those tasks.
What coders are experiencing now will arrive for many other professions, in slightly different forms. And many of those groups will have less technical fluency to evaluate AI outputs, weaker community structures, and less support from institutions. So the preview from coders may be one of the better scenarios.
The way forward
Verification burdens and vigilance atrophy may not be alternatives. It may not be as simple as overwhelmed marketers vs disengaging compliance officers. You could see a nightmare combination where people get too exhausted to catch errors, which means they also end up not caring whether they catch them or not. That insight also highlights the fact that more human oversight is not always better - we need to avoid both brain fry and brain freeze.
To do that, we need to start seeing HITL as an issue of psychological management, rather than just task allocation. We need to go beyond just positioning a human in a governance diagram and understand what that position is doing to their beliefs, emotions and behaviors. So what are some ways of handling that management?
For the verification burden, the obvious priority is measuring and managing cognitive load. That might include: recommending limits on the number of AI systems that one person is asked to supervise simultaneously; designing interfaces that help people evaluate the AI outputs in a structured way, rather than having to figure it out themselves every time; and maybe even protecting time in the working day that does not involve AI oversight at all, if the goal is sustainable engagement.
For vigilance atrophy, one direction would be maintaining engagement and skills. That might include: incentivizing employees to deliberately practice the tasks the AI is replacing in certain cases; strengthening professional support networks for mentorship and peer learning; and making a genuine attempt to reconstitute professional meaning.
In both cases, we need to continue being honest about what AI can and can’t do - to reject the illusion of the homunculus AI supervisor. If people have well-calibrated trust in AI, then they can remain on the productive edge between dangerous complacency and exhausted suspicion. They will also retain the expert skeptical intuition that made supervisors of automated systems so effective.
I’m conscious that there’s much more thinking to be done. But I’m also conscious that we’re about to deploy HITL at a massive scale without understanding its psychological impacts. So, if we want human oversight to be both effective and bearable (maybe even enjoyable?), we need to do that thinking quickly.
Next: My follow-up post on how to improve alignment between humans and AI has sat in my drafts, 95% complete, for two weeks now. AI is meant to stop this kind of thing happening; the bottleneck is still human.

This is coming for physicians and medicine too, and I fear your same concerns.
This is such a great post, thank you for giving me the vocabulary to explain some of the things I feel on a daily basis as an AI user/HITL! I agree this is a time bomb, although I worry it's one that will be shouldered by the individual employee, more so than those making the decision to put workers in that position. I can see it might just result in more dissatisfaction, stress, depression, loss of expertise, and general loss of progress if workers start to switch jobs, while skills erode and institutional knowledge is lost.