Most AI Training in the Humanitarian Sector is Teaching the Wrong Thing

Three weeks ago I wrote here about the three principles we teach at AidGPT: treat AI like a 21-year-old intern, use its mind not its memory, and remember that specificity is safety. That piece landed well. It also made claims I couldn't fully prove at the time: that verification training delivers bigger behaviour change than prompting training; that data safety confidence shifts more than any other competency when you take it seriously; and that the AI Workflow Card catches unsafe use in a way policy documents don't.

I couldn't prove those claims in March because the programme we had just finished with Action Against Hunger UK (ACF UK) was still being evaluated, and the post-training survey results were not yet in. I wrote the article anyway, because I believed the claims and because the longer story of what's happening with AI in humanitarian work couldn't wait another month.

The numbers are now in.

I want to share them with you because they matter for anyone deciding right now whether to commission AI training for their team, what kind of training to commission, and what to expect from it. They also close a loop I opened in November with the Shadow AI piece, returned to in February's Liar's Dividend piece, and picked up again in March. If you've been following along, this is chapter four.

What we delivered

Between 10 and 27 February 2026, we ran a six-session Responsible AI in Practice programme for Action Against Hunger UK. Ten staff from nutrition and health, programme funding, advocacy, and operations. Six ninety-minute virtual sessions, twice a week over three weeks. Microsoft Teams, because that's what the client uses. The curriculum was the same one I described in the March piece: prompting in Session 1, safe use and the Workflow Card in Session 2, hands-on practice in Session 3, two-agent verification in Session 4, reusable tools in Session 5, sustained adoption practices in Session 6.

Before I share the numbers, one honest caveat. Eight of ten participants completed the baseline confidence survey, six of ten completed the post-training evaluation, and five completed both. Those five are the basis for the individual-level comparison. Five people is not a study. It is a signal. But it is a signal consistent with what we've seen in earlier cohorts at NRC Sudan, Caritas Switzerland, IRC Myanmar, and the Estonian Refugee Council, and this is the first cohort for which we have matched data clean enough to share publicly. Read the numbers below in that spirit.

What the data shows

Average participant confidence rose from roughly 2.0 to 4.2 out of 5 across five core competencies. Every comparable competency showed a gain of at least 1.9 points. The average recommendation score was 9.2 out of 10, with every respondent scoring 8 or above. No post-training respondent remained at the "tried once or twice" usage level. These are the headline numbers.

The more interesting numbers are in the breakdown.

The two largest gains were in the two areas of greatest baseline concern. Protecting sensitive data rose by 2.4 points. Verifying AI-generated content rose by 2.2 points. Writing clear prompts, the skill most AI training is sold on, rose by 1.9 points: the smallest gain of the five competencies measured.

Let me say that again, because it is the whole argument in one sentence. The skill the sector's AI training market is organised around was the skill that moved least, not because the teaching was weak but because it was the easiest starting point. The harder skills, verification and data safety, were where the real shifts happened.

You can see the same pattern in what participants said afterwards. Before the training, every respondent cited data privacy as a concern. Afterwards, most did not. That is close to a total collapse in concern about the risk the sector worries about most when staff use AI. Most respondents still cited accuracy as a concern, which is the right answer. We spent six sessions teaching people to distrust AI outputs, and the fact that they still distrust them is the point.

Alexandra Rutishauser-Perera, Director of Nutrition and Health at Action Against Hunger UK, put it well in the testimonial she wrote for us after the programme. These are her words, not mine:

"Whatever any of us may think about AI, the reality is that it is here to stay, and we need to use it smartly to remain effective and relevant in our work. I particularly appreciated that the trainers were objective and transparent about both the advantages and the limitations of AI, helping us navigate the topic with clarity rather than hype. As a result, our team now feels more confident and empowered to use AI tools responsibly and productively in our day-to-day work."

Clarity rather than hype is the standard I want us to hold training to. If the training your team has done, or is about to do, leans on productivity gains and does not spend serious time on verification, data classification, and sustained adoption, I would treat that as a warning sign.

The moment that proved the point

I wrote in March about using AI's mind, not its memory. The ACF programme gave us a teaching moment that I will use in every cohort from now on, because it showed the principle working in real time.

Session 4 was about the two-agent verification workflow. One agent generates, a second agent verifies, and the human decides. We built a worked example around a real donor reporting scenario. The generator agent produced a draft programme update that looked professional and well-structured. I could see the participants reading it, nodding. It was the kind of output a junior staffer might send up the line without a second glance.

Then we ran the verifier.

It caught a fabricated statistic. The draft reported a recovery rate of 91.3 percent; the actual figure in the source data was 89.2 percent. Not a rounding error. A number the AI had confidently made up because it fitted the shape of the sentence it was writing. The verifier also flagged a derived calculation with no stated methodology, and a default rate that was missing altogether.

You could feel the shift on the call. The participants had been told, in theory, that AI makes things up. They had nodded along. Now they had watched it happen, caught in real time by a workflow they could run themselves. Two-agent verification stopped being an abstract concept in a slide deck and became a practice they were going to use the following week on their actual work.

This is what "use AI's mind, not its memory" looks like when it becomes operational. It is why verification training produces bigger confidence gains than prompting training. And it is why, when you look at the ACF data, the verification competency is one of the two that moved furthest.
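In the session, the verifier was a second AI agent. To show the same idea in its most mechanical form, here is a minimal Python sketch of one narrow slice of that job. It is not the AidGPT workflow, and the function names and sample texts are hypothetical. It lists figures in a draft that never appear in the source, and source figures the draft leaves out. Only the 89.2 and 91.3 values come from the ACF example; the default rate and admissions count are invented for illustration.

```python
from __future__ import annotations

import re

# Matches figures such as "89.2", "1,250" or "91.3%". Deliberately simple:
# it ignores numbers written as words and does not understand units.
NUMBER_PATTERN = re.compile(r"\d[\d,]*(?:\.\d+)?%?")


def extract_figures(text: str) -> set[str]:
    """Return every numeric figure in the text, with commas and '%' stripped."""
    return {m.replace(",", "").rstrip("%") for m in NUMBER_PATTERN.findall(text)}


def unsupported_figures(draft: str, source: str) -> list[str]:
    """Figures that appear in the draft but nowhere in the source data."""
    return sorted(extract_figures(draft) - extract_figures(source))


def omitted_figures(draft: str, source: str) -> list[str]:
    """Figures present in the source that the draft never mentions."""
    return sorted(extract_figures(source) - extract_figures(draft))


# Hypothetical texts: the default rate (4.1) and admissions (1,250) are invented.
source = "Recovery rate: 89.2 percent. Default rate: 4.1 percent. Admissions: 1,250."
draft = "The programme achieved a recovery rate of 91.3 percent across 1,250 admissions."

print(unsupported_figures(draft, source))  # ['91.3']
print(omitted_figures(draft, source))      # ['4.1', '89.2']
```

A string match like this cannot judge a derived calculation, spot a figure quoted from the wrong row, or read numbers written in words. That is why the workflow ends with a human deciding, not with a script.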

The quieter failure mode

The fabricated statistic at ACF was the teachable version of the AI risk. It was caught, in the room, by a workflow the team had just learned. Everyone saw the error and everyone understood what to do about it. "Use AI's mind, not its memory" worked exactly the way it was supposed to. That is the best possible outcome of an AI failure, and it is also the rarest.

In a donor report, the made-up number is a credibility problem. It puts a funding relationship and a director's signature on the line, and it ends in a correction and an awkward email. Real, but recoverable.

The failure modes that actually worry me sit further upstream, and they are the ones that are never caught. An AI summarising a forty-page needs assessment can drop a location from a list of prioritised districts without flagging the omission. It can describe a community as stable when the source document describes it as deteriorating, or as in need when the source describes it as recovering. It can aggregate caseload figures across two different reporting periods and produce a confident total that is structurally wrong. None of these look like errors on the surface. They look like clean, well-written summaries, produced in seconds, of documents nobody has time to re-read.

Now put that inside a rapid response planning cycle. A cluster meeting in the morning. A targeting decision by lunchtime. An operational plan submitted to the country director by the end of the day. Decisions like these are made on summaries, not source documents, because nobody in a moving response has time to read the source documents. The summary is treated as the truth. The summary becomes the input to the next meeting, the next decision, the next plan. And if the summary quietly dropped a village, or mislabelled a caseload, or inverted a trend, the error does not show up as an error. It shows up as a location nobody discussed, a district nobody raised, a trend nobody questioned. The decision gets made on the incomplete picture, and the incomplete picture becomes the record of what was decided.

This is the quiet version of the AI risk in humanitarian work. It is not rogue algorithms making autonomous targeting calls. It is one fabricated number, one dropped location, one inverted trend, inside one summary document that a stretched programme officer relies on at eleven at night because they have seven other deadlines and the alternative is reading the original forty pages they do not have time to read. Nobody checks. The village that needed the intervention does not make it onto the list. The household that was already recovering gets scheduled for a follow-up that was not needed. The caseload figure that was wrong shapes next month's planning. Nobody ever connects the downstream decision back to the upstream summary, because the downstream decision looks exactly like every other decision made that day.

This is why verification is not a nice-to-have. It is the single most important skill an AI-using humanitarian team can build. The two-agent workflow we taught at ACF is one way to do it. Two-chat cross-checking is another. Structured verification prompts that force the AI to flag anything uncertain are a third. The method matters less than the principle: an AI output is never finished until a human has checked it against the source.
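Two of the quiet failures described in this section, a dropped district and a total built from mixed reporting periods, can be partly caught by simple checks that do not involve AI at all. Below is a minimal Python sketch under two assumptions: the source's prioritised districts are available as a list, and each caseload figure carries a reporting-period label. District names, period labels and numbers are hypothetical. Nothing here detects an inverted trend; that still needs a human reading the source.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CaseloadFigure:
    district: str
    period: str    # reporting period label, e.g. "2026-Q1" (hypothetical format)
    caseload: int


def missing_districts(prioritised: list[str], summary: str) -> list[str]:
    """Prioritised districts from the source that the summary never names."""
    text = summary.lower()
    return [d for d in prioritised if d.lower() not in text]


def total_caseload(figures: list[CaseloadFigure]) -> int:
    """Sum caseloads, refusing to combine figures from different reporting periods."""
    periods = {f.period for f in figures}
    if len(periods) > 1:
        raise ValueError(f"caseloads span several reporting periods: {sorted(periods)}")
    return sum(f.caseload for f in figures)


# Hypothetical district names and numbers, for illustration only.
summary = "Needs are most acute in Amber and Cedar districts."
print(missing_districts(["Amber", "Birch", "Cedar"], summary))  # ['Birch']

same_period = [CaseloadFigure("Amber", "2026-Q1", 120), CaseloadFigure("Cedar", "2026-Q1", 80)]
print(total_caseload(same_period))  # 200

mixed = [CaseloadFigure("Amber", "2026-Q1", 120), CaseloadFigure("Amber", "2026-Q2", 140)]
try:
    total_caseload(mixed)
except ValueError as err:
    print(err)  # caseloads span several reporting periods: ['2026-Q1', '2026-Q2']
```

Name matching this crude will miss a district spelled differently in the summary. The point is the shape of the check: compare the summary against the source's own list, and refuse to add numbers that do not belong together.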

The bit the sector keeps skipping

Here is the question that sits behind every training programme and only becomes visible a month after it ends: does any of it last? I haven't written about it before, and it is arguably more important than any of the numbers I've just shared.

Individual confidence gains fade within four to six weeks if they are not embedded in how the team actually works. This is not speculation. It is the pattern we see across every cohort, and it is well documented in the research on adult learning. You can teach someone a skill in six sessions. You cannot keep the skill alive in their daily practice without team-level structures: shared resources, regular check-ins, clear ownership of who maintains the prompt library, explicit permission to raise AI failures as learning moments rather than individual mistakes.

This is why the final session of every AidGPT cohort is designed around sustained adoption rather than new content. At ACF UK, the team ended the programme by committing to four things. Monthly check-in meetings where people share wins and failures. A policy working group to draft AI guidance for the organisation. Continued development of a shared prompt library. Identification of internal AI champions.

I can't promise these commitments will all land. I can promise that a training programme that doesn't surface them at all is doing half the job. The March piece ended with the claim that Shadow AI adoption is happening whether organisations guide it or not. The more precise claim is that individual AI skills fade unless organisations reinforce them. A slipping verification habit is exactly what lets the quieter failure modes from the previous section into your decision-making, one summary at a time. Training without sustained practice structures is a six-week injection of confidence followed by a six-week slide back to where you started.

This is also why I'm increasingly cautious about the AI training market more broadly. I see a lot of content being sold as "AI training for the humanitarian sector" that is really one-off productivity workshops. No baseline measurement. No follow-up. No team-level adoption design. No honest conversation about what the training cannot do. The sector deserves better than that, and so do the teams who will use these tools whether the training is good or bad.

The next cohort

What happens when an AI-assisted summary shapes a targeting decision, a case closure, or a rapid response plan, and nobody checks the source? It is already happening in teams across the sector, whether the staff involved have been trained or not. The only question is whether they have the skills to catch it.

We're running the next open AidGPT Responsible AI in Practice cohort for humanitarian actors starting Tuesday 21 July 2026, with sessions on 21, 23, 28, 30 July and 4, 6 August. Six interactive ninety-minute sessions run at 16:00 East Africa Time. EUR 350 per person. Cohorts are capped at twenty to keep the learning interactive.

It is for individuals or small teams from UN agencies, Red Cross and Red Crescent societies, and humanitarian NGOs who want to learn how to use AI well rather than guess at it. If you have already done an AI training that focused mostly on prompting and left the harder questions unanswered, this is especially for you.

To register, use the AidGPT application page at aidgpt.org/training/apply. A full anonymised case study from the ACF UK programme will be published on aidgpt.org once the client has reviewed and signed it off.

As always, thanks for reading.

Tom

Tom's Aid and Dev Dispatches is a weekly newsletter on humanitarian and development trends, read by 9,000+ people working in the sector. If someone forwarded this to you and you'd like the next one in your inbox, you can subscribe on LinkedIn.
