Building trust in AI-assisted exam marking for The Mercian Trust

The Mercian Trust wanted to explore how AI could support one of the most demanding and judgement-led tasks in education – exam marking.

Teachers face sustained pressure to apply marking criteria consistently, provide clear feedback and work to tight deadlines. Even with well-defined marking schemes, variation is inevitable and workload is high.

Before considering wider adoption, The Mercian Trust wanted to understand whether AI could act as a reliable assistant: not to replace teachers, but to support consistency, explain decisions clearly and reduce avoidable effort, without undermining professional judgement or accountability.

Marra partnered with The Mercian Trust to run a focused, time-boxed proof of concept to test what responsible, trustworthy AI actually looks like in practice.

A real-world test for AI in judgement-led work

Exam marking was chosen deliberately. It is a high-stakes process where trust, transparency and explainability matter as much as accuracy.

The proof of concept focused on whether AI agents could:

Apply real marking schemes consistently

Provide clear reasoning for awarded marks

Highlight why higher marks were not given

Fit into existing processes without removing human oversight

Rather than aiming for a generic solution, the work prioritised learning, testing assumptions and understanding design constraints in a real educational context.

A practical, constrained approach

Marra worked closely with stakeholders at The Mercian Trust to design and test a multi-agent AI approach that mirrored how teachers assess work in practice.

Key design principles included:

Explainability over automation

The system was required to show its reasoning clearly, not just output a score.

Simple, governable architecture

Early experiments showed that simpler agent designs were more reliable and easier to control.

Subject-specific logic

Agents were designed around individual subjects and exam boards rather than forcing a generic approach.

Human-in-the-loop decision making

Conservative guardrails ensured final judgement always remained with teachers.

This allowed the Trust to explore AI capability safely, without introducing unnecessary risk or complexity.

What changed in practice

The proof of concept demonstrated several important outcomes.

Close alignment with human marking

When tested using a real GCSE History paper and the official mark scheme, the AI-generated mark differed by just one mark from that awarded by a human marker.

Clear, reusable feedback

Teachers valued explanations that linked directly back to marking criteria and could be reused in learning conversations with students.

Improved confidence in AI output

Teacher feedback highlighted that trust increased when the system showed how and why decisions were made, rather than just presenting a score.

Reduced cognitive load

With conservative guardrails applied, the AI behaved more like an experienced teaching assistant, supporting consistency while leaving judgement with professionals.

Just as importantly, the work highlighted where AI should not operate independently, particularly when assigning precise marks within a band.

Lessons with relevance beyond education

While this engagement focused on exam marking, the lessons extend far beyond schools.

Any organisation exploring AI in high-stakes, judgement-led processes faces similar challenges. Accuracy alone is not enough. Systems must be explainable, governable and designed around how people actually work.

This case study reinforces that AI delivers the most value when it:

Removes unnecessary effort

Increases consistency

Explains its reasoning clearly

Leaves final accountability with professionals

A foundation for responsible adoption

The proof of concept gave The Mercian Trust confidence in how AI could be explored responsibly. Rather than rushing to scale, the focus remains on validating approaches, engaging end users and defining clear boundaries for safe adoption.

For Marra, this work reflects a broader approach to AI delivery: start with real problems, test ideas in practice and design systems that people can trust.

Let’s talk

If you are exploring how AI could support complex, judgement-led work in your organisation, we would love to talk. Our work with The Mercian Trust focused on understanding what responsible, trusted AI looks like before scaling.

If you are facing similar questions around workload, consistency or adoption, get in touch to explore what this could mean for your organisation.

Get in touch.
