Building trust in AI-assisted exam marking for The Mercian Trust

Empowering education trusts with AI and Microsoft Copilot

The Mercian Trust wanted to explore how AI could support one of the most demanding and judgement-led tasks in education – exam marking. 

Teachers face sustained pressure to apply marking criteria consistently, provide clear feedback and work to tight deadlines. Even with well-defined marking schemes, variation is inevitable and workload is high. 

Before considering wider adoption, The Mercian Trust wanted to understand whether AI could act as a reliable assistant: not replacing teachers, but supporting consistency, explaining decisions clearly and reducing avoidable effort, without undermining professional judgement or accountability.

Marra partnered with The Mercian Trust to run a focused, time-boxed proof of concept to test what responsible, trustworthy AI actually looks like in practice.

A real-world test for AI in judgement-led work  

Exam marking was chosen deliberately. It is a high-stakes process where trust, transparency and explainability matter as much as accuracy. 

The proof of concept focused on whether AI agents could: 

  • Apply real marking schemes consistently 
  • Provide clear reasoning for awarded marks 
  • Highlight why higher marks were not given 
  • Fit into existing processes without removing human oversight 

Rather than designing a generic solution, the work prioritised learning, testing assumptions and understanding design constraints in a real educational context. 

A practical, constrained approach 

Marra worked closely with stakeholders at The Mercian Trust to design and test a multi-agent AI approach that mirrored how teachers assess work in practice. 

Key design principles included: 

  • Explainability over automation 

The system was required to show its reasoning clearly, not just output a score. 

  • Simple, governable architecture 

Early experiments showed that simpler agent designs were more reliable and easier to control.

  • Subject-specific logic 

Agents were designed around individual subjects and exam boards rather than forcing a generic approach.

  • Human-in-the-loop decision making 

Conservative guardrails ensured final judgement always remained with teachers. 

This allowed the Trust to explore AI capability safely, without introducing unnecessary risk or complexity.
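As a rough illustration of how these principles might translate into code, the Python sketch below models an AI suggestion that cannot become a final mark without an explanation and a named teacher. It is not the system built for the Trust: every name here (MarkSuggestion, FinalMark, finalise) and the sample answer text are hypothetical, chosen only to show the idea of explainability plus human sign-off.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MarkSuggestion:
    """Hypothetical shape of what a marking agent returns for one answer."""
    question_id: str
    suggested_mark: int
    max_mark: int
    reasoning: str        # why these marks were awarded, tied to the mark scheme
    why_not_higher: str   # what the answer lacked for a higher mark


@dataclass(frozen=True)
class FinalMark:
    """A mark that a named teacher has reviewed and is accountable for."""
    question_id: str
    mark: int
    confirmed_by: str
    ai_suggestion: MarkSuggestion


def finalise(suggestion: MarkSuggestion, teacher: str,
             teacher_mark: Optional[int] = None) -> FinalMark:
    """Turn a suggestion into a final mark only after teacher review.

    The teacher either accepts the suggested mark (teacher_mark=None)
    or overrides it with their own.
    """
    if not teacher.strip():
        raise ValueError("a teacher must confirm every mark")
    if not suggestion.reasoning.strip() or not suggestion.why_not_higher.strip():
        raise ValueError("suggestions without an explanation are rejected")
    mark = suggestion.suggested_mark if teacher_mark is None else teacher_mark
    if not 0 <= mark <= suggestion.max_mark:
        raise ValueError("mark is outside the range allowed for this question")
    return FinalMark(suggestion.question_id, mark, teacher, suggestion)


# Illustrative, made-up example: the teacher overrides the suggested mark.
suggestion = MarkSuggestion(
    question_id="Q3",
    suggested_mark=6,
    max_mark=8,
    reasoning="Explains two causes with supporting evidence.",
    why_not_higher="No sustained judgement on relative importance.",
)
final = finalise(suggestion, teacher="A. Teacher", teacher_mark=7)
```

In this sketch the AI output is kept alongside the teacher's decision, so the explanation remains available for later review or for conversations with students, while the recorded mark always belongs to a person.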

What changed in practice 

The proof of concept demonstrated several important outcomes. 

  • Close alignment with human marking

When tested using a real GCSE History paper and official mark scheme, the AI-generated result differed by just one mark from a human marker.

  • Clear, reusable feedback 

Teachers valued explanations that linked directly back to marking criteria and could be reused in learning conversations with students.

  • Improved confidence in AI output 

Teacher feedback highlighted that trust increased when the system showed how and why decisions were made, rather than just presenting a score.

  • Reduced cognitive load 

With conservative guardrails applied, the AI behaved more like an experienced teaching assistant, supporting consistency while leaving judgement with professionals.

Just as importantly, the work highlighted where AI should not operate independently, particularly when assigning precise marks within a band. 

Lessons with relevance beyond education 

While this engagement focused on exam marking, the lessons extend far beyond schools. 

Any organisation exploring AI in high-stakes, judgement-led processes faces similar challenges. Accuracy alone is not enough. Systems must be explainable, governable and designed around how people actually work. 

This case study reinforces that AI delivers the most value when it: 

  • Removes unnecessary effort 
  • Increases consistency 
  • Explains its reasoning clearly 
  • Leaves final accountability with professionals 

A foundation for responsible adoption 

The proof of concept gave The Mercian Trust confidence in how AI could be explored responsibly. Rather than rushing to scale, the focus remains on validating approaches, engaging end users and defining clear boundaries for safe adoption. 

For Marra, this work reflects a broader approach to AI delivery. Start with real problems, test ideas in practice and design systems that people can trust. 

Let’s talk

If you are exploring how AI could support complex, judgement-led work in your organisation, we would love to talk. Our work with The Mercian Trust focused on understanding what responsible, trusted AI looks like before scaling. 

If you are facing similar questions around workload, consistency or adoption, get in touch to explore what this could mean for your organisation. 

