The Mercian Trust wanted to explore how AI could support one of the most demanding and judgement-led tasks in education – exam marking.
Teachers face sustained pressure to apply marking criteria consistently, provide clear feedback and work to tight deadlines. Even with well-defined marking schemes, variation is inevitable and workload is high.
Before considering wider adoption, The Mercian Trust wanted to understand whether AI could act as a reliable assistant. The aim was not to replace teachers, but to support consistency, explain decisions clearly and reduce avoidable effort, without undermining professional judgement or accountability.
Marra partnered with The Mercian Trust to run a focused, time-boxed proof of concept to test what responsible, trustworthy AI actually looks like in practice.
A real-world test for AI in judgement-led work
Exam marking was chosen deliberately. It is a high-stakes process where trust, transparency and explainability matter as much as accuracy.
The proof of concept focused on whether AI agents could:
- Apply real marking schemes consistently
- Provide clear reasoning for awarded marks
- Highlight why higher marks were not given
- Fit into existing processes without removing human oversight
Rather than designing a generic solution, the work prioritised learning, testing assumptions and understanding design constraints in a real educational context.
A practical, constrained approach
Marra worked closely with stakeholders at The Mercian Trust to design and test a multi-agent AI approach that mirrored how teachers assess work in practice.
Key design principles included:
- Explainability over automation
The system was required to show its reasoning clearly, not just output a score.
- Simple, governable architecture
Early experiments showed that simpler agent designs were more reliable and easier to control.
- Subject-specific logic
Agents were designed around individual subjects and exam boards rather than forcing a generic approach.
- Human-in-the-loop decision making
Conservative guardrails ensured final judgement always remained with teachers.
This allowed the Trust to explore AI capability safely, without introducing unnecessary risk or complexity.
What changed in practice
The proof of concept demonstrated several important outcomes.
- Close alignment with human marking
When tested on a real GCSE History paper with the official mark scheme, the AI-generated result was within one mark of the mark awarded by a human marker.
- Clear, reusable feedback
Teachers valued explanations that linked directly back to marking criteria and could be reused in learning conversations with students.
- Improved confidence in AI output
Teacher feedback highlighted that trust increased when the system showed how and why decisions were made, rather than just presenting a score.
- Reduced cognitive load
With conservative guardrails applied, the AI behaved more like an experienced teaching assistant, supporting consistency while leaving judgement with professionals.
Just as importantly, the work highlighted where AI should not operate independently, particularly when assigning precise marks within a band.
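As an illustration only, this kind of guardrail (the assistant explains and suggests, the teacher decides) could be sketched in Python as follows. This is not the system built in the proof of concept. The class names, fields, and the example band and reasons are all hypothetical, and the sketch assumes the assistant proposes a mark band with its reasoning while a named teacher records the precise mark.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Band:
    """One level of a hypothetical mark scheme: a label and its mark range."""
    label: str
    min_mark: int
    max_mark: int


@dataclass
class Suggestion:
    """What the assistant may propose: a band plus reasoning, never a final mark."""
    band: Band
    reasons_for_band: list[str]
    reasons_not_higher: list[str]


@dataclass
class MarkedAnswer:
    """Pairs the assistant's suggestion with the teacher's final decision."""
    suggestion: Suggestion
    final_mark: Optional[int] = None
    confirmed_by: Optional[str] = None

    def confirm(self, teacher: str, mark: int) -> None:
        """Record the teacher's mark. The teacher may overrule the suggested band."""
        if not teacher:
            raise ValueError("a final mark needs a named teacher")
        if mark < 0:
            raise ValueError("a mark cannot be negative")
        self.final_mark = mark
        self.confirmed_by = teacher

    @property
    def is_final(self) -> bool:
        """A result is final only once a teacher has confirmed a mark."""
        return self.final_mark is not None and self.confirmed_by is not None

    def within_suggested_band(self) -> Optional[bool]:
        """Whether the teacher's mark falls in the suggested band (None until confirmed)."""
        if self.final_mark is None:
            return None
        band = self.suggestion.band
        return band.min_mark <= self.final_mark <= band.max_mark


# Hypothetical example values, not taken from any real mark scheme or paper.
suggested = Suggestion(
    band=Band(label="Level 3", min_mark=9, max_mark=12),
    reasons_for_band=["Develops an explanation supported by relevant knowledge"],
    reasons_not_higher=["Does not reach a sustained judgement"],
)
answer = MarkedAnswer(suggestion=suggested)  # not final until a teacher confirms
answer.confirm(teacher="Ms Example", mark=11)
```

In this sketch nothing the assistant produces can become a final mark on its own: the suggestion carries no precise mark at all, and the reasons for not awarding a higher band travel with the suggestion so the teacher can check them against the mark scheme.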
Lessons with relevance beyond education
While this engagement focused on exam marking, the lessons extend far beyond schools.
Any organisation exploring AI in high-stakes, judgement-led processes faces similar challenges. Accuracy alone is not enough. Systems must be explainable, governable and designed around how people actually work.
This case study reinforces that AI delivers the most value when it:
- Removes unnecessary effort
- Increases consistency
- Explains its reasoning clearly
- Leaves final accountability with professionals
A foundation for responsible adoption
The proof of concept gave The Mercian Trust confidence in how AI could be explored responsibly. Rather than rushing to scale, the focus remains on validating approaches, engaging end users and defining clear boundaries for safe adoption.
For Marra, this work reflects a broader approach to AI delivery: start with real problems, test ideas in practice and design systems that people can trust.
Let’s talk
If you are exploring how AI could support complex, judgement-led work in your organisation, we would love to talk. Our work with The Mercian Trust focused on understanding what responsible, trusted AI looks like before scaling.
If you are facing similar questions around workload, consistency or adoption, get in touch to explore what this could mean for your organisation.