AI is powerful, but it’s not magic. Simply dropping a Large Language Model (LLM) into your organisation without a plan won’t deliver results. Success comes from integrating AI into your culture and aligning it with processes, people, and goals. Adoption is about strategy, not shortcuts.
Why readiness matters
Our Copilot Readiness Agent exists for a reason. Readiness gaps often slip through unnoticed, creating friction in delivery. When roles, data, or success measures aren’t defined, delivery slows and confidence erodes. That belief shapes Marra’s own AI initiatives. Even with a healthy starting point and clear readiness measures in place, we still faced challenges developing our Agent. Here’s what we learned.
Challenges we faced
We’ve learned when AI adds value and when it doesn’t. These lessons shaped our approach, and they can help shape yours. We’ve been through the journey in real-world deployments. Even when creating our Copilot Readiness Agent, we faced challenges:
Challenge 1: Orchestrating AI to follow a set list of questions was harder than expected
Generative AIs generate text outputs by predicting the next most probable word, based on the input prompt and their training data. This makes them feel more conversational, but it also means they naturally deviate from rigid scripts. Making Agents follow a strict question flow requires careful prompt engineering, and a highly curated instruction set. You might assume that AI models behave similarly, but differences between products such as Anthropic’s Claude and OpenAI’s GPT result in different outcomes. Even different versions of a model can have widely different behaviour.
Choosing the right model, crafting precise prompts, and providing the right context are all key players to ensure consistent behaviour aligned with performance expectations. We restructured our prompt instructions multiple times across 3 different models before landing on a solution, using a strict Markdown based format.
Interestingly, we learned that what we excluded from the instructions mattered as much as what we included. Reducing complexity lowered the risk of unintended reasoning and improved reliability.
Challenge 2: Marrying Marra’s readiness model with generative AI required fine-tuning
One of the biggest challenges was balancing the LLM’s pre-trained knowledge with the unique data we needed it to reason over. For our Agent, we relied on Copilot’s general knowledge for orchestration and backend integration, but its assessments had to be grounded in Marra’s Copilot Readiness Model. To achieve this, we scoped and bounded the Agent’s generative capabilities so they aligned with the readiness evaluation. We tackled authentication complexities, ensuring the Agent accessed data securely based on context. Rigorous testing was essential to peer into the “black box” of generated responses and confirm that our model was being applied as intended. Regression testing ensured that as we balanced our data, the Agent continued to orchestrate the assessment as envisioned.
Copilot Studio’s Evaluation tool was critical here. It allowed us to inspect how the Agent built its responses, including the rationale behind decisions, giving us confidence that the Agent was functioning correctly and using our data as designed.
Challenge 3: A public Agent open to anonymous users posed key risks that needed to be addressed
Exposing an AI Agent to the public, especially without end-user authentication, introduces a new set of challenges. The biggest risk is data leakage, where the Agent inadvertently reveals sensitive logic. Generative AI models are designed to be helpful and conversational, which can sometimes lead to oversharing if guardrails aren’t in place.
We leaned on our own readiness principles and answered the critical question: “Have you defined who will use this Copilot and what data they can access?” Our data was fit for public use, but we implemented strict controls to prevent disclosure of underlying data structures and processes.
Another challenge was cost control. A public Agent means anyone in the world can engage with it, so rate limiting and monitoring became essential. Copilot Studio solutions are based on billed messages, so we set fair use limits, implemented alerts, and enforced strict cost policies to avoid budget overruns. If the Agent reaches its monthly limit, it politely informs users it’s at capacity. We felt this behaviour was right for us, but for enterprise scenarios, we’d recommend handling this gracefully on the hosting page. Users can be redirected to helpful static content or provided an alternative contact route.
Your next step
As a consultancy, we don’t just talk about AI, we live it. We’ve built solutions, faced challenges, and refined our methods. We have the lessons, the experience, and the confidence to guide you, and we’re ready to help. If you’re considering Copilot or any AI initiative, start with readiness. It’s the difference between a smooth rollout and a costly misstep. Our Readiness Agent is designed to shine a light on gaps before they become roadblocks, giving you clarity and confidence to move forward.
Take the readiness assessment here to get an instant view of your project’s AI health.
Written by Luke Hill, Lead Developer