Anand Naidu brings deep development experience to the intersection of enterprise software and artificial intelligence. Proficient in both frontend and backend architectures, he understands that bridging a powerful algorithm and a functional business tool takes more than better code; it takes awareness of the environment the software operates in. In Enterprise Resource Planning (ERP), where millions of dollars and rigorous regulatory oversight are at stake, the generic capabilities of modern AI models often fall short. Today, we explore why the race for the “highest benchmark” might be a distraction for ERP teams, who actually need autonomous agents grounded in the specific realities of their organizational data and financial controls.
Our discussion centers on the fundamental shift from using AI as a simple question-answering tool to deploying it as an active participant in business workflows. We examine the limitations of current AI benchmarks, which prioritize math and coding over compliance-driven decision-making. We also break down the four pillars of “agentic context” (data, memory, practices, and transparency) that serve as essential infrastructure for any AI managing complex processes like procure-to-pay or record-to-report. Finally, we look at why success in the next era of ERP will depend on how well a company builds the surrounding informational layer rather than on which model it plugs into its systems.
General AI benchmarks focus heavily on mathematics and coding, yet ERP environments require strict adherence to financial controls and compliance. Why is there such a significant gap between what models can do on paper and how they perform in a live business setting?
The disconnect exists because most foundation models are evaluated against reasoning and logic puzzles that, while impressive, don’t account for the rigid hierarchies of an enterprise. ERP teams operate within a world of procurement policies, operational constraints, and approval hierarchies that a generic model simply wasn’t designed to navigate. When we provision these tools, we often find they are confident and coherent but fundamentally wrong, because they have no grounding in the specific rules that govern financial decision-making. The result is a familiar frustration: the AI generates a response that sounds professional but fails to respect complex, interdependent workflows like procure-to-pay or order-to-cash. Ultimately, a high benchmark score in mathematics doesn’t translate into knowing how a specific company handles its intercompany policy or its unique chart of accounts.
We often hear that a better prompt can solve most AI hallucinations, but you argue that “agentic context” is something much deeper. How would you describe the difference between a well-guided model and one that truly understands an organization’s specific environment?
Think of it as the difference between asking a brilliant stranger for a journal entry recommendation and asking a seasoned colleague who has spent three years inside your specific organization. A prompt is just a set of instructions, but agentic context is the full informational layer that surrounds an agent while it works. It allows the AI to trace every posting back to its source and understand internal policies that aren’t written in a single prompt or manual. For an ERP team, this context is built on data, memory, and practices, which ensure the AI isn’t just guessing based on generic patterns it learned during training. Without this context, the model is essentially a high-performance engine trying to drive through a city without a map or a set of traffic laws.
You mentioned that “Data” is the first pillar of this context. How do live connections to systems like the general ledger or procurement databases prevent an AI agent from becoming a “plausible but dangerous” tool?
Every agent needs authoritative, real-time sources to be effective, which means it must have live connections to the general ledger, inventory systems, and supplier records. If an agent is only operating on a fragment of the truth or a static dataset, its output might look plausible on the surface while containing errors that introduce massive financial risk. We need agents that are wired directly into CRM, contracts, and compliance frameworks so they can see the actual state of the business at any given second. The most effective agents will be the ones connected to the richest data sources, not necessarily the ones trained on the largest general dataset. Without deep, current data access, you are essentially asking the AI to make high-stakes decisions based on old news, which is a recipe for a compliance disaster.
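One way to picture the data pillar is a freshness guard: before acting, the agent checks when it last read each system of record and declines to proceed on stale data. The Python sketch below is illustrative only; `SourceSnapshot`, `stale_sources`, the system names, and the one-hour threshold are hypothetical choices, not part of any ERP product’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class SourceSnapshot:
    """Hypothetical record of when the agent last read a system of record."""
    system: str             # illustrative name, e.g. "general_ledger"
    retrieved_at: datetime  # timezone-aware timestamp of the last live read


def stale_sources(
    snapshots: list[SourceSnapshot],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return the systems whose data is older than max_age (empty list = safe to act)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return [s.system for s in snapshots if now - s.retrieved_at > max_age]


# Usage: block a posting when any source it depends on is stale.
now = datetime(2024, 6, 30, 17, 0, tzinfo=timezone.utc)
snapshots = [
    SourceSnapshot("general_ledger", now - timedelta(minutes=2)),
    SourceSnapshot("supplier_master", now - timedelta(days=3)),
]
stale = stale_sources(snapshots, max_age=timedelta(hours=1), now=now)
if stale:
    print("Refusing to post; refresh first:", stale)  # Refusing to post; refresh first: ['supplier_master']
```

Refusing to act, rather than acting with a warning, reflects the point that a plausible answer built on old data is the dangerous case.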
Many AI interactions feel like a “reset” every time a new session starts, which seems inefficient for finance. How does the “Memory” pillar transform a transactional interaction into a long-term relationship for an ERP team?
Without memory, every period close starts from zero, and the AI fails to learn from the institutional experience that humans build up over years. Memory allows an agent to accumulate knowledge from prior approvals, past exceptions, and the specific feedback provided by finance controllers during previous quarters. It begins to learn which cost centers routinely flag for review and which specific vendors might require extended payment terms, allowing it to move beyond simple transactions. Over time, the agent recognizes which approval chains tend to collapse under time pressure, and it can adjust its planning accordingly to meet deadlines. This persistence of information is what allows the AI to grow with the company, becoming more efficient with every period-end close rather than repeating the same mistakes.
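The memory pillar can be sketched as a store of review outcomes that persists across closes. This minimal Python example assumes a simple rule, invented for illustration, that a cost center counts as “routinely flagged” when it was flagged in at least half of at least three recorded closes; `CloseMemory` and its methods are hypothetical names.

```python
from collections import defaultdict


class CloseMemory:
    """Hypothetical store of review outcomes that persists across period closes."""

    def __init__(self) -> None:
        # cost center -> {period: whether it was flagged for review in that close}
        self._outcomes: dict[str, dict[str, bool]] = defaultdict(dict)

    def record(self, period: str, cost_center: str, flagged: bool) -> None:
        """Store one close's outcome; later feedback for the same period overwrites earlier."""
        self._outcomes[cost_center][period] = flagged

    def flag_rate(self, cost_center: str) -> float:
        """Fraction of recorded closes in which this cost center was flagged."""
        periods = self._outcomes.get(cost_center, {})
        if not periods:
            return 0.0
        return sum(periods.values()) / len(periods)

    def routinely_flagged(self, min_closes: int = 3, threshold: float = 0.5) -> list[str]:
        """Cost centers flagged in at least `threshold` of at least `min_closes` closes."""
        return sorted(
            cc
            for cc, periods in self._outcomes.items()
            if len(periods) >= min_closes and self.flag_rate(cc) >= threshold
        )


# Usage: three closes of history for one cost center, one close for another.
memory = CloseMemory()
for period, flagged in [("2024-Q1", True), ("2024-Q2", True), ("2024-Q3", False)]:
    memory.record(period, "CC-410", flagged)
memory.record("2024-Q3", "CC-120", True)
print(memory.routinely_flagged())  # ['CC-410']
```

The minimum-history requirement keeps a single bad quarter from being treated as institutional knowledge; in practice this store would need to be persisted outside the process.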
Organizational “Practices” vary wildly from one company to another, even within the same industry. How does an agentic system distinguish between the specific needs of a CFO versus those of an audit committee when performing a variance analysis?
This is where the distinction between “what” was asked and “how” the work is supposed to be done becomes the defining factor for success. One finance team might want a lean, data-heavy summary designed for a CFO’s quick review, while another might require an exhaustive document built specifically to satisfy an audit committee’s standards. Practices capture these nuances, including individual style, organizational policy, and the specific risk appetite of the leadership team. If an agent doesn’t align with these governance structures and operating norms, it becomes operationally useless, regardless of how “smart” its reasoning might be. It must understand the guardrails, such as document templates and approval rules, that keep its output in line with how the business actually functions.
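Practices can be encoded as explicit, checkable profiles rather than left implicit in a prompt. The sketch below assumes two hypothetical audience profiles, a lean CFO summary and a fuller audit committee package, each with required sections, a detail limit, and a sign-off rule; `PracticeProfile`, `check_report`, the section names, and the limits are placeholders, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PracticeProfile:
    """Hypothetical encoding of one team's reporting norms for a given audience."""
    required_sections: tuple[str, ...]
    max_line_items: int  # how much detail this audience expects
    needs_signoff: bool  # whether a controller must approve before release


# Placeholder profiles; a real organization would derive these from its own policies.
PROFILES = {
    "cfo": PracticeProfile(("summary", "top_variances"), 10, False),
    "audit_committee": PracticeProfile(
        ("summary", "top_variances", "methodology", "controls_tested", "source_references"),
        500,
        True,
    ),
}


def check_report(audience: str, sections: list[str], line_items: int, signed_off: bool) -> list[str]:
    """List the ways a draft variance analysis departs from the audience's practice profile."""
    profile = PROFILES[audience]
    problems = [f"missing section: {s}" for s in profile.required_sections if s not in sections]
    if line_items > profile.max_line_items:
        problems.append(f"too detailed: {line_items} > {profile.max_line_items} line items")
    if profile.needs_signoff and not signed_off:
        problems.append("controller sign-off required before release")
    return problems


# Usage: the same draft passes for the CFO but not for the audit committee.
draft = ["summary", "top_variances"]
print(check_report("cfo", draft, line_items=8, signed_off=False))  # []
print(check_report("audit_committee", draft, line_items=8, signed_off=False))
```

Returning a list of violations rather than a single pass/fail gives the agent (or a reviewer) something specific to fix before the report goes out.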
In high-stakes environments like ERP, a “black box” approach is often a dealbreaker for leadership. How does the “Transparency” pillar allow an AI agent to move from being a simple tool to becoming an accountable partner?
In a world of strict regulators and high-stakes audits, transparency is not a nice-to-have feature; it is a prerequisite for the AI to have any place in the workflow. An agent must be able to show its work by citing the specific GL line it drew from or flagging the exact internal policy it applied to a recommendation. When an AI can explain its reasoning and ground its output in verifiable sources, it moves from being a mysterious black box to an accountable partner that a human can trust. This is especially vital when an agent is recommending a hedge position or flagging a duplicate invoice, as the human in the loop needs to see the “why” behind the action. Providing a clear trail of logic and a confidence level for every recommendation is what satisfies auditors and builds long-term institutional trust.
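A transparent recommendation can carry its own evidence. This minimal sketch, with hypothetical names throughout (`Citation`, `Recommendation`, `review_route`, `audit_trail`), attaches citations to GL lines and policies plus a confidence score to each recommendation, rejects anything without a cited source, and escalates low-confidence items to a human reviewer. The 0.8 threshold and the invoice and policy identifiers are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Citation:
    kind: str       # illustrative categories, e.g. "gl_line" or "policy"
    reference: str  # identifier of the ledger line or policy clause relied on


@dataclass
class Recommendation:
    action: str
    rationale: str
    confidence: float  # 0.0 to 1.0; how it is estimated is left open here
    citations: list[Citation] = field(default_factory=list)


def review_route(rec: Recommendation, min_confidence: float = 0.8) -> str:
    """Decide how a recommendation reaches the human in the loop (hypothetical policy)."""
    if not 0.0 <= rec.confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if not rec.citations:
        return "reject: no verifiable source cited"
    if rec.confidence < min_confidence:
        return "escalate: below confidence threshold"
    return "present: cited and above confidence threshold"


def audit_trail(rec: Recommendation) -> str:
    """One-line explanation of what the agent recommends and which sources it used."""
    sources = "; ".join(f"{c.kind}={c.reference}" for c in rec.citations) or "none"
    return f"{rec.action} (confidence {rec.confidence:.2f}): {rec.rationale} [sources: {sources}]"


# Usage with invented identifiers: flagging a suspected duplicate invoice.
rec = Recommendation(
    action="hold invoice INV-2291",
    rationale="vendor and amount match INV-2240, paid last week",
    confidence=0.92,
    citations=[Citation("gl_line", "2000-AP line 118"), Citation("policy", "AP-07 duplicate check")],
)
print(review_route(rec))  # present: cited and above confidence threshold
print(audit_trail(rec))
```

Rejecting uncited output outright, rather than merely lowering its score, reflects the idea that an auditor needs a verifiable source, not just a number.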
What happens to an ERP team that focuses only on the power of the AI model while neglecting the surrounding infrastructure of data and memory?
ERP operations are layered, policy-bound, and highly consequential, meaning a single error or policy breach can compound quickly across the entire organization. A team that prioritizes only model selection (the “engine”) might see some speed improvements, but it is essentially building a system that can introduce risk faster and at greater scale. What such a team actually needs is a unifying layer that brings together data, memory, and practices into one reliable environment the AI can work from. Without that infrastructure, the AI might close a period or recommend a position while quietly eroding governance or accumulating audit risk. The teams that truly win will be the ones that give their models the best context, ensuring that every decision made by the AI adheres to micro-level operational rules while supporting macro-level business objectives.
What is your forecast for the future of ERP teams who successfully bridge this gap between raw AI power and agentic context?
I believe we are entering a phase where the measure of success will be “capability delivered without negative consequence.” We are going to see a shift away from model-centric discussions toward a focus on “agentic readiness,” where the quality of an organization’s internal data and practice frameworks becomes their greatest competitive advantage. In the near future, ERP agents will plan, decide, and act autonomously across workflows like record-to-report, but only if they are grounded in a transparent and persistent memory layer. The teams that successfully build this agentic context will not only accelerate their operations but will also strengthen their governance, making them far more resilient than those still chasing the latest general AI benchmarks. Ultimately, the AI will stop being a tool you talk to and start being a partner that understands the history and future of your general ledger as well as your best controller does.
