Why Is Data the First Step in Enterprise AI?

It is becoming increasingly difficult for organizations to separate the genuine signal from the deafening noise in the ever-accelerating world of artificial intelligence. Each day seems to bring a new benchmark, a more advanced state-of-the-art model, or a bold claim that yesterday’s groundbreaking architecture is now obsolete. For the developers and engineers tasked with building their first AI application, especially within the complex ecosystem of a large enterprise, this relentless barrage of announcements can create a profound paralysis of choice. The fear of building on today’s model only to find it considered legacy code before it even reaches production is a palpable concern. This anxiety is compounded by the narrative that if an organization is not already deploying fully autonomous agentic systems that plan, reason, and execute complex workflows, it is already lagging dangerously behind. The reality, however, has very little to do with the fluctuating rankings of a public chatbot arena. It is instead rooted in the unglamorous yet essential work of data engineering, governance, and integration as AI transitions from a phase of magical thinking into one of industrialization. The true challenge is not picking the “smartest” model but building a robust system that can withstand the complexities and imperfections of the real world.

1. Sidestepping the Leaderboard Illusion

It is remarkably easy to get caught up in the “Leaderboard Illusion,” where a model scoring a marginal 1% higher on a niche math benchmark is suddenly perceived as the only viable choice for all applications. This approach, often described as “vibes-based evaluation,” serves as a decent proxy for which chatbot feels more intelligent in a casual conversation but is an exceptionally poor method for selecting a foundation for a production workload. The industry must move beyond viewing AI through the lens of past software wars, where one dominant platform was expected to capture the entire market. For the vast majority of enterprise tasks, perhaps as many as 90%, the level of intelligence offered by leading models from providers like Anthropic, OpenAI, or even open-weights alternatives like Llama is more than sufficient. The practical differences are often marginal, especially for an initial product version. In this context, the “best” model is frequently the one an organization can actually access securely, reliably, and in compliance with its existing data policies.

The weights and architectures of foundational models are rapidly becoming a form of undifferentiated heavy lifting—boring but essential infrastructure that everyone needs but no one particularly wants to manage themselves. As AI luminary Andrew Ng has advised, organizations should worry much more about building something valuable than about having the absolute top-ranked model. The real, defensible value resides at the application layer, not the model layer. If a tool is built that solves a genuine business problem, such as automatically reconciling invoices, summarizing complex legal briefs, or streamlining customer support inquiries, neither the developers nor the end-users will care whether the underlying model is ranked first or third on a public leaderboard that will change again next week. The physics of AI are fundamentally different from traditional software. In the open-source world, the code is the asset. In the AI world, the model is a transient commodity; the enduring asset is the organization’s proprietary data and the sophisticated pipelines built to feed it to that commodity model.

2. Thinking Like a Database Administrator

Once a model has been chosen, the temptation is to immediately leap toward building a sophisticated “agent” capable of browsing the web, querying databases, and making autonomous decisions. This ambition, while understandable, is often premature. Most enterprises are not yet ready for agents, not because the AI isn’t intelligent enough or because their developers lack experience, but because their data is not clean, structured, or governed enough to support such a system. An AI agent’s memory is, at its core, a database problem. If that agent is stripped of its memory, it becomes little more than a very expensive and sophisticated random number generator, producing outputs that lack context and consistency. Agents operate at machine speed but are fed with human-generated data. If that data is messy, inconsistent, or lacks proper governance, the agent will simply be confidently wrong at a massive and potentially disastrous scale. This is the critical bottleneck that must be addressed before any advanced AI initiatives can succeed.

Most enterprises are still in the foundational stages of figuring out where all their critical data lives, let alone how to safely and effectively expose it to a large language model. There is a tendency to treat the concept of memory in AI as a magical, infinitely expanding context window that somehow intuits the right information. It is not. It is a database, and it demands the same level of rigor, discipline, and architectural planning that is applied to mission-critical transaction logs and financial systems. This includes establishing clear schemas, implementing granular access controls, and building robust firewalls that prevent the AI from hallucinating incorrect facts or, even worse, leaking sensitive information to an unauthorized user. When designing a first AI system, the process must begin with the memory layer. An organization must first decide precisely what the AI is allowed to know, where that knowledge physically and logically resides, and what the processes are for updating and maintaining it. Only after this data foundation is firmly in place should attention turn to crafting the perfect prompt.
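The memory-layer discipline described above can be made concrete with a small sketch. This is an illustrative toy, not a production design: the `MemoryRecord` and `GovernedMemory` names, the role-based access model, and the substring retrieval are all assumptions chosen for brevity. The key idea it demonstrates is that access control runs before retrieval, so nothing a user is not cleared for ever reaches the model.

```python
# Illustrative sketch: the AI memory layer treated as a governed store.
# All names (MemoryRecord, GovernedMemory, allowed_roles) are hypothetical.
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    doc_id: str
    content: str
    allowed_roles: frozenset  # roles permitted to see this record


class GovernedMemory:
    def __init__(self):
        self._records = []

    def add(self, record: MemoryRecord):
        self._records.append(record)

    def retrieve(self, query: str, user_roles: set) -> list:
        # Enforce access control BEFORE anything reaches the model, so the
        # LLM never sees data the requesting user is not cleared for.
        visible = [r for r in self._records if r.allowed_roles & user_roles]
        # Toy retrieval: substring match stands in for real search.
        return [r for r in visible if query.lower() in r.content.lower()]


mem = GovernedMemory()
mem.add(MemoryRecord("hr-1", "Parental leave policy: 16 weeks.", frozenset({"employee"})))
mem.add(MemoryRecord("fin-1", "Q3 revenue forecast details.", frozenset({"finance"})))

hits = mem.retrieve("policy", {"employee"})
print([r.doc_id for r in hits])  # the finance record is filtered out
```

Deciding the shape of `allowed_roles` and where records physically live is exactly the schema-and-governance work the paragraph above argues must come before prompt engineering.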

3. Beginning with Inference, Not Abstraction

Historically, the discourse around AI was dominated by the massive, prohibitive costs associated with training foundational models. For the modern enterprise, however, this is largely an irrelevant concern. The focus has shifted decisively to inference, which is the practical application of a pre-trained model’s knowledge to power real-world applications. AI will deliver tangible business value as organizations learn to apply these powerful models to their own governed, proprietary data sets. Consequently, the best place to begin building institutional AI muscle is not with a moonshot agentic system but with a simple, practical retrieval-augmented generation (RAG) pipeline. This approach grounds the project in immediate business reality and forces the team to confront the most pressing challenges head-on. By starting with a focused and achievable goal, the organization can develop the core competencies required for more ambitious projects in the future while delivering value in the short term.

In practice, this means identifying a corpus of boring, messy, yet valuable internal documents—such as HR policies, technical documentation, or historical customer support logs—and building a system that allows a user to ask a question and receive an answer based exclusively on that data. This seemingly simple task forces a team to solve the hard problems that actually create a competitive moat for the company. These challenges include mastering data ingestion, which involves figuring out how to properly chunk and index various document formats like PDFs so the model can understand their content and structure. It requires establishing robust governance to ensure the model does not answer questions a user is not authorized to ask, thereby protecting sensitive information. Finally, it necessitates a focus on latency, as the system must be fast enough for people to actually use it in their daily workflows. This is the essential plumbing that makes advanced AI possible and useful within an enterprise setting.
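The ingest-retrieve-ground loop described above can be sketched in a few dozen lines. This is a deliberately dependency-free toy: the fixed-size word chunking, the shared-term relevance score, and the prompt template are stand-ins for what a real pipeline would do with an embedding model and a vector store.

```python
# Minimal RAG pipeline sketch with no external dependencies. Chunk size,
# scoring, and the prompt template are illustrative assumptions; a real
# system would use embeddings and a vector index instead.

def chunk(text: str, size: int = 40) -> list:
    """Split a document into fixed-size word chunks for indexing."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def score(query: str, passage: str) -> int:
    """Crude relevance score: count of query terms shared with the passage."""
    return len(set(query.lower().split()) & set(passage.lower().split()))


def retrieve(query: str, index: list, k: int = 2) -> list:
    """Return the top-k chunks; a vector similarity search in practice."""
    return sorted(index, key=lambda p: score(query, p), reverse=True)[:k]


def build_prompt(query: str, passages: list) -> str:
    """Ground the model: instruct it to answer only from retrieved context."""
    context = "\n---\n".join(passages)
    return (f"Answer using ONLY the context below.\n"
            f"Context:\n{context}\n\nQuestion: {query}")


docs = [
    "Employees accrue 20 vacation days per year. Unused days roll over once.",
    "The VPN requires multi-factor authentication for all remote logins.",
]
index = [c for d in docs for c in chunk(d)]
passages = retrieve("vacation days accrue", index)
prompt = build_prompt("How many vacation days do employees get?", passages)
print(prompt)
```

Each stage here maps to one of the hard problems named above: `chunk` is ingestion, `retrieve` is where latency budgets are won or lost, and the restrictive prompt is the grounding that keeps answers tied to governed data.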

4. Creating a Golden Path for Development

For platform engineering teams, the initial instinct when introducing a powerful new technology like AI might be to lock it down tightly. This often involves picking one model and one API and forcing every developer within the organization to use that single, prescribed stack. This approach, however, is a strategic mistake. Platform teams should not position themselves as the “Department of No.” When overly restrictive gates are built, developers, driven by the need to innovate and solve problems, will simply find ways to route around them, often using personal credit cards and unmonitored public APIs. This shadow IT creates significant security risks, compliance issues, and unforeseen costs, ultimately undermining the very control the platform team sought to establish. A far more effective strategy is to build a “golden path” that channels this innovative energy productively rather than attempting to stifle it.

This golden path consists of creating a set of composable services, standardized templates, and automated guardrails that make the right way to build AI applications also the easiest and fastest way. Instead of mandating a specific model, a team can standardize on a flexible interface, such as the widely supported OpenAI-compatible API format, which allows the back-end model to be swapped out later as technology evolves or business needs change. The goal is to provide developers with a safe, compliant sandbox where data governance, security, and logging are baked in, allowing them to experiment and build rapidly without introducing serious risk to the organization. A critical component of this golden path is designing applications that keep a human in the loop. The AI should be used to generate a first draft of a report or a first pass at a complex SQL query, but a human expert must be required to review, validate, and ultimately execute the final action. This mitigates the risk of hallucinations and ensures the technology augments human intelligence rather than replacing it with unreliable robot drivel.
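One way to picture the interface standardization above is a thin endpoint abstraction: because every back end speaks the same OpenAI-compatible request shape, swapping models is a configuration change rather than a code change. The gateway URL and model IDs below are placeholders, not recommendations.

```python
# Sketch of standardizing on an OpenAI-compatible interface so the back-end
# model can be swapped by configuration alone. The gateway URL and model
# names are hypothetical placeholders.
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelEndpoint:
    base_url: str  # any server that speaks the OpenAI-compatible API
    model: str     # the model the gateway should route to


def chat_request(endpoint: ModelEndpoint, messages: list) -> dict:
    """Assemble the POST target and body; the payload shape stays the same
    regardless of which vendor sits behind base_url."""
    return {
        "url": f"{endpoint.base_url}/chat/completions",
        "json": {"model": endpoint.model, "messages": messages},
    }


# Swapping providers is a config change, not an application rewrite:
primary = ModelEndpoint("https://llm-gateway.internal/v1", "gpt-4o")
fallback = ModelEndpoint("https://llm-gateway.internal/v1", "llama-3-70b")

req = chat_request(primary, [{"role": "user", "content": "Summarize this brief."}])
print(req["url"])
```

Routing all traffic through a single internal gateway like this is also where the golden path's logging, cost tracking, and data-governance guardrails can be baked in once for every team.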

5. Establishing a Bespoke Evaluation Framework

If an organization commits to ignoring the public rankings and the endless cycle of AI hype, how can it determine if its chosen model is actually good enough for its specific needs? The answer is not to guess but to test systematically. Both OpenAI and Anthropic have long emphasized the importance of “eval-driven development,” but a team does not need a complex, expensive framework to get started. The process can begin with a simple yet powerful asset: a curated set of 50 to 100 real-world examples that are directly representative of the tasks the model is expected to perform. This evaluation set should consist of specific questions paired with their known correct answers, drawn from the actual business domain in which the AI will operate. This internal benchmark becomes the definitive source of truth for model performance, tailored precisely to the organization’s unique requirements and data. It shifts the focus from abstract capabilities to concrete, measurable business outcomes.

This curated set of examples forms the basis of a repeatable, objective evaluation process. Whenever a new model is released that promises to revolutionize the industry and top the leaderboards, the team can simply run its 50 to 100 examples against the new model’s API. The decision of whether to switch from the existing model becomes a straightforward, data-driven analysis. If the new model solves the organization’s specific problems more accurately, faster, or at a lower cost, then a migration may be justified. If it does not offer a significant improvement on these practical metrics, then the announcement can be safely ignored as irrelevant noise. This disciplined approach ensures that technology adoption is guided by tangible business value rather than marketing hype. Ultimately, the organization’s own leaderboard, based on its own data and its own use cases, is the only one that truly matters.
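A harness for this kind of internal benchmark can be very small. In the sketch below, `model_fn` stands in for a call to a model API, and the grading rule (case-insensitive substring match against the expected answer) is a simplification; real eval sets often need fuzzier or rubric-based grading. The example questions and stub models are invented for illustration.

```python
# Tiny eval harness sketch: run a curated question/answer set against any
# candidate model and compare pass rates. model_fn stands in for an API
# call; substring grading is a simplification of real eval pipelines.

def run_eval(model_fn, examples: list) -> float:
    """Return the fraction of examples the model answers correctly."""
    passed = 0
    for ex in examples:
        answer = model_fn(ex["question"])
        if ex["expected"].lower() in answer.lower():
            passed += 1
    return passed / len(examples)


# A real set would hold the 50-100 examples drawn from the business domain.
examples = [
    {"question": "What is our invoice net term?", "expected": "30 days"},
    {"question": "Which form starts a refund?", "expected": "Form R-7"},
]


def current_model(q):    # stub standing in for the incumbent model's API
    return "Net term is 30 days." if "invoice" in q else "I am not sure."


def candidate_model(q):  # stub standing in for the newly released model
    return "Use Form R-7." if "refund" in q else "Net term is 30 days."


print(run_eval(current_model, examples), run_eval(candidate_model, examples))
```

When a new release appears, rerunning this harness with the new endpoint plugged in as `model_fn` turns the switch/ignore decision into a single number on the organization's own leaderboard.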

A Legacy of Pragmatic Innovation

In the end, the history of enterprise AI adoption was not written by the organizations that frantically chased the fleeting occupant of the top spot on a public leaderboard. Instead, success was found by those who focused on their own data, established robust governance, and committed to solving specific, often boring, problems for users within their company who were drowning in documentation or mired in repetitive tasks. These organizations understood that the real, sustainable advantage in the AI era was forged by making intelligence on top of governed data cheap, easy, and safe to use. This pragmatic approach may not have generated a viral thread on social media, but it resulted in the creation of applications that delivered real value and survived the harsh realities of the enterprise environment.
